Transfer Learning: Leveraging Pre-trained Models
Learn how transfer learning lets you use pre-trained models to get strong results even with limited data.
Training a deep network from scratch typically requires millions of labeled examples. Transfer learning lets you reuse knowledge learned on one task to solve another - even when your new dataset is small.
The Core Idea
A model trained on ImageNet (14 million images) learns a hierarchy of general features:

- Early layers: edges, textures, colors
- Middle layers: shapes, patterns
- Deep layers: object parts, complex structures
These features transfer to new tasks. You don't start from scratch - you start from knowledge.
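You can see this hierarchy directly by loading a pre-trained model and listing its layers. A minimal sketch, assuming TensorFlow is installed; the layer names run from low-level convolutions near the input to high-level blocks near the classifier:

```python
from tensorflow.keras.applications import ResNet50

# Load ResNet50 with its ImageNet weights
model = ResNet50(weights='imagenet')

# Early layers: low-level filters (edges, colors, textures)
for layer in model.layers[:5]:
    print(layer.name, type(layer).__name__)

# Deep layers: high-level blocks feeding the classifier
for layer in model.layers[-5:]:
    print(layer.name, type(layer).__name__)
```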
Transfer Learning Strategies
**1. Feature Extraction:** Use pre-trained model as fixed feature extractor. Only train the final classifier.
**2. Fine-tuning:** Start with pre-trained weights, then train entire model (or parts) on new data with low learning rate.
Feature Extraction (Keras)
```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

# Load pre-trained ResNet50 without top layers
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze base model
base_model.trainable = False

# Add custom classifier
x = GlobalAveragePooling2D()(base_model.output)
x = Dense(256, activation='relu')(x)
output = Dense(num_classes, activation='softmax')(x)

model = Model(inputs=base_model.input, outputs=output)

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=32)
```
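One assumption in the snippet above: ResNet50 expects its inputs scaled the same way as during ImageNet pre-training. A minimal sketch, assuming X_train holds raw RGB images as float arrays in the 0-255 range:

```python
from tensorflow.keras.applications.resnet50 import preprocess_input

# Apply the same channel-wise normalization ResNet50 was trained with
X_train = preprocess_input(X_train)
```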
Fine-tuning
```python
import tensorflow as tf

# After training the classifier, unfreeze the base model
base_model.trainable = True

# Freeze early layers, train later ones
for layer in base_model.layers[:100]:
    layer.trainable = False

# Recompile with a lower learning rate
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Continue training
model.fit(X_train, y_train, epochs=10, batch_size=32)
```
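The much lower learning rate (1e-5) is what keeps fine-tuning from wrecking the pre-trained weights in the first few updates. It also helps to sanity-check which layers will actually be updated before continuing training; a minimal sketch using the base_model from above:

```python
# Count how many base-model layers will receive gradient updates
num_trainable = sum(1 for layer in base_model.layers if layer.trainable)
print(f"Trainable base layers: {num_trainable} / {len(base_model.layers)}")

# model.summary() also reports trainable vs. non-trainable parameter counts
```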
Transfer Learning for NLP
```python
from transformers import BertTokenizer, TFBertForSequenceClassification
import tensorflow as tf

# Load pre-trained BERT
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Prepare data
encodings = tokenizer(texts, truncation=True, padding=True, max_length=128, return_tensors='tf')

# Fine-tune
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)

model.fit(
    dict(encodings),
    labels,
    epochs=3,
    batch_size=16
)
```
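After fine-tuning, the model returns raw logits, so classification means taking an argmax over the class dimension. A minimal sketch (the input sentence is just a hypothetical example):

```python
# Tokenize new text exactly like the training data
new_encodings = tokenizer(
    ["This movie was fantastic!"],
    truncation=True, padding=True, max_length=128, return_tensors='tf'
)

# The model output exposes logits; argmax gives the predicted class index
outputs = model(dict(new_encodings))
predicted_class = tf.argmax(outputs.logits, axis=-1).numpy()
print(predicted_class)  # e.g. array([1])
```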
Popular Pre-trained Models
| Domain | Models |
|--------|--------|
| Images | ResNet, VGG, EfficientNet, ViT |
| Text | BERT, GPT, RoBERTa, T5 |
| Audio | Wav2Vec, Whisper |
When to Use Which Strategy
| Scenario | Strategy |
|----------|----------|
| Small dataset, similar task | Feature extraction |
| Medium dataset, similar task | Fine-tune top layers |
| Large dataset, similar task | Fine-tune all layers |
| Different task | Feature extraction + new head |
Key Takeaway
Transfer learning is almost always better than training from scratch. For images, use ImageNet pre-trained CNNs. For text, use pre-trained transformers. Start with feature extraction (frozen base), then fine-tune if you have enough data. This is how you get state-of-the-art results with limited resources.