Transfer Learning: Leveraging Pre-trained Models
Learn how to use pre-trained models to achieve great results with limited data through transfer learning.
Training a deep network from scratch typically requires millions of labeled examples. Transfer learning lets you reuse knowledge learned on one task to solve another - even when your own dataset is small.
The Core Idea
A model trained on ImageNet (roughly 14 million labeled images) learns a hierarchy of general visual features:
- Early layers: edges, textures, colors
- Middle layers: shapes, patterns
- Deep layers: object parts, complex structures
These features transfer to new tasks. You don't start from scratch - you start from knowledge.
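You can see this hierarchy directly by tapping the activations of intermediate layers in a pre-trained network. The sketch below builds feature extractors from a mid-level and a top-level ResNet50 layer; the layer name 'conv3_block4_out' follows tf.keras's ResNet50 naming (run base.summary() to confirm it in your version), and the random array just stands in for a real image.
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications import ResNet50
# Load the pre-trained backbone without its ImageNet classifier head
base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
# Mid-level features (shapes, patterns) from an intermediate block
mid_features = tf.keras.Model(base.input, base.get_layer('conv3_block4_out').output)
# High-level features (object parts, complex structures) from the last block
deep_features = tf.keras.Model(base.input, base.output)
dummy = np.random.rand(1, 224, 224, 3).astype('float32')  # stand-in for a real image
print(mid_features(dummy).shape)   # (1, 28, 28, 512)
print(deep_features(dummy).shape)  # (1, 7, 7, 2048)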
Transfer Learning Strategies
1. Feature Extraction:
Use the pre-trained model as a fixed feature extractor. Only train the final classifier.
2. Fine-tuning:
Start with the pre-trained weights, then train the entire model (or parts of it) on the new data with a low learning rate.
Feature Extraction (Keras)
import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
# Load pre-trained ResNet50 without top layers
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
# Freeze base model
base_model.trainable = False
# Add custom classifier
x = GlobalAveragePooling2D()(base_model.output)
x = Dense(256, activation='relu')(x)
output = Dense(num_classes, activation='softmax')(x)  # num_classes: number of target categories
model = Model(inputs=base_model.input, outputs=output)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=32)  # y_train should be one-hot encoded for categorical_crossentropy
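One detail the snippet above glosses over: ResNet50 expects its inputs preprocessed the same way as its ImageNet training data, so X_train should be run through preprocess_input first. A minimal sketch, assuming X_train holds RGB images with pixel values in [0, 255]:
from tensorflow.keras.applications.resnet50 import preprocess_input
# Apply the same channel-wise normalization ResNet50 saw during ImageNet training
X_train = preprocess_input(X_train)  # expects RGB images with values in [0, 255]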
Fine-tuning
# After training classifier, unfreeze some layers
base_model.trainable = True
# Freeze early layers, train later ones
# (many Keras guides also recommend keeping BatchNormalization layers frozen here)
for layer in base_model.layers[:100]:
    layer.trainable = False
# Recompile with lower learning rate
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
# Continue training
model.fit(X_train, y_train, epochs=10, batch_size=32)
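On a small dataset, fine-tuning the backbone can overfit within a few epochs. One common safeguard (a sketch, not part of the original recipe) is to hold out validation data and stop early:
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=3, restore_best_weights=True
)
model.fit(
    X_train, y_train,
    validation_split=0.2,      # hold out 20% of the training data for validation
    epochs=30,
    batch_size=32,
    callbacks=[early_stop]
)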
Transfer Learning for NLP
from transformers import BertTokenizer, TFBertForSequenceClassification
import tensorflow as tf
# Load pre-trained BERT
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
# Prepare data (texts: a list of raw strings, labels: an array of integer class ids)
encodings = tokenizer(texts, truncation=True, padding=True, max_length=128, return_tensors='tf')
# Fine-tune
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)
model.fit(
    dict(encodings),
    labels,
    epochs=3,
    batch_size=16
)
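After fine-tuning, prediction is tokenize, forward pass, argmax over the logits. A minimal sketch (the input sentence is just a made-up example):
# Classify a new piece of text with the fine-tuned model
new_encodings = tokenizer(
    ["This movie was surprisingly good."],   # hypothetical example input
    truncation=True, padding=True, max_length=128, return_tensors='tf'
)
logits = model(dict(new_encodings)).logits
predicted_class = int(tf.argmax(logits, axis=-1)[0])
print(predicted_class)  # 0 or 1, matching num_labels=2 above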
Popular Pre-trained Models
| Domain | Models |
|---|---|
| Images | ResNet, VGG, EfficientNet, ViT |
| Text | BERT, GPT, RoBERTa, T5 |
| Audio | Wav2Vec, Whisper |
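Within a framework, these backbones are largely interchangeable. For example, swapping ResNet50 for EfficientNetB0 in the Keras code above is a one-line change (a sketch; note that each model family has its own preprocess_input):
from tensorflow.keras.applications import EfficientNetB0
# Same pattern as before: frozen ImageNet backbone plus a new classifier head
base_model = EfficientNetB0(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False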
When to Use Which Strategy
| Scenario | Strategy |
|---|---|
| Small dataset, similar task | Feature extraction |
| Medium dataset, similar task | Fine-tune top layers |
| Large dataset, similar task | Fine-tune all layers |
| Different task | Feature extraction + new head |
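These rules of thumb can be written down as a tiny helper. The function below, choose_trainable_layers, and its thresholds are purely illustrative, not an established recipe:
def choose_trainable_layers(base_model, num_examples, similar_task=True):
    """Illustrative heuristic: freeze more of the backbone when data is scarce."""
    if not similar_task or num_examples < 1_000:
        unfreeze = 0                                # feature extraction only
    elif num_examples < 10_000:
        unfreeze = len(base_model.layers) // 4      # fine-tune the top quarter
    else:
        unfreeze = len(base_model.layers)           # fine-tune everything
    base_model.trainable = True
    for layer in base_model.layers[:len(base_model.layers) - unfreeze]:
        layer.trainable = False
    return unfreeze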
Key Takeaway
Transfer learning almost always beats training from scratch when data is limited. For images, start from ImageNet pre-trained CNNs or ViTs; for text, start from pre-trained transformers. Begin with feature extraction (frozen base), then fine-tune if you have enough data. This is how you get near state-of-the-art results with limited data and compute.