AI · 8 min read
Transfer Learning
Use pre-trained models for your tasks.
Dr. Patricia Moore
December 18, 2025
Reuse powerful models.
What is Transfer Learning?
Transfer learning means taking a model trained on a huge dataset and reusing it for your specific task.
Like learning piano after learning the keyboard - the skills transfer!
Why Transfer Learning?
Problems it solves:
- Don't have millions of images
- Can't afford weeks of training
- Limited GPU resources
Pre-trained Models
Vision:
- VGG16, ResNet50, InceptionV3 (see the comparison sketch below)
- Trained on ImageNet (1.4M images, 1000 classes)
Language:
- BERT, GPT, RoBERTa
- Trained on billions of words
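To get a feel for the vision backbones listed above, here is a minimal sketch (assuming tensorflow.keras is available; the ImageNet weights download on first use) that loads each one without its classification head and prints its parameter count:
from tensorflow.keras.applications import VGG16, ResNet50, InceptionV3
# Load each backbone without the ImageNet classifier head and compare sizes
for name, cls in [('VGG16', VGG16), ('ResNet50', ResNet50), ('InceptionV3', InceptionV3)]:
    backbone = cls(weights='imagenet', include_top=False)
    print(f'{name}: {backbone.count_params():,} parameters')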
Using a Pre-trained Model
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
# Load pre-trained model (without top layer)
base_model = VGG16(weights='imagenet', include_top=False)
# Freeze base layers
for layer in base_model.layers:
    layer.trainable = False
# Add custom layers
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(256, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x) # 10 classes
model = Model(inputs=base_model.input, outputs=predictions)
Fine-Tuning Strategy
Step 1: Train only the top layers
# Freeze all base layers
for layer in base_model.layers:
    layer.trainable = False
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit(X_train, y_train, epochs=5)
Step 2: Unfreeze some layers
# Unfreeze the last 10 layers of the base model
for layer in base_model.layers[-10:]:
    layer.trainable = True
# Use a lower learning rate so the pre-trained weights change slowly
from tensorflow.keras.optimizers import Adam
model.compile(optimizer=Adam(learning_rate=1e-4), loss='categorical_crossentropy')
model.fit(X_train, y_train, epochs=10)
Real Example - Medical Images
# Classify chest X-rays with only ~1,000 labeled images
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
# Freeze all but the last 20 layers
for layer in base.layers[:-20]:
    layer.trainable = False
# Custom classifier head
x = GlobalAveragePooling2D()(base.output)
x = Dense(128, activation='relu')(x)
x = Dropout(0.5)(x)
output = Dense(2, activation='softmax')(x)  # Normal vs Pneumonia
model = Model(base.input, output)
model.compile(optimizer=Adam(learning_rate=1e-4), loss='categorical_crossentropy', metrics=['accuracy'])
# Train on the small dataset
model.fit(X_train, y_train, epochs=20, validation_data=(X_val, y_val))
Transfer Learning for NLP
from transformers import BertTokenizer, TFBertForSequenceClassification
import tensorflow as tf
# Load pre-trained BERT
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Tokenize the text and define matching labels (1 = positive, 0 = negative)
texts = ["Great product!", "Terrible service"]
labels = tf.constant([1, 0])
encodings = tokenizer(texts, truncation=True, padding=True, return_tensors='tf')
# Fine-tune on your data
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)
model.fit(dict(encodings), labels, epochs=3)
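After fine-tuning, a quick sanity check on new text (a sketch; it assumes label 1 means positive and 0 means negative, as above):
# Tokenize a new review and take the argmax over the model's logits
new_enc = tokenizer(["Fast shipping, works great"], truncation=True, padding=True, return_tensors='tf')
pred = tf.argmax(model(dict(new_enc)).logits, axis=-1).numpy()
print(pred)  # e.g. [1] for the positive class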
Feature Extraction vs Fine-Tuning
Feature Extraction:
- Freeze all layers
- Use the frozen network as a fixed feature extractor (see the sketch below)
- Fast, needs less data
Fine-Tuning:
- Unfreeze some layers
- Adapt to your data
- Better performance, needs more data
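Here is a minimal feature-extraction sketch (X_train and y_train are hypothetical arrays of 224×224 RGB images and integer labels): the frozen base turns each image into a fixed-length vector, and a simple classifier is trained on top.
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.layers import GlobalAveragePooling2D
from tensorflow.keras.models import Model
from sklearn.linear_model import LogisticRegression
# Frozen VGG16 base used purely as a feature extractor
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
extractor = Model(base.input, GlobalAveragePooling2D()(base.output))
# Each image becomes a 512-dimensional feature vector; train a simple classifier on top
features = extractor.predict(preprocess_input(X_train))
clf = LogisticRegression(max_iter=1000).fit(features, y_train)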
Domain Adaptation
When the source and target domains differ:
# Example: trained on photos, applied to sketches
# One option: adversarial domain adaptation (DANN-style) with gradient reversal
import tensorflow as tf
from tensorflow.keras.layers import Lambda, Dense
# Gradient reversal: identity in the forward pass, negated gradient in the backward pass
@tf.custom_gradient
def gradient_reversal(x):
    def grad(dy):
        return -dy
    return x, grad
# Domain classifier on the shared features (shared_features is a placeholder tensor from the base model)
domain_output = Lambda(gradient_reversal)(shared_features)
domain_output = Dense(1, activation='sigmoid')(domain_output)
Best Practices
- Start with frozen layers
- Use a small learning rate when fine-tuning
- Augment data heavily (see the sketch after this list)
- Monitor validation loss closely
- Try different pre-trained models
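On the augmentation and monitoring points, a minimal sketch using Keras preprocessing layers and an EarlyStopping callback (TF 2.x; train_ds and val_ds are hypothetical tf.data datasets of image/label pairs):
import tensorflow as tf
# Light augmentation pipeline built from Keras preprocessing layers
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('horizontal'),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])
# Augment on the fly, and stop training when validation loss stops improving
train_ds = train_ds.map(lambda x, y: (augment(x, training=True), y))
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
model.fit(train_ds, validation_data=val_ds, epochs=30, callbacks=[early_stop])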
Remember
- Transfer learning saves time and resources
- Always start with pre-trained models
- Fine-tune carefully with low learning rate
- Works amazingly well with small datasets
#AI #Advanced #TransferLearning