Transfer Learning
Reuse powerful pre-trained models for your own tasks.
What is Transfer Learning?
Transfer learning means taking a model trained on a huge dataset and reusing it for your specific task.
Like learning piano after learning keyboard: the skills transfer!
Why Transfer Learning?
**Problems it solves**:
- Don't have millions of images
- Can't afford weeks of training
- Limited GPU resources
Pre-trained Models
**Vision**:
- VGG16, ResNet50, InceptionV3
- Trained on ImageNet (1.4M images, 1000 classes)

**Language**:
- BERT, GPT, RoBERTa
- Trained on billions of words
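Loading any of these vision models takes one line with Keras; a minimal sketch (parameter counts are approximate):

```python
from tensorflow.keras.applications import VGG16, ResNet50, InceptionV3

# ImageNet weights are downloaded automatically on first use
vgg = VGG16(weights='imagenet')              # ~138M parameters
resnet = ResNet50(weights='imagenet')        # ~26M parameters
inception = InceptionV3(weights='imagenet')  # ~24M parameters

vgg.summary()  # inspect the architecture layer by layer
```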
Using Pre-trained Model
```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D

# Load pre-trained model (without top layer)
base_model = VGG16(weights='imagenet', include_top=False)

# Freeze base layers
for layer in base_model.layers:
    layer.trainable = False

# Add custom layers
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(256, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)  # 10 classes

model = Model(inputs=base_model.input, outputs=predictions)
```
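To feed images through this model, they need the same preprocessing VGG16 was trained with; a minimal sketch, assuming images already resized to 224x224 (the random batch is just a placeholder):

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import preprocess_input

# Placeholder batch of 8 RGB images with pixel values in [0, 255]
images = np.random.rand(8, 224, 224, 3) * 255.0

# Same channel ordering and mean subtraction used during ImageNet training
inputs = preprocess_input(images)

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
probs = model.predict(inputs)
print(probs.shape)  # (8, 10): one probability per custom class
```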
Fine-Tuning Strategy
**Step 1**: Train only top layers

```python
# Freeze all base layers
for layer in base_model.layers:
    layer.trainable = False

model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit(X_train, y_train, epochs=5)
```
**Step 2**: Unfreeze some layers

```python
from tensorflow.keras.optimizers import Adam

# Unfreeze last 10 layers
for layer in base_model.layers[-10:]:
    layer.trainable = True

# Use lower learning rate
model.compile(optimizer=Adam(learning_rate=1e-4), loss='categorical_crossentropy')
model.fit(X_train, y_train, epochs=10)
```
Real Example - Medical Images
```python
# Classify X-rays with only 1000 images
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze most layers (all but the last 20)
for layer in base.layers[:-20]:
    layer.trainable = False

# Custom classifier
x = GlobalAveragePooling2D()(base.output)
x = Dense(128, activation='relu')(x)
x = Dropout(0.5)(x)
output = Dense(2, activation='softmax')(x)  # Normal vs Pneumonia

model = Model(base.input, output)
model.compile(optimizer=Adam(learning_rate=1e-4),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train on the small dataset
model.fit(X_train, y_train, epochs=20, validation_data=(X_val, y_val))
```
Transfer Learning for NLP
```python
from transformers import BertTokenizer, TFBertForSequenceClassification
import tensorflow as tf

# Load pre-trained BERT
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenize text
texts = ["Great product!", "Terrible service"]
labels = tf.constant([1, 0])  # 1 = positive, 0 = negative
encodings = tokenizer(texts, truncation=True, padding=True, return_tensors='tf')

# Fine-tune on your data
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)

# Pass the full encodings dict so the attention mask is used too
model.fit(dict(encodings), labels, epochs=3)
```
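After fine-tuning, the same tokenizer and model can classify new text; a minimal sketch (the review and predicted label are hypothetical):

```python
# Hypothetical inference on an unseen review
new_texts = ["Fast shipping, works perfectly"]
new_enc = tokenizer(new_texts, truncation=True, padding=True, return_tensors='tf')

logits = model(dict(new_enc)).logits
pred = tf.argmax(logits, axis=-1).numpy()
print(pred)  # e.g. [1] for positive under the labeling above
```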
Feature Extraction vs Fine-Tuning
**Feature Extraction**:
- Freeze all layers
- Use the base model as a fixed feature extractor
- Fast, less data needed

**Fine-Tuning**:
- Unfreeze some layers
- Adapt them to your data
- Better performance, needs more data
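In code, the two approaches differ mainly in how much of the base model stays trainable; a minimal sketch with a hypothetical 5-class task:

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Feature extraction: freeze the whole base, train only the new head
base.trainable = False

# Fine-tuning (alternative): unfreeze the last few layers as well
# base.trainable = True
# for layer in base.layers[:-4]:
#     layer.trainable = False

x = GlobalAveragePooling2D()(base.output)
outputs = Dense(5, activation='softmax')(x)  # 5 hypothetical classes
model = Model(base.input, outputs)

# A low learning rate matters most when base layers are trainable
model.compile(optimizer=Adam(learning_rate=1e-4), loss='categorical_crossentropy')
```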
Domain Adaptation
When the source and target domains differ:
```python
# Trained on photos, used on sketches:
# apply domain adaptation techniques
import tensorflow as tf
from tensorflow.keras.layers import Dense, Lambda

# Gradient reversal for domain adaptation:
# identity in the forward pass, negated gradient in the backward pass
@tf.custom_gradient
def gradient_reversal(x):
    def grad(dy):
        return -dy
    return x, grad

# Add domain classifier on the shared features
domain_output = Lambda(gradient_reversal)(shared_features)
domain_output = Dense(1, activation='sigmoid')(domain_output)
```
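To make this runnable end to end, the domain head has to be wired into a model alongside the task head; a minimal sketch, assuming a hypothetical photo-vs-sketch setup and reusing `gradient_reversal` from above:

```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, Input, Lambda
from tensorflow.keras.models import Model

# Hypothetical shared feature extractor
inputs = Input(shape=(224, 224, 3))
backbone = ResNet50(weights='imagenet', include_top=False)
shared_features = GlobalAveragePooling2D()(backbone(inputs))

# Task head (e.g. 10 object classes) and domain head (photo vs sketch)
task_output = Dense(10, activation='softmax', name='task')(shared_features)
reversed_features = Lambda(gradient_reversal)(shared_features)
domain_output = Dense(1, activation='sigmoid', name='domain')(reversed_features)

# The reversed gradient pushes the shared features to be domain-invariant
model = Model(inputs, [task_output, domain_output])
model.compile(optimizer='adam',
              loss={'task': 'categorical_crossentropy', 'domain': 'binary_crossentropy'},
              loss_weights={'task': 1.0, 'domain': 0.5})
```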
Best Practices
1. Start with frozen layers
2. Use a small learning rate when fine-tuning
3. Augment data heavily (see the sketch after this list)
4. Monitor validation loss closely
5. Try different pre-trained models
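For points 3 and 4, heavy augmentation and early stopping can be added directly to the training call; a minimal sketch, assuming the `model`, `X_train`/`y_train`, and `X_val`/`y_val` from the medical-imaging example:

```python
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augment the small training set on the fly
datagen = ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
)

# Stop when validation loss stops improving and keep the best weights
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=3, restore_best_weights=True)

model.fit(datagen.flow(X_train, y_train, batch_size=32),
          epochs=30,
          validation_data=(X_val, y_val),
          callbacks=[early_stop])
```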
Remember
- Transfer learning saves time and resources
- Always start with pre-trained models
- Fine-tune carefully with a low learning rate
- It works amazingly well with small datasets