
Transfer Learning: Leveraging Pre-trained Models

Learn how to use pre-trained models to achieve great results with limited data through transfer learning.

Sarah Chen
December 19, 2025

Training a deep neural network from scratch typically requires millions of labeled examples. Transfer learning lets you reuse knowledge learned on one task to solve another, even when you have limited data.

The Core Idea

A model trained on ImageNet (14 million images) learns general features:

- Early layers: edges, textures, colors
- Middle layers: shapes, patterns
- Deep layers: object parts, complex structures

These features transfer to new tasks. You don't start from scratch; you start from knowledge.
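One way to see this hierarchy is simply to list the layers of a pre-trained network. Here is a minimal sketch, assuming ResNet50 from keras.applications (layer names and counts differ between architectures):

```python
from tensorflow.keras.applications import ResNet50

# Load ResNet50 pre-trained on ImageNet, without the classification head
model = ResNet50(weights='imagenet', include_top=False)

# Sample layer names from the beginning, middle, and end of the network
layers = model.layers
mid = len(layers) // 2
print(f"{len(layers)} layers in total")
for layer in layers[:3] + layers[mid:mid + 3] + layers[-3:]:
    print(layer.name, type(layer).__name__)
```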

Transfer Learning Strategies

**1. Feature Extraction:** Use the pre-trained model as a fixed feature extractor and train only the final classifier.

**2. Fine-tuning:** Start from the pre-trained weights, then train the entire model (or parts of it) on the new data with a low learning rate.

Feature Extraction (Keras)

```python
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

# Load pre-trained ResNet50 without its top (classification) layers
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze the base model so its weights are not updated
base_model.trainable = False

# Add a custom classifier on top
x = GlobalAveragePooling2D()(base_model.output)
x = Dense(256, activation='relu')(x)
output = Dense(num_classes, activation='softmax')(x)

model = Model(inputs=base_model.input, outputs=output)

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=32)
```

Fine-tuning

```python
import tensorflow as tf

# After training the classifier, unfreeze the base model
base_model.trainable = True

# Freeze the early layers, train the later ones
for layer in base_model.layers[:100]:
    layer.trainable = False

# Recompile with a lower learning rate
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Continue training
model.fit(X_train, y_train, epochs=10, batch_size=32)
```

Transfer Learning for NLP

```python
from transformers import BertTokenizer, TFBertForSequenceClassification
import tensorflow as tf

# Load pre-trained BERT
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Prepare data (texts is a list of strings, labels a list of integer class ids)
encodings = tokenizer(texts, truncation=True, padding=True, max_length=128, return_tensors='tf')

# Fine-tune
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)

model.fit(
    dict(encodings),
    labels,
    epochs=3,
    batch_size=16
)
```
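Once fine-tuned, prediction is just a forward pass over tokenized text. A minimal sketch, assuming the tokenizer and model defined above:

```python
import tensorflow as tf

# Tokenize new examples and run a forward pass
new_texts = ["great movie", "terrible plot"]
inputs = tokenizer(new_texts, truncation=True, padding=True, max_length=128, return_tensors='tf')

logits = model(dict(inputs)).logits
preds = tf.argmax(logits, axis=-1).numpy()
print(preds)  # predicted class id per input text
```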

Popular Pre-trained Models

| Domain | Models |
|--------|--------|
| Images | ResNet, VGG, EfficientNet, ViT |
| Text | BERT, GPT, RoBERTa, T5 |
| Audio | Wav2Vec, Whisper |
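For the image models, keras.applications exposes the backbones behind a common constructor interface, so swapping architectures is usually a one-line change. A minimal sketch (weights download on first use; the text and audio models come from other libraries such as Hugging Face transformers):

```python
from tensorflow.keras import applications

# Image backbones share the same constructor signature, so the choice
# of architecture can be a single configuration value
backbones = {
    'resnet50': applications.ResNet50,
    'vgg16': applications.VGG16,
    'efficientnet_b0': applications.EfficientNetB0,
}

base_model = backbones['efficientnet_b0'](weights='imagenet',
                                          include_top=False,
                                          input_shape=(224, 224, 3))
print(base_model.name, len(base_model.layers))
```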

When to Use Which Strategy

| Scenario | Strategy |
|----------|----------|
| Small dataset, similar task | Feature extraction |
| Medium dataset, similar task | Fine-tune top layers |
| Large dataset, similar task | Fine-tune all layers |
| Different task | Feature extraction + new head |
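In code, these scenarios mostly differ in how many base layers you unfreeze before recompiling. The helper below is purely illustrative (set_trainable_layers is not a Keras API), and you should recompile with a low learning rate after changing what is trainable:

```python
def set_trainable_layers(base_model, n_unfrozen):
    """Freeze the whole base, then unfreeze only the last n_unfrozen layers.

    n_unfrozen = 0                      -> feature extraction
    n_unfrozen = len(base_model.layers) -> full fine-tuning
    """
    base_model.trainable = True
    for layer in base_model.layers[:len(base_model.layers) - n_unfrozen]:
        layer.trainable = False

# Small dataset, similar task: feature extraction
set_trainable_layers(base_model, 0)

# Medium dataset, similar task: fine-tune only the top layers
set_trainable_layers(base_model, 30)

# Large dataset, similar task: fine-tune everything
set_trainable_layers(base_model, len(base_model.layers))
```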

Key Takeaway

Transfer learning is almost always better than training from scratch. For images, use ImageNet pre-trained CNNs; for text, use pre-trained transformers. Start with feature extraction (frozen base), then fine-tune if you have enough data. This is how you get state-of-the-art results with limited resources.

#Machine Learning #Deep Learning #Transfer Learning #Advanced