
Transfer Learning: Leveraging Pre-trained Models

Learn how to use pre-trained models to achieve great results with limited data through transfer learning.

Sarah Chen
December 19, 2025

Training a deep network from scratch typically requires millions of labeled examples. Transfer learning lets you reuse knowledge learned on one task to solve another, even when the new task has only limited data.

The Core Idea

A model trained on ImageNet (about 1.3 million labeled images in the standard ILSVRC training set) learns general visual features:

  • Early layers: edges, textures, colors
  • Middle layers: shapes, patterns
  • Deep layers: object parts, complex structures

These features transfer to new tasks: instead of starting from scratch, you start from knowledge the network has already learned.
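
You can see this hierarchy directly by probing a pre-trained network's intermediate activations. The sketch below builds a small probe model over ResNet50; the layer names come from the Keras ResNet50 implementation and may differ across versions, so list them with backbone.summary() if they don't match.

import numpy as np
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.models import Model

# Pre-trained backbone with ImageNet weights, no classification head
backbone = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# One early and one deep layer (names may vary by Keras version; check backbone.summary())
early = backbone.get_layer('conv2_block1_out').output   # low-level: edges, textures
deep = backbone.get_layer('conv5_block3_out').output    # high-level: object parts

probe = Model(inputs=backbone.input, outputs=[early, deep])

# Push a dummy image through and compare feature-map shapes
dummy = np.random.rand(1, 224, 224, 3).astype('float32')
low_feats, high_feats = probe.predict(dummy)
print(low_feats.shape, high_feats.shape)  # roughly (1, 56, 56, 256) and (1, 7, 7, 2048)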

Transfer Learning Strategies

1. Feature Extraction:
Use the pre-trained model as a fixed feature extractor and train only the final classifier.

2. Fine-tuning:
Start from the pre-trained weights, then train the entire model (or part of it) on the new data with a low learning rate.

Feature Extraction (Keras)

from tensorflow.keras.applications import ResNet50
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

# Load pre-trained ResNet50 without top layers
base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze base model
base_model.trainable = False

# Add custom classifier
x = GlobalAveragePooling2D()(base_model.output)
x = Dense(256, activation='relu')(x)
output = Dense(num_classes, activation='softmax')(x)

model = Model(inputs=base_model.input, outputs=output)

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=32)
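
One detail the snippet above glosses over: ResNet50 expects its inputs scaled the same way as during ImageNet training, and categorical_crossentropy expects one-hot labels. A minimal preparation sketch, assuming X_train holds raw RGB images, y_train holds integer class ids, and num_classes is the number of target classes in your dataset:

from tensorflow.keras.applications.resnet50 import preprocess_input
from tensorflow.keras.utils import to_categorical

X_train = preprocess_input(X_train)             # channel scaling/shifting as ResNet50 expects
y_train = to_categorical(y_train, num_classes)  # integer class ids -> one-hot vectors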

Fine-tuning

import tensorflow as tf

# After training the classifier, unfreeze the base model
base_model.trainable = True

# Freeze early layers, train later ones
for layer in base_model.layers[:100]:
    layer.trainable = False

# Recompile with lower learning rate
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Continue training
model.fit(X_train, y_train, epochs=10, batch_size=32)
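
Fine-tuning a large backbone on a small dataset overfits quickly, so it helps to monitor a validation split and stop early. A sketch using Keras callbacks (the 10% validation split is an assumption, adjust to your data):

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=3,                 # stop if validation loss hasn't improved for 3 epochs
    restore_best_weights=True   # roll back to the best weights seen
)

model.fit(
    X_train, y_train,
    validation_split=0.1,       # hold out 10% of the training data for validation
    epochs=10,
    batch_size=32,
    callbacks=[early_stop]
)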

Transfer Learning for NLP

from transformers import BertTokenizer, TFBertForSequenceClassification
import tensorflow as tf

# Load pre-trained BERT
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Prepare data (texts: a list of strings; labels: matching integer class ids)
encodings = tokenizer(texts, truncation=True, padding=True, max_length=128, return_tensors='tf')

# Fine-tune
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)

model.fit(
    dict(encodings),
    labels,
    epochs=3,
    batch_size=16
)
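
Once fine-tuned, inference follows the same tokenize-then-predict pattern. The model returns logits, so take an argmax over the class dimension; the example sentences below are made up:

new_texts = ["great movie, would watch again", "complete waste of time"]
new_encodings = tokenizer(new_texts, truncation=True, padding=True, max_length=128, return_tensors='tf')

logits = model(dict(new_encodings)).logits
predicted_classes = tf.argmax(logits, axis=-1).numpy()
print(predicted_classes)  # e.g. [1 0] for positive / negative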

Popular Pre-trained Models

  • Images: ResNet, VGG, EfficientNet, ViT
  • Text: BERT, GPT, RoBERTa, T5
  • Audio: Wav2Vec, Whisper
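
The feature-extraction recipe from earlier is backbone-agnostic: any of the image models above drops into the same few lines. For example, with EfficientNet (a sketch; EfficientNetB0 handles its own input rescaling, so check the Keras docs for the preprocessing it expects):

from tensorflow.keras.applications import EfficientNetB0

# Same pattern as with ResNet50: frozen backbone, new classification head on top
base_model = EfficientNetB0(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False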

When to Use Which Strategy

  • Small dataset, similar task: feature extraction
  • Medium dataset, similar task: fine-tune the top layers
  • Large dataset, similar task: fine-tune all layers
  • Different task: feature extraction + a new head

Key Takeaway

Transfer learning is almost always better than training from scratch. For images, use ImageNet pre-trained CNNs. For text, use pre-trained transformers. Start with feature extraction (frozen base), then fine-tune if you have enough data. This is how you get state-of-the-art results with limited resources.

#Machine Learning#Deep Learning#Transfer Learning#Advanced