
Transfer Learning

Use pre-trained models for your tasks.

Dr. Patricia Moore
December 18, 2025

Reuse powerful models.

What is Transfer Learning?

Transfer learning means taking a model trained on a huge dataset and reusing it for your specific task.

It's like learning the piano after learning the keyboard: your skills transfer!

Why Transfer Learning?

**Problems it solves**:
- You don't have millions of images
- You can't afford weeks of training
- You have limited GPU resources

Pre-trained Models

**Vision**:
- VGG16, ResNet50, InceptionV3 (all loadable in a few lines, as sketched after these lists)
- Trained on ImageNet (1.4M images, 1,000 classes)

**Language**:
- BERT, GPT, RoBERTa
- Trained on billions of words
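All of the vision backbones above ship with the Keras applications module, so loading one with ImageNet weights takes a couple of lines. A minimal sketch (the input shapes are the standard defaults, chosen here just for illustration):

```python
from tensorflow.keras.applications import VGG16, ResNet50, InceptionV3

# Each call downloads the ImageNet weights on first use.
# include_top=False drops the original 1000-class head so you can attach your own.
vgg = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
resnet = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
inception = InceptionV3(weights='imagenet', include_top=False, input_shape=(299, 299, 3))

print(vgg.count_params(), resnet.count_params(), inception.count_params())
```

The language models (BERT, GPT, RoBERTa) are loaded the same way through the Hugging Face `transformers` library, as shown in the NLP section below.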

Using a Pre-trained Model

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D

# Load the pre-trained model (without the top classification layer)
base_model = VGG16(weights='imagenet', include_top=False)

# Freeze the base layers
for layer in base_model.layers:
    layer.trainable = False

# Add custom layers on top
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(256, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)  # 10 classes

model = Model(inputs=base_model.input, outputs=predictions)
```

Fine-Tuning Strategy

**Step 1**: Train only the top layers

```python
# Freeze all base layers
for layer in base_model.layers:
    layer.trainable = False

model.compile(optimizer='adam', loss='categorical_crossentropy')
model.fit(X_train, y_train, epochs=5)
```

**Step 2**: Unfreeze some layers

```python
from tensorflow.keras.optimizers import Adam

# Unfreeze the last 10 layers
for layer in base_model.layers[-10:]:
    layer.trainable = True

# Use a lower learning rate when fine-tuning
model.compile(optimizer=Adam(learning_rate=0.0001), loss='categorical_crossentropy')
model.fit(X_train, y_train, epochs=10)
```

Real Example - Medical Images

```python
# Classify X-rays with only 1,000 images
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Dropout, GlobalAveragePooling2D
from tensorflow.keras.optimizers import Adam

base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze most layers
for layer in base.layers[:-20]:
    layer.trainable = False

# Custom classifier
x = GlobalAveragePooling2D()(base.output)
x = Dense(128, activation='relu')(x)
x = Dropout(0.5)(x)
output = Dense(2, activation='softmax')(x)  # Normal vs Pneumonia

model = Model(base.input, output)
model.compile(optimizer=Adam(learning_rate=1e-4),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train on the small dataset
model.fit(X_train, y_train, epochs=20, validation_data=(X_val, y_val))
```

Transfer Learning for NLP

```python
from transformers import BertTokenizer, TFBertForSequenceClassification
import tensorflow as tf

# Load pre-trained BERT
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Tokenize text
texts = ["Great product!", "Terrible service"]
labels = tf.constant([1, 0])  # 1 = positive, 0 = negative
encodings = tokenizer(texts, truncation=True, padding=True, return_tensors='tf')

# Fine-tune on your data
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy']
)

model.fit(encodings['input_ids'], labels, epochs=3)
```

Feature Extraction vs Fine-Tuning

**Feature Extraction**:
- Freeze all layers
- Use the base model as a fixed feature extractor
- Fast, and needs less data

**Fine-Tuning**:
- Unfreeze some layers
- Adapt the weights to your data
- Better performance, but needs more data (both approaches are sketched below)
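A minimal sketch of the two approaches, reusing the VGG16 setup from earlier (`X_train` and `y_train` are assumed placeholders for your own data):

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model, Sequential

base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # feature extraction: the base is never updated

# Feature extraction: compute features once, then train a tiny classifier on them
feature_extractor = Model(base.input, GlobalAveragePooling2D()(base.output))
features = feature_extractor.predict(X_train)  # X_train assumed from earlier
clf = Sequential([Dense(256, activation='relu', input_shape=(512,)),
                  Dense(10, activation='softmax')])
clf.compile(optimizer='adam', loss='categorical_crossentropy')
clf.fit(features, y_train, epochs=5)

# Fine-tuning: unfreeze the last few layers and train end-to-end with a low learning rate
base.trainable = True
for layer in base.layers[:-4]:
    layer.trainable = False
```

Feature extraction is the cheaper option because the expensive backbone runs only once per image; fine-tuning pays for every epoch but lets the backbone adapt to your data.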

Domain Adaptation

When the source and target domains differ:

```python
# Trained on photos, applying to sketches
# One common technique: a domain classifier behind a gradient reversal layer

import tensorflow as tf
from tensorflow.keras.layers import Dense, Lambda

# Gradient reversal: identity in the forward pass, flips the gradient in the backward pass
@tf.custom_gradient
def gradient_reversal(x):
    def grad(dy):
        return -dy
    return x, grad

# Add a domain classifier on top of the shared features
domain_output = Lambda(gradient_reversal)(shared_features)  # shared_features from your base model
domain_output = Dense(1, activation='sigmoid')(domain_output)
```

Best Practices

1. Start with frozen layers
2. Use a small learning rate when fine-tuning
3. Augment your data heavily (see the sketch after this list)
4. Monitor validation loss closely
5. Try different pre-trained models
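A minimal sketch of points 2 through 4, assuming the `model`, `X_train`, `y_train`, `X_val`, and `y_val` from the earlier examples:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

# Heavy augmentation stretches a small dataset further
augmenter = ImageDataGenerator(rotation_range=20,
                               width_shift_range=0.1,
                               height_shift_range=0.1,
                               zoom_range=0.2,
                               horizontal_flip=True)

# Watch validation loss: drop the learning rate first, then stop early
callbacks = [
    ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=2),
    EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True),
]

model.fit(augmenter.flow(X_train, y_train, batch_size=32),
          validation_data=(X_val, y_val),
          epochs=30,
          callbacks=callbacks)
```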

Remember

- Transfer learning saves time and resources
- Always start with a pre-trained model
- Fine-tune carefully with a low learning rate
- It works remarkably well with small datasets

#AI #Advanced #Transfer Learning