Meta-Learning (Learning to Learn)
Train models to learn quickly from few examples.
Dr. Patricia Moore
December 18, 2025
AI that learns how to learn.
What is Meta-Learning?
Training models to adapt quickly to new tasks.
Goal: Learn from just a few examples
It's like how learning your 5th language is easier than your 1st!
Few-Shot Learning
Learn from very few examples:
1-shot: 1 example per class
5-shot: 5 examples per class
Zero-shot: 0 examples (just a task description)
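Concretely, an N-way K-shot task gives K labeled examples for each of N classes (the support set) plus held-out queries to evaluate on. A minimal sketch of the shapes for a 5-way 1-shot episode (the image sizes and query count are illustrative assumptions):
import torch

# 5-way 1-shot episode (shapes are illustrative)
support_x = torch.randn(5 * 1, 1, 28, 28)     # 1 example for each of 5 classes
support_y = torch.tensor([0, 1, 2, 3, 4])     # one label per class
query_x = torch.randn(5 * 15, 1, 28, 28)      # 15 held-out queries per class
query_y = torch.arange(5).repeat_interleave(15)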
Why Meta-Learning?
Problem: Deep learning needs lots of data
Solution: Train on many tasks, adapt quickly to new task
Example Scenario
Training: Learn 1000 different tasks
Testing: New task with only 5 examples
Result: Model adapts quickly!
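Each task is one such episode: a small support set to adapt on and a query set to measure how well the adaptation worked. A toy task sampler along these lines is assumed by the MAML training loop below (the random tensors stand in for real class/image sampling, e.g. from Omniglot):
import torch

def sample_tasks(batch_size=32, n_way=5, k_shot=5, n_query=15):
    """Return a batch of (support_x, support_y, query_x, query_y) episodes."""
    tasks = []
    for _ in range(batch_size):
        # Placeholder data; a real sampler draws classes and images from a dataset
        support_x = torch.randn(n_way * k_shot, 1, 28, 28)
        support_y = torch.arange(n_way).repeat_interleave(k_shot)
        query_x = torch.randn(n_way * n_query, 1, 28, 28)
        query_y = torch.arange(n_way).repeat_interleave(n_query)
        tasks.append((support_x, support_y, query_x, query_y))
    return tasks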
Model-Agnostic Meta-Learning (MAML)
The most popular meta-learning algorithm. MAML learns an initialization that can be adapted to a new task in just a few gradient steps:
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class MAML:
    def __init__(self, model, inner_lr=0.01, meta_lr=0.001):
        self.model = model
        self.inner_lr = inner_lr
        self.meta_optimizer = torch.optim.Adam(model.parameters(), lr=meta_lr)

    def inner_loop(self, support_x, support_y):
        """Adapt a copy of the model to a single task"""
        # Clone the model so adaptation does not touch the meta-parameters
        adapted_model = copy.deepcopy(self.model)
        inner_optimizer = torch.optim.SGD(adapted_model.parameters(), lr=self.inner_lr)

        # A few gradient steps on the support set
        for _ in range(5):
            predictions = adapted_model(support_x)
            loss = F.cross_entropy(predictions, support_y)
            inner_optimizer.zero_grad()
            loss.backward()
            inner_optimizer.step()

        return adapted_model

    def outer_loop(self, tasks):
        """Update the meta-parameters (first-order approximation, FOMAML)"""
        self.meta_optimizer.zero_grad()
        meta_loss = 0.0

        for task in tasks:
            support_x, support_y, query_x, query_y = task

            # Inner loop: adapt to the task
            adapted_model = self.inner_loop(support_x, support_y)

            # Evaluate the adapted model on the query set
            predictions = adapted_model(query_x)
            loss = F.cross_entropy(predictions, query_y)
            meta_loss += loss.item()

            # First-order MAML: apply the query-set gradients computed at the
            # adapted weights directly to the original meta-parameters.
            # (Full MAML differentiates through the inner loop, e.g. with the
            # `higher` library or torch.func.)
            grads = torch.autograd.grad(loss, adapted_model.parameters())
            for param, grad in zip(self.model.parameters(), grads):
                param.grad = grad if param.grad is None else param.grad + grad

        # Update the original model
        self.meta_optimizer.step()
        return meta_loss

# Usage
model = SimpleCNN()
maml = MAML(model)

# Training
for epoch in range(100):
    # Sample a batch of tasks
    tasks = sample_tasks(batch_size=32)
    # Meta-update
    loss = maml.outer_loop(tasks)
    print(f"Epoch {epoch}, Loss: {loss}")

# Testing: adapt to a new task with only a few examples
adapted = maml.inner_loop(new_task_support_x, new_task_support_y)
accuracy = test(adapted, new_task_query_x, new_task_query_y)
Prototypical Networks
Learn a prototype (mean embedding) for each class, then classify queries by distance to the nearest prototype:
class PrototypicalNetwork(nn.Module):
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    def forward(self, support_x, support_y, query_x):
        # Encode all examples
        support_features = self.encoder(support_x)
        query_features = self.encoder(query_x)

        # Calculate a prototype (mean embedding) for each class
        unique_labels = torch.unique(support_y)
        prototypes = []
        for label in unique_labels:
            # Average of all support examples in the class
            class_examples = support_features[support_y == label]
            prototype = class_examples.mean(dim=0)
            prototypes.append(prototype)
        prototypes = torch.stack(prototypes)

        # Classify queries by distance to the nearest prototype
        distances = torch.cdist(query_features, prototypes)
        predictions = (-distances).softmax(dim=1)
        return predictions

# Usage
encoder = SimpleCNN()
model = PrototypicalNetwork(encoder)

# 5-way 1-shot learning
support_x = ...  # [5 classes × 1 example × features]
support_y = ...  # [5]
query_x = ...    # [test examples]
predictions = model(support_x, support_y, query_x)
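Training the Prototypical Network uses the same episodic recipe: sample a task, build prototypes from the support set, and minimize cross-entropy on the query predictions. A minimal sketch (the optimizer settings, episode count, and reuse of the toy sample_tasks sampler above are illustrative assumptions):
# Episodic training of the Prototypical Network (minimal sketch)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for episode in range(1000):
    support_x, support_y, query_x, query_y = sample_tasks(batch_size=1)[0]
    probs = model(support_x, support_y, query_x)
    loss = F.nll_loss(probs.log(), query_y)   # cross-entropy on the softmax scores
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()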
Matching Networks
Attention-based few-shot learning: each query is classified as an attention-weighted combination of the support labels:
class MatchingNetwork(nn.Module):
    def __init__(self, encoder, embed_dim=128):
        super().__init__()
        self.encoder = encoder
        self.attention = nn.MultiheadAttention(embed_dim=embed_dim, num_heads=1,
                                               batch_first=True)

    def forward(self, support_x, support_y, query_x):
        # Encode support and query examples
        support_emb = self.encoder(support_x)   # [n_support, embed_dim]
        query_emb = self.encoder(query_x)       # [n_query, embed_dim]

        # Attention from the queries to the support set (add a batch dim)
        _, weights = self.attention(
            query_emb.unsqueeze(0),      # queries
            support_emb.unsqueeze(0),    # keys
            support_emb.unsqueeze(0),    # values
        )  # weights: [1, n_query, n_support]

        # Weighted combination of the (one-hot) support labels
        support_onehot = F.one_hot(support_y).float()
        predictions = weights.squeeze(0) @ support_onehot
        return predictions
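Usage mirrors the prototypical example. A small sketch assuming the encoder outputs 128-dimensional embeddings to match the attention layer (the toy tensors and shapes are illustrative):
# 5-way 1-shot matching (illustrative shapes)
encoder = SimpleCNN()              # assumed to output 128-dim embeddings
matcher = MatchingNetwork(encoder)

support_x = torch.randn(5, 1, 28, 28)
support_y = torch.tensor([0, 1, 2, 3, 4])
query_x = torch.randn(10, 1, 28, 28)

preds = matcher(support_x, support_y, query_x)   # [10, 5] distribution over classes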
Omniglot Dataset
Standard meta-learning benchmark:
import numpy as np
from torchvision.datasets import Omniglot

# 1,623 handwritten characters from 50 alphabets
dataset = Omniglot(root='./data', download=True)

# Create N-way K-shot tasks
def create_task(N=5, K=1):
    # Sample N character classes
    classes = np.random.choice(len(dataset.classes), N, replace=False)

    support_x, support_y = [], []
    query_x, query_y = [], []

    for i, cls in enumerate(classes):
        # Sample K support examples plus 15 query examples per class
        # (sample_class_examples is a user-defined helper, not part of torchvision)
        examples = sample_class_examples(cls, K + 15)
        support_x.append(examples[:K])
        support_y.extend([i] * K)

        # The rest go to the query set
        query_x.append(examples[K:])
        query_y.extend([i] * 15)

    return support_x, support_y, query_x, query_y
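A sampled task can then be converted to tensors and fed to any of the models above, e.g. the Prototypical Network defined earlier (the stacking below assumes sample_class_examples returns a [K+15, C, H, W] image tensor per class):
# Evaluate the Prototypical Network on a fresh 5-way 1-shot Omniglot task
support_x, support_y, query_x, query_y = create_task(N=5, K=1)
support_x, query_x = torch.cat(support_x), torch.cat(query_x)
support_y, query_y = torch.tensor(support_y), torch.tensor(query_y)

probs = model(support_x, support_y, query_x)
accuracy = (probs.argmax(dim=1) == query_y).float().mean()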
Meta-Learning for NLP
# Few-shot text classification

# Support set (a few labeled examples)
support_texts = [
    "This product is amazing!",  # Positive
    "Terrible service",          # Negative
]
support_labels = torch.tensor([1, 0])

# Query (new text to classify)
query_text = "Best purchase ever!"

# MAML or Prototypical Networks can operate on BERT embeddings;
# bert_encode is a placeholder helper returning one 768-dim vector per text
embeddings = bert_encode(support_texts + [query_text])

# Reuse the Prototypical Network from above; the texts are already
# encoded, so an identity encoder suffices
model = PrototypicalNetwork(encoder=nn.Identity())
prediction = model(embeddings[:2], support_labels, embeddings[2:])
Applications
- Few-shot classification: New categories with few examples
- Personalization: Adapt to user with few interactions
- Robot learning: Learn new tasks quickly
- Drug discovery: Predict properties with limited data
Challenges
- Still requires many tasks for training
- High computational cost (full MAML needs second-order gradients)
- Task distribution shift
Remember
- Meta-learning = learning to learn
- Enables few-shot learning
- MAML is the most versatile (model-agnostic)
- Prototypical Networks are simple and effective
#AI #Advanced #Meta-Learning