Meta-Learning (Learning to Learn)
Train models to learn quickly from few examples.
AI that learns how to learn.
What is Meta-Learning?
Training models to adapt quickly to new tasks.
**Goal**: Learn from just a few examples
It's like how learning your 5th language is easier than your 1st!
Few-Shot Learning
Learn from very few examples:
- **1-shot**: 1 example per class
- **5-shot**: 5 examples per class
- **Zero-shot**: 0 examples (just a description)
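To make the terminology concrete, here is a minimal sketch (with random tensors standing in for real images) of the shapes involved in a single 5-way 1-shot episode:

```python
import torch

# One 5-way 1-shot episode: each of the 5 classes contributes 1 labelled
# support image and 15 query images (random tensors stand in for 28x28 images).
N_WAY, K_SHOT, N_QUERY = 5, 1, 15

support_x = torch.randn(N_WAY * K_SHOT, 1, 28, 28)         # 5 labelled images
support_y = torch.arange(N_WAY).repeat_interleave(K_SHOT)   # tensor([0, 1, 2, 3, 4])
query_x = torch.randn(N_WAY * N_QUERY, 1, 28, 28)           # 75 images to classify
query_y = torch.arange(N_WAY).repeat_interleave(N_QUERY)
```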
Why Meta-Learning?
**Problem**: Deep learning needs lots of data
**Solution**: Train on many tasks, adapt quickly to new task
Example Scenario
**Training**: Learn 1000 different tasks
**Testing**: A new task with only 5 examples
**Result**: The model adapts quickly!
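In code, this scenario maps onto an episodic loop. The sketch below only shows the overall structure; `sample_task`, `adapt`, `meta_update`, and `evaluate` are hypothetical helpers, not a specific library API:

```python
# Meta-training: loop over many small tasks, each with its own support/query split
# (sample_task, adapt, meta_update and evaluate are hypothetical helpers).
for step in range(num_meta_steps):
    support, query = sample_task()             # one of the many training tasks
    adapted_model = adapt(model, support)      # quick adaptation on a few examples
    meta_update(model, adapted_model, query)   # improve the shared starting point

# Meta-testing: a brand-new task with only a handful of labelled examples
support, query = sample_task(split="test")
adapted_model = adapt(model, support)
accuracy = evaluate(adapted_model, query)
```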
Model-Agnostic Meta-Learning (MAML)
Most popular meta-learning algorithm:
```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F

class MAML:
    def __init__(self, model, inner_lr=0.01, meta_lr=0.001):
        self.model = model
        self.inner_lr = inner_lr
        self.meta_optimizer = torch.optim.Adam(model.parameters(), lr=meta_lr)

    def inner_loop(self, support_x, support_y, steps=5):
        """Adapt a copy of the model to a single task using its support set."""
        adapted_model = copy.deepcopy(self.model)
        inner_optimizer = torch.optim.SGD(adapted_model.parameters(), lr=self.inner_lr)

        # A few gradient steps on the support set
        for _ in range(steps):
            predictions = adapted_model(support_x)
            loss = F.cross_entropy(predictions, support_y)
            inner_optimizer.zero_grad()
            loss.backward()
            inner_optimizer.step()

        return adapted_model

    def outer_loop(self, tasks):
        """Update the meta-parameters (first-order MAML approximation)."""
        self.meta_optimizer.zero_grad()
        meta_loss = 0.0

        for support_x, support_y, query_x, query_y in tasks:
            # Inner loop: adapt to the task
            adapted_model = self.inner_loop(support_x, support_y)

            # Evaluate the adapted model on the query set
            predictions = adapted_model(query_x)
            loss = F.cross_entropy(predictions, query_y)

            # First-order approximation: the deepcopy breaks the computation
            # graph, so the query-set gradients of the adapted model are
            # accumulated onto the original meta-parameters by hand.
            grads = torch.autograd.grad(loss, adapted_model.parameters())
            for param, grad in zip(self.model.parameters(), grads):
                param.grad = grad if param.grad is None else param.grad + grad

            meta_loss += loss.item()

        self.meta_optimizer.step()
        return meta_loss / len(tasks)

# Usage (SimpleCNN, sample_tasks and test are assumed to be defined elsewhere)
model = SimpleCNN()
maml = MAML(model)

# Training
for epoch in range(100):
    # Sample a batch of tasks, each a (support_x, support_y, query_x, query_y) tuple
    tasks = sample_tasks(batch_size=32)

    # Meta-update
    loss = maml.outer_loop(tasks)
    print(f"Epoch {epoch}, Loss: {loss:.4f}")

# Testing: adapt to a new task with only a few examples
adapted = maml.inner_loop(new_task_support_x, new_task_support_y)
accuracy = test(adapted, new_task_query_x, new_task_query_y)
```
Prototypical Networks
Learn prototypes (representations) for each class:
```python
class PrototypicalNetwork(nn.Module):
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    def forward(self, support_x, support_y, query_x):
        # Encode all examples
        support_features = self.encoder(support_x)
        query_features = self.encoder(query_x)

        # Calculate one prototype per class: the mean embedding of its support examples
        unique_labels = torch.unique(support_y)
        prototypes = []
        for label in unique_labels:
            class_examples = support_features[support_y == label]
            prototypes.append(class_examples.mean(dim=0))
        prototypes = torch.stack(prototypes)

        # Classify each query by (soft) nearest prototype
        distances = torch.cdist(query_features, prototypes)
        predictions = (-distances).softmax(dim=1)
        return predictions

# Usage
encoder = SimpleCNN()
model = PrototypicalNetwork(encoder)

# 5-way 1-shot learning
support_x = ...  # [5 classes × 1 example × features]
support_y = ...  # [5]
query_x = ...    # [test examples]

predictions = model(support_x, support_y, query_x)
```
Matching Networks
Attention-based few-shot learning:
```python
class MatchingNetwork(nn.Module):
    def __init__(self, encoder, embed_dim=128):
        super().__init__()
        self.encoder = encoder
        self.attention = nn.MultiheadAttention(embed_dim=embed_dim, num_heads=1, batch_first=True)

    def forward(self, support_x, support_y, query_x):
        # Encode (assumes the encoder outputs embed_dim-dimensional features)
        support_emb = self.encoder(support_x)  # [N_support, embed_dim]
        query_emb = self.encoder(query_x)      # [N_query, embed_dim]

        # Attention from each query to the support set
        attended, weights = self.attention(
            query_emb.unsqueeze(0),    # queries  [1, N_query, embed_dim]
            support_emb.unsqueeze(0),  # keys     [1, N_support, embed_dim]
            support_emb.unsqueeze(0),  # values   [1, N_support, embed_dim]
        )
        weights = weights.squeeze(0)   # [N_query, N_support]

        # Weighted combination of the (one-hot) support labels
        support_onehot = F.one_hot(support_y).float()  # [N_support, N_classes]
        predictions = weights @ support_onehot
        return predictions
```
Omniglot Dataset
Standard meta-learning benchmark:
```python
import numpy as np
from torchvision.datasets import Omniglot

# 1,623 handwritten characters from 50 alphabets
dataset = Omniglot(root='./data', download=True)

# Create N-way K-shot tasks
def create_task(N=5, K=1):
    # Sample N character classes (Omniglot has 1,623 in total)
    classes = np.random.choice(1623, N, replace=False)

    support_x, support_y = [], []
    query_x, query_y = [], []
    for i, cls in enumerate(classes):
        # sample_class_examples is a placeholder that draws images of one class
        examples = sample_class_examples(cls, K + 15)

        # First K examples go to the support set
        support_x.append(examples[:K])
        support_y.extend([i] * K)

        # The rest go to the query set
        query_x.append(examples[K:])
        query_y.extend([i] * 15)

    return support_x, support_y, query_x, query_y
```
Meta-Learning for NLP
```python
# Few-shot text classification

# Support set (a few labelled examples)
support_texts = [
    "This product is amazing!",  # Positive
    "Terrible service",          # Negative
]
support_labels = torch.tensor([1, 0])

# Query (new text to classify)
query_text = "Best purchase ever!"

# MAML or Prototypical Networks on top of BERT embeddings
# (bert_encode is a placeholder that returns one 768-dim vector per text)
embeddings = bert_encode(support_texts + [query_text])  # [3, 768]

# Embeddings are already computed, so the "encoder" is just the identity
model = PrototypicalNetwork(encoder=nn.Identity())
prediction = model(embeddings[:2], support_labels, embeddings[2:])
```
Applications
- **Few-shot classification**: New categories with few examples
- **Personalization**: Adapt to a user with few interactions
- **Robot learning**: Learn new tasks quickly
- **Drug discovery**: Predict properties with limited data
Challenges
- Still requires a large number of training tasks
- High computational cost (nested inner/outer optimization)
- Sensitive to distribution shift between training and test tasks
Remember
- Meta-learning = learning to learn
- Enables few-shot learning
- MAML is the most versatile approach
- Prototypical Networks are simple and effective