
Meta-Learning (Learning to Learn)

Train models to learn quickly from few examples.

Dr. Patricia Moore
December 18, 2025

AI that learns how to learn.

What is Meta-Learning?

Training models to adapt quickly to new tasks.

**Goal**: Learn from just a few examples

Like learning your 5th language is easier than your 1st!

Few-Shot Learning

Learn from very few examples:

- **1-shot**: 1 example per class
- **5-shot**: 5 examples per class
- **Zero-shot**: 0 examples (just a description of the class)
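These settings are usually framed as N-way K-shot episodes: K labelled support examples for each of N classes, plus a query set used to evaluate the adapted model. Here is a minimal sketch of the tensors in a 5-way 1-shot image episode (random data and shapes chosen purely for illustration):

```python
import torch

# 5-way 1-shot episode on 28x28 grayscale images (random data, illustrative shapes)
n_way, k_shot, n_query = 5, 1, 15

support_x = torch.randn(n_way * k_shot, 1, 28, 28)         # N*K support images
support_y = torch.arange(n_way).repeat_interleave(k_shot)   # labels 0..N-1
query_x = torch.randn(n_way * n_query, 1, 28, 28)           # query images to classify
query_y = torch.arange(n_way).repeat_interleave(n_query)

# The model adapts on (support_x, support_y), then is scored on (query_x, query_y)
```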

Why Meta-Learning?

**Problem**: Deep learning needs lots of data

**Solution**: Train on many tasks, adapt quickly to new task

Example Scenario

- **Training**: Learn 1000 different tasks
- **Testing**: A new task with only 5 examples
- **Result**: The model adapts quickly!

Model-Agnostic Meta-Learning (MAML)

One of the most popular meta-learning algorithms:

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


class MAML:
    def __init__(self, model, inner_lr=0.01, meta_lr=0.001):
        self.model = model
        self.inner_lr = inner_lr
        self.meta_optimizer = torch.optim.Adam(model.parameters(), lr=meta_lr)

    def inner_loop(self, support_x, support_y):
        """Adapt a copy of the model to a single task."""
        adapted_model = copy.deepcopy(self.model)
        inner_optimizer = torch.optim.SGD(adapted_model.parameters(), lr=self.inner_lr)

        # A few gradient steps on the support set
        for _ in range(5):
            predictions = adapted_model(support_x)
            loss = F.cross_entropy(predictions, support_y)
            inner_optimizer.zero_grad()
            loss.backward()
            inner_optimizer.step()

        return adapted_model

    def outer_loop(self, tasks):
        """Update meta-parameters (first-order MAML: the query-set gradients
        of each adapted model are applied to the original model)."""
        meta_loss = 0.0
        self.meta_optimizer.zero_grad()

        for task in tasks:
            support_x, support_y, query_x, query_y = task

            # Inner loop: adapt to the task
            adapted_model = self.inner_loop(support_x, support_y)

            # Evaluate the adapted model on the query set
            adapted_model.zero_grad()
            predictions = adapted_model(query_x)
            loss = F.cross_entropy(predictions, query_y)
            loss.backward()
            meta_loss += loss.item()

            # First-order approximation: accumulate the adapted model's
            # gradients onto the original parameters
            for p, adapted_p in zip(self.model.parameters(), adapted_model.parameters()):
                p.grad = adapted_p.grad.clone() if p.grad is None else p.grad + adapted_p.grad

        self.meta_optimizer.step()
        return meta_loss


# Usage
model = SimpleCNN()
maml = MAML(model)

# Training
for epoch in range(100):
    # Sample a batch of tasks
    tasks = sample_tasks(batch_size=32)

    # Meta-update
    loss = maml.outer_loop(tasks)
    print(f"Epoch {epoch}, Loss: {loss}")

# Testing: adapt to a new task with a few examples
adapted = maml.inner_loop(new_task_support_x, new_task_support_y)
accuracy = test(adapted, new_task_query_x, new_task_query_y)
```

Prototypical Networks

Learn a prototype (an average embedding) for each class:

```python
class PrototypicalNetwork(nn.Module):
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    def forward(self, support_x, support_y, query_x):
        # Encode all examples
        support_features = self.encoder(support_x)
        query_features = self.encoder(query_x)

        # Calculate the prototype (mean embedding) for each class
        unique_labels = torch.unique(support_y)
        prototypes = []
        for label in unique_labels:
            # Average of all support examples in the class
            class_examples = support_features[support_y == label]
            prototypes.append(class_examples.mean(dim=0))
        prototypes = torch.stack(prototypes)

        # Classify queries by distance to the nearest prototype
        distances = torch.cdist(query_features, prototypes)
        predictions = (-distances).softmax(dim=1)
        return predictions


# Usage
encoder = SimpleCNN()
model = PrototypicalNetwork(encoder)

# 5-way 1-shot learning
support_x = ...  # [5 classes × 1 example × features]
support_y = ...  # [5]
query_x = ...    # [test examples]

predictions = model(support_x, support_y, query_x)
```

Matching Networks

Attention-based few-shot learning:

```python
class MatchingNetwork(nn.Module):
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder
        # embed_dim must match the encoder's output dimension
        self.attention = nn.MultiheadAttention(embed_dim=128, num_heads=1,
                                               batch_first=True)

    def forward(self, support_x, support_y, query_x):
        # Encode support and query examples
        support_emb = self.encoder(support_x)
        query_emb = self.encoder(query_x)

        # Attention from each query to the support set
        attended, weights = self.attention(
            query_emb.unsqueeze(0),    # queries: [1, Q, embed_dim]
            support_emb.unsqueeze(0),  # keys:    [1, S, embed_dim]
            support_emb.unsqueeze(0),  # values:  [1, S, embed_dim]
        )

        # Weighted combination of the (one-hot) support labels
        one_hot_labels = F.one_hot(support_y).float()       # [S, num_classes]
        predictions = weights.squeeze(0) @ one_hot_labels   # [Q, num_classes]
        return predictions
```
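A usage sketch in the same spirit as the Prototypical Networks example above (SimpleCNN and the 5-way 1-shot tensors are placeholders, and the encoder is assumed to output 128-dimensional embeddings to match the attention layer):

```python
# Usage (illustrative; assumes SimpleCNN outputs 128-dim embeddings)
encoder = SimpleCNN()
model = MatchingNetwork(encoder)

# 5-way 1-shot: 5 support examples with labels 0..4, plus query examples
predictions = model(support_x, support_y, query_x)  # [num_queries, 5]
```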

Omniglot Dataset

Standard meta-learning benchmark:

```python
import numpy as np
from torchvision.datasets import Omniglot

# 1623 characters from 50 alphabets
dataset = Omniglot(root='./data', download=True)

# Create N-way K-shot tasks
def create_task(N=5, K=1):
    # Sample N classes
    classes = np.random.choice(len(dataset.classes), N, replace=False)

    support_x, support_y = [], []
    query_x, query_y = [], []

    for i, cls in enumerate(classes):
        # Sample K examples for the support set, plus 15 for the query set
        # (sample_class_examples is a placeholder helper that draws images
        # belonging to one character class)
        examples = sample_class_examples(cls, K + 15)
        support_x.append(examples[:K])
        support_y.extend([i] * K)

        # The rest go to the query set
        query_x.append(examples[K:])
        query_y.extend([i] * 15)

    return support_x, support_y, query_x, query_y
```

Meta-Learning for NLP

```python
# Few-shot text classification

# Support set (a few labelled examples)
support_texts = [
    "This product is amazing!",  # Positive
    "Terrible service",          # Negative
]
support_labels = torch.tensor([1, 0])

# Query (new text to classify)
query_text = "Best purchase ever!"

# MAML or Prototypical Networks on top of BERT embeddings
# (bert_encode is a placeholder for a BERT sentence-embedding helper)
embeddings = bert_encode(support_texts + [query_text])  # [3, 768]

# Texts are already encoded, so the "encoder" is just the identity
model = PrototypicalNetwork(encoder=nn.Identity())
prediction = model(embeddings[:2], support_labels, embeddings[2:])
```

Applications

- **Few-shot classification**: New categories with few examples
- **Personalization**: Adapt to a user with few interactions
- **Robot learning**: Learn new tasks quickly
- **Drug discovery**: Predict properties with limited data

Challenges

- Still requires many tasks for training
- Computational cost
- Task distribution shift

Remember

- Meta-learning = learning to learn
- Enables few-shot learning
- MAML is the most versatile
- Prototypical Networks are simple and effective

#AI #Advanced #Meta-Learning