AI · 7 min read

Meta-Learning (Learning to Learn)

Train models to learn quickly from few examples.

Dr. Patricia Moore
December 18, 2025

AI that learns how to learn.

What is Meta-Learning?

Training models to adapt quickly to new tasks.

Goal: Learn from just a few examples

Just like learning your 5th language is easier than learning your 1st!

Few-Shot Learning

Learn from very few examples:

1-shot: 1 example per class
5-shot: 5 examples per class
Zero-shot: 0 examples (just description)
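
For example, a 5-way 1-shot image task gives the model a support set with one labeled image for each of 5 classes, then asks it to classify unseen query images from those same classes.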

Why Meta-Learning?

Problem: Deep learning normally needs lots of data

Solution: Train across many related tasks, then adapt quickly to a new one

Example Scenario

Training: Learn from 1,000 different tasks
Testing: A new task with only 5 examples
Result: The model adapts quickly!

Model-Agnostic Meta-Learning (MAML)

One of the most popular meta-learning algorithms: instead of learning one task, MAML learns a parameter initialization that can be fine-tuned to any new task in a few gradient steps.
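
At its core, MAML is two nested optimization loops: an inner loop that adapts a copy of the parameters θ to a single task, and an outer loop that updates θ so that this adaptation works well across many tasks:

    θ_i' = θ − α·∇_θ L_i(θ)              (inner loop: adapt to task i)
    θ ← θ − β·∇_θ Σ_i L_i(θ_i')          (outer loop: meta-update)

Here α is the task-level learning rate and β the meta learning rate. Below is a PyTorch sketch; it uses the common first-order approximation, which avoids differentiating through the inner-loop updates: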

import copy

import torch
import torch.nn as nn
import torch.nn.functional as F

class MAML:
    def __init__(self, model, inner_lr=0.01, meta_lr=0.001):
        self.model = model
        self.inner_lr = inner_lr
        self.meta_optimizer = torch.optim.Adam(model.parameters(), lr=meta_lr)
    
    def inner_loop(self, support_x, support_y):
        """Adapt to a single task"""
        # Clone model
        adapted_model = copy.deepcopy(self.model)
        inner_optimizer = torch.optim.SGD(adapted_model.parameters(), lr=self.inner_lr)
        
        # Few gradient steps on support set
        for _ in range(5):
            predictions = adapted_model(support_x)
            loss = F.cross_entropy(predictions, support_y)
            
            inner_optimizer.zero_grad()
            loss.backward()
            inner_optimizer.step()
        
        return adapted_model
    
    def outer_loop(self, tasks):
        """Update meta-parameters (first-order MAML)"""
        self.meta_optimizer.zero_grad()
        meta_loss = 0.0
        
        for task in tasks:
            support_x, support_y, query_x, query_y = task
            
            # Inner loop: adapt to task
            adapted_model = self.inner_loop(support_x, support_y)
            
            # Evaluate adapted model on the query set
            predictions = adapted_model(query_x)
            loss = F.cross_entropy(predictions, query_y)
            
            # deepcopy breaks the graph back to self.model, so copy the
            # query-set gradients onto the original parameters instead
            # (the first-order MAML approximation)
            grads = torch.autograd.grad(loss, adapted_model.parameters())
            for p, g in zip(self.model.parameters(), grads):
                p.grad = g if p.grad is None else p.grad + g
            
            meta_loss += loss.item()
        
        # Update original model with the accumulated gradients
        self.meta_optimizer.step()
        
        return meta_loss / len(tasks)

# Usage (SimpleCNN is a placeholder for any small classifier)
model = SimpleCNN()
maml = MAML(model)

# Training
for epoch in range(100):
    # Sample a batch of tasks; sample_tasks is a placeholder episode
    # sampler yielding (support_x, support_y, query_x, query_y) tuples
    tasks = sample_tasks(batch_size=32)
    
    # Meta-update
    loss = maml.outer_loop(tasks)
    print(f"Epoch {epoch}, Loss: {loss:.4f}")

# Testing: Adapt to new task with few examples
adapted = maml.inner_loop(new_task_support_x, new_task_support_y)
accuracy = test(adapted, new_task_query_x, new_task_query_y)
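
Note the trade-off in this sketch: exact MAML backpropagates through the inner-loop updates themselves, which requires second-order gradients and costs substantially more memory and compute. The first-order variant above is much cheaper and typically performs nearly as well.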

Prototypical Networks

Learn a prototype (average embedding) for each class, then classify each query by its distance to the nearest prototype.
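
For class k with support examples S_k and encoder f, the prototype and the prediction for a query x are:

    c_k = (1/|S_k|) Σ_{x_i ∈ S_k} f(x_i)
    p(y = k | x) = softmax_k(−d(f(x), c_k))

The original paper uses squared Euclidean distance for d; the sketch below uses plain Euclidean distance via torch.cdist, which works the same way: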

class PrototypicalNetwork(nn.Module):
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder
    
    def forward(self, support_x, support_y, query_x):
        # Encode all examples
        support_features = self.encoder(support_x)
        query_features = self.encoder(query_x)
        
        # Calculate prototype for each class
        unique_labels = torch.unique(support_y)
        prototypes = []
        
        for label in unique_labels:
            # Average of all examples in class
            class_examples = support_features[support_y == label]
            prototype = class_examples.mean(dim=0)
            prototypes.append(prototype)
        
        prototypes = torch.stack(prototypes)
        
        # Classify query based on nearest prototype
        distances = torch.cdist(query_features, prototypes)
        predictions = (-distances).softmax(dim=1)
        
        return predictions

# Usage
encoder = SimpleCNN()
model = PrototypicalNetwork(encoder)

# 5-way 1-shot learning
support_x = ...  # [5 classes × 1 example × features]
support_y = ...  # [5]
query_x = ...    # [test examples]

predictions = model(support_x, support_y, query_x)

Matching Networks

Attention-based few-shot learning: each query is classified as an attention-weighted combination of the support labels, ŷ = Σ_i a(x̂, x_i)·y_i:

class MatchingNetwork(nn.Module):
    def __init__(self, encoder, embed_dim=128):
        super().__init__()
        self.encoder = encoder  # must output embed_dim features
        self.attention = nn.MultiheadAttention(embed_dim, num_heads=1, batch_first=True)
    
    def forward(self, support_x, support_y, query_x):
        # Encode
        support_emb = self.encoder(support_x)
        query_emb = self.encoder(query_x)
        
        # Attention from the queries to the support set (batch of 1)
        attended, weights = self.attention(
            query_emb.unsqueeze(0),    # queries
            support_emb.unsqueeze(0),  # keys
            support_emb.unsqueeze(0),  # values
        )
        
        # Weighted combination of one-hot support labels
        # (attention weights are already softmax-normalized)
        support_onehot = F.one_hot(support_y).float()
        predictions = weights.squeeze(0) @ support_onehot
        
        return predictions
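
A usage sketch, assuming the encoder outputs 128-dimensional features to match embed_dim:

# Usage (SimpleCNN again stands in for any 128-dim encoder)
encoder = SimpleCNN()
model = MatchingNetwork(encoder)

# 5-way 1-shot: support examples, their integer labels, and queries
predictions = model(support_x, support_y, query_x)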

Omniglot Dataset

Standard meta-learning benchmark:

import numpy as np
from torchvision.datasets import Omniglot

# 1,623 handwritten characters from 50 alphabets
dataset = Omniglot(root='./data', download=True)

# Create N-way K-shot tasks
# (sample_class_examples is a placeholder for a helper that returns
# images belonging to a single character class)
def create_task(N=5, K=1, num_query=15):
    # Sample N of the 1,623 character classes
    classes = np.random.choice(1623, N, replace=False)
    
    support_x, support_y = [], []
    query_x, query_y = [], []
    
    for i, cls in enumerate(classes):
        # Sample K examples for support, num_query for query
        examples = sample_class_examples(cls, K + num_query)
        support_x.append(examples[:K])
        support_y.extend([i] * K)
        
        # Rest for query
        query_x.append(examples[K:])
        query_y.extend([i] * num_query)
    
    return support_x, support_y, query_x, query_y
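
Episodic training ties this together: every gradient step is itself a small few-shot task. A minimal sketch with the Prototypical Network from above (assuming create_task's outputs are stacked into tensors):

# Episodic training on Omniglot tasks
model = PrototypicalNetwork(SimpleCNN())
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for episode in range(10000):
    support_x, support_y, query_x, query_y = create_task(N=5, K=1)
    
    predictions = model(support_x, support_y, query_x)
    # forward returns probabilities, so take the log before NLL
    loss = F.nll_loss(predictions.log(), query_y)
    
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()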

Meta-Learning for NLP

# Few-shot text classification

# Support set (few examples)
support_texts = [
    "This product is amazing!",  # Positive
    "Terrible service",           # Negative
]
support_labels = torch.tensor([1, 0])

# Query (new text to classify)
query_text = "Best purchase ever!"

# Prototypical Network over BERT embeddings
# (bert_encode is a placeholder returning one 768-dim vector per text)
embeddings = bert_encode(support_texts + [query_text])
model = PrototypicalNetwork(encoder=nn.Identity())  # embeddings are precomputed
prediction = model(embeddings[:2], support_labels, embeddings[2:])

Applications

  • Few-shot classification: New categories with few examples
  • Personalization: Adapt to user with few interactions
  • Robot learning: Learn new tasks quickly
  • Drug discovery: Predict properties with limited data

Challenges

  • Still requires many training tasks drawn from a related distribution
  • Computational cost (exact MAML needs second-order gradients)
  • Task distribution shift: performance drops when test tasks differ from training tasks

Remember

  • Meta-learning = learning to learn
  • Enables few-shot learning
  • MAML is model-agnostic: it works with any model trained by gradient descent
  • Prototypical Networks are simple and effective
#AI #Advanced #Meta-Learning