AI · 7 min read

BERT and Pre-trained Language Models

Using pre-trained BERT for text classification, named entity recognition, and question answering.

Dr. James Rodriguez
December 18, 2025

Powerful pre-trained language understanding.

What is BERT?

BERT stands for Bidirectional Encoder Representations from Transformers: a stack of Transformer encoder layers pre-trained by Google and released in 2018.

Key idea: it reads text bidirectionally, conditioning each word's representation on context from both the left and the right at once, rather than in a single left-to-right pass (see the sketch below).
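
A quick way to see this is the fill-mask pipeline, which exposes the masked-language-modelling objective BERT was pre-trained with. A minimal sketch, assuming the standard bert-base-uncased checkpoint (the example sentence is made up for illustration):

from transformers import pipeline

# BERT was pre-trained to predict masked words from context on BOTH sides
fill_mask = pipeline('fill-mask', model='bert-base-uncased')

# '[MASK]' is BERT's mask placeholder; the model needs the words to the
# right ("barked loudly") to guess something like "dog"
for pred in fill_mask("The [MASK] barked loudly at the mailman."):
    print(pred['token_str'], round(pred['score'], 3))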

Why BERT?

Pre-trained: Learned from billions of words of unlabeled text (BooksCorpus and English Wikipedia)
Context-aware: The same word gets a different representation in different sentences (see the sketch after this list)
Fine-tunable: A small task-specific head adapts the model to your own dataset
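
Context-awareness is easy to check: the same word gets a different vector in each sentence. A minimal sketch, assuming bert-base-uncased (the bank_vector helper and the two sentences are my own illustration):

from transformers import BertTokenizer, BertModel
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

def bank_vector(text):
    # Contextual embedding of the token "bank" in the given sentence
    inputs = tokenizer(text, return_tensors='pt')
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]
    tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])
    return hidden[tokens.index('bank')]

v_river = bank_vector("He sat on the bank of the river.")
v_money = bank_vector("She opened a new account at the bank.")

# Same word, different contexts -> the vectors differ (cosine similarity < 1)
print(torch.nn.functional.cosine_similarity(v_river, v_money, dim=0))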

Using BERT

from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Load pre-trained model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained(
    'bert-base-uncased',
    num_labels=2  # Binary classification
)

# Tokenize text
text = "This movie is fantastic!"
inputs = tokenizer(
    text,
    padding=True,
    truncation=True,
    return_tensors='pt'
)

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    print(predictions)
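
Note that BertForSequenceClassification stacks a new, randomly initialized classification head on top of the pre-trained encoder, so the probabilities printed above are essentially random until the model has been fine-tuned, as in the next section.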

Fine-tuning BERT

from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset

# Load the IMDB movie-review dataset (binary sentiment labels)
dataset = load_dataset('imdb')

# Tokenizer and model (same checkpoint as above, re-created so this block runs on its own)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

# Tokenize dataset
def tokenize_function(examples):
    return tokenizer(
        examples['text'],
        padding='max_length',
        truncation=True
    )

tokenized_datasets = dataset.map(tokenize_function, batched=True)

# Training arguments
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
)

# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['test']
)

# Train
trainer.train()
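
Training alone only reports a loss; to get an accuracy number you can pass a metrics function to the Trainer. A minimal sketch (compute_metrics is my own helper, not part of the original setup):

import numpy as np

def compute_metrics(eval_pred):
    # The Trainer hands us (logits, labels) for the whole eval set
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {'accuracy': (preds == labels).mean()}

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['test'],
    compute_metrics=compute_metrics,
)

print(trainer.evaluate())  # reports eval_loss and eval_accuracy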

BERT for Different Tasks

Text Classification

from transformers import pipeline

classifier = pipeline('sentiment-analysis')
result = classifier("I love this product!")
print(result)

Named Entity Recognition

ner = pipeline('ner')
result = ner("John Smith lives in New York")
print(result)

Question Answering

qa = pipeline('question-answering')
result = qa(
    question="Where does John live?",
    context="John Smith lives in New York and works at Google."
)
print(result)

BERT Variants

RoBERTa: Robustly optimized BERT pre-training (more data, longer training, no next-sentence prediction)
ALBERT: A Lite BERT that shares parameters across layers to shrink the model
DistilBERT: A smaller, faster BERT obtained by knowledge distillation (see the loading sketch below)
ELECTRA: More sample-efficient pre-training via replaced-token detection
DeBERTa: Disentangled attention for stronger downstream performance
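
All of these variants are on the Hugging Face Hub and load through the same Auto classes, so trying one is usually a one-line change. A sketch, assuming the distilbert-base-uncased checkpoint:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Swap the checkpoint name to switch variants; the rest of the pipeline stays the same
checkpoint = 'distilbert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)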

Best Practices

Batch size: Use smaller batches (8-16) so BERT fits in GPU memory
Learning rate: Start in the 2e-5 to 5e-5 range
Epochs: 2-4 epochs are usually enough for fine-tuning
Max length: BERT accepts at most 512 tokens per sequence; truncate longer inputs (see the sketch below)
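
These guidelines map directly onto TrainingArguments. A sketch with the suggested starting values (tune them for your own dataset and hardware):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='./results',
    learning_rate=2e-5,              # start in the 2e-5 to 5e-5 range
    per_device_train_batch_size=16,  # 8-16 usually fits on a single GPU
    num_train_epochs=3,              # 2-4 epochs is typically enough
    weight_decay=0.01,
)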

Remember

  • BERT is pre-trained on massive amounts of unlabeled text
  • Fine-tune it on labeled data for your specific task
  • A fine-tuned BERT is a strong baseline for most NLP tasks
  • Consider lighter variants like DistilBERT when speed matters
#AI #Advanced #BERT