AI · 7 min read
BERT and Pre-trained Language Models
Use BERT for NLP tasks.
Dr. James Rodriguez
December 18, 2025
Powerful pre-trained language understanding.
What is BERT?
BERT stands for Bidirectional Encoder Representations from Transformers.
Key idea: instead of reading text in one direction only, BERT looks at the words on both the left and the right of every token at the same time.
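A quick way to see this bidirectionality is BERT's masked-language-modelling objective: the model fills in a blank using the words on both sides of it. A minimal sketch with the Hugging Face fill-mask pipeline (the example sentence is just an illustrative choice):
from transformers import pipeline

# fill-mask uses BERT's pre-training objective directly
fill_mask = pipeline('fill-mask', model='bert-base-uncased')

# The words both before and after [MASK] steer the prediction
for candidate in fill_mask("The man went to the [MASK] to buy a carton of milk."):
    print(candidate['token_str'], round(candidate['score'], 3))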
Why BERT?
Pre-trained: Already trained on large text corpora (BooksCorpus and English Wikipedia), so it arrives knowing a lot about language
Context-aware: Represents each word based on the words around it (see the sketch after this list)
Fine-tunable: Adapts to your specific task with a relatively small amount of labelled data
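To make the context-aware point concrete, here is a small sketch (the sentence pair and the embed_word helper are my own, not part of any standard API) that compares the embedding of the word "bank" in two different sentences; the two vectors differ because the surrounding words differ:
from transformers import BertTokenizer, BertModel
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

def embed_word(sentence, word):
    # Return the hidden state of the given word's token in the sentence
    inputs = tokenizer(sentence, return_tensors='pt')
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]
    tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0].tolist())
    return hidden[tokens.index(word)]

river = embed_word("he sat on the bank of the river", "bank")
money = embed_word("she deposited cash at the bank", "bank")
print(torch.cosine_similarity(river, money, dim=0))  # noticeably below 1.0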
Using BERT
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Load the pre-trained encoder with a (randomly initialised) classification head
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained(
    'bert-base-uncased',
    num_labels=2  # binary classification
)

# Tokenize the input text
text = "This movie is fantastic!"
inputs = tokenizer(
    text,
    padding=True,
    truncation=True,
    return_tensors='pt'
)

# Get a prediction (the head is untrained, so probabilities are arbitrary until you fine-tune)
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(predictions)
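If you just want a class index rather than probabilities, take the argmax over the last dimension (remember the label meanings are only defined once the head has been fine-tuned):
predicted_class = predictions.argmax(dim=-1).item()
print(predicted_class)  # 0 or 1; meaningful only after fine-tuning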
Fine-tuning BERT
from transformers import BertForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset

# Load the IMDB movie-review dataset (binary sentiment labels)
dataset = load_dataset('imdb')

# Tokenize the dataset (reuses the tokenizer loaded above)
def tokenize_function(examples):
    return tokenizer(
        examples['text'],
        padding='max_length',
        truncation=True
    )

tokenized_datasets = dataset.map(tokenize_function, batched=True)

# Training arguments
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
)

# Trainer (reuses the model loaded above)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['test']
)

# Train
trainer.train()
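After training you will usually want an accuracy number on the held-out split. One way is to give the Trainer a compute_metrics function; this sketch (my own addition, using plain NumPy) shows the idea:
import numpy as np

def compute_metrics(eval_pred):
    # The Trainer hands compute_metrics a (logits, labels) pair for the eval set
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {'accuracy': (predictions == labels).mean()}

# Add compute_metrics=compute_metrics to the Trainer(...) call above, then:
print(trainer.evaluate())  # reports eval_loss, plus accuracy if compute_metrics is set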
BERT for Different Tasks
Text Classification
from transformers import pipeline

# pipeline() loads a sensible default fine-tuned model; pass model=... to use a specific BERT checkpoint
classifier = pipeline('sentiment-analysis')
result = classifier("I love this product!")
print(result)
Named Entity Recognition
ner = pipeline('ner')
result = ner("John Smith lives in New York")
print(result)
Question Answering
qa = pipeline('question-answering')
result = qa(
question="Where does John live?",
context="John Smith lives in New York and works at Google."
)
print(result)
BERT Variants
RoBERTa: Robustly optimized BERT pre-training (more data, no next-sentence prediction)
ALBERT: A Lite BERT that shares parameters across layers to shrink the model
DistilBERT: Distilled BERT, roughly 40% smaller and 60% faster while keeping most of the accuracy (see the sketch after this list for how to swap one in)
ELECTRA: More sample-efficient pre-training via replaced-token detection
DeBERTa: Improved (disentangled) attention mechanism
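Because these variants share the same interfaces in the transformers library, switching is usually a one-line change. A minimal sketch using the Auto classes with the standard DistilBERT checkpoint:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Swap 'bert-base-uncased' for a lighter variant without changing the rest of the code
checkpoint = 'distilbert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)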
Best Practices
Batch size: Use small batches (8-16) to fit GPU memory
Learning rate: Start with 2e-5 to 5e-5
Epochs: 2-4 epochs are usually enough
Max length: BERT accepts at most 512 tokens per sequence
(The sketch after this list pulls these settings together.)
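A minimal sketch of how those recommendations translate into TrainingArguments (the specific values are chosen from the guidelines above; adjust for your hardware and dataset):
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,              # 2-4 epochs is usually enough
    per_device_train_batch_size=16,  # small batches: 8-16
    learning_rate=2e-5,              # start in the 2e-5 to 5e-5 range
    weight_decay=0.01,
)

# Pair with a tokenizer call that respects BERT's 512-token limit:
# tokenizer(text, truncation=True, max_length=512)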
Remember
- BERT is pre-trained on massive amounts of text
- Fine-tune it for your specific task
- Very powerful across a wide range of NLP tasks
- Consider lighter variants (e.g. DistilBERT) when speed matters
#AI #Advanced #BERT