
# BERT and Pre-trained Language Models

Use BERT for NLP tasks.

Dr. James Rodriguez
December 18, 2025

BERT brings powerful, pre-trained language understanding to a wide range of NLP tasks.

## What is BERT?

BERT stands for Bidirectional Encoder Representations from Transformers.

**Key**: It reads text bidirectionally, using both the left and right context of every word at once.
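To see that bidirectional context in action, here is a minimal sketch using the Hugging Face `fill-mask` pipeline (the example sentence is just an illustration):

```python
from transformers import pipeline

# Fill-mask uses BERT's masked-language-model head: the prediction for [MASK]
# depends on the words to its left AND to its right.
unmasker = pipeline('fill-mask', model='bert-base-uncased')

for candidate in unmasker("The [MASK] barked at the mailman all night."):
    print(candidate['token_str'], round(candidate['score'], 3))
```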

## Why BERT?

- **Pre-trained**: Learned from large text corpora (BooksCorpus and English Wikipedia)
- **Context-aware**: Understands a word's meaning from its surrounding context (illustrated in the sketch below)
- **Fine-tunable**: Adapts to your specific task with a small amount of labeled data
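A small sketch of what "context-aware" means: the same word gets a different vector in different sentences. The sentences and the similarity check below are purely illustrative.

```python
from transformers import BertTokenizer, BertModel
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

def embedding_of(word, sentence):
    # Return the contextual vector of `word` inside `sentence`
    inputs = tokenizer(sentence, return_tensors='pt')
    tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]
    return hidden[tokens.index(word)]

river = embedding_of('bank', "We walked along the bank of the river.")
money = embedding_of('bank', "She deposited the check at the bank.")
print(torch.nn.functional.cosine_similarity(river, money, dim=0))  # noticeably below 1.0
```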

## Using BERT

```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Load pre-trained tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained(
    'bert-base-uncased',
    num_labels=2  # Binary classification
)

# Tokenize text
text = "This movie is fantastic!"
inputs = tokenizer(
    text,
    padding=True,
    truncation=True,
    return_tensors='pt'
)

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    print(predictions)
```
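One caveat: the classification head loaded above is freshly initialized, so its outputs are essentially random until you fine-tune it (next section). Once trained, you can turn the probabilities into a label along these lines; the label names are a hypothetical mapping you define for your task.

```python
# Hypothetical label mapping for a binary sentiment task
label_names = ['negative', 'positive']

predicted_class = predictions.argmax(dim=-1).item()
print(label_names[predicted_class], predictions[0, predicted_class].item())
```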

## Fine-tuning BERT

```python
from transformers import BertForSequenceClassification, Trainer, TrainingArguments
from datasets import load_dataset

# The tokenizer and model loaded in the previous section are reused here

# Load dataset
dataset = load_dataset('imdb')

# Tokenize dataset
def tokenize_function(examples):
    return tokenizer(
        examples['text'],
        padding='max_length',
        truncation=True
    )

tokenized_datasets = dataset.map(tokenize_function, batched=True)

# Training arguments
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
)

# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['test']
)

# Train
trainer.train()
```
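After training finishes, you would typically evaluate on the held-out split and save the fine-tuned model. A short sketch: the output path is illustrative, and without a `compute_metrics` function `evaluate()` only reports the loss and runtime statistics.

```python
# Evaluate on the test split (returns eval_loss plus runtime stats by default)
metrics = trainer.evaluate()
print(metrics)

# Save the fine-tuned weights and the tokenizer side by side
trainer.save_model('./bert-imdb-finetuned')
tokenizer.save_pretrained('./bert-imdb-finetuned')
```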

## BERT for Different Tasks

### Text Classification

```python
from transformers import pipeline

classifier = pipeline('sentiment-analysis')
result = classifier("I love this product!")
print(result)
```

### Named Entity Recognition

```python
ner = pipeline('ner')
result = ner("John Smith lives in New York")
print(result)
```
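BERT's WordPiece tokenizer often splits names into sub-tokens, so the raw output has one entry per piece. In recent `transformers` releases you can ask the pipeline to merge them; the `aggregation_strategy` argument assumes a reasonably new version of the library.

```python
# Merge sub-word pieces into whole entities such as "John Smith" and "New York"
ner_grouped = pipeline('ner', aggregation_strategy='simple')
print(ner_grouped("John Smith lives in New York"))
```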

### Question Answering

```python
qa = pipeline('question-answering')
result = qa(
    question="Where does John live?",
    context="John Smith lives in New York and works at Google."
)
print(result)
```

## BERT Variants

- **RoBERTa**: Robustly optimized BERT pre-training
- **ALBERT**: A lighter BERT that shares parameters across layers
- **DistilBERT**: A faster, smaller BERT trained by knowledge distillation
- **ELECTRA**: More efficient pre-training via replaced-token detection
- **DeBERTa**: Improved, disentangled attention mechanism
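These variants all expose the same interface on the Hugging Face Hub, so switching is usually a one-line change via the Auto classes. A sketch, using one common checkpoint name:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Swap 'bert-base-uncased' for a lighter variant without touching the rest of the code
checkpoint = 'distilbert-base-uncased'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
```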

## Best Practices

- **Batch size**: Use smaller batches (8-16)
- **Learning rate**: Start with 2e-5 to 5e-5
- **Epochs**: 2-4 epochs are usually enough
- **Max length**: BERT handles at most 512 tokens per input
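As a rough sketch, these guidelines translate into settings like the following; the exact numbers are starting points for experimentation, not fixed rules.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='./results',
    learning_rate=2e-5,              # start small: 2e-5 to 5e-5
    per_device_train_batch_size=16,  # 8-16 fits most GPUs
    per_device_eval_batch_size=16,
    num_train_epochs=3,              # 2-4 is usually enough
    weight_decay=0.01,
)

# Respect BERT's 512-token limit when tokenizing long documents
inputs = tokenizer("some long document...", truncation=True, max_length=512)
```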

## Remember

- BERT is pre-trained on massive amounts of text
- Fine-tune it for your specific task
- Very powerful across NLP tasks
- Consider lighter variants for speed

#AI #Advanced #BERT