# BERT and Pre-trained Language Models

Use BERT for NLP tasks: powerful, pre-trained language understanding.
## What is BERT?
Bidirectional Encoder Representations from Transformers.
**Key**: Looks at both the left and right context of every word at the same time, instead of reading only left-to-right!
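A quick way to see this bidirectional context in action is masked language modeling, the task BERT was pre-trained on. A minimal sketch using the Hugging Face fill-mask pipeline with the same `bert-base-uncased` checkpoint used later in this guide:

```python
from transformers import pipeline

# BERT fills in [MASK] using the words on BOTH sides of it
fill_mask = pipeline('fill-mask', model='bert-base-uncased')

# The right-hand context ("at the bank") is what steers the guesses
for prediction in fill_mask("She deposited her [MASK] at the bank."):
    print(prediction['token_str'], round(prediction['score'], 3))
```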
## Why BERT?
- **Pre-trained**: Already trained on large text corpora (BooksCorpus and English Wikipedia)
- **Context-aware**: Understands a word's meaning from its surrounding context (see the sketch below)
- **Fine-tunable**: Adapts to your specific task with a modest amount of labeled data
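To illustrate "context-aware": the same word gets a different vector depending on the sentence it appears in. A rough sketch comparing BERT's hidden states for "bank" in two contexts; the `word_vector` helper is just for illustration, and picking the token with `tokens.index(word)` assumes the word stays a single word-piece (true for "bank" in the `bert-base-uncased` vocabulary):

```python
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

def word_vector(sentence, word):
    """Return the hidden state of the first occurrence of `word`."""
    inputs = tokenizer(sentence, return_tensors='pt')
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]  # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0].tolist())
    return hidden[tokens.index(word)]

river = word_vector("He sat on the bank of the river.", "bank")
money = word_vector("She deposited money at the bank.", "bank")

# Same word, different contexts -> noticeably different vectors
print(torch.cosine_similarity(river, money, dim=0).item())
```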
## Using BERT
```python
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Load pre-trained model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained(
    'bert-base-uncased',
    num_labels=2  # Binary classification
)

# Tokenize text
text = "This movie is fantastic!"
inputs = tokenizer(
    text,
    padding=True,
    truncation=True,
    return_tensors='pt'
)

# Get prediction
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    print(predictions)
```
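One caveat: the `num_labels=2` classification head above is freshly initialized, so these probabilities are essentially random until the model is fine-tuned. Assuming label 0 = negative and 1 = positive (a convention of this sketch, not something the checkpoint defines), mapping the output to a class looks like:

```python
# Hypothetical label mapping for the num_labels=2 head above
labels = ['negative', 'positive']

predicted_class = torch.argmax(predictions, dim=-1).item()
print(f"Predicted: {labels[predicted_class]} "
      f"(confidence {predictions[0, predicted_class].item():.2f})")
```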
## Fine-tuning BERT
```python
from transformers import (
    BertTokenizer,
    BertForSequenceClassification,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

# Load dataset
dataset = load_dataset('imdb')

# Load tokenizer and model (binary sentiment labels)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained(
    'bert-base-uncased',
    num_labels=2
)

# Tokenize dataset
def tokenize_function(examples):
    return tokenizer(
        examples['text'],
        padding='max_length',
        truncation=True
    )

tokenized_datasets = dataset.map(tokenize_function, batched=True)

# Training arguments
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
)

# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets['train'],
    eval_dataset=tokenized_datasets['test']
)

# Train
trainer.train()
```
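After training, it is common to evaluate on the held-out split and save the fine-tuned model so it can be reloaded later; a short sketch (the output path is a placeholder):

```python
# Evaluate on the eval_dataset and report metrics such as eval_loss
metrics = trainer.evaluate()
print(metrics)

# Save the fine-tuned weights and the tokenizer together (placeholder path)
trainer.save_model('./bert-imdb')
tokenizer.save_pretrained('./bert-imdb')
```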
## BERT for Different Tasks
### Text Classification

```python
from transformers import pipeline

classifier = pipeline('sentiment-analysis')
result = classifier("I love this product!")
print(result)
```
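Without a `model` argument the pipeline falls back to a default checkpoint, which can change between library versions; pinning one explicitly keeps results reproducible. A sketch using a commonly used sentiment checkpoint from the Hub (the checkpoint name is an assumption here, swap in any sentiment model you prefer):

```python
# Pin an explicit checkpoint instead of relying on the pipeline default
classifier = pipeline(
    'sentiment-analysis',
    model='distilbert-base-uncased-finetuned-sst-2-english'  # assumed Hub checkpoint
)
print(classifier(["I love this product!", "The packaging was damaged."]))
```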
### Named Entity Recognition

```python
ner = pipeline('ner')
result = ner("John Smith lives in New York")
print(result)
```
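The raw NER pipeline returns one entry per word-piece, so "New" and "York" come back separately; grouping word-pieces into whole entities is usually what you want. A sketch using the `aggregation_strategy` option (available in recent transformers releases):

```python
# Group word-piece predictions into whole entities ("New York" -> one LOC)
ner_grouped = pipeline('ner', aggregation_strategy='simple')
for entity in ner_grouped("John Smith lives in New York"):
    print(entity['entity_group'], entity['word'], round(entity['score'], 3))
```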
### Question Answering

```python
qa = pipeline('question-answering')
result = qa(
    question="Where does John live?",
    context="John Smith lives in New York and works at Google."
)
print(result)
```
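The same context can be queried repeatedly; a short sketch reusing the `qa` pipeline above to loop several questions over one passage:

```python
context = "John Smith lives in New York and works at Google."

# Each call returns the best answer span plus a confidence score
for question in ["Where does John live?", "Where does John work?"]:
    answer = qa(question=question, context=context)
    print(f"{question} -> {answer['answer']} ({answer['score']:.2f})")
```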
## BERT Variants
- **RoBERTa**: Robustly optimized BERT pre-training (more data, longer training, no next-sentence prediction)
- **ALBERT**: "A Lite BERT" that shares parameters across layers for a much smaller model
- **DistilBERT**: Smaller, faster BERT trained via knowledge distillation
- **ELECTRA**: More efficient pre-training via replaced-token detection
- **DeBERTa**: Improved (disentangled) attention mechanism
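Because these variants expose the same interface, the `Auto*` classes let you swap one in by changing only the checkpoint name; a sketch using `distilbert-base-uncased` as the lighter drop-in (the other names in the comment are examples):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Swapping BERT for a variant is just a different checkpoint name
checkpoint = 'distilbert-base-uncased'  # e.g. 'roberta-base', 'albert-base-v2'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
```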
## Best Practices
- **Batch size**: Use smaller batches (8-16)
- **Learning rate**: Start with 2e-5 to 5e-5
- **Epochs**: 2-4 epochs are usually enough
- **Max length**: BERT accepts at most 512 tokens per input
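Translated into code, those defaults look roughly like the following `TrainingArguments`; the values mirror the list above and are a starting point to tune, not fixed rules, and the review text is a placeholder:

```python
from transformers import TrainingArguments

# Typical fine-tuning defaults: small batches, modest learning rate, few epochs
training_args = TrainingArguments(
    output_dir='./results',
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=3,
    weight_decay=0.01,
)

# Respect BERT's 512-token limit when tokenizing (tokenizer loaded earlier)
inputs = tokenizer(
    "Some long review text...",
    truncation=True,
    max_length=512,
    return_tensors='pt',
)
```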
## Remember
- BERT is pre-trained on massive amounts of text
- Fine-tune it for your specific task
- Very powerful for NLP tasks
- Consider lighter variants for speed