Word Embeddings
Represent words as vectors in AI.
Robert Anderson
December 18, 2025
Words as numbers.
What are Word Embeddings?
Word embeddings represent each word as a dense vector (a list of numbers), learned from text.
Words with similar meanings get similar vectors!
Why Are Embeddings Better Than One-Hot?
One-hot: each word gets its own dimension, so no vector carries any relationship to another
- cat = [1, 0, 0, 0]
- dog = [0, 1, 0, 0]
- king = [0, 0, 1, 0]
Embeddings: learned from data, so they capture relationships (made concrete in the sketch below)
- cat and dog get similar vectors (both animals)
- king and queen get similar vectors (both royalty)
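To make "similar vectors" concrete: similarity between embeddings is usually measured with cosine similarity. Here is a minimal sketch with made-up 3-dimensional vectors (real embeddings have 100+ dimensions, and these numbers are purely illustrative):

import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means same direction
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Made-up toy embeddings
cat = np.array([0.9, 0.8, 0.1])
dog = np.array([0.8, 0.9, 0.2])
king = np.array([0.1, 0.2, 0.9])

print(cosine_similarity(cat, dog))   # high: related words
print(cosine_similarity(cat, king))  # low: unrelated words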
Word2Vec
A popular method for learning embeddings, available in the gensim library:
from gensim.models import Word2Vec

# Toy training corpus: each sentence is a list of tokens
sentences = [
    ['cat', 'sits', 'on', 'mat'],
    ['dog', 'plays', 'in', 'park'],
    ['cat', 'and', 'dog', 'play']
]

# Train Word2Vec: vector_size is the embedding dimension,
# window is the context size, min_count=1 keeps even rare words
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1)

# Get the vector for a word
vector = model.wv['cat']
print(vector)

# Find the most similar words
similar = model.wv.most_similar('cat', topn=3)
print(similar)  # e.g. [('dog', 0.87), ...] (illustrative output)
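Note: with only three toy sentences, the similarity scores are essentially noise. Word2Vec needs a large corpus before neighborhoods like cat/dog become reliable.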
Cool Math with Embeddings
# king - man + woman ≈ queen
# (requires a model whose vocabulary contains these words,
#  i.e. one trained on a large corpus or a pre-trained one)
result = model.wv.most_similar(
    positive=['king', 'woman'],
    negative=['man']
)
print(result[0])  # ('queen', ...) with a well-trained model
Pre-trained Embeddings
Use embeddings trained on millions of documents:
- GloVe: Stanford's embeddings
- FastText: Facebook's embeddings
- BERT: Google's contextual embeddings
import gensim.downloader as api
# Load pre-trained GloVe (downloads the vectors on first use)
glove = api.load("glove-wiki-gigaword-100")
# Query it immediately, no training needed
similar = glove.most_similar('python')
print(similar)
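With real pre-trained vectors, the king/queen analogy from earlier actually works. A quick sketch using the glove model loaded above (exact scores will vary, but 'queen' should rank at or near the top):

# Word analogy with pre-trained vectors
result = glove.most_similar(positive=['king', 'woman'], negative=['man'], topn=1)
print(result)  # [('queen', ...)]

# Cosine similarity between two words
print(glove.similarity('cat', 'dog'))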
Using in Neural Networks
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# The Embedding layer learns a 128-dimensional vector
# for each of the 10,000 words in the vocabulary
model = Sequential([
    Embedding(input_dim=10000, output_dim=128),
    LSTM(64),
    Dense(1, activation='sigmoid')
])
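You can also seed the Embedding layer with pre-trained vectors instead of learning them from scratch. A minimal sketch, assuming the glove vectors loaded earlier and a hypothetical word_index mapping from your tokenizer:

import numpy as np
from tensorflow.keras.initializers import Constant
from tensorflow.keras.layers import Embedding

# Hypothetical vocabulary mapping: word -> integer index
word_index = {'cat': 1, 'dog': 2, 'king': 3}

# Row i of the matrix holds the GloVe vector for word i
embedding_matrix = np.zeros((10000, 100))
for word, i in word_index.items():
    if word in glove:  # glove loaded via gensim.downloader above
        embedding_matrix[i] = glove[word]

# trainable=False freezes the pre-trained weights during training
embedding_layer = Embedding(
    input_dim=10000,
    output_dim=100,
    embeddings_initializer=Constant(embedding_matrix),
    trainable=False
)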
BERT - Contextual Embeddings
Unlike Word2Vec, which assigns each word a single fixed vector, BERT produces context-dependent vectors:
"bank" in "river bank" vs "money bank" gets different vectors!
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

text = "The bank by the river"
inputs = tokenizer(text, return_tensors='pt')
outputs = model(**inputs)

# One contextual vector per token: shape (1, num_tokens, 768)
print(outputs.last_hidden_state.shape)
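To see the context effect directly, here is a small sketch (reusing the tokenizer and model above) that compares the vector for "bank" in two sentences; the bank_vector helper is hypothetical:

import torch

def bank_vector(text):
    # Return the contextual vector for the token "bank" in this sentence
    inputs = tokenizer(text, return_tensors='pt')
    with torch.no_grad():
        outputs = model(**inputs)
    tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])
    return outputs.last_hidden_state[0, tokens.index('bank')]

v_river = bank_vector("The bank by the river")
v_money = bank_vector("The bank approved my loan")

# Same word, different contexts: similarity well below 1.0
print(torch.cosine_similarity(v_river, v_money, dim=0).item())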
Applications
- Text classification (sketched after this list)
- Machine translation
- Sentiment analysis
- Question answering
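As a taste of text classification: average the GloVe vectors of a sentence's words and feed the result to an ordinary classifier. A minimal sketch; the tiny dataset and labels are hypothetical, and glove is the model loaded earlier:

import numpy as np
import gensim.downloader as api
from sklearn.linear_model import LogisticRegression

glove = api.load("glove-wiki-gigaword-100")

def sentence_vector(text):
    # Average the vectors of the words GloVe knows about
    words = [w for w in text.lower().split() if w in glove]
    return np.mean([glove[w] for w in words], axis=0)

# Hypothetical toy data: 1 = positive, 0 = negative
texts = ["great movie loved it", "terrible film hated it",
         "wonderful acting throughout", "boring and awful plot"]
labels = [1, 0, 1, 0]

X = np.array([sentence_vector(t) for t in texts])
clf = LogisticRegression().fit(X, labels)
print(clf.predict([sentence_vector("an amazing and wonderful film")]))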
Remember
- Embeddings capture word meanings
- Pre-trained embeddings save time
- BERT understands context
#AI#Intermediate#NLP