Naive Bayes Classifier Explained
Learn Naive Bayes: a simple but powerful probabilistic classifier based on Bayes' theorem.
Naive Bayes is fast, simple, and surprisingly effective. It's based on probability theory and a "naive" assumption.
The Intuition
You get an email containing the words "free", "winner", and "click".
What's the probability it's spam?
Naive Bayes calculates:
P(spam | these words) vs P(not spam | these words)
Whichever is higher wins!
Bayes' Theorem
P(A|B) = P(B|A) × P(A) / P(B)
For classification:
P(class|features) ∝ P(features|class) × P(class)
- P(class): Prior probability (how common is spam?)
- P(features|class): Likelihood (how common are these features in spam?)
- P(class|features): Posterior (what we want to know!)
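To make the pieces concrete, here is a tiny numeric sketch that plugs made-up numbers into the formula (the priors and likelihoods below are hypothetical, not taken from the spam example later on):
# Hypothetical numbers: 30% of all mail is spam,
# and "free" appears in 60% of spam but only 5% of ham
p_spam, p_ham = 0.3, 0.7
p_free_given_spam = 0.60
p_free_given_ham = 0.05
# Unnormalized posteriors: P(class | "free") ∝ P("free" | class) × P(class)
spam_score = p_free_given_spam * p_spam   # 0.18
ham_score = p_free_given_ham * p_ham      # 0.035
# Divide by P("free") = spam_score + ham_score to get a real probability
p_spam_given_free = spam_score / (spam_score + ham_score)
print(f"P(spam | 'free') = {p_spam_given_free:.2f}")  # ≈ 0.84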
The "Naive" Assumption
Naive Bayes assumes features are independent given the class.
P(free, winner, click | spam) = P(free|spam) × P(winner|spam) × P(click|spam)
This is "naive" because features often ARE related. But it works anyway!
Example: Spam Classification
Training data:
Email 1: "free money now" → Spam
Email 2: "meeting tomorrow" → Not Spam
Email 3: "free gift winner" → Spam
Email 4: "project update" → Not Spam
Calculate probabilities:
P(spam) = 2/4 = 0.5
P(not spam) = 2/4 = 0.5
P(free | spam) = 2/2 = 1.0 (appears in both spam emails)
P(free | not spam) = 0/2 = 0.0
P(winner | spam) = 1/2 = 0.5
P(winner | not spam) = 0/2 = 0.0
New email: "free winner"
P(spam | free, winner) ∝ 0.5 × 1.0 × 0.5 = 0.25
P(not spam | free, winner) ∝ 0.5 × 0.0 × 0.0 = 0.0
Verdict: SPAM!
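The same hand calculation, written as a short Python sketch (per-word probabilities are the fraction of emails in each class that contain the word, exactly as above):
# Training emails from the example above
spam_emails = ["free money now", "free gift winner"]
ham_emails = ["meeting tomorrow", "project update"]
def word_prob(word, emails):
    # Fraction of emails in this class that contain the word
    return sum(word in email.split() for email in emails) / len(emails)
p_spam, p_ham = 0.5, 0.5
spam_score, ham_score = p_spam, p_ham
for word in ["free", "winner"]:
    spam_score *= word_prob(word, spam_emails)
    ham_score *= word_prob(word, ham_emails)
print(spam_score, ham_score)  # 0.25 0.0 → classified as spam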
Types of Naive Bayes
1. Gaussian Naive Bayes
For continuous features. Assumes normal distribution.
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
2. Multinomial Naive Bayes
For discrete counts (word frequencies).
from sklearn.naive_bayes import MultinomialNB
# Great for text classification!
model = MultinomialNB()
model.fit(X_train_counts, y_train)
3. Bernoulli Naive Bayes
For binary features (word present/absent).
from sklearn.naive_bayes import BernoulliNB
model = BernoulliNB()
model.fit(X_train_binary, y_train)
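One way to build such binary features from raw text is to ask CountVectorizer for presence/absence instead of counts; a minimal sketch (the texts and labels here are just illustrative):
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB
texts = ["free money now", "meeting tomorrow"]
labels = ["spam", "ham"]
# binary=True records word presence/absence instead of counts
binary_vectorizer = CountVectorizer(binary=True)
X_train_binary = binary_vectorizer.fit_transform(texts)
model = BernoulliNB()
model.fit(X_train_binary, labels)
BernoulliNB can also binarize count inputs itself via its binarize parameter, but making the 0/1 encoding explicit in the vectorizer keeps the pipeline easier to read.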
Text Classification Example
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
# Sample data
texts = [
"free money click now",
"meeting at 3pm tomorrow",
"winner free gift claim",
"project deadline friday",
"cheap pills online",
"team lunch next week"
]
labels = ['spam', 'ham', 'spam', 'ham', 'spam', 'ham']
# Convert text to word counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)
# Split
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.3, random_state=42, stratify=labels)  # stratify keeps both classes in the tiny training set
# Train
model = MultinomialNB()
model.fit(X_train, y_train)
# Predict new email
new_email = vectorizer.transform(["free winner prize"])
prediction = model.predict(new_email)
probability = model.predict_proba(new_email)
print(f"Prediction: {prediction[0]}")
print(f"Probabilities: {probability}")
Laplace Smoothing
What if a word never appears in spam during training?
P(word | spam) = 0 → Everything multiplies to 0!
Solution: Add a small count (Laplace smoothing):
# alpha = smoothing parameter (default 1.0)
model = MultinomialNB(alpha=1.0)
P(word | spam) = (count + α) / (total words in spam + α × vocabulary_size), where α = 1 gives classic Laplace smoothing
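A small sketch of what smoothing does in practice (toy counts, illustrative only): with alpha near zero, a word never seen in spam gets probability ≈ 0 and vetoes the class; with alpha = 1 it keeps a small nonzero probability.
import numpy as np
from sklearn.naive_bayes import MultinomialNB
# Toy count matrix: columns = ["free", "meeting"], one row per email
X = np.array([[2, 0],   # spam email: "free free"
              [0, 3]])  # ham email:  "meeting meeting meeting"
y = ["spam", "ham"]
for alpha in [1e-10, 1.0]:
    model = MultinomialNB(alpha=alpha).fit(X, y)
    spam_idx = list(model.classes_).index("spam")
    # P("free" | spam) and P("meeting" | spam)
    print(f"alpha={alpha}: {np.exp(model.feature_log_prob_[spam_idx]).round(3)}")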
Pros and Cons
Pros ✅
- Very fast: Training and prediction are quick
- Handles many features: Scales well with high dimensions
- Works with small data: Doesn't need much training data
- Good baseline: Often surprisingly competitive
- Probabilistic: Gives probability estimates
Cons ❌
- Independence assumption: Often violated in practice
- Zero frequency problem: Needs smoothing
- Continuous features: Assumes Gaussian (may not be true)
When to Use Naive Bayes
Great for:
- Text classification (spam, sentiment, topic)
- Real-time prediction (very fast)
- Multi-class problems
- When you have little training data
Less suitable for:
- Complex feature interactions
- When independence assumption is badly violated
Comparison with Other Classifiers
| Aspect | Naive Bayes | Logistic Regression | SVM |
|---|---|---|---|
| Speed | Very fast | Fast | Slower |
| Training data needed | Less | Medium | More |
| Feature independence | Assumes yes | No assumption | No assumption |
| Interpretability | Good | Good | Poor |
Key Takeaways
- Based on Bayes' theorem - calculates P(class|features)
- "Naive" assumption - features are independent (often wrong, still works!)
- Three types: Gaussian, Multinomial, Bernoulli
- Great for text - spam filtering, sentiment analysis
- Fast and simple - excellent baseline model
Despite its simplicity, Naive Bayes often performs surprisingly well, especially for text classification. Always try it as a baseline!