Naive Bayes Classifier Explained

Learn Naive Bayes - a simple but powerful probabilistic classifier based on Bayes' theorem.

Sarah Chen
December 19, 2025

Naive Bayes is fast, simple, and surprisingly effective. It's based on probability theory and a "naive" assumption.

The Intuition

You get an email containing the words "free", "winner", and "click".

What's the probability it's spam?

Naive Bayes calculates:

P(spam | these words) vs P(not spam | these words)

Whichever is higher wins!

Bayes' Theorem

P(A|B) = P(B|A) × P(A)
         ─────────────
              P(B)

For classification:

P(class|features) ∝ P(features|class) × P(class)
  • P(class): Prior probability (how common is spam?)
  • P(features|class): Likelihood (how common are these features in spam?)
  • P(class|features): Posterior (what we want to know!)

The "Naive" Assumption

Naive Bayes assumes features are independent given the class.

P(free, winner, click | spam) = P(free|spam) × P(winner|spam) × P(click|spam)

This is "naive" because features often ARE related. But it works anyway!

Example: Spam Classification

Training data:

Email 1: "free money now" → Spam
Email 2: "meeting tomorrow" → Not Spam
Email 3: "free gift winner" → Spam
Email 4: "project update" → Not Spam

Calculate probabilities:

P(spam) = 2/4 = 0.5
P(not spam) = 2/4 = 0.5

P(free | spam) = 2/2 = 1.0  (appears in both spam emails)
P(free | not spam) = 0/2 = 0.0

P(winner | spam) = 1/2 = 0.5
P(winner | not spam) = 0/2 = 0.0

New email: "free winner"

P(spam | free, winner) ∝ 0.5 × 1.0 × 0.5 = 0.25
P(not spam | free, winner) ∝ 0.5 × 0.0 × 0.0 = 0.0

Verdict: SPAM!
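
To make the arithmetic concrete, here is a minimal sketch that reproduces this hand calculation in plain Python. The priors and word probabilities are hard-coded from the four training emails above.

# Reproduce the hand calculation above with hard-coded estimates
priors = {"spam": 0.5, "not spam": 0.5}

# P(word | class), estimated from the 4 training emails
likelihoods = {
    "spam":     {"free": 2/2, "winner": 1/2},
    "not spam": {"free": 0/2, "winner": 0/2},
}

new_email = ["free", "winner"]

scores = {}
for label in priors:
    score = priors[label]
    for word in new_email:
        score *= likelihoods[label][word]  # naive independence assumption
    scores[label] = score

print(scores)                       # {'spam': 0.25, 'not spam': 0.0}
print(max(scores, key=scores.get))  # spam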

Types of Naive Bayes

1. Gaussian Naive Bayes

For continuous features. Assumes normal distribution.

from sklearn.naive_bayes import GaussianNB

model = GaussianNB()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
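
The snippet above assumes X_train and y_train already exist. Here is a minimal self-contained sketch using the Iris dataset, whose features are continuous measurements (the 0.3 split and random_state=42 are arbitrary choices for illustration):

# Self-contained example: Gaussian Naive Bayes on continuous features
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = GaussianNB()
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.2f}")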

2. Multinomial Naive Bayes

For discrete counts (word frequencies).

from sklearn.naive_bayes import MultinomialNB

# Great for text classification!
model = MultinomialNB()
model.fit(X_train_counts, y_train)

3. Bernoulli Naive Bayes

For binary features (word present/absent).

from sklearn.naive_bayes import BernoulliNB

model = BernoulliNB()
model.fit(X_train_binary, y_train)
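
Binary features are easy to produce with CountVectorizer(binary=True), which records only whether a word is present. A small sketch with made-up emails:

# Binarize word features (present/absent) for Bernoulli Naive Bayes
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

texts = ["free free winner", "meeting tomorrow", "free gift", "project update"]
labels = ["spam", "ham", "spam", "ham"]

vectorizer = CountVectorizer(binary=True)  # 1 if the word appears, else 0
X_binary = vectorizer.fit_transform(texts)

model = BernoulliNB()
model.fit(X_binary, labels)
print(model.predict(vectorizer.transform(["free winner"])))  # ['spam']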

Text Classification Example

from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Sample data
texts = [
    "free money click now",
    "meeting at 3pm tomorrow",
    "winner free gift claim",
    "project deadline friday",
    "cheap pills online",
    "team lunch next week"
]
labels = ['spam', 'ham', 'spam', 'ham', 'spam', 'ham']

# Convert text to word counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Split (random_state makes the split reproducible)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.3, random_state=42
)

# Train
model = MultinomialNB()
model.fit(X_train, y_train)

# Predict new email
new_email = vectorizer.transform(["free winner prize"])
prediction = model.predict(new_email)
probability = model.predict_proba(new_email)

print(f"Prediction: {prediction[0]}")
print(f"Probabilities: {probability}")

Laplace Smoothing

What if a word never appears in spam during training?

P(word | spam) = 0 → the entire product becomes 0!

Solution: Add a small count to every word (Laplace smoothing):

P(word | spam) = (count + 1) / (total words in spam + vocabulary size)

In scikit-learn, the alpha parameter controls how much smoothing is applied:

# alpha = smoothing parameter (default 1.0)
model = MultinomialNB(alpha=1.0)
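
A tiny sketch of the effect, with made-up counts: without smoothing, a word that never appeared in spam zeroes out the whole product, while add-one smoothing gives it a small but non-zero probability.

# Laplace (add-one) smoothing with made-up counts
count = 0              # the word never appeared in spam emails
total_spam_words = 20  # total words observed in spam training emails
vocab_size = 50        # number of distinct words in the vocabulary

unsmoothed = count / total_spam_words                      # kills the product
smoothed = (count + 1) / (total_spam_words + vocab_size)   # small but non-zero

print(unsmoothed)  # 0.0
print(smoothed)    # ~0.014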

Pros and Cons

Pros ✅

  • Very fast: Training and prediction are quick
  • Handles many features: Scales well with high dimensions
  • Works with small data: Doesn't need much training data
  • Good baseline: Often surprisingly competitive
  • Probabilistic: Gives probability estimates

Cons ❌

  • Independence assumption: Often violated in practice
  • Zero frequency problem: Needs smoothing
  • Continuous features: Assumes Gaussian (may not be true)

When to Use Naive Bayes

Great for:

  • Text classification (spam, sentiment, topic)
  • Real-time prediction (very fast)
  • Multi-class problems
  • When you have little training data

Less suitable for:

  • Complex feature interactions
  • When independence assumption is badly violated

Comparison with Other Classifiers

Aspect                  Naive Bayes    Logistic Regression   SVM
Speed                   Very fast      Fast                  Slower
Training data needed    Less           Medium                More
Feature independence    Assumes yes    No assumption         No assumption
Interpretability        Good           Good                  Poor

Key Takeaways

  1. Based on Bayes' theorem - calculates P(class|features)
  2. "Naive" assumption - features are independent (often wrong, still works!)
  3. Three types: Gaussian, Multinomial, Bernoulli
  4. Great for text - spam filtering, sentiment analysis
  5. Fast and simple - excellent baseline model

Despite its simplicity, Naive Bayes often performs surprisingly well, especially for text classification. Always try it as a baseline!

#Machine Learning  #Naive Bayes  #Classification  #Beginner