
Naive Bayes Classifier Explained

Learn Naive Bayes - a simple but powerful probabilistic classifier based on Bayes' theorem.

Sarah Chen
December 19, 2025


Naive Bayes is fast, simple, and surprisingly effective. It's based on probability theory and a "naive" assumption.

The Intuition

You get an email with words: "free", "winner", "click"

What's the probability it's spam?

Naive Bayes calculates:

```
P(spam | these words)   vs   P(not spam | these words)
```

Whichever is higher wins!

Bayes' Theorem

```
         P(B|A) × P(A)
P(A|B) = ─────────────
              P(B)
```

For classification:

```
P(class|features) ∝ P(features|class) × P(class)
```

- **P(class):** Prior probability (how common is spam?)
- **P(features|class):** Likelihood (how common are these features in spam?)
- **P(class|features):** Posterior (what we want to know!)
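To see the formula in action, here is a minimal numeric sketch. The prior and likelihood values below are made up for illustration, not taken from any real dataset:

```python
# Hypothetical numbers: 50% of email is spam, "free" appears in 80% of
# spam emails and 5% of legitimate ones. What is P(spam | "free")?
p_spam = 0.5              # prior P(spam)
p_ham = 0.5               # prior P(not spam)
p_free_given_spam = 0.8   # likelihood P("free" | spam)
p_free_given_ham = 0.05   # likelihood P("free" | not spam)

# Bayes' theorem: posterior ∝ likelihood × prior, then normalize
num_spam = p_free_given_spam * p_spam
num_ham = p_free_given_ham * p_ham
p_spam_given_free = num_spam / (num_spam + num_ham)

print(f"P(spam | 'free') = {p_spam_given_free:.3f}")  # ≈ 0.941
```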

The "Naive" Assumption

Naive Bayes assumes features are **independent** given the class.

```
P(free, winner, click | spam) = P(free|spam) × P(winner|spam) × P(click|spam)
```

This is "naive" because features often ARE related. But it works anyway!

Example: Spam Classification

Training data:

```
Email 1: "free money now"    → Spam
Email 2: "meeting tomorrow"  → Not Spam
Email 3: "free gift winner"  → Spam
Email 4: "project update"    → Not Spam
```

Calculate probabilities:

```
P(spam) = 2/4 = 0.5
P(not spam) = 2/4 = 0.5

P(free | spam) = 2/2 = 1.0   (appears in both spam emails)
P(free | not spam) = 0/2 = 0.0

P(winner | spam) = 1/2 = 0.5
P(winner | not spam) = 0/2 = 0.0
```

New email: "free winner"

```
P(spam | free, winner)     ∝ 0.5 × 1.0 × 0.5 = 0.25
P(not spam | free, winner) ∝ 0.5 × 0.0 × 0.0 = 0.0

Verdict: SPAM!
```
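The same arithmetic can be reproduced in a few lines of plain Python. This is a minimal sketch of the hand calculation above, with no smoothing, so a zero count really does zero out a class:

```python
# Word sets for the four training emails above
spam_emails = [{"free", "money", "now"}, {"free", "gift", "winner"}]
ham_emails = [{"meeting", "tomorrow"}, {"project", "update"}]

def class_score(words, emails, prior):
    """Score ∝ P(class) × Π P(word | class), using raw document frequencies."""
    score = prior
    for word in words:
        score *= sum(word in email for email in emails) / len(emails)
    return score

new_email = {"free", "winner"}
p_spam = class_score(new_email, spam_emails, prior=0.5)  # 0.5 × 1.0 × 0.5 = 0.25
p_ham = class_score(new_email, ham_emails, prior=0.5)    # 0.5 × 0.0 × 0.0 = 0.0

print("spam" if p_spam > p_ham else "not spam")  # → spam
```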

Types of Naive Bayes

### 1. Gaussian Naive Bayes

For continuous features. Assumes a normal distribution.

```python
from sklearn.naive_bayes import GaussianNB

model = GaussianNB()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```
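The snippet above assumes `X_train`, `y_train`, and `X_test` already exist. A self-contained sketch might use scikit-learn's bundled iris dataset, whose continuous measurements are a reasonable fit for the Gaussian assumption:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Continuous features (sepal/petal measurements) suit GaussianNB
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = GaussianNB()
model.fit(X_train, y_train)
print(f"Accuracy: {model.score(X_test, y_test):.2f}")
```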

### 2. Multinomial Naive Bayes

For discrete counts (word frequencies).

```python
from sklearn.naive_bayes import MultinomialNB

# Great for text classification!
model = MultinomialNB()
model.fit(X_train_counts, y_train)
```

### 3. Bernoulli Naive Bayes

For binary features (word present/absent).

```python
from sklearn.naive_bayes import BernoulliNB

model = BernoulliNB()
model.fit(X_train_binary, y_train)
```
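In practice, the only change from the multinomial setup is that the features are 0/1 indicators rather than counts; one way to get them is `CountVectorizer(binary=True)`. A minimal sketch with made-up emails:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

texts = ["free money now", "meeting tomorrow", "free gift winner", "project update"]
labels = ["spam", "ham", "spam", "ham"]

# binary=True records word presence/absence instead of counts
vectorizer = CountVectorizer(binary=True)
X_binary = vectorizer.fit_transform(texts)

model = BernoulliNB()
model.fit(X_binary, labels)
print(model.predict(vectorizer.transform(["free winner"])))  # expected: ['spam']
```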

Text Classification Example

```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Sample data
texts = [
    "free money click now",
    "meeting at 3pm tomorrow",
    "winner free gift claim",
    "project deadline friday",
    "cheap pills online",
    "team lunch next week"
]
labels = ['spam', 'ham', 'spam', 'ham', 'spam', 'ham']

# Convert text to word counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Split
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.3)

# Train
model = MultinomialNB()
model.fit(X_train, y_train)

# Predict a new email
new_email = vectorizer.transform(["free winner prize"])
prediction = model.predict(new_email)
probability = model.predict_proba(new_email)

print(f"Prediction: {prediction[0]}")
print(f"Probabilities: {probability}")
```

Laplace Smoothing

What if a word never appears in spam during training?

```
P(word | spam) = 0  →  Everything multiplies to 0!
```

**Solution:** Add a small count (Laplace smoothing):

```python
# alpha = smoothing parameter (default 1.0)
model = MultinomialNB(alpha=1.0)
```

```
P(word | spam) = (count + 1) / (total + vocabulary_size)
```
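Here is a minimal sketch of the smoothed estimate itself (the counts below are made up for illustration):

```python
def smoothed_prob(word_count, total_words, vocab_size, alpha=1.0):
    """Laplace-smoothed P(word | class): never exactly zero."""
    return (word_count + alpha) / (total_words + alpha * vocab_size)

# A word never seen in spam still gets a small non-zero probability
print(smoothed_prob(word_count=0, total_words=100, vocab_size=50))  # ≈ 0.0067
```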

Pros and Cons

### Pros ✅

- **Very fast:** Training and prediction are quick
- **Handles many features:** Scales well with high dimensions
- **Works with small data:** Doesn't need much training data
- **Good baseline:** Often surprisingly competitive
- **Probabilistic:** Gives probability estimates

### Cons ❌

- **Independence assumption:** Often violated in practice
- **Zero frequency problem:** Needs smoothing
- **Continuous features:** Assumes Gaussian (may not be true)

When to Use Naive Bayes

**Great for:**

- Text classification (spam, sentiment, topic)
- Real-time prediction (very fast)
- Multi-class problems
- When you have little training data

**Less suitable for:**

- Complex feature interactions
- When the independence assumption is badly violated

Comparison with Other Classifiers

| Aspect | Naive Bayes | Logistic Regression | SVM |
|--------|-------------|---------------------|-----|
| Speed | Very fast | Fast | Slower |
| Training data needed | Less | Medium | More |
| Feature independence | Assumes yes | No assumption | No assumption |
| Interpretability | Good | Good | Poor |

Key Takeaways

1. **Based on Bayes' theorem** - calculates P(class|features)
2. **"Naive" assumption** - features are independent (often wrong, still works!)
3. **Three types:** Gaussian, Multinomial, Bernoulli
4. **Great for text** - spam filtering, sentiment analysis
5. **Fast and simple** - excellent baseline model

Despite its simplicity, Naive Bayes often performs surprisingly well, especially for text classification. Always try it as a baseline!

#Machine Learning#Naive Bayes#Classification#Beginner