# Naive Bayes Classifier Explained
Learn Naive Bayes - a simple but powerful probabilistic classifier based on Bayes' theorem.
Naive Bayes is fast, simple, and surprisingly effective. It's based on probability theory and a "naive" assumption.
## The Intuition
You get an email containing the words "free", "winner", and "click".
What's the probability it's spam?
Naive Bayes calculates:

```
P(spam | these words)  vs  P(not spam | these words)
```
Whichever is higher wins!
## Bayes' Theorem
```
          P(B|A) × P(A)
P(A|B) = ───────────────
               P(B)
```
For classification:

```
P(class | features) ∝ P(features | class) × P(class)
```
- **P(class):** Prior probability (how common is spam overall?)
- **P(features|class):** Likelihood (how common are these features in spam?)
- **P(class|features):** Posterior (what we want to know!)
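To make the proportionality concrete, here is a tiny sketch in plain Python with made-up numbers (a 50% spam prior and invented likelihoods for the word "free"):

```python
# Made-up numbers: suppose 50% of email is spam, and "free" appears in
# 80% of spam but only 5% of non-spam.
p_spam = 0.5             # P(spam)           - prior
p_free_given_spam = 0.8  # P("free" | spam)  - likelihood
p_free_given_ham = 0.05  # P("free" | ham)

# Unnormalised posteriors (the ∝ in the formula above)
score_spam = p_free_given_spam * p_spam        # 0.40
score_ham = p_free_given_ham * (1 - p_spam)    # 0.025

# Normalise by P("free") to get an actual probability
p_spam_given_free = score_spam / (score_spam + score_ham)
print(round(p_spam_given_free, 3))  # 0.941
```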
The "Naive" Assumption
Naive Bayes assumes features are **independent** given the class.
```
P(free, winner, click | spam) = P(free|spam) × P(winner|spam) × P(click|spam)
```
This is "naive" because features often ARE related. But it works anyway!
## Example: Spam Classification
Training data:

```
Email 1: "free money now"    → Spam
Email 2: "meeting tomorrow"  → Not Spam
Email 3: "free gift winner"  → Spam
Email 4: "project update"    → Not Spam
```
Calculate probabilities:

```
P(spam)     = 2/4 = 0.5
P(not spam) = 2/4 = 0.5

P(free | spam)     = 2/2 = 1.0   (appears in both spam emails)
P(free | not spam) = 0/2 = 0.0

P(winner | spam)     = 1/2 = 0.5
P(winner | not spam) = 0/2 = 0.0
```
New email: "free winner"

```
P(spam | free, winner)     ∝ 0.5 × 1.0 × 0.5 = 0.25
P(not spam | free, winner) ∝ 0.5 × 0.0 × 0.0 = 0.0

Verdict: SPAM!
```
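The same arithmetic as a quick sketch in plain Python, with the priors and likelihoods hard-coded from the calculation above:

```python
# Reproducing the hand calculation above, no library needed.
priors = {"spam": 2 / 4, "not spam": 2 / 4}

# Per-word likelihoods estimated from the four training emails
likelihood = {
    "spam":     {"free": 2 / 2, "winner": 1 / 2},
    "not spam": {"free": 0 / 2, "winner": 0 / 2},
}

new_email = ["free", "winner"]
for label in priors:
    score = priors[label]
    for word in new_email:
        score *= likelihood[label][word]
    print(label, score)   # spam 0.25, not spam 0.0
```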
## Types of Naive Bayes
### 1. Gaussian Naive Bayes

For continuous features. Assumes each feature is normally distributed within each class.
```python
from sklearn.naive_bayes import GaussianNB

model = GaussianNB()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```
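As a quick end-to-end illustration (using scikit-learn's built-in Iris dataset as a stand-in for continuous features):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Iris has four continuous features, so GaussianNB is a natural fit
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GaussianNB()
model.fit(X_train, y_train)
print(model.score(X_test, y_test))  # accuracy on the held-out split
```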
### 2. Multinomial Naive Bayes

For discrete counts (e.g. word frequencies).
```python
from sklearn.naive_bayes import MultinomialNB

# Great for text classification!
model = MultinomialNB()
model.fit(X_train_counts, y_train)
```
### 3. Bernoulli Naive Bayes

For binary features (word present/absent).
```python
from sklearn.naive_bayes import BernoulliNB

model = BernoulliNB()
model.fit(X_train_binary, y_train)
```
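If you start from raw text, one way to get binary features is `CountVectorizer(binary=True)`. A minimal sketch, reusing the toy emails from the earlier example:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

texts = ["free money now", "meeting tomorrow", "free gift winner", "project update"]
labels = ["spam", "ham", "spam", "ham"]

# binary=True records only presence/absence of each word, not its count
vectorizer = CountVectorizer(binary=True)
X_binary = vectorizer.fit_transform(texts)

model = BernoulliNB()
model.fit(X_binary, labels)
print(model.predict(vectorizer.transform(["free winner"])))  # expected: ['spam']
```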
## Text Classification Example
```python
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Sample data
texts = [
    "free money click now",
    "meeting at 3pm tomorrow",
    "winner free gift claim",
    "project deadline friday",
    "cheap pills online",
    "team lunch next week",
]
labels = ['spam', 'ham', 'spam', 'ham', 'spam', 'ham']

# Convert text to word counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# Split
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.3)

# Train
model = MultinomialNB()
model.fit(X_train, y_train)

# Predict a new email
new_email = vectorizer.transform(["free winner prize"])
prediction = model.predict(new_email)
probability = model.predict_proba(new_email)

print(f"Prediction: {prediction[0]}")
print(f"Probabilities: {probability}")
```
## Laplace Smoothing
What if a word never appears in spam during training?

```
P(word | spam) = 0  →  Everything multiplies to 0!
```
**Solution:** Add a small count (Laplace smoothing):
```python
# alpha = smoothing parameter (default 1.0)
model = MultinomialNB(alpha=1.0)
```
```
P(word | spam) = (count + 1) / (total + vocabulary_size)
```
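For a feel of the numbers, here is a quick sketch applying that formula to the toy emails from the earlier example (counting from those four emails: 6 word tokens in the spam class, 9 distinct words in the whole training vocabulary):

```python
# Effect of Laplace smoothing on a word that never appears in spam.
count = 0        # "meeting" never appears in a spam email
total = 6        # total word tokens across the two spam emails
vocab_size = 9   # distinct words in the training data

p_unsmoothed = count / total                      # 0.0 -> kills the whole product
p_smoothed = (count + 1) / (total + vocab_size)   # 1/15 ≈ 0.067
print(p_unsmoothed, round(p_smoothed, 3))
```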
## Pros and Cons
### Pros ✅

- **Very fast:** Training and prediction are quick
- **Handles many features:** Scales well with high dimensions
- **Works with small data:** Doesn't need much training data
- **Good baseline:** Often surprisingly competitive
- **Probabilistic:** Gives probability estimates
### Cons ❌

- **Independence assumption:** Often violated in practice
- **Zero frequency problem:** Needs smoothing
- **Continuous features:** Assumes a Gaussian distribution (may not be true)
## When to Use Naive Bayes
**Great for:**

- Text classification (spam, sentiment, topic)
- Real-time prediction (very fast)
- Multi-class problems
- When you have little training data
**Less suitable for:**

- Complex feature interactions
- When the independence assumption is badly violated
## Comparison with Other Classifiers
| Aspect | Naive Bayes | Logistic Regression | SVM |
|--------|-------------|---------------------|-----|
| Speed | Very fast | Fast | Slower |
| Training data needed | Less | Medium | More |
| Feature independence | Assumes yes | No assumption | No assumption |
| Interpretability | Good | Good | Poor |
## Key Takeaways
1. **Based on Bayes' theorem** - calculates P(class|features)
2. **"Naive" assumption** - features are independent (often wrong, still works!)
3. **Three types:** Gaussian, Multinomial, Bernoulli
4. **Great for text** - spam filtering, sentiment analysis
5. **Fast and simple** - excellent baseline model
Despite its simplicity, Naive Bayes often performs surprisingly well, especially for text classification. Always try it as a baseline!