# Logistic Regression: Classification Basics
Despite the confusing name, Logistic Regression is for classification, not regression. It predicts categories.
## The Problem
Linear Regression predicts continuous numbers. But what if you need to answer questions like:

- Is this email spam or not spam?
- Will this customer churn? Yes/No
- Is this transaction fraudulent? Yes/No
You need probabilities and categories, not raw numbers.
## The Solution: The Sigmoid Function

Logistic Regression uses the sigmoid function to squash any real number into the range (0, 1):
```
σ(x) = 1 / (1 + e^(-x))
```
Plotted, the sigmoid is an S-shaped curve: it approaches 0 for large negative x, crosses 0.5 at x = 0, and approaches 1 for large positive x. Because the output always lies strictly between 0 and 1, we can interpret it as a probability.
## How It Works

**Step 1:** Compute a linear combination of the features (exactly as in linear regression):

```
z = w1*x1 + w2*x2 + ... + bias
```

**Step 2:** Apply the sigmoid to turn z into a probability:

```
probability = sigmoid(z) = 1 / (1 + exp(-z))
```

**Step 3:** Apply a threshold to get the final prediction:

```
prediction = 1 if probability >= 0.5 else 0
```
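To make the three steps concrete, here is a minimal from-scratch sketch in NumPy. The weights and bias are made-up values chosen only for illustration, not parameters fitted to any data:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the (0, 1) range
    return 1 / (1 + np.exp(-z))

# Hypothetical parameters, invented for this illustration
w = np.array([0.8, 0.05])   # one weight per feature
bias = -7.0

x = np.array([5, 62])       # features: study hours, previous score

z = np.dot(w, x) + bias                        # Step 1: linear combination
probability = sigmoid(z)                       # Step 2: squash to a probability
prediction = 1 if probability >= 0.5 else 0    # Step 3: threshold

print(f"z = {z:.2f}, probability = {probability:.3f}, prediction = {prediction}")
```

In practice a library such as scikit-learn learns the weights and bias from data, which is what the next example does.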
## Code Example

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import numpy as np

# Sample data: [study hours, previous score] -> pass/fail
X = np.array([[2, 50], [3, 55], [5, 65], [6, 70], [8, 85], [10, 90],
              [1, 45], [2, 48], [4, 60], [7, 78]])
y = np.array([0, 0, 1, 1, 1, 1, 0, 0, 1, 1])  # 0 = fail, 1 = pass

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Train model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict for a new student: 5 hours of study, previous score of 62
new_student = [[5, 62]]
probability = model.predict_proba(new_student)
prediction = model.predict(new_student)

print(f"Probability of passing: {probability[0][1]:.2%}")
print(f"Prediction: {'Pass' if prediction[0] == 1 else 'Fail'}")
```
Example output (your numbers will vary because the train/test split is random):

```
Probability of passing: 73.45%
Prediction: Pass
```
## Understanding the Output

```python
model.predict_proba([[5, 62]])
# Returns: [[0.2655, 0.7345]]
#           first value = Prob(Fail), second value = Prob(Pass)
```
Probabilities always sum to 1.
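You can check this directly on the test set from the example above:

```python
proba = model.predict_proba(X_test)
print(proba.sum(axis=1))  # every row sums to 1.0
```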
## Multiclass Classification

What if you have more than two classes (Cat, Dog, Bird)?
```python
# Note: recent scikit-learn versions (1.5+) deprecate the multi_class
# parameter; multinomial is the default, so LogisticRegression() suffices.
model = LogisticRegression(multi_class='multinomial')
model.fit(X, y)  # y can be [0, 1, 2] for three classes

# Predictions ('features' is a placeholder for one sample's feature values)
model.predict_proba([[features]])
# Returns something like: [[0.15, 0.75, 0.10]] -- one probability per class
```
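Here is a small runnable sketch with made-up data for three classes; the feature values are invented purely for illustration (scikit-learn handles the multiclass case automatically):

```python
from sklearn.linear_model import LogisticRegression
import numpy as np

# Made-up data: two features, three classes (0 = cat, 1 = dog, 2 = bird)
X_animals = np.array([[1.0, 2.0], [1.2, 1.8], [3.0, 0.5],
                      [3.2, 0.7], [0.5, 3.5], [0.4, 3.8]])
y_animals = np.array([0, 0, 1, 1, 2, 2])

clf = LogisticRegression()
clf.fit(X_animals, y_animals)

sample = [[1.1, 1.9]]
print(clf.predict_proba(sample))  # one probability per class, summing to 1
print(clf.predict(sample))        # the class with the highest probability
```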
## Evaluation Metrics

### Accuracy

```python
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_true, y_pred)  # fraction of predictions that are correct
```
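Concretely, with the pass/fail model and test split from the Code Example above:

```python
y_pred = model.predict(X_test)
print(accuracy_score(y_test, y_pred))  # e.g. 1.0 if every test sample is correct
```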
### Confusion Matrix

| | Predicted 0 | Predicted 1 |
|---|---|---|
| **Actual 0** | TN | FP |
| **Actual 1** | FN | TP |
- TN: Correctly predicted negative
- TP: Correctly predicted positive
- FP: False alarm (predicted positive, was negative)
- FN: Missed it (predicted negative, was positive)
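scikit-learn computes this matrix for you. A short sketch, reusing `y_test` and `y_pred` from the accuracy snippet above:

```python
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)
print(cm)
# [[TN  FP]
#  [FN  TP]]  -- rows are actual classes, columns are predicted classes
```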
### Precision & Recall

```
precision = TP / (TP + FP)  # Of predicted positives, how many were correct?
recall    = TP / (TP + FN)  # Of actual positives, how many did we find?
```
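Both are available as ready-made functions; again reusing `y_test` and `y_pred` from above:

```python
from sklearn.metrics import precision_score, recall_score

print(f"Precision: {precision_score(y_test, y_pred):.2f}")
print(f"Recall:    {recall_score(y_test, y_pred):.2f}")
```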
## Choosing the Threshold

The default threshold is 0.5, but you can change it:

```python
# Lower threshold = more positive predictions
threshold = 0.3
predictions = (model.predict_proba(X)[:, 1] >= threshold).astype(int)
```
- **Lower** the threshold when missing a positive is costly (e.g., disease detection).
- **Raise** the threshold when false alarms are costly (e.g., spam filtering).
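A quick way to see the trade-off is to sweep the threshold and watch precision and recall move in opposite directions. A sketch, once more reusing the binary model and test split from the Code Example:

```python
from sklearn.metrics import precision_score, recall_score

proba_pos = model.predict_proba(X_test)[:, 1]  # probability of the positive class
for threshold in (0.3, 0.5, 0.7):
    preds = (proba_pos >= threshold).astype(int)
    p = precision_score(y_test, preds, zero_division=0)
    r = recall_score(y_test, preds, zero_division=0)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```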
## Logistic vs Linear Regression
| Aspect | Linear Regression | Logistic Regression |
|---|---|---|
| Output | Any number | Probability (0-1) |
| Use case | Predict values | Predict categories |
| Function | Straight line | S-curve (sigmoid) |
| Example | House price | Spam/Not spam |
## Pros and Cons

**Pros:**
- Simple and fast
- Gives probabilities, not just predictions
- Interpretable coefficients
- Works well for linearly separable data
**Cons:**
- Assumes linear decision boundary
- Can't capture complex patterns
- Struggles with highly correlated features
## Key Takeaway
Logistic Regression is the go-to algorithm for classification. It's simple, interpretable, and often works surprisingly well. Even when you use fancier models, Logistic Regression is a great baseline to compare against.