
Classification Basics: Logistic Regression

Learn Logistic Regression - the fundamental algorithm for classification problems in machine learning.

Sarah Chen
December 19, 2025

Logistic Regression: Classification Basics

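Despite the confusing name, Logistic Regression is used for classification, not for predicting continuous values. It estimates the probability of each category; the "regression" part refers to the linear model it fits underneath.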

The Problem

Linear Regression predicts numbers. But what if you need:

  • Spam or Not Spam?
  • Will customer churn? Yes/No
  • Is transaction fraud? Yes/No

You need probabilities and categories, not raw numbers.

The Solution: Sigmoid Function

Logistic Regression uses the sigmoid function to squash any real number into the range 0 to 1:

         1
σ(x) = ─────────
       1 + e^(-x)
Output
  1 │          ════════
    │        ╱
0.5 │──────╋──────────
    │    ╱
  0 │════
    └──────────────── x

Now we can interpret the output as a probability!
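To get a feel for the curve, here is a minimal sketch of the sigmoid in plain Python. The two-branch form is an implementation choice made here (not something the formula requires) so that very negative inputs do not overflow math.exp.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid: 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x).
    e = math.exp(x)
    return e / (1.0 + e)

# Large negative inputs approach 0, large positive inputs approach 1,
# and an input of exactly 0 gives 0.5.
for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"sigmoid({x:+.1f}) = {sigmoid(x):.4f}")
```

Note the symmetry: sigmoid(-x) equals 1 - sigmoid(x), which is why the curve crosses 0.5 at x = 0.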

How It Works

Step 1: Linear combination (like linear regression)

z = w1*x1 + w2*x2 + ... + bias

Step 2: Sigmoid to get probability

probability = sigmoid(z) = 1 / (1 + exp(-z))

Step 3: Threshold for final prediction

prediction = 1 if probability >= 0.5 else 0
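The three steps above can be sketched end to end in plain Python. The weights and bias below are hypothetical, hand-picked values for illustration only; a real model learns them from training data.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def predict_one(features, weights, bias, threshold=0.5):
    # Step 1: linear combination of the features
    z = sum(w * x for w, x in zip(weights, features)) + bias
    # Step 2: sigmoid turns the score into a probability
    probability = sigmoid(z)
    # Step 3: threshold turns the probability into a class label
    prediction = 1 if probability >= threshold else 0
    return z, probability, prediction

# Hypothetical parameters (NOT learned from data)
weights = [0.8, 0.05]   # one weight per feature
bias = -6.0

z, probability, prediction = predict_one([5, 62], weights, bias)
print(f"z = {z:.2f}, probability = {probability:.4f}, prediction = {prediction}")
```

With these made-up numbers the score z comes out at about 1.1, which the sigmoid maps to a probability of roughly 0.75, so the prediction is class 1.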

Code Example

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import numpy as np

# Sample data: study hours, previous score -> pass/fail
X = np.array([[2, 50], [3, 55], [5, 65], [6, 70], [8, 85], [10, 90],
              [1, 45], [2, 48], [4, 60], [7, 78]])
y = np.array([0, 0, 1, 1, 1, 1, 0, 0, 1, 1])  # 0=fail, 1=pass

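# Split data (random_state fixes the shuffle so the split is reproducible;
# a different seed gives a different split and slightly different probabilities)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)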

# Train model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict
new_student = [[5, 62]]  # 5 hours study, 62 previous score
probability = model.predict_proba(new_student)
prediction = model.predict(new_student)

print(f"Probability of passing: {probability[0][1]:.2%}")
print(f"Prediction: {'Pass' if prediction[0] == 1 else 'Fail'}")

Example output (the exact probability depends on how the data was split):

Probability of passing: 73.45%
Prediction: Pass

Understanding the Output

model.predict_proba([[5, 62]])
# Returns: [[0.2655, 0.7345]]
#           ↑ Prob(Fail)  ↑ Prob(Pass)

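The probabilities in each row always sum to 1, and the columns follow the order of model.classes_ (here 0 = fail first, then 1 = pass).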

Multiclass Classification

What if you have more than 2 classes? (Cat, Dog, Bird)

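# With the default solver, scikit-learn fits a multinomial (softmax) model
# automatically when y has more than two classes. The older
# multi_class='multinomial' argument is deprecated as of scikit-learn 1.5.
model = LogisticRegression()
model.fit(X, y)  # y can be [0, 1, 2] for three classes

# Predictions: pass a 2D array with one row per sample
model.predict_proba([features])  # features: one sample's feature values
# Returns something like: [[0.15, 0.75, 0.10]]  # One probability per class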
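Under the hood, the multinomial version replaces the sigmoid with the softmax function, which turns one score per class into probabilities that sum to 1. A minimal sketch in plain Python, using hypothetical per-class scores rather than output from a trained model:

```python
import math

def softmax(scores):
    """Convert a list of raw class scores into probabilities."""
    # Subtracting the max keeps exp() from overflowing; it does not
    # change the result.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class_names = ["Cat", "Dog", "Bird"]
scores = [0.5, 2.1, 0.1]          # hypothetical linear scores, one per class

probabilities = softmax(scores)
predicted = class_names[probabilities.index(max(probabilities))]

for name, p in zip(class_names, probabilities):
    print(f"{name}: {p:.2%}")
print(f"Predicted class: {predicted}")
```

The predicted class is simply the one with the highest probability, here the class with the largest score.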

Evaluation Metrics

Accuracy

from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_true, y_pred)  # fraction of correct predictions (0.0 to 1.0)

Confusion Matrix

                 Predicted
                 0    1
Actual    0    [TN,  FP]
          1    [FN,  TP]
  • TN (true negative): Correctly predicted negative
  • TP (true positive): Correctly predicted positive
  • FP (false positive): False alarm (predicted positive, was negative)
  • FN (false negative): Missed it (predicted negative, was positive)

Precision & Recall

precision = TP / (TP + FP)  # Of predicted positives, how many correct?
recall = TP / (TP + FN)     # Of actual positives, how many found?
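To see how these metrics relate, here is a small sketch that counts the four confusion-matrix cells by hand and derives accuracy, precision, and recall from them. The labels are made up for illustration.

```python
# Made-up labels for ten samples (1 = positive, 0 = negative)
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly predicted positive
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly predicted negative
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed positive

accuracy = (tp + tn) / len(y_true)
# Guard against division by zero when nothing is predicted/actually positive
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Here accuracy is 0.70, precision is 0.75, and recall is 0.60: the three numbers answer different questions, so they can disagree on the same predictions.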

Choosing the Threshold

For binary problems, model.predict() uses a threshold of 0.5. To use a different one, apply it to the predicted probabilities yourself:

# Lower threshold = more positive predictions
threshold = 0.3
predictions = (model.predict_proba(X)[:, 1] >= threshold).astype(int)

When to lower the threshold: when missing a positive is costly (disease detection).
When to raise the threshold: when false alarms are costly (a spam filter that must not flag legitimate mail).
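The trade-off can be seen on a toy example. The sketch below applies three thresholds to the same made-up probabilities and reports precision and recall for each; the helper name precision_recall is introduced here for illustration.

```python
def precision_recall(y_true, probabilities, threshold):
    """Threshold the probabilities, then compute precision and recall."""
    y_pred = [1 if p >= threshold else 0 for p in probabilities]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Made-up true labels and predicted probabilities of the positive class
y_true        = [0,    0,    1,    0,    1,    0,    1,    1]
probabilities = [0.05, 0.20, 0.35, 0.40, 0.55, 0.65, 0.80, 0.95]

results = {}
for threshold in (0.3, 0.5, 0.7):
    results[threshold] = precision_recall(y_true, probabilities, threshold)
    precision, recall = results[threshold]
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

On this data, lowering the threshold to 0.3 finds every positive (recall 1.00) at the cost of more false alarms, while raising it to 0.7 removes the false alarms (precision 1.00) but misses half the positives.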

Logistic vs Linear Regression

Aspect      Linear Regression    Logistic Regression
Output      Any number           Probability (0-1)
Use case    Predict values       Predict categories
Function    Straight line        S-curve (sigmoid)
Example     House price          Spam/Not spam

Pros and Cons

Pros:

  • Simple and fast
  • Gives probabilities, not just predictions
  • Interpretable coefficients
  • Works well when the classes are roughly linearly separable

Cons:

  • Assumes linear decision boundary
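  • Can't capture non-linear patterns unless you engineer features for them (e.g. polynomial or interaction terms)
  • Coefficients become unstable and harder to interpret when features are highly correlated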
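The "interpretable coefficients" point in the list above can be made concrete: raising a feature by one unit multiplies the odds p / (1 - p) by exp(weight). The sketch below checks this numerically, using hypothetical weights and a hypothetical starting score rather than values from a trained model.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def odds(p: float) -> float:
    """Odds of the positive class: p / (1 - p)."""
    return p / (1.0 - p)

# Hypothetical coefficients (NOT learned from data)
weights = {"study_hours": 0.8, "previous_score": 0.05}

z_before = 1.1                       # hypothetical score for some student
for name, w in weights.items():
    z_after = z_before + w           # the feature goes up by one unit
    measured = odds(sigmoid(z_after)) / odds(sigmoid(z_before))
    print(f"{name}: exp(w) = {math.exp(w):.3f}, measured odds ratio = {measured:.3f}")
```

Both columns agree, so a weight of 0.8 reads as "one extra unit multiplies the odds of the positive class by about 2.2". This reading holds the other features fixed, which is why highly correlated features make it less trustworthy.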

Key Takeaway

Logistic Regression is a go-to first algorithm for classification. It's simple, interpretable, and often works surprisingly well. Even when you use fancier models, Logistic Regression is a great baseline to compare against.
