
Your First ML Model: Linear Regression Explained

Learn Linear Regression from scratch - the simplest and most important ML algorithm to understand.

Sarah Chen
December 19, 2025

Linear Regression: Your First ML Model

Linear Regression is the "Hello World" of machine learning: simple, powerful, and a great way to learn the core concepts.

What Is Linear Regression?

Finding the best straight line through your data points.

Price
  │    
  │         ∙  ∙
  │      ∙ ───────  ← Best fit line
  │   ∙  ∙
  │ ∙
  └────────────── Size

The line lets you predict: "If a house is X sqft, its price is probably Y."

The Math (Keep It Simple)

The equation of a line:

y = mx + b

In ML terms:

prediction = (weight × feature) + bias
  • weight (m): How much the feature affects the prediction
  • bias (b): The baseline value
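Plugging in some made-up numbers makes the formula concrete (the weight and bias here are invented purely for illustration):

```python
# Toy single-feature example: price per sqft as the weight,
# a base price as the bias (both values are hypothetical)
weight = 150.0   # each extra sqft adds ~$150
bias = 20000.0   # baseline price

sqft = 1200
prediction = weight * sqft + bias  # y = mx + b
print(prediction)  # 200000.0
```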

Multiple Features

With more features, we just add more weights:

price = (w1 × sqft) + (w2 × bedrooms) + (w3 × age) + bias

The model learns the best weights automatically.
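With multiple features, the formula above is just a dot product plus the bias. A sketch with made-up weights:

```python
import numpy as np

# Hypothetical weights for (sqft, bedrooms, age) and one house's features
weights = np.array([150.0, 10000.0, -500.0])  # note: age lowers the price
features = np.array([1200, 3, 20])
bias = 20000.0

# price = (w1 × sqft) + (w2 × bedrooms) + (w3 × age) + bias
price = np.dot(weights, features) + bias
print(price)  # 220000.0
```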

How Does It Learn?

Step 1: Start with random weights

weights = [0.5, 0.3, -0.1]  # Random guess

Step 2: Make predictions

predicted_price = weights[0]*sqft + weights[1]*beds + weights[2]*age + bias

Step 3: Calculate error

error = actual_price - predicted_price

Step 4: Adjust weights to reduce error

This is called gradient descent (more on this later).

Step 5: Repeat until error is small
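The five steps can be sketched as a tiny gradient-descent loop (toy data; the learning rate and iteration count are arbitrary choices):

```python
import numpy as np

# Toy data generated from the line y = 2x + 1
X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

w, b = 0.5, 0.0   # Step 1: start with a guess
lr = 0.05         # learning rate: how big each adjustment is

for _ in range(2000):                  # Step 5: repeat
    pred = w * X + b                   # Step 2: make predictions
    error = y - pred                   # Step 3: actual - predicted
    w += lr * 2 * np.mean(error * X)   # Step 4: nudge weight...
    b += lr * 2 * np.mean(error)       # ...and bias to reduce error

print(round(w, 2), round(b, 2))  # 2.0 1.0
```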

Code Example

from sklearn.linear_model import LinearRegression
import numpy as np

# Training data
X = np.array([[1400, 3], [1600, 3], [1700, 2], [1875, 3], [1100, 2]])  # sqft, beds
y = np.array([245000, 312000, 279000, 308000, 199000])  # prices

# Create and train model
model = LinearRegression()
model.fit(X, y)

# See what it learned
print(f"Weights: {model.coef_}")      # How much each feature matters
print(f"Bias: {model.intercept_}")    # Baseline

# Make prediction
new_house = [[1500, 3]]  # 1500 sqft, 3 beds
predicted = model.predict(new_house)
print(f"Predicted price: ${predicted[0]:,.0f}")

Output:

Weights: [129.62, 20169.5]    # per-sqft and per-bedroom effects
Bias: 17197.17
Predicted price: $272,131

Interpreting the Weights

If weight for sqft = 129.62:
"Each additional square foot adds ~$130 to the price"

If weight for bedrooms = 20169.5:
"Each additional bedroom adds ~$20,170 to the price"

This interpretability is Linear Regression's superpower!
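One way to put that superpower to work is to pair each feature name with its learned weight when reporting results (the weights below are hypothetical values matching the scale of the example above):

```python
# Pairing feature names with weights makes a model self-explaining
feature_names = ["sqft", "bedrooms"]
weights = [129.62, 20169.5]  # hypothetical learned weights

for name, w in zip(feature_names, weights):
    print(f"{name}: each additional unit adds ~${w:,.0f} to the price")
```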

Measuring Performance

Mean Squared Error (MSE)

errors = predictions - actual_values
mse = mean(errors ** 2)

Lower is better. Penalizes big errors heavily.
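A minimal sketch of that calculation with toy numbers:

```python
import numpy as np

# Two predictions, off by $5k and $10k respectively
predictions = np.array([250000.0, 300000.0])
actual_values = np.array([245000.0, 310000.0])

errors = predictions - actual_values   # [5000, -10000]
mse = np.mean(errors ** 2)             # squaring punishes the $10k miss 4x harder
print(mse)  # 62500000.0
```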

R² Score

from sklearn.metrics import r2_score
score = r2_score(y_true, y_pred)
  • R² = 1.0: Perfect predictions
  • R² = 0.0: No better than guessing the average
  • R² < 0: Worse than guessing!
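A quick sanity check of the first two reference points, using toy data:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])

# Perfect predictions score exactly 1.0
print(r2_score(y_true, y_true))  # 1.0

# Always guessing the average scores exactly 0.0
mean_guess = np.full_like(y_true, y_true.mean())
print(r2_score(y_true, mean_guess))  # 0.0
```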

When Linear Regression Works Well

✅ Relationship is actually linear
✅ Features are independent (not too correlated)
✅ No extreme outliers
✅ You need interpretable results

When It Fails

❌ Non-linear relationships (use polynomial or other models)
❌ Classification problems (use Logistic Regression instead)
❌ Complex patterns (use decision trees, neural networks)
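For the non-linear case, one common workaround is to add polynomial features, so the same linear model can fit a curve. A sketch with toy quadratic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy non-linear data: y = x² (a straight line can't fit this well)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.0, 4.0, 9.0, 16.0, 25.0])

# PolynomialFeatures adds x² as an extra column; the model stays linear
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[6.0]]))  # ≈ [36.]
```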

Assumptions to Know

  1. Linearity: Relationship is roughly linear
  2. Independence: Data points are independent
  3. Homoscedasticity: Error spread is constant
  4. Normality: Errors are normally distributed (for inference)

Don't worry too much about these for predictions. They matter more for statistical analysis.

Quick Summary

Aspect     Linear Regression
-------    -------------------------
Type       Supervised, Regression
Output     Continuous number
Formula    y = w₁x₁ + w₂x₂ + ... + b
Learns     Optimal weights
Pro        Simple, interpretable
Con        Only linear relationships

Linear Regression is your foundation. Even if you use complex models later, understanding this helps you understand them all.

#Machine Learning#Linear Regression#Beginner