
Your First ML Model: Linear Regression Explained

Learn Linear Regression from scratch - the simplest and most important ML algorithm to understand.

Sarah Chen
December 19, 2025

Linear Regression: Your First ML Model

Linear Regression is the "Hello World" of machine learning: simple, powerful, and a great way to learn the core concepts.

What Is Linear Regression?

Finding the best straight line through your data points.

Price
  │    
  │         ∙  ∙
  │      ∙ ───────  ← Best fit line
  │   ∙  ∙
  │ ∙
  └────────────── Size

The line lets you predict: "If a house is X sqft, its price is probably Y."

The Math (Keep It Simple)

The equation of a line:

y = mx + b

In ML terms:

prediction = (weight × feature) + bias
  • weight (m): How much the feature affects the prediction
  • bias (b): The baseline value
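Plugging in some made-up numbers makes the formula concrete (the weight and bias here are invented purely for illustration):

```python
# Toy single-feature example: price per sqft as the weight,
# a base price as the bias (both values are hypothetical)
weight = 150.0   # each extra sqft adds ~$150
bias = 20000.0   # baseline price

sqft = 1200
prediction = weight * sqft + bias  # y = mx + b
print(prediction)  # 200000.0
```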

Multiple Features

With more features, we just add more weights:

price = (w1 × sqft) + (w2 × bedrooms) + (w3 × age) + bias

The model learns the best weights automatically.
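With multiple features, the formula above is just a dot product plus the bias. A sketch with made-up weights:

```python
import numpy as np

# Hypothetical weights for (sqft, bedrooms, age) and one house's features
weights = np.array([150.0, 10000.0, -500.0])  # note: age lowers the price
features = np.array([1200, 3, 20])
bias = 20000.0

# price = (w1 × sqft) + (w2 × bedrooms) + (w3 × age) + bias
price = np.dot(weights, features) + bias
print(price)  # 220000.0
```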

How Does It Learn?

Step 1: Start with random weights

weights = [0.5, 0.3, -0.1]  # Random guess

Step 2: Make predictions

predicted_price = weights[0]*sqft + weights[1]*beds + weights[2]*age + bias

Step 3: Calculate error

error = actual_price - predicted_price

Step 4: Adjust weights to reduce error

This is called gradient descent (more on this later).

Step 5: Repeat until error is small
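The five steps can be sketched as a tiny gradient-descent loop (toy data; the learning rate and iteration count are arbitrary choices):

```python
import numpy as np

# Toy data generated from the line y = 2x + 1
X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

w, b = 0.5, 0.0   # Step 1: start with a guess
lr = 0.05         # learning rate: how big each adjustment is

for _ in range(2000):                  # Step 5: repeat
    pred = w * X + b                   # Step 2: make predictions
    error = y - pred                   # Step 3: actual - predicted
    w += lr * 2 * np.mean(error * X)   # Step 4: nudge weight...
    b += lr * 2 * np.mean(error)       # ...and bias to reduce error

print(round(w, 2), round(b, 2))  # 2.0 1.0
```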

Code Example

from sklearn.linear_model import LinearRegression
import numpy as np

# Training data
X = np.array([[1400, 3], [1600, 3], [1700, 2], [1875, 3], [1100, 2]])  # sqft, beds
y = np.array([245000, 312000, 279000, 308000, 199000])  # prices

# Create and train model
model = LinearRegression()
model.fit(X, y)

# See what it learned
print(f"Weights: {model.coef_}")      # How much each feature matters
print(f"Bias: {model.intercept_}")    # Baseline

# Make prediction
new_house = [[1500, 3]]  # 1500 sqft, 3 beds
predicted = model.predict(new_house)
print(f"Predicted price: ${predicted[0]:,.0f}")

Output:

Weights: [129.62, 20169.5]    # per-sqft and per-bedroom effects
Bias: 17197.17
Predicted price: $272,131

Interpreting the Weights

If weight for sqft = 129.62:
"Each additional square foot adds ~$130 to the price"

If weight for bedrooms = 20169.5:
"Each additional bedroom adds ~$20,170 to the price"

This interpretability is Linear Regression's superpower!
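One way to put that superpower to work is to pair each feature name with its learned weight when reporting results (the weights below are hypothetical values matching the scale of the example above):

```python
# Pairing feature names with weights makes a model self-explaining
feature_names = ["sqft", "bedrooms"]
weights = [129.62, 20169.5]  # hypothetical learned weights

for name, w in zip(feature_names, weights):
    print(f"{name}: each additional unit adds ~${w:,.0f} to the price")
```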

Measuring Performance

Mean Squared Error (MSE)

errors = predictions - actual_values
mse = mean(errors ** 2)

Lower is better. Penalizes big errors heavily.
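A minimal sketch of that calculation with toy numbers:

```python
import numpy as np

# Two predictions, off by $5k and $10k respectively
predictions = np.array([250000.0, 300000.0])
actual_values = np.array([245000.0, 310000.0])

errors = predictions - actual_values   # [5000, -10000]
mse = np.mean(errors ** 2)             # squaring punishes the $10k miss 4x harder
print(mse)  # 62500000.0
```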

R² Score

from sklearn.metrics import r2_score
score = r2_score(y_true, y_pred)
  • R² = 1.0: Perfect predictions
  • R² = 0.0: No better than guessing the average
  • R² < 0: Worse than guessing!
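A quick sanity check of the first two reference points, using toy data:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])

# Perfect predictions score exactly 1.0
print(r2_score(y_true, y_true))  # 1.0

# Always guessing the average scores exactly 0.0
mean_guess = np.full_like(y_true, y_true.mean())
print(r2_score(y_true, mean_guess))  # 0.0
```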

When Linear Regression Works Well

✅ Relationship is actually linear
✅ Features are independent (not too correlated)
✅ No extreme outliers
✅ You need interpretable results

When It Fails

❌ Non-linear relationships (use polynomial or other models)
❌ Classification problems (use Logistic Regression instead)
❌ Complex patterns (use decision trees, neural networks)
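For the non-linear case, one common workaround is to add polynomial features, so the same linear model can fit a curve. A sketch with toy quadratic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Toy non-linear data: y = x² (a straight line can't fit this well)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([1.0, 4.0, 9.0, 16.0, 25.0])

# PolynomialFeatures adds x² as an extra column; the model stays linear
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)
print(model.predict([[6.0]]))  # ≈ [36.]
```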

Assumptions to Know

  1. Linearity: Relationship is roughly linear
  2. Independence: Data points are independent
  3. Homoscedasticity: Error spread is constant
  4. Normality: Errors are normally distributed (for inference)

Don't worry too much about these for predictions. They matter more for statistical analysis.

Quick Summary

Aspect     Linear Regression
-------    -------------------------
Type       Supervised, Regression
Output     Continuous number
Formula    y = w₁x₁ + w₂x₂ + ... + b
Learns     Optimal weights
Pro        Simple, interpretable
Con        Only linear relationships

Linear Regression is your foundation. Even if you use complex models later, understanding this helps you understand them all.

#Machine Learning#Linear Regression#Beginner