
Features and Labels: The Building Blocks of ML

Understand features and labels - the fundamental concepts you need before building any ML model.

Sarah Chen
December 19, 2025

Features and Labels: ML Building Blocks

Every supervised ML problem boils down to this: given these **features**, predict this **label**.

What Are Features?

Features are the input variables—the information you give the model to make predictions.

**Example: Predicting house prices**

Features might be:
- Square footage
- Number of bedrooms
- Location
- Year built
- Has garage?

Each feature is a piece of information that might help predict the price.

What Are Labels?

Labels are what you're trying to predict—the output.

| Problem | Label |
|---------|-------|
| House price prediction | Price ($) |
| Email spam detection | Spam or Not Spam |
| Disease diagnosis | Disease type |
| Customer churn | Will leave? Yes/No |

Features vs Labels

```
Features (X)                Label (y)
─────────────               ─────────
[sqft, beds, location]   →  [price]
[email_text, sender]     →  [spam/not_spam]
[age, symptoms, tests]   →  [diagnosis]
```

In code:

```python
# `data` is a pandas DataFrame; `model` is any scikit-learn-style estimator
X = data[['sqft', 'bedrooms', 'location']]  # Features
y = data['price']                           # Label

model.fit(X, y)  # Learn: features → label
```
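The snippet above assumes `data` and `model` already exist. A self-contained version might look like the sketch below, assuming pandas and scikit-learn are installed. The five houses are made-up toy values, and `LinearRegression` is just one possible choice of model. Because a linear model needs numeric input, the text column `location` is one-hot encoded with `pd.get_dummies` before fitting.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy data, made up for illustration only (not real prices).
data = pd.DataFrame({
    "sqft":     [1400, 2000, 850, 1750, 2400],
    "bedrooms": [3, 4, 2, 3, 4],
    "location": ["suburb", "city", "rural", "suburb", "city"],
    "price":    [250_000, 410_000, 120_000, 300_000, 480_000],
})

# Features: one-hot encode "location" so the model only sees numbers.
X = pd.get_dummies(data[["sqft", "bedrooms", "location"]],
                   columns=["location"], dtype=int)
y = data["price"]  # Label

model = LinearRegression()
model.fit(X, y)  # Learn: features -> label

# Predict for a new house; reindex so its columns match the training columns.
new_house = pd.DataFrame({"sqft": [1600], "bedrooms": [3], "location": ["suburb"]})
new_X = pd.get_dummies(new_house, columns=["location"], dtype=int)
new_X = new_X.reindex(columns=X.columns, fill_value=0)
print(model.predict(new_X))
```

Reindexing the new house to `X.columns` keeps the feature layout identical between training and prediction, which matters once categorical columns are expanded into several 0/1 columns.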

Good Features Matter More Than Fancy Algorithms

A simple model with great features often beats a complex model with poor features.

**Feature Engineering** = Creating good features from raw data

**Example: Predicting flight delays**

- Raw data: `departure_time = "2025-03-15 14:30:00"`
- Better features:
  - `hour_of_day = 14`
  - `day_of_week = Saturday`
  - `is_holiday = False`
  - `month = March`
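Deriving those features from the raw timestamp takes only the standard library. This is a sketch: the `HOLIDAYS` set and the `departure_features` helper are hypothetical stand-ins, and a real pipeline would pull holidays from a proper calendar source.

```python
from datetime import date, datetime

# Hypothetical holiday calendar, for illustration only.
HOLIDAYS = {date(2025, 1, 1), date(2025, 7, 4), date(2025, 12, 25)}


def departure_features(departure_time: str) -> dict:
    """Turn a raw timestamp string into model-friendly features."""
    dt = datetime.strptime(departure_time, "%Y-%m-%d %H:%M:%S")
    return {
        "hour_of_day": dt.hour,
        "day_of_week": dt.strftime("%A"),    # weekday name, e.g. "Saturday"
        "is_holiday": dt.date() in HOLIDAYS,
        "month": dt.strftime("%B"),          # month name, e.g. "March"
    }


print(departure_features("2025-03-15 14:30:00"))
# → {'hour_of_day': 14, 'day_of_week': 'Saturday', 'is_holiday': False, 'month': 'March'}
```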

Types of Features

### Numerical
Numbers that have mathematical meaning.

```python
age = 25
temperature = 72.5
income = 50000
```

### Categorical
Categories or groups.

```python
color = "red"
country = "USA"
size = "medium"
```

### Binary
Yes/No, True/False.

```python
is_member = True
has_insurance = False
```
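Many models accept only numbers, so categorical and binary features are usually converted before training. A minimal sketch with pandas, using made-up rows; one-hot encoding is one common option among several (ordinal and target encoding are others).

```python
import pandas as pd

df = pd.DataFrame({
    "age": [25, 40, 31],               # numerical: use as-is
    "color": ["red", "blue", "red"],   # categorical: needs encoding
    "is_member": [True, False, True],  # binary: map to 1/0
})

# One-hot encode the categorical column: one 0/1 column per category.
encoded = pd.get_dummies(df, columns=["color"], dtype=int)
# A binary feature becomes a single 0/1 column.
encoded["is_member"] = encoded["is_member"].astype(int)

print(encoded)
```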

What Makes a Good Feature?

### 1. Predictive Power
Does it actually help predict the label?
- Height probably helps predict basketball skill
- Shoe size probably doesn't

### 2. Available at Prediction Time
You need the feature when making predictions!
- Predicting "will customer buy?"
- Can't use "did customer buy" as a feature 😅

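### 3. Not Too Many Missing Values
A feature that is missing for a large share of rows (say, half of them) gives the model little to learn from and forces you to either fill in (impute) the gaps or drop the feature.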
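Measuring how much of each column is missing is a one-liner in pandas. The sketch below uses toy data, and the 50% cutoff (`MAX_MISSING`) is a hypothetical choice; imputing values instead of dropping the column is often a reasonable alternative.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration.
df = pd.DataFrame({
    "age":    [34, 51, np.nan, 28],
    "income": [np.nan, np.nan, 72000, np.nan],
    "city":   ["Austin", "Boston", "Denver", None],
})

# Fraction of missing values in each column (isna() is True for NaN and None).
missing = df.isna().mean()
print(missing)

# Hypothetical cutoff: keep features missing in at most half of the rows.
MAX_MISSING = 0.5
kept = df.loc[:, missing <= MAX_MISSING]
print(list(kept.columns))
```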

### 4. Not Redundant
Don't include both `age` and `birth_year`; they carry the same information.
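One rough way to spot redundant features is to look for pairs with near-perfect correlation. In this sketch `age` is computed from `birth_year`, so the pair correlates at -1; the 0.95 `THRESHOLD` and the `income` values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"birth_year": [1990, 1985, 2000, 1972]})
df["age"] = 2025 - df["birth_year"]          # derived: same information
df["income"] = [48000, 61000, 39000, 55000]  # made-up values

corr = df.corr()

# Flag feature pairs whose correlation is (almost) perfect in either direction.
THRESHOLD = 0.95
cols = list(corr.columns)
redundant = [
    (a, b)
    for i, a in enumerate(cols)
    for b in cols[i + 1:]
    if abs(corr.loc[a, b]) > THRESHOLD
]
print(redundant)
```

Correlation only catches linear relationships, so a low score does not prove two features are independent.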

Common Mistakes

### Mistake 1: Data Leakage

```python
# Predicting if patient has diabetes
# BAD: insulin_dosage as feature (reveals the answer!)
# GOOD: age, weight, family_history
```
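Leakage is hard to detect automatically, but one cheap red flag is a single feature that tracks the label almost perfectly. The sketch below uses synthetic data in which `insulin_dosage` is nonzero only for diabetic patients. The 0.9 cutoff is an arbitrary assumption, and a high score is a prompt to investigate, not proof of leakage.

```python
import pandas as pd

# Synthetic toy data: insulin_dosage is only recorded for diagnosed
# patients, so it is effectively a copy of the label.
df = pd.DataFrame({
    "age":            [45, 60, 33, 52, 38, 29],
    "weight":         [82, 95, 70, 88, 77, 65],
    "insulin_dosage": [0, 18, 0, 20, 22, 0],
    "has_diabetes":   [0, 1, 0, 1, 1, 0],
})

features = df.drop(columns="has_diabetes")
label = df["has_diabetes"]

# Absolute Pearson correlation of each feature with the label.
corr_with_label = features.corrwith(label).abs()
suspicious = corr_with_label[corr_with_label > 0.9].index.tolist()
print(suspicious)
```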

### Mistake 2: Using Future Information

```python
# Predicting tomorrow's stock price
# BAD: tomorrow's trading volume (you don't have it yet!)
# GOOD: historical prices, today's volume
```
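For time-ordered data, pandas `shift` makes the time boundary explicit: features come from day *t* and earlier, and the label is pulled back from day *t+1*. A sketch with made-up prices, assuming rows are sorted by date; the column names are illustrative.

```python
import pandas as pd

# Made-up daily prices and volumes, sorted by date.
df = pd.DataFrame({
    "date":   pd.date_range("2025-03-10", periods=5, freq="D"),
    "close":  [101.0, 102.5, 101.8, 103.2, 104.0],
    "volume": [1_200, 1_450, 1_100, 1_600, 1_300],
})

# Feature from the past: yesterday's close moves down one row.
df["close_yesterday"] = df["close"].shift(1)

# Label from the future: tomorrow's close moves up one row.
df["close_tomorrow"] = df["close"].shift(-1)

# The first row has no "yesterday" and the last row has no "tomorrow".
train = df.dropna(subset=["close_yesterday", "close_tomorrow"])
X = train[["close", "volume", "close_yesterday"]]  # known at end of day t
y = train["close_tomorrow"]                        # what we predict
print(X.join(y))
```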

Quick Vocab

- **Feature Matrix (X)**: All features for all samples
- **Target Vector (y)**: All labels
- **Feature Engineering**: Creating new features
- **Feature Selection**: Choosing which features to use
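In code, these terms map onto array shapes. Following the scikit-learn convention, `X` is 2-D with one row per sample and one column per feature, and `y` is 1-D with one label per sample. The numbers below are made up.

```python
import numpy as np

# 4 samples (houses), 3 features each: sqft, bedrooms, year_built.
X = np.array([
    [1400, 3, 1995],
    [2000, 4, 2010],
    [850, 2, 1978],
    [1750, 3, 2004],
])
# One label (price) per sample.
y = np.array([250_000, 410_000, 120_000, 300_000])

print(X.shape)  # → (4, 3): (n_samples, n_features)
print(y.shape)  # → (4,): one label per sample
assert X.shape[0] == y.shape[0]  # every sample needs exactly one label
```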

Summary

| Term | What It Is | Example |
|------|-----------|---------|
| Feature | Input variable | Age, income, location |
| Label | Output to predict | Price, category |
| Sample | One data point | One house, one customer |

Remember: **Garbage features in = Garbage predictions out**

Spend time on your features. They're often more important than which algorithm you choose.
