
Features and Labels: The Building Blocks of ML

Understand features and labels - the fundamental concepts you need before building any ML model.

Sarah Chen
December 19, 2025


Almost every ML problem boils down to: given these features, predict this label.

What Are Features?

Features are the input variables—the information you give the model to make predictions.

Example: Predicting house prices

Features might be:

  • Square footage
  • Number of bedrooms
  • Location
  • Year built
  • Has garage?

Each feature is a piece of information that might help predict the price.
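
To make this concrete, here's one house written out in Python: the features as a dictionary and the label as the price we want to predict (all values are made up for illustration).

house = {                      # Features for one sample (one house)
    "sqft": 1850,
    "bedrooms": 3,
    "location": "Austin, TX",
    "year_built": 1998,
    "has_garage": True,
}
price = 340_000                # Label: the value we want the model to predict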

What Are Labels?

Labels are what you're trying to predict—the output.

Problem                  Label
──────────────────────   ──────────────────
House price prediction   Price ($)
Email spam detection     Spam or Not Spam
Disease diagnosis        Disease type
Customer churn           Will leave? Yes/No

Features vs Labels

Features (X)              Label (y)
─────────────            ─────────
[sqft, beds, location] → [price]
[email_text, sender]   → [spam/not_spam]
[age, symptoms, tests] → [diagnosis]

In code:

# data is a pandas DataFrame; model is any scikit-learn estimator
X = data[['sqft', 'bedrooms', 'location']]  # Features: the inputs
y = data['price']                           # Label: the output to predict

model.fit(X, y)  # Learn: features → label
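
If you want a runnable version, here's a minimal sketch using pandas and scikit-learn with a tiny made-up dataset (a text column like location would need to be converted to numbers first, so it's left out here; more on feature types below):

import pandas as pd
from sklearn.linear_model import LinearRegression

# Tiny made-up dataset: each row is one house (one sample)
data = pd.DataFrame({
    "sqft":     [1400, 1850, 2100, 2600],
    "bedrooms": [2, 3, 3, 4],
    "price":    [250_000, 340_000, 395_000, 480_000],
})

X = data[["sqft", "bedrooms"]]  # Feature matrix
y = data["price"]               # Target vector (labels)

model = LinearRegression()
model.fit(X, y)                 # Learn: features → label

# Predict the price of a new, unseen house
new_house = pd.DataFrame({"sqft": [2000], "bedrooms": [3]})
print(model.predict(new_house))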

Good Features Matter More Than Fancy Algorithms

A simple model with great features usually beats a complex model with poor features.

Feature Engineering = Creating good features from raw data

Example - Predicting flight delays (sketched in code below):

  • Raw data: departure_time = "2025-03-15 14:30:00"
  • Better features:
    • hour_of_day = 14
    • day_of_week = Saturday
    • is_holiday = False
    • month = March
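
Here's a minimal sketch of that transformation with pandas (the holiday set is a hypothetical stand-in for a real holiday calendar):

import pandas as pd

departure_time = pd.Timestamp("2025-03-15 14:30:00")

# Hypothetical holiday list; in practice use a real calendar for your region
holidays = {pd.Timestamp("2025-01-01").date(), pd.Timestamp("2025-07-04").date()}

features = {
    "hour_of_day": departure_time.hour,               # 14
    "day_of_week": departure_time.day_name(),         # "Saturday"
    "is_holiday": departure_time.date() in holidays,  # False
    "month": departure_time.month_name(),             # "March"
}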

Types of Features

Numerical

Numbers that have mathematical meaning.

age = 25
temperature = 72.5
income = 50000

Categorical

Categories or groups.

color = "red"
country = "USA"
size = "medium"

Binary

Yes/No, True/False.

is_member = True
has_insurance = False
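
Most models ultimately work on numbers, so categorical features are usually converted before training. A minimal sketch with pandas and one-hot encoding (made-up data):

import pandas as pd

df = pd.DataFrame({
    "age":       [25, 34, 29],            # numerical
    "color":     ["red", "blue", "red"],  # categorical
    "is_member": [True, False, True],     # binary
})

# One-hot encode the categorical column: one 0/1 column per category
encoded = pd.get_dummies(df, columns=["color"])
print(encoded.columns.tolist())  # ['age', 'is_member', 'color_blue', 'color_red']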

What Makes a Good Feature?

1. Predictive Power

Does it actually help predict the label?

  • Height probably helps predict basketball skill
  • Shoe size probably doesn't

2. Available at Prediction Time

You need the feature when making predictions!

  • Predicting "will customer buy?"
  • Can't use "did customer buy" as a feature 😅

3. Not Too Many Missing Values

Features with 50% missing data cause problems.

4. Not Redundant

Don't include both "age" and "birth_year"—same information.
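
A quick sketch of how you might check points 3 and 4 with pandas (the columns are hypothetical; age and birth_year are deliberately redundant):

import pandas as pd

df = pd.DataFrame({
    "age":        [25, 34, 29, 41],
    "birth_year": [2000, 1991, 1996, 1984],
    "income":     [50_000, None, 72_000, None],
})

# 3. How much of each feature is missing?
missing = df.isna().mean()
print(missing[missing >= 0.5])   # income is 50% missing: a red flag

# 4. Do any numeric features carry the same information?
print(df.corr())                 # age vs birth_year has correlation -1.0: keep only one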

Common Mistakes

Mistake 1: Data Leakage

# Predicting if patient has diabetes
# BAD: insulin_dosage as feature (reveals the answer!)
# GOOD: age, weight, family_history

Mistake 2: Using Future Information

# Predicting tomorrow's stock price
# BAD: tomorrow's trading volume (you don't have it yet!)
# GOOD: historical prices, today's volume
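
The fix for mistake 1 is simply to drop the leaky column (e.g. patients.drop(columns=["insulin_dosage"])). For mistake 2, a common safeguard is splitting time-ordered data chronologically instead of randomly, so the model never trains on information from after the prediction date. A minimal sketch with made-up prices:

import pandas as pd

# Hypothetical daily stock data
prices = pd.DataFrame({
    "date":  pd.to_datetime(["2024-12-29", "2024-12-30", "2024-12-31", "2025-01-02"]),
    "close": [101.2, 102.5, 103.1, 104.0],
})

# Everything before the cutoff is training data, everything after is test data
cutoff = pd.Timestamp("2025-01-01")
train = prices[prices["date"] < cutoff]
test  = prices[prices["date"] >= cutoff]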

Quick Vocab

  • Feature Matrix (X): All features for all samples
  • Target Vector (y): All labels
  • Feature Engineering: Creating new features
  • Feature Selection: Choosing which features to use
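
In code, these show up as shapes: the feature matrix X is 2-D (one row per sample, one column per feature) and the target vector y is 1-D (one label per sample). A tiny self-contained example:

import pandas as pd

X = pd.DataFrame({"sqft": [1400, 1850, 2100], "bedrooms": [2, 3, 3]})  # Feature matrix
y = pd.Series([250_000, 340_000, 395_000], name="price")               # Target vector

print(X.shape)  # (3, 2) -> (n_samples, n_features)
print(y.shape)  # (3,)   -> (n_samples,)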

Summary

Term      What It Is          Example
───────   ─────────────────   ───────────────────────
Feature   Input variable      Age, income, location
Label     Output to predict   Price, category
Sample    One data point      One house, one customer

Remember: Garbage features in = Garbage predictions out

Spend time on your features. They're often more important than which algorithm you choose.

#Machine Learning  #Features  #Labels  #Beginner