
Convolutional Neural Networks (CNNs) Explained

Learn how CNNs work for image recognition, the role of convolutions, pooling, and how to build your first CNN.

Sarah Chen
December 19, 2025


Regular neural networks don't understand spatial structure. Feed an image as a flat array and you lose the fact that nearby pixels are related. CNNs solve this.

The Problem with Regular Networks

A 224×224 color image has 224 × 224 × 3 = 150,528 input values. A single fully connected layer with 1,000 neurons needs over 150 million parameters, and it still ignores spatial patterns.
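The arithmetic above is easy to check directly. As a point of comparison, the sketch below also counts the parameters of a 3×3 convolutional layer with 32 filters over the same 3-channel image, matching the first Conv2D layer shown later:

```python
# Dense layer: every input connects to every neuron, plus one bias per neuron
inputs = 224 * 224 * 3                 # 150,528 input values
dense_params = inputs * 1000 + 1000    # weights + biases
print(dense_params)                    # 150529000 — about 150 million

# Conv2D with 32 filters of size 3x3 over a 3-channel image:
# each filter has 3*3*3 weights plus one bias, regardless of image size
conv_params = (3 * 3 * 3) * 32 + 32
print(conv_params)                     # 896
```

The convolutional layer's parameter count is independent of the image size, which is exactly why CNNs scale to large images.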

How Convolution Works

Instead of connecting every pixel to every neuron, slide a small filter across the image:

Image:          Filter (3x3):    Output (2x2):
1 2 3 4         1 0 1           15 20
2 3 4 5   *     0 1 0     =     20 25
3 4 5 6         1 0 1
4 5 6 7

The filter detects patterns (edges, textures, shapes). Multiple filters = multiple pattern detectors.
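To make the sliding-window mechanics concrete, here is a minimal NumPy sketch of the 3×3 filter moving over the 4×4 image above (strictly speaking this is cross-correlation, which is what CNN layers actually compute; for this symmetric filter the distinction doesn't matter):

```python
import numpy as np

image = np.array([[1, 2, 3, 4],
                  [2, 3, 4, 5],
                  [3, 4, 5, 6],
                  [4, 5, 6, 7]])
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

kh, kw = kernel.shape
out_h = image.shape[0] - kh + 1   # no padding, stride 1
out_w = image.shape[1] - kw + 1
output = np.zeros((out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        # multiply the 3x3 patch elementwise with the kernel and sum
        output[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)

print(output)
# [[15. 20.]
#  [20. 25.]]
```

A 4×4 input with a 3×3 filter, no padding, and stride 1 yields a 2×2 output, which is why CNNs often pad inputs to preserve spatial size.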

CNN Architecture

import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    # Convolution + Pooling Block 1
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    layers.MaxPooling2D((2, 2)),
    
    # Convolution + Pooling Block 2
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    # Convolution + Pooling Block 3
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    
    # Flatten and Dense layers
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)

Key Layers

Convolutional Layer:

  • Applies filters to detect features
  • Parameters: number of filters, filter size, stride, padding

Pooling Layer:

  • Reduces spatial dimensions
  • MaxPooling takes the maximum value in each region
  • Adds some robustness to small translations

Flatten:

  • Converts 2D feature maps to 1D vector
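The pooling and flatten steps above can be sketched in a few lines of NumPy, assuming a small input whose sides divide evenly into 2×2 blocks:

```python
import numpy as np

x = np.array([[1, 3, 2, 4],
              [5, 6, 1, 2],
              [7, 2, 9, 0],
              [3, 4, 1, 8]])

# 2x2 max pooling with stride 2: split into non-overlapping
# 2x2 blocks and keep the maximum of each block
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6 4]
#  [7 9]]

# Flatten: the same values as a 1D vector, ready for a dense layer
flat = pooled.reshape(-1)
print(flat)   # [6 4 7 9]
```

Note how pooling halves each spatial dimension while keeping the strongest activation in each region.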

What CNNs Learn

Each layer learns different features:

  • Early layers: edges, colors, simple textures
  • Middle layers: shapes, patterns, object parts
  • Deep layers: complex objects, faces, scenes

Practical Example: Image Classification

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Data augmentation
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True
)

train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(224, 224),
    batch_size=32,
    class_mode='sparse'   # integer labels, matching sparse_categorical_crossentropy
)

# Validation data: rescale only, no augmentation
# (path assumed to sit alongside data/train)
val_datagen = ImageDataGenerator(rescale=1./255)

val_generator = val_datagen.flow_from_directory(
    'data/val',
    target_size=(224, 224),
    batch_size=32,
    class_mode='sparse'
)

# Train
history = model.fit(
    train_generator,
    epochs=20,
    validation_data=val_generator
)

Key Takeaway

CNNs use convolutions to efficiently process images by exploiting spatial structure. They learn hierarchical features automatically: edges in early layers, complex objects in deeper layers. Use them for any image-related task. Start with a simple architecture, add data augmentation, and consider transfer learning for faster results.
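As a starting point for the transfer learning mentioned above, here is a hedged sketch using a pretrained MobileNetV2 as a frozen feature extractor (the 10-class head and compile settings mirror the model built earlier; `weights='imagenet'` downloads pretrained weights on first use):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Pretrained backbone with its original classification head removed
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,
    weights='imagenet'   # downloaded on first use
)
base.trainable = False   # freeze the backbone: only the new head trains

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation='softmax')
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
```

Because the backbone is frozen, only the small dense head is trained, which typically converges in far fewer epochs than training from scratch.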

#Machine Learning#Deep Learning#CNN#Computer Vision#Advanced