ML · 11 min read

Convolutional Neural Networks (CNNs) Explained

Learn how CNNs work for image recognition, the role of convolutions, pooling, and how to build your first CNN.

Sarah Chen
December 19, 2025


Regular neural networks don't understand spatial structure. Feed an image as a flat array and you lose the fact that nearby pixels are related. CNNs solve this.

The Problem with Regular Networks

A 224×224 color image has 224 × 224 × 3 = 150,528 input values. A single fully connected layer with 1,000 neurons needs roughly 150 million weights. That's enormous for one layer, and it completely ignores spatial patterns.
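To see the scale, here's a quick back-of-the-envelope sketch of those weight counts (numbers taken from the paragraph above; biases omitted for simplicity):

```python
# Fully connected layer on a flattened 224x224 RGB image
inputs = 224 * 224 * 3            # 150,528 input values
neurons = 1000
dense_params = inputs * neurons   # weights only, ignoring biases
print(f"{dense_params:,}")        # 150,528,000 -> ~150 million weights

# Compare: a conv layer with 32 filters of size 3x3 over 3 channels
conv_params = 32 * (3 * 3 * 3)    # weights only, shared across the whole image
print(f"{conv_params:,}")         # 864
```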

How Convolution Works

Instead of connecting every pixel to every neuron, slide a small filter across the image:

```
Image:          Filter (3x3):        Output:
1 2 3 4
2 3 4 5         1 0 1
3 4 5 6    *    0 1 0      =    Result
4 5 6 7         1 0 1
```

The filter detects patterns (edges, textures, shapes). Multiple filters = multiple pattern detectors.
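To make the sliding-filter idea concrete, here's a minimal NumPy sketch of a stride-1, no-padding convolution (technically cross-correlation, which is what deep learning frameworks actually compute), using the values from the diagram above:

```python
import numpy as np

image = np.array([
    [1, 2, 3, 4],
    [2, 3, 4, 5],
    [3, 4, 5, 6],
    [4, 5, 6, 7],
])
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

# Slide the 3x3 filter over every valid position (stride 1, no padding)
out_h = image.shape[0] - kernel.shape[0] + 1
out_w = image.shape[1] - kernel.shape[1] + 1
output = np.zeros((out_h, out_w))

for i in range(out_h):
    for j in range(out_w):
        patch = image[i:i + 3, j:j + 3]
        output[i, j] = np.sum(patch * kernel)  # element-wise multiply, then sum

print(output)
# [[15. 20.]
#  [20. 25.]]
```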

CNN Architecture

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    # Convolution + Pooling Block 1
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    layers.MaxPooling2D((2, 2)),

    # Convolution + Pooling Block 2
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),

    # Convolution + Pooling Block 3
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),

    # Flatten and Dense layers
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
])

model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy']
)
```

Key Layers

**Convolutional Layer:**
- Applies filters to detect features
- Parameters: number of filters, filter size, stride, padding

**Pooling Layer:**
- Reduces spatial dimensions
- MaxPooling takes the maximum value in each region
- Makes features translation-invariant

**Flatten:**
- Converts 2D feature maps to a 1D vector
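A useful sanity check is to watch the spatial dimensions shrink as data flows through these layers. Here's a minimal sketch, assuming the same 224×224×3 input as the architecture above: a 3×3 convolution without padding trims one pixel from each border, and a 2×2 max pool halves height and width.

```python
import tensorflow as tf
from tensorflow.keras import layers

x = tf.zeros((1, 224, 224, 3))              # a dummy batch of one image

x = layers.Conv2D(32, (3, 3), activation='relu')(x)
print(x.shape)                               # (1, 222, 222, 32) -> 224 - 3 + 1 = 222

x = layers.MaxPooling2D((2, 2))(x)
print(x.shape)                               # (1, 111, 111, 32) -> 222 / 2 = 111

x = layers.Flatten()(x)
print(x.shape)                               # (1, 394272)       -> 111 * 111 * 32
```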

What CNNs Learn

Each layer learns different features:

| Layer | What It Learns |
|-------|---------------|
| Early layers | Edges, colors, simple textures |
| Middle layers | Shapes, patterns, object parts |
| Deep layers | Complex objects, faces, scenes |
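One way to see this for yourself is to read out a layer's intermediate feature maps. Here's a minimal sketch, assuming the `model` defined earlier (the layer index and the random input are just placeholders):

```python
import tensorflow as tf

# Build a sub-model that stops at the first convolutional layer
feature_extractor = tf.keras.Model(
    inputs=model.inputs,
    outputs=model.layers[0].output,   # first Conv2D: edge/texture detectors
)

# Run one (dummy) image through it and inspect the 32 feature maps
image = tf.random.uniform((1, 224, 224, 3))
feature_maps = feature_extractor(image)
print(feature_maps.shape)             # (1, 222, 222, 32) -- one map per filter
```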

Practical Example: Image Classification

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Data augmentation for the training set
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True
)

train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(224, 224),
    batch_size=32,
    class_mode='sparse'   # integer labels, matching the sparse loss in compile()
)

# Validation data: rescale only, no augmentation
# (assumes a 'data/val' directory laid out like 'data/train')
val_datagen = ImageDataGenerator(rescale=1./255)
val_generator = val_datagen.flow_from_directory(
    'data/val',
    target_size=(224, 224),
    batch_size=32,
    class_mode='sparse'
)

# Train
history = model.fit(
    train_generator,
    epochs=20,
    validation_data=val_generator
)
```
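Once training finishes, running a single image through the model looks roughly like this (the file path is a placeholder, and the preprocessing mirrors the `1./255` rescale used above):

```python
import numpy as np
from tensorflow.keras.preprocessing import image

# Load and preprocess one image the same way the training data was prepared
img = image.load_img('data/test/example.jpg', target_size=(224, 224))  # placeholder path
x = image.img_to_array(img) / 255.0          # match the 1./255 rescale above
x = np.expand_dims(x, axis=0)                # add a batch dimension: (1, 224, 224, 3)

probs = model.predict(x)[0]                  # softmax probabilities over the classes
print('Predicted class index:', np.argmax(probs))
```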

Key Takeaway

CNNs use convolutions to efficiently process images by exploiting spatial structure. They learn hierarchical features automatically - edges in early layers, complex objects in deeper layers. Use them for any image-related task. Start with a simple architecture, add data augmentation, and consider transfer learning for faster results.
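For the transfer-learning route mentioned above, here's a hedged sketch of what that looks like in Keras, using MobileNetV2 as one possible ImageNet-pretrained base (any other backbone from `tf.keras.applications` works the same way):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Load an ImageNet-pretrained backbone without its classification head
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,
    weights='imagenet',
)
base.trainable = False                       # freeze the pretrained features

# In practice, preprocess inputs with
# tf.keras.applications.mobilenet_v2.preprocess_input for best results.
transfer_model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax'),  # 10 classes, as in the model above
])

transfer_model.compile(
    optimizer='adam',
    loss='sparse_categorical_crossentropy',
    metrics=['accuracy'],
)
```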

#Machine Learning #Deep Learning #CNN #Computer Vision #Advanced