# Convolutional Neural Networks (CNNs) Explained
Learn how CNNs work for image recognition, the role of convolutions, pooling, and how to build your first CNN.
Regular neural networks don't understand spatial structure. Feed an image as a flat array and you lose the fact that nearby pixels are related. CNNs solve this.
## The Problem with Regular Networks
A 224×224 color image has 224 × 224 × 3 = 150,528 inputs. A single fully connected layer with 1,000 neurons therefore needs about 150 million weights, which is wildly inefficient and throws away spatial structure entirely.
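The arithmetic above is easy to verify directly:

```python
# Naive fully connected layer on a 224x224 RGB image
inputs = 224 * 224 * 3        # 150,528 input values
neurons = 1000
weights = inputs * neurons    # one weight per input per neuron (biases ignored)
print(inputs, weights)        # → 150528 150528000
```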
## How Convolution Works
Instead of connecting every pixel to every neuron, slide a small filter across the image:
```
Image:          Filter (3x3):        Output:
1 2 3 4           1 0 1
2 3 4 5     *     0 1 0      =       Result
3 4 5 6           1 0 1
4 5 6 7
```
The filter detects patterns (edges, textures, shapes). Multiple filters = multiple pattern detectors.
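The sliding-filter idea can be sketched in a few lines of plain Python. This is a minimal "valid" convolution (no padding, stride 1), applied to the 4×4 image and 3×3 filter from the diagram above:

```python
def conv2d(image, kernel):
    # "Valid" convolution: slide the kernel over every position where it fits
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            # Element-wise multiply the patch by the kernel and sum
            out[i][j] = sum(
                image[i + r][j + c] * kernel[r][c]
                for r in range(kh) for c in range(kw)
            )
    return out

image = [[1, 2, 3, 4],
         [2, 3, 4, 5],
         [3, 4, 5, 6],
         [4, 5, 6, 7]]
kernel = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]
print(conv2d(image, kernel))  # → [[15, 20], [20, 25]]
```

A 4×4 input with a 3×3 filter yields a 2×2 output: the filter only fits in four positions. Real frameworks do exactly this (vectorized, across many filters and channels at once).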
## CNN Architecture
```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    # Convolution + Pooling Block 1
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(224, 224, 3)),
    layers.MaxPooling2D((2, 2)),
    # Convolution + Pooling Block 2
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    # Convolution + Pooling Block 3
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    # Flatten and Dense layers
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(10, activation='softmax')
])

model.compile(
    optimizer='adam',
    # one-hot labels, matching flow_from_directory(class_mode='categorical');
    # use 'sparse_categorical_crossentropy' for integer labels instead
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
```
## Key Layers
**Convolutional Layer:**
- Applies filters to detect features
- Parameters: number of filters, filter size, stride, padding

**Pooling Layer:**
- Reduces spatial dimensions
- MaxPooling takes the maximum value in each region
- Adds a degree of translation invariance

**Flatten:**
- Converts 2D feature maps to a 1D vector
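You can trace how these layers shrink the spatial size with the standard output-size formula for valid padding, `floor((n - f + 2p) / s) + 1`. Applied to the three conv/pool blocks of the model above:

```python
def conv_output_size(n, f, stride=1, pad=0):
    # floor((n - f + 2p) / s) + 1
    return (n - f + 2 * pad) // stride + 1

size = 224
for _ in range(3):
    size = conv_output_size(size, 3)   # 3x3 conv, stride 1, no padding
    size = size // 2                   # 2x2 max pooling halves each dimension
print(size)  # → 26
```

So the 224×224 input ends up as a 26×26 grid of 128-channel feature maps, and Flatten turns that into a 26 × 26 × 128 = 86,528-element vector.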
## What CNNs Learn
Each layer learns different features:
| Layer | What It Learns |
|-------|---------------|
| Early layers | Edges, colors, simple textures |
| Middle layers | Shapes, patterns, object parts |
| Deep layers | Complex objects, faces, scenes |
## Practical Example: Image Classification
```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Data augmentation for the training set
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True
)

train_generator = train_datagen.flow_from_directory(
    'data/train',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)

# Validation data: rescale only, no augmentation
# (directory layout assumed to mirror data/train)
val_datagen = ImageDataGenerator(rescale=1./255)
val_generator = val_datagen.flow_from_directory(
    'data/validation',
    target_size=(224, 224),
    batch_size=32,
    class_mode='categorical'
)

# Train
history = model.fit(
    train_generator,
    epochs=20,
    validation_data=val_generator
)
```
## Key Takeaway
CNNs use convolutions to efficiently process images by exploiting spatial structure. They learn hierarchical features automatically: edges in early layers, complex objects in deeper layers. Use them for any image-related task. Start with a simple architecture, add data augmentation, and consider transfer learning for faster results.
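The transfer-learning suggestion can be sketched with a pretrained Keras application as a frozen feature extractor. This is one minimal way to do it, assuming the same 224×224 inputs and 10 classes as above (MobileNetV2 is just one choice; any model in `tf.keras.applications` works the same way):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Pretrained ImageNet features, with the original classifier head removed
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,
    weights='imagenet'
)
base.trainable = False  # freeze pretrained weights

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation='softmax')  # new head for our 10 classes
])
model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
```

Only the small dense head trains at first; later, unfreezing some of `base` and fine-tuning with a low learning rate often gains extra accuracy.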