Autoencoders: Learning Compressed Representations
Learn how autoencoders compress data into latent representations and their applications in denoising and generation.
Autoencoders learn to compress data into a smaller representation, then reconstruct it. This "bottleneck" forces them to learn the most important features.
The Architecture
```
Input ──> [Encoder] ──> Latent Code ──> [Decoder] ──> Reconstruction
(784)      (256→64)        (32)          (64→256)        (784)
```
The encoder compresses, the decoder reconstructs. The latent code captures the essence.
Simple Autoencoder
```python
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Encoder
input_layer = Input(shape=(784,))
encoded = Dense(256, activation='relu')(input_layer)
encoded = Dense(64, activation='relu')(encoded)
latent = Dense(32, activation='relu')(encoded)  # Bottleneck

# Decoder
decoded = Dense(64, activation='relu')(latent)
decoded = Dense(256, activation='relu')(decoded)
output = Dense(784, activation='sigmoid')(decoded)

# Full autoencoder
autoencoder = Model(input_layer, output)

# Just the encoder
encoder = Model(input_layer, latent)

autoencoder.compile(optimizer='adam', loss='mse')
autoencoder.fit(X_train, X_train, epochs=50, batch_size=256)  # Input = target
```
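Once trained, the standalone `encoder` gives you the 32-dimensional codes directly. A minimal usage sketch, assuming `X_test` holds flattened 28×28 images scaled to [0, 1]:

```python
# Compress test images to their 32-dimensional latent codes
codes = encoder.predict(X_test)                 # shape: (n_samples, 32)

# Reconstruct through the full autoencoder for comparison
reconstructions = autoencoder.predict(X_test)   # shape: (n_samples, 784)
```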
Convolutional Autoencoder
For images, use convolutions:
```python
from tensorflow.keras.layers import Conv2D, MaxPooling2D, UpSampling2D

# Encoder
input_img = Input(shape=(28, 28, 1))
x = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(16, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)

# Decoder
x = Conv2D(16, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((2, 2))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')
```
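To train this model, the images need a channel dimension and values scaled to [0, 1]. A minimal data-prep sketch, using MNIST as an assumed example dataset:

```python
from tensorflow.keras.datasets import mnist
import numpy as np

# Load and normalize MNIST; add a channel dimension for the Conv2D layers
(X_train, _), (X_test, _) = mnist.load_data()
X_train = X_train.astype('float32') / 255.0
X_test = X_test.astype('float32') / 255.0
X_train = X_train.reshape(-1, 28, 28, 1)
X_test = X_test.reshape(-1, 28, 28, 1)

# Train to reconstruct the images themselves
autoencoder.fit(X_train, X_train, epochs=20, batch_size=256,
                validation_data=(X_test, X_test))
```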
Denoising Autoencoder
Train on noisy input, reconstruct clean output:
```python
import numpy as np

# Add Gaussian noise to the training and test data
noise_factor = 0.3
X_train_noisy = X_train + noise_factor * np.random.normal(size=X_train.shape)
X_test_noisy = X_test + noise_factor * np.random.normal(size=X_test.shape)
X_train_noisy = np.clip(X_train_noisy, 0., 1.)
X_test_noisy = np.clip(X_test_noisy, 0., 1.)

# Train to reconstruct clean images from noisy ones
autoencoder.fit(
    X_train_noisy, X_train,   # Noisy input, clean target
    epochs=50,
    batch_size=256,
    validation_data=(X_test_noisy, X_test)
)
```
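After training, the same model cleans up unseen noisy inputs. A quick usage sketch:

```python
# Denoise held-out images: feed noisy inputs, get reconstructed clean versions
denoised = autoencoder.predict(X_test_noisy)
```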
Variational Autoencoder (VAE)
A VAE encodes each input as a probability distribution in latent space rather than a single point, which makes it possible to sample new data:
```python
from tensorflow.keras.layers import Lambda
import tensorflow.keras.backend as K

latent_dim = 32  # size of the latent space

# Sampling function (the "reparameterization trick")
def sampling(args):
    z_mean, z_log_var = args
    epsilon = K.random_normal(shape=(K.shape(z_mean)[0], latent_dim))
    return z_mean + K.exp(0.5 * z_log_var) * epsilon

# Encoder outputs mean and log-variance
z_mean = Dense(latent_dim)(encoded)
z_log_var = Dense(latent_dim)(encoded)

# Sample from the distribution
z = Lambda(sampling)([z_mean, z_log_var])

# VAE loss = reconstruction loss + KL divergence
reconstruction_loss = K.mean(K.square(input_layer - output))
kl_loss = -0.5 * K.mean(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var))
vae_loss = reconstruction_loss + kl_loss
```
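Generation then amounts to sampling a latent vector from the standard normal prior and running it through the decoder. A minimal sketch, assuming the decoder layers have been wrapped in their own `decoder` Model that maps a latent vector back to a 784-dimensional image:

```python
import numpy as np

# `decoder` is a hypothetical standalone Model(latent_input, decoder_output)
z_sample = np.random.normal(size=(1, latent_dim))  # sample from the N(0, I) prior
generated = decoder.predict(z_sample)              # decode into a new sample
generated_image = generated.reshape(28, 28)        # reshape for display
```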
Applications
| Application | How It Works |
|-------------|--------------|
| Dimensionality Reduction | Use encoder output as features |
| Denoising | Train on noisy input, clean target |
| Anomaly Detection | High reconstruction error = anomaly |
| Generation (VAE) | Sample from latent space |
| Image Compression | Encode → store → decode |
Anomaly Detection Example
```python
# Train on normal data only
autoencoder.fit(normal_data, normal_data, epochs=50)

# Calculate per-sample reconstruction error
reconstructions = autoencoder.predict(test_data)
errors = np.mean(np.square(test_data - reconstructions), axis=1)

# Anomalies have high error
threshold = np.percentile(errors, 95)
anomalies = errors > threshold
```
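Taking the 95th percentile of the test errors implicitly assumes roughly 5% of the test set is anomalous. A common alternative, sketched below, is to set the threshold from reconstruction errors on held-out normal data instead (`normal_holdout` is a hypothetical array of known-normal samples):

```python
# Hypothetical held-out normal set used only for choosing the threshold
normal_reconstructions = autoencoder.predict(normal_holdout)
normal_errors = np.mean(np.square(normal_holdout - normal_reconstructions), axis=1)

# Flag anything well above what the model achieves on normal data
threshold = np.percentile(normal_errors, 99)
anomalies = errors > threshold
```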
Key Takeaway
Autoencoders learn compressed representations by reconstructing their input through a bottleneck. They're great for dimensionality reduction, denoising, and anomaly detection. VAEs add probabilistic sampling for generation. Use them when you need to learn meaningful features in an unsupervised way.