Anomaly Detection
Find unusual patterns in data.
What is Anomaly Detection?
Anomaly detection means finding data points that deviate from the pattern the rest of the data follows.
**Examples**:
- Credit card fraud
- Manufacturing defects
- Network intrusion
- Equipment failure
Statistical Method - Z-Score
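```python
import numpy as np

def detect_anomalies(data, threshold=3):
    """Return indices of values whose absolute z-score exceeds threshold."""
    mean = np.mean(data)
    std = np.std(data)
    if std == 0:
        return []  # all values are identical, so nothing stands out
    anomalies = []
    for i, value in enumerate(data):
        z_score = (value - mean) / std
        if abs(z_score) > threshold:
            anomalies.append(i)
    return anomalies

# Find anomalies
sales = [100, 105, 98, 102, 500, 97]  # 500 is the anomaly
# In a sample this small the outlier inflates the std, so its z-score is
# only about 2.24; the default threshold of 3 would miss it.
anomalies = detect_anomalies(sales, threshold=2)
print(f"Anomalies at: {anomalies}")  # [4]
```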
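On small samples, one extreme value inflates the standard deviation and so limits its own z-score. The sketch below prints the z-scores for a small illustrative sales list and compares the largest one with the upper bound `sqrt(n - 1)` that holds when the population standard deviation (`ddof=0`, NumPy's default) is used. The variable names are assumptions made for this example.

```python
import numpy as np

sales = np.array([100, 105, 98, 102, 500, 97], dtype=float)

# Population z-scores (np.std defaults to ddof=0)
z_scores = (sales - sales.mean()) / sales.std()

# With ddof=0, no |z-score| in a sample of n points can exceed sqrt(n - 1)
max_possible_z = np.sqrt(len(sales) - 1)
largest_z = np.abs(z_scores).max()

print(np.round(z_scores, 2))
print(f"largest |z| = {largest_z:.2f}, bound = {max_possible_z:.2f}")
```

With six points the bound is about 2.24, so a threshold of 3 can never fire. A fixed threshold of 3 only becomes meaningful once the sample is reasonably large.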
Isolation Forest
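A popular ML method. It isolates points with random splits, and anomalies need fewer splits to isolate than normal points: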
```python
from sklearn.ensemble import IsolationForest

# X_train and X_test are assumed to be 2D NumPy arrays (n_samples, n_features)
# Train on mostly normal data
model = IsolationForest(contamination=0.1, random_state=42)  # Expect 10% anomalies
model.fit(X_train)

# Predict: -1 = anomaly, 1 = normal
predictions = model.predict(X_test)

anomalies = X_test[predictions == -1]
print(f"Found {len(anomalies)} anomalies")
```
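To try the same calls end to end, here is a minimal sketch on synthetic data. The data, the `contamination=0.05` setting, and the two planted outliers are assumptions chosen for illustration. It also uses `decision_function`, which returns a continuous score in addition to the -1/1 labels.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 2))                 # mostly normal points
X_test = np.vstack([rng.normal(size=(20, 2)),
                    [[8.0, 8.0], [-9.0, 7.0]]])     # last two rows are far outliers

model = IsolationForest(contamination=0.05, random_state=0)
model.fit(X_train)

predictions = model.predict(X_test)        # -1 = anomaly, 1 = normal
scores = model.decision_function(X_test)   # negative = anomaly; lower = more anomalous

print(predictions[-2:])                    # labels for the two planted outliers
print(f"Flagged {np.sum(predictions == -1)} of {len(X_test)} test points")
```

The continuous scores are useful when flagged items need to be ranked for review instead of just labeled.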
One-Class SVM
```python
from sklearn.svm import OneClassSVM

# Train only on normal data
# nu is an upper bound on the fraction of training points treated as outliers
model = OneClassSVM(nu=0.1)
model.fit(X_normal)

# Detect anomalies in new data
predictions = model.predict(X_test)  # -1 = anomaly, 1 = normal
```
Autoencoder Approach
A neural network trained to reconstruct its input. Inputs it reconstructs poorly are likely anomalies:
```python
import numpy as np
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense

# X_normal and X_test are assumed to be 2D float arrays
features = X_normal.shape[1]  # number of input features

# Build autoencoder
input_layer = Input(shape=(features,))
encoded = Dense(32, activation='relu')(input_layer)
encoded = Dense(16, activation='relu')(encoded)
decoded = Dense(32, activation='relu')(encoded)
decoded = Dense(features, activation='linear')(decoded)

autoencoder = Model(input_layer, decoded)
autoencoder.compile(optimizer='adam', loss='mse')

# Train on normal data
autoencoder.fit(X_normal, X_normal, epochs=50)

# Anomalies = high reconstruction error
reconstructed = autoencoder.predict(X_test)
errors = np.mean(np.abs(X_test - reconstructed), axis=1)

# Flags the 5% of test samples with the largest error
threshold = np.percentile(errors, 95)
anomalies = errors > threshold
```
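A percentile taken over the test errors always flags the same share of test samples, whether or not any of them are anomalous. One common alternative is to derive the threshold from reconstruction errors on normal data. The sketch below shows only that thresholding step with NumPy; the arrays standing in for model output are simulated, and the helper name and the 99th percentile are assumptions.

```python
import numpy as np

def reconstruction_errors(X, reconstructed):
    """Mean absolute error per sample (one value per row)."""
    return np.mean(np.abs(X - reconstructed), axis=1)

rng = np.random.default_rng(0)

# Stand-ins for real data and autoencoder output (simulated, not a trained model)
X_normal = rng.normal(size=(500, 4))
X_normal_rec = X_normal + rng.normal(scale=0.1, size=X_normal.shape)

X_test = rng.normal(size=(10, 4))
X_test_rec = X_test + rng.normal(scale=0.1, size=X_test.shape)
X_test_rec[0] = X_test[0] + 5.0  # simulate one badly reconstructed sample

# Threshold from the normal data, then apply it to the test data
threshold = np.percentile(reconstruction_errors(X_normal, X_normal_rec), 99)
is_anomaly = reconstruction_errors(X_test, X_test_rec) > threshold

print(f"threshold = {threshold:.3f}")
print(np.flatnonzero(is_anomaly))  # indices of flagged test samples
```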
Local Outlier Factor
Flags points whose local density is much lower than that of their neighbors:
```python
from sklearn.neighbors import LocalOutlierFactor

lof = LocalOutlierFactor(n_neighbors=20)
predictions = lof.fit_predict(X)

# -1 = anomaly, 1 = normal
```
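As a runnable illustration, the sketch below plants one obvious outlier in an assumed synthetic cluster and reads the fitted `negative_outlier_factor_` attribute to see which point scored as most anomalous.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# 100 clustered points plus one obvious outlier in the last row (synthetic data)
X = np.vstack([rng.normal(size=(100, 2)), [[10.0, 10.0]]])

lof = LocalOutlierFactor(n_neighbors=20)
predictions = lof.fit_predict(X)  # -1 = anomaly, 1 = normal

# negative_outlier_factor_ is the negated LOF score: more negative = more anomalous
scores = lof.negative_outlier_factor_
most_anomalous = int(np.argmin(scores))

print(predictions[-1], most_anomalous)
```

Note that `fit_predict` only labels the data it was fitted on. To score previously unseen samples, scikit-learn requires constructing the estimator with `novelty=True` and then calling `predict`.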
Real Example - Credit Card Fraud
```python
from sklearn.ensemble import IsolationForest

# Transaction features: amount, location, time, etc.
# transactions is assumed to be a 2D array with one row per transaction
model = IsolationForest(contamination=0.01)  # Expect about 1% fraud
model.fit(transactions)

# Check new transaction
new_transaction = [[250, 2, 1545]]  # [amount, location_id, time of day (HHMM)]
is_fraud = model.predict(new_transaction)  # array with a single label

if is_fraud[0] == -1:
    print("⚠️ Potential fraud detected!")
```
Evaluation
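Evaluation is hard because anomalies are rare: a model that labels everything as normal can still reach very high accuracy.
Use precision, recall, and F1-score on the anomaly class instead of accuracy.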
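A small worked example makes the point concrete. The labels below are hypothetical (95 normal samples, 5 anomalies, with 1 marking an anomaly), and the predictions are constructed by hand to contain one false positive and two missed anomalies.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical labels: 1 = anomaly, 0 = normal (95 normal, 5 anomalies)
y_true = np.array([0] * 95 + [1] * 5)

y_pred = np.zeros(100, dtype=int)
y_pred[0] = 1        # one false positive
y_pred[95:98] = 1    # three of the five anomalies caught

baseline_acc = accuracy_score(y_true, np.zeros(100, dtype=int))  # "always normal"
acc = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)  # 3 of 4 flags are real anomalies
recall = recall_score(y_true, y_pred)        # 3 of 5 anomalies were found
f1 = f1_score(y_true, y_pred)

print(f"baseline accuracy = {baseline_acc:.2f}")   # 0.95
print(f"accuracy = {acc:.2f}")                     # 0.97
print(f"precision = {precision:.2f}, recall = {recall:.2f}, f1 = {f1:.3f}")
```

Accuracy barely separates the detector (0.97) from the useless "always normal" baseline (0.95), while precision (0.75) and recall (0.60) show what is caught and what is missed. scikit-learn detectors return -1/1 labels, so they need mapping first, for example with `(predictions == -1).astype(int)`.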
Remember
- Most methods need mostly normal data for training
- Isolation Forest is often a good default choice
- High false-positive rates are common
- Combine model output with business rules
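The last point can be sketched in plain Python. The function name, the amount limit, and both rules below are hypothetical. They only illustrate one way of combining a detector's -1/1 label with domain rules so that a model flag alone does not trigger a review.

```python
def needs_review(model_label, amount, country, home_country, amount_limit=1000):
    """Combine a detector's label with two illustrative business rules.

    model_label: -1 (anomaly) or 1 (normal), as returned by a detector.
    """
    flagged_by_model = model_label == -1
    over_limit = amount > amount_limit   # hypothetical rule
    foreign = country != home_country    # hypothetical rule

    # Always review large foreign transactions; otherwise require the model
    # flag plus at least one rule, which cuts down on false positives.
    if over_limit and foreign:
        return True
    return flagged_by_model and (over_limit or foreign)

print(needs_review(-1, 50, "US", "US"))    # False: model flag alone is not enough
print(needs_review(-1, 50, "FR", "US"))    # True: model flag plus one rule
print(needs_review(1, 5000, "FR", "US"))   # True: rules alone trigger a review
```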