Anomaly Detection
Find unusual patterns in data.
Robert Anderson
December 18, 2025
Spot the unusual.
What is Anomaly Detection?
Finding data points that don't fit the pattern.
Examples:
- Credit card fraud
- Manufacturing defects
- Network intrusion
- Equipment failure
Statistical Method - Z-Score
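import numpy as np

def detect_anomalies(data, threshold=3):
    """Return the indices whose absolute z-score exceeds the threshold."""
    data = np.asarray(data, dtype=float)
    mean = np.mean(data)
    std = np.std(data)
    if std == 0:
        return []  # all values are identical, so nothing stands out
    z_scores = (data - mean) / std
    return np.flatnonzero(np.abs(z_scores) > threshold).tolist()

# Find anomalies
sales = [100, 105, 98, 102, 500, 97]  # 500 is the anomaly
# With only 6 points, |z| cannot exceed sqrt(5) ≈ 2.24, so the default
# threshold of 3 would flag nothing here; use a lower threshold instead
anomalies = detect_anomalies(sales, threshold=2)
print(f"Anomalies at: {anomalies}")  # [4]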
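A plain z-score can miss outliers in small samples, because the outlier itself inflates the mean and the standard deviation: with n points and NumPy's default population standard deviation, no |z| can exceed sqrt(n - 1). One common workaround is the modified z-score, built on the median and the median absolute deviation (MAD). The sketch below is a swapped-in variant, not the method shown above; the function name `detect_anomalies_robust`, the 0.6745 scaling constant, and the 3.5 cutoff are assumptions taken from common convention.

```python
import numpy as np

def detect_anomalies_robust(data, threshold=3.5):
    """Flag indices using the modified z-score (median and MAD)."""
    data = np.asarray(data, dtype=float)
    median = np.median(data)
    mad = np.median(np.abs(data - median))  # median absolute deviation
    if mad == 0:
        return []  # at least half the values equal the median; score undefined
    modified_z = 0.6745 * (data - median) / mad
    return np.flatnonzero(np.abs(modified_z) > threshold).tolist()

sales = [100, 105, 98, 102, 500, 97]
print(detect_anomalies_robust(sales))  # [4]
```

Because the median and MAD barely move when one extreme value is added, the 500 stands out clearly even in a six-point sample.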
Isolation Forest
A popular ML method. It isolates points with random splits, and anomalies need fewer splits to isolate:
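from sklearn.ensemble import IsolationForest

# X_train and X_test are assumed to be 2-D NumPy arrays (rows = samples)
# Train on mostly normal data; contamination is the expected share of anomalies in it
model = IsolationForest(contamination=0.1, random_state=42)  # expect about 10% anomalies
model.fit(X_train)

# Predict: -1 = anomaly, 1 = normal
predictions = model.predict(X_test)
anomalies = X_test[predictions == -1]
print(f"Found {len(anomalies)} anomalies")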
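Hard labels hide how unusual each point is. As a hedged illustration, the sketch below runs the same model on synthetic data (the cluster, the two planted outliers, and the random seeds are all assumptions) and uses `decision_function` to rank points, where negative scores correspond to the -1 label.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic data (assumption): one cluster stands in for "normal" behaviour
X_train = rng.normal(loc=0.0, scale=1.0, size=(300, 2))
X_test = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(5, 2)),  # typical points
    [[8.0, 8.0], [-9.0, 7.0]],                    # planted outliers
])

model = IsolationForest(contamination=0.1, random_state=42)
model.fit(X_train)

labels = model.predict(X_test)            # -1 = anomaly, 1 = normal
scores = model.decision_function(X_test)  # negative = anomaly, lower = more unusual

# Rank test points from most to least anomalous
for idx in np.argsort(scores):
    print(f"point {idx}: score={scores[idx]:.3f}, label={labels[idx]}")
```

Ranking by score lets you review the most suspicious cases first instead of treating every flagged point alike.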
One-Class SVM
from sklearn.svm import OneClassSVM

# Train only on normal data (X_normal: 2-D array of known-good samples)
# nu is an upper bound on the fraction of training points treated as outliers
model = OneClassSVM(nu=0.1)
model.fit(X_normal)

# Detect anomalies in new data: -1 = anomaly, 1 = normal
predictions = model.predict(X_test)
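One-Class SVM with the default RBF kernel is sensitive to feature scale, so it is usually worth standardizing first. The sketch below is one way to do that with a scikit-learn pipeline; the two synthetic features (labelled here as a temperature and a vibration amplitude) and the value `nu=0.05` are assumptions for illustration only.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic "normal" data (assumption): two features on very different scales
X_normal = np.column_stack([
    rng.normal(loc=50.0, scale=5.0, size=400),      # e.g. a temperature
    rng.normal(loc=0.002, scale=0.0005, size=400),  # e.g. a vibration amplitude
])

# Scale first so the RBF kernel weighs both features comparably
model = make_pipeline(StandardScaler(), OneClassSVM(nu=0.05, gamma="scale"))
model.fit(X_normal)

X_test = np.array([
    [50.0, 0.002],  # typical on both features
    [50.0, 0.010],  # typical temperature, extreme vibration
])
predictions = model.predict(X_test)  # -1 = anomaly, 1 = normal
print(predictions)
```

Without the scaler, the tiny numeric range of the second feature would contribute almost nothing to the kernel distance, and the second test point could pass as normal.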
Autoencoder Approach
A neural network trained to reconstruct its input. Inputs it reconstructs poorly are flagged as anomalies:
import numpy as np
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense

# X_normal and X_test are assumed to be 2-D float arrays with the same columns
features = X_normal.shape[1]

# Build autoencoder: compress to 16 units, then expand back
input_layer = Input(shape=(features,))
encoded = Dense(32, activation='relu')(input_layer)
encoded = Dense(16, activation='relu')(encoded)
decoded = Dense(32, activation='relu')(encoded)
decoded = Dense(features, activation='linear')(decoded)
autoencoder = Model(input_layer, decoded)
autoencoder.compile(optimizer='adam', loss='mse')

# Train on normal data
autoencoder.fit(X_normal, X_normal, epochs=50)

# Set the threshold from normal data, so it does not depend on the test set
train_errors = np.mean(np.abs(X_normal - autoencoder.predict(X_normal)), axis=1)
threshold = np.percentile(train_errors, 95)

# Anomalies = high reconstruction error
reconstructed = autoencoder.predict(X_test)
errors = np.mean(np.abs(X_test - reconstructed), axis=1)
anomalies = errors > threshold
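To see the reconstruction-error idea without training a network, the sketch below uses PCA as a linear stand-in for the encoder and decoder. This is a swapped-in technique, not the Keras model above; the synthetic data, the single retained component, and the 99th-percentile cutoff taken from normal data are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "normal" data (assumption): points close to a line in 3-D space
t = rng.normal(size=(500, 1))
direction = np.array([[1.0, 2.0, -1.0]])
X_normal = t @ direction + 0.05 * rng.normal(size=(500, 3))

# Linear "encoder/decoder": project onto the top principal component and back
mean = X_normal.mean(axis=0)
_, _, vt = np.linalg.svd(X_normal - mean, full_matrices=False)
components = vt[:1]  # shape (1, 3)

def reconstruction_error(X):
    """Mean absolute error between each row and its reconstruction."""
    centered = X - mean
    reconstructed = centered @ components.T @ components
    return np.mean(np.abs(centered - reconstructed), axis=1)

# Threshold comes from normal data only
threshold = np.percentile(reconstruction_error(X_normal), 99)

X_new = np.array([
    [1.0, 2.0, -1.0],  # follows the learned pattern
    [1.0, -2.0, 1.0],  # breaks the pattern
])
flags = reconstruction_error(X_new) > threshold
print(flags)  # [False  True]
```

The second point has ordinary values in every column, yet it is flagged because the relationship between the columns is wrong, which is the kind of anomaly reconstruction-based methods are suited to.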
Local Outlier Factor
Flags points whose local density is much lower than that of their nearest neighbors:
from sklearn.neighbors import LocalOutlierFactor
lof = LocalOutlierFactor(n_neighbors=20)
predictions = lof.fit_predict(X)
# -1 = anomaly, 1 = normal
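Note that `fit_predict` labels the same data the model was fitted on. To score new, unseen points, scikit-learn's `LocalOutlierFactor` can be created with `novelty=True`, which enables `predict`. A minimal sketch on synthetic data (the data and seed are assumptions):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(1)
X_train = rng.normal(size=(300, 2))  # synthetic "normal" data (assumption)

# novelty=True enables predict() on unseen data;
# fit_predict() is not available in this mode
lof = LocalOutlierFactor(n_neighbors=20, novelty=True)
lof.fit(X_train)

X_new = np.array([
    [0.1, -0.2],  # inside the dense region
    [6.0, 6.0],   # far from every training point
])
new_labels = lof.predict(X_new)  # -1 = anomaly, 1 = normal
print(new_labels)
```

In novelty mode, `predict` should only be called on data that was not part of training.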
Real Example - Credit Card Fraud
from sklearn.ensemble import IsolationForest

# Transaction features: amount, location, time, etc.
# transactions is assumed to be a 2-D array with one row per past transaction
model = IsolationForest(contamination=0.01, random_state=42)  # assume about 1% fraud
model.fit(transactions)

# Check new transaction
new_transaction = [[250, 2, 1545]]  # [amount, location_id, time of day as HHMM]
is_fraud = model.predict(new_transaction)[0]  # predict returns one label per row
if is_fraud == -1:
    print("⚠️ Potential fraud detected!")
Evaluation
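Evaluation is hard because anomalies are rare: a model that labels everything as normal still scores high accuracy.
Use precision, recall, and F1-score on the anomaly class instead.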
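As a worked example of these metrics, the sketch below computes them by hand for a hypothetical batch of 20 labels in scikit-learn's convention (-1 = anomaly, 1 = normal). The labels and the helper name `precision_recall_f1` are made up for illustration.

```python
def precision_recall_f1(y_true, y_pred, positive=-1):
    """Compute precision, recall and F1 for the anomaly class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical ground truth: 16 normal points, then 4 anomalies
y_true = [1] * 16 + [-1] * 4
# Hypothetical detector output: 2 false alarms, 3 anomalies caught, 1 missed
y_pred = [1] * 14 + [-1] * 2 + [-1] * 3 + [1]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.3f} accuracy={accuracy:.2f}")
# precision=0.60 recall=0.75 f1=0.667 accuracy=0.85
```

A detector that predicted "normal" for all 20 points would reach 0.80 accuracy with zero recall, which is why accuracy alone says little here. In scikit-learn, `precision_score`, `recall_score`, and `f1_score` from `sklearn.metrics` give the same numbers when called with `pos_label=-1`.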
Remember
- Most methods need mostly normal data for training
- Isolation Forest is often a good first choice
- False positives are common, so tune thresholds carefully
- Combine model output with business rules
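As one possible reading of the last point, the sketch below combines a model label with a single hard rule. The function name `decide`, the 5000 review limit, and the three outcomes are hypothetical; real rules depend on the business.

```python
def decide(model_label, amount, review_limit=5000):
    """Combine a model label (-1 = anomaly, 1 = normal) with a hypothetical amount rule."""
    flagged = model_label == -1
    over_limit = amount > review_limit
    if flagged and over_limit:
        return "block"   # both signals agree
    if flagged or over_limit:
        return "review"  # one signal only: send to a human
    return "allow"

print(decide(model_label=-1, amount=7200))  # block
print(decide(model_label=-1, amount=250))   # review
print(decide(model_label=1, amount=250))    # allow
```

Requiring agreement before blocking is one way to keep false positives from the model from directly affecting customers.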