
Anomaly Detection

Find unusual patterns in data.

Robert Anderson
December 18, 2025

Spot the unusual.

What is Anomaly Detection?

Finding data points that don't fit the pattern.

Examples:

  • Credit card fraud
  • Manufacturing defects
  • Network intrusion
  • Equipment failure

Statistical Method - Z-Score

import numpy as np

def detect_anomalies(data, threshold=3):
    mean = np.mean(data)
    std = np.std(data)
    if std == 0:
        return []  # all values identical, nothing to flag

    anomalies = []
    for i, value in enumerate(data):
        z_score = (value - mean) / std
        if abs(z_score) > threshold:
            anomalies.append(i)

    return anomalies

# Find anomalies. Caveat: an extreme outlier inflates the mean and std,
# so with the classic |z| > 3 cutoff the 500 here only scores |z| ≈ 2.2
# and would be missed. A lower threshold catches it.
sales = [100, 105, 98, 102, 500, 97]  # 500 is the anomaly
anomalies = detect_anomalies(sales, threshold=2)
print(f"Anomalies at: {anomalies}")  # [4]
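Because a large outlier drags the mean and inflates the standard deviation, a robust variant based on the median and MAD (median absolute deviation) is often more reliable. A minimal sketch (the 0.6745 factor scales the MAD to be comparable to a standard deviation for normally distributed data):

```python
import numpy as np

def detect_anomalies_mad(data, threshold=3.5):
    """Flag points whose modified z-score exceeds the threshold."""
    data = np.asarray(data, dtype=float)
    median = np.median(data)
    mad = np.median(np.abs(data - median))  # median absolute deviation
    if mad == 0:
        return []  # no spread around the median, nothing to flag
    # 0.6745 makes the modified z-score comparable to a standard z-score
    modified_z = 0.6745 * (data - median) / mad
    return np.where(np.abs(modified_z) > threshold)[0].tolist()

sales = [100, 105, 98, 102, 500, 97]
print(detect_anomalies_mad(sales))  # [4]
```

Unlike the mean and std, the median and MAD barely move when the outlier is included, so the default threshold works without tuning.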

Isolation Forest

A popular tree-based ML method; anomalies sit in sparse regions, so random splits isolate them in fewer steps than normal points:

from sklearn.ensemble import IsolationForest

# Fit the model (unsupervised; the data may contain some anomalies)
model = IsolationForest(contamination=0.1)  # expect ~10% anomalies
model.fit(X_train)

# Predict: -1 = anomaly, 1 = normal
predictions = model.predict(X_test)

anomalies = X_test[predictions == -1]
print(f"Found {len(anomalies)} anomalies")
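The snippet above assumes X_train and X_test already exist. A self-contained sketch on synthetic data (the cluster sizes and contamination value are illustrative):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# 200 normal 2-D points plus 5 obvious outliers far from the cluster
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
outliers = rng.uniform(low=6.0, high=8.0, size=(5, 2))
X = np.vstack([normal, outliers])

model = IsolationForest(contamination=0.05, random_state=0)
labels = model.fit_predict(X)  # -1 = anomaly, 1 = normal

flagged = np.where(labels == -1)[0]
print(f"Flagged {len(flagged)} of {len(X)} points")
```

The contamination parameter sets the cutoff, so roughly that fraction of points gets flagged; the planted outliers receive the most extreme scores and land in that set.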

One-Class SVM

from sklearn.svm import OneClassSVM

# Train only on normal data
model = OneClassSVM(nu=0.1)  # nu = expected anomaly rate
model.fit(X_normal)

# Detect anomalies in new data
predictions = model.predict(X_test)
# -1 = anomaly, 1 = normal
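As before, X_normal and X_test are assumed to exist. A runnable sketch with synthetic data and one planted outlier:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_normal = rng.normal(size=(200, 2))            # train on normal data only
X_test = np.vstack([rng.normal(size=(20, 2)),   # mostly normal points...
                    [[8.0, 8.0]]])              # ...plus one clear outlier

model = OneClassSVM(nu=0.05, gamma="scale")
model.fit(X_normal)

predictions = model.predict(X_test)
print(predictions[-1])  # the planted outlier is labeled -1
```

The model learns a boundary around the training cloud; anything outside it, like the point at (8, 8), is labeled -1.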

Autoencoder Approach

A neural network trained to reconstruct its input; points it reconstructs poorly are likely anomalies:

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense

# Build autoencoder (features = number of input columns)
input_layer = Input(shape=(features,))
encoded = Dense(32, activation='relu')(input_layer)
encoded = Dense(16, activation='relu')(encoded)
decoded = Dense(32, activation='relu')(encoded)
decoded = Dense(features, activation='linear')(decoded)

autoencoder = Model(input_layer, decoded)
autoencoder.compile(optimizer='adam', loss='mse')

# Train on normal data
autoencoder.fit(X_normal, X_normal, epochs=50)

# Anomalies = high reconstruction error
reconstructed = autoencoder.predict(X_test)
errors = np.mean(np.abs(X_test - reconstructed), axis=1)

threshold = np.percentile(errors, 95)
anomalies = errors > threshold

Local Outlier Factor

Flags points whose local density is much lower than that of their neighbors:

from sklearn.neighbors import LocalOutlierFactor

lof = LocalOutlierFactor(n_neighbors=20)
predictions = lof.fit_predict(X)

# -1 = anomaly, 1 = normal
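Beyond the hard labels, LOF exposes a score useful for ranking: `negative_outlier_factor_` is more negative for more anomalous points. A small sketch with one planted outlier:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(1)
# 100 clustered points plus one far-away outlier at index 100
X = np.vstack([rng.normal(size=(100, 2)), [[10.0, 10.0]]])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)

scores = lof.negative_outlier_factor_  # more negative = more anomalous
print(labels[-1], int(scores.argmin()))
```

Note that in this default mode LOF only scores the data it was fit on; to score new, unseen points, construct it with novelty=True and use predict.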

Real Example - Credit Card Fraud

from sklearn.ensemble import IsolationForest

# Transaction features: amount, location, time, etc.
model = IsolationForest(contamination=0.01)  # 1% fraud
model.fit(transactions)

# Check new transaction
new_transaction = [[250, 2, 1545]]  # [amount, location_id, time]
is_fraud = model.predict(new_transaction)[0]

if is_fraud == -1:
    print("⚠️ Potential fraud detected!")

Evaluation

Hard because anomalies are rare! A model that predicts "normal" for everything can still score 99%+ accuracy, so accuracy is misleading.

Use precision, recall, and F1-score instead.
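With labeled examples these metrics are a few lines in scikit-learn (the labels below are made up for illustration; 1 = anomaly):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0]
y_pred = [0, 0, 1, 0, 1, 0, 0, 0, 0, 0]  # one hit, one miss, one false alarm

precision = precision_score(y_true, y_pred)  # 1 of 2 flagged are real -> 0.5
recall = recall_score(y_true, y_pred)        # 1 of 2 anomalies caught -> 0.5
f1 = f1_score(y_true, y_pred)
print(precision, recall, f1)  # 0.5 0.5 0.5
```

Precision answers "how many alarms were real?", recall answers "how many real anomalies did we catch?", and F1 balances the two.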

Remember

  • Training data should be mostly normal
  • Isolation Forest is often a strong first choice
  • Expect a high false-positive rate; tune thresholds carefully
  • Combine model output with business rules
#AI #Intermediate #Anomaly