Spot the unusual.

What is Anomaly Detection?

Anomaly detection means finding data points that don't fit the pattern of the rest of the data.

**Examples**:

- Credit card fraud
- Manufacturing defects
- Network intrusion
- Equipment failure

Statistical Method - Z-Score

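```python
import numpy as np

def detect_anomalies(data, threshold=3):
    """Return the indices of values whose |z-score| exceeds threshold."""
    mean = np.mean(data)
    std = np.std(data)
    if std == 0:  # all values identical: nothing stands out
        return []
    anomalies = []
    for i, value in enumerate(data):
        z_score = (value - mean) / std
        if abs(z_score) > threshold:
            anomalies.append(i)
    return anomalies

# Find anomalies
sales = [100, 105, 98, 102, 500, 97]  # 500 is the anomaly
# With only 6 points, |z| cannot exceed sqrt(5) ≈ 2.24, so use a lower threshold
anomalies = detect_anomalies(sales, threshold=2)
print(f"Anomalies at: {anomalies}")  # [4]
```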
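A caveat for very small samples: the outlier itself inflates the mean and the standard deviation, so with the population standard deviation (the `np.std` default) the largest possible |z| among n points is sqrt(n - 1). The sketch below checks this on the sales data using only the standard library; `z_scores` is an illustrative helper, not part of the code above.

```python
import math
import statistics

def z_scores(data):
    """Return the z-score of every value, using the population standard deviation."""
    mean = statistics.mean(data)
    std = statistics.pstdev(data)
    if std == 0:
        return [0.0] * len(data)
    return [(value - mean) / std for value in data]

sales = [100, 105, 98, 102, 500, 97]
scores = z_scores(sales)
max_abs_z = max(abs(z) for z in scores)
bound = math.sqrt(len(sales) - 1)  # largest |z| any single point can reach

print(f"largest |z|: {max_abs_z:.2f}")
print(f"upper bound sqrt(n - 1): {bound:.2f}")
```

With six values, even the 500 reading stays below a threshold of 3, so a lower threshold or more data is needed for the rule to fire.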

Isolation Forest

A popular ML method that isolates points with random splits; anomalies need fewer splits to isolate:

```python
from sklearn.ensemble import IsolationForest

# X_train, X_test: NumPy arrays of shape (n_samples, n_features)
# Train on (mostly) normal data
model = IsolationForest(contamination=0.1)  # expect ~10% anomalies
model.fit(X_train)

# Predict: -1 = anomaly, 1 = normal
predictions = model.predict(X_test)

anomalies = X_test[predictions == -1]
print(f"Found {len(anomalies)} anomalies")
```
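`predict` only returns hard labels. To rank points by how anomalous they look, `score_samples` returns a continuous score, where lower means more anomalous. A small sketch on synthetic data; the data, the seed, and the variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.0, scale=1.0, size=(500, 2))  # synthetic "normal" data
X_new = np.array([[0.1, -0.2],   # close to the training cloud
                  [8.0, 8.0]])   # far away from it

model = IsolationForest(contamination=0.05, random_state=0)
model.fit(X_train)

labels = model.predict(X_new)        # 1 = normal, -1 = anomaly
scores = model.score_samples(X_new)  # lower = more anomalous

for point, label, score in zip(X_new, labels, scores):
    print(point, label, round(float(score), 3))
```

Sorting by score lets you review the most suspicious points first instead of relying on a single cut-off.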

One-Class SVM

```python
from sklearn.svm import OneClassSVM

# Train only on normal data
# nu = upper bound on the fraction of training points treated as outliers
model = OneClassSVM(nu=0.1)
model.fit(X_normal)

# Detect anomalies in new data
predictions = model.predict(X_test)  # -1 = anomaly, 1 = normal
```
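One-Class SVM with the default RBF kernel is sensitive to feature scale, so it is common to standardize the features first. A minimal sketch, assuming synthetic data with two features on very different scales; the data and names are illustrative only.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Two synthetic features on very different scales
X_normal = np.column_stack([
    rng.normal(1000.0, 100.0, size=300),
    rng.normal(0.5, 0.05, size=300),
])

# Standardize first, then fit the One-Class SVM on the scaled features
model = make_pipeline(StandardScaler(), OneClassSVM(nu=0.05, gamma="scale"))
model.fit(X_normal)

X_test = np.array([[1010.0, 0.51],   # typical point
                   [1000.0, 2.00]])  # second feature far outside the usual range
predictions = model.predict(X_test)  # 1 = normal, -1 = anomaly
print(predictions)
```

Wrapping both steps in one pipeline ensures new data is scaled with the statistics learned from the training data.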

Autoencoder Approach

A neural network trained to reconstruct its input; inputs it reconstructs poorly are flagged as anomalies:

```python
import numpy as np
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense

features = X_normal.shape[1]  # number of input features

# Build autoencoder
input_layer = Input(shape=(features,))
encoded = Dense(32, activation='relu')(input_layer)
encoded = Dense(16, activation='relu')(encoded)
decoded = Dense(32, activation='relu')(encoded)
decoded = Dense(features, activation='linear')(decoded)

autoencoder = Model(input_layer, decoded)
autoencoder.compile(optimizer='adam', loss='mse')

# Train on normal data (input and target are the same)
autoencoder.fit(X_normal, X_normal, epochs=50)

# Anomalies = high reconstruction error
reconstructed = autoencoder.predict(X_test)
errors = np.mean(np.abs(X_test - reconstructed), axis=1)

# Flag the 5% of test points with the largest error
threshold = np.percentile(errors, 95)
anomalies = errors > threshold
```
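Taking the 95th percentile of the test errors flags 5% of the test set no matter how many true anomalies it contains. One alternative is to set the threshold from reconstruction errors on held-out normal data. A sketch with NumPy only, where the error arrays are simulated stand-ins for real autoencoder output and the 99th percentile is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated reconstruction errors (stand-ins for autoencoder output)
errors_normal_val = rng.gamma(shape=2.0, scale=0.05, size=1000)  # held-out normal data
errors_test = np.concatenate([
    rng.gamma(shape=2.0, scale=0.05, size=95),  # normal-looking test points
    rng.uniform(1.0, 2.0, size=5),              # a few badly reconstructed points
])

# Threshold comes from normal data only, so it does not depend on the test set
threshold = np.percentile(errors_normal_val, 99)
is_anomaly = errors_test > threshold

print(f"threshold: {threshold:.3f}")
print(f"flagged {is_anomaly.sum()} of {len(errors_test)} test points")
```

With this approach the number of flagged points can vary with the data, including zero when nothing looks unusual.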

Local Outlier Factor

Finds points whose local density is much lower than that of their neighbors:

```python
from sklearn.neighbors import LocalOutlierFactor

lof = LocalOutlierFactor(n_neighbors=20)
predictions = lof.fit_predict(X)  # -1 = anomaly, 1 = normal
```
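`fit_predict` labels only the data it was fitted on. To score unseen points, scikit-learn's `LocalOutlierFactor` accepts `novelty=True`, which enables `predict` on new data. A sketch on synthetic data; the data and variable names are assumptions.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(200, 2))  # synthetic normal data

# novelty=True: fit on training data, then call predict on unseen data
lof = LocalOutlierFactor(n_neighbors=20, novelty=True)
lof.fit(X_train)

X_new = np.array([[0.0, 0.0],     # inside the dense region
                  [10.0, 10.0]])  # far from every neighbor
predictions = lof.predict(X_new)  # 1 = normal, -1 = anomaly
print(predictions)
```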

Real Example - Credit Card Fraud

```python
from sklearn.ensemble import IsolationForest

# Transaction features: amount, location, time, etc.
model = IsolationForest(contamination=0.01)  # expect ~1% fraud
model.fit(transactions)

# Check new transaction
new_transaction = [[250, 2, 1545]]  # [amount, location_id, time]
is_fraud = model.predict(new_transaction)  # array with one label

if is_fraud[0] == -1:
    print("⚠️ Potential fraud detected!")
```

Evaluation

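Evaluation is hard because anomalies are rare: a model that never flags anything can still reach high accuracy.

Use precision, recall, and F1-score instead of accuracy.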
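To see why accuracy misleads on rare anomalies, here is a small pure-Python sketch with made-up labels. It uses 1 = anomaly and 0 = normal (not the -1/1 convention of scikit-learn's `predict`), and `precision_recall_f1` is an illustrative helper.

```python
def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the anomaly class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Made-up example: 100 points, 2 true anomalies at the end
y_true = [0] * 98 + [1, 1]
y_never = [0] * 100                   # detector that never flags anything
y_model = [0] * 96 + [1, 1] + [1, 0]  # 2 false alarms, 1 hit, 1 miss

accuracy_never = sum(t == p for t, p in zip(y_true, y_never)) / len(y_true)
print(f"never-flag accuracy: {accuracy_never:.2f}")
print("never-flag P/R/F1:", precision_recall_f1(y_true, y_never))
print("model P/R/F1:", precision_recall_f1(y_true, y_model))
```

The never-flag detector looks accurate yet has zero recall, while the imperfect model is the only one that catches an anomaly at all.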

Remember

- Train on mostly normal data
- Isolation Forest is often a good first choice
- Expect a high rate of false positives
- Combine model output with business rules
