Explainable AI (XAI)
Make AI decisions interpretable and transparent, so people can understand why a model produced a given output.
Why Explainability Matters
- **Trust**: Users need to understand decisions before they rely on them
- **Debugging**: Surface model mistakes and data issues
- **Compliance**: Regulations such as the GDPR require explanations for automated decisions
- **Fairness**: Detect and address bias
Types of Explanations
- **Global**: How does the model work overall? (see the sketch below)
- **Local**: Why did the model make this specific prediction?
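For instance, permutation importance gives a global view of which features matter on average, while methods like LIME and SHAP (below) answer the local question for a single row. A minimal sketch, assuming a fitted scikit-learn `model` and the `X_test`/`y_test` split used in the later examples:

```python
from sklearn.inspection import permutation_importance

# Global: average drop in score when each feature is shuffled
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, score in sorted(zip(X_test.columns, result.importances_mean),
                          key=lambda pair: -pair[1]):
    print(f"{name}: {score:.3f}")
```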
LIME (Local Interpretable Model-agnostic Explanations)
LIME explains any classifier or regressor by fitting a simple, interpretable surrogate to the black box's behavior around a single prediction:
```python
from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import RandomForestClassifier

# Train a black-box model
model = RandomForestClassifier()
model.fit(X_train, y_train)

# Create the explainer
explainer = LimeTabularExplainer(
    X_train.values,
    feature_names=list(X_train.columns),
    class_names=['Rejected', 'Approved'],
    mode='classification'
)

# Explain a single prediction
i = 100
explanation = explainer.explain_instance(
    X_test.iloc[i].values,
    model.predict_proba,
    num_features=5
)

# Show the explanation (in a notebook)
explanation.show_in_notebook()

# Get feature contributions
explanation.as_list()
# [('age > 30', 0.45), ('income > 50000', 0.32), ...]
```
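Under the hood, LIME perturbs the instance, queries the black box on the perturbations, weights them by proximity, and fits a simple weighted linear surrogate whose coefficients become the explanation. A minimal sketch of that idea (not the library's actual implementation; reuses `model`, `X_train`, `X_test`, and `i` from above):

```python
import numpy as np
from sklearn.linear_model import Ridge

x = X_test.iloc[i].values
rng = np.random.default_rng(0)

# 1. Perturb: sample points around the instance
samples = x + rng.normal(scale=X_train.std().values, size=(1000, len(x)))

# 2. Query the black-box model on the perturbed points
preds = model.predict_proba(samples)[:, 1]

# 3. Weight samples by proximity to the original instance
distances = np.linalg.norm((samples - x) / X_train.std().values, axis=1)
weights = np.exp(-(distances ** 2) / 2)

# 4. Fit an interpretable surrogate; its coefficients are the local explanation
surrogate = Ridge(alpha=1.0).fit(samples, preds, sample_weight=weights)
for name, coef in zip(X_train.columns, surrogate.coef_):
    print(f"{name}: {coef:+.3f}")
```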
SHAP (SHapley Additive exPlanations)
SHAP assigns each feature a Shapley value from cooperative game theory: its average marginal contribution to the prediction across all possible feature coalitions.
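For a prediction function $f$ and feature set $F$, the Shapley value of feature $i$ is

$$
\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F|-|S|-1)!}{|F|!}\,\bigl[f_{S \cup \{i\}}(x_{S \cup \{i\}}) - f_S(x_S)\bigr]
$$

i.e. feature $i$'s contribution averaged over all coalitions $S$ of the other features. `TreeExplainer` computes these values efficiently for tree ensembles: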
```python
import shap
from xgboost import XGBClassifier

# Train a model
model = XGBClassifier()
model.fit(X_train, y_train)

# Create the explainer
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Visualizations

# 1. Force plot (single prediction)
shap.force_plot(
    explainer.expected_value,
    shap_values[0],
    X_test.iloc[0]
)

# 2. Summary plot (all predictions)
shap.summary_plot(shap_values, X_test)

# 3. Dependence plot (effect of one feature)
shap.dependence_plot('age', shap_values, X_test)

# 4. Waterfall plot (single prediction, feature by feature)
shap.waterfall_plot(shap.Explanation(
    values=shap_values[0],
    base_values=explainer.expected_value,
    data=X_test.iloc[0]
))
```
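The same SHAP values also give a global picture when aggregated across the test set, e.g. a mean-|SHAP| ranking (reusing `shap_values` and `X_test` from above):

```python
import numpy as np
import shap

# Bar chart of mean |SHAP value| per feature
shap.summary_plot(shap_values, X_test, plot_type='bar')

# Or compute the ranking directly
global_importance = np.abs(shap_values).mean(axis=0)
print(sorted(zip(X_test.columns, global_importance), key=lambda pair: -pair[1]))
```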
Grad-CAM for Images
Visualize which parts of an image a CNN focuses on when it makes a prediction:
```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model

def make_gradcam_heatmap(img_array, model, last_conv_layer_name):
    # Model mapping the input image to (last conv activations, predictions)
    grad_model = Model(
        inputs=model.inputs,
        outputs=[model.get_layer(last_conv_layer_name).output, model.output]
    )

    with tf.GradientTape() as tape:
        conv_outputs, predictions = grad_model(img_array)
        class_channel = predictions[:, np.argmax(predictions[0])]

    # Gradient of the predicted class w.r.t. the conv layer
    grads = tape.gradient(class_channel, conv_outputs)

    # Global average pooling of the gradients -> per-channel weights
    pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))

    # Weight conv outputs by the pooled gradients
    conv_outputs = conv_outputs[0]
    heatmap = conv_outputs @ pooled_grads[..., tf.newaxis]
    heatmap = tf.squeeze(heatmap)

    # Normalize to [0, 1]
    heatmap = tf.maximum(heatmap, 0) / tf.math.reduce_max(heatmap)
    return heatmap.numpy()

# Use it (img_array is the preprocessed, batched image)
heatmap = make_gradcam_heatmap(img_array, model, 'conv5_block3_out')

# Overlay on the original image
import cv2
import matplotlib.pyplot as plt

heatmap = cv2.resize(heatmap, (img.shape[1], img.shape[0]))
plt.imshow(img)
plt.imshow(heatmap, alpha=0.4, cmap='jet')
plt.show()

# Shows: the model focused on the cat's face to classify the image as "cat"
```
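For context, here is one way `model`, `img`, and `img_array` above could be set up, assuming a Keras ResNet50 (whose last conv block is named `conv5_block3_out`) and a hypothetical image file `cat.jpg`:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

model = ResNet50(weights='imagenet')

# Load and preprocess the image ('cat.jpg' is a placeholder path)
img = np.array(tf.keras.utils.load_img('cat.jpg', target_size=(224, 224)))
img_array = preprocess_input(img[np.newaxis].astype('float32'))
```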
Attention Visualization
For Transformers, visualize which tokens attend to which:
```python
import torch
import seaborn as sns
import matplotlib.pyplot as plt
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased', output_attentions=True)

text = "The restaurant in Austin was amazing"
inputs = tokenizer(text, return_tensors='pt')

# Get attention weights
with torch.no_grad():
    outputs = model(**inputs)
attention = outputs.attentions  # tuple with one tensor per layer: (batch, heads, tokens, tokens)

# Visualize attention from the last layer, averaged over heads
tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])
last_layer_attention = attention[-1][0].mean(dim=0).numpy()

plt.figure(figsize=(10, 10))
sns.heatmap(last_layer_attention, xticklabels=tokens, yticklabels=tokens)
plt.title("Attention Weights")
plt.show()

# Shows: "amazing" attends to "restaurant" and "Austin"
```
Counterfactual Explanations
"What changes would change the prediction?"
```python
from dice_ml import Data, Model, Dice

# Setup
d = Data(dataframe=df, continuous_features=['age', 'income'], outcome_name='approved')
m = Model(model=model, backend='sklearn')
exp = Dice(d, m)

# Generate counterfactuals for a rejected applicant
query_instance = df[df['approved'] == 0].drop(columns='approved').iloc[[0]]
dice_exp = exp.generate_counterfactuals(
    query_instance,
    total_CFs=3,
    desired_class=1
)

dice_exp.visualize_as_dataframe()

# Output (illustrative):
# "If age was 35 instead of 28, would be approved"
# "If income was $60k instead of $45k, would be approved"
```
Anchors
High-precision IF-THEN rules that are sufficient to lock in ("anchor") a prediction:
```python
from anchor import anchor_tabular

# Create the explainer
explainer = anchor_tabular.AnchorTabularExplainer(
    class_names=['Rejected', 'Approved'],
    feature_names=list(X_train.columns),
    train_data=X_train.values
)

# Get an anchor: a rule that (almost) guarantees the prediction
explanation = explainer.explain_instance(
    X_test.iloc[0].values,
    model.predict,
    threshold=0.95
)

print('IF %s THEN Approved' % ' AND '.join(explanation.names()))
# "IF age > 30 AND income > 50000 THEN Approved (holds with >= 95% precision)"
```
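Anchors also come with two quality numbers worth reporting alongside the rule; the accessors below follow the `anchor` package's README, so treat the exact method names as an assumption:

```python
# Precision: how often the prediction holds when the rule matches
print('Precision: %.2f' % explanation.precision())

# Coverage: what fraction of the data the rule applies to
print('Coverage: %.2f' % explanation.coverage())
```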
Feature Attribution
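Integrated Gradients, used below, attributes a prediction by accumulating the model's gradients along a straight path from a baseline $x'$ (e.g. all zeros) to the input $x$:

$$
\mathrm{IG}_i(x) = (x_i - x'_i)\int_0^1 \frac{\partial f\bigl(x' + \alpha\,(x - x')\bigr)}{\partial x_i}\,d\alpha
$$

The attributions sum to $f(x) - f(x')$, so each feature gets a share of the difference from the baseline prediction.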
```python
import matplotlib.pyplot as plt
from alibi.explainers import IntegratedGradients

# For neural networks (here a TensorFlow/Keras model)
ig = IntegratedGradients(model)

# Get attributions for one instance
explanation = ig.explain(X_test[0:1])

# Plot feature importance
attributions = explanation.attributions[0][0]
plt.barh(feature_names, attributions)
plt.xlabel('Attribution')
plt.title('Feature Importance')
plt.show()
```
Model Cards
Document model details:
```markdown
<!-- model_card.md -->

## Model Details
- **Developed by**: AI Team, Company Name
- **Model date**: December 2025
- **Model type**: XGBoost Classifier
- **Version**: 1.2

## Intended Use
- **Primary use**: Credit approval decisions
- **Primary users**: Loan officers in Denver office
- **Out-of-scope**: Not for medical decisions

## Training Data
- **Dataset**: Customer data from 2020-2024
- **Size**: 100,000 samples
- **Features**: Age, income, credit score, employment

## Performance
- **Accuracy**: 87%
- **Precision**: 85%
- **Recall**: 82%
- **Tested on**: 20,000 holdout samples

## Limitations
- Lower accuracy for customers < 25 years old
- May not generalize to other regions
- Requires annual retraining

## Ethical Considerations
- Regular bias audits performed
- Explanations provided for all decisions
- Human review for edge cases
```
Tools & Libraries
- **LIME**: Model-agnostic local explanations
- **SHAP**: Global and local explanations
- **Captum**: PyTorch interpretability
- **InterpretML**: Microsoft's library
- **What-If Tool**: Interactive visualizations
Best Practices
1. Use multiple explanation methods and compare them (see the sketch below)
2. Validate explanations with domain experts
3. Document limitations clearly
4. Provide explanations to end users
5. Run regular bias audits
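A quick sketch of point 1, cross-checking two methods on the same row (assumes `shap_values` and `X_test` from the SHAP example, the instance index `i` from the LIME example, and the LIME result stored as `lime_explanation` to avoid the earlier variable reuse):

```python
import numpy as np

# Do LIME and SHAP tell a consistent story about row i?
print("LIME:", lime_explanation.as_list())

order = np.argsort(-np.abs(shap_values[i]))[:5]
print("SHAP:", list(zip(X_test.columns[order], np.round(shap_values[i][order], 3))))
```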
Remember
- Explainability builds trust
- Different stakeholders need different explanations
- No single perfect method
- Combine global and local explanations