Explainable AI (XAI)
Make AI decisions interpretable and transparent, so people can understand why a model produced a given output.
Why Explainability Matters
- **Trust**: Users need to understand decisions before they rely on them
- **Debugging**: Surface model mistakes and data issues
- **Compliance**: Regulations such as the GDPR require explanations for automated decisions
- **Fairness**: Detect and address bias
Types of Explanations
- **Global**: How does the model work overall? (see the sketch below)
- **Local**: Why did the model make this specific prediction?
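For instance, permutation importance gives a global view of which features matter on average, while methods like LIME and SHAP (below) answer the local question for a single row. A minimal sketch, assuming a fitted scikit-learn `model` and the `X_test`/`y_test` split used in the later examples:

```python
from sklearn.inspection import permutation_importance

# Global: average drop in score when each feature is shuffled
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, score in sorted(zip(X_test.columns, result.importances_mean),
                          key=lambda pair: -pair[1]):
    print(f"{name}: {score:.3f}")
```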
LIME (Local Interpretable Model-agnostic Explanations)
LIME explains any classifier or regressor by fitting a simple, interpretable surrogate to the black box's behavior around a single prediction:
```python
from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import RandomForestClassifier

# Train a black-box model
model = RandomForestClassifier()
model.fit(X_train, y_train)

# Create the explainer
explainer = LimeTabularExplainer(
    X_train.values,
    feature_names=list(X_train.columns),
    class_names=['Rejected', 'Approved'],
    mode='classification'
)

# Explain a single prediction
i = 100
explanation = explainer.explain_instance(
    X_test.iloc[i].values,
    model.predict_proba,
    num_features=5
)

# Show the explanation (in a notebook)
explanation.show_in_notebook()

# Get feature contributions
explanation.as_list()
# [('age > 30', 0.45), ('income > 50000', 0.32), ...]
```
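Under the hood, LIME perturbs the instance, queries the black box on the perturbations, weights them by proximity, and fits a simple weighted linear surrogate whose coefficients become the explanation. A minimal sketch of that idea (not the library's actual implementation; reuses `model`, `X_train`, `X_test`, and `i` from above):

```python
import numpy as np
from sklearn.linear_model import Ridge

x = X_test.iloc[i].values
rng = np.random.default_rng(0)

# 1. Perturb: sample points around the instance
samples = x + rng.normal(scale=X_train.std().values, size=(1000, len(x)))

# 2. Query the black-box model on the perturbed points
preds = model.predict_proba(samples)[:, 1]

# 3. Weight samples by proximity to the original instance
distances = np.linalg.norm((samples - x) / X_train.std().values, axis=1)
weights = np.exp(-(distances ** 2) / 2)

# 4. Fit an interpretable surrogate; its coefficients are the local explanation
surrogate = Ridge(alpha=1.0).fit(samples, preds, sample_weight=weights)
for name, coef in zip(X_train.columns, surrogate.coef_):
    print(f"{name}: {coef:+.3f}")
```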
SHAP (SHapley Additive exPlanations)
SHAP assigns each feature a Shapley value from cooperative game theory: its average marginal contribution to the prediction across all possible feature coalitions.
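For a prediction function $f$ and feature set $F$, the Shapley value of feature $i$ is

$$
\phi_i = \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,(|F|-|S|-1)!}{|F|!}\,\bigl[f_{S \cup \{i\}}(x_{S \cup \{i\}}) - f_S(x_S)\bigr]
$$

i.e. feature $i$'s contribution averaged over all coalitions $S$ of the other features. `TreeExplainer` computes these values efficiently for tree ensembles: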
```python
import shap
from xgboost import XGBClassifier

# Train a model
model = XGBClassifier()
model.fit(X_train, y_train)

# Create the explainer
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Visualizations

# 1. Force plot (single prediction)
shap.force_plot(
    explainer.expected_value,
    shap_values[0],
    X_test.iloc[0]
)

# 2. Summary plot (all predictions)
shap.summary_plot(shap_values, X_test)

# 3. Dependence plot (effect of one feature)
shap.dependence_plot('age', shap_values, X_test)

# 4. Waterfall plot (single prediction, feature by feature)
shap.waterfall_plot(shap.Explanation(
    values=shap_values[0],
    base_values=explainer.expected_value,
    data=X_test.iloc[0]
))
```
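The same SHAP values also give a global picture when aggregated across the test set, e.g. a mean-|SHAP| ranking (reusing `shap_values` and `X_test` from above):

```python
import numpy as np
import shap

# Bar chart of mean |SHAP value| per feature
shap.summary_plot(shap_values, X_test, plot_type='bar')

# Or compute the ranking directly
global_importance = np.abs(shap_values).mean(axis=0)
print(sorted(zip(X_test.columns, global_importance), key=lambda pair: -pair[1]))
```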
Grad-CAM for Images
Visualize which parts of an image a CNN focuses on when it makes a prediction:
```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model

def make_gradcam_heatmap(img_array, model, last_conv_layer_name):
    # Model mapping the input image to (last conv activations, predictions)
    grad_model = Model(
        inputs=model.inputs,
        outputs=[model.get_layer(last_conv_layer_name).output, model.output]
    )

    with tf.GradientTape() as tape:
        conv_outputs, predictions = grad_model(img_array)
        class_channel = predictions[:, np.argmax(predictions[0])]

    # Gradient of the predicted class w.r.t. the conv layer
    grads = tape.gradient(class_channel, conv_outputs)

    # Global average pooling of the gradients -> per-channel weights
    pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))

    # Weight conv outputs by the pooled gradients
    conv_outputs = conv_outputs[0]
    heatmap = conv_outputs @ pooled_grads[..., tf.newaxis]
    heatmap = tf.squeeze(heatmap)

    # Normalize to [0, 1]
    heatmap = tf.maximum(heatmap, 0) / tf.math.reduce_max(heatmap)
    return heatmap.numpy()

# Use it (img_array is the preprocessed, batched image)
heatmap = make_gradcam_heatmap(img_array, model, 'conv5_block3_out')

# Overlay on the original image
import cv2
import matplotlib.pyplot as plt

heatmap = cv2.resize(heatmap, (img.shape[1], img.shape[0]))
plt.imshow(img)
plt.imshow(heatmap, alpha=0.4, cmap='jet')
plt.show()

# Shows: the model focused on the cat's face to classify the image as "cat"
```
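For context, here is one way `model`, `img`, and `img_array` above could be set up, assuming a Keras ResNet50 (whose last conv block is named `conv5_block3_out`) and a hypothetical image file `cat.jpg`:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input

model = ResNet50(weights='imagenet')

# Load and preprocess the image ('cat.jpg' is a placeholder path)
img = np.array(tf.keras.utils.load_img('cat.jpg', target_size=(224, 224)))
img_array = preprocess_input(img[np.newaxis].astype('float32'))
```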
Attention Visualization
For Transformers, visualize which tokens attend to which:
```python
import torch
import seaborn as sns
import matplotlib.pyplot as plt
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased', output_attentions=True)

text = "The restaurant in Austin was amazing"
inputs = tokenizer(text, return_tensors='pt')

# Get attention weights
with torch.no_grad():
    outputs = model(**inputs)
attention = outputs.attentions  # tuple with one tensor per layer: (batch, heads, tokens, tokens)

# Visualize attention from the last layer, averaged over heads
tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])
last_layer_attention = attention[-1][0].mean(dim=0).numpy()

plt.figure(figsize=(10, 10))
sns.heatmap(last_layer_attention, xticklabels=tokens, yticklabels=tokens)
plt.title("Attention Weights")
plt.show()

# Shows: "amazing" attends to "restaurant" and "Austin"
```
Counterfactual Explanations
"What changes would change the prediction?"
```python
from dice_ml import Data, Model, Dice

# Setup
d = Data(dataframe=df, continuous_features=['age', 'income'], outcome_name='approved')
m = Model(model=model, backend='sklearn')
exp = Dice(d, m)

# Generate counterfactuals for a rejected applicant
query_instance = df[df['approved'] == 0].drop(columns='approved').iloc[[0]]
dice_exp = exp.generate_counterfactuals(
    query_instance,
    total_CFs=3,
    desired_class=1
)

dice_exp.visualize_as_dataframe()

# Output (illustrative):
# "If age was 35 instead of 28, would be approved"
# "If income was $60k instead of $45k, would be approved"
```
Anchors
High-precision IF-THEN rules that are sufficient to lock in ("anchor") a prediction:
```python
from anchor import anchor_tabular

# Create the explainer
explainer = anchor_tabular.AnchorTabularExplainer(
    class_names=['Rejected', 'Approved'],
    feature_names=list(X_train.columns),
    train_data=X_train.values
)

# Get an anchor: a rule that (almost) guarantees the prediction
explanation = explainer.explain_instance(
    X_test.iloc[0].values,
    model.predict,
    threshold=0.95
)

print('IF %s THEN Approved' % ' AND '.join(explanation.names()))
# "IF age > 30 AND income > 50000 THEN Approved (holds with >= 95% precision)"
```
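Anchors also come with two quality numbers worth reporting alongside the rule; the accessors below follow the `anchor` package's README, so treat the exact method names as an assumption:

```python
# Precision: how often the prediction holds when the rule matches
print('Precision: %.2f' % explanation.precision())

# Coverage: what fraction of the data the rule applies to
print('Coverage: %.2f' % explanation.coverage())
```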
Feature Attribution
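Integrated Gradients, used below, attributes a prediction by accumulating the model's gradients along a straight path from a baseline $x'$ (e.g. all zeros) to the input $x$:

$$
\mathrm{IG}_i(x) = (x_i - x'_i)\int_0^1 \frac{\partial f\bigl(x' + \alpha\,(x - x')\bigr)}{\partial x_i}\,d\alpha
$$

The attributions sum to $f(x) - f(x')$, so each feature gets a share of the difference from the baseline prediction.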
```python
import matplotlib.pyplot as plt
from alibi.explainers import IntegratedGradients

# For neural networks (here a TensorFlow/Keras model)
ig = IntegratedGradients(model)

# Get attributions for one instance
explanation = ig.explain(X_test[0:1])

# Plot feature importance
attributions = explanation.attributions[0][0]
plt.barh(feature_names, attributions)
plt.xlabel('Attribution')
plt.title('Feature Importance')
plt.show()
```
Model Cards
Document model details:
```markdown
<!-- model_card.md -->

## Model Details
- **Developed by**: AI Team, Company Name
- **Model date**: December 2025
- **Model type**: XGBoost Classifier
- **Version**: 1.2

## Intended Use
- **Primary use**: Credit approval decisions
- **Primary users**: Loan officers in Denver office
- **Out-of-scope**: Not for medical decisions

## Training Data
- **Dataset**: Customer data from 2020-2024
- **Size**: 100,000 samples
- **Features**: Age, income, credit score, employment

## Performance
- **Accuracy**: 87%
- **Precision**: 85%
- **Recall**: 82%
- **Tested on**: 20,000 holdout samples

## Limitations
- Lower accuracy for customers < 25 years old
- May not generalize to other regions
- Requires annual retraining

## Ethical Considerations
- Regular bias audits performed
- Explanations provided for all decisions
- Human review for edge cases
```
Tools & Libraries
- **LIME**: Model-agnostic local explanations
- **SHAP**: Global and local explanations
- **Captum**: PyTorch interpretability
- **InterpretML**: Microsoft's library
- **What-If Tool**: Interactive visualizations
Best Practices
1. Use multiple explanation methods and compare them (see the sketch below)
2. Validate explanations with domain experts
3. Document limitations clearly
4. Provide explanations to end users
5. Run regular bias audits
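A quick sketch of point 1, cross-checking two methods on the same row (assumes `shap_values` and `X_test` from the SHAP example, the instance index `i` from the LIME example, and the LIME result stored as `lime_explanation` to avoid the earlier variable reuse):

```python
import numpy as np

# Do LIME and SHAP tell a consistent story about row i?
print("LIME:", lime_explanation.as_list())

order = np.argsort(-np.abs(shap_values[i]))[:5]
print("SHAP:", list(zip(X_test.columns[order], np.round(shap_values[i][order], 3))))
```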
Remember
- Explainability builds trust
- Different stakeholders need different explanations
- No single perfect method
- Combine global and local explanations