Explainable AI (XAI)
Make AI decisions interpretable and transparent.
Dr. Patricia Moore
December 18, 2025
Modern models are often black boxes. Explainable AI (XAI) techniques help you understand how and why they reach their decisions.
Why Explainability Matters
Trust: users are more willing to accept decisions they can understand
Debugging: explanations help you find model mistakes
Compliance: regulations such as the GDPR call for meaningful information about automated decisions
Fairness: explanations help detect and document bias
Types of Explanations
Global: how does the model behave overall, across the whole dataset? (sketched below)
Local: why did the model make this specific prediction? (LIME, SHAP, and most of the methods that follow)
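As a minimal sketch of the global view, scikit-learn's permutation importance measures how much the held-out score drops when each feature is shuffled; the model and the X_train / X_test / y_train / y_test split are placeholders in the same spirit as the examples below.
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
# Fit any estimator; a random forest is used here as a placeholder
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
# Shuffle each feature on held-out data and measure the score drop
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, importance in sorted(zip(X_test.columns, result.importances_mean), key=lambda kv: -kv[1]):
    print(f"{name}: {importance:.3f}")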
LIME (Local Interpretable Model-agnostic Explanations)
Explain any model:
from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import RandomForestClassifier
# Train black-box model
model = RandomForestClassifier()
model.fit(X_train, y_train)
# Create explainer
explainer = LimeTabularExplainer(
    X_train.values,
    feature_names=X_train.columns,
    class_names=['Rejected', 'Approved'],
    mode='classification'
)
# Explain single prediction
i = 100
explanation = explainer.explain_instance(
    X_test.iloc[i].values,
    model.predict_proba,
    num_features=5
)
# Show explanation
explanation.show_in_notebook()
# Get feature contributions
explanation.as_list()
# [('age > 30', 0.45), ('income > 50000', 0.32), ...]
SHAP (SHapley Additive exPlanations)
Based on game theory:
import shap
from xgboost import XGBClassifier
# Train model
model = XGBClassifier()
model.fit(X_train, y_train)
# Create explainer
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
# Visualizations
# 1. Force plot (single prediction)
shap.force_plot(
    explainer.expected_value,
    shap_values[0],
    X_test.iloc[0]
)
# 2. Summary plot (all predictions)
shap.summary_plot(shap_values, X_test)
# 3. Dependence plot (feature effect)
shap.dependence_plot('age', shap_values, X_test)
# 4. Waterfall plot
shap.waterfall_plot(shap.Explanation(
    values=shap_values[0],
    base_values=explainer.expected_value,
    data=X_test.iloc[0].values,
    feature_names=X_test.columns.tolist()
))
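TreeExplainer is specific to tree ensembles. For arbitrary models there is a model-agnostic (and much slower) path; a minimal sketch using KernelExplainer, reusing the model and data from above:
# Summarize the training data into a small background set to keep it tractable
background = shap.sample(X_train, 100)
kernel_explainer = shap.KernelExplainer(model.predict_proba, background)
# Explain a handful of rows; KernelExplainer scales poorly with sample count
kernel_shap_values = kernel_explainer.shap_values(X_test.iloc[:10])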
Grad-CAM for Images
Visualize what CNN focuses on:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
def make_gradcam_heatmap(img_array, model, last_conv_layer_name):
    # Model that maps the input to (last conv layer activations, predictions)
    grad_model = Model(
        inputs=[model.inputs],
        outputs=[model.get_layer(last_conv_layer_name).output, model.output]
    )
    with tf.GradientTape() as tape:
        conv_outputs, predictions = grad_model(img_array)
        class_channel = predictions[:, np.argmax(predictions[0])]
    # Gradient of the predicted class w.r.t. the conv layer output
    grads = tape.gradient(class_channel, conv_outputs)
    # Global average pooling of the gradients
    pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))
    # Weight the conv outputs by the pooled gradients
    conv_outputs = conv_outputs[0]
    heatmap = conv_outputs @ pooled_grads[..., tf.newaxis]
    heatmap = tf.squeeze(heatmap)
    # Normalize to [0, 1]
    heatmap = tf.maximum(heatmap, 0) / tf.math.reduce_max(heatmap)
    return heatmap.numpy()
# Use it: img_array is the preprocessed image batch (shape (1, H, W, 3)), img the original image for display
heatmap = make_gradcam_heatmap(img_array, model, 'conv5_block3_out')
# Overlay on image
import matplotlib.pyplot as plt
import cv2
# Resize the low-resolution heatmap to the image size before overlaying
heatmap = cv2.resize(heatmap, (img.shape[1], img.shape[0]))
plt.imshow(img)
plt.imshow(heatmap, alpha=0.4, cmap='jet')
plt.show()
# Shows: Model focused on cat's face to classify it as "cat"
Attention Visualization
For Transformers:
from transformers import BertTokenizer, BertModel
import torch
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased', output_attentions=True)
text = "The restaurant in Austin was amazing"
inputs = tokenizer(text, return_tensors='pt')
# Get attention weights
outputs = model(**inputs)
attention = outputs.attentions  # tuple of num_layers tensors, each (batch, heads, seq_len, seq_len)
# Visualize attention from the last layer, averaged over heads
import seaborn as sns
import matplotlib.pyplot as plt
tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])
last_layer_attention = attention[-1][0].mean(dim=0).detach().numpy()
plt.figure(figsize=(10, 10))
sns.heatmap(last_layer_attention, xticklabels=tokens, yticklabels=tokens)
plt.title("Attention Weights")
plt.show()
# Shows: "amazing" attends to "restaurant" and "Austin"
Counterfactual Explanations
"What changes would change the prediction?"
from dice_ml import Data, Model, Dice
# Setup
d = Data(dataframe=df, continuous_features=['age', 'income'], outcome_name='approved')
m = Model(model=model, backend='sklearn')
exp = Dice(d, m)
# Generate counterfactuals
query_instance = df[df['approved'] == 0].drop(columns='approved').iloc[[0]]  # one-row DataFrame of features
dice_exp = exp.generate_counterfactuals(
    query_instance,
    total_CFs=3,
    desired_class=1
)
dice_exp.visualize_as_dataframe()
# Output:
# "If age was 35 instead of 28, would be approved"
# "If income was $60k instead of $45k, would be approved"
Anchors
High-precision if-then rules that locally "anchor" a prediction:
from anchor import anchor_tabular
# Create explainer
explainer = anchor_tabular.AnchorTabularExplainer(
    class_names=['Rejected', 'Approved'],
    feature_names=X_train.columns,
    train_data=X_train.values
)
# Get anchor (a rule that holds with at least the requested precision)
explanation = explainer.explain_instance(
    X_test.iloc[0].values,
    model.predict,
    threshold=0.95
)
print(' AND '.join(explanation.names()))
# "age > 30 AND income > 50000" => Approved
print('Precision:', explanation.precision())  # e.g. 0.95
Feature Attribution (Integrated Gradients)
from alibi.explainers import IntegratedGradients
# For neural networks (alibi's IntegratedGradients expects a TensorFlow/Keras model)
ig = IntegratedGradients(model)
# Attributions are computed with respect to a target class, here the predicted one
X = X_test[0:1].values.astype('float32')
preds = model.predict(X).argmax(axis=1)
explanation = ig.explain(X, target=preds)
# Plot per-feature attributions for the first instance
attributions = explanation.attributions[0][0]
feature_names = X_train.columns
plt.barh(feature_names, attributions)
plt.xlabel('Attribution')
plt.title('Feature Importance')
plt.show()
Model Cards
Document model details:
# model_card.md
## Model Details
- **Developed by**: AI Team, Company Name
- **Model date**: December 2025
- **Model type**: XGBoost Classifier
- **Version**: 1.2
## Intended Use
- **Primary use**: Credit approval decisions
- **Primary users**: Loan officers in Denver office
- **Out-of-scope**: Not for medical decisions
## Training Data
- **Dataset**: Customer data from 2020-2024
- **Size**: 100,000 samples
- **Features**: Age, income, credit score, employment
## Performance
- **Accuracy**: 87%
- **Precision**: 85%
- **Recall**: 82%
- **Tested on**: 20,000 holdout samples
## Limitations
- Lower accuracy for customers < 25 years old
- May not generalize to other regions
- Requires annual retraining
## Ethical Considerations
- Regular bias audits performed
- Explanations provided for all decisions
- Human review for edge cases
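Model cards go stale if the numbers are typed in by hand. A minimal sketch of regenerating the performance section from the live holdout set, assuming the fitted model and the X_test / y_test split from the earlier examples; performance_section is a hypothetical helper, not a standard API:
from sklearn.metrics import accuracy_score, precision_score, recall_score
def performance_section(model, X_holdout, y_holdout):
    # Recompute the reported metrics from the actual holdout set
    y_pred = model.predict(X_holdout)
    return (
        "## Performance\n"
        f"- **Accuracy**: {accuracy_score(y_holdout, y_pred):.0%}\n"
        f"- **Precision**: {precision_score(y_holdout, y_pred):.0%}\n"
        f"- **Recall**: {recall_score(y_holdout, y_pred):.0%}\n"
        f"- **Tested on**: {len(X_holdout):,} holdout samples\n"
    )
# Append the regenerated section to the card
with open('model_card.md', 'a') as f:
    f.write(performance_section(model, X_test, y_test))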
Tools & Libraries
- LIME: Model-agnostic local explanations
- SHAP: Global and local explanations
- Captum: PyTorch interpretability (see the sketch below)
- InterpretML: Microsoft's library
- What-If Tool: Interactive visualizations
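As a quick taste of Captum, here is a minimal sketch with a placeholder two-class PyTorch model and random input; only the IntegratedGradients.attribute call is Captum's real API, the model and data are illustrative:
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients
# Placeholder two-class model and input, just to show the API shape
torch_model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
torch_model.eval()
inputs = torch.randn(1, 4)
ig = IntegratedGradients(torch_model)
# Attribute the prediction for class 1 back to the input features
attributions, delta = ig.attribute(inputs, target=1, return_convergence_delta=True)
print(attributions)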
Best Practices
- Use multiple explanation methods and compare them (see the sketch after this list)
- Validate explanations with domain experts
- Document limitations clearly
- Provide explanations to end users
- Regular bias audits
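As a minimal sketch of the first practice, the snippet below builds LIME and SHAP explanations for the same fitted tree model (such as the XGBClassifier above) and the same test row; if the two rankings disagree strongly, dig deeper before trusting either explanation.
from lime.lime_tabular import LimeTabularExplainer
import shap
lime_explainer = LimeTabularExplainer(
    X_train.values,
    feature_names=X_train.columns.tolist(),
    class_names=['Rejected', 'Approved'],
    mode='classification'
)
shap_explainer = shap.TreeExplainer(model)
i = 0
lime_exp = lime_explainer.explain_instance(
    X_test.iloc[i].values, model.predict_proba, num_features=5
)
shap_row = shap_explainer.shap_values(X_test.iloc[[i]])[0]
print("LIME:", lime_exp.as_list())
print("SHAP:", sorted(zip(X_test.columns, shap_row), key=lambda kv: -abs(kv[1]))[:5])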
Remember
- Explainability builds trust
- Different stakeholders need different explanations
- No single perfect method
- Combine global and local explanations
#AI #Advanced #XAI