
Explainable AI (XAI)

Make AI decisions interpretable and transparent.

Dr. Patricia Moore
December 18, 2025

Understand AI decisions.

Why Explainability Matters

Trust: Users need to understand decisions
Debugging: Find model mistakes
Compliance: Laws require explanations (GDPR, etc.)
Fairness: Detect bias

Types of Explanations

Global: How does the model work overall?
Local: Why this specific prediction?
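
In practice, a global explanation ranks features over the whole dataset, while a local one scores them for a single row. As a minimal sketch of the global side, scikit-learn's permutation importance shuffles each feature and measures the drop in score (model, X_test, and y_test are assumed to come from your own training step; the local side is what LIME and SHAP cover below):

from sklearn.inspection import permutation_importance

# Global view: average importance of each feature across the whole test set
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, score in zip(X_test.columns, result.importances_mean):
    print(f"{name}: {score:.3f}")

# Local view: the sections below answer "why this prediction?"
# for a single row such as X_test.iloc[[0]]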

LIME (Local Interpretable Model-agnostic Explanations)

Explain any model:

from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import RandomForestClassifier

# Train black-box model
model = RandomForestClassifier()
model.fit(X_train, y_train)

# Create explainer
explainer = LimeTabularExplainer(
    X_train.values,
    feature_names=X_train.columns,
    class_names=['Rejected', 'Approved'],
    mode='classification'
)

# Explain single prediction
i = 100
explanation = explainer.explain_instance(
    X_test.iloc[i].values,
    model.predict_proba,
    num_features=5
)

# Show explanation
explanation.show_in_notebook()

# Get feature contributions
explanation.as_list()
# [('age > 30', 0.45), ('income > 50000', 0.32), ...]

SHAP (SHapley Additive exPlanations)

Based on game theory:

import shap
from xgboost import XGBClassifier

# Train model
model = XGBClassifier()
model.fit(X_train, y_train)

# Create explainer
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Visualizations

# 1. Force plot (single prediction)
shap.force_plot(
    explainer.expected_value,
    shap_values[0],
    X_test.iloc[0]
)

# 2. Summary plot (all predictions)
shap.summary_plot(shap_values, X_test)

# 3. Dependence plot (feature effect)
shap.dependence_plot('age', shap_values, X_test)

# 4. Waterfall plot
shap.waterfall_plot(shap.Explanation(
    values=shap_values[0],
    base_values=explainer.expected_value,
    data=X_test.iloc[0].values,
    feature_names=X_test.columns.tolist()
))

Grad-CAM for Images

Visualize what CNN focuses on:

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model

def make_gradcam_heatmap(img_array, model, last_conv_layer_name):
    # Gradient model
    grad_model = Model(
        inputs=model.inputs,
        outputs=[model.get_layer(last_conv_layer_name).output, model.output]
    )
    
    with tf.GradientTape() as tape:
        conv_outputs, predictions = grad_model(img_array)
        class_channel = predictions[:, np.argmax(predictions[0])]
    
    # Gradient of class w.r.t. conv layer
    grads = tape.gradient(class_channel, conv_outputs)
    
    # Global average pooling
    pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))
    
    # Weight conv outputs by gradients
    conv_outputs = conv_outputs[0]
    heatmap = conv_outputs @ pooled_grads[..., tf.newaxis]
    heatmap = tf.squeeze(heatmap)
    
    # Normalize
    heatmap = tf.maximum(heatmap, 0) / tf.math.reduce_max(heatmap)
    return heatmap.numpy()

# Use it (img must already be preprocessed; add a batch dimension for the model)
heatmap = make_gradcam_heatmap(np.expand_dims(img, axis=0), model, 'conv5_block3_out')

# Overlay on image
import matplotlib.pyplot as plt
import cv2

# Resize the low-resolution heatmap to the image size, then overlay it
heatmap = cv2.resize(heatmap, (img.shape[1], img.shape[0]))
plt.imshow(img)
plt.imshow(heatmap, alpha=0.4, cmap='jet')
plt.show()

# Shows: Model focused on cat's face to classify it as "cat"

Attention Visualization

For Transformers:

from transformers import BertTokenizer, BertModel
import torch

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased', output_attentions=True)

text = "The restaurant in Austin was amazing"
inputs = tokenizer(text, return_tensors='pt')

# Get attention weights
outputs = model(**inputs)
attention = outputs.attentions  # tuple of num_layers tensors, each (batch, heads, tokens, tokens)

# Visualize attention from last layer
import seaborn as sns

tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])
last_layer_attention = attention[-1][0].mean(dim=0).detach().numpy()

plt.figure(figsize=(10, 10))
sns.heatmap(last_layer_attention, xticklabels=tokens, yticklabels=tokens)
plt.title("Attention Weights")
plt.show()

# Shows: "amazing" attends to "restaurant" and "Austin"

Counterfactual Explanations

"What changes would change the prediction?"

from dice_ml import Data, Model, Dice

# Setup
d = Data(dataframe=df, continuous_features=['age', 'income'], outcome_name='approved')
m = Model(model=model, backend='sklearn')
exp = Dice(d, m)

# Generate counterfactuals for a rejected applicant
# (DiCE expects a one-row DataFrame without the outcome column)
query_instance = df[df['approved'] == 0].drop(columns='approved').iloc[[0]]
dice_exp = exp.generate_counterfactuals(
    query_instance,
    total_CFs=3,
    desired_class=1
)

dice_exp.visualize_as_dataframe()

# Output:
# "If age was 35 instead of 28, would be approved"
# "If income was $60k instead of $45k, would be approved"

Anchors

High-precision rules:

from anchor import anchor_tabular

# Create explainer
explainer = anchor_tabular.AnchorTabularExplainer(
    class_names=['Rejected', 'Approved'],
    feature_names=X_train.columns,
    train_data=X_train.values
)

# Get anchor (rule that guarantees prediction)
explanation = explainer.explain_instance(
    X_test.iloc[0].values,
    model.predict,
    threshold=0.95
)

print(' AND '.join(explanation.names()))
print('Precision: %.2f' % explanation.precision())
# "age > 30 AND income > 50000" -> Approved, precision ~0.95

Feature Attribution

Attribute predictions to input features with Integrated Gradients:

from alibi.explainers import IntegratedGradients
import numpy as np

# For neural networks (here `model` must be a tf.keras model)
ig = IntegratedGradients(model)

# Get attributions for the predicted class of one row
X_row = X_test.iloc[0:1].values
target = np.argmax(model.predict(X_row), axis=1)
explanation = ig.explain(X_row, target=target)

# Plot feature importance
attributions = explanation.attributions[0][0]
plt.barh(X_test.columns, attributions)
plt.xlabel('Attribution')
plt.title('Feature Importance')
plt.show()

Model Cards

Document model details:

# model_card.md

## Model Details
- **Developed by**: AI Team, Company Name
- **Model date**: December 2025
- **Model type**: XGBoost Classifier
- **Version**: 1.2

## Intended Use
- **Primary use**: Credit approval decisions
- **Primary users**: Loan officers in Denver office
- **Out-of-scope**: Not for medical decisions

## Training Data
- **Dataset**: Customer data from 2020-2024
- **Size**: 100,000 samples
- **Features**: Age, income, credit score, employment

## Performance
- **Accuracy**: 87%
- **Precision**: 85%
- **Recall**: 82%
- **Tested on**: 20,000 holdout samples

## Limitations
- Lower accuracy for customers < 25 years old
- May not generalize to other regions
- Requires annual retraining

## Ethical Considerations
- Regular bias audits performed
- Explanations provided for all decisions
- Human review for edge cases

Tools & Libraries

  • LIME: Model-agnostic local explanations
  • SHAP: Global and local explanations
  • Captum: PyTorch interpretability (see the sketch after this list)
  • InterpretML: Microsoft's library
  • What-If Tool: Interactive visualizations
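
Captum is the only library in this list not shown earlier, so here is a minimal sketch of its IntegratedGradients attributor; the two-layer PyTorch network and random input are illustrative stand-ins, not the article's credit model:

import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# Toy two-layer classifier (illustrative only)
net = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
net.eval()

x = torch.rand(1, 4)  # one input row with 4 features
ig = IntegratedGradients(net)

# Per-feature contribution to the class-1 logit, plus a convergence check
attributions, delta = ig.attribute(x, target=1, return_convergence_delta=True)
print(attributions)   # contribution of each input feature
print(delta)          # should be close to 0 if the approximation converged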

Best Practices

  1. Use multiple explanation methods (see the sketch after this list)
  2. Validate explanations with domain experts
  3. Document limitations clearly
  4. Provide explanations to end users
  5. Run regular bias audits
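
One way to act on practices 1 and 2 is to cross-check two methods on the same prediction before showing anyone the result. A rough sketch, assuming the LIME explainer (`explainer`) and SHAP values (`shap_values`) from the sections above were built for the same trained model:

# Cross-check LIME and SHAP on one prediction
i = 0

lime_exp = explainer.explain_instance(
    X_test.iloc[i].values, model.predict_proba, num_features=5
)
print("LIME:", lime_exp.as_list())

top_shap = sorted(
    zip(X_test.columns, shap_values[i]), key=lambda t: abs(t[1]), reverse=True
)[:5]
print("SHAP:", top_shap)

# If the two rankings disagree badly, review with a domain expert
# before surfacing either explanation to end users.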

Remember

  • Explainability builds trust
  • Different stakeholders need different explanations
  • No single perfect method
  • Combine global and local explanations
#AI #Advanced #XAI