# XGBoost + SHAP: Building Explainable ML Models That Work in Production

How to train XGBoost models that reach 90%+ R² and explain their predictions using SHAP: a complete pipeline from feature engineering to production API deployment.

XGBoost remains one of the most reliable models for tabular data — it's fast, handles missing values gracefully, and consistently outperforms deep learning on structured datasets. Combined with SHAP, it becomes truly production-ready: you get accuracy and explainability.
This post walks through the complete pipeline from the F1 2025 Performance Analytics system, which achieved 92.4% R² for lap-time prediction.
## Why XGBoost Still Wins on Tabular Data
Despite the deep learning boom, XGBoost and its variants (LightGBM, CatBoost) consistently win Kaggle tabular competitions. The reasons are practical:
- Handles missing values natively — no imputation needed
- Built-in L1/L2 regularization prevents overfitting
- Fast training on CPU — no GPU required for most tabular datasets
- Excellent calibration with `eval_metric='logloss'` on classification tasks
- Works well with heterogeneous features (mix of continuous, categorical, binary)
## Training: The Full Pipeline
```python
import numpy as np
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import cross_val_score, KFold


def train_xgboost_pipeline(X_train: pd.DataFrame,
                           y_train: pd.Series,
                           X_val: pd.DataFrame,
                           y_val: pd.Series) -> xgb.XGBRegressor:
    model = xgb.XGBRegressor(
        n_estimators=500,
        max_depth=6,
        learning_rate=0.05,
        subsample=0.8,
        colsample_bytree=0.8,
        reg_alpha=0.1,              # L1 regularization
        reg_lambda=1.0,             # L2 regularization
        min_child_weight=3,
        early_stopping_rounds=50,
        eval_metric="rmse",
        random_state=42,
        n_jobs=-1,
    )
    model.fit(
        X_train, y_train,
        eval_set=[(X_val, y_val)],
        verbose=100,
    )

    # Cross-validation for robust evaluation. Early stopping requires an
    # eval_set, which cross_val_score's internal fits don't supply, so we
    # clone the hyperparameters without it for the CV model.
    cv_model = xgb.XGBRegressor(
        **{**model.get_params(), "early_stopping_rounds": None}
    )
    cv_scores = cross_val_score(
        cv_model, X_train, y_train,
        cv=KFold(n_splits=5, shuffle=True, random_state=42),
        scoring="r2",
    )
    print(f"CV R² scores: {cv_scores}")
    print(f"Mean R²: {cv_scores.mean():.4f} ± {cv_scores.std():.4f}")
    return model
```
## SHAP: Why the Model Predicts What It Predicts
SHAP (SHapley Additive exPlanations) assigns each feature a contribution value for each individual prediction. It answers: "for this specific prediction, how much did each feature push the output up or down?"
```python
import shap
import matplotlib.pyplot as plt


def explain_model(model: xgb.XGBRegressor,
                  X: pd.DataFrame) -> tuple[shap.TreeExplainer, pd.DataFrame]:
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)

    # Summary plot — global feature importance
    shap.summary_plot(shap_values, X, show=False)
    plt.savefig("shap_summary.png", dpi=150, bbox_inches="tight")
    plt.close()

    # Mean absolute SHAP values — ranked importance
    importance_df = pd.DataFrame({
        "feature": X.columns,
        "mean_abs_shap": np.abs(shap_values).mean(axis=0),
    }).sort_values("mean_abs_shap", ascending=False)
    return explainer, importance_df


def explain_single_prediction(explainer: shap.TreeExplainer,
                              X_row: pd.DataFrame):
    """Explain why the model made a specific prediction."""
    shap_values = explainer.shap_values(X_row)
    shap.waterfall_plot(
        shap.Explanation(
            values=shap_values[0],
            base_values=explainer.expected_value,
            data=X_row.iloc[0].values,
            feature_names=X_row.columns.tolist(),
        )
    )
```
## What SHAP Revealed About F1 Lap Times
The SHAP analysis of the F1 model validated domain expertise and revealed surprises:
- Tire age (top feature) — contributes ~0.4s per lap after lap 15
- Fuel load (second) — contributes ~0.07s per lap per kg
- Track temperature (third) — nonlinear, peaks at 38°C
- Sector 2 consistency (surprise) — high variance in S2 predicts overall slower pace better than any single sector time
## Serving SHAP Explanations via API
```python
import asyncio

import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# model and explainer are loaded once at startup
# (outputs of train_xgboost_pipeline and explain_model above).

class PredictRequest(BaseModel):
    features: dict


@app.post("/predict-explain")
async def predict_with_explanation(req: PredictRequest):
    X = pd.DataFrame([req.features])

    # Model inference and SHAP are CPU-bound; run them off the event loop.
    loop = asyncio.get_event_loop()
    prediction = await loop.run_in_executor(None, model.predict, X)
    shap_vals = await loop.run_in_executor(None, explainer.shap_values, X)

    # Top 3 contributing features for this prediction
    feature_contributions = dict(zip(
        X.columns,
        shap_vals[0].tolist(),
    ))
    top_factors = sorted(
        feature_contributions.items(),
        key=lambda x: abs(x[1]),
        reverse=True,
    )[:3]

    return {
        "prediction": float(prediction[0]),
        "explanation": {
            "base_value": float(explainer.expected_value),
            "top_factors": [{"feature": k, "impact": v}
                            for k, v in top_factors],
        },
    }
```


