Feature Engineering for Production ML: From Raw Data to Deployable Signals
A practical guide to building production-grade feature engineering pipelines — covering schema validation, rolling features, drift detection, and reproducible experimentation workflows.

Feature engineering is where most ML projects actually win or lose. A state-of-the-art model trained on weak features will underperform a simple model trained on excellent ones. Yet most tutorials treat feature engineering as a Jupyter notebook exercise, not as a production engineering challenge.
This post covers the feature pipeline architecture I've built across multiple production ML systems, including the F1 race strategy platform and the TTA Engine at Edza.ai.
The 5 Layers of a Production Feature Pipeline
- Ingestion + Schema Validation — catch bad data before it poisons models
- Cleaning + Normalization — handle nulls, outliers, unit consistency
- Feature Construction — domain-specific transformations
- Feature Store — versioned, shareable, consistent across train and serve
- Drift Monitoring — detect when the real world stops matching training data
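Concretely, each layer is a DataFrame-in, DataFrame-out stage composed in order. A minimal composition sketch (the stage names in the example wiring are hypothetical; concrete implementations of layers 1, 3, and 5 follow below):

from typing import Callable, List
import pandas as pd

# Each layer is a pure DataFrame -> DataFrame function
Stage = Callable[[pd.DataFrame], pd.DataFrame]

def run_pipeline(raw: pd.DataFrame, stages: List[Stage]) -> pd.DataFrame:
    # Apply each layer in order; every stage validates, transforms, or monitors
    df = raw
    for stage in stages:
        df = stage(df)
    return df

# Example wiring (stage functions would be defined as in the sections below):
# features = run_pipeline(raw_laps, [validate, clean, engineer, store, monitor])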
Layer 1: Schema Validation Before Anything Else
Every feature pipeline should start with schema validation. Garbage in, garbage out — and garbage is silent without explicit checks:
import pandas as pd
from dataclasses import dataclass
from typing import Dict, List, Tuple, Type

@dataclass
class FeatureSchema:
    columns: Dict[str, Type]                       # expected dtype per column
    required: List[str]                            # columns that must be present
    value_ranges: Dict[str, Tuple[float, float]]   # inclusive (lo, hi) bounds

def validate_schema(df: pd.DataFrame, schema: FeatureSchema) -> pd.DataFrame:
    # Check required columns exist
    missing = [c for c in schema.required if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    # Type coercion
    for col, dtype in schema.columns.items():
        if col in df.columns:
            df[col] = df[col].astype(dtype)

    # Range validation: warn, then drop out-of-range rows
    for col, (lo, hi) in schema.value_ranges.items():
        if col not in df.columns:
            continue
        violations = df[(df[col] < lo) | (df[col] > hi)]
        if len(violations) > 0:
            print(f"Warning: {len(violations)} rows outside range for {col}")
            df = df[(df[col] >= lo) & (df[col] <= hi)]
    return df
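A quick usage sketch; the column names and bounds here are hypothetical lap-data values, not a real schema:

lap_schema = FeatureSchema(
    columns={"lap_time": float, "fuel_load": float, "stint": int},
    required=["driver", "lap_time"],
    value_ranges={"lap_time": (60.0, 150.0), "fuel_load": (0.0, 110.0)},
)

raw = pd.DataFrame({
    "driver": ["VER", "VER", "HAM"],
    "lap_time": [92.3, 400.0, 93.1],   # 400.0 is out of range and will be dropped
    "fuel_load": [100.0, 99.0, 100.0],
    "stint": [1, 1, 1],
})
clean = validate_schema(raw, lap_schema)  # warns, returns the 2 valid rows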
Layer 3: Domain-Specific Feature Construction
The best features encode domain knowledge. For the F1 performance system, raw lap times are nearly useless on their own: fuel load and tire wear confound them, so we need engineered signals:
import numpy as np
import pandas as pd

def engineer_f1_features(df: pd.DataFrame) -> pd.DataFrame:
    # Tire degradation slope: how fast lap time rises per lap on the current stint
    df['tire_deg_slope'] = df.groupby(['driver', 'stint'])['lap_time'].transform(
        lambda x: np.polyfit(range(len(x)), x, 1)[0] if len(x) > 2 else 0.0
    )

    # Fuel-adjusted pace: remove the fuel-load effect on lap time
    FUEL_EFFECT = 0.07  # seconds per lap per kg of fuel
    df['fuel_adjusted_pace'] = df['lap_time'] - (df['fuel_load'] * FUEL_EFFECT)

    # Rolling lap delta: positive when the current lap beats the 3-lap rolling mean
    df['rolling_delta_3'] = df.groupby('driver')['lap_time'].transform(
        lambda x: x.rolling(3, min_periods=1).mean() - x
    )

    # Sector consistency: per-stint std of sector 1 times (lower = more consistent)
    df['sector_consistency'] = df.groupby(['driver', 'stint'])['sector_1_time'].transform('std')
    return df
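A tiny smoke test with synthetic laps (hypothetical numbers) makes the expected behavior concrete, and doubles as the kind of unit test every feature should have:

laps = pd.DataFrame({
    "driver": ["VER"] * 4,
    "stint": [1, 1, 1, 1],
    "lap_time": [90.0, 90.5, 91.0, 91.5],   # degrading by exactly 0.5 s/lap
    "fuel_load": [100.0, 98.0, 96.0, 94.0],
    "sector_1_time": [28.0, 28.1, 28.0, 28.1],
})
out = engineer_f1_features(laps)
assert abs(out["tire_deg_slope"].iloc[0] - 0.5) < 1e-6  # slope recovered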
Layer 5: Feature Drift Detection
Models degrade silently when the feature distribution shifts. Statistical drift detection using the Population Stability Index (PSI) catches this automatically:
import numpy as np

def population_stability_index(expected: np.ndarray,
                               actual: np.ndarray,
                               buckets: int = 10) -> float:
    """
    PSI < 0.1 → No significant change
    PSI 0.1–0.2 → Moderate change, investigate
    PSI > 0.2 → Significant shift, retrain
    """
    # Both histograms must share the same bin edges, or they aren't
    # comparable; span the combined range of both samples.
    edges = np.histogram_bin_edges(np.concatenate([expected, actual]), bins=buckets)
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)

    # Avoid log(0) and division by zero in empty buckets
    expected_pct = np.where(expected_pct == 0, 0.0001, expected_pct)
    actual_pct = np.where(actual_pct == 0, 0.0001, actual_pct)

    psi = np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct))
    return float(psi)
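For example, comparing a training baseline against a shifted production sample (synthetic data, illustrative only):

rng = np.random.default_rng(42)
train_pace = rng.normal(91.0, 0.8, size=10_000)  # training distribution
prod_pace = rng.normal(92.0, 0.8, size=10_000)   # production shifted by +1.0s

psi = population_stability_index(train_pace, prod_pace)
if psi > 0.2:
    print(f"PSI={psi:.2f}: significant shift, trigger retraining")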
Reproducibility: The Hidden Production Requirement
Features computed during training must be identical during inference. The most common production bug is train/serve skew: a feature is computed one way in the training pipeline and a subtly different way at serving time.
The solution is a feature store: a versioned registry where features are defined once, computed once, and shared across training and inference. Even a simple Redis-backed feature store goes a long way toward eliminating train/serve skew.
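A minimal sketch of that idea, assuming a local Redis instance and the redis-py client; the SimpleFeatureStore class and key layout are illustrative, not a specific production implementation:

import json
import redis

class SimpleFeatureStore:
    """Write features once; train and serve both read from here."""

    def __init__(self, host: str = "localhost", port: int = 6379):
        self.client = redis.Redis(host=host, port=port, decode_responses=True)

    def put(self, entity_id: str, features: dict, version: str = "v1") -> None:
        # One key per entity per feature-set version, e.g. "features:v1:VER"
        self.client.set(f"features:{version}:{entity_id}", json.dumps(features))

    def get(self, entity_id: str, version: str = "v1") -> dict:
        raw = self.client.get(f"features:{version}:{entity_id}")
        if raw is None:
            raise KeyError(f"No features for {entity_id} at {version}")
        return json.loads(raw)

# store = SimpleFeatureStore()
# store.put("VER", {"tire_deg_slope": 0.5, "fuel_adjusted_pace": 84.5})
# store.get("VER")  # identical values at training and serving time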
Checklist: Production Feature Pipeline
- Schema validation runs before any transformation
- All features have unit tests with known inputs/outputs
- Rolling and lag features handle group boundaries correctly (see the sketch after this checklist)
- PSI monitoring runs daily on production feature distributions
- Feature definitions are version-controlled alongside model code
- Train and serve pipelines share the same feature computation code
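On the group-boundary point above: a naive rolling window over the whole frame leaks one driver's laps into another's window, while grouping first keeps each window inside a single driver. A small illustration with synthetic data:

import pandas as pd

laps = pd.DataFrame({
    "driver": ["VER", "VER", "HAM", "HAM"],
    "lap_time": [90.0, 91.0, 95.0, 96.0],
})

# Wrong: HAM's first window averages in VER's last lap
leaky = laps["lap_time"].rolling(2, min_periods=1).mean()

# Right: windows reset at each driver boundary
safe = laps.groupby("driver")["lap_time"].transform(
    lambda x: x.rolling(2, min_periods=1).mean()
)
print(leaky.tolist())  # [90.0, 90.5, 93.0, 95.5]  <- 93.0 mixes drivers
print(safe.tolist())   # [90.0, 90.5, 95.0, 95.5]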


