Feature Engineering for Production ML: From Raw Data to Deployable Signals
A practical guide to building production-grade feature engineering pipelines — covering schema validation, rolling features, drift detection, and reproducible experimentation workflows.

Feature engineering is where most ML projects actually win or lose. A state-of-the-art model trained on weak features will underperform a simple model trained on excellent ones. Yet most tutorials treat feature engineering as a Jupyter notebook exercise, not as a production engineering challenge.
This post covers the feature pipeline architecture I've built across multiple production ML systems, including the F1 race strategy platform and the TTA Engine at Edza.ai.
The 5 Layers of a Production Feature Pipeline
- Ingestion + Schema Validation — catch bad data before it poisons models
- Cleaning + Normalization — handle nulls, outliers, unit consistency
- Feature Construction — domain-specific transformations
- Feature Store — versioned, shareable, consistent across train and serve
- Drift Monitoring — detect when the real world stops matching training data
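Concretely, each layer is a DataFrame-in, DataFrame-out stage composed in order. A minimal composition sketch (the stage names in the example wiring are hypothetical; concrete implementations of layers 1, 3, and 5 follow below):

from typing import Callable, List
import pandas as pd

# Each layer is a pure DataFrame -> DataFrame function
Stage = Callable[[pd.DataFrame], pd.DataFrame]

def run_pipeline(raw: pd.DataFrame, stages: List[Stage]) -> pd.DataFrame:
    # Apply each layer in order; every stage validates, transforms, or monitors
    df = raw
    for stage in stages:
        df = stage(df)
    return df

# Example wiring (stage functions would be defined as in the sections below):
# features = run_pipeline(raw_laps, [validate, clean, engineer, store, monitor])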
Layer 1: Schema Validation Before Anything Else
Every feature pipeline should start with schema validation. Garbage in, garbage out — and garbage is silent without explicit checks:
import pandas as pd
from dataclasses import dataclass
from typing import Dict, List, Tuple, Type

@dataclass
class FeatureSchema:
    columns: Dict[str, Type]                       # expected dtype per column
    required: List[str]                            # columns that must be present
    value_ranges: Dict[str, Tuple[float, float]]   # inclusive (lo, hi) bounds

def validate_schema(df: pd.DataFrame, schema: FeatureSchema) -> pd.DataFrame:
    # Check required columns exist
    missing = [c for c in schema.required if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    # Type coercion
    for col, dtype in schema.columns.items():
        if col in df.columns:
            df[col] = df[col].astype(dtype)

    # Range validation: warn, then drop out-of-range rows
    for col, (lo, hi) in schema.value_ranges.items():
        if col not in df.columns:
            continue
        violations = df[(df[col] < lo) | (df[col] > hi)]
        if len(violations) > 0:
            print(f"Warning: {len(violations)} rows outside range for {col}")
            df = df[(df[col] >= lo) & (df[col] <= hi)]
    return df
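A quick usage sketch; the column names and bounds here are hypothetical lap-data values, not a real schema:

lap_schema = FeatureSchema(
    columns={"lap_time": float, "fuel_load": float, "stint": int},
    required=["driver", "lap_time"],
    value_ranges={"lap_time": (60.0, 150.0), "fuel_load": (0.0, 110.0)},
)

raw = pd.DataFrame({
    "driver": ["VER", "VER", "HAM"],
    "lap_time": [92.3, 400.0, 93.1],   # 400.0 is out of range and will be dropped
    "fuel_load": [100.0, 99.0, 100.0],
    "stint": [1, 1, 1],
})
clean = validate_schema(raw, lap_schema)  # warns, returns the 2 valid rows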
Layer 3: Domain-Specific Feature Construction
The best features encode domain knowledge. For the F1 performance system, raw lap times are nearly useless on their own: fuel load and tire wear confound them, so we need engineered signals:
import numpy as np
import pandas as pd

def engineer_f1_features(df: pd.DataFrame) -> pd.DataFrame:
    # Tire degradation slope: how fast lap time rises per lap on the current stint
    df['tire_deg_slope'] = df.groupby(['driver', 'stint'])['lap_time'].transform(
        lambda x: np.polyfit(range(len(x)), x, 1)[0] if len(x) > 2 else 0.0
    )

    # Fuel-adjusted pace: remove the fuel-load effect on lap time
    FUEL_EFFECT = 0.07  # seconds per lap per kg of fuel
    df['fuel_adjusted_pace'] = df['lap_time'] - (df['fuel_load'] * FUEL_EFFECT)

    # Rolling lap delta: positive when the current lap beats the 3-lap rolling mean
    df['rolling_delta_3'] = df.groupby('driver')['lap_time'].transform(
        lambda x: x.rolling(3, min_periods=1).mean() - x
    )

    # Sector consistency: per-stint std of sector 1 times (lower = more consistent)
    df['sector_consistency'] = df.groupby(['driver', 'stint'])['sector_1_time'].transform('std')
    return df
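A tiny smoke test with synthetic laps (hypothetical numbers) makes the expected behavior concrete, and doubles as the kind of unit test every feature should have:

laps = pd.DataFrame({
    "driver": ["VER"] * 4,
    "stint": [1, 1, 1, 1],
    "lap_time": [90.0, 90.5, 91.0, 91.5],   # degrading by exactly 0.5 s/lap
    "fuel_load": [100.0, 98.0, 96.0, 94.0],
    "sector_1_time": [28.0, 28.1, 28.0, 28.1],
})
out = engineer_f1_features(laps)
assert abs(out["tire_deg_slope"].iloc[0] - 0.5) < 1e-6  # slope recovered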
Layer 5: Feature Drift Detection
Models degrade silently when the feature distribution shifts. Statistical drift detection using the Population Stability Index (PSI) catches this automatically:
import numpy as np

def population_stability_index(expected: np.ndarray,
                               actual: np.ndarray,
                               buckets: int = 10) -> float:
    """
    PSI < 0.1 → No significant change
    PSI 0.1–0.2 → Moderate change, investigate
    PSI > 0.2 → Significant shift, retrain
    """
    # Both histograms must share the same bin edges, or they aren't
    # comparable; span the combined range of both samples.
    edges = np.histogram_bin_edges(np.concatenate([expected, actual]), bins=buckets)
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)

    # Avoid log(0) and division by zero in empty buckets
    expected_pct = np.where(expected_pct == 0, 0.0001, expected_pct)
    actual_pct = np.where(actual_pct == 0, 0.0001, actual_pct)

    psi = np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct))
    return float(psi)
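For example, comparing a training baseline against a shifted production sample (synthetic data, illustrative only):

rng = np.random.default_rng(42)
train_pace = rng.normal(91.0, 0.8, size=10_000)  # training distribution
prod_pace = rng.normal(92.0, 0.8, size=10_000)   # production shifted by +1.0s

psi = population_stability_index(train_pace, prod_pace)
if psi > 0.2:
    print(f"PSI={psi:.2f}: significant shift, trigger retraining")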
Reproducibility: The Hidden Production Requirement
Features computed during training must be identical during inference. The most common production bug is train/serve skew: a feature is computed one way in the training pipeline and a subtly different way at serving time.
The solution is a feature store: a versioned registry where features are defined once, computed once, and shared across training and inference. Even a simple Redis-backed feature store goes a long way toward eliminating train/serve skew.
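A minimal sketch of that idea, assuming a local Redis instance and the redis-py client; the SimpleFeatureStore class and key layout are illustrative, not a specific production implementation:

import json
import redis

class SimpleFeatureStore:
    """Write features once; train and serve both read from here."""

    def __init__(self, host: str = "localhost", port: int = 6379):
        self.client = redis.Redis(host=host, port=port, decode_responses=True)

    def put(self, entity_id: str, features: dict, version: str = "v1") -> None:
        # One key per entity per feature-set version, e.g. "features:v1:VER"
        self.client.set(f"features:{version}:{entity_id}", json.dumps(features))

    def get(self, entity_id: str, version: str = "v1") -> dict:
        raw = self.client.get(f"features:{version}:{entity_id}")
        if raw is None:
            raise KeyError(f"No features for {entity_id} at {version}")
        return json.loads(raw)

# store = SimpleFeatureStore()
# store.put("VER", {"tire_deg_slope": 0.5, "fuel_adjusted_pace": 84.5})
# store.get("VER")  # identical values at training and serving time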
Checklist: Production Feature Pipeline
- Schema validation runs before any transformation
- All features have unit tests with known inputs/outputs
- Rolling and lag features handle group boundaries correctly (see the sketch after this checklist)
- PSI monitoring runs daily on production feature distributions
- Feature definitions are version-controlled alongside model code
- Train and serve pipelines share the same feature computation code
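On the group-boundary point above: a naive rolling window over the whole frame leaks one driver's laps into another's window, while grouping first keeps each window inside a single driver. A small illustration with synthetic data:

import pandas as pd

laps = pd.DataFrame({
    "driver": ["VER", "VER", "HAM", "HAM"],
    "lap_time": [90.0, 91.0, 95.0, 96.0],
})

# Wrong: HAM's first window averages in VER's last lap
leaky = laps["lap_time"].rolling(2, min_periods=1).mean()

# Right: windows reset at each driver boundary
safe = laps.groupby("driver")["lap_time"].transform(
    lambda x: x.rolling(2, min_periods=1).mean()
)
print(leaky.tolist())  # [90.0, 90.5, 93.0, 95.5]  <- 93.0 mixes drivers
print(safe.tolist())   # [90.0, 90.5, 95.0, 95.5]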


