How to catch data leakage before an ML model looks too good

A demand model I reviewed looked amazing in offline validation. The AUC jump was large enough that the business team wanted to push it into the next planning run. That was exactly what made me suspicious. In real warehouse demand data, big improvements usually come from better features, not magic. I started by checking the timestamp of every feature against the prediction timestamp. Two fields…
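
The timestamp check described above can be sketched roughly as follows. This is a minimal illustration, not the code used in the review: the helper `find_late_features` and the field names `prediction_ts`, `stock_level_ts`, and `shipped_qty_ts` are hypothetical, and it assumes each feature carries its own "recorded at" timestamp next to the prediction timestamp. Any feature stamped later than the prediction moment is counted as a leakage candidate.

```python
from datetime import datetime


def find_late_features(rows, prediction_key, feature_time_keys):
    """Count, per feature timestamp field, the rows recorded after prediction time."""
    late = {}
    for key in feature_time_keys:
        # A value stamped after the prediction moment could not have been
        # known when the prediction was made, so it is a leakage candidate.
        n_late = sum(1 for row in rows if row[key] > row[prediction_key])
        if n_late:
            late[key] = n_late
    return late


# Hypothetical toy rows; the field names are illustrative only.
rows = [
    {
        "prediction_ts": datetime(2024, 3, 1),
        "stock_level_ts": datetime(2024, 2, 29),
        "shipped_qty_ts": datetime(2024, 3, 4),
    },
    {
        "prediction_ts": datetime(2024, 3, 2),
        "stock_level_ts": datetime(2024, 3, 1),
        "shipped_qty_ts": datetime(2024, 3, 2),
    },
    {
        "prediction_ts": datetime(2024, 3, 3),
        "stock_level_ts": datetime(2024, 3, 2),
        "shipped_qty_ts": datetime(2024, 3, 6),
    },
]

report = find_late_features(rows, "prediction_ts", ["stock_level_ts", "shipped_qty_ts"])
print(report)  # {'shipped_qty_ts': 2}
```

The sketch uses a strict "later than" comparison, so a feature stamped at exactly the prediction time passes. Whether that boundary case is safe depends on how the pipeline actually populates the field, so treat the choice as an assumption to verify.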