The model was fine. The feature table was not.
I spent a week chasing a model issue that turned out to be a data issue. Offline metrics looked decent, but production scores jumped around because one of the daily aggregates landed late on Mondays. The useful fix was boring: add freshness checks, log feature timestamps next to predictions, and fail closed when a feature is stale instead of letting the model guess from half a row. Since then I t…
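
The fix described above (freshness checks, feature timestamps logged next to predictions, failing closed on stale features) could be sketched roughly like this. This is a minimal illustration, not the actual pipeline: `Feature`, `StaleFeatureError`, `check_freshness`, `predict_or_refuse`, and the 26-hour `MAX_FEATURE_AGE` budget are all hypothetical names and values, and the model is assumed to be any callable that takes a list of feature values.

```python
import logging
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

logger = logging.getLogger("predictions")

# Hypothetical freshness budget for a daily aggregate; tune per feature.
MAX_FEATURE_AGE = timedelta(hours=26)


class StaleFeatureError(Exception):
    """Raised when at least one feature is older than its freshness budget."""


@dataclass
class Feature:
    name: str
    value: float
    computed_at: datetime  # timezone-aware UTC time the aggregate was built


def check_freshness(features, now, max_age=MAX_FEATURE_AGE):
    """Return the names of features whose age exceeds max_age."""
    return [f.name for f in features if now - f.computed_at > max_age]


def predict_or_refuse(model, features, now=None):
    """Score only when every feature is fresh; otherwise fail closed."""
    now = now or datetime.now(timezone.utc)
    stale = check_freshness(features, now)
    if stale:
        # Fail closed: refuse to score rather than guess from a partial row.
        raise StaleFeatureError(f"stale features: {stale}")
    score = model([f.value for f in features])
    # Log feature timestamps next to the prediction so late data is visible later.
    logger.info(
        "score=%s feature_timestamps=%s",
        score,
        {f.name: f.computed_at.isoformat() for f in features},
    )
    return score
```

The caller decides what "refuse" means in practice (a default score, a retry, or an alert). The point of the sketch is only that a late aggregate surfaces as an explicit error with a timestamp trail instead of as a noisy score.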