When AI model performance suddenly drops, should I check for feature drift or prompt issues first?

I was recently working on a customer service ticket classification task. The model's offline F1 score was decent, but two days after deployment, the operations team reported that 'refunds' and 'shipping delays' were frequently being misclassified. Many people's first reaction would be to tweak the prompt or switch models, but I started by splitting the online samples by time to look at the…
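To make the "split the online samples by time" step concrete, here is a minimal sketch. It assumes each online sample is logged as an ISO timestamp plus the predicted label, and it compares the predicted-label share per day. The function name `label_share_by_day`, the label strings, and the sample records are all hypothetical illustrations, not data or code from the actual project, and the per-day label share is only one of several things one might inspect after splitting.

```python
from collections import Counter, defaultdict
from datetime import datetime


def label_share_by_day(samples):
    """Group (timestamp, label) pairs by calendar day and return,
    for each day, the fraction of predictions that went to each label."""
    counts = defaultdict(Counter)
    for ts, label in samples:
        day = datetime.fromisoformat(ts).date().isoformat()
        counts[day][label] += 1

    shares = {}
    for day in sorted(counts):
        total = sum(counts[day].values())
        shares[day] = {label: n / total for label, n in counts[day].items()}
    return shares


# Hypothetical online log: (ISO timestamp, predicted label).
samples = [
    ("2024-05-01T09:00:00", "refund"),
    ("2024-05-01T10:30:00", "shipping_delay"),
    ("2024-05-01T11:00:00", "refund"),
    ("2024-05-01T15:00:00", "other"),
    ("2024-05-02T09:15:00", "shipping_delay"),
    ("2024-05-02T12:00:00", "shipping_delay"),
    ("2024-05-02T13:45:00", "shipping_delay"),
    ("2024-05-02T16:20:00", "refund"),
]

shares = label_share_by_day(samples)
for day, dist in shares.items():
    print(day, dist)
```

If the share of one label jumps from one day to the next while the prompt and model are unchanged, that points toward a shift in the incoming data rather than a prompt problem, which is the reason for looking at time slices before touching the prompt.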