Don't just stare at the CPU when an alert goes off in the middle of the night.
After a long time in operations support, the alerts I dread most are the ones that say only 'Service Abnormal.' Being woken at 2 AM by a bare red alert, with no instance info, version, recent deployments, error codes, or impact scope, means troubleshooting in the dark. When I added monitoring to my own services, I started by breaking down the entire chain: ingress traffic, error rates…