How do you run canary releases for online services so that rollbacks are easier?
I once took part in a release where the canary check only verified that the interface was reachable, not whether the new and old versions were data-compatible. The first 10% of traffic showed no major issues, but after the full rollout we discovered that the new version couldn't read fields written by the old version. Even rolling back was useless because the data had already…
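
To make the data-compatibility point concrete, here is a minimal sketch of a check that could run as part of the canary gate, alongside the reachability probe. Everything in it is an assumption for illustration: the JSON record format, the field names (`user_id`, `name`, `email`), and the `write_v1`/`read_v1`/`write_v2`/`read_v2` helpers are hypothetical stand-ins for the real serializers of the old and new versions. The idea is simply to test both directions: the new version reading old data, and the old version reading new data, since the second direction is what decides whether a rollback is safe.

```python
import json

# Hypothetical serializers for two service versions.
# Field names and record layout are invented for illustration.

def write_v1(user_id: int, name: str) -> str:
    """Old version: writes records without an email field."""
    return json.dumps({"user_id": user_id, "name": name})

def read_v1(payload: str) -> dict:
    """Old version: reads only the fields it knows and ignores the rest."""
    data = json.loads(payload)
    return {"user_id": data["user_id"], "name": data["name"]}

def write_v2(user_id: int, name: str, email: str) -> str:
    """New version: adds an email field."""
    return json.dumps({"user_id": user_id, "name": name, "email": email})

def read_v2(payload: str) -> dict:
    """New version: defaults the email field so old records still parse."""
    data = json.loads(payload)
    return {
        "user_id": data["user_id"],
        "name": data["name"],
        "email": data.get("email", ""),
    }

def check_compatibility() -> list[str]:
    """Return a list of failures; an empty list means both directions work."""
    failures = []
    # Forward direction: the new version must read what the old version wrote.
    try:
        read_v2(write_v1(1, "alice"))
    except (KeyError, ValueError) as exc:
        failures.append(f"new reader cannot read old data: {exc!r}")
    # Backward direction: the old version must read what the new version
    # wrote, otherwise rolling back will not help.
    try:
        read_v1(write_v2(1, "alice", "a@example.com"))
    except (KeyError, ValueError) as exc:
        failures.append(f"old reader cannot read new data: {exc!r}")
    return failures

if __name__ == "__main__":
    problems = check_compatibility()
    if problems:
        raise SystemExit("\n".join(problems))
    print("old and new versions are data-compatible in both directions")
```

In a real pipeline the sample records would come from production-like data rather than hand-written values, and a non-empty failure list would block the rollout from moving past the first traffic stage.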