How to design API timeout retries so they don't drag the system down

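I once worked on an order service where what actually brought the system down was not the first request but the wave of retries after timeouts. The client resent whenever a request didn't come back, the gateway was retrying too, and as soon as the downstream payment API slowed down a little, the same business operation got hit several times within a few seconds.

Since then, when I design retries I first separate the requests that can be retried from the ones where the only safe move is to query the status. Read endpoints can do a short retry. Write endpoints must have an idempotency key and a business state machine; you can't rely on front-end button debouncing as the safety net. Timeouts also shouldn't be the same at every layer. The outermost layer should be a bit longer than the downstream one, otherwise the upstream gives up while the downstream is still processing. These days I attach a trace id and a client_request_id to every request, and log the retry count, the original request, and the final result.

More retries does not mean more stability. Backoff, rate limiting, circuit breaking, and a status-query endpoint all have to be in place, otherwise during an incident the system amplifies its own traffic. One more detail is the retry budget: once an endpoint's recent failure ratio crosses a threshold, I would rather send users to check the status than keep saturating the downstream.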
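
To make the backoff and retry-budget part concrete, here is a minimal Python sketch, not the code from that order service. The names RetryBudget, backoff_delay, call_with_retry and the send callback are hypothetical, and the thresholds (window of 100, 50% failure ratio, 3 attempts) are placeholder assumptions you would tune per endpoint. It assumes send raises TimeoutError on a timeout and that the server deduplicates on client_request_id.

```python
import random
import time
import uuid
from collections import deque


class RetryBudget:
    """Sliding window of recent outcomes; denies retries when too many failed."""

    def __init__(self, window=100, max_failure_ratio=0.5, min_samples=10):
        self.outcomes = deque(maxlen=window)  # True = success, False = timeout
        self.max_failure_ratio = max_failure_ratio
        self.min_samples = min_samples

    def record(self, ok):
        self.outcomes.append(ok)

    def allows_retry(self):
        # Too few samples to judge: allow the retry.
        if len(self.outcomes) < self.min_samples:
            return True
        failures = self.outcomes.count(False)
        return failures / len(self.outcomes) < self.max_failure_ratio


def backoff_delay(attempt, base=0.2, cap=5.0):
    """Exponential backoff with full jitter, in seconds."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))


def call_with_retry(send, budget, max_attempts=3, sleep=time.sleep,
                    request_id=None):
    """Call send(request_id, attempt), retrying timeouts within the budget.

    The same client_request_id is reused on every attempt so the server
    can treat retries as duplicates of one logical request.
    """
    request_id = request_id or str(uuid.uuid4())
    attempts = 0
    for attempt in range(max_attempts):
        attempts = attempt + 1
        try:
            result = send(request_id, attempt)
        except TimeoutError:
            budget.record(False)
            # Stop if attempts are used up or the endpoint looks unhealthy.
            if attempts == max_attempts or not budget.allows_retry():
                break
            sleep(backoff_delay(attempt))
        else:
            budget.record(True)
            return {"status": "ok", "result": result,
                    "attempts": attempts, "client_request_id": request_id}
    # Out of attempts or budget: tell the caller to query status instead.
    return {"status": "check_status", "attempts": attempts,
            "client_request_id": request_id}
```

When the result comes back as check_status, the caller would point the user at the status-query endpoint with the same client_request_id instead of sending again. In a real service the budget would be shared per endpoint across requests, and access to it would need to be thread-safe, which this sketch leaves out.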
