©IWConnect. All Rights Reserved.
What changed
Deployment — prov-service v2.14.1
→ v2.14.3 (feat: timeout config
refactor)
23:41 UTC
Config change — EXTERNAL_API_TIMEOUT_MS set
to 800ms (was 3000ms)
23:39 UTC
Blast radius
prov-flow ✕
order-manager ✕
symphonica ✕
billing-svc ⚠
notif-service ⚠
auth-service ✓
reporting ✓
3 services down · 2 at
risk · customer-facing: YES — checkout
affected
Root cause hypothesis
Argus · confidence 91%
Timeout misconfiguration in v2.14.3 deploy.
EXTERNAL_API_TIMEOUT_MS
reduced from 3000ms to 800ms, below the p99 latency of upstream Identity Provider (avg 1,240ms).
All 500s co-locate with identity validation calls. No similar failures in 45-day history.Suggested actions
1
Rollback prov-service to v2.14.1
or hot-patch timeout to ≥ 3000ms
Now
2
Verify order-manager and symphonica error rates clear within 2 min of
rollback
Verify
3
Open post-mortem — config change bypassed staging timeout validation
Follow-up
Timeline
23:39
Config change pushed:
EXTERNAL_API_TIMEOUT_MS →
800ms23:41
v2.14.3 deployed to prod-eu · zero immediate errors
01:51
Identity Provider p99 begins drifting above 900ms (normal nightly pattern)
01:58
500 rate crosses 5% threshold · Datadog alert fires · Argus investigation
starts
02:00
43% of provisioning requests failing · order-manager and symphonica cascade