Objective: The API gateway is available 99.9% of the time each month, which allows at most roughly 43 to 44 minutes of unplanned downtime per month, depending on the length of the month.
Method: Measure uptime with synthetic checks run every minute from multiple regions.
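As a rough illustration of the arithmetic behind this objective, the sketch below converts an availability target into a monthly downtime allowance and computes availability from per-minute synthetic check results. The function names and the rule that a minute counts as "up" when a strict majority of regions pass are assumptions made for this example, not a decided policy.

```python
from typing import Sequence


def allowed_downtime_minutes(slo: float, days_in_month: int) -> float:
    """Minutes of downtime an availability target permits in one month."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1.0 - slo)


def minute_is_up(region_results: Sequence[bool]) -> bool:
    """Assumed rule: the minute is 'up' if a strict majority of regions passed."""
    if not region_results:
        raise ValueError("no region results for this minute")
    return sum(region_results) * 2 > len(region_results)


def availability(minutes: Sequence[Sequence[bool]]) -> float:
    """Fraction of one-minute check windows that counted as 'up'."""
    if not minutes:
        raise ValueError("no checks recorded")
    up = sum(1 for region_results in minutes if minute_is_up(region_results))
    return up / len(minutes)
```

For a 30-day month, `allowed_downtime_minutes(0.999, 30)` comes to about 43.2 minutes; for a 31-day month it is about 44.6 minutes.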
Objective: 95% of requests finish in under 250 ms (P95 latency), so almost all user clicks feel instant and only the slowest 5% take longer.
Method: Measure latency and errors from request traces and logs at the gateway (and per domain service).
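One way to check the P95 objective against a batch of latency samples is sketched below. It uses the nearest-rank percentile method, which is only one of several common definitions; the monitoring tool eventually used may interpolate differently. The function names are hypothetical.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_latency_slo(latencies_ms: Sequence[float],
                      threshold_ms: float = 250.0,
                      pct: float = 95.0) -> bool:
    """True when the pct-th percentile latency is under the threshold."""
    return percentile_nearest_rank(latencies_ms, pct) < threshold_ms
```

With 100 samples, the check passes as long as no more than five of them are at or above 250 ms.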
Objective: We “spend” at most 0.1% of the month on errors and outages. If reliability slips past this allowance, we pause new launches and fix stability first. The goal is to build customer and team trust, reduce churn, and avoid support blowups during demos or new feature releases.
Method: Record incidents manually at first.
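While incidents are recorded by hand, the 0.1% allowance (often called an error budget) can be tracked with a few lines of arithmetic. The sketch below assumes incidents are logged as durations in minutes; the class and method names are illustrative only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ErrorBudget:
    days_in_month: int
    budget_fraction: float = 0.001  # 0.1% of the month

    @property
    def budget_minutes(self) -> float:
        """Total minutes of errors/outages allowed this month."""
        return self.days_in_month * 24 * 60 * self.budget_fraction

    def remaining_minutes(self, incident_minutes: Sequence[float]) -> float:
        """Budget left after subtracting manually recorded incidents."""
        return self.budget_minutes - sum(incident_minutes)

    def should_pause_launches(self, incident_minutes: Sequence[float]) -> bool:
        """True once recorded incidents exceed the monthly allowance."""
        return self.remaining_minutes(incident_minutes) < 0
```

For example, in a 30-day month the allowance is about 43.2 minutes, so two incidents of 20 and 30 minutes would exceed it and trigger a launch pause.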
Objective: Keep total monthly cloud spend within 100% of budget, tracked by environment and by product/brand.
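A minimal sketch of the spend-versus-budget comparison follows. It assumes spend is available as (environment, brand, amount) records and budgets as a mapping keyed by (environment, brand); both shapes are hypothetical and would depend on how billing data is actually exported.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (environment, product/brand)


def budget_utilization(spend_records: Iterable[Tuple[str, str, float]],
                       budgets: Dict[Key, float]) -> Dict[Key, float]:
    """Spend divided by budget for each (environment, brand); 1.0 means 100%."""
    totals: Dict[Key, float] = defaultdict(float)
    for environment, brand, amount in spend_records:
        totals[(environment, brand)] += amount
    utilization = {}
    for key, budget in budgets.items():
        if budget <= 0:
            raise ValueError(f"budget for {key} must be positive")
        utilization[key] = totals[key] / budget
    return utilization


def over_budget(utilization: Dict[Key, float]) -> List[Key]:
    """Keys whose spend exceeds 100% of budget."""
    return [key for key, ratio in utilization.items() if ratio > 1.0]
```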
Objective: Keep team costs within budget.