OpenAI quota exceeded
Symptom
Visitors see: “Sorry — I had trouble reaching the assistant just now. Please try again in a moment.”
The dashboard’s last-1h cost is dropping but message volume isn’t, because throttled completions consume no tokens.
Cause
Your Azure OpenAI deployment hit its TPM (tokens per minute) limit for gpt-5-mini. Each completion fails with a 429 from Azure OpenAI; the customer-runtime catches it and surfaces the friendly fallback.
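The catch-and-fallback behaviour described above can be sketched as follows. This is a minimal illustration, not the customer-runtime's actual code: `UpstreamHTTPError`, `reply_or_fallback`, and the `complete` callable are hypothetical names, and only the fallback text is taken from the Symptom section.

```python
from typing import Callable

# Fallback text copied verbatim from the Symptom section.
FALLBACK_MESSAGE = (
    "Sorry — I had trouble reaching the assistant just now. "
    "Please try again in a moment."
)


class UpstreamHTTPError(Exception):
    """Hypothetical error type carrying the upstream HTTP status code."""

    def __init__(self, status_code: int, message: str = "") -> None:
        super().__init__(message or f"upstream returned {status_code}")
        self.status_code = status_code


def reply_or_fallback(complete: Callable[[str], str], user_message: str) -> str:
    """Return the completion, or the friendly fallback when throttled.

    `complete` stands in for whatever function calls the model deployment
    (an assumption for this sketch).
    """
    try:
        return complete(user_message)
    except UpstreamHTTPError as err:
        if err.status_code == 429:
            # TPM limit hit: show the visitor the fallback instead of an error.
            return FALLBACK_MESSAGE
        # Any other failure is not a quota problem; let it propagate.
        raise
```

Only the 429 case is mapped to the fallback here, so other upstream failures stay visible rather than being mistaken for a quota problem.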
Fix
Request a TPM quota increase:
- Azure portal → Azure AI Foundry → Quotas.
- Filter to your region + gpt-5-mini + GlobalStandard.
- Click Request quota. Pick a higher TPM target. Submit.
Approvals are usually quick (minutes for small asks).
Sustained high usage
The customer-stack’s OpenAI deployment is provisioned at 10K TPM by default. That’s enough for a steady ~600 visitor turns per hour, which works out to about 1,000 tokens per turn. If your support traffic is consistently higher, request a bigger quota and let us know; we can ship a deployment with a larger initial SKU on the next plan upgrade.
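To choose a TPM target for the quota request, the arithmetic behind these figures can be sketched as below. The per-turn token cost of 1,000 is inferred from the numbers above (10K TPM and ~600 turns per hour), not a measured value, and the function names are illustrative. The sketch assumes traffic is spread evenly across the hour; bursts within a single minute can still hit the limit.

```python
import math


def turns_per_hour(tpm: int, tokens_per_turn: int) -> int:
    """Steady visitor turns per hour that a TPM quota can sustain."""
    tokens_per_hour = tpm * 60
    return tokens_per_hour // tokens_per_turn


def required_tpm(target_turns_per_hour: int, tokens_per_turn: int) -> int:
    """Smallest TPM quota that sustains the target hourly turn rate."""
    return math.ceil(target_turns_per_hour * tokens_per_turn / 60)


# Default deployment: 10K TPM at an assumed ~1,000 tokens per turn.
print(turns_per_hour(10_000, 1_000))  # 600

# Hypothetical target of 1,500 turns per hour at the same per-turn cost.
print(required_tpm(1_500, 1_000))  # 25000
```

If your real turns use more tokens than assumed here (long conversations, large prompts), substitute your own average before picking a target.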