OpenAI quota exceeded

Symptom

Visitors see: “Sorry — I had trouble reaching the assistant just now. Please try again in a moment.”

On the dashboard, the last-1h cost is dropping while message volume is not: visitors keep sending messages, but the throttled completions consume no tokens.

Cause

Your Azure OpenAI deployment hit its TPM (tokens per minute) limit for gpt-5-mini. Each completion fails with a 429 from Azure OpenAI; the customer-runtime catches it and surfaces the friendly fallback.
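As a rough illustration of that catch-and-fallback behaviour, here is a minimal sketch. It is not the customer-runtime's actual code: `ThrottledError`, `answer_visitor`, and the `complete` callable are hypothetical names standing in for whatever client and error type the runtime really uses.

```python
# Hypothetical sketch of the fallback path; not the real customer-runtime code.
FALLBACK_MESSAGE = (
    "Sorry \u2014 I had trouble reaching the assistant just now. "
    "Please try again in a moment."
)


class ThrottledError(Exception):
    """Stand-in for the HTTP 429 error a real Azure OpenAI client would raise."""

    status_code = 429


def answer_visitor(complete, prompt):
    """Return the model's reply, or the friendly fallback if the call is throttled.

    `complete` is any callable that takes a prompt and returns text
    (an assumption made so the sketch stays self-contained).
    """
    try:
        return complete(prompt)
    except ThrottledError:
        # The deployment is over its TPM limit: hide the 429 from the visitor.
        return FALLBACK_MESSAGE


def throttled_completion(prompt):
    """Simulates a deployment that has hit its TPM limit."""
    raise ThrottledError("429 Too Many Requests")


print(answer_visitor(throttled_completion, "Hello"))
```

Because the visitor only ever sees the fallback text, the 429 itself shows up in logs and metrics rather than in the chat window.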

Fix

Request a TPM quota increase:

1. Azure portal → Azure AI Foundry → Quotas.
2. Filter to your region + gpt-5-mini + GlobalStandard.
3. Click Request quota. Pick a higher TPM target. Submit.

Approvals are usually quick (minutes for small asks).

Sustained high usage

The OpenAI throughput ceiling is chosen in the purchase wizard’s Tiers & cost step and defaults to your plan’s recommendation: 10K TPM (Start), 30K TPM (Growth), or 60K TPM (Professional), with 120K also available. 10K TPM sustains roughly 600 visitor turns per hour.

The value is a ceiling only: you pay for tokens consumed, not for the ceiling. If you hit throttling, raise it freely, either by bumping the capacity on the Azure OpenAI deployment in your own subscription or by picking a higher tier at deploy time. See Resource tiers & sizing.
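The 600-turns figure implies roughly 1,000 tokens per visitor turn (10,000 tokens per minute × 60 minutes ÷ 600 turns). The sketch below scales that inferred value to the other ceilings. The per-turn token count is an assumption derived from the numbers above, not a documented constant, and the function name is illustrative; substitute your own measured average.

```python
# Assumption: ~1,000 tokens per visitor turn (prompt + completion),
# inferred from "10K TPM sustains ~600 visitor turns per hour".
TOKENS_PER_TURN = 1_000

# Ceilings listed in the purchase wizard (tokens per minute).
CEILINGS_TPM = {
    "Start": 10_000,
    "Growth": 30_000,
    "Professional": 60_000,
    "120K option": 120_000,
}


def sustained_turns_per_hour(tpm, tokens_per_turn=TOKENS_PER_TURN):
    """Steady visitor turns per hour that a TPM ceiling can carry."""
    return tpm * 60 // tokens_per_turn


for name, tpm in CEILINGS_TPM.items():
    print(f"{name}: {tpm:,} TPM, about {sustained_turns_per_hour(tpm):,} turns/hour")
```

These are steady-state averages. A short burst can still exceed the per-minute limit and trigger 429s even when the hourly total is well under the estimate.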