OpenAI quota exceeded

Symptom

Visitors see: “Sorry — I had trouble reaching the assistant just now. Please try again in a moment.”

On the dashboard, the last-1h cost is dropping while message volume is not: visitors keep sending messages, but the failed completions add little or no cost.

Cause

Your Azure OpenAI deployment hit its TPM (tokens per minute) limit for gpt-5-mini. Each completion fails with a 429 from Azure OpenAI; the customer-runtime catches it and surfaces the friendly fallback.
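The catch-and-fallback behaviour described above can be pictured with a minimal sketch. This is not the customer-runtime's actual code: `RateLimited`, `answer`, and the `complete` callable are hypothetical names used only for illustration, and a real client library would raise its own rate-limit error type for a 429.

```python
from typing import Callable


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 returned by the model endpoint."""


def answer(prompt: str, complete: Callable[[str], str], fallback: str) -> str:
    """Return the model's reply, or the friendly fallback if the call is throttled.

    `complete` is an assumed callable that sends the prompt to the deployment
    and raises RateLimited when the TPM limit has been hit.
    """
    try:
        return complete(prompt)
    except RateLimited:
        # The visitor never sees the 429 itself, only the fallback text.
        return fallback
```

In this sketch, `fallback` would be the visitor-facing message quoted under Symptom. Because the error is swallowed at this layer, the dashboard's cost-versus-volume mismatch is the main external sign that throttling is happening.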

Fix

Request a TPM quota increase:

  1. Azure portal → Azure AI Foundry → Quotas.
  2. Filter to your region + gpt-5-mini + GlobalStandard.
  3. Click Request quota. Pick a higher TPM target. Submit.

Approvals are usually quick (minutes for small asks).

Sustained high usage

The customer-stack’s OpenAI deployment is provisioned at 10K TPM by default, which is enough for a steady ~600 visitor turns per hour. If your support traffic is consistently higher, request a bigger quota and let us know. We can ship a deployment with a larger initial SKU on the next plan upgrade.
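To pick a TPM target for the quota request, a rough back-of-the-envelope calculation can help. The sketch below assumes traffic is spread evenly across the hour and that the quota is fully used; neither is stated in this page, and real traffic is bursty, so treat the result as a floor and add headroom. The function names are illustrative.

```python
def tokens_per_turn_budget(tpm_limit: int, turns_per_hour: int) -> float:
    """Tokens available per visitor turn if the hourly quota is split evenly."""
    return tpm_limit * 60 / turns_per_hour


def required_tpm(turns_per_hour: int, tokens_per_turn: float) -> float:
    """TPM needed to sustain a given turn rate at a given per-turn token cost."""
    return turns_per_hour * tokens_per_turn / 60


# The default figures (10K TPM, ~600 turns/hour) imply this per-turn budget.
print(tokens_per_turn_budget(10_000, 600))  # 1000.0

# Doubling traffic at the same assumed per-turn cost doubles the TPM needed.
print(required_tpm(1_200, 1_000))  # 20000.0
```

The 1,000 tokens per turn figure is only what the two stated numbers imply under these assumptions. If you know your own average tokens per turn, substitute it before choosing a target.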