Glossary

What is Rate Limiting?

Controlling the number of API requests allowed within a time period to prevent overuse and manage costs.

LLM APIs have rate limits measured in tokens per minute (TPM) and requests per minute (RPM). AI gateways can implement their own rate limiting to prevent runaway costs, ensure fair usage across customers, and avoid hitting provider limits. Rate limiting can be applied per API key, per user, or globally.

Examples

→ Limit free tier users to 10 requests per minute
→ Cap daily spending at $100 per customer
→ Distribute requests across multiple API keys to increase effective limits

Related Terms

AI Gateway LLM Proxy Model Routing

Ready to implement rate limiting?

ScaleMind provides everything you need.

Get Started Free →