Glossary
Key concepts in AI infrastructure.
AI Gateway
A proxy layer between your application and LLM providers that handles routing, caching, failover, and observability.
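As a minimal sketch of how this looks from the application side, many gateways expose an OpenAI-compatible endpoint, so pointing an existing client at the gateway's base URL is often all that changes. The URL, key, and model name below are placeholders, assuming the OpenAI Python SDK.

```python
from openai import OpenAI

# Hypothetical gateway address; the gateway forwards the request to the
# real provider, adding caching, failover, and logging along the way.
client = OpenAI(
    base_url="https://gateway.example.com/v1",  # placeholder URL
    api_key="YOUR_GATEWAY_KEY",                 # placeholder credential
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```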
Context Window
The maximum number of tokens an LLM can process in a single request, including both input and output.
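A quick way to see the budget involved: the prompt tokens plus the requested completion tokens must fit inside the window. The sketch below uses an illustrative 128k window; real limits vary by model.

```python
def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    context_window: int = 128_000) -> bool:
    """Check that input plus requested output stays within the window.

    The 128k default is illustrative; actual limits depend on the model.
    """
    return prompt_tokens + max_output_tokens <= context_window

# A 120,000-token prompt leaves no room for a 16,000-token reply.
print(fits_in_context(120_000, 16_000))  # False
print(fits_in_context(100_000, 16_000))  # True
```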
Fallback / Failover
Automatically routing requests to a backup LLM provider when the primary provider fails or is unavailable.
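A minimal sketch of the pattern: try each provider in order and return the first successful response. The provider callables named in the usage comment are placeholders for your own wrappers.

```python
def call_with_failover(prompt, providers):
    """Try each provider in order; return the first successful response.

    `providers` is a list of callables that take a prompt and either
    return a response or raise an exception.
    """
    last_error = None
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as err:  # e.g. timeout, rate limit, server error
            last_error = err
    raise RuntimeError("All providers failed") from last_error

# Usage: call_with_failover("Hi", [call_primary, call_backup])
# where call_primary / call_backup are your own provider wrappers.
```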
LLM Observability
The ability to monitor, debug, and understand LLM application behavior through logging, metrics, and tracing.
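At its simplest, observability starts with logging around each call. The sketch below records latency and errors for any LLM call; `call_fn` is a placeholder for your provider client.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llm")

def observed_call(call_fn, prompt):
    """Wrap an LLM call with basic logging: latency, success, and errors."""
    start = time.perf_counter()
    try:
        response = call_fn(prompt)
        latency = time.perf_counter() - start
        logger.info("llm_call ok latency=%.2fs prompt_chars=%d",
                    latency, len(prompt))
        return response
    except Exception:
        logger.exception("llm_call failed after %.2fs",
                         time.perf_counter() - start)
        raise
```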
LLM Proxy
A server that forwards LLM API requests on behalf of your application, adding features like caching, logging, and failover.
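A toy server-side sketch of the forwarding step, assuming Flask and requests are installed; real proxies add streaming, retries, caching, and credential handling on top of this.

```python
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)
UPSTREAM = "https://api.openai.com/v1/chat/completions"  # provider endpoint

@app.post("/v1/chat/completions")
def forward():
    # Forward the client's JSON body and auth header to the upstream API,
    # then return the upstream response unchanged.
    upstream_resp = requests.post(
        UPSTREAM,
        json=request.get_json(),
        headers={"Authorization": request.headers.get("Authorization", "")},
        timeout=60,
    )
    return jsonify(upstream_resp.json()), upstream_resp.status_code
```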
Model Routing
The practice of automatically selecting which LLM handles each request based on criteria like cost, capability, or latency.
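A rule-based router is the simplest form. The thresholds and model names below are illustrative; production routers often use cost tables, classifiers, or latency targets instead.

```python
def route_model(prompt: str) -> str:
    """Pick a model name based on simple heuristics (illustrative only)."""
    if len(prompt) > 4_000:
        return "large-context-model"   # long inputs need a bigger window
    if "```" in prompt:
        return "code-tuned-model"      # code blocks go to a code model
    return "small-fast-model"          # default to the cheapest option

print(route_model("Summarize this paragraph."))  # small-fast-model
```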
Prompt Engineering
The practice of designing and optimizing prompts to get better results from LLMs.
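A small illustration of the idea: a templated prompt with an explicit role, a constrained output format, and one worked example (few-shot prompting). The task and labels are made up.

```python
PROMPT_TEMPLATE = """You are a support triage assistant.
Classify the ticket as one of: billing, bug, feature_request.
Respond with only the label.

Ticket: "I was charged twice this month."
Label: billing

Ticket: "{ticket}"
Label:"""

prompt = PROMPT_TEMPLATE.format(ticket="The export button crashes the app.")
print(prompt)
```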
Rate Limiting
Controlling the number of API requests allowed within a time period to prevent overuse and manage costs.
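One common implementation is a token bucket: requests draw from a bucket that refills at a fixed rate, allowing short bursts up to a cap. A minimal sketch:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter: roughly `rate` requests per
    second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)  # ~5 requests/second, burst of 10
print(bucket.allow())  # True while tokens remain
```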
Semantic Caching
A caching technique that returns stored responses for semantically similar queries, not just exact matches.
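A toy version of the lookup: embed each query, then return a stored response when a new query's embedding is close enough to a cached one. The `embed` function is a placeholder for any embedding model, and the similarity threshold is illustrative.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

class SemanticCache:
    """Toy semantic cache keyed on embedding similarity."""

    def __init__(self, embed, threshold: float = 0.9):
        self.embed = embed          # placeholder embedding function
        self.threshold = threshold  # illustrative similarity cutoff
        self.entries = []           # list of (embedding, response)

    def get(self, query: str):
        q = self.embed(query)
        for emb, response in self.entries:
            if cosine_similarity(q, emb) >= self.threshold:
                return response  # cache hit on a similar query
        return None

    def put(self, query: str, response: str):
        self.entries.append((self.embed(query), response))
```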
Token
The basic unit of text processing in LLMs. Tokens can be words, subwords, or characters depending on the model's tokenizer.
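Token counts can be measured directly with a tokenizer library such as tiktoken (one option among many; the right encoding depends on the model you call):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding choice is model-dependent
text = "Tokens can be whole words or fragments of words."
tokens = enc.encode(text)
print(len(tokens))         # number of tokens in the text
print(enc.decode(tokens))  # round-trips back to the original string
```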