Glossary
Key concepts in AI infrastructure.
AI Gateway
A proxy layer between your application and LLM providers that handles routing, caching, failover, and observability.
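As a minimal sketch of how this looks from the application side, many gateways expose an OpenAI-compatible endpoint, so pointing an existing client at the gateway's base URL is often all that changes. The URL, key, and model name below are placeholders, assuming the OpenAI Python SDK.

```python
from openai import OpenAI

# Hypothetical gateway address; the gateway forwards the request to the
# real provider, adding caching, failover, and logging along the way.
client = OpenAI(
    base_url="https://gateway.example.com/v1",  # placeholder URL
    api_key="YOUR_GATEWAY_KEY",                 # placeholder credential
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```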
Context Window
The maximum number of tokens an LLM can process in a single request, including both input and output.
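A quick way to see the budget involved: the prompt tokens plus the requested completion tokens must fit inside the window. The sketch below uses an illustrative 128k window; real limits vary by model.

```python
def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    context_window: int = 128_000) -> bool:
    """Check that input plus requested output stays within the window.

    The 128k default is illustrative; actual limits depend on the model.
    """
    return prompt_tokens + max_output_tokens <= context_window

# A 120,000-token prompt leaves no room for a 16,000-token reply.
print(fits_in_context(120_000, 16_000))  # False
print(fits_in_context(100_000, 16_000))  # True
```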
Fallback / Failover
Automatically routing requests to a backup LLM provider when the primary provider fails or is unavailable.
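A minimal sketch of the pattern: try each provider in order and return the first successful response. The provider callables named in the usage comment are placeholders for your own wrappers.

```python
def call_with_failover(prompt, providers):
    """Try each provider in order; return the first successful response.

    `providers` is a list of callables that take a prompt and either
    return a response or raise an exception.
    """
    last_error = None
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as err:  # e.g. timeout, rate limit, server error
            last_error = err
    raise RuntimeError("All providers failed") from last_error

# Usage: call_with_failover("Hi", [call_primary, call_backup])
# where call_primary / call_backup are your own provider wrappers.
```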
LLM Observability
The ability to monitor, debug, and understand LLM application behavior through logging, metrics, and tracing.
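At its simplest, observability starts with logging around each call. The sketch below records latency and errors for any LLM call; `call_fn` is a placeholder for your provider client.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llm")

def observed_call(call_fn, prompt):
    """Wrap an LLM call with basic logging: latency, success, and errors."""
    start = time.perf_counter()
    try:
        response = call_fn(prompt)
        latency = time.perf_counter() - start
        logger.info("llm_call ok latency=%.2fs prompt_chars=%d",
                    latency, len(prompt))
        return response
    except Exception:
        logger.exception("llm_call failed after %.2fs",
                         time.perf_counter() - start)
        raise
```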
LLM Proxy
A server that forwards LLM API requests on behalf of your application, adding features like caching, logging, and failover.
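A toy server-side sketch of the forwarding step, assuming Flask and requests are installed; real proxies add streaming, retries, caching, and credential handling on top of this.

```python
import requests
from flask import Flask, request, jsonify

app = Flask(__name__)
UPSTREAM = "https://api.openai.com/v1/chat/completions"  # provider endpoint

@app.post("/v1/chat/completions")
def forward():
    # Forward the client's JSON body and auth header to the upstream API,
    # then return the upstream response unchanged.
    upstream_resp = requests.post(
        UPSTREAM,
        json=request.get_json(),
        headers={"Authorization": request.headers.get("Authorization", "")},
        timeout=60,
    )
    return jsonify(upstream_resp.json()), upstream_resp.status_code
```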
Model Routing
The practice of automatically selecting which LLM handles each request based on criteria like cost, capability, or latency.
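A rule-based router is the simplest form. The thresholds and model names below are illustrative; production routers often use cost tables, classifiers, or latency targets instead.

```python
def route_model(prompt: str) -> str:
    """Pick a model name based on simple heuristics (illustrative only)."""
    if len(prompt) > 4_000:
        return "large-context-model"   # long inputs need a bigger window
    if "```" in prompt:
        return "code-tuned-model"      # code blocks go to a code model
    return "small-fast-model"          # default to the cheapest option

print(route_model("Summarize this paragraph."))  # small-fast-model
```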
Prompt Engineering
The practice of designing and optimizing prompts to get better results from LLMs.
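A small illustration of the idea: a templated prompt with an explicit role, a constrained output format, and one worked example (few-shot prompting). The task and labels are made up.

```python
PROMPT_TEMPLATE = """You are a support triage assistant.
Classify the ticket as one of: billing, bug, feature_request.
Respond with only the label.

Ticket: "I was charged twice this month."
Label: billing

Ticket: "{ticket}"
Label:"""

prompt = PROMPT_TEMPLATE.format(ticket="The export button crashes the app.")
print(prompt)
```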
Rate Limiting
Controlling the number of API requests allowed within a time period to prevent overuse and manage costs.
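One common implementation is a token bucket: requests draw from a bucket that refills at a fixed rate, allowing short bursts up to a cap. A minimal sketch:

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter: roughly `rate` requests per
    second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)  # ~5 requests/second, burst of 10
print(bucket.allow())  # True while tokens remain
```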
Semantic Caching
A caching technique that returns stored responses for semantically similar queries, not just exact matches.
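A toy version of the lookup: embed each query, then return a stored response when a new query's embedding is close enough to a cached one. The `embed` function is a placeholder for any embedding model, and the similarity threshold is illustrative.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

class SemanticCache:
    """Toy semantic cache keyed on embedding similarity."""

    def __init__(self, embed, threshold: float = 0.9):
        self.embed = embed          # placeholder embedding function
        self.threshold = threshold  # illustrative similarity cutoff
        self.entries = []           # list of (embedding, response)

    def get(self, query: str):
        q = self.embed(query)
        for emb, response in self.entries:
            if cosine_similarity(q, emb) >= self.threshold:
                return response  # cache hit on a similar query
        return None

    def put(self, query: str, response: str):
        self.entries.append((self.embed(query), response))
```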
Token
The basic unit of text processing in LLMs. Tokens can be words, subwords, or characters depending on the model's tokenizer.
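Token counts can be measured directly with a tokenizer library such as tiktoken (one option among many; the right encoding depends on the model you call):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding choice is model-dependent
text = "Tokens can be whole words or fragments of words."
tokens = enc.encode(text)
print(len(tokens))         # number of tokens in the text
print(enc.decode(tokens))  # round-trips back to the original string
```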