KV cache

Stored key-value tensors that let models skip recomputing past tokens. Drives inference cost in long-context workloads.

Updated: June 4, 2026

Working definition

The KV cache stores key-value tensors from attention so models can skip recomputing past tokens. This is what fills up in long-context workloads. It’s also what’s driving your inference cost. Long context isn’t free. The KV cache is the receipt.

KV cache

Working definition

Settings

Language

Your space

Privacy

Accessibility