Quantization

Letter Q

Shrinking a model by lowering the numerical precision of its weights. Same model, fraction of the memory, almost the same accuracy.

Updated: June 4, 2026

Working definition

Quantization shrinks a model by lowering the numerical precision of its weights. FP16 to INT8 to INT4. Same model, fraction of the memory, almost the same accuracy. It’s why the gap between frontier and open source keeps closing.

Quantization

Working definition

Settings

Language

Your space

Privacy

Accessibility