Path: blob/main/transformers_doc/en/quantization/torchao.ipynb
torchao is a PyTorch architecture optimization library that supports custom high-performance data types, quantization, and sparsity. It composes with native PyTorch features such as torch.compile for even faster inference and training.
To quantize a model, install torchao and follow the examples below.
In [ ]:
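A minimal install cell, assuming a pip-based environment; accelerate is included here because device_map relies on it:

# Install torchao alongside transformers, accelerate, and a recent PyTorch release
!pip install --upgrade torch torchao transformers accelerate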
If the execution runtime is a GPU, the code below quantizes the model on the GPU with device_map="auto".
In [ ]:
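A sketch of GPU quantization, assuming a CUDA runtime; the checkpoint name and the int4 weight-only setting are placeholders you can swap for your own:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TorchAoConfig

model_name = "meta-llama/Meta-Llama-3-8B"  # placeholder checkpoint

# int4 weight-only quantization; group_size controls the quantization granularity
quantization_config = TorchAoConfig("int4_weight_only", group_size=128)
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",  # places the quantized model on the available GPU(s)
    quantization_config=quantization_config,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
inputs = tokenizer("What are we having for dinner?", return_tensors="pt").to(quantized_model.device)
output = quantized_model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(output[0], skip_special_tokens=True))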
The example below quantizes the model on the CPU with device_map="cpu".
In [ ]:
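A corresponding CPU sketch; int8 weight-only is used here as an assumption, since the int4 kernels generally target GPU, and the checkpoint name is again a placeholder:

from transformers import AutoModelForCausalLM, AutoTokenizer, TorchAoConfig

model_name = "meta-llama/Meta-Llama-3-8B"  # placeholder checkpoint

# int8 weight-only quantization, executed entirely on the CPU
quantization_config = TorchAoConfig("int8_weight_only")
quantized_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="cpu",
    quantization_config=quantization_config,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
inputs = tokenizer("What are we having for dinner?", return_tensors="pt")
output = quantized_model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(output[0], skip_special_tokens=True))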