PyTorch Quantization for TensorRT

There seem to be quite a few possible ways to do this:

The "PyTorch Eager Mode Quantization TensorRT Acceleration" route seems a bit cumbersome:
1. torchao quantization
2. ONNX conversion
3. Graph surgery (changing some ops in the ONNX graph)
4. TensorRT conversion
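For context on the ONNX route: TensorRT's explicit-quantization path consumes ONNX QuantizeLinear/DequantizeLinear (Q/DQ) node pairs, so a graph of Q/DQ pairs is probably what the graph-surgery step has to produce (an assumption; the exact ops being changed aren't listed above). Below is a minimal pure-Python sketch of what a Q/DQ pair computes, assuming symmetric per-tensor INT8 with a zero point of 0, which is the form TensorRT expects. The function names are hypothetical and not part of any library.

```python
# Pure-Python sketch of ONNX QuantizeLinear / DequantizeLinear semantics.
# Assumption: symmetric, per-tensor INT8 with zero_point = 0 (TensorRT's form).
# Function names are hypothetical, for illustration only.

INT8_MIN, INT8_MAX = -128, 127


def quantize_linear(values, scale, zero_point=0):
    """QuantizeLinear: saturate(round(x / scale) + zero_point) into INT8.

    Python's round() rounds half to even, which matches the ONNX spec.
    """
    return [max(INT8_MIN, min(INT8_MAX, round(x / scale) + zero_point)) for x in values]


def dequantize_linear(q_values, scale, zero_point=0):
    """DequantizeLinear: (q - zero_point) * scale, back to float."""
    return [(q - zero_point) * scale for q in q_values]


def fake_quantize(values, scale):
    """A Q -> DQ pair: output is float again, but snapped to the INT8 grid."""
    return dequantize_linear(quantize_linear(values, scale), scale)


if __name__ == "__main__":
    x = [0.0, 0.26, -1.0, 3.0]
    print(quantize_linear(x, 0.5))  # integer codes on the INT8 grid
    print(fake_quantize(x, 0.5))    # float values after the Q/DQ round trip
```

The Q/DQ pair keeps the graph in float types, which is what lets a builder like TensorRT see where quantization happens and fuse it into INT8 kernels instead of running separate quantized ops.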
Not sure if it works, but this would be ideal:
1. torch.export
2. torchao quantization
3. TensorRT conversion
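Whichever order the steps end up in, a post-training quantization (PTQ) step has to pick a scale for each tensor from calibration data. Below is a minimal sketch of the simplest scheme, max-abs ("amax") calibration, assuming symmetric per-tensor INT8. `MaxAbsObserver` is a hypothetical name, not torchao's API; torchao's own observers and quantizers are more elaborate.

```python
# Minimal sketch of max-abs ("amax") post-training calibration.
# Assumption: symmetric per-tensor INT8, so scale = amax / 127.
# MaxAbsObserver is hypothetical and not part of torchao.


class MaxAbsObserver:
    """Tracks the largest absolute value seen across calibration batches."""

    def __init__(self):
        self.amax = 0.0

    def observe(self, batch):
        # default=0.0 keeps an empty batch from raising.
        self.amax = max(self.amax, max((abs(v) for v in batch), default=0.0))

    def scale(self, qmax=127):
        # Map [-amax, amax] onto [-qmax, qmax]; fall back to 1.0 if nothing nonzero was seen.
        return self.amax / qmax if self.amax > 0 else 1.0


if __name__ == "__main__":
    obs = MaxAbsObserver()
    for batch in ([0.1, -0.5, 0.3], [1.27, -0.2], [0.0, -0.9]):
        obs.observe(batch)
    print(obs.amax, obs.scale())
```

Max-abs is sensitive to outliers; percentile or entropy-based calibration are common alternatives that trade clipping error for finer resolution.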
Less ideal would be:
1. torchao quantization
2. torch.export
3. TensorRT conversion
- I’ve already partially tried this using the VGG PTQ example from TensorRT, but torch.export complained that it couldn’t translate the quantized operations.
