Quantized Models
The Protection SDK supports post-training quantization, enabling you to optimize protected models for deployment while maintaining performance. Currently, we provide support for:
- PyTorch post-training quantization using torch.ao.quantization
- ONNX post-training quantization using onnxruntime.quantization.quantize_static
It’s not possible to directly quantize a protected model using torch.ao.quantization and convert it to ONNX. To quantize a protected ONNX model, you must first export the protected float32 ONNX model and then perform quantization using onnxruntime.quantization.quantize_static.
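As a rough sketch of that second step, the snippet below statically quantizes an already-exported protected float32 ONNX model with onnxruntime.quantization.quantize_static. The file names, input name, input shape, and random calibration data are placeholders for illustration; use a representative calibration dataset in practice.

```python
import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static


class RandomCalibrationReader(CalibrationDataReader):
    """Feeds a few calibration batches to the quantizer (replace with real data)."""

    def __init__(self, input_name, shape, num_batches=16):
        self._batches = iter(
            [{input_name: np.random.rand(*shape).astype(np.float32)} for _ in range(num_batches)]
        )

    def get_next(self):
        return next(self._batches, None)


# "protected_fp32.onnx" is the protected float32 ONNX model exported beforehand
quantize_static(
    model_input="protected_fp32.onnx",
    model_output="protected_int8.onnx",
    calibration_data_reader=RandomCalibrationReader("input", (1, 3, 224, 224)),
    weight_type=QuantType.QInt8,
)
```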
Workflow
To quantize a protected model, the correct workflow is to protect first and then quantize (protect -> quantize). When protecting the model, ensure it is compatible with quantization by setting the quantize_compatibility argument of the protect function to True:
```python
def protect(
    self,
    ...  # Other parameters
    quantize_compatibility: Optional[bool] = False,
)
```
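As a minimal sketch, and assuming protector is the SDK object that exposes the protect method above (the other parameters are deliberately left elided), enabling quantization compatibility looks like this:

```python
# Hypothetical call; `protector` stands for the SDK object exposing protect()
protected = protector.protect(
    # ... other parameters ...
    quantize_compatibility=True,  # keep the protected model compatible with later quantization
)
```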
For details on how to quantize a protected model, please refer to the example script provided to you.
Runner usage
For inference comparison and benchmarking with a quantized model, the runners are used the same way as with float32 models, but with the runner variants whose names end in Q:
```python
# Example usage of a Q runner variant with a quantized protected model
protected_model = RunnerQ(
    model_path=skyld_model_path,
    mask_path=mask_path,
    magic_number_path=magic_number_path,
    use_tf32=True,
)

# Run one inference on inputs
protected_results = protected_model(*inputs)
```
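For a quick accuracy comparison between the float32 and quantized protected models, one option is to run both runners on the same inputs and inspect the output deviation. The sketch below assumes a float32 runner class named Runner that takes the same arguments as RunnerQ, plus a separate float32 model path; both names are illustrative placeholders, not part of the SDK reference.

```python
import numpy as np

# Hypothetical float32 runner; `Runner` and `skyld_float32_model_path` are placeholders
float32_model = Runner(
    model_path=skyld_float32_model_path,
    mask_path=mask_path,
    magic_number_path=magic_number_path,
)

float32_results = float32_model(*inputs)
quantized_results = protected_model(*inputs)  # RunnerQ instance from the example above

# Element-wise deviation gives a rough estimate of the accuracy cost of quantization
print("max abs diff:", np.max(np.abs(np.asarray(float32_results) - np.asarray(quantized_results))))
```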