Quantized Models

The Protection SDK supports post-training quantization, enabling you to optimize protected models for deployment while maintaining performance. Currently, we provide support for:

  • PyTorch post-training quantization using torch.ao.quantization
  • ONNX post-training quantization using onnxruntime.quantization.quantize_static

It is not possible to quantize a protected model with torch.ao.quantization and then convert it to ONNX. To obtain a quantized protected ONNX model, first export the protected float32 ONNX model and then quantize it with onnxruntime.quantization.quantize_static.
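
The snippet below is a minimal sketch of that second step. The file names, the input tensor name "input", the input shape, and the random calibration data are placeholders for illustration only, not part of the SDK; use your own exported model and calibration set.

import numpy as np
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize_static

class RandomCalibrationReader(CalibrationDataReader):
    """Feeds a few calibration batches to the quantizer (placeholder data)."""
    def __init__(self, num_samples=8):
        self._samples = iter(
            {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}
            for _ in range(num_samples)
        )

    def get_next(self):
        # Return the next calibration batch, or None when exhausted.
        return next(self._samples, None)

quantize_static(
    model_input="protected_model_fp32.onnx",   # exported protected float32 model (placeholder name)
    model_output="protected_model_int8.onnx",  # quantized protected model (placeholder name)
    calibration_data_reader=RandomCalibrationReader(),
    weight_type=QuantType.QInt8,
)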

Workflow

To quantize a protected model, the correct workflow is to protect first and then quantize (protect -> quantize). When protecting the model, ensure it is compatible with quantization by setting the quantize_compatibility argument of the protect function to True, as in the sketch after the signature below:

def protect(
        self,
        ...,  # other parameters
        quantize_compatibility: Optional[bool] = False,
    ):
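
The sketch below illustrates this protect -> quantize order using eager-mode post-training static quantization from torch.ao.quantization. The protector object, its omitted protect arguments, and calibration_loader are placeholders, and the exact quantization calls may differ from the example script shipped with the SDK.

import torch

# Protect first; the other protect() arguments are omitted in this sketch.
protected_model = protector.protect(
    quantize_compatibility=True,  # make the protected model quantization-compatible
)

# Then apply post-training static quantization with torch.ao.quantization.
protected_model.eval()
protected_model.qconfig = torch.ao.quantization.get_default_qconfig("fbgemm")
prepared = torch.ao.quantization.prepare(protected_model)
with torch.no_grad():
    for batch in calibration_loader:  # placeholder calibration data
        prepared(batch)
quantized_model = torch.ao.quantization.convert(prepared)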

For details on how to quantize a protected model, please refer to the example script that has been provided to you.

Runner usage

For inference comparison and benchmarking, a quantized model is used in the same way as a float32 model; simply use the runner variants whose names end with Q:

# Example usage of a quantized-model runner (note the Q suffix)
protected_model = RunnerQ(
    model_path=skyld_model_path,
    mask_path=mask_path,
    magic_number_path=magic_number_path,
    use_tf32=True,
)
# Run one inference on the inputs
protected_results = protected_model(*inputs)
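
For a quick numerical comparison against the float32 protected model, something like the following can be used. Runner here stands in for the corresponding float32 runner variant and is a placeholder name, as are the paths and inputs reused from the example above.

import numpy as np

# Float32 counterpart of RunnerQ (placeholder name for the non-Q runner variant)
float32_model = Runner(
    model_path=skyld_model_path,
    mask_path=mask_path,
    magic_number_path=magic_number_path,
)
float32_results = float32_model(*inputs)

# Largest deviation introduced by quantization, per output tensor
for fp32_out, q_out in zip(float32_results, protected_results):
    print("max abs diff:", np.max(np.abs(np.asarray(fp32_out) - np.asarray(q_out))))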