Test the Protected Models
To check the protected model's results and its impact relative to the original model, skyld provides runners, implemented for PyTorch and OnnxRuntime respectively. Two different types of runners cover these two use cases:
- Inference tests: check output differences between the protected and the original model.
- Benchmark tests: check inference time differences between the protected and the original model.
Requirements
- pycryptodome==3.21.0
- onnxruntime or onnxruntime-gpu > 1.18 (if running ONNX models only)
- onnxruntime >= 1.21.0 (if running ONNX models with external data)
- torch >= 2.6 (if running using torch.export AOTInductor)
- tensorrt installed on the device (if running ONNX models with the TensorRT provider)
- cupy installed if running benchmarks with ONNX models on TensorRT and/or CUDA; can be installed with pip install cupy-cuda<version>x, where <version> is the CUDA version
Test Inferences
Use the Skyld Runners
To check output differences between the protected and original models, you can use the following runners:
# Runners for Torchscript (CPU)
from runners.pytorch import PytorchModelRunner
# Runners for Torchscript (CUDA)
from runners.pytorch_cuda import PytorchCUDAModelRunner
# Runners for onnx-runtime(CPU)
from runners.onnx import OnnxModelRunner
# Runners for onnx-runtime(CUDA)
from runners.onnx_cuda import OnnxCUDAModelRunner
# Runners for onnx-runtime(TensorRT)
from runners.onnx_tensorrt import OnnxTensorRTModelRunner
# Runners for AOTInductor (CPU)
from runners.pytorchv2 import Pytorchv2ModelRunner
# Runners for AOTInductor (CUDA)
from runners.pytorchv2_cuda import Pytorchv2CUDAModelRunner
| Runner class | Usage (inference call) |
|---|---|
| PytorchModelRunner | PyTorch (TorchScript) running on CPU |
| PytorchCUDAModelRunner | PyTorch (TorchScript) running on a CUDA-compatible GPU |
| Pytorchv2ModelRunner | PyTorch (AOTInductor) running on CPU |
| Pytorchv2CUDAModelRunner | PyTorch (AOTInductor) running on a CUDA-compatible GPU |
| OnnxModelRunner | ONNX running on CPU |
| OnnxCUDAModelRunner | ONNX running on a CUDA-compatible GPU |
| OnnxTensorRTModelRunner | ONNX running with the TensorRT provider |
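As an illustration only, a small convenience mapping from export format and target device to runner class could look like the following sketch (this mapping is not part of the skyld API; it simply reuses the imports shown above):

# Illustrative mapping; not part of the skyld API
RUNNERS = {
    ("torchscript", "cpu"): PytorchModelRunner,
    ("torchscript", "cuda"): PytorchCUDAModelRunner,
    ("aotinductor", "cpu"): Pytorchv2ModelRunner,
    ("aotinductor", "cuda"): Pytorchv2CUDAModelRunner,
    ("onnx", "cpu"): OnnxModelRunner,
    ("onnx", "cuda"): OnnxCUDAModelRunner,
    ("onnx", "tensorrt"): OnnxTensorRTModelRunner,
}
# Pick the runner matching your export format and device
Runner = RUNNERS[("onnx", "cpu")]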
For the original model, each runner can be called as follows:
# Load original model
original_model = Runner(model_path=original_model_path)
# Run one inference on inputs
original_results = original_model(*inputs)
Where Runner is one of the listed runners and original_model_path is the path where the original (exported) model is stored. original_results is a NumPy array containing the output results, and inputs is a list of input tensors in NumPy format.
For PyTorch, original_results is in the same format as what model(inputs).detach().cpu().numpy() typically returns; for ONNX, it is in the same format as what session.run(outputs_name, input_dict) typically returns.
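For instance, a minimal end-to-end call with the ONNX CPU runner could look like the sketch below (the model path and input shape are placeholders; adapt them to your own exported model):

import numpy as np
from runners.onnx import OnnxModelRunner

# Placeholder path and input shape; adapt them to your exported model
original_model_path = "exported_models/original_model.onnx"
inputs = [np.random.rand(1, 3, 224, 224).astype(np.float32)]

# Load the original model and run one inference
original_model = OnnxModelRunner(model_path=original_model_path)
original_results = original_model(*inputs)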
For the protected (exported) model, each runner can be called as follows:
# Load the protected model
protected_model = Runner(
    model_path=skyld_model_path, mask_path=mask_path,
    magic_number_path=magic_number_path)
# Run one inference on inputs
protected_results = protected_model(*inputs)
Where Runner is one of the listed runners and skyld_model_path is the path where the protected (exported) model is stored. mask_path and magic_number_path are the paths to the txt files obtained when protecting the model (stored in the same folder as the exported models). If the test_key parameter was enabled when protecting the model, only the key_vector_inv txt file is generated; the path to this file should be set in the key_path parameter of the runner, and in this case the mask_path and magic_number_path values should be left empty.
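For example, with test_key enabled, loading the protected model could look like the following sketch (key_vector_inv_path is a placeholder for the path to the generated key_vector_inv txt file, and the empty mask and magic number values are shown here as empty strings, as an assumption):

# Load the protected model using only the key_vector_inv file (test_key enabled)
protected_model = Runner(
    model_path=skyld_model_path,
    mask_path="",                  # left empty when test_key is used
    magic_number_path="",          # left empty when test_key is used
    key_path=key_vector_inv_path)  # placeholder path to the key_vector_inv txt file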
Make sure that SkProtectorPyTorch.ExportPlatform.TXT is passed to the deployment_platform argument of the protect function if you want to use the runners.
Outputs Comparison
To compare output results between an original and a protected model, a test_results example function has been provided in the usage_example.py file. This function computes the L2 norm between each pair of corresponding output tensors and takes the maximum:
import numpy as np

# Perform inferences on protected and original models
original_results = original_model(*inputs)
protected_results = protected_model(*inputs)
# You can compare the results of the two models as you wish here
# Ex:
norms_between_results = [
np.linalg.norm(original_result - protected_result)
for original_result, protected_result in zip(
original_results, protected_results
) # L2 norm between each output tensor
]
max_norm = np.max(norms_between_results)
You can implement any comparison process you need in this code section; the goal is to check that the protected model is satisfactory for your usage.
In the test_results function we sample random inputs for testing; you are free to pass any inputs you want (for example, inputs from your dataset).
Random inputs are sufficient here because the protected model and the original model are, computationally speaking, the same model.
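For example, a simple acceptance check on top of max_norm could look like this (the tolerance value is purely illustrative; choose one that matches your model and use case):

# Illustrative tolerance; tune it for your model and use case
tolerance = 1e-4
assert max_norm < tolerance, (
    f"Protected model deviates from the original by {max_norm}, above {tolerance}")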
Benchmark the Models
Run the Benchmarks
To run inference time comparisons between protected and original models, you can use the following runners:
# Runners for Torchscript (CPU)
from runners.pytorch_benchmark import PytorchModelRunnerBenchmark
# Runners for Torchscript (CUDA)
from runners.pytorch_benchmark_cuda import PytorchCUDAModelRunnerBenchmark
# Runners for onnx-runtime(CPU)
from runners.onnx_benchmark import OnnxModelRunnerBenchmark
# Runners for onnx-runtime(CUDA)
from runners.onnx_benchmark_cuda import OnnxCUDAModelRunnerBenchmark
# Runners for onnx-runtime(TensorRT)
from runners.onnx_benchmark_tensorrt import OnnxTensorRTModelRunnerBenchmark
# Runners for AOTInductor (CPU)
from runners.pytorchv2_benchmark import Pytorchv2ModelRunnerBenchmark
# Runners for AOTInductor (CUDA)
from runners.pytorchv2_benchmark_cuda import Pytorchv2CUDAModelRunnerBenchmark
| Runner classes | Usage (mean inference time measurement) |
|---|---|
| PytorchModelRunnerBenchmark | PyTorch (TorchScript) running on CPU |
| PytorchCUDAModelRunnerBenchmark | PyTorch (TorchScript) running on a CUDA-compatible GPU |
| Pytorchv2ModelRunnerBenchmark | PyTorch (AOTInductor) running on CPU |
| Pytorchv2CUDAModelRunnerBenchmark | PyTorch (AOTInductor) running on a CUDA-compatible GPU |
| OnnxModelRunnerBenchmark | ONNX running on CPU |
| OnnxCUDAModelRunnerBenchmark | ONNX running on a CUDA-compatible GPU |
| OnnxTensorRTModelRunnerBenchmark | ONNX running with the TensorRT provider |
Note that there are two different types of runners: runners for inference (previous section) and runners for benchmarks (current section).
For the original model, the mean inference time over 100 runs with 50 warmup runs can be computed as follows:
original_model = Runner(
    model_path=original_model_path, num_run=100, num_warm=50)
# Get mean inference time
original_inference_time = original_model(*inputs)
where Runner is a benchmark runner, original_model_path is the path of the original exported model, num_run is the number of runs used to compute the mean inference time, and num_warm is the number of warmup runs, i.e. initial inference runs that are not counted in the mean.
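As a mental model only (this is not the actual runner implementation), the measurement performed by a benchmark runner is roughly equivalent to:

import time

def mean_inference_time(run_fn, inputs, num_run=100, num_warm=50):
    # Warmup runs: executed but excluded from the mean
    for _ in range(num_warm):
        run_fn(*inputs)
    # Timed runs: averaged to obtain the mean inference time
    total = 0.0
    for _ in range(num_run):
        start = time.perf_counter()
        run_fn(*inputs)
        total += time.perf_counter() - start
    return total / num_run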
For the protected model, the mean inference time over 100 runs with 50 warmup runs can be computed as follows:
protected_model = Runner(
    model_path=skyld_model_path, mask_path=mask_path,
    magic_number_path=magic_number_path,
    num_run=100, num_warm=50)
protected_inference_time = protected_model(*inputs)
mask_path and magic_number_path are the paths to the txt files obtained when protecting the model (stored in the same folder as the exported models). If the test_key parameter was enabled when protecting the model, only the key_vector_inv txt file is generated; the path to this file should be set in the key_path parameter of the runner, and in this case the mask_path and magic_number_path values should be left empty.
To benchmark the original and the protected model correctly, the number of runs and warmups must be the same between the two runner instances.
Compare the Benchmarks Results
To benchmark the protected model against the original model in terms of inference time, a benchmark_models example function has been provided in the usage_example.py file. This function computes the overhead, in percent, of the protected model relative to the original model over 200 runs with 50 warmups:
# Perform benchmarks on protected and original models
original_inference_time = original_model(*inputs)
protected_inference_time = protected_model(*inputs)
# Compute overhead in percentage
overhead = (protected_inference_time -
            original_inference_time) / original_inference_time * 100
This function is entirely customizable if you need to measure anything other than overhead.
In the benchmark_models function, random inputs are used for benchmarking.
Please note that the measured overhead is relative to your machine and may differ when the AI model is deployed on another device (in production).
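As a quick sanity check of the overhead formula above, here is a worked example with made-up timings:

# Made-up timings (milliseconds), purely for illustration
original_inference_time = 4.0
protected_inference_time = 4.6
overhead = (protected_inference_time -
            original_inference_time) / original_inference_time * 100
print(f"Overhead: {overhead:.1f}%")  # Overhead: 15.0%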
GPU Settings Requirements for Protected Model
When using a CUDA GPU with ONNX or PyTorch, the default precision for convolution and matmul operations is tf32.
Using tf32 precision with a protected model can lead to a loss of precision in the protected model's outputs compared with the original model (before protection). If this precision loss strongly affects the output results, you can disable tf32.
tf32 is disabled by default in the runners and in the example script (usage_example.py).
Running with the Runners
For the following CUDA runners: onnx_cuda_model_runner_benchmark, pytorch_cuda_model_runner_benchmark, pytorch_cuda_model_runner, onnx_cuda_model_runner, a use_tf32 argument can be passed to enable (or not) tf32 when running inferences or benchmarks; the default value is False.
# Example usage of "use_tf32" argument
protected_model = Runner(
    model_path=skyld_model_path, mask_path=mask_path,
    magic_number_path=magic_number_path, use_tf32=True)
# Run one inference on inputs
protected_results = protected_model(*inputs)
In the usage_example.py script the use_tf32 argument is set to False; you can enable tf32 by changing the following values in both benchmark_models and test_results:
params_original["use_tf32"] = False
params_protected["use_tf32"] = False
Running without the Runners
PyTorch
In PyTorch, you can disable tf32 using:
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False
ONNX Runtime
Using ONNXRuntime you can disable tf32 when creating the inference session:
model = onnxruntime.InferenceSession(
    ...,
    providers=[("CUDAExecutionProvider", {"use_tf32": 0})],
)
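For completeness, a fuller sketch of a session created with tf32 disabled could look like the following (the model path is a placeholder, and the CPU provider is included only as a fallback):

import onnxruntime

# Placeholder model path; tf32 is disabled on the CUDA execution provider
session = onnxruntime.InferenceSession(
    "protected_model.onnx",
    providers=[
        ("CUDAExecutionProvider", {"use_tf32": 0}),
        "CPUExecutionProvider",
    ],
)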