Benchmark Generation

One of the values of Benanza is its ability to automatically generate benchmark code for a given model. It does this by first performing shape inference on the graph, then traversing it (as if performing an execution) and collecting all the layers that would be run. The layers are placed into a set, so only unique layers are present. Benchmark generation for multiple models operates in the same way, where the set captures the unique layers across all models. By generating benchmarks for only the unique layers we accomplish two things: (1) the generated benchmark code is smaller and (2) the time spent benchmarking decreases.
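Conceptually, the deduplication step amounts to keying each layer by its type and parameters and inserting it into a set. A minimal sketch of that idea in C++, using a hypothetical Layer type rather than Benanza's actual data structures, is:

#include <cstdint>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical layer descriptor; the real generator tracks richer metadata.
struct Layer {
  std::string type;                 // e.g. "Conv", "Relu", "Gemm"
  std::vector<int64_t> input_dims;  // dimensions obtained from shape inference
  std::string params;               // serialized layer attributes

  bool operator<(const Layer& other) const {
    return std::tie(type, input_dims, params) <
           std::tie(other.type, other.input_dims, other.params);
  }
};

// Walking the graph as if executing it yields a sequence of layers;
// inserting them into a set keeps only the unique ones.
std::set<Layer> collect_unique_layers(const std::vector<Layer>& executed) {
  return std::set<Layer>(executed.begin(), executed.end());
}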

To generate a benchmark for MobileNet-v2 inference with batch size = 1, you write

benanza benchgen --model_path //MobileNet-v2/ -o gen/generated_mobilenetv2_benchmarks.hpp --backward=false --forward=true --batch_size=1

Generate All

There are a few helper scripts within the repository that generate benchmarks for different purposes. To run them all, do

./scripts/gen_benchmarks_resnet50.sh
./scripts/gen_benchmarks.sh
./scripts/gen_only_fused_benchmarks.sh
./scripts/gen_profile_benchmarks.sh
./scripts/gen_benchmarks_mobilenetv2.sh

Generated Benchmarks

The generated code contains metadata that encodes the layer type, its parameters, and any other run options that are passed in. Since considerable time is saved by benchmarking only unique layers, the benchmark generator uses that time to exhaustively measure all possible combinations by which the kernels can be called. As one can imagine, the generated code can be quite large, and the following is an excerpt of the code generated for batch size = 1.
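A minimal sketch of what one generated entry might look like (not the actual excerpt; it uses a hypothetical layer name, a placeholder kernel call, and illustrative metadata) is:

#include <benchmark/benchmark.h>

// Hypothetical benchmark for one unique convolution layer; the real generated
// code invokes vendor kernels (e.g. cuDNN), which is elided here.
static void CONV_3x3_64x112x112(benchmark::State& state) {
  const int algo = state.range(0);  // which kernel/algorithm variant to time
  for (auto _ : state) {
    // launch_conv_kernel(algo, ...);  // placeholder for the actual kernel call
    benchmark::DoNotOptimize(algo);
  }
  state.counters["batch_size"] = 1;
  state.SetLabel("Conv k=3x3 n=1 c=64 h=112 w=112");  // encoded layer metadata
}
// Registering one benchmark per algorithm variant is what lets the generator
// exhaustively cover the ways a kernel can be called.
BENCHMARK(CONV_3x3_64x112x112)->DenseRange(0, 7);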

Benchmark Output

The benchmark results are stored in JSON format and later loaded into a database. The database is indexed by machine information (GPU, CPU, and software versions). Again, the benchmark results are quite large, and we only show an excerpt here.

We describe how this information is parsed in later sections.

Benchmark Runtime

The benchmarks require a soon-to-be-released runtime to operate, but the reader can examine the generated C++ code to see the kinds of code produced. The benchmark runtime uses the Google Benchmark library to capture statistically valid timing results; that is, the number of iterations each benchmark runs depends on its performance characteristics.
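Although the Benanza runtime itself is not yet released, the generated header can in principle be driven by a standard Google Benchmark entry point. A hedged sketch, assuming the header produced by the earlier benchgen command, is:

#include <benchmark/benchmark.h>
#include "gen/generated_mobilenetv2_benchmarks.hpp"  // header produced by benchgen above

// Google Benchmark chooses the iteration count for each benchmark based on how
// long a single iteration takes, which is what yields statistically valid timings.
BENCHMARK_MAIN();

Google Benchmark can also emit its results as JSON, for example via --benchmark_format=json or --benchmark_out=results.json.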