Running the sample demo "demo_benchmark_app.sh" included with the OpenVINO™ installation installs the benchmark test tool.
$ cd ~/inference_engine_samples_build/intel64/Release/
$ ./benchmark_app -h
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
benchmark_app [OPTION]
Options:
-h, --help Print a usage message
-m "<path>" Required. Path to an .xml/.onnx/.prototxt file with a trained model or to a .blob files with a trained compiled model.
-i "<path>" Optional. Path to a folder with images and/or binaries or to specific image or binary file.
-d "<device>" Optional. Specify a target device to infer on (the list of available devices is shown below). Default value is CPU. Use "-d HETERO:<comma-separated_devices_list>" format to specify HETERO plugin. Use "-d MULTI:<comma-separated_devices_list>" format to specify MULTI plugin. The application looks for a suitable plugin for the specified device.
-l "<absolute_path>" Required for CPU custom layers. Absolute path to a shared library with the kernels implementations.
Or
-c "<absolute_path>" Required for GPU custom kernels. Absolute path to an .xml file with the kernels description.
-api "<sync/async>" Optional. Enable Sync/Async API. Default value is "async".
-niter "<integer>" Optional. Number of iterations. If not specified, the number of iterations is calculated depending on a device.
-nireq "<integer>" Optional. Number of infer requests. Default value is determined automatically for device.
-b "<integer>" Optional. Batch size value. If not specified, the batch size value is determined from Intermediate Representation.
-stream_output Optional. Print progress as a plain text. When specified, an interactive progress bar is replaced with a multiline output.
-t Optional. Time in seconds to execute topology.
-progress Optional. Show progress bar (can affect performance measurement). Default values is "false".
-shape Optional. Set shape for input. For example, "input1[1,3,224,224],input2[1,4]" or "[1,3,224,224]" in case of one input size.
-layout Optional. Prompts how network layouts should be treated by application. For example, "input1[NCHW],input2[NC]" or "[NCHW]" in case of one input size.
device-specific performance options:
-nstreams "<integer>" Optional. Number of streams to use for inference on the CPU, GPU or MYRIAD devices (for HETERO and MULTI device cases use format <dev1>:<nstreams1>,<dev2>:<nstreams2> or just <nstreams>). Default value is determined automatically for a device.Please note that although the automatic selection usually provides a reasonable performance, it still may be non - optimal for some cases, especially for very small networks. See sample's README for more details. Also, using nstreams>1 is inherently throughput-oriented option, while for the best-latency estimations the number of streams should be set to 1.
-nthreads "<integer>" Optional. Number of threads to use for inference on the CPU (including HETERO and MULTI cases).
-enforcebf16 Optional. Enforcing of floating point operations execution in bfloat16 precision where it is acceptable.
-pin "YES"/"NO"/"NUMA" Optional. Enable threads->cores ("YES", default), threads->(NUMA)nodes ("NUMA") or completely disable ("NO") CPU threads pinning for CPU-involved inference.
Statistics dumping options:
-report_type "<type>" Optional. Enable collecting statistics report. "no_counters" report contains configuration options specified, resulting FPS and latency. "average_counters" report extends "no_counters" report and additionally includes average PM counters values for each layer from the network. "detailed_counters" report extends "average_counters" report and additionally includes per-layer PM counters and latency for each executed infer request.
-report_folder Optional. Path to a folder where statistics report is stored.
-exec_graph_path Optional. Path to a file where to store executable graph information serialized.
-pc Optional. Report performance counters.
-dump_config Optional. Path to XML/YAML/JSON file to dump IE parameters, which were set by application.
-load_config Optional. Path to XML/YAML/JSON file to load custom IE parameters. Please note, command line parameters have higher priority then parameters from configuration file.
-qb Optional. Weight bits for quantization: 8 or 16 (default)
[E:] [BSL] found 0 ioexpander device
Available target devices: CPU GNA MYRIAD
Change to the execution directory
$ cd ~/inference_engine_samples_build/intel64/Release
● Core™ i7-1185G7
・GPU (FP32)
$ ./benchmark_app -d GPU -m ~/model/public/FP32/squeezenet1.1.xml -pc -niter 1000
:
Full device name: Intel(R) Gen12LP HD Graphics (iGPU)
Count: 1000 iterations
Duration: 1332.08 ms
Latency: 5.17 ms
Throughput: 750.70 FPS
・GPU (FP16)
$ ./benchmark_app -d GPU -m ~/model/public/FP16/squeezenet1.1.xml -pc -niter 1000
:
Full device name: Intel(R) Gen12LP HD Graphics (iGPU)
Count: 1000 iterations
Duration: 2263.99 ms
Latency: 7.85 ms
Throughput: 441.70 FPS
・CPU (FP32)
$ ./benchmark_app -d CPU -m ~/model/public/FP32/squeezenet1.1.xml -pc -niter 1000
:
Full device name: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
Count: 1000 iterations
Duration: 2341.66 ms
Latency: 9.42 ms
Throughput: 427.05 FPS
・CPU (FP16)
$ ./benchmark_app -d CPU -m ~/model/public/FP16/squeezenet1.1.xml -pc -niter 1000
:
Full device name: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
Count: 1000 iterations
Duration: 2389.01 ms
Latency: 9.60 ms
Throughput: 418.58 FPS
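The reported throughput figures look consistent with dividing the iteration count by the total duration. The sketch below checks that relationship; the function name `fps` is hypothetical, and the formula is inferred from the numbers above rather than taken from the tool's source.

```shell
# fps COUNT DURATION_MS
# Prints iterations per second, rounded to two decimals.
fps() {
    awk -v count="$1" -v ms="$2" 'BEGIN { printf "%.2f\n", count * 1000 / ms }'
}

fps 1000 2341.66   # CPU (FP32) run above
fps 1000 2263.99   # GPU (FP16) run above
```

Because the printed duration is itself rounded, a recomputed value can differ from the reported one in the last digit.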
● Core™ i3-1115G4
$ ./benchmark_app -d CPU -m ~/model/public/FP32/squeezenet1.1.xml -pc -niter 1000
● Core™ i5-10210U
$ ./benchmark_app -d CPU -m ~/model/public/FP32/squeezenet1.1.xml -pc -niter 1000
:
Full device name: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz
Count: 1000 iterations
Duration: 2119.29 ms
Latency: 8.41 ms
Throughput: 471.86 FPS
● Core™ i7-6700
$ ./benchmark_app -d CPU -m ~/model/public/FP32/squeezenet1.1.xml -pc -niter 1000
:
Full device name: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
Count: 1000 iterations
Duration: 2932.73 ms
Latency: 9.90 ms
Throughput: 340.98 FPS
● Core™ i7-2620M
$ ./benchmark_app -d CPU -m ~/model/public/FP32/squeezenet1.1.xml -pc -niter 1000
:
Full device name: Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz
Count: 1000 iterations
Duration: 24671.35 ms
Latency: 22.21 ms
Throughput: 40.53 FPS
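The listings above keep only the closing summary of each run and elide the rest with ":". One way to cut a full log down to those lines is sketched here; `summary_lines` is a hypothetical helper, and the pattern assumes the summary labels appear at the start of a line as in the excerpts above.

```shell
# summary_lines [FILE...]
# Keeps only the summary lines of a benchmark_app log (reads stdin if no FILE is given).
summary_lines() {
    grep -E '^[[:space:]]*(Full device name|Count|Duration|Latency|Throughput):' "$@"
}

# Illustrative excerpt: one log line that is dropped, two summary lines that are kept.
printf '%s\n' '[ INFO ] Parsing input parameters' \
              'Count: 1000 iterations' \
              'Throughput: 40.53 FPS' | summary_lines
```

The same function could be placed at the end of a pipeline, for example after `./benchmark_app ... 2>&1`.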
Change to the execution directory
$ cd ~/inference_engine_samples_build/intel64/Release
● Core™ i7-1185G7
$ ./benchmark_app -d MYRIAD -m ~/model/public/FP16/squeezenet1.1.xml -pc -niter 1000
:
Full device name: Intel Movidius Myriad X VPU
Count: 1000 iterations
Duration: 3504.72 ms
Latency: 14.00 ms
Throughput: 285.33 FPS
● Core™ i3-1115G4
$ ./benchmark_app -d MYRIAD -m ~/model/public/FP16/squeezenet1.1.xml -pc -niter 1000
● Core™ i5-10210U
$ ./benchmark_app -d MYRIAD -m ~/model/public/FP16/squeezenet1.1.xml -pc -niter 1000
:
Full device name: Intel Movidius Myriad X VPU
Count: 1000 iterations
Duration: 3476.72 ms
Latency: 13.86 ms
Throughput: 287.63 FPS
● Core™ i7-6700
$ ./benchmark_app -d MYRIAD -m ~/model/public/FP16/squeezenet1.1.xml -pc -niter 1000
:
Full device name: Intel Movidius Myriad X VPU
Count: 1000 iterations
Duration: 3716.35 ms
Latency: 14.69 ms
Throughput: 269.08 FPS
● Core™ i7-2620M
$ ./benchmark_app -d MYRIAD -m ~/model/public/FP16/squeezenet1.1.xml -pc -niter 1000
:
Full device name: Intel Movidius Myriad X VPU
Count: 1000 iterations
Duration: 10659.16 ms
Latency: 41.04 ms
Throughput: 93.82 FPS
Change to the script execution directory
$ cd ~/run_app/
● × 3 sticks
$ ./_benchmark_app.sh MULTI:MYRIAD.3.4.1-ma2480,MYRIAD.3.4.3-ma2480,MYRIAD.3.4.4-ma2480 ~/model/public/FP16/squeezenet1.1.xml
[benchmark_app.sh] 'benchmark_app' Run !!
'command: ./benchmark_app -d MULTI:MYRIAD.3.4.1-ma2480,MYRIAD.3.4.3-ma2480,MYRIAD.3.4.4-ma2480 -m /home/mizutu/model/public/FP16/squeezenet1.1.xml -pc -niter 1000'
:
:
Full device name:
Count: 1008 iterations
Duration: 1180.91 ms
Throughput: 853.58 FPS
● × 2 sticks
$ ./_benchmark_app.sh MULTI:MYRIAD.3.4.1-ma2480,MYRIAD.3.4.3-ma2480 ~/model/public/FP16/squeezenet1.1.xml
[benchmark_app.sh] 'benchmark_app' Run !!
'command: ./benchmark_app -d MULTI:MYRIAD.3.4.1-ma2480,MYRIAD.3.4.3-ma2480 -m /home/mizutu/model/public/FP16/squeezenet1.1.xml -pc -niter 1000'
:
:
Full device name:
Count: 1000 iterations
Duration: 1755.24 ms
Throughput: 569.72 FPS
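The source of `_benchmark_app.sh` is not shown here. Judging only from the 'command:' line it prints, it takes a device string and a model path and appends `-pc -niter 1000`. A minimal sketch of a helper that builds a command line of that form follows; the function name and the optional third argument are assumptions, not the author's script.

```shell
# build_benchmark_cmd DEVICE MODEL [NITER]
# Hypothetical stand-in for the command-building part of _benchmark_app.sh.
# Prints the command line instead of running it, so it can be inspected first.
build_benchmark_cmd() {
    device="$1"
    model="$2"
    niter="${3:-1000}"   # default mirrors the -niter 1000 seen in the logs
    printf './benchmark_app -d %s -m %s -pc -niter %s\n' "$device" "$model" "$niter"
}

build_benchmark_cmd MULTI:MYRIAD.3.4.1-ma2480,MYRIAD.3.4.3-ma2480 ~/model/public/FP16/squeezenet1.1.xml
```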
● × 3 sticks
$ ./_hello_query_device.sh
[hello_query_device.sh] 'hello_query_device' Run !!
Available devices:
[E:] [BSL] found 0 ioexpander device
Device: CPU
Metrics:
AVAILABLE_DEVICES:
SUPPORTED_METRICS: AVAILABLE_DEVICES, SUPPORTED_METRICS, FULL_DEVICE_NAME, OPTIMIZATION_CAPABILITIES, SUPPORTED_CONFIG_KEYS, RANGE_FOR_ASYNC_INFER_REQUESTS, RANGE_FOR_STREAMS
FULL_DEVICE_NAME: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
OPTIMIZATION_CAPABILITIES: WINOGRAD, FP32, FP16, INT8, BIN
SUPPORTED_CONFIG_KEYS: CPU_BIND_THREAD, CPU_THREADS_NUM, CPU_THROUGHPUT_STREAMS, DUMP_EXEC_GRAPH_AS_DOT, DYN_BATCH_ENABLED, DYN_BATCH_LIMIT, ENFORCE_BF16, EXCLUSIVE_ASYNC_REQUESTS, PERF_COUNT
RANGE_FOR_ASYNC_INFER_REQUESTS: 1, 1, 1
RANGE_FOR_STREAMS: 1, 8
Default values for device configuration keys:
CPU_BIND_THREAD: YES
CPU_THREADS_NUM: 0
CPU_THROUGHPUT_STREAMS: 1
DUMP_EXEC_GRAPH_AS_DOT:
DYN_BATCH_ENABLED: NO
DYN_BATCH_LIMIT: 0
ENFORCE_BF16: NO
EXCLUSIVE_ASYNC_REQUESTS: NO
PERF_COUNT: NO
Device: GNA
Metrics:
GNA_LIBRARY_FULL_VERSION: 2.0.0.1047
FULL_DEVICE_NAME: GNA_SW
OPTIMAL_NUMBER_OF_INFER_REQUESTS: 1
SUPPORTED_CONFIG_KEYS: EXCLUSIVE_ASYNC_REQUESTS, GNA_COMPACT_MODE, GNA_DEVICE_MODE, GNA_FIRMWARE_MODEL_IMAGE, GNA_FIRMWARE_MODEL_IMAGE_GENERATION, GNA_LIB_N_THREADS, GNA_PRECISION, GNA_PWL_UNIFORM_DESIGN, GNA_SCALE_FACTOR, GNA_SCALE_FACTOR_0, PERF_COUNT, SINGLE_THREAD
SUPPORTED_METRICS: GNA_LIBRARY_FULL_VERSION, FULL_DEVICE_NAME, OPTIMAL_NUMBER_OF_INFER_REQUESTS, SUPPORTED_CONFIG_KEYS, SUPPORTED_METRICS, AVAILABLE_DEVICES
AVAILABLE_DEVICES: GNA_SW
Default values for device configuration keys:
EXCLUSIVE_ASYNC_REQUESTS: NO
GNA_COMPACT_MODE: NO
GNA_DEVICE_MODE: GNA_SW_EXACT
GNA_FIRMWARE_MODEL_IMAGE:
GNA_FIRMWARE_MODEL_IMAGE_GENERATION:
GNA_LIB_N_THREADS: 1
GNA_PRECISION: I16
GNA_PWL_UNIFORM_DESIGN: NO
GNA_SCALE_FACTOR: 1.000000
GNA_SCALE_FACTOR_0: 1.000000
PERF_COUNT: NO
SINGLE_THREAD: YES
Device: GPU
Metrics:
AVAILABLE_DEVICES: 0
SUPPORTED_METRICS: AVAILABLE_DEVICES, SUPPORTED_METRICS, FULL_DEVICE_NAME, OPTIMIZATION_CAPABILITIES, SUPPORTED_CONFIG_KEYS, RANGE_FOR_ASYNC_INFER_REQUESTS, RANGE_FOR_STREAMS
FULL_DEVICE_NAME: Intel(R) Gen12LP HD Graphics (iGPU)
OPTIMIZATION_CAPABILITIES: FP32, BIN, FP16, INT8
SUPPORTED_CONFIG_KEYS: CACHE_DIR, CLDNN_ENABLE_FP16_FOR_QUANTIZED_MODELS, CLDNN_GRAPH_DUMPS_DIR, CLDNN_MEM_POOL, CLDNN_NV12_TWO_INPUTS, CLDNN_PLUGIN_PRIORITY, CLDNN_PLUGIN_THROTTLE, CLDNN_SOURCES_DUMPS_DIR, CONFIG_FILE, DEVICE_ID, DUMP_KERNELS, DYN_BATCH_ENABLED, EXCLUSIVE_ASYNC_REQUESTS, GPU_THROUGHPUT_STREAMS, PERF_COUNT, TUNING_FILE, TUNING_MODE
RANGE_FOR_ASYNC_INFER_REQUESTS: 1, 2, 1
RANGE_FOR_STREAMS: 1, 2
Default values for device configuration keys:
CACHE_DIR:
CLDNN_ENABLE_FP16_FOR_QUANTIZED_MODELS: YES
CLDNN_GRAPH_DUMPS_DIR:
CLDNN_MEM_POOL: YES
CLDNN_NV12_TWO_INPUTS: NO
CLDNN_PLUGIN_PRIORITY: 0
CLDNN_PLUGIN_THROTTLE: 0
CLDNN_SOURCES_DUMPS_DIR:
CONFIG_FILE:
DEVICE_ID:
DUMP_KERNELS: NO
DYN_BATCH_ENABLED: NO
EXCLUSIVE_ASYNC_REQUESTS: NO
GPU_THROUGHPUT_STREAMS: 1
PERF_COUNT: NO
TUNING_FILE:
TUNING_MODE: TUNING_DISABLED
Device: MYRIAD.3.4.1-ma2480
Metrics:
DEVICE_THERMAL: UNSUPPORTED TYPE
OPTIMIZATION_CAPABILITIES: FP16
RANGE_FOR_ASYNC_INFER_REQUESTS: 3, 6, 1
SUPPORTED_METRICS: DEVICE_THERMAL, OPTIMIZATION_CAPABILITIES, RANGE_FOR_ASYNC_INFER_REQUESTS, SUPPORTED_METRICS, SUPPORTED_CONFIG_KEYS, FULL_DEVICE_NAME, AVAILABLE_DEVICES
SUPPORTED_CONFIG_KEYS: DEVICE_ID, EXCLUSIVE_ASYNC_REQUESTS, LOG_LEVEL, VPU_MYRIAD_FORCE_RESET, VPU_MYRIAD_PLATFORM, VPU_CUSTOM_LAYERS, PERF_COUNT, VPU_PRINT_RECEIVE_TENSOR_TIME, CONFIG_FILE, VPU_HW_STAGES_OPTIMIZATION, MYRIAD_THROUGHPUT_STREAMS, MYRIAD_ENABLE_FORCE_RESET, MYRIAD_ENABLE_RECEIVING_TENSOR_TIME, MYRIAD_CUSTOM_LAYERS, MYRIAD_ENABLE_HW_ACCELERATION
FULL_DEVICE_NAME: Intel Movidius Myriad X VPU
AVAILABLE_DEVICES: 3.4.1-ma2480, 3.4.3-ma2480, 3.4.4-ma2480
Default values for device configuration keys:
DEVICE_ID:
EXCLUSIVE_ASYNC_REQUESTS: NO
LOG_LEVEL: LOG_NONE
VPU_MYRIAD_FORCE_RESET: NO
VPU_MYRIAD_PLATFORM:
VPU_CUSTOM_LAYERS:
PERF_COUNT: NO
VPU_PRINT_RECEIVE_TENSOR_TIME: NO
CONFIG_FILE:
VPU_HW_STAGES_OPTIMIZATION: YES
MYRIAD_THROUGHPUT_STREAMS: -1
MYRIAD_ENABLE_FORCE_RESET: NO
MYRIAD_ENABLE_RECEIVING_TENSOR_TIME: NO
MYRIAD_CUSTOM_LAYERS:
MYRIAD_ENABLE_HW_ACCELERATION: YES
Device: MYRIAD.3.4.3-ma2480
Metrics:
DEVICE_THERMAL: UNSUPPORTED TYPE
OPTIMIZATION_CAPABILITIES: FP16
RANGE_FOR_ASYNC_INFER_REQUESTS: 3, 6, 1
SUPPORTED_METRICS: DEVICE_THERMAL, OPTIMIZATION_CAPABILITIES, RANGE_FOR_ASYNC_INFER_REQUESTS, SUPPORTED_METRICS, SUPPORTED_CONFIG_KEYS, FULL_DEVICE_NAME, AVAILABLE_DEVICES
SUPPORTED_CONFIG_KEYS: DEVICE_ID, EXCLUSIVE_ASYNC_REQUESTS, LOG_LEVEL, VPU_MYRIAD_FORCE_RESET, VPU_MYRIAD_PLATFORM, VPU_CUSTOM_LAYERS, PERF_COUNT, VPU_PRINT_RECEIVE_TENSOR_TIME, CONFIG_FILE, VPU_HW_STAGES_OPTIMIZATION, MYRIAD_THROUGHPUT_STREAMS, MYRIAD_ENABLE_FORCE_RESET, MYRIAD_ENABLE_RECEIVING_TENSOR_TIME, MYRIAD_CUSTOM_LAYERS, MYRIAD_ENABLE_HW_ACCELERATION
FULL_DEVICE_NAME: Intel Movidius Myriad X VPU
AVAILABLE_DEVICES: 3.4.1-ma2480, 3.4.3-ma2480, 3.4.4-ma2480
Default values for device configuration keys:
DEVICE_ID:
EXCLUSIVE_ASYNC_REQUESTS: NO
LOG_LEVEL: LOG_NONE
VPU_MYRIAD_FORCE_RESET: NO
VPU_MYRIAD_PLATFORM:
VPU_CUSTOM_LAYERS:
PERF_COUNT: NO
VPU_PRINT_RECEIVE_TENSOR_TIME: NO
CONFIG_FILE:
VPU_HW_STAGES_OPTIMIZATION: YES
MYRIAD_THROUGHPUT_STREAMS: -1
MYRIAD_ENABLE_FORCE_RESET: NO
MYRIAD_ENABLE_RECEIVING_TENSOR_TIME: NO
MYRIAD_CUSTOM_LAYERS:
MYRIAD_ENABLE_HW_ACCELERATION: YES
Device: MYRIAD.3.4.4-ma2480
Metrics:
DEVICE_THERMAL: UNSUPPORTED TYPE
OPTIMIZATION_CAPABILITIES: FP16
RANGE_FOR_ASYNC_INFER_REQUESTS: 3, 6, 1
SUPPORTED_METRICS: DEVICE_THERMAL, OPTIMIZATION_CAPABILITIES, RANGE_FOR_ASYNC_INFER_REQUESTS, SUPPORTED_METRICS, SUPPORTED_CONFIG_KEYS, FULL_DEVICE_NAME, AVAILABLE_DEVICES
SUPPORTED_CONFIG_KEYS: DEVICE_ID, EXCLUSIVE_ASYNC_REQUESTS, LOG_LEVEL, VPU_MYRIAD_FORCE_RESET, VPU_MYRIAD_PLATFORM, VPU_CUSTOM_LAYERS, PERF_COUNT, VPU_PRINT_RECEIVE_TENSOR_TIME, CONFIG_FILE, VPU_HW_STAGES_OPTIMIZATION, MYRIAD_THROUGHPUT_STREAMS, MYRIAD_ENABLE_FORCE_RESET, MYRIAD_ENABLE_RECEIVING_TENSOR_TIME, MYRIAD_CUSTOM_LAYERS, MYRIAD_ENABLE_HW_ACCELERATION
FULL_DEVICE_NAME: Intel Movidius Myriad X VPU
AVAILABLE_DEVICES: 3.4.1-ma2480, 3.4.3-ma2480, 3.4.4-ma2480
Default values for device configuration keys:
DEVICE_ID:
EXCLUSIVE_ASYNC_REQUESTS: NO
LOG_LEVEL: LOG_NONE
VPU_MYRIAD_FORCE_RESET: NO
VPU_MYRIAD_PLATFORM:
VPU_CUSTOM_LAYERS:
PERF_COUNT: NO
VPU_PRINT_RECEIVE_TENSOR_TIME: NO
CONFIG_FILE:
VPU_HW_STAGES_OPTIMIZATION: YES
MYRIAD_THROUGHPUT_STREAMS: -1
MYRIAD_ENABLE_FORCE_RESET: NO
MYRIAD_ENABLE_RECEIVING_TENSOR_TIME: NO
MYRIAD_CUSTOM_LAYERS:
MYRIAD_ENABLE_HW_ACCELERATION: YES
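The `AVAILABLE_DEVICES` value reported for MYRIAD lists the IDs that go into the `-d MULTI:...` argument used in the benchmark runs. A small sketch of that conversion follows; `multi_device_string` is a hypothetical helper, and it assumes the IDs are comma-separated as in the output above.

```shell
# multi_device_string "ID1, ID2, ..."
# Turns a comma-separated list of MYRIAD device IDs into a MULTI device string.
multi_device_string() {
    printf '%s\n' "$1" | tr -d ' ' | sed -e 's/,/,MYRIAD./g' -e 's/^/MULTI:MYRIAD./'
}

multi_device_string "3.4.1-ma2480, 3.4.3-ma2480, 3.4.4-ma2480"
```

For the list above this prints `MULTI:MYRIAD.3.4.1-ma2480,MYRIAD.3.4.3-ma2480,MYRIAD.3.4.4-ma2480`, the device string passed to `_benchmark_app.sh`.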
● × 2 sticks
$ ./_hello_query_device.sh
[hello_query_device.sh] 'hello_query_device' Run !!
Available devices:
[E:] [BSL] found 0 ioexpander device
Device: CPU
Metrics:
AVAILABLE_DEVICES:
SUPPORTED_METRICS: AVAILABLE_DEVICES, SUPPORTED_METRICS, FULL_DEVICE_NAME, OPTIMIZATION_CAPABILITIES, SUPPORTED_CONFIG_KEYS, RANGE_FOR_ASYNC_INFER_REQUESTS, RANGE_FOR_STREAMS
FULL_DEVICE_NAME: 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
OPTIMIZATION_CAPABILITIES: WINOGRAD, FP32, FP16, INT8, BIN
SUPPORTED_CONFIG_KEYS: CPU_BIND_THREAD, CPU_THREADS_NUM, CPU_THROUGHPUT_STREAMS, DUMP_EXEC_GRAPH_AS_DOT, DYN_BATCH_ENABLED, DYN_BATCH_LIMIT, ENFORCE_BF16, EXCLUSIVE_ASYNC_REQUESTS, PERF_COUNT
RANGE_FOR_ASYNC_INFER_REQUESTS: 1, 1, 1
RANGE_FOR_STREAMS: 1, 8
Default values for device configuration keys:
CPU_BIND_THREAD: YES
CPU_THREADS_NUM: 0
CPU_THROUGHPUT_STREAMS: 1
DUMP_EXEC_GRAPH_AS_DOT:
DYN_BATCH_ENABLED: NO
DYN_BATCH_LIMIT: 0
ENFORCE_BF16: NO
EXCLUSIVE_ASYNC_REQUESTS: NO
PERF_COUNT: NO
Device: GNA
Metrics:
GNA_LIBRARY_FULL_VERSION: 2.0.0.1047
FULL_DEVICE_NAME: GNA_SW
OPTIMAL_NUMBER_OF_INFER_REQUESTS: 1
SUPPORTED_CONFIG_KEYS: EXCLUSIVE_ASYNC_REQUESTS, GNA_COMPACT_MODE, GNA_DEVICE_MODE, GNA_FIRMWARE_MODEL_IMAGE, GNA_FIRMWARE_MODEL_IMAGE_GENERATION, GNA_LIB_N_THREADS, GNA_PRECISION, GNA_PWL_UNIFORM_DESIGN, GNA_SCALE_FACTOR, GNA_SCALE_FACTOR_0, PERF_COUNT, SINGLE_THREAD
SUPPORTED_METRICS: GNA_LIBRARY_FULL_VERSION, FULL_DEVICE_NAME, OPTIMAL_NUMBER_OF_INFER_REQUESTS, SUPPORTED_CONFIG_KEYS, SUPPORTED_METRICS, AVAILABLE_DEVICES
AVAILABLE_DEVICES: GNA_SW
Default values for device configuration keys:
EXCLUSIVE_ASYNC_REQUESTS: NO
GNA_COMPACT_MODE: NO
GNA_DEVICE_MODE: GNA_SW_EXACT
GNA_FIRMWARE_MODEL_IMAGE:
GNA_FIRMWARE_MODEL_IMAGE_GENERATION:
GNA_LIB_N_THREADS: 1
GNA_PRECISION: I16
GNA_PWL_UNIFORM_DESIGN: NO
GNA_SCALE_FACTOR: 1.000000
GNA_SCALE_FACTOR_0: 1.000000
PERF_COUNT: NO
SINGLE_THREAD: YES
Device: GPU
Metrics:
AVAILABLE_DEVICES: 0
SUPPORTED_METRICS: AVAILABLE_DEVICES, SUPPORTED_METRICS, FULL_DEVICE_NAME, OPTIMIZATION_CAPABILITIES, SUPPORTED_CONFIG_KEYS, RANGE_FOR_ASYNC_INFER_REQUESTS, RANGE_FOR_STREAMS
FULL_DEVICE_NAME: Intel(R) Gen12LP HD Graphics (iGPU)
OPTIMIZATION_CAPABILITIES: FP32, BIN, FP16, INT8
SUPPORTED_CONFIG_KEYS: CACHE_DIR, CLDNN_ENABLE_FP16_FOR_QUANTIZED_MODELS, CLDNN_GRAPH_DUMPS_DIR, CLDNN_MEM_POOL, CLDNN_NV12_TWO_INPUTS, CLDNN_PLUGIN_PRIORITY, CLDNN_PLUGIN_THROTTLE, CLDNN_SOURCES_DUMPS_DIR, CONFIG_FILE, DEVICE_ID, DUMP_KERNELS, DYN_BATCH_ENABLED, EXCLUSIVE_ASYNC_REQUESTS, GPU_THROUGHPUT_STREAMS, PERF_COUNT, TUNING_FILE, TUNING_MODE
RANGE_FOR_ASYNC_INFER_REQUESTS: 1, 2, 1
RANGE_FOR_STREAMS: 1, 2
Default values for device configuration keys:
CACHE_DIR:
CLDNN_ENABLE_FP16_FOR_QUANTIZED_MODELS: YES
CLDNN_GRAPH_DUMPS_DIR:
CLDNN_MEM_POOL: YES
CLDNN_NV12_TWO_INPUTS: NO
CLDNN_PLUGIN_PRIORITY: 0
CLDNN_PLUGIN_THROTTLE: 0
CLDNN_SOURCES_DUMPS_DIR:
CONFIG_FILE:
DEVICE_ID:
DUMP_KERNELS: NO
DYN_BATCH_ENABLED: NO
EXCLUSIVE_ASYNC_REQUESTS: NO
GPU_THROUGHPUT_STREAMS: 1
PERF_COUNT: NO
TUNING_FILE:
TUNING_MODE: TUNING_DISABLED
Device: MYRIAD.3.4.1-ma2480
Metrics:
DEVICE_THERMAL: UNSUPPORTED TYPE
OPTIMIZATION_CAPABILITIES: FP16
RANGE_FOR_ASYNC_INFER_REQUESTS: 3, 6, 1
SUPPORTED_METRICS: DEVICE_THERMAL, OPTIMIZATION_CAPABILITIES, RANGE_FOR_ASYNC_INFER_REQUESTS, SUPPORTED_METRICS, SUPPORTED_CONFIG_KEYS, FULL_DEVICE_NAME, AVAILABLE_DEVICES
SUPPORTED_CONFIG_KEYS: DEVICE_ID, EXCLUSIVE_ASYNC_REQUESTS, LOG_LEVEL, VPU_MYRIAD_FORCE_RESET, VPU_MYRIAD_PLATFORM, VPU_CUSTOM_LAYERS, PERF_COUNT, VPU_PRINT_RECEIVE_TENSOR_TIME, CONFIG_FILE, VPU_HW_STAGES_OPTIMIZATION, MYRIAD_THROUGHPUT_STREAMS, MYRIAD_ENABLE_FORCE_RESET, MYRIAD_ENABLE_RECEIVING_TENSOR_TIME, MYRIAD_CUSTOM_LAYERS, MYRIAD_ENABLE_HW_ACCELERATION
FULL_DEVICE_NAME: Intel Movidius Myriad X VPU
AVAILABLE_DEVICES: 3.4.1-ma2480, 3.4.3-ma2480
Default values for device configuration keys:
DEVICE_ID:
EXCLUSIVE_ASYNC_REQUESTS: NO
LOG_LEVEL: LOG_NONE
VPU_MYRIAD_FORCE_RESET: NO
VPU_MYRIAD_PLATFORM:
VPU_CUSTOM_LAYERS:
PERF_COUNT: NO
VPU_PRINT_RECEIVE_TENSOR_TIME: NO
CONFIG_FILE:
VPU_HW_STAGES_OPTIMIZATION: YES
MYRIAD_THROUGHPUT_STREAMS: -1
MYRIAD_ENABLE_FORCE_RESET: NO
MYRIAD_ENABLE_RECEIVING_TENSOR_TIME: NO
MYRIAD_CUSTOM_LAYERS:
MYRIAD_ENABLE_HW_ACCELERATION: YES
Device: MYRIAD.3.4.3-ma2480
Metrics:
DEVICE_THERMAL: UNSUPPORTED TYPE
OPTIMIZATION_CAPABILITIES: FP16
RANGE_FOR_ASYNC_INFER_REQUESTS: 3, 6, 1
SUPPORTED_METRICS: DEVICE_THERMAL, OPTIMIZATION_CAPABILITIES, RANGE_FOR_ASYNC_INFER_REQUESTS, SUPPORTED_METRICS, SUPPORTED_CONFIG_KEYS, FULL_DEVICE_NAME, AVAILABLE_DEVICES
SUPPORTED_CONFIG_KEYS: DEVICE_ID, EXCLUSIVE_ASYNC_REQUESTS, LOG_LEVEL, VPU_MYRIAD_FORCE_RESET, VPU_MYRIAD_PLATFORM, VPU_CUSTOM_LAYERS, PERF_COUNT, VPU_PRINT_RECEIVE_TENSOR_TIME, CONFIG_FILE, VPU_HW_STAGES_OPTIMIZATION, MYRIAD_THROUGHPUT_STREAMS, MYRIAD_ENABLE_FORCE_RESET, MYRIAD_ENABLE_RECEIVING_TENSOR_TIME, MYRIAD_CUSTOM_LAYERS, MYRIAD_ENABLE_HW_ACCELERATION
FULL_DEVICE_NAME: Intel Movidius Myriad X VPU
AVAILABLE_DEVICES: 3.4.1-ma2480, 3.4.3-ma2480
Default values for device configuration keys:
DEVICE_ID:
EXCLUSIVE_ASYNC_REQUESTS: NO
LOG_LEVEL: LOG_NONE
VPU_MYRIAD_FORCE_RESET: NO
VPU_MYRIAD_PLATFORM:
VPU_CUSTOM_LAYERS:
PERF_COUNT: NO
VPU_PRINT_RECEIVE_TENSOR_TIME: NO
CONFIG_FILE:
VPU_HW_STAGES_OPTIMIZATION: YES
MYRIAD_THROUGHPUT_STREAMS: -1
MYRIAD_ENABLE_FORCE_RESET: NO
MYRIAD_ENABLE_RECEIVING_TENSOR_TIME: NO
MYRIAD_CUSTOM_LAYERS:
MYRIAD_ENABLE_HW_ACCELERATION: YES
● × 3 sticks
$ ./_benchmark_app.sh MULTI:MYRIAD.3.4.1-ma2480,MYRIAD.3.4.3-ma2480,MYRIAD.3.4.4-ma2480 ~/model/public/FP16/squeezenet1.1.xml
[benchmark_app.sh] 'benchmark_app' Run !!
'command: ./benchmark_app -d MULTI:MYRIAD.3.4.1-ma2480,MYRIAD.3.4.3-ma2480,MYRIAD.3.4.4-ma2480 -m /home/mizutu/model/public/FP16/squeezenet1.1.xml -pc -niter 1000'
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
API version ............ 2.1
Build .................. 2021.3.0-2787-60059f2c755-releases/2021/3
Description ....... API
[ INFO ] Device info:
MULTI
MultiDevicePlugin version ......... 2.1
Build ........... 2021.3.0-2787-60059f2c755-releases/2021/3
MYRIAD
myriadPlugin version ......... 2.1
Build ........... 2021.3.0-2787-60059f2c755-releases/2021/3
[Step 3/11] Setting device configuration
[ WARNING ] -nstreams default value is determined automatically for MYRIAD device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[ WARNING ] -nstreams default value is determined automatically for MYRIAD device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[ WARNING ] -nstreams default value is determined automatically for MYRIAD device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 4/11] Reading network files
[ INFO ] Loading network files
[ INFO ] Read network took 11.91 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 7218.07 ms
[Step 8/11] Setting optimal runtime parameters
[ WARNING ] Number of iterations was aligned by request number from 1000 to 1008 using number of requests 12
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'data' precision U8, dimensions (NCHW): 1 3 227 227
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'data' with random values (image is expected)
[ INFO ] Infer Request 1 filling
[ INFO ] Fill input 'data' with random values (image is expected)
[ INFO ] Infer Request 2 filling
[ INFO ] Fill input 'data' with random values (image is expected)
[ INFO ] Infer Request 3 filling
[ INFO ] Fill input 'data' with random values (image is expected)
[ INFO ] Infer Request 4 filling
[ INFO ] Fill input 'data' with random values (image is expected)
[ INFO ] Infer Request 5 filling
[ INFO ] Fill input 'data' with random values (image is expected)
[ INFO ] Infer Request 6 filling
[ INFO ] Fill input 'data' with random values (image is expected)
[ INFO ] Infer Request 7 filling
[ INFO ] Fill input 'data' with random values (image is expected)
[ INFO ] Infer Request 8 filling
[ INFO ] Fill input 'data' with random values (image is expected)
[ INFO ] Infer Request 9 filling
[ INFO ] Fill input 'data' with random values (image is expected)
[ INFO ] Infer Request 10 filling
[ INFO ] Fill input 'data' with random values (image is expected)
[ INFO ] Infer Request 11 filling
[ INFO ] Fill input 'data' with random values (image is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 12 inference requests, limits: 1008 iterations)
[ INFO ] First inference took 9.10 ms
[Step 11/11] Dumping statistics report
[ INFO ] Performance counts for 0-th infer request:
Total time: 0 microseconds
Full device name:
[ INFO ] Performance counts for 1-th infer request:
Total time: 0 microseconds
Full device name:
[ INFO ] Performance counts for 2-th infer request:
Total time: 0 microseconds
Full device name:
[ INFO ] Performance counts for 3-th infer request:
Total time: 0 microseconds
Full device name:
[ INFO ] Performance counts for 4-th infer request:
Total time: 0 microseconds
Full device name:
[ INFO ] Performance counts for 5-th infer request:
Total time: 0 microseconds
Full device name:
[ INFO ] Performance counts for 6-th infer request:
Total time: 0 microseconds
Full device name:
[ INFO ] Performance counts for 7-th infer request:
Total time: 0 microseconds
Full device name:
[ INFO ] Performance counts for 8-th infer request:
Total time: 0 microseconds
Full device name:
[ INFO ] Performance counts for 9-th infer request:
Total time: 0 microseconds
Full device name:
[ INFO ] Performance counts for 10-th infer request:
Total time: 0 microseconds
Full device name:
[ INFO ] Performance counts for 11-th infer request:
Total time: 0 microseconds
Full device name:
Count: 1008 iterations
Duration: 1180.91 ms
Throughput: 853.58 FPS
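The Step 8 warning explains why this run reports 1008 iterations rather than the requested 1000: the count is rounded up to a multiple of the number of infer requests (12 here). The sketch below reproduces that arithmetic; the function name is hypothetical, and the rounding rule is inferred from the warning text.

```shell
# align_iterations NITER NIREQ
# Rounds NITER up to the next multiple of NIREQ using integer arithmetic.
align_iterations() {
    niter="$1"
    nireq="$2"
    echo $(( (niter + nireq - 1) / nireq * nireq ))
}

align_iterations 1000 12   # three-stick run: 12 requests
align_iterations 1000 8    # two-stick run: 8 requests
```

With 8 requests, 1000 is already a multiple, which matches the two-stick run keeping its count at 1000.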
● × 2 sticks
$ ./_benchmark_app.sh MULTI:MYRIAD.3.4.1-ma2480,MYRIAD.3.4.3-ma2480 ~/model/public/FP16/squeezenet1.1.xml
[benchmark_app.sh] 'benchmark_app' Run !!
'command: ./benchmark_app -d MULTI:MYRIAD.3.4.1-ma2480,MYRIAD.3.4.3-ma2480 -m /home/mizutu/model/public/FP16/squeezenet1.1.xml -pc -niter 1000'
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[Step 2/11] Loading Inference Engine
[ INFO ] InferenceEngine:
API version ............ 2.1
Build .................. 2021.3.0-2787-60059f2c755-releases/2021/3
Description ....... API
[ INFO ] Device info:
MULTI
MultiDevicePlugin version ......... 2.1
Build ........... 2021.3.0-2787-60059f2c755-releases/2021/3
MYRIAD
myriadPlugin version ......... 2.1
Build ........... 2021.3.0-2787-60059f2c755-releases/2021/3
[Step 3/11] Setting device configuration
[ WARNING ] -nstreams default value is determined automatically for MYRIAD device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[ WARNING ] -nstreams default value is determined automatically for MYRIAD device. Although the automatic selection usually provides a reasonable performance, but it still may be non-optimal for some cases, for more information look at README.
[Step 4/11] Reading network files
[ INFO ] Loading network files
[ INFO ] Read network took 16.48 ms
[Step 5/11] Resizing network to match image sizes and given batch
[ INFO ] Network batch size: 1
[Step 6/11] Configuring input of the model
[Step 7/11] Loading the model to the device
[ INFO ] Load network took 4903.99 ms
[Step 8/11] Setting optimal runtime parameters
[Step 9/11] Creating infer requests and filling input blobs with images
[ INFO ] Network input 'data' precision U8, dimensions (NCHW): 1 3 227 227
[ WARNING ] No input files were given: all inputs will be filled with random values!
[ INFO ] Infer Request 0 filling
[ INFO ] Fill input 'data' with random values (image is expected)
[ INFO ] Infer Request 1 filling
[ INFO ] Fill input 'data' with random values (image is expected)
[ INFO ] Infer Request 2 filling
[ INFO ] Fill input 'data' with random values (image is expected)
[ INFO ] Infer Request 3 filling
[ INFO ] Fill input 'data' with random values (image is expected)
[ INFO ] Infer Request 4 filling
[ INFO ] Fill input 'data' with random values (image is expected)
[ INFO ] Infer Request 5 filling
[ INFO ] Fill input 'data' with random values (image is expected)
[ INFO ] Infer Request 6 filling
[ INFO ] Fill input 'data' with random values (image is expected)
[ INFO ] Infer Request 7 filling
[ INFO ] Fill input 'data' with random values (image is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 8 inference requests, limits: 1000 iterations)
[ INFO ] First inference took 8.91 ms
[Step 11/11] Dumping statistics report
[ INFO ] Performance counts for 0-th infer request:
Total time: 0 microseconds
Full device name:
[ INFO ] Performance counts for 1-th infer request:
Total time: 0 microseconds
Full device name:
[ INFO ] Performance counts for 2-th infer request:
Total time: 0 microseconds
Full device name:
[ INFO ] Performance counts for 3-th infer request:
Total time: 0 microseconds
Full device name:
[ INFO ] Performance counts for 4-th infer request:
Total time: 0 microseconds
Full device name:
[ INFO ] Performance counts for 5-th infer request:
Total time: 0 microseconds
Full device name:
[ INFO ] Performance counts for 6-th infer request:
Total time: 0 microseconds
Full device name:
[ INFO ] Performance counts for 7-th infer request:
Total time: 0 microseconds
Full device name:
Count: 1000 iterations
Duration: 1755.24 ms
Throughput: 569.72 FPS
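To gauge how well the MULTI plugin scales, the two- and three-stick throughput can be compared with the single MYRIAD result on the same i7-1185G7 host (285.33 FPS). The sketch below computes scaling efficiency as measured throughput divided by N times the single-device throughput; the function name is hypothetical, and the three runs are assumed to be otherwise comparable.

```shell
# scaling_efficiency SINGLE_FPS MULTI_FPS N_DEVICES
# Prints MULTI_FPS as a percentage of N_DEVICES * SINGLE_FPS.
scaling_efficiency() {
    awk -v one="$1" -v multi="$2" -v n="$3" \
        'BEGIN { printf "%.1f\n", multi / (one * n) * 100 }'
}

scaling_efficiency 285.33 569.72 2   # two sticks
scaling_efficiency 285.33 853.58 3   # three sticks
```

Both values come out just under 100, which suggests near-linear scaling for this model.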
Change to the execution directory
$ cd ~/inference_engine_samples_build/intel64/Release
● Core™ i7-6700
- Before tuning
・main memory: 2048MB
・processor: 1
$ ./benchmark_app -d CPU -m ~/model/public/FP32/squeezenet1.1.xml -pc -niter 1000
:
Full device name: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
Count: 1000 iterations
Duration: 9760.39 ms
Latency: 8.54 ms
Throughput: 102.45 FPS
- After tuning
・main memory: 11288MB
・processor: 4
$ ./benchmark_app -d CPU -m ~/model/public/FP32/squeezenet1.1.xml -pc -niter 1000
:
Full device name: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
Count: 1000 iterations
Duration: 2932.73 ms
Latency: 9.90 ms
Throughput: 340.98 FPS
● Core™ i7-2620M
- Before tuning
・main memory: 2048MB
・processor: 1
$ ./benchmark_app -d CPU -m ~/model/public/FP32/squeezenet1.1.xml -pc -niter 1000
:
Full device name: Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz
Count: 1000 iterations
Duration: 34790.98 ms
Latency: 33.33 ms
Throughput: 28.74 FPS
- After tuning
・main memory: 11288MB
・processor: 4
$ ./benchmark_app -d CPU -m ~/model/public/FP32/squeezenet1.1.xml -pc -niter 1000
:
Full device name: Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz
Count: 1000 iterations
Duration: 24671.35 ms
Latency: 22.21 ms
Throughput: 40.53 FPS
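The effect of the tuning can be summarized as a throughput ratio (after divided by before). The sketch below does this for the two hosts above; `speedup` is a hypothetical helper name.

```shell
# speedup BEFORE_FPS AFTER_FPS
# Prints AFTER_FPS / BEFORE_FPS rounded to two decimals.
speedup() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", after / before }'
}

speedup 102.45 340.98   # Core i7-6700
speedup 28.74 40.53     # Core i7-2620M
```

This gives roughly 3.33x for the i7-6700 and 1.41x for the i7-2620M.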