From the archive of pretrained model files known as Open Model Zoo, convert a pretrained model with the Model Optimizer and practice deep-learning inference.
Using the Caffe pretrained object-detection model "mobilenet-ssd" from the Open Model Zoo public archive, build an application that draws a red box around detected people and a yellow box around detected cars.
~/work$ python3 $INTEL_OPENVINO_DIR/deployment_tools/tools/model_downloader/downloader.py --name mobilenet-ssd
################|| Downloading mobilenet-ssd ||################
========== Downloading /home/mizutu/work/public/mobilenet-ssd/mobilenet-ssd.prototxt
... 100%, 28 KB, 100614 KB/s, 0 seconds passed
========== Downloading /home/mizutu/work/public/mobilenet-ssd/mobilenet-ssd.caffemodel
... 100%, 22605 KB, 13653 KB/s, 1 seconds passed
~/work$ ls ./public/mobilenet-ssd/*
./public/mobilenet-ssd/mobilenet-ssd.caffemodel  ./public/mobilenet-ssd/mobilenet-ssd.prototxt
~/work$ python3 $INTEL_OPENVINO_DIR/deployment_tools/tools/model_downloader/converter.py --name mobilenet-ssd --precisions FP16
========== Converting mobilenet-ssd to IR (FP16)
Model Optimizer arguments:
Common parameters:
    - Path to the Input Model:      /home/mizutu/work/public/mobilenet-ssd/mobilenet-ssd.caffemodel
    - Path for generated IR:        /home/mizutu/work/public/mobilenet-ssd/FP16
    - IR output name:       mobilenet-ssd
    - Log level:    ERROR
    - Batch:        Not specified, inherited from the model
    - Input layers:         data
    - Output layers:        detection_out
    - Input shapes:         [1,3,300,300]
    - Mean values:          data[127.5,127.5,127.5]
    - Scale values:         data[127.5]
    - Scale factor:         Not specified
    - Precision of IR:      FP16
    - Enable fusing:        True
    - Enable grouped convolutions fusing:   True
    - Move mean values to preprocess section:       None
    - Reverse input channels:       False
Caffe specific parameters:
    - Path to Python Caffe* parser generated from caffe.proto:      /opt/intel/openvino_2021/deployment_tools/model_optimizer/mo/front/caffe/proto
    - Enable resnet optimization:   True
    - Path to the Input prototxt:   /home/mizutu/work/public/mobilenet-ssd/mobilenet-ssd.prototxt
    - Path to CustomLayersMapping.xml:      Default
    - Path to a mean file:  Not specified
    - Offsets for a mean file:      Not specified
Model Optimizer version:
[ SUCCESS ] Total execution time: 8.14 seconds.
[ SUCCESS ] Memory consumed: 365 MB.
~/work$ ls ./public/mobilenet-ssd/FP16/*
mobilenet-ssd.bin  mobilenet-ssd.mapping  mobilenet-ssd.xml
~/work$ python3 $INTEL_OPENVINO_DIR/deployment_tools/tools/model_downloader/downloader.py --all
~/work$ cp $INTEL_OPENVINO_DIR/deployment_tools/open_model_zoo/demos/python_demos/voc_labels.txt .
~/work$ cat voc_labels.txt
background
aeroplane
bicycle
bird
boat
bottle
bus
car
cat
chair
cow
diningtable
dog
horse
motorbike
person
pottedplant
sheep
sofa
train
tvmonitor
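The scripts below trim the trailing newline from each label with `label[...][:-1]`. A small sketch of my own (not from the article) showing why `str.strip()` is the safer choice: `readlines()` keeps `\n` on every line except, possibly, the last one, so slicing off the final character can silently truncate the last label.

```python
# Sketch: readlines() keeps '\n' on every line except possibly the last.
# Slicing with [:-1] then eats a real character from the final label, while
# str.strip() handles both cases. Shortened in-memory list for illustration.
raw = ['background\n', 'aeroplane\n', 'tvmonitor']   # no newline on the last line

sliced = [s[:-1] for s in raw]     # last entry becomes 'tvmonito'
stripped = [s.strip() for s in raw]

print(sliced[-1])    # tvmonito
print(stripped[-1])  # tvmonitor
```

This matters here because `voc_labels.txt` has no newline after `tvmonitor`, as the `cat` output above shows.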
Input

Original model
    Image, name - prob, shape - 1,3,300,300, format is B,C,H,W, where:
        B - batch size
        C - channel
        H - height
        W - width
    Channel order is BGR. Mean values - [127.5, 127.5, 127.5], scale value - 127.5.

Converted model
    Image, name - prob, shape - 1,3,300,300, format is B,C,H,W, where:
        B - batch size
        C - channel
        H - height
        W - width
    Channel order is BGR.

Note: the input name here appears to be wrong: 'prob' should be 'data'.
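The layout conversion this spec implies (an H,W,C image into a 1,3,300,300 B,C,H,W blob) is exactly what the scripts below do with `transpose`/`reshape`. A numpy-only sketch with a placeholder frame standing in for real camera input; note that the mean/scale normalization listed above was passed to Model Optimizer during conversion (see the converter log), so it is baked into the IR and must not be repeated at runtime.

```python
import numpy as np

# Placeholder for a 300x300 BGR frame as cv2.resize would produce it (H,W,C uint8)
frame = np.zeros((300, 300, 3), dtype=np.uint8)

blob = frame.transpose((2, 0, 1))        # H,W,C -> C,H,W
blob = blob.reshape((1, 3, 300, 300))    # add the batch axis -> B,C,H,W
print(blob.shape)                        # (1, 3, 300, 300)
```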
Output

Original model
    The array of detection summary info, name - detection_out, shape - 1, 1, N, 7, where N is the number of detected bounding boxes. For each detection, the description has the format: [image_id, label, conf, x_min, y_min, x_max, y_max], where:
        image_id - ID of the image in the batch
        label - predicted class ID
        conf - confidence for the predicted class
        (x_min, y_min) - coordinates of the top left bounding box corner (coordinates are in normalized format, in range [0, 1])
        (x_max, y_max) - coordinates of the bottom right bounding box corner (coordinates are in normalized format, in range [0, 1])

Converted model
    Same as the original model: name - detection_out, shape - 1, 1, N, 7, with each detection in the format [image_id, label, conf, x_min, y_min, x_max, y_max] as above.
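A small sketch of decoding one detection row as described above; `to_pixel_box` and the sample values are my own illustration, not part of the demo code, but the 0.6 confidence cutoff matches the scripts below.

```python
import numpy as np

def to_pixel_box(obj, img_w, img_h, conf_threshold=0.6):
    # One row: [image_id, label, conf, x_min, y_min, x_max, y_max],
    # coordinates normalized to [0, 1]
    image_id, label_id, conf, x1, y1, x2, y2 = obj
    if conf <= conf_threshold:
        return None   # same cutoff the demo scripts use
    return (int(label_id),
            int(x1 * img_w), int(y1 * img_h),
            int(x2 * img_w), int(y2 * img_h))

row = np.array([0.0, 7.0, 0.93, 0.25, 0.5, 0.75, 1.0], dtype=np.float32)  # made-up values; class 7 = 'car'
print(to_pixel_box(row, img_w=640, img_h=480))   # (7, 160, 240, 480, 480)
```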
Note: the following steps are performed on the Raspberry Pi.
vi object_detect.py

# -*- coding: utf-8 -*-
#%matplotlib inline
import cv2
import matplotlib.pyplot as plt
import numpy as np
from openvino.inference_engine import IECore

label = open('voc_labels.txt').readlines()
print(label)

# Create the Inference Engine core object
ie = IECore()

# Read the IR model files
model = './public/mobilenet-ssd/FP16/mobilenet-ssd'
net = ie.read_network(model=model+'.xml', weights=model+'.bin')

# Get the input/output blob names and the input blob shape
input_blob_name = net.input_info['data'].name
output_blob_name = next(iter(net.outputs))
batch, channel, height, width = net.input_info[input_blob_name].input_data.shape

exec_net = ie.load_network(network=net, device_name='MYRIAD', num_requests=1)

def infer(path):
    print('input blob: name="{}", N={}, C={}, H={}, W={}'.format(input_blob_name, batch, channel, height, width))
    img = cv2.imread(path)
    in_img = cv2.resize(img, (width, height))
    in_img = in_img.transpose((2, 0, 1))
    in_img = in_img.reshape((1, channel, height, width))
    return img, exec_net.infer(inputs={input_blob_name: in_img})

def show(img, res):
    print('output blob: name="{}", shape={}'.format(output_blob_name, net.outputs[output_blob_name].shape))
    result = res[output_blob_name][0][0]
    img_h, img_w, _ = img.shape
    for obj in result:
        imgid, clsid, confidence, x1, y1, x2, y2 = obj
        if confidence > 0.6:
            x1 = int(x1 * img_w)
            y1 = int(y1 * img_h)
            x2 = int(x2 * img_w)
            y2 = int(y2 * img_h)
            color = (0, 255, 0)
            if label[int(clsid)][:-1] == 'car':
                color = (0, 255, 255)
            elif label[int(clsid)][:-1] == 'person':
                color = (0, 0, 255)
            cv2.rectangle(img, (x1, y1), (x2, y2), color, thickness=4)
            cv2.putText(img, label[int(clsid)][:-1], (x1, y1), cv2.FONT_HERSHEY_PLAIN, fontScale=4, color=color, thickness=4)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    plt.imshow(img)
    plt.show()

img, res = infer('./image/car.jpg')
show(img, res)
pi@raspberrypi:~/workspace $ python3 object_detect.py
['background\n', 'aeroplane\n', 'bicycle\n', 'bird\n', 'boat\n', 'bottle\n', 'bus\n', 'car\n', 'cat\n', 'chair\n', 'cow\n', 'diningtable\n', 'dog\n', 'horse\n', 'motorbike\n', 'person\n', 'pottedplant\n', 'sheep\n', 'sofa\n', 'train\n', 'tvmonitor']
input blob: name="data", N=1, C=3, H=300, W=300
output blob: name="detection_out", shape=[1, 1, 100, 7]
Rebuild the quoted program in the style used in the previous sections.
vi object_detect1.py

# -*- coding: utf-8 -*-
##------------------------------------------
## OpenVINO™ model -mobilenet-ssd-
##   ** Object Detect **
##   2021.01.18 Masahiro Izutsu
##
##   2021.02.10 warning error
##------------------------------------------
import cv2
import numpy as np

# Load the module
from openvino.inference_engine import IECore

# Load the labels
label = open('voc_labels.txt').readlines()
print(label)

# Create the Inference Engine core object
ie = IECore()

# Read the IR model files
model = './public/mobilenet-ssd/FP16/mobilenet-ssd'
net = ie.read_network(model=model+'.xml', weights=model+'.bin')

# Get the input/output blob names and the input blob shape
input_blob_name = net.input_info['data'].name
output_blob_name = next(iter(net.outputs))
batch, channel, height, width = net.input_info[input_blob_name].input_data.shape

exec_net = ie.load_network(network=net, device_name='MYRIAD', num_requests=1)

# Read the input image
frame = cv2.imread('./image/car-person.jpg')

# Convert to the input data format
img = cv2.resize(frame, (width, height))
img = img.transpose((2, 0, 1))
img = img.reshape((1, channel, height, width))

# Run inference
out = exec_net.infer(inputs={input_blob_name: img})

# Extract only the needed data from the output
print('output blob: name="{}", shape={}'.format(output_blob_name, net.outputs[output_blob_name].shape))
result = out[output_blob_name][0][0]
img_h, img_w, _ = frame.shape

# Process each detected object in turn
for obj in result:
    imgid, clsid, confidence, x1, y1, x2, y2 = obj
    if confidence > 0.6:
        x1 = int(x1 * img_w)
        y1 = int(y1 * img_h)
        x2 = int(x2 * img_w)
        y2 = int(y2 * img_h)
        color = (0, 255, 0)
        if label[int(clsid)][:-1] == 'car':
            color = (0, 255, 255)
        elif label[int(clsid)][:-1] == 'person':
            color = (0, 0, 255)
        cv2.rectangle(frame, (x1, y1), (x2, y2), color, thickness=2)
        cv2.putText(frame, label[int(clsid)][:-1], (x1, y1), cv2.FONT_HERSHEY_PLAIN, fontScale=2, color=color, thickness=2)

# Show the image
cv2.imshow('Object-Detect', frame)

# Exit when a key is pressed
cv2.waitKey(0)
cv2.destroyAllWindows()
~/workspace $ python3 object_detect1.py
['background\n', 'aeroplane\n', 'bicycle\n', 'bird\n', 'boat\n', 'bottle\n', 'bus\n', 'car\n', 'cat\n', 'chair\n', 'cow\n', 'diningtable\n', 'dog\n', 'horse\n', 'motorbike\n', 'person\n', 'pottedplant\n', 'sheep\n', 'sofa\n', 'train\n', 'tvmonitor\n']
object_detect1.py:18: DeprecationWarning: 'inputs' property of IENetwork class is deprecated. To access DataPtrs user need to use 'input_data' property of InputInfoPtr objects which can be accessed by 'input_info' property.
  input_blob_name = list(net.inputs.keys())[0]
output blob: name="detection_out", shape=[1, 1, 100, 7]
Display the objects detected by object_detect1.py with Japanese labels.
vi object_detect1_jp.py

# -*- coding: utf-8 -*-
##------------------------------------------
## OpenVINO™ model -mobilenet-ssd-
##   ** Object Detect ** Japanese
##   2021.01.18 Masahiro Izutsu
##
##   2021.02.10 warning error
##------------------------------------------
import cv2
import numpy as np
import myfunction

# Load the module
from openvino.inference_engine import IECore

# Load the labels
label = open('voc_labels.txt').readlines()
label_jp = open('voc_labels_jp.txt').readlines()

# Japanese font
fontPIL = 'NotoSansCJK-Bold.ttc'

# Create the Inference Engine core object
ie = IECore()

# Read the IR model files
model = './public/mobilenet-ssd/FP16/mobilenet-ssd'
net = ie.read_network(model=model+'.xml', weights=model+'.bin')

# Get the input/output blob names and the input blob shape
input_blob_name = net.input_info['data'].name
output_blob_name = next(iter(net.outputs))
batch, channel, height, width = net.input_info[input_blob_name].input_data.shape

exec_net = ie.load_network(network=net, device_name='MYRIAD', num_requests=1)

# Read the input image
frame = cv2.imread('./image/car-person.jpg')

# Convert to the input data format
img = cv2.resize(frame, (width, height))
img = img.transpose((2, 0, 1))
img = img.reshape((1, channel, height, width))

# Run inference
out = exec_net.infer(inputs={input_blob_name: img})

# Extract only the needed data from the output
print('output blob: name="{}", shape={}'.format(output_blob_name, net.outputs[output_blob_name].shape))
result = out[output_blob_name][0][0]
img_h, img_w, _ = frame.shape

# Process each detected object in turn
for obj in result:
    imgid, clsid, confidence, x1, y1, x2, y2 = obj
    if confidence > 0.6:
        x1 = int(x1 * img_w)
        y1 = int(y1 * img_h)
        x2 = int(x2 * img_w)
        y2 = int(y2 * img_h)
        color = (0, 255, 0)
        if label[int(clsid)][:-1] == 'car':
            color = (0, 255, 255)
        elif label[int(clsid)][:-1] == 'person':
            color = (0, 0, 255)
        cv2.rectangle(frame, (x1, y1), (x2, y2), color, thickness=2)
#        cv2.putText(frame, label[int(clsid)][:-1], (x1, y1), cv2.FONT_HERSHEY_PLAIN, fontScale=2, color=color, thickness=2)
        myfunction.cv2_putText(img=frame, text=label_jp[int(clsid)][:-1], org=(x1, y1), fontFace=fontPIL, fontScale=12, color=color, mode=0)

# Show the image
cv2.imshow('Object-Detect', frame)

# Exit when a key is pressed
cv2.waitKey(0)
cv2.destroyAllWindows()
pi@raspberrypi:~/workspace $ python3 object_detect1_jp.py
input blob: name="data", N=1, C=3, H=300, W=300
output blob: name="detection_out", shape=[1, 1, 100, 7]
Perform real-time object recognition with the built-in camera.
labels | Japanese |
-- | -- |
background | 背景 |
aeroplane | 飛行機 |
bicycle | 自転車 |
bird | 鳥 |
boat | ボート |
bottle | ボトル |
bus | バス |
car | 車 |
cat | 猫 |
chair | 椅子 |
cow | 牛 |
diningtable | ダイニングテーブル |
dog | 犬 |
horse | 馬 |
motorbike | バイク |
person | 人 |
pottedplant | 鉢植え |
sheep | 羊 |
sofa | ソファー |
train | 列車 |
tvmonitor | テレビ |
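The `*_jp.py` scripts pair the two label files implicitly by indexing both lists with the same class ID. The same pairing can be made explicit with a dictionary; a sketch using short in-memory lists standing in for voc_labels.txt and voc_labels_jp.txt:

```python
# Shortened stand-ins for voc_labels.txt and voc_labels_jp.txt
en = ['background', 'car', 'person']
jp = ['背景', '車', '人']

# Both files list classes in the same ID order, so zip pairs them correctly
en_to_jp = dict(zip(en, jp))
print(en_to_jp['car'])   # 車
```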
vi object_detect2.py

# -*- coding: utf-8 -*-
##------------------------------------------
## OpenVINO™ model -mobilenet-ssd-
##   ** Object Detect ** camera
##   2021.01.18 Masahiro Izutsu
##
##   2021.02.10 warning error
##------------------------------------------
import cv2
import numpy as np

# Load the module
from openvino.inference_engine import IECore

# Load the labels
label = open('voc_labels.txt').readlines()
print(label)

# Create the Inference Engine core object
ie = IECore()

# Read the IR model files
model = './public/mobilenet-ssd/FP16/mobilenet-ssd'
net = ie.read_network(model=model+'.xml', weights=model+'.bin')

# Get the input/output blob names and the input blob shape
input_blob_name = net.input_info['data'].name
output_blob_name = list(net.outputs.keys())[0]
batch, channel, height, width = net.input_info[input_blob_name].input_data.shape

exec_net = ie.load_network(network=net, device_name='MYRIAD', num_requests=1)

# Set up the camera
cap = cv2.VideoCapture(0)

# Main loop
while True:
    ret, frame = cap.read()

    # Retry on read error
    if ret == False:
        continue

    # Convert to the input data format
    img = cv2.resize(frame, (width, height))
    img = img.transpose((2, 0, 1))
    img = img.reshape((1, channel, height, width))

    # Run inference
    out = exec_net.infer(inputs={input_blob_name: img})

    # Extract only the needed data from the output
    result = out[output_blob_name][0][0]
    img_h, img_w, _ = frame.shape

    # Process each detected object in turn
    for obj in result:
        imgid, clsid, confidence, x1, y1, x2, y2 = obj
        # Draw a bounding box only when the confidence exceeds 0.6
        if confidence > 0.6:
            x1 = int(x1 * img_w)
            y1 = int(y1 * img_h)
            x2 = int(x2 * img_w)
            y2 = int(y2 * img_h)
            color = (0, 255, 0)
            if label[int(clsid)][:-1] == 'car':
                color = (0, 255, 255)
            elif label[int(clsid)][:-1] == 'person':
                color = (0, 0, 255)
            cv2.rectangle(frame, (x1, y1), (x2, y2), color, thickness=2)
            cv2.putText(frame, label[int(clsid)][:-1], (x1, y1), cv2.FONT_HERSHEY_PLAIN, fontScale=2, color=color, thickness=2)

    # Show the image
    cv2.imshow('Object-Detect', frame)

    # Exit when any key is pressed
    key = cv2.waitKey(1)
    if key != -1:
        break

# Clean up
cap.release()
cv2.destroyAllWindows()
pi@raspberrypi:~/workspace $ python3 object_detect2.py
['background\n', 'aeroplane\n', 'bicycle\n', 'bird\n', 'boat\n', 'bottle\n', 'bus\n', 'car\n', 'cat\n', 'chair\n', 'cow\n', 'diningtable\n', 'dog\n', 'horse\n', 'motorbike\n', 'person\n', 'pottedplant\n', 'sheep\n', 'sofa\n', 'train\n', 'tvmonitor']
vi object_detect2_jp.py

# -*- coding: utf-8 -*-
##------------------------------------------
## OpenVINO™ model -mobilenet-ssd-
##   ** Object Detect ** camera Japanese
##   2021.01.18 Masahiro Izutsu
##
##   2021.02.10 warning error
##------------------------------------------
import cv2
import numpy as np
import myfunction

# Load the module
from openvino.inference_engine import IECore

# Load the labels
label = open('voc_labels.txt').readlines()
label_jp = open('voc_labels_jp.txt').readlines()
print(label, label_jp)

# Japanese font (this line was missing in the original listing; without it
# fontPIL is undefined below, as in object_detect1_jp.py)
fontPIL = 'NotoSansCJK-Bold.ttc'

# Create the Inference Engine core object
ie = IECore()

# Read the IR model files
model = './public/mobilenet-ssd/FP16/mobilenet-ssd'
net = ie.read_network(model=model+'.xml', weights=model+'.bin')

# Get the input/output blob names and the input blob shape
input_blob_name = net.input_info['data'].name
output_blob_name = list(net.outputs.keys())[0]
batch, channel, height, width = net.input_info[input_blob_name].input_data.shape

exec_net = ie.load_network(network=net, device_name='MYRIAD', num_requests=1)

# Set up the camera
cap = cv2.VideoCapture(0)

# Main loop
while True:
    ret, frame = cap.read()

    # Retry on read error
    if ret == False:
        continue

    # Convert to the input data format
    img = cv2.resize(frame, (width, height))
    img = img.transpose((2, 0, 1))
    img = img.reshape((1, channel, height, width))

    # Run inference
    out = exec_net.infer(inputs={input_blob_name: img})

    # Extract only the needed data from the output
    result = out[output_blob_name][0][0]
    img_h, img_w, _ = frame.shape

    # Process each detected object in turn
    for obj in result:
        imgid, clsid, confidence, x1, y1, x2, y2 = obj
        # Draw a bounding box only when the confidence exceeds 0.6
        if confidence > 0.6:
            x1 = int(x1 * img_w)
            y1 = int(y1 * img_h)
            x2 = int(x2 * img_w)
            y2 = int(y2 * img_h)
            color = (0, 255, 0)
            if label[int(clsid)][:-1] == 'car':
                color = (0, 255, 255)
            elif label[int(clsid)][:-1] == 'person':
                color = (0, 0, 255)
            cv2.rectangle(frame, (x1, y1), (x2, y2), color, thickness=2)
            myfunction.cv2_putText(img=frame, text=label_jp[int(clsid)][:-1], org=(x1, y1), fontFace=fontPIL, fontScale=12, color=color, mode=0)

    # Show the image
    cv2.imshow('Object-Detect', frame)

    # Exit when any key is pressed
    key = cv2.waitKey(1)
    if key != -1:
        break

# Clean up
cap.release()
cv2.destroyAllWindows()
pi@raspberrypi:~/workspace $ python3 object_detect2_jp.py
['background\n', 'aeroplane\n', 'bicycle\n', 'bird\n', 'boat\n', 'bottle\n', 'bus\n', 'car\n', 'cat\n', 'chair\n', 'cow\n', 'diningtable\n', 'dog\n', 'horse\n', 'motorbike\n', 'person\n', 'pottedplant\n', 'sheep\n', 'sofa\n', 'train\n', 'tvmonitor'] ['背景\n', '飛行機\n', '自転車\n', '鳥\n', 'ボート\n', 'ボトル\n', 'バス\n', '車\n', '猫\n', '椅子\n', '牛\n', 'ダイニングテーブル\n', '犬\n', '馬\n', 'バイク\n', '人\n', '鉢植え\n', '羊\n', 'ソファー\n', '列車\n', 'テレビ']
Perform object recognition on a video file, using the sample video from "YOLO v3" (champs-elysees.mp4).
$ cp object_detect2.py object_detect3.py
# Set up the capture source (video file instead of the camera)
cap = cv2.VideoCapture('../Videos/champs-elysees.mp4')
    # Exit at end of file
    if ret == False:
        print('File End')
        break
pi@raspberrypi:~/workspace $ python3 object_detect3.py
['background\n', 'aeroplane\n', 'bicycle\n', 'bird\n', 'boat\n', 'bottle\n', 'bus\n', 'car\n', 'cat\n', 'chair\n', 'cow\n', 'diningtable\n', 'dog\n', 'horse\n', 'motorbike\n', 'person\n', 'pottedplant\n', 'sheep\n', 'sofa\n', 'train\n', 'tvmonitor']
[ WARN:0] global ../opencv/modules/videoio/src/cap_gstreamer.cpp (919) open OpenCV | GStreamer warning: unable to query duration of stream
[ WARN:0] global ../opencv/modules/videoio/src/cap_gstreamer.cpp (956) open OpenCV | GStreamer warning: Cannot query video position: status=1, value=0, duration=-1
$ cp object_detect2_jp.py object_detect3_jp.py
# Set up the capture source (video file instead of the camera)
cap = cv2.VideoCapture('../Videos/champs-elysees.mp4')
    # Exit at end of file
    if ret == False:
        print('File End')
        break
pi@raspberrypi:~/workspace $ python3 object_detect3_jp.py
['background\n', 'aeroplane\n', 'bicycle\n', 'bird\n', 'boat\n', 'bottle\n', 'bus\n', 'car\n', 'cat\n', 'chair\n', 'cow\n', 'diningtable\n', 'dog\n', 'horse\n', 'motorbike\n', 'person\n', 'pottedplant\n', 'sheep\n', 'sofa\n', 'train\n', 'tvmonitor'] ['背景\n', '飛行機\n', '自転車\n', '鳥\n', 'ボート\n', 'ボトル\n', 'バス\n', '車\n', '猫\n', '椅子\n', '牛\n', 'ダイニングテーブル\n', '犬\n', '馬\n', 'バイク\n', '人\n', '鉢植え\n', '羊\n', 'ソファー\n', '列車\n', 'テレビ']
[ WARN:0] global ../opencv/modules/videoio/src/cap_gstreamer.cpp (919) open OpenCV | GStreamer warning: unable to query duration of stream
[ WARN:0] global ../opencv/modules/videoio/src/cap_gstreamer.cpp (956) open OpenCV | GStreamer warning: Cannot query video position: status=1, value=0, duration=-1
classification3.py:15: DeprecationWarning: 'inputs' property of IENetwork class is deprecated. To access DataPtrs user need to use 'input_data' property of InputInfoPtr objects which can be accessed by 'input_info' property.
# Get the input/output data keys (deprecated style that triggers the warning)
input_blob = next(iter(net.inputs))
# Get the input/output data keys (corrected style using input_info)
input_blob = net.input_info['data'].name