OpenVINO7

Deep Learning Inference from Scratch: Pre-trained Models

Last updated: 2021/02/11

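Verifying Intel® OpenVINO™ pre-trained models

This page practices deep learning inference using the OpenVINO™ Inference Engine.

Landmark regression from a face image with the Inference Engine (estimating the positions of the eyes, nose, and mouth)

Preparation

Input image

Sample face image

Getting the pre-trained model
Download the pre-trained model that matches the installed OpenVINO™ toolkit version.
The installed version here is 2021.2.
- Location of the pre-trained models
  https://download.01.org/opencv/2021/openvinotoolkit/2021.2/open_model_zoo/models_bin/3/
- Documentation for the pre-trained models
  https://docs.openvinotoolkit.org/latest/omz_models_intel_index.html

Model used
```
landmarks-regression-retail-0009
```

Download the pre-trained model into the ~/workspace/FP16 folder.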

pi@raspberrypi:~/workspace $ cd FP16
pi@raspberrypi:~/workspace/FP16 $ wget --no-check-certificate https://download.01.org/opencv/2021/openvinotoolkit/2021.2/open_model_zoo/models_bin/3/landmarks-regression-retail-0009/FP16/landmarks-regression-retail-0009.bin
--2021-01-11 09:46:34--  https://download.01.org/opencv/2021/openvinotoolkit/2021.2/open_model_zoo/models_bin/3/landmarks-regression-retail-0009/FP16/landmarks-regression-retail-0009.bin
Resolving download.01.org (download.01.org)... 23.41.94.105, 2600:140b:8800:283::4b21, 2600:140b:8800:28f::4b21
Connecting to download.01.org (download.01.org)|23.41.94.105|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 381248 (372K) [application/octet-stream]
Saving to: `landmarks-regression-retail-0009.bin'

landmarks-regression-retail-0009 100%[=======================================================>] 372.31K  1.20MB/s    in 0.3s

2021-01-11 09:46:41 (1.20 MB/s) - `landmarks-regression-retail-0009.bin' saved [381248/381248]

pi@raspberrypi:~/workspace/FP16 $ wget --no-check-certificate https://download.01.org/opencv/2021/openvinotoolkit/2021.2/open_model_zoo/models_bin/3/landmarks-regression-retail-0009/FP16/landmarks-regression-retail-0009.xml
--2021-01-11 09:46:55--  https://download.01.org/opencv/2021/openvinotoolkit/2021.2/open_model_zoo/models_bin/3/landmarks-regression-retail-0009/FP16/landmarks-regression-retail-0009.xml
Resolving download.01.org (download.01.org)... 23.41.94.105, 2600:140b:8800:283::4b21, 2600:140b:8800:28f::4b21
Connecting to download.01.org (download.01.org)|23.41.94.105|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 42842 (42K) [text/xml]
Saving to: `landmarks-regression-retail-0009.xml'

landmarks-regression-retail-0009 100%[=======================================================>]  41.84K  --.-KB/s    in 0.1s

2021-01-11 09:46:56 (390 KB/s) - `landmarks-regression-retail-0009.xml' saved [42842/42842]

pi@raspberrypi:~/workspace/FP16 $ ls
emotions-recognition-retail-0003.bin  face-detection-retail-0004.bin  landmarks-regression-retail-0009.bin
emotions-recognition-retail-0003.xml  face-detection-retail-0004.xml  landmarks-regression-retail-0009.xml
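
The two download URLs follow a fixed pattern: base location, model name, precision, file extension. As a minimal sketch, a small helper can build both URLs from the model name. The helper `model_urls` and its parameter names are hypothetical and exist only for this illustration; the base URL is the one listed above.

```python
# Base location of the 2021.2 pre-trained models, as listed on this page.
BASE_URL = ("https://download.01.org/opencv/2021/openvinotoolkit/2021.2/"
            "open_model_zoo/models_bin/3")


def model_urls(name, precision="FP16", base_url=BASE_URL):
    """Return the (.xml, .bin) download URLs for one model.

    Hypothetical helper: it only assumes the directory layout
    <base>/<name>/<precision>/<name>.<ext> seen in the wget commands.
    """
    prefix = f"{base_url}/{name}/{precision}/{name}"
    return prefix + ".xml", prefix + ".bin"


xml_url, bin_url = model_urls("landmarks-regression-retail-0009")
print(xml_url)
print(bin_url)
```

The printed URLs can be passed to wget as in the session above.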

Running deep learning inference

Create a new program, landmarks.py.

vi landmarks.py

# -*- coding: utf-8 -*-
##------------------------------------------
## OpenVINO™ model -landmarks-regression-retail-0009-
##  ** Face Landmark ** model check
##               2021.01.11 Masahiro Izutsu
##
##               2021.02.10 deprecation warning fix
##------------------------------------------

# Imports
import cv2
import numpy as np

# Inference Engine Python API
from openvino.inference_engine import IECore

# Load the model
ie = IECore()
net = ie.read_network(model='FP16/landmarks-regression-retail-0009.xml', weights='FP16/landmarks-regression-retail-0009.bin')
exec_net = ie.load_network(network=net, device_name="MYRIAD")

# Get the input and output keys
input_blob = net.input_info['0'].name
out_blob = next(iter(net.outputs))

# Read the image
frame = cv2.imread('image/photo_face.jpg')

# Convert to the model's input format
img = cv2.resize(frame, (48, 48)) # resize to 48x48 (width, height)
img = img.transpose((2, 0, 1))    # HWC > CHW
img = np.expand_dims(img, axis=0) # CHW > BCHW

# Run inference
out = exec_net.infer(inputs={input_blob: img})

# Keep only the needed output
out = out[out_blob]

# Remove the size-1 dimensions
out = np.squeeze(out)

# Print the contents
print(out)

Execution result

pi@raspberrypi:~/workspace $ python3 landmarks.py
[0.26098633 0.17492676 0.6948242  0.18823242 0.46704102 0.4494629
 0.3083496  0.7006836  0.61865234 0.72509766]

Checking the input and output data formats
Intel's site → landmarks-regression-retail-0009

Input data

Inputs
Name: "data" , shape: [1x3x48x48] - An input image in the format [BxCxHxW], where:

B - batch size
C - number of channels
H - image height
W - image width
The expected color order is BGR.

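Note: the input name given in the documentation does not seem to match the downloaded model; the key that works is '0', not 'data'.

A 48x48 color image
The layout is BCHW, in that order
B is a dimension of size 1

Output data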

Outputs
The net outputs a blob with the shape: [1, 10], containing a row-vector of 10 floating point values for five landmarks coordinates in the form (x0, y0, x1, y1, ..., x4, y4). All the coordinates are normalized to be in range [0,1].

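A list of 10 elements
The values are floats holding the coordinates of five landmarks
Specifically (x0, y0), (x1, y1), (x2, y2), (x3, y3), (x4, y4), normalized to the range 0.0 to 1.0

The landmarks are the two eyes, the nose, and the two corners of the lips.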

The model predicts five facial landmarks: two eyes, nose, and two lip corners.

Preparing the input data
- Read the image
```
frame = cv2.imread('image/photo_face.jpg')
```
- Resize to the 48x48 input size
```
img = cv2.resize(frame, (48, 48)) # resize to 48x48 (width, height)
```
- The required layout is BCHW. The image as read is HWC, so first reorder it to CHW.
```
img = img.transpose((2, 0, 1))    # HWC > CHW
```
- Use expand_dims to add B (a dimension of size 1).
```
img = np.expand_dims(img, axis=0) # CHW > BCHW
```
  After these steps the image data img is in the required format.
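
The shape changes in the steps above can be checked without a camera image or OpenVINO. The following is a minimal sketch that assumes a zero-filled NumPy array as a stand-in for the resized photo; only the array shapes matter here.

```python
import numpy as np

# Stand-in for cv2.resize(cv2.imread(...), (48, 48)): an HWC uint8 image.
# (Assumption: a zero-filled array replaces the real photo.)
img = np.zeros((48, 48, 3), dtype=np.uint8)
print(img.shape)                   # height, width, channels

img = img.transpose((2, 0, 1))     # HWC > CHW
print(img.shape)                   # channels, height, width

img = np.expand_dims(img, axis=0)  # CHW > BCHW
print(img.shape)                   # batch, channels, height, width
```

The final shape matches the [1x3x48x48] input described in the model documentation.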

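Extracting the output data
- After inference, the result is held in out as a dictionary. Indexing it with the key out_blob returns the values. At this point the array still has shape (1, 10, 1, 1), and the values in the listing below differ slightly from the execution result above.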
```
out = out[out_blob]

[[[[0.29711914]]

 [[0.17468262]]

 [[0.68408203]]

 [[0.18615723]]

 [[0.47851562]]

 [[0.43237305]]

 [[0.34448242]]

 [[0.66308594]]

 [[0.62158203]]

 [[0.69091797]]]]
```
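- Most of the brackets only mark dimensions of size 1 and get in the way, so squeeze is used to remove those dimensions.
```
out = np.squeeze(out)
```
- Printing now gives a flat array of 10 values, as in the execution result above.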
```
[0.26098633 0.17492676 0.6948242  0.18823242 0.46704102 0.4494629
 0.3083496  0.7006836  0.61865234 0.72509766]
```
- These values have the following meaning.

Landmark	x	y
Left eye	0.26098633	0.17492676
Right eye	0.6948242	0.18823242
Nose	0.46704102	0.4494629
Left lip corner	0.3083496	0.7006836
Right lip corner	0.61865234	0.72509766

  The values are normalized: the x range of the input image (minimum to maximum) maps to 0.0 to 1.0, and the y range likewise.
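
As a small sketch of how the flat 10-value output maps onto the table, the values can be paired up with plain Python. The English landmark names used as dictionary keys are labels chosen for this example, not identifiers from the model.

```python
# Flat output copied from the execution result above.
out = [0.26098633, 0.17492676, 0.6948242, 0.18823242, 0.46704102,
       0.4494629, 0.3083496, 0.7006836, 0.61865234, 0.72509766]

# Landmark order as in the table (names are labels chosen for this sketch).
NAMES = ["left_eye", "right_eye", "nose", "left_lip_corner", "right_lip_corner"]

# Even indices hold x, odd indices hold y.
landmarks = dict(zip(NAMES, zip(out[0::2], out[1::2])))

for name, (x, y) in landmarks.items():
    print(f"{name}: x={x:.3f} y={y:.3f}")
```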

Drawing the output on the input image

Create a new program, landmarks1.py.

vi landmarks1.py

# -*- coding: utf-8 -*-
##------------------------------------------
## OpenVINO™ model -landmarks-regression-retail-0009-
##  ** Face Landmark **
##               2021.01.11 Masahiro Izutsu
##
##               2021.02.10 deprecation warning fix
##------------------------------------------

# Imports
import cv2
import numpy as np

# Inference Engine Python API
from openvino.inference_engine import IECore

# Load the model
ie = IECore()
net = ie.read_network(model='FP16/landmarks-regression-retail-0009.xml', weights='FP16/landmarks-regression-retail-0009.bin')
exec_net = ie.load_network(network=net, device_name="MYRIAD")

# Get the input and output keys
input_blob = net.input_info['0'].name
out_blob = next(iter(net.outputs))

# Read the image
frame = cv2.imread('image/photo_face.jpg')

# Convert to the model's input format
img = cv2.resize(frame, (48, 48)) # resize to 48x48 (width, height)
img = img.transpose((2, 0, 1))    # HWC > CHW
img = np.expand_dims(img, axis=0) # CHW > BCHW

# Run inference
out = exec_net.infer(inputs={input_blob: img})

# Keep only the needed output
out = out[out_blob]

# Remove the size-1 dimensions
out = np.squeeze(out)

# Print the contents
print(out)

# Draw a circle at each detected landmark
for i in range(0, 10, 2):
    x = int(out[i] * frame.shape[1])    # scale x by the image width
    y = int(out[i+1] * frame.shape[0])  # scale y by the image height
    cv2.circle(frame, (x, y), 10, (89, 199, 243), thickness=-1)

# Show the image
cv2.imshow('frame', frame)

# Wait until a key is pressed
cv2.waitKey(0)

# Clean up
cv2.destroyAllWindows()

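About the for loop:
- The x and y coordinates are taken from out two at a time, as pairs, and five circles are drawn.
- range(0, 10, 2) yields the indices 0, 2, 4, 6, 8.
- out[i] is the x coordinate and out[i+1] the y coordinate; since both are normalized to 0.0 to 1.0, they must be converted to the pixel range of the original image.
- frame.shape[1] gives the width of the original image and frame.shape[0] the height, so each coordinate is multiplied by the corresponding size.
- cv2.circle requires integer coordinates, so int is used to convert them. A filled yellow circle is then drawn with cv2.circle at the resulting position x, y.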
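
The scaling described in the list above can be sketched without OpenCV. The helper name `to_pixels` and the sample values are made up for this illustration; they do not come from a real inference.

```python
def to_pixels(out, width, height):
    """Convert a flat [x0, y0, x1, y1, ...] list of normalized
    coordinates into integer (x, y) pixel positions."""
    points = []
    for i in range(0, len(out), 2):   # 0, 2, 4, ...
        x = int(out[i] * width)       # corresponds to frame.shape[1]
        y = int(out[i + 1] * height)  # corresponds to frame.shape[0]
        points.append((x, y))
    return points


# Made-up values for a 640x480 image (not from a real inference).
points = to_pixels([0.5, 0.25, 0.75, 0.5], width=640, height=480)
print(points)
```

Each resulting tuple can be passed directly to cv2.circle as the center argument.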

Execution result

pi@raspberrypi:~/workspace $ python3 landmarks1.py
[0.34887695 0.3581543  0.6845703  0.37329102 0.5073242  0.56689453
0.36572266 0.71191406 0.6147461  0.7421875 ]

Face detection with the Inference Engine

Preparation

Input images

Sample face image 1
Sample face image 2

Getting the pre-trained model
Download the pre-trained model that matches the installed OpenVINO™ toolkit version.
The installed version here is 2021.2.
- Location of the pre-trained models
  https://download.01.org/opencv/2021/openvinotoolkit/2021.2/open_model_zoo/models_bin/3/
- Documentation for the pre-trained models
  https://docs.openvinotoolkit.org/latest/omz_models_intel_index.html

Model used
```
face-detection-retail-0005
```

Download the pre-trained model into the ~/workspace/FP16 folder.

pi@raspberrypi:~/workspace $ cd FP16
pi@raspberrypi:~/workspace/FP16 $ wget --no-check-certificate https://download.01.org/opencv/2021/openvinotoolkit/2021.2/open_model_zoo/models_bin/3/face-detection-retail-0005/FP16/face-detection-retail-0005.bin
--2021-01-11 13:49:11--  https://download.01.org/opencv/2021/openvinotoolkit/2021.2/open_model_zoo/models_bin/3/face-detection-retail-0005/FP16/face-detection-retail-0005.bin
Resolving download.01.org (download.01.org)... 23.41.94.105, 2600:140b:8800:283::4b21, 2600:140b:8800:28f::4b21
Connecting to download.01.org (download.01.org)|23.41.94.105|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2041598 (1.9M) [application/octet-stream]
Saving to: `face-detection-retail-0005.bin'

face-detection-retail-0005.bin   100%[=======================================================>]   1.95M  2.41MB/s    in 0.8s

2021-01-11 13:49:13 (2.41 MB/s) - `face-detection-retail-0005.bin' saved [2041598/2041598]

pi@raspberrypi:~/workspace/FP16 $ wget --no-check-certificate https://download.01.org/opencv/2021/openvinotoolkit/2021.2/open_model_zoo/models_bin/3/face-detection-retail-0005/FP16/face-detection-retail-0005.xml
--2021-01-11 13:49:31--  https://download.01.org/opencv/2021/openvinotoolkit/2021.2/open_model_zoo/models_bin/3/face-detection-retail-0005/FP16/face-detection-retail-0005.xml
Resolving download.01.org (download.01.org)... 23.41.94.105, 2600:140b:8800:283::4b21, 2600:140b:8800:28f::4b21
Connecting to download.01.org (download.01.org)|23.41.94.105|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 159741 (156K) [text/xml]
Saving to: `face-detection-retail-0005.xml'

face-detection-retail-0005.xml   100%[=======================================================>] 156.00K   299KB/s    in 0.5s

2021-01-11 13:49:32 (299 KB/s) - `face-detection-retail-0005.xml' saved [159741/159741]

pi@raspberrypi:~/workspace/FP16 $ ls
emotions-recognition-retail-0003.bin  face-detection-retail-0004.xml  landmarks-regression-retail-0009.bin
emotions-recognition-retail-0003.xml  face-detection-retail-0005.bin  landmarks-regression-retail-0009.xml
face-detection-retail-0004.bin        face-detection-retail-0005.xml

Running deep learning inference

Create a new program, face_detect.py.

vi face_detect.py

# -*- coding: utf-8 -*-
##------------------------------------------
## OpenVINO™ model -face-detection-retail-0005-
##  ** Face Detect ** model check
##               2021.01.11 Masahiro Izutsu
##
##               2021.02.10 deprecation warning fix
##------------------------------------------

# Imports
import cv2
import numpy as np

# Inference Engine Python API
from openvino.inference_engine import IECore

# Load the model
ie = IECore()
net = ie.read_network(model='FP16/face-detection-retail-0005.xml', weights='FP16/face-detection-retail-0005.bin')
exec_net = ie.load_network(network=net, device_name="MYRIAD")

# Get the input and output keys
input_blob = net.input_info['input.1'].name
out_blob = next(iter(net.outputs))

# Read the image
frame = cv2.imread('image/photo.jpg')

# Convert to the model's input format
img = cv2.resize(frame, (300, 300)) # resize to 300x300 (width, height)
img = img.transpose((2, 0, 1))      # HWC > CHW
img = np.expand_dims(img, axis=0)   # CHW > BCHW

# Run inference
out = exec_net.infer(inputs={input_blob: img})

# Keep only the needed output
out = out[out_blob]

# Remove the size-1 dimensions
out = np.squeeze(out)

# Print the contents
print(out)

Execution result

pi@raspberrypi:~/workspace $ python3 face_detect.py
[[ 0.          1.          1.         ...  0.18139648  0.5961914
   0.5332031 ]
 [-1.          0.          0.         ...  0.          0.
   0.        ]
 [ 0.          0.          0.         ...  0.          0.
   0.        ]
 ...
 [ 0.          0.          0.         ...  0.          0.
   0.        ]
 [ 0.          0.          0.         ...  0.          0.
   0.        ]
 [ 0.          0.          0.         ...  0.          0.
   0.        ]]

Checking the input and output data formats
Intel's site → face-detection-retail-0005

Input data

Inputs
Name: input, shape: [1x3x300x300] - An input image in the format [BxCxHxW], where:

B - batch size
C - number of channels
H - image height
W - image width
Expected color order: BGR.

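Note: the input name given in the documentation does not seem to match the downloaded model; the key that works is 'input.1', not 'input'.

The difference from the previous model is that the HxW size is 300x300.

Output data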

Outputs
The net outputs blob with shape: [1, 1, N, 7], where N is the number of detected bounding boxes. Each detection has the format [image_id, label, conf, x_min, y_min, x_max, y_max], where:

image_id - ID of the image in the batch
label - predicted class ID (1 - face)
conf - confidence for the predicted class
(x_min, y_min) - coordinates of the top left bounding box corner
(x_max, y_max) - coordinates of the bottom right bounding box corner.

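There are N lists of 7 elements each
N is the number of detected bounding boxes
The 7 elements are [ image_id, label, conf, x_min, y_min, x_max, y_max ]
image_id is the ID of the image in the batch
label is the predicted class ID
conf is the confidence of the face detection
(x_min, y_min) is the top-left corner of the bounding box
(x_max, y_max) is the bottom-right corner of the bounding box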
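
To make the seven positions easier to read in code, one row can be unpacked into a named record. This is only a sketch: `Detection` and `parse_detection` are hypothetical names introduced here, and the sample row uses values from one run reported on this page.

```python
from collections import namedtuple

# Field order as given in the model documentation quoted above.
Detection = namedtuple(
    "Detection", "image_id label conf x_min y_min x_max y_max")


def parse_detection(row):
    """Turn one 7-element row into a named record (hypothetical helper)."""
    image_id, label, conf, x_min, y_min, x_max, y_max = (float(v) for v in row)
    return Detection(int(image_id), int(label), conf, x_min, y_min, x_max, y_max)


# One detection row in the documented format.
det = parse_detection([0.0, 1.0, 1.0, 0.41064453, 0.18713379, 0.5932617, 0.53271484])
print(det.label, det.conf, (det.x_min, det.y_min), (det.x_max, det.y_max))
```

With this, `det.conf` can be used in place of the index expression `detection[2]`.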

Preparing the input data
- Read the input image
```
frame = cv2.imread('image/photo.jpg')
```
- Resize to 300x300 (height and width)
```
img = cv2.resize(frame, (300, 300))
```

Extracting the output data

[[ 0.          1.          1.         ...  0.18713379  0.5932617
  0.53271484]
 [-1.          0.          0.         ...  0.          0.
  0.        ]
 [ 0.          0.          0.         ...  0.          0.
  0.        ]
 ...
 [ 0.          0.          0.         ...  0.          0.
  0.        ]
 [ 0.          0.          0.         ...  0.          0.
  0.        ]
 [ 0.          0.          0.         ...  0.          0.
  0.        ]]

As printed, the rows are abbreviated with ... and the full result is not visible, so print only out[0] with print(out[0]).
Running this gives a list of 7 elements.
```
[0.         1.         1.         0.41064453 0.18713379 0.5932617
0.53271484]
```
These values have the following meaning.

image_id	label	conf	x_min	y_min	x_max	y_max
0.0	1.0	1.0	0.41064453	0.18713379	0.5932617	0.53271484
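
Besides printing a single row, NumPy can be asked not to abbreviate large arrays. The sketch below uses a zero-filled stand-in for the squeezed output; the row count of 200 is an assumption made for illustration, since the documentation only calls it N.

```python
import numpy as np

# Stand-in for the squeezed detection output: N rows of 7 values.
# (Assumption: N = 200 is used here only for illustration.)
out = np.zeros((200, 7), dtype=np.float32)
out[0] = [0.0, 1.0, 1.0, 0.41064453, 0.18713379, 0.5932617, 0.53271484]

summary = np.array2string(out)                   # large arrays are abbreviated
full = np.array2string(out, threshold=out.size)  # every row is included

print("..." in summary)
print("..." in full)
print(out[0])
```

`np.set_printoptions(threshold=...)` applies the same setting to every later print call.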

Drawing the output on the input image

Create a new program, face_detect1.py.

vi face_detect1.py

# -*- coding: utf-8 -*-
##------------------------------------------
## OpenVINO™ model -face-detection-retail-0005-
##  ** Face Detect **
##               2021.01.11 Masahiro Izutsu
##
##               2021.02.10 deprecation warning fix
##------------------------------------------

# Imports
import cv2
import numpy as np

# Inference Engine Python API
from openvino.inference_engine import IECore

# Load the model
ie = IECore()
net = ie.read_network(model='FP16/face-detection-retail-0005.xml', weights='FP16/face-detection-retail-0005.bin')
exec_net = ie.load_network(network=net, device_name="MYRIAD")

# Get the input and output keys
input_blob = net.input_info['input.1'].name
out_blob = next(iter(net.outputs))

# Read the image
frame = cv2.imread('image/photo.jpg')

# Convert to the model's input format
img = cv2.resize(frame, (300, 300)) # resize to 300x300 (width, height)
img = img.transpose((2, 0, 1))      # HWC > CHW
img = np.expand_dims(img, axis=0)   # CHW > BCHW

# Run inference
out = exec_net.infer(inputs={input_blob: img})

# Keep only the needed output
out = out[out_blob]

# Remove the size-1 dimensions
out = np.squeeze(out)

# Print the contents
print(out)

# Convert the bounding box of the first detection to the input image scale
xmin = int(out[0][3] * frame.shape[1])
ymin = int(out[0][4] * frame.shape[0])
xmax = int(out[0][5] * frame.shape[1])
ymax = int(out[0][6] * frame.shape[0])

# Draw the bounding box
cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), color=(89, 199, 243), thickness=3)

# Show the image
cv2.imshow('frame', frame)

# Wait until a key is pressed
cv2.waitKey(0)

# Clean up
cv2.destroyAllWindows()

Execution result

pi@raspberrypi:~/workspace $ python3 face_detect1.py
[[0.         1.         0.9995117  ... 0.3256836  0.7792969  0.53564453]
 [0.         1.         0.95214844 ... 0.34814453 0.37109375 0.5102539 ]
 [0.         1.         0.03417969 ... 0.8173828  0.45581055 0.8671875 ]
 ...
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]]

Detecting multiple faces

Create a new program, face_detect2.py.

vi face_detect2.py

# -*- coding: utf-8 -*-
##------------------------------------------
## OpenVINO™ model -face-detection-retail-0005-
##  ** Multi Face Detect ** model check
##               2021.01.11 Masahiro Izutsu
##
##               2021.02.10 deprecation warning fix
##------------------------------------------

# Imports
import cv2
import numpy as np

# Inference Engine Python API
from openvino.inference_engine import IECore

# Load the model
ie = IECore()
net = ie.read_network(model='FP16/face-detection-retail-0005.xml', weights='FP16/face-detection-retail-0005.bin')
exec_net = ie.load_network(network=net, device_name="MYRIAD")

# Get the input and output keys
input_blob = net.input_info['input.1'].name
out_blob = next(iter(net.outputs))

# Read the image
frame = cv2.imread('image/photo2.jpg')

# Convert to the model's input format
img = cv2.resize(frame, (300, 300)) # resize to 300x300 (width, height)
img = img.transpose((2, 0, 1))      # HWC > CHW
img = np.expand_dims(img, axis=0)   # CHW > BCHW

# Run inference
out = exec_net.infer(inputs={input_blob: img})

# Keep only the needed output
out = out[out_blob]

# Remove the size-1 dimensions
out = np.squeeze(out)

# Print only the first 20 rows
for i in range(20):
    print(out[i])

Execution result

pi@raspberrypi:~/workspace $ python3 face_detect2.py
[0.         1.         0.9995117  0.6118164  0.21508789 0.7416992 0.4567871 ]
[0.         1.         0.99902344 0.29467773 0.23339844 0.4387207 0.47314453]
[0.         1.         0.10400391 0.4790039  0.32080078 0.5258789 0.3959961 ]
[0.         1.         0.04589844 0.9506836  0.27416992 1.0029297 0.39916992]
[0.         1.         0.04296875 0.12524414 0.27514648 0.14794922 0.31079102]
[0.         1.         0.04101562 0.49072266 0.26586914 0.5151367 0.30444336]
[0.         1.         0.03564453 0.58203125 0.27001953 0.61816406 0.33496094]
[0.         1.         0.03125    0.58251953 0.31201172 0.62353516 0.3876953 ]
[0.         1.         0.02490234 0.5366211  0.2770996  0.55908203 0.3137207 ]
[0.         1.         0.02490234 0.47216797 0.33911133 0.5175781 0.41723633]
[0.         1.         0.02392578 0.49169922 0.23181152 0.5161133 0.27148438]
[0.         1.         0.02392578 0.6635742  0.62597656 0.81103516 0.78222656]
[0.         1.         0.02197266 0.03396606 0.23803711 0.06161499 0.2775879 ]
[0.         1.         0.02001953 0.5859375  0.36669922 0.6279297 0.4423828 ]
[-1.  0.  0.  0.  0.  0.  0.]
[0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0.]
[0. 0. 0. 0. 0. 0. 0.]

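[ image_id, label, conf, x_min, y_min, x_max, y_max ]
- Columns are counted from 0.
- From this result, the model appears to have produced 14 candidate detections.
  Looking at columns 3 to 6, the bounding box coordinates (x_min y_min x_max y_max), there are 14 rows that are not all zeros.
  Column 0, image_id, is normally 0 but becomes -1 on the row where the detections end.
  Column 1, label, is 1. on rows with a detection and 0. otherwise.
  Column 2, conf, is sorted in descending order (largest first).
- Look at column 2, conf, again. conf is the confidence of the face detection; the closer it is to 1.0, the more reliable the detection. The first two rows are about 0.999, almost 1.0, whereas the next row drops to 0.104.
- Here a cutoff of 0.5 is used. This value is called the threshold.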
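
The observations above can be checked with a few lines of plain Python. This is a sketch on a shortened copy of the output (three detection rows, the -1 row, and one zero row). Treating the row with image_id -1 as an end marker is an assumption based on what this output shows.

```python
# A few rows copied from the output above, followed by the -1 row
# and zero padding (the real output has more rows of each kind).
rows = [
    [0.0, 1.0, 0.9995117, 0.6118164, 0.21508789, 0.7416992, 0.4567871],
    [0.0, 1.0, 0.99902344, 0.29467773, 0.23339844, 0.4387207, 0.47314453],
    [0.0, 1.0, 0.10400391, 0.4790039, 0.32080078, 0.5258789, 0.3959961],
    [-1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]

THRESHOLD = 0.5

# Candidate rows: everything before the first row whose image_id is -1.
candidates = []
for row in rows:
    if row[0] < 0:
        break
    candidates.append(row)

# Rows whose confidence exceeds the threshold.
faces = [row for row in candidates if row[2] > THRESHOLD]

print(len(candidates), len(faces))
```

On the full output the same loop would count the 14 candidate rows, of which only the first two pass the 0.5 threshold.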

Displaying the results for multiple faces

Create a new program, face_detect3.py.

vi face_detect3.py

# -*- coding: utf-8 -*-
##------------------------------------------
## OpenVINO™ model -face-detection-retail-0005-
##  ** Multi Face Detect
##               2021.01.11 Masahiro Izutsu
##
##               2021.02.10 deprecation warning fix
##------------------------------------------

# Imports
import cv2
import numpy as np

# Inference Engine Python API
from openvino.inference_engine import IECore

# Load the model
ie = IECore()
net = ie.read_network(model='FP16/face-detection-retail-0005.xml', weights='FP16/face-detection-retail-0005.bin')
exec_net = ie.load_network(network=net, device_name="MYRIAD")

# Get the input and output keys
input_blob = net.input_info['input.1'].name
out_blob = next(iter(net.outputs))

# Read the image
frame = cv2.imread('image/photo2.jpg')

# Convert to the model's input format
img = cv2.resize(frame, (300, 300)) # resize to 300x300 (width, height)
img = img.transpose((2, 0, 1))      # HWC > CHW
img = np.expand_dims(img, axis=0)   # CHW > BCHW

# Run inference
out = exec_net.infer(inputs={input_blob: img})

# Keep only the needed output
out = out[out_blob]

# Remove the size-1 dimensions
out = np.squeeze(out)

# Process every detection row one at a time
for detection in out:
    # Get the conf value
    confidence = float(detection[2])

    # Convert the bounding box coordinates to the input image scale
    xmin = int(detection[3] * frame.shape[1])
    ymin = int(detection[4] * frame.shape[0])
    xmax = int(detection[5] * frame.shape[1])
    ymax = int(detection[6] * frame.shape[0])

    # Draw the bounding box only when conf is greater than 0.5
    if confidence > 0.5:
        # Draw the bounding box
        cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), color=(89, 199, 243), thickness=3)

# Show the image
cv2.imshow('frame', frame)

# Wait until a key is pressed
cv2.waitKey(0)

# Clean up
cv2.destroyAllWindows()

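The for ... in loop takes one row at a time into detection, compares its conf value with the threshold 0.5, and draws the bounding box only when conf is larger.
Applying a threshold to conf makes it possible to draw bounding boxes only where they are needed.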
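
The same filtering can be sketched as a function that returns pixel boxes instead of drawing them, which makes the logic testable without OpenCV. The name `boxes_above_threshold` is hypothetical, and the 1000x500 image size is an assumption made for this example; the two rows are the first and third rows of the face_detect2.py output.

```python
def boxes_above_threshold(detections, width, height, threshold=0.5):
    """Return pixel boxes (xmin, ymin, xmax, ymax) for confident detections."""
    boxes = []
    for detection in detections:
        confidence = float(detection[2])
        if confidence > threshold:
            boxes.append((int(detection[3] * width),
                          int(detection[4] * height),
                          int(detection[5] * width),
                          int(detection[6] * height)))
    return boxes


# First and third rows of the face_detect2.py output; the image size
# of 1000x500 is an assumption made for this example.
detections = [
    [0.0, 1.0, 0.9995117, 0.6118164, 0.21508789, 0.7416992, 0.4567871],
    [0.0, 1.0, 0.10400391, 0.4790039, 0.32080078, 0.5258789, 0.3959961],
]
boxes = boxes_above_threshold(detections, width=1000, height=500)
print(boxes)
```

Only the first row survives, because the second has a conf of about 0.104.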

Execution result

pi@raspberrypi:~/workspace $ python3 face_detect3.py

Real-time eye position estimation

Real-time face detection

Create a new program, face_detect4.py.

vi face_detect4.py

# -*- coding: utf-8 -*-
##------------------------------------------
## OpenVINO™ model -face-detection-retail-0005-
##  ** Multi Face Detect ** camera
##               2021.01.11 Masahiro Izutsu
##
##               2021.02.10 deprecation warning fix
##------------------------------------------

#===========================================
# Setup
#===========================================
# Imports
import cv2
import numpy as np

# Inference Engine Python API
from openvino.inference_engine import IECore

# Load the model
ie = IECore()
net = ie.read_network(model='FP16/face-detection-retail-0005.xml', weights='FP16/face-detection-retail-0005.bin')
exec_net = ie.load_network(network=net, device_name="MYRIAD")

# Get the input and output keys
input_blob = net.input_info['input.1'].name
out_blob = next(iter(net.outputs))

# Open the camera
cap = cv2.VideoCapture(0)

#===========================================
# Main loop
#===========================================
while True:
    # Exit when any key is pressed
    key = cv2.waitKey(1)
    if key != -1:
        break

    # Read a camera frame
    ret, frame = cap.read()
    if not ret:
        # Stop if no frame could be read
        break

    # Convert to the model's input format
    img = cv2.resize(frame, (300, 300)) # resize to 300x300 (width, height)
    img = img.transpose((2, 0, 1))      # HWC > CHW
    img = np.expand_dims(img, axis=0)   # CHW > BCHW

    # Run inference
    out = exec_net.infer(inputs={input_blob: img})

    # Keep only the needed output
    out = out[out_blob]

    # Remove the size-1 dimensions
    out = np.squeeze(out)

    # Process every detection row one at a time
    for detection in out:
        # Get the conf value
        confidence = float(detection[2])

        # Convert the bounding box coordinates to the input image scale
        xmin = int(detection[3] * frame.shape[1])
        ymin = int(detection[4] * frame.shape[0])
        xmax = int(detection[5] * frame.shape[1])
        ymax = int(detection[6] * frame.shape[0])

        # Draw the bounding box only when conf is greater than 0.5
        if confidence > 0.5:
            # Draw the bounding box
            cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), color=(89, 199, 243), thickness=3)

    # Show the image
    cv2.imshow('frame', frame)

#===========================================
# Cleanup
#===========================================
cap.release()
cv2.destroyAllWindows()

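The camera delivers a different image every time, so each frame must be processed, but the same model is used throughout, so it only needs to be loaded once at the start.
The part that took a long time was loading the model; inference itself runs very quickly.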
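
To check on a given device where the time goes, the individual calls can be timed. The following is a minimal sketch with a hypothetical helper `timed`; a built-in function stands in for the OpenVINO calls so that the example runs anywhere.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload so the sketch runs anywhere; in the program above the
# measured calls would be ie.load_network(...) and exec_net.infer(...).
total, seconds = timed(sum, range(1000))
print(total, f"{seconds:.6f} s")
```

In face_detect4.py this would look like `exec_net, t_load = timed(ie.load_network, network=net, device_name="MYRIAD")`, followed by a similar call around `exec_net.infer`.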

Execution result

pi@raspberrypi:~/workspace $ python3 face_detect4.py

Landmark regression after face detection

Create a new program, face_landmarks.py.

vi face_landmarks.py

# -*- coding: utf-8 -*-
##------------------------------------------
## OpenVINO™ model -face-detection-retail-0005-
##  ** Multi Face Detect & Landmarks ** camera
##               2021.01.11 Masahiro Izutsu
##
##               2021.02.10 deprecation warning fix
##------------------------------------------

#===========================================
# Setup
#===========================================
# Imports
import cv2
import numpy as np

# Inference Engine Python API
from openvino.inference_engine import IECore

# Load the model and get the input and output keys (face detection)
ie = IECore()
net_face = ie.read_network(model='FP16/face-detection-retail-0005.xml', weights='FP16/face-detection-retail-0005.bin')
exec_net_face = ie.load_network(network=net_face, device_name="MYRIAD")
input_blob_face = net_face.input_info['input.1'].name
out_blob_face = next(iter(net_face.outputs))

# Load the model and get the input and output keys (landmarks)
net_landmarks = ie.read_network(model='FP16/landmarks-regression-retail-0009.xml', weights='FP16/landmarks-regression-retail-0009.bin')
exec_net_landmarks = ie.load_network(network=net_landmarks, device_name="MYRIAD")
input_blob_landmarks = net_landmarks.input_info['0'].name
out_blob_landmarks = next(iter(net_landmarks.outputs))

# Open the camera
cap = cv2.VideoCapture(0)

#===========================================
# Main loop
#===========================================
while True:
    # Exit when any key is pressed
    key = cv2.waitKey(1)
    if key != -1:
        break

    # Read a camera frame
    ret, frame = cap.read()
    if not ret:
        # Stop if no frame could be read
        break

    # Convert to the model's input format
    img = cv2.resize(frame, (300, 300)) # resize to 300x300 (width, height)
    img = img.transpose((2, 0, 1))      # HWC > CHW
    img = np.expand_dims(img, axis=0)   # CHW > BCHW

    # Run inference
    out = exec_net_face.infer(inputs={input_blob_face: img})

    # Keep only the needed output
    out = out[out_blob_face]

    # Remove the size-1 dimensions
    out = np.squeeze(out)

    # Process every detection row one at a time
    for detection in out:
        # Get the conf value
        confidence = float(detection[2])

        # Convert the bounding box coordinates to the input image scale
        xmin = int(detection[3] * frame.shape[1])
        ymin = int(detection[4] * frame.shape[0])
        xmax = int(detection[5] * frame.shape[1])
        ymax = int(detection[6] * frame.shape[0])

        # Run landmark inference and draw the box only when conf is greater than 0.5
        if confidence > 0.5:
            # Clamp the face region to the camera image; the min values in
            # particular cause an error if they are not clamped
            if xmin < 0:
                xmin = 0
            if ymin < 0:
                ymin = 0
            if xmax > frame.shape[1]:
                xmax = frame.shape[1]
            if ymax > frame.shape[0]:
                ymax = frame.shape[0]

            #------------------------------------
            #  Deep learning landmark estimation
            #------------------------------------
            # Crop the face region only
            img_face = frame[ ymin:ymax, xmin:xmax ]

            # Convert to the model's input format
            img = cv2.resize(img_face, (48, 48)) # resize to 48x48 (width, height)
            img = img.transpose((2, 0, 1))       # HWC > CHW
            img = np.expand_dims(img, axis=0)    # CHW > BCHW

            # Run inference
            out = exec_net_landmarks.infer(inputs={input_blob_landmarks: img})

            # Keep only the needed output
            out = out[out_blob_landmarks]

            # Remove the size-1 dimensions
            out = np.squeeze(out)

            # Draw a circle at each detected landmark
            for i in range(0, 10, 2):
                x = int(out[i] * img_face.shape[1]) + xmin
                y = int(out[i+1] * img_face.shape[0]) + ymin
                cv2.circle(frame, (x, y), 10, (89, 199, 243), thickness=-1)

            # Draw the bounding box
            cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), color=(89, 199, 243), thickness=3)

    # Show the image
    cv2.imshow('frame', frame)

#===========================================
# Cleanup
#===========================================
cap.release()
cv2.destroyAllWindows()

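Because one program uses two models, the variable names carry one of the following suffixes to keep face detection and landmark regression apart.
- _face
- _landmarks
Face detection is run on the camera image frame, and slicing is used to create img_face, an image of the face only.
Landmark regression is then run on img_face.
The bounding box is drawn after the face region has been cropped, which keeps the box lines out of the image passed to the landmarks model.
When slicing out the face, coordinates outside the camera image cause an error, so they are clamped to the camera range beforehand.
Points to note when drawing a circle at each landmark position:
- To convert the normalized coordinates back to pixels, use img_face.shape (the face region only), not frame.shape (the whole image).
- To align with the coordinate system of the whole image, add xmin and ymin respectively at the end.
```
# Draw a circle at each detected landmark
for i in range(0, 10, 2):
    x = int(out[i] * img_face.shape[1]) + xmin
    y = int(out[i+1] * img_face.shape[0]) + ymin
    cv2.circle(frame, (x, y), 10, (89, 199, 243), thickness=-1)
```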
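
The two coordinate steps described above, clamping the box and mapping crop-relative landmarks back to the full frame, can be sketched as small functions. `clamp_box` and `landmark_to_frame` are hypothetical names, and the numbers are made up for a 640x480 frame.

```python
def clamp_box(xmin, ymin, xmax, ymax, width, height):
    """Limit a box to the image area 0..width, 0..height."""
    return (max(xmin, 0), max(ymin, 0), min(xmax, width), min(ymax, height))


def landmark_to_frame(nx, ny, box):
    """Map a landmark normalized to the face crop back to frame pixels."""
    xmin, ymin, xmax, ymax = box
    face_w = xmax - xmin          # same as img_face.shape[1]
    face_h = ymax - ymin          # same as img_face.shape[0]
    return int(nx * face_w) + xmin, int(ny * face_h) + ymin


# Made-up numbers for a 640x480 frame.
box = clamp_box(-5, 10, 700, 400, width=640, height=480)
print(box)
print(landmark_to_frame(0.5, 0.5, (100, 50, 300, 250)))
```

A landmark at the center of the crop lands at the center of the box in frame coordinates, which is a quick sanity check for the offset handling.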

Execution result

pi@raspberrypi:~/workspace $ python3 face_landmarks.py

Drawing simple sunglasses

Create a new program, face_landmarks1.py.

vi face_landmarks1.py

# -*- coding: utf-8 -*-
##------------------------------------------
## OpenVINO™ model -face-detection-retail-0005-
##  ** Multi Face Detect & Landmarks ** glass
##               2021.01.11 Masahiro Izutsu
##
##               2021.02.10 deprecation warning fix
##------------------------------------------

#===========================================
# Setup
#===========================================
# Imports
import cv2
import numpy as np

# Inference Engine Python API
from openvino.inference_engine import IECore

# Load the model and get the input and output keys (face detection)
ie = IECore()
net_face = ie.read_network(model='FP16/face-detection-retail-0005.xml', weights='FP16/face-detection-retail-0005.bin')
exec_net_face = ie.load_network(network=net_face, device_name="MYRIAD")
input_blob_face = net_face.input_info['input.1'].name
out_blob_face = next(iter(net_face.outputs))

# Load the model and get the input and output keys (landmarks)
net_landmarks = ie.read_network(model='FP16/landmarks-regression-retail-0009.xml', weights='FP16/landmarks-regression-retail-0009.bin')
exec_net_landmarks = ie.load_network(network=net_landmarks, device_name="MYRIAD")
input_blob_landmarks = net_landmarks.input_info['0'].name
out_blob_landmarks = next(iter(net_landmarks.outputs))

# Open the camera
cap = cv2.VideoCapture(0)

#===========================================
# Main loop
#===========================================
while True:
    # Exit when any key is pressed
    key = cv2.waitKey(1)
    if key != -1:
        break

    # Read a camera frame
    ret, frame = cap.read()
    if not ret:
        # Stop if no frame could be read
        break

    # Convert to the model's input format
    img = cv2.resize(frame, (300, 300)) # resize to 300x300 (width, height)
    img = img.transpose((2, 0, 1))      # HWC > CHW
    img = np.expand_dims(img, axis=0)   # CHW > BCHW

    # Run inference
    out = exec_net_face.infer(inputs={input_blob_face: img})

    # Keep only the needed output
    out = out[out_blob_face]

    # Remove the size-1 dimensions
    out = np.squeeze(out)

    # Process every detection row one at a time
    for detection in out:
        # Get the conf value
        confidence = float(detection[2])

        # Convert the bounding box coordinates to the input image scale
        xmin = int(detection[3] * frame.shape[1])
        ymin = int(detection[4] * frame.shape[0])
        xmax = int(detection[5] * frame.shape[1])
        ymax = int(detection[6] * frame.shape[0])

        # Run landmark inference and drawing only when conf is greater than 0.5
        if confidence > 0.5:
            # Clamp the face region to the camera image; the min values in
            # particular cause an error if they are not clamped
            if xmin < 0:
                xmin = 0
            if ymin < 0:
                ymin = 0
            if xmax > frame.shape[1]:
                xmax = frame.shape[1]
            if ymax > frame.shape[0]:
                ymax = frame.shape[0]

            #------------------------------------
            #  Deep learning landmark estimation
            #------------------------------------
            # Crop the face region only
            img_face = frame[ ymin:ymax, xmin:xmax ]

            # Convert to the model's input format
            img = cv2.resize(img_face, (48, 48)) # resize to 48x48 (width, height)
            img = img.transpose((2, 0, 1))       # HWC > CHW
            img = np.expand_dims(img, axis=0)    # CHW > BCHW

            # Run inference
            out = exec_net_landmarks.infer(inputs={input_blob_landmarks: img})

            # Keep only the needed output
            out = out[out_blob_landmarks]

            # Remove the size-1 dimensions
            out = np.squeeze(out)

            # Scale the eye coordinates to the face image and add the offset
            eye_left_x = int(out[0] * img_face.shape[1]) + xmin
            eye_left_y = int(out[1] * img_face.shape[0]) + ymin
            eye_right_x = int(out[2] * img_face.shape[1]) + xmin
            eye_right_y = int(out[3] * img_face.shape[0]) + ymin

            # Draw at the eye positions
            r = int((xmax - xmin) / 6)
            cv2.circle(frame, (eye_left_x, eye_left_y), r, (0, 0, 0), thickness=-1)
            cv2.circle(frame, (eye_right_x, eye_right_y), r, (0, 0, 0), thickness=-1)
            cv2.line(frame, (eye_left_x, eye_left_y), (eye_right_x, eye_right_y), (0, 0, 0), thickness=3)

            # Bounding box (disabled)
            #cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), color=(89, 199, 243), thickness=3)

    # Show the image
    cv2.imshow('frame', frame)

#===========================================
# Cleanup
#===========================================
cap.release()
cv2.destroyAllWindows()

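Only the positions of the two eyes are used out of the five landmarks.
- A black circle of radius r is drawn at each eye position
- r is not a constant; it is set to 1/6 of the face width (xmax - xmin)
- A line segment is drawn between the eyes to suggest the frame of the glasses (*)
- The bounding box is not displayed
- (*) The thickness of the frame (the bridge) is simply fixed at 3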
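
The geometry in the list above can be sketched separately from the drawing calls. `sunglasses_geometry` is a hypothetical helper, and the landmark values and face box are made up so that the arithmetic is easy to follow.

```python
def sunglasses_geometry(out, box):
    """Return (left eye, right eye, radius) in frame pixels.

    out holds normalized landmarks relative to the face crop;
    box is the clamped face box (xmin, ymin, xmax, ymax).
    """
    xmin, ymin, xmax, ymax = box
    face_w, face_h = xmax - xmin, ymax - ymin
    left = (int(out[0] * face_w) + xmin, int(out[1] * face_h) + ymin)
    right = (int(out[2] * face_w) + xmin, int(out[3] * face_h) + ymin)
    radius = int(face_w / 6)      # one sixth of the face width
    return left, right, radius


# Made-up landmark values and box, chosen for easy arithmetic.
left, right, radius = sunglasses_geometry([0.25, 0.5, 0.75, 0.5],
                                          (100, 50, 340, 290))
print(left, right, radius)
```

Because the radius depends on the box width, the circles grow and shrink as the face moves toward or away from the camera.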

Execution result

pi@raspberrypi:~/workspace $ python3 face_landmarks1.py

Notes on the programs

About the deprecation warning

The warning that was being raised

classification3.py:15: DeprecationWarning: 'inputs' property of IENetwork class is deprecated. To access DataPtrs user need to use 'input_data' property of InputInfoPtr objects which can be accessed by 'input_info' property.

This appears to be a change that came with an API version update. The following part was fixed.

Before

# Get the input and output keys
input_blob = next(iter(net.inputs))

After

# Get the input and output keys
input_blob = net.input_info['XXXX'].name

'XXXX' is the input key name.
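
Hard-coding the key requires knowing the input name in advance, and as noted earlier the documented names did not match the downloaded models. A possible alternative is to take the first key from `input_info`. This assumes `input_info` is a dict-like mapping keyed by input name, as the `net.input_info['0']` indexing suggests. In the sketch below, `first_input_name` is a hypothetical helper and a stand-in object replaces the real network so that the code runs without OpenVINO.

```python
from types import SimpleNamespace


def first_input_name(net):
    """Return the name of the first input of a network object.

    Assumes net.input_info is a dict-like mapping keyed by input name.
    """
    return next(iter(net.input_info))


# Stand-in object so the sketch runs without OpenVINO installed.
fake_net = SimpleNamespace(input_info={"0": object()})
input_blob = first_input_name(fake_net)
print(input_blob)
```

For models with a single input, such as the two used on this page, the first key is the only key.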

Update history

2021/02/11 Fixed the deprecation warning

References

Site used as a model → AI CORE XスターターキットとOpenVINO™ ですぐに始めるディープラーニング推論

Using the Open Model Zoo (archive of INTEL® pre-trained model files)
- OpenVINO上でのアプリケーション開発の方法を学ぶ

INTEL® official documentation
- OpenVINO™ Toolkit Overview
- Install OpenVINO™ toolkit for Raspbian* OS
- API documentation → Overview of Inference Engine Python* API
- Location of the pre-trained models → INTEL OPENSOURCE.org
- Documentation for the pre-trained models → Overview of OpenVINO™ Toolkit Intel's Pre-Trained Models
