OpenVINO4 - PukiWiki


Deep Learning Inference from Scratch: Real-Time Face Detection †


Note: Last updated 2021/01/02

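Real-time face detection †

This page follows "Part 8: Real-Time Face Detection" (第8回リアルタイム顔検出).
That tutorial is written for OpenVINO™ toolkit version 2019_R3.1.
The version installed here is 2021.2, the latest at the time of writing.
Differences caused by the version mismatch are noted where they arise (marked in red on the original page).
Before detecting faces in camera video, face detection is first run on a still image.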

Face detection on a still image †

Use Intel's pre-trained model.

Obtaining the pre-trained model †

Download the pre-trained model that matches the installed OpenVINO™ toolkit version.
The installed version is 2021.2.

Location of the pre-trained model
https://download.01.org/opencv/2021/openvinotoolkit/2021.2/open_model_zoo/models_bin/3/face-detection-retail-0004/

Documentation for the pre-trained model
https://docs.openvinotoolkit.org/latest/omz_models_intel_face_detection_retail_0004_description_face_detection_retail_0004.html

Model name
```
face-detection-retail-0004
```

Download the pre-trained model into the ~/workspace/FP16 folder.

pi@raspberrypi:~ $ cd workspace/FP16
pi@raspberrypi:~/workspace/FP16 $ wget --no-check-certificate https://download.01.org/opencv/2021/openvinotoolkit/2021.2/open_model_zoo/models_bin/3/face-detection-retail-0004/FP16/face-detection-retail-0004.bin
--2020-12-28 17:01:57--  https://download.01.org/opencv/2021/openvinotoolkit/2021.2/open_model_zoo/models_bin/3/face-detection-retail-0004/FP16/face-detection-retail-0004.bin
Resolving download.01.org (download.01.org)... 2600:140b:d400:1ad::4b21, 2600:140b:d400:188::4b21, 23.42.230.170
   :
2020-12-28 17:01:59 (923 KB/s) - 'face-detection-retail-0004.bin' saved [1176544/1176544]

pi@raspberrypi:~/workspace/FP16 $ wget --no-check-certificate https://download.01.org/opencv/2021/openvinotoolkit/2021.2/open_model_zoo/models_bin/3/face-detection-retail-0004/FP16/face-detection-retail-0004.xml
--2021-01-02 15:23:33--  https://download.01.org/opencv/2021/openvinotoolkit/2021.2/open_model_zoo/models_bin/3/face-detection-retail-0004/FP16/face-detection-retail-0004.xml
Resolving download.01.org (download.01.org)... 2600:140b:d400:1ad::4b21, 2600:140b:d400:188::4b21, 23.42.230.170
   :
2021-01-02 15:23:34 (324 KB/s) - 'face-detection-retail-0004.xml' saved [100995/100995]

pi@raspberrypi:~/workspace/FP16 $ ls -l
total 6144
-rw-r--r-- 1 pi pi 4965014 Dec 11 02:12 emotions-recognition-retail-0003.bin
-rw-r--r-- 1 pi pi   37985 Dec 11 02:12 emotions-recognition-retail-0003.xml
-rw-r--r-- 1 pi pi 1176544 Dec 11 02:14 face-detection-retail-0004.bin
-rw-r--r-- 1 pi pi  100995 Dec 11 02:14 face-detection-retail-0004.xml
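
The two wget commands differ only in the file extension, so the URLs can be built from the toolkit version, the model name, and the precision. The following is a minimal sketch of that pattern. The function name model_urls is hypothetical, and the directory layout (including the "3" path component) is simply copied from the URLs used above; it may differ for other releases.

```python
BASE = "https://download.01.org/opencv/2021/openvinotoolkit"

def model_urls(version, model, precision="FP16"):
    """Return the .xml and .bin URLs for one model (layout copied from the wget commands)."""
    root = f"{BASE}/{version}/open_model_zoo/models_bin/3/{model}/{precision}"
    return [f"{root}/{model}.{ext}" for ext in ("xml", "bin")]

# Print the two URLs that were downloaded by hand above
for url in model_urls("2021.2", "face-detection-retail-0004"):
    print(url)
```

Each printed URL can then be passed to wget as before.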

Input data †

Details of the downloaded pre-trained model are on its documentation page →
https://docs.openvinotoolkit.org/latest/omz_models_intel_face_detection_retail_0004_description_face_detection_retail_0004.html
From that page:
```
Inputs
name: "input" , shape: [1x3x300x300] - An input image in the format [BxCxHxW], where:

B - batch size
C - number of channels
H - image height
W - image width
Expected color order - BGR.
```
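The page gives the input name as 'input', but 'data' is the correct name.
The shape is [1x3x300x300], the image format is [BxCxHxW], and the color order is BGR.
The only difference from the previous emotion classification is the image size (height × width). The batch size corresponds to the number of images.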

Concrete code

# Resize the image to 300x300
img = cv2.resize(img, (300, 300))

# Reorder from HWC to CHW
img = img.transpose((2, 0, 1))

# Add a dimension of size 1 to make the array 4-D (this step can be omitted)
img = np.expand_dims(img, axis=0)
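
To check the shape after each step without an image file, the same transformations can be applied to a dummy array. This is a minimal sketch: a blank 300x300 array stands in for the output of cv2.resize, so only the reordering and the added batch dimension are exercised.

```python
import numpy as np

# Stand-in for cv2.resize(img, (300, 300)): height x width x channels (BGR)
img = np.zeros((300, 300, 3), dtype=np.uint8)
print(img.shape)    # (300, 300, 3)

# HWC -> CHW
chw = img.transpose((2, 0, 1))
print(chw.shape)    # (3, 300, 300)

# Add the batch dimension (B = 1) to get BxCxHxW
nchw = np.expand_dims(chw, axis=0)
print(nchw.shape)   # (1, 3, 300, 300)
```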

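Prepare the image data. Images of at least 300x300 pixels are preferable. Any photo that shows a person's face will do.
Example source for sample images →
https://www.pakutaso.com/20190610177post-21595.html

Save the files as photo4.jpg and photo5.jpg in the ~/workspace/image folder.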

Run inference and print the output †

Create a new file, detection1.py.

# -*- coding: utf-8 -*-

import cv2
import numpy as np

# Import the Inference Engine module
from openvino.inference_engine import IECore

# Load the model
ie = IECore()
net = ie.read_network(model='FP16/face-detection-retail-0004.xml',
                      weights='FP16/face-detection-retail-0004.bin')
exec_net = ie.load_network(network=net, device_name="MYRIAD")

# Read the input image
img_face = cv2.imread('./image/photo4.jpg')

# Convert to the model's input format
img = cv2.resize(img_face, (300, 300))   # resize
img = img.transpose((2, 0, 1))           # HWC > CHW
img = np.expand_dims(img, axis=0)        # add the batch dimension

# Run inference
out = exec_net.infer(inputs={'data': img})

# Print the output
print(out)

Execution result

pi@raspberrypi:~/workspace $ python3 detection1.py
{'detection_out': array([[[[0.        , 1.        , 0.89990234, ..., 0.17883301,
         0.7060547 , 0.44335938],
        [0.        , 1.        , 0.08251953, ..., 0.4831543 ,
         0.44506836, 0.6064453 ],
        [0.        , 1.        , 0.04394531, ..., 0.52197266,
         0.42236328, 0.5913086 ],
        ...,
        [0.        , 0.        , 0.        , ..., 0.        ,
         0.        , 0.        ],
        [0.        , 0.        , 0.        , ..., 0.        ,
         0.        , 0.        ],
        [0.        , 0.        , 0.        , ..., 0.        ,
         0.        , 0.        ]]]], dtype=float32)}

This shows that the output name is 'detection_out'.
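
Instead of reading the names off a documentation page or a printed dictionary, they can also be taken from the loaded network. The sketch below assumes the input_info and outputs attributes of the network object in the 2021.x Inference Engine Python API. The helper name first_name is hypothetical, and plain dictionaries stand in for the real network so that the snippet runs without a device.

```python
def first_name(blobs):
    """Return the first key of a dict-like collection of blobs."""
    return next(iter(blobs))

# With a real network (assumed 2021.x API) this would be:
#   input_name = first_name(net.input_info)
#   output_name = first_name(net.outputs)
# Plain dicts stand in for the network here.
fake_inputs = {"data": None}
fake_outputs = {"detection_out": None}

print(first_name(fake_inputs))    # data
print(first_name(fake_outputs))   # detection_out
```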

Output data †

The output format, from the model documentation page cited earlier:

Outputs
The net outputs a blob with shape: [1, 1, N, 7], where N is the number of detected bounding boxes. For each detection, the description has the format: [image_id, label, conf, x_min, y_min, x_max, y_max], where:
- image_id - ID of the image in the batch
- label - predicted class ID
- conf - confidence for the predicted class
- (x_min, y_min) - coordinates of the top left bounding box corner
- (x_max, y_max) - coordinates of the bottom right bounding box corner.

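The shape is [1, 1, N, 7], where N is the number of detected bounding boxes.
A bounding box is the rectangular frame drawn around a face when it is detected.
The seven elements of the last dimension have the format [image_id, label, conf, x_min, y_min, x_max, y_max].
- image_id - ID of the image in the batch (ignored here because the batch size is 1)
- label - predicted class ID (ignored here because only faces are detected)
- conf - confidence of the prediction
- (x_min, y_min) - top-left corner of the bounding box
- (x_max, y_max) - bottom-right corner of the bounding box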

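The previous emotion classification was "classification", so it output a score for each of five emotions over the whole input image.
This face detection is "object detection", so it lists every face-like region found in the whole input image.
When drawing bounding boxes, do not draw every region found; check conf and draw only the regions with reasonably high confidence.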
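
The seven-element format and the confidence check can be illustrated without running the model. In this minimal sketch the Detection tuple and the confident function are hypothetical names, and the rows are made-up values, not real model output.

```python
from collections import namedtuple

# Field order follows the documented format of one detection
Detection = namedtuple("Detection", "image_id label conf x_min y_min x_max y_max")

def confident(rows, threshold=0.5):
    """Keep only the rows whose conf exceeds the threshold."""
    return [d for d in map(Detection._make, rows) if d.conf > threshold]

rows = [
    [0.0, 1.0, 0.90, 0.10, 0.20, 0.40, 0.60],  # made-up values for illustration
    [0.0, 1.0, 0.08, 0.50, 0.50, 0.70, 0.90],
    [0.0, 0.0, 0.00, 0.00, 0.00, 0.00, 0.00],  # unused slot, all zeros
]

for d in confident(rows):
    print(d.conf, (d.x_min, d.y_min), (d.x_max, d.y_max))
```

Only the first row passes the default threshold of 0.5; lowering the threshold lets weaker candidates through.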

Actual code
Extracting the data

# Extract only the needed data from the output
out = out['detection_out']
out = np.squeeze(out)  # remove the unneeded dimensions
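
np.squeeze removes every dimension of size 1, so an output of shape [1, 1, N, 7] becomes [N, 7]. One caveat, offered as a general NumPy observation rather than something stated on the model page: if N were ever 1, squeeze would remove that dimension as well, and a for loop over the result would iterate over seven scalars. The sketch below uses dummy arrays (the value N = 200 is arbitrary) to compare squeeze with reshape(-1, 7), which keeps one row per detection.

```python
import numpy as np

many = np.zeros((1, 1, 200, 7), dtype=np.float32)  # dummy output, N = 200
one = np.zeros((1, 1, 1, 7), dtype=np.float32)     # dummy output, N = 1

print(np.squeeze(many).shape)     # (200, 7)
print(np.squeeze(one).shape)      # (7,)  the N dimension is removed too

# reshape(-1, 7) always yields one row of 7 values per detection
print(many.reshape(-1, 7).shape)  # (200, 7)
print(one.reshape(-1, 7).shape)   # (1, 7)
```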

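out contains multiple arrays, one for each face-like region detected. A for loop takes them out one at a time into the variable detection.

Each detection has seven elements, of which only the five needed here are used.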

# Process each detected face region in turn
for detection in out:
    # Get the conf value
    confidence = float(detection[2])

    # Convert the bounding box coordinates to the scale of the input image
    xmin = int(detection[3] * frame.shape[1])
    ymin = int(detection[4] * frame.shape[0])
    xmax = int(detection[5] * frame.shape[1])
    ymax = int(detection[6] * frame.shape[0])

    # Draw the bounding box only when conf is greater than 0.5
    if confidence > 0.5:
        cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), color=(240, 180, 0), thickness=3)

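detection[2] holds the conf value, which float converts to a Python floating-point number.
detection[3] to detection[6] hold the bounding box coordinates as normalized values from 0.0 to 1.0. They are converted to pixel coordinates whose maximum is the width or height of the input image.
frame.shape[0] is the image height and frame.shape[1] is the image width; each coordinate is multiplied by the matching one, and int converts the result to an integer.
The if statement checks the conf value and draws the bounding box only when it exceeds 0.5.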
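
The coordinate conversion can be checked by hand on a small example. This is a minimal sketch: the function name to_pixels, the 640x480 frame size, and the detection values are all assumptions chosen for illustration.

```python
def to_pixels(detection, width, height):
    """Scale normalized [x_min, y_min, x_max, y_max] to integer pixel coordinates."""
    xmin = int(detection[3] * width)
    ymin = int(detection[4] * height)
    xmax = int(detection[5] * width)
    ymax = int(detection[6] * height)
    return xmin, ymin, xmax, ymax

# Made-up detection on an assumed 640x480 frame
row = [0.0, 1.0, 0.9, 0.25, 0.5, 0.75, 1.0]
print(to_pixels(row, width=640, height=480))  # (160, 240, 480, 480)
```

With a real frame, width and height would be frame.shape[1] and frame.shape[0].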

Full program †

Create a new file, detection2.py.

vi detection2.py

# -*- coding: utf-8 -*-

import cv2
import numpy as np

# Import the Inference Engine module
from openvino.inference_engine import IECore

# Load the model
ie = IECore()
net = ie.read_network(model='FP16/face-detection-retail-0004.xml',
                      weights='FP16/face-detection-retail-0004.bin')
exec_net = ie.load_network(network=net, device_name="MYRIAD")

# Read the input image
frame = cv2.imread('./image/photo4.jpg')

# Convert to the model's input format
img = cv2.resize(frame, (300, 300))   # resize
img = img.transpose((2, 0, 1))        # HWC > CHW
img = np.expand_dims(img, axis=0)     # add the batch dimension

# Run inference
out = exec_net.infer(inputs={'data': img})

# Extract only the needed data from the output
out = out['detection_out']
out = np.squeeze(out)  # remove all dimensions of size 1

# Process each detected face region in turn
for detection in out:
    # Get the conf value
    confidence = float(detection[2])

    # Convert the bounding box coordinates to the scale of the input image
    xmin = int(detection[3] * frame.shape[1])
    ymin = int(detection[4] * frame.shape[0])
    xmax = int(detection[5] * frame.shape[1])
    ymax = int(detection[6] * frame.shape[0])

    # Draw the bounding box only when conf is greater than 0.5
    if confidence > 0.5:
        cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), color=(240, 180, 0), thickness=3)

# Show the image
cv2.imshow('frame', frame)

# Exit when a key is pressed
cv2.waitKey(0)
cv2.destroyAllWindows()

Execution result for photo5.jpg

Real-time face detection on camera video †

Camera input program †

Create a new file, detection3.py.

vi detection3.py

# -*- coding: utf-8 -*-

import cv2
import numpy as np

# Import the Inference Engine module
from openvino.inference_engine import IECore

# Load the model
ie = IECore()
net = ie.read_network(model='FP16/face-detection-retail-0004.xml',
                      weights='FP16/face-detection-retail-0004.bin')
exec_net = ie.load_network(network=net, device_name="MYRIAD")

# Open the camera
cap = cv2.VideoCapture(0)

# Main loop
while True:
    ret, frame = cap.read()

    # Retry if the frame could not be read
    if not ret:
        continue

    # Convert to the model's input format
    img = cv2.resize(frame, (300, 300))   # resize
    img = img.transpose((2, 0, 1))        # HWC > CHW
    img = np.expand_dims(img, axis=0)     # add the batch dimension

    # Run inference
    out = exec_net.infer(inputs={'data': img})

    # Extract only the needed data from the output
    out = out['detection_out']
    out = np.squeeze(out)  # remove all dimensions of size 1

    # Process each detected face region in turn
    for detection in out:
        # Get the conf value
        confidence = float(detection[2])

        # Convert the bounding box coordinates to the scale of the input image
        xmin = int(detection[3] * frame.shape[1])
        ymin = int(detection[4] * frame.shape[0])
        xmax = int(detection[5] * frame.shape[1])
        ymax = int(detection[6] * frame.shape[0])

        # Draw the bounding box only when conf is greater than 0.5
        if confidence > 0.5:
            cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), color=(240, 180, 0), thickness=3)

    # Show the image
    cv2.imshow('frame', frame)

    # Exit when any key is pressed
    key = cv2.waitKey(1)
    if key != -1:
        break

# Clean up
cap.release()
cv2.destroyAllWindows()

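Instead of reading an image file, the program opens the camera, then repeats frame capture, deep learning inference, and image display in a while loop, leaving the loop when any key is pressed.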
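
The control flow of that loop can be separated from the camera and the model. The following is a minimal sketch of the same structure with the four steps passed in as functions; the name run_loop and its parameters are hypothetical, and strings stand in for frames so that it runs without any hardware.

```python
def run_loop(read_frame, process, show, key_pressed):
    """Repeat read -> process -> show until a key is pressed; return the frame count."""
    count = 0
    while True:
        ok, frame = read_frame()
        if not ok:
            continue            # same policy as detection3.py: retry on a failed read
        show(process(frame))
        count += 1
        if key_pressed():
            break
    return count

# Stand-ins: four reads (one of them failing), then a simulated key press
frames = iter([(True, "f1"), (False, None), (True, "f2"), (True, "f3")])
shown = []
n = run_loop(
    read_frame=lambda: next(frames),
    process=lambda f: f.upper(),
    show=shown.append,
    key_pressed=lambda: len(shown) == 3,
)
print(n, shown)   # 3 ['F1', 'F2', 'F3']
```

In the real script, read_frame would be cap.read, process would wrap the inference and drawing steps, show would call cv2.imshow, and key_pressed would test cv2.waitKey(1) != -1.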

Face detection from a video file †

Faces can also be detected in a video file with almost the same processing as for camera video.

Full program †

Copy detection2.py and modify it.

pi@raspberrypi:~/workspace $ cp detection2.py detect2-video.py
pi@raspberrypi:~/workspace $ vi detect2-video.py

# -*- coding: utf-8 -*-

import cv2
import numpy as np

# Import the Inference Engine module
from openvino.inference_engine import IECore

# Load the model
ie = IECore()
net = ie.read_network(model='FP16/face-detection-retail-0004.xml', weights='FP16/face-detection-retail-0004.bin')
exec_net = ie.load_network(network=net, device_name="MYRIAD")

# Open the video file
filepath = "/home/pi/Videos/video-test.mp4"
cap = cv2.VideoCapture(filepath)

# Main loop
while True:
    ret, frame = cap.read()

    # Stop when no frame is returned (end of the video or a read error);
    # retrying here would loop forever once the file is exhausted
    if not ret:
        break

    # Convert to the model's input format
    img = cv2.resize(frame, (300, 300))   # resize
    img = img.transpose((2, 0, 1))        # HWC > CHW
    img = np.expand_dims(img, axis=0)     # add the batch dimension

    # Run inference
    out = exec_net.infer(inputs={'data': img})

    # Extract only the needed data from the output
    out = out['detection_out']
    out = np.squeeze(out)  # remove all dimensions of size 1

    # Process each detected face region in turn
    for detection in out:
        # Get the conf value
        confidence = float(detection[2])

        # Convert the bounding box coordinates to the scale of the input image
        xmin = int(detection[3] * frame.shape[1])
        ymin = int(detection[4] * frame.shape[0])
        xmax = int(detection[5] * frame.shape[1])
        ymax = int(detection[6] * frame.shape[0])

        # Draw the bounding box only when conf is greater than 0.5
        if confidence > 0.5:
            cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), color=(240, 180, 0), thickness=3)

    # Show the image
    cv2.imshow('frame', frame)

    # Exit when any key is pressed
    key = cv2.waitKey(1)
    if key != -1:
        break

# Clean up
cap.release()
cv2.destroyAllWindows()

Simply pass the path of the video file as the argument to VideoCapture().
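
Because cv2.VideoCapture accepts either a camera index (an integer) or a file path (a string), one script could serve both the camera and the video-file case. The sketch below shows one way to choose the argument from a command-line string; the function name parse_source is hypothetical and is not part of the original programs.

```python
def parse_source(text):
    """Return an int camera index for digit strings, otherwise the path unchanged."""
    return int(text) if text.isdigit() else text

print(parse_source("0"))                                # 0
print(parse_source("/home/pi/Videos/video-test.mp4"))   # /home/pi/Videos/video-test.mp4

# Assumed usage in a script: cap = cv2.VideoCapture(parse_source(sys.argv[1]))
```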

This concludes "Part 8: Real-Time Face Detection".

Next is "Part 9: Real-Time Emotion Analysis App".

References †

INTEL® official documentation
- OpenVINO™ Toolkit Overview
- Install OpenVINO™ toolkit for Raspbian* OS
- API documentation → Overview of Inference Engine Python* API
- Location of pre-trained models → INTEL OPENSOURCE.org
- Pre-trained model documentation → Overview of OpenVINO™ Toolkit Intel's Pre-Trained Models

Part 8: Real-Time Face Detection (第8回リアルタイム顔検出)

Last-modified: 2021-02-24 (Wed) 17:12:53