OpenVINO5 - PukiWiki


Deep Learning Inference from Scratch: Real-Time Emotion Analysis App


※ Last updated: 2021/01/03

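Real-Time Emotion Analysis App

This page follows 「第9回リアルタイム感情分析アプリ」 (Part 9: Real-Time Emotion Analysis App).
It combines face detection with emotion classification to complete an application that displays graphs and images in real time.

Face detection + emotion classification

Emotion-classification code is added to the real-time face detection program from the previous part.

Full program

Create a new file, detection4.py.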

vi detection4.py

# -*- coding: utf-8 -*-

import cv2
import numpy as np

# Import the Inference Engine
from openvino.inference_engine import IECore

# Load the model (face detection)
ie = IECore()
net = ie.read_network(model='FP16/face-detection-retail-0004.xml',
                      weights='FP16/face-detection-retail-0004.bin')
exec_net = ie.load_network(network=net, device_name="MYRIAD")

# Load the model (emotion classification)
net_emotion = ie.read_network(model='FP16/emotions-recognition-retail-0003.xml', weights='FP16/emotions-recognition-retail-0003.bin')
exec_net_emotion = ie.load_network(network=net_emotion, device_name="MYRIAD")

# Set up the camera
cap = cv2.VideoCapture(0)

# Main loop
while True:
    ret, frame = cap.read()

    # Retry if the frame could not be read
    if not ret:
        continue

    # Convert to the model's input format
    img = cv2.resize(frame, (300, 300))   # resize
    img = img.transpose((2, 0, 1))    # HWC > CHW
    img = np.expand_dims(img, axis=0) # add batch dimension

    # Run inference
    out = exec_net.infer(inputs={'data': img})

    # Extract only the needed data from the output
    out = out['detection_out']
    out = np.squeeze(out) # remove all size-1 dimensions

    # Process each detected face region in turn
    for detection in out:
        # Get the confidence value
        confidence = float(detection[2])

        # Scale the bounding box coordinates to the input image
        xmin = int(detection[3] * frame.shape[1])
        ymin = int(detection[4] * frame.shape[0])
        xmax = int(detection[5] * frame.shape[1])
        ymax = int(detection[6] * frame.shape[0])

        # Run emotion inference and draw the bounding box only when confidence > 0.5
        if confidence > 0.5:
            # Clamp the detected box to the camera frame; leaving the min values uncorrected in particular causes an error
            if xmin < 0:
                xmin = 0
            if ymin < 0:
                ymin = 0
            if xmax > frame.shape[1]:
                xmax = frame.shape[1]
            if ymax > frame.shape[0]:
                ymax = frame.shape[0]

            # Crop the face region only
            frame_face = frame[ ymin:ymax, xmin:xmax ]

            # Convert to the model's input format
            img = cv2.resize(frame_face, (64, 64))   # resize
            img = img.transpose((2, 0, 1))    # HWC > CHW
            img = np.expand_dims(img, axis=0) # add batch dimension

            # Run inference
            out = exec_net_emotion.infer(inputs={'data': img})

            # Extract only the needed data from the output
            out = out['prob_emotion']
            out = np.squeeze(out) # drop unneeded dimensions

            # Get the index of the largest output value
            index_max = np.argmax(out)

            # Emotion labels as a list
            list_emotion = ['neutral', 'happy', 'sad', 'surprise', 'anger']

            # Draw the label text
            cv2.putText(frame, list_emotion[index_max], (20, 60), cv2.FONT_HERSHEY_SIMPLEX, 2, (255, 255, 255), 4)

            # Draw the bounding box
            cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), color=(240, 180, 0), thickness=3)

            # Stop after one face
            break

    # Show the image
    cv2.imshow('frame', frame)

    # Exit when any key is pressed
    key = cv2.waitKey(1)
    if key != -1:
        break

# Clean up
cap.release()
cv2.destroyAllWindows()

Program notes: loading the models

In OpenVINO 2021.2 the model-loading code has changed.

Face detection uses the variables net and exec_net, so emotion classification uses the names net_emotion and exec_net_emotion.

# Load the model (face detection)
ie = IECore()
net = ie.read_network(model='FP16/face-detection-retail-0004.xml',
                      weights='FP16/face-detection-retail-0004.bin')
exec_net = ie.load_network(network=net, device_name="MYRIAD")

# Load the model (emotion classification)
net_emotion = ie.read_network(model='FP16/emotions-recognition-retail-0003.xml', weights='FP16/emotions-recognition-retail-0003.bin')
exec_net_emotion = ie.load_network(network=net_emotion, device_name="MYRIAD")

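Program notes: cropping the face region

Cropping is easy with array slicing.

# Crop the face region only
frame_face = frame[ ymin:ymax, xmin:xmax ]

When a face extends past the edge of the camera frame, face detection can return negative coordinates. A negative slice start is counted from the end of the array, so the crop typically comes out empty and cv2.resize raises an error. Before cropping, clamp the detected box to the camera frame.

# Clamp the detected box to the camera frame; leaving the min values uncorrected in particular causes an error
if xmin < 0:
    xmin = 0
if ymin < 0:
    ymin = 0
if xmax > frame.shape[1]:
    xmax = frame.shape[1]
if ymax > frame.shape[0]:
    ymax = frame.shape[0]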
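
The same correction can be written as a small helper. The sketch below is only an illustration: clamp_box is a hypothetical name that does not appear in the original program, and returning None for an empty box is an added assumption so that the caller can skip the crop instead of passing an empty array to cv2.resize.

```python
def clamp_box(xmin, ymin, xmax, ymax, width, height):
    """Clamp a box to [0, width] x [0, height]; return None if nothing is left."""
    xmin = max(0, min(xmin, width))
    ymin = max(0, min(ymin, height))
    xmax = max(0, min(xmax, width))
    ymax = max(0, min(ymax, height))
    if xmax <= xmin or ymax <= ymin:
        return None  # empty crop: the caller should skip this detection
    return xmin, ymin, xmax, ymax


# A face sticking out past the top-left corner of a 640x480 frame
print(clamp_box(-20, -10, 100, 120, 640, 480))  # (0, 0, 100, 120)
# A box that lies entirely outside the frame
print(clamp_box(650, 10, 700, 60, 640, 480))    # None
```

In the main loop, width and height would correspond to frame.shape[1] and frame.shape[0].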

Program notes: displaying the emotion inference result

The emotion label is drawn on the camera frame (frame), not on the cropped face region (frame_face).

# Draw the label text
cv2.putText(frame, list_emotion[index_max], (20, 60), cv2.FONT_HERSHEY_SIMPLEX, 2, (255, 255, 255), 4)
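
The label drawn here is the entry of list_emotion at the index of the largest score. As a minimal sketch of that lookup in plain Python (top_emotion is a hypothetical helper, and the scores are made-up values rather than real model output):

```python
LIST_EMOTION = ['neutral', 'happy', 'sad', 'surprise', 'anger']


def top_emotion(probs, labels=LIST_EMOTION):
    """Return (label, score) for the highest-scoring emotion."""
    if len(probs) != len(labels):
        raise ValueError("expected one score per label")
    # Same idea as np.argmax: index of the first maximum value
    index_max = max(range(len(probs)), key=lambda i: probs[i])
    return labels[index_max], probs[index_max]


print(top_emotion([0.10, 0.70, 0.05, 0.10, 0.05]))  # ('happy', 0.7)
```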

To keep processing light, emotion inference is run on only one face per frame.
```
# Stop after one face
break
```
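
The program takes the first detection whose confidence exceeds 0.5. If the most confident face is wanted instead, one possible variation is sketched below. This is an assumption about an alternative design, not what the original program does; pick_best_face is a hypothetical name. Each row follows the layout used in the program: index 2 is the confidence and indices 3 to 6 are the normalized box, while the first two columns are unused here.

```python
def pick_best_face(detections, threshold=0.5):
    """Return the row with the highest confidence above threshold, or None."""
    candidates = [d for d in detections if d[2] > threshold]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d[2])


# Made-up detection rows for illustration
rows = [
    [0, 1, 0.62, 0.10, 0.10, 0.30, 0.40],
    [0, 1, 0.91, 0.50, 0.20, 0.80, 0.70],
    [0, 1, 0.20, 0.00, 0.00, 0.10, 0.10],
]
best = pick_best_face(rows)
print(best[2])  # 0.91
```

Emotion inference would then run once on the returned row, which keeps the one-face-per-frame cost.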

Displaying a bar graph

The emotion values are visualized as a bar graph.

Full program

Create a new file, detection5.py.

vi detection5.py

# -*- coding: utf-8 -*-

import cv2
import numpy as np

# Import the Inference Engine
from openvino.inference_engine import IECore

# Load the model (face detection)
ie = IECore()
net = ie.read_network(model='FP16/face-detection-retail-0004.xml', weights='FP16/face-detection-retail-0004.bin')
exec_net = ie.load_network(network=net, device_name="MYRIAD")

# Load the model (emotion classification)
net_emotion = ie.read_network(model='FP16/emotions-recognition-retail-0003.xml', weights='FP16/emotions-recognition-retail-0003.bin')
exec_net_emotion = ie.load_network(network=net_emotion, device_name="MYRIAD")

# Set up the camera
cap = cv2.VideoCapture(0)

# Main loop
while True:
    ret, frame = cap.read()

    # Retry if the frame could not be read
    if not ret:
        continue

    # Convert to the model's input format
    img = cv2.resize(frame, (300, 300))   # resize
    img = img.transpose((2, 0, 1))    # HWC > CHW
    img = np.expand_dims(img, axis=0) # add batch dimension

    # Run inference
    out = exec_net.infer(inputs={'data': img})

    # Extract only the needed data from the output
    out = out['detection_out']
    out = np.squeeze(out) # remove all size-1 dimensions

    # Process each detected face region in turn
    for detection in out:
        # Get the confidence value
        confidence = float(detection[2])

        # Scale the bounding box coordinates to the input image
        xmin = int(detection[3] * frame.shape[1])
        ymin = int(detection[4] * frame.shape[0])
        xmax = int(detection[5] * frame.shape[1])
        ymax = int(detection[6] * frame.shape[0])

        # Run emotion inference and draw the bounding box only when confidence > 0.5
        if confidence > 0.5:
            # Clamp the detected box to the camera frame; leaving the min values uncorrected in particular causes an error
            if xmin < 0:
                xmin = 0
            if ymin < 0:
                ymin = 0
            if xmax > frame.shape[1]:
                xmax = frame.shape[1]
            if ymax > frame.shape[0]:
                ymax = frame.shape[0]

            # Crop the face region only
            frame_face = frame[ ymin:ymax, xmin:xmax ]

            # Convert to the model's input format
            img = cv2.resize(frame_face, (64, 64))   # resize
            img = img.transpose((2, 0, 1))    # HWC > CHW
            img = np.expand_dims(img, axis=0) # add batch dimension

            # Run inference
            out = exec_net_emotion.infer(inputs={'data': img})

            # Extract only the needed data from the output
            out = out['prob_emotion']
            out = np.squeeze(out) # drop unneeded dimensions

            # Get the index of the largest output value
            index_max = np.argmax(out)

            # Emotion labels as a list
            list_emotion = ['neutral', 'happy', 'sad', 'surprise', 'anger']

            # Draw the label text
            cv2.putText(frame, list_emotion[index_max], (20, 60), cv2.FONT_HERSHEY_SIMPLEX, 2, (255, 255, 255), 4)

            # Draw the bounding box
            cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), color=(240, 180, 0), thickness=3)

            # Draw the bar graph
            str_emotion = ['neu', 'hap', 'sad', 'sur', 'ang']
            text_x = 10
            text_y = frame.shape[0] - 180
            rect_x = 80
            rect_y = frame.shape[0] - 200
            for i in range(5):
                cv2.putText(frame, str_emotion[i], (text_x, text_y), cv2.FONT_HERSHEY_SIMPLEX, 1, (240, 180, 0), 2)
                cv2.rectangle(frame, (rect_x, rect_y), (rect_x + int(300 * out[i]), rect_y + 20), color=(240, 180, 0), thickness=-1)
                text_y = text_y + 40
                rect_y = rect_y + 40

            # Stop after one face
            break

    # Show the image
    cv2.imshow('frame', frame)

    # Exit when any key is pressed
    key = cv2.waitKey(1)
    if key != -1:
        break

# Clean up
cap.release()
cv2.destroyAllWindows()

Program notes

Add the following part to the previous code.

The label text and the rectangles are drawn repeatedly with for ... in range(). Adding 40 at the end of each iteration places the labels and bars at 40-pixel intervals along the Y axis.

# Draw the bar graph
str_emotion = ['neu', 'hap', 'sad', 'sur', 'ang']
text_x = 10
text_y = frame.shape[0] - 180
rect_x = 80
rect_y = frame.shape[0] - 200
for i in range(5):
    cv2.putText(frame, str_emotion[i], (text_x, text_y), cv2.FONT_HERSHEY_SIMPLEX, 1, (240, 180, 0), 2)
    cv2.rectangle(frame, (rect_x, rect_y), (rect_x + int(300 * out[i]), rect_y + 20), color=(240, 180, 0), thickness=-1)
    text_y = text_y + 40
    rect_y = rect_y + 40

out[i] holds each emotion's inference value as a number from 0 to 1.0. Multiplying it by 300 (an arbitrarily chosen scale) and using the result as the rectangle width gives a bar graph that updates in real time.
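
To check the layout without a camera, the bar geometry can be computed on its own. The sketch below mirrors the constants in the loop (x offset 80, scale 300, bar height 20, step 40). bar_rects is a hypothetical helper, and clamping each score to the 0 to 1 range is an added assumption that is not in the original code.

```python
def bar_rects(probs, frame_height, rect_x=80, max_width=300, bar_height=20, step=40):
    """Return (top_left, bottom_right) pixel corners for one bar per score."""
    rects = []
    rect_y = frame_height - 200
    for p in probs:
        p = min(max(float(p), 0.0), 1.0)  # keep the bar width within 0..max_width
        top_left = (rect_x, rect_y)
        bottom_right = (rect_x + int(max_width * p), rect_y + bar_height)
        rects.append((top_left, bottom_right))
        rect_y += step  # next bar sits 40 pixels lower
    return rects


# Made-up scores for a 480-pixel-high frame
for rect in bar_rects([0.5, 0.25, 0.0, 1.0, 0.125], frame_height=480):
    print(rect)
```

Each pair could be passed directly as the two corner arguments of cv2.rectangle.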

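Image overlay

The emotion with the highest score is now shown as a PNG face icon instead of text.

Preparing the images

Prepare a face icon for each of the five emotions.
Create an image folder inside workspace and put the images in it.
- Reference page → かおもじ♡アイコン

Preparing the class that overlays one image on another

Download the class that provides the overlay function.
- OpenCVで透過PNGファイルの重ね合わせ (Overlaying transparent PNG files with OpenCV)
Extract the downloaded archive and move the resulting pngoverlay.py into the workspace folder.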
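
The internals of pngoverlay.py are not shown on this page. The general idea behind overlaying a transparent PNG is alpha blending, where each output pixel mixes the icon and the background according to the icon's alpha value. The sketch below illustrates that for a single pixel in plain Python. blend_pixel is a hypothetical function for illustration only and is not the PNGOverlay implementation.

```python
def blend_pixel(background, foreground_rgba):
    """Alpha-blend one 4-channel foreground pixel over a 3-channel background pixel."""
    *color, alpha = foreground_rgba
    a = alpha / 255.0  # 0.0 = fully transparent, 1.0 = fully opaque
    return tuple(int(round(a * f + (1.0 - a) * b)) for f, b in zip(color, background))


print(blend_pixel((100, 100, 100), (255, 0, 0, 255)))  # opaque icon pixel: (255, 0, 0)
print(blend_pixel((100, 100, 100), (255, 0, 0, 0)))    # transparent icon pixel: (100, 100, 100)
```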

Full program

Create a new file, detection6.py.

vi detection6.py

# -*- coding: utf-8 -*-

import cv2
import numpy as np

# Import the Inference Engine
from openvino.inference_engine import IECore

# Import the PNGOverlay class
from pngoverlay import PNGOverlay

# Create the instances
icon_neutral = PNGOverlay('image/icon_neutral.png')
icon_happy = PNGOverlay('image/icon_happy.png')
icon_sad = PNGOverlay('image/icon_sad.png')
icon_surprise = PNGOverlay('image/icon_surprise.png')
icon_anger = PNGOverlay('image/icon_anger.png')

# Collect the instances in a list
icon_emotion = [icon_neutral, icon_happy, icon_sad, icon_surprise, icon_anger]

# Load the model (face detection)
ie = IECore()
net = ie.read_network(model='FP16/face-detection-retail-0004.xml', weights='FP16/face-detection-retail-0004.bin')
exec_net = ie.load_network(network=net, device_name="MYRIAD")

# Load the model (emotion classification)
net_emotion = ie.read_network(model='FP16/emotions-recognition-retail-0003.xml', weights='FP16/emotions-recognition-retail-0003.bin')
exec_net_emotion = ie.load_network(network=net_emotion, device_name="MYRIAD")

# Set up the camera
cap = cv2.VideoCapture(0)

# Main loop
while True:
    ret, frame = cap.read()

    # Retry if the frame could not be read
    if not ret:
        continue

    # Convert to the model's input format
    img = cv2.resize(frame, (300, 300))   # resize
    img = img.transpose((2, 0, 1))    # HWC > CHW
    img = np.expand_dims(img, axis=0) # add batch dimension

    # Run inference
    out = exec_net.infer(inputs={'data': img})

    # Extract only the needed data from the output
    out = out['detection_out']
    out = np.squeeze(out) # remove all size-1 dimensions

    # Process each detected face region in turn
    for detection in out:
        # Get the confidence value
        confidence = float(detection[2])

        # Scale the bounding box coordinates to the input image
        xmin = int(detection[3] * frame.shape[1])
        ymin = int(detection[4] * frame.shape[0])
        xmax = int(detection[5] * frame.shape[1])
        ymax = int(detection[6] * frame.shape[0])

        # Run emotion inference and draw the bounding box only when confidence > 0.5
        if confidence > 0.5:
            # Clamp the detected box to the camera frame; leaving the min values uncorrected in particular causes an error
            if xmin < 0:
                xmin = 0
            if ymin < 0:
                ymin = 0
            if xmax > frame.shape[1]:
                xmax = frame.shape[1]
            if ymax > frame.shape[0]:
                ymax = frame.shape[0]

            # Crop the face region only
            frame_face = frame[ ymin:ymax, xmin:xmax ]

            # Convert to the model's input format
            img = cv2.resize(frame_face, (64, 64))   # resize
            img = img.transpose((2, 0, 1))    # HWC > CHW
            img = np.expand_dims(img, axis=0) # add batch dimension

            # Run inference
            out = exec_net_emotion.infer(inputs={'data': img})

            # Extract only the needed data from the output
            out = out['prob_emotion']
            out = np.squeeze(out) # drop unneeded dimensions

            # Get the index of the largest output value
            index_max = np.argmax(out)

            # Emotion labels as a list
            list_emotion = ['neutral', 'happy', 'sad', 'surprise', 'anger']

            # Draw the label text (disabled; the face icon replaces it)
            #cv2.putText(frame, list_emotion[index_max], (20, 60), cv2.FONT_HERSHEY_SIMPLEX, 2, (255, 255, 255), 4)

            # Draw the bounding box
            cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), color=(240, 180, 0), thickness=3)

            # Draw the bar graph
            str_emotion = ['neu', 'hap', 'sad', 'sur', 'ang']
            text_x = 10
            text_y = frame.shape[0] - 180
            rect_x = 80
            rect_y = frame.shape[0] - 200
            for i in range(5):
                cv2.putText(frame, str_emotion[i], (text_x, text_y), cv2.FONT_HERSHEY_SIMPLEX, 1, (240, 180, 0), 2)
                cv2.rectangle(frame, (rect_x, rect_y), (rect_x + int(300 * out[i]), rect_y + 20), color=(240, 180, 0), thickness=-1)
                text_y = text_y + 40
                rect_y = rect_y + 40

            # Show the face icon
            icon_emotion[index_max].show(frame, frame.shape[1] - 110, frame.shape[0] - 110)

            # Stop after one face
            break

    # Show the image
    cv2.imshow('frame', frame)

    # Exit when any key is pressed
    key = cv2.waitKey(1)
    if key != -1:
        break

# Clean up
cap.release()
cv2.destroyAllWindows()

Program notes

After the module imports, import the PNGOverlay class from pngoverlay and create one instance for each of the five images.

The five instances are collected in a list so that they are easy to handle in the program.

# Import the PNGOverlay class
from pngoverlay import PNGOverlay

# Create the instances
icon_neutral = PNGOverlay('image/icon_neutral.png')
icon_happy = PNGOverlay('image/icon_happy.png')
icon_sad = PNGOverlay('image/icon_sad.png')
icon_surprise = PNGOverlay('image/icon_surprise.png')
icon_anger = PNGOverlay('image/icon_anger.png')

# Collect the instances in a list
icon_emotion = [icon_neutral, icon_happy, icon_sad, icon_surprise, icon_anger]
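
Because the icon file names follow the same order and naming as list_emotion, the paths could also be generated rather than written out one by one. This is a sketch of one possible alternative, not the original code; icon_paths is a hypothetical helper, and it assumes the files are named image/icon_<emotion>.png as above.

```python
list_emotion = ['neutral', 'happy', 'sad', 'surprise', 'anger']


def icon_paths(names, folder='image'):
    """Build '<folder>/icon_<name>.png' paths in the same order as names."""
    return [f'{folder}/icon_{name}.png' for name in names]


paths = icon_paths(list_emotion)
print(paths[1])  # image/icon_happy.png

# With pngoverlay.py available, the instances could then be created as:
# icon_emotion = [PNGOverlay(path) for path in paths]
```

Keeping the icons in the same order as the labels is what lets index_max select both the label and the icon.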

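Comment out the label text that was drawn in the top-left corner so that it is no longer shown.

# Draw the label text
#cv2.putText(frame, list_emotion[index_max], (20, 60), cv2.FONT_HERSHEY_SIMPLEX, 2, (255, 255, 255), 4)

Just before the break at the end of the for loop, call the show method on the instance whose index has the highest emotion score to display the face icon.
```
# Show the face icon
icon_emotion[index_max].show(frame, frame.shape[1] - 110, frame.shape[0] - 110)
```

This concludes 「Neural Compute Stick と OpenVINO™ でゼロから学ぶディープラーニング推論」 (Deep Learning Inference from Scratch with Neural Compute Stick and OpenVINO™), Parts 1 to 9.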

References

INTEL® official documentation
- OpenVINO™ Toolkit Overview
- Install OpenVINO™ toolkit for Raspbian* OS
- API documentation → Overview of Inference Engine Python* API
- Location of the pre-trained models → INTEL OPENSOURCE.org
- Documentation for the pre-trained models → Overview of OpenVINO™ Toolkit Intel's Pre-Trained Models

Hints for future work
- AIを始めよう！OpenVINOで使うモデルを整備する
- AIを始めよう！PythonでOpenVINOの仕組みを理解する

第9回リアルタイム感情分析アプリ (Part 9: Real-Time Emotion Analysis App)

Last-modified: 2021-02-12 (Fri) 10:35:11