# -*- coding: utf-8 -*-
##------------------------------------------
## OpenVINO™ toolkit
## Track people with Person Re-Identification
##
## model: person-detection-retail-0013
## person-reidentification-retail-0287
##
## 2021.03.10 Masahiro Izutsu
##------------------------------------------
## person-tracking.py
import sys
import argparse
import numpy as np
import time
import random
import cv2
from openvino.inference_engine import get_version
from openvino.inference_engine import IECore
from model import Model
# Color Escape Code
GREEN = '\033[1;32m'
RED = '\033[1;31m'
NOCOLOR = '\033[0m'
YELLOW = '\033[1;33m'
# Constant definitions
DEVICE = "MYRIAD"
MODULE_DETECTOR = '../FP16/person-detection-retail-0013'
MODULE_REIDENTIFICATION = '../FP16/person-reidentification-retail-0287'
MOVIE = "../../Videos/video003.mp4"
THRESHOLD = 0.8
TRACKING_MAX = 50
SCALE = 1.0
# Title and version information
title = 'Person Tracking'
print(GREEN)
print('--- {} ---'.format(title))
print("OpenCV:", cv2.__version__)
print("OpenVINO inference_engine:", get_version())
print(NOCOLOR)
# Parses arguments for the application
def parse_args():
    parser = argparse.ArgumentParser(description = 'Person tracker using \
Intel® Neural Compute Stick 2.' )
parser.add_argument( '-i', '--image', metavar = 'IMAGE_FILE',
type=str, default = MOVIE,
help = 'Absolute path to movie file or cam for camera stream.')
parser.add_argument( '--threshold', metavar = 'FLOAT',
type=float, default = THRESHOLD,
help = 'Threshold for detection.')
return parser
class PersonDetector(Model):
def __init__(self, model_path, device, ie_core, threshold, num_requests):
super().__init__(model_path, device, ie_core, num_requests, None)
_, _, h, w = self.input_size
self.__input_height = h
self.__input_width = w
self.__threshold = threshold
def __prepare_frame(self, frame):
initial_h, initial_w = frame.shape[:2]
scale_h, scale_w = initial_h / float(self.__input_height), initial_w / float(self.__input_width)
in_frame = cv2.resize(frame, (self.__input_width, self.__input_height))
in_frame = in_frame.transpose((2, 0, 1))
in_frame = in_frame.reshape(self.input_size)
return in_frame, scale_h, scale_w
def infer(self, frame):
in_frame, _, _ = self.__prepare_frame(frame)
result = super().infer(in_frame)
detections = []
height, width = frame.shape[:2]
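        # result has shape [1, 1, N, 7]; each row is
        # [image_id, label, conf, x_min, y_min, x_max, y_max] with normalized coordinates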
for r in result[0][0]:
conf = r[2]
if(conf > self.__threshold):
x1 = int(r[3] * width)
y1 = int(r[4] * height)
x2 = int(r[5] * width)
y2 = int(r[6] * height)
detections.append([x1, y1, x2, y2, conf])
return detections
class PersonReidentification(Model):
def __init__(self, model_path, device, ie_core, threshold, num_requests):
super().__init__(model_path, device, ie_core, num_requests, None)
_, _, h, w = self.input_size
self.__input_height = h
self.__input_width = w
self.__threshold = threshold
def __prepare_frame(self, frame):
initial_h, initial_w = frame.shape[:2]
scale_h, scale_w = initial_h / float(self.__input_height), initial_w / float(self.__input_width)
in_frame = cv2.resize(frame, (self.__input_width, self.__input_height))
in_frame = in_frame.transpose((2, 0, 1))
in_frame = in_frame.reshape(self.input_size)
return in_frame, scale_h, scale_w
def infer(self, frame):
in_frame, _, _ = self.__prepare_frame(frame)
result = super().infer(in_frame)
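        # The model outputs a [1, 256] descriptor; np.delete() without an axis
        # flattens it and drops index 1, yielding the 255-element vector that
        # Tracker (and the identifys buffer in main) expects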
return np.delete(result, 1)
class Tracker:
def __init__(self):
        # DB of identification feature vectors
        self.identifysDb = None
        # DB of bounding-box centers
        self.center = []
    def __getCenter(self, person):
        # Center of the bounding box [x1, y1, x2, y2]
        x = (person[0] + person[2]) // 2
        y = (person[1] + person[3]) // 2
        return (x, y)
def __getDistance(self, person, index):
(x1, y1) = self.center[index]
(x2, y2) = self.__getCenter(person)
a = np.array([x1, y1])
b = np.array([x2, y2])
u = b - a
return np.linalg.norm(u)
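    # Axis-aligned overlap test: two boxes intersect when the larger left edge
    # is not to the right of the smaller right edge, and likewise vertically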
def __isOverlap(self, persons, index):
[x1, y1, x2, y2] = persons[index]
for i, person in enumerate(persons):
if(index == i):
continue
if(max(person[0], x1) <= min(person[2], x2) and max(person[1], y1) <= min(person[3], y2)):
return True
return False
def getIds(self, identifys, persons):
if(identifys.size==0):
return []
if self.identifysDb is None:
self.identifysDb = identifys
for person in persons:
self.center.append(self.__getCenter(person))
print("input: {} DB:{}".format(len(identifys), len(self.identifysDb)))
similaritys = self.__cos_similarity(identifys, self.identifysDb)
similaritys[np.isnan(similaritys)] = 0
ids = np.nanargmax(similaritys, axis=1)
        for i, similarity in enumerate(similaritys):
            personId = ids[i]
            d = self.__getDistance(persons[i], personId)
            print("personId:{} {} distance:{}".format(personId, similarity[personId], d))
            # Above 0.95 and not overlapping another person: refresh the stored vector
            if(similarity[personId] > 0.95):
                if(self.__isOverlap(persons, i) == False):
                    self.identifysDb[personId] = identifys[i]
            # Below 0.5 and far from the matched entry: register as a new person
            elif(similarity[personId] < 0.5):
                if(d > 500):
                    print("distance:{} similarity:{}".format(d, similarity[personId]))
                    self.identifysDb = np.vstack((self.identifysDb, identifys[i]))
                    self.center.append(self.__getCenter(persons[i]))
                    ids[i] = len(self.identifysDb) - 1
                    print("> append DB size:{}".format(len(self.identifysDb)))
        print(ids)
        # If two detections share an id, invalidate the one with the lower similarity
        for i, a in enumerate(ids):
            for e, b in enumerate(ids):
                if(e == i):
                    continue
                if(a == b):
                    if(similaritys[i][a] < similaritys[e][b]):
                        ids[i] = -1
                    else:
                        ids[e] = -1
        print(ids)
        return ids
    # Cosine similarity
    # Adapted from: https://github.com/kodamap/person_reidentification
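    # X: (num detections, 255) query vectors; Y: (DB size, 255) stored vectors.
    # Returns a (num detections, DB size) matrix whose [i][j] entry is
    # dot(x_i, y_j) / (|x_i| * |y_j|): 1.0 for identical directions, 0.0 for orthogonal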
def __cos_similarity(self, X, Y):
m = X.shape[0]
Y = Y.T
return np.dot(X, Y) / (
np.linalg.norm(X.T, axis=0).reshape(m, 1) * np.linalg.norm(Y, axis=0)
)
# Display basic model information
def display_info(image, threshold):
print(YELLOW + title + ': Starting application...' + NOCOLOR)
print(' - ' + YELLOW + 'Image File: ' + NOCOLOR, image)
print(' - ' + YELLOW + 'Threshold: ' + NOCOLOR, threshold)
# Determine the input file type
#   Returns: 'jpeg', 'png', ...   image file
#            'None'               not an image file (movie file)
#            'NotFound'           file does not exist
import imghdr
def is_pict(filename):
try:
imgtype = imghdr.what(filename)
except FileNotFoundError as e:
imgtype = 'NotFound'
return str(imgtype)
# ** main function **
def main():
# Argument parsing and parameter setting
ARGS = parse_args().parse_args()
input_stream = ARGS.image
if ARGS.image.lower() == "cam" or ARGS.image.lower() == "camera":
input_stream = 0
else:
filetype = is_pict(input_stream)
        if (filetype == 'NotFound' or filetype != 'None'):
            print(RED + "\ninput file not found or not a movie file." + NOCOLOR)
            quit()
detection_threshold = ARGS.threshold
device = DEVICE
cpu_extension = None
ie_core = IECore()
if device == "CPU" and cpu_extension:
ie_core.add_extension(cpu_extension, "CPU")
    # Display settings
display_info(input_stream, detection_threshold)
    person_detector = PersonDetector(MODULE_DETECTOR, device, ie_core, detection_threshold, num_requests=2)
    personReidentification = PersonReidentification(MODULE_REIDENTIFICATION, device, ie_core, detection_threshold, num_requests=2)
    tracker = Tracker()
    cap = cv2.VideoCapture(input_stream)
colors = []
for i in range(TRACKING_MAX):
b = random.randint(0, 255)
g = random.randint(0, 255)
r = random.randint(0, 255)
colors.append((b,g,r))
while True:
grabbed, frame = cap.read()
        if not grabbed:  # loop playback
cap.set(cv2.CAP_PROP_POS_FRAMES, 0)
continue
if(frame is None):
continue
        # Detect persons
persons = []
detections = person_detector.infer(frame)
if(len(detections) > 0):
print("-------------------")
for detection in detections:
x1 = int(detection[0])
y1 = int(detection[1])
x2 = int(detection[2])
y2 = int(detection[3])
conf = detection[4]
print("{:.1f} ({},{})-({},{})".format(conf, x1, y1, x2, y2))
persons.append([x1,y1,x2,y2])
print("====================")
        # Get an identification vector from each person's image
identifys = np.zeros((len(persons), 255))
for i, person in enumerate(persons):
            # Crop the person's image
img = frame[person[1] : person[3], person[0]: person[2]]
h, w = img.shape[:2]
if(h==0 or w==0):
continue
            # Get the identification vector
identifys[i] = personReidentification.infer(img)
        # Get ids
        ids = tracker.getIds(identifys, persons)
        # Draw bounding boxes and ids on the frame
for i, person in enumerate(persons):
if(ids[i]!=-1):
color = colors[int(ids[i])]
                frame = cv2.rectangle(frame, (person[0], person[1]), (person[2], person[3]), color, 2)
                frame = cv2.putText(frame, str(ids[i]), (person[0], person[1]), cv2.FONT_HERSHEY_PLAIN, 2, color, 2, cv2.LINE_AA)
        # Scale the frame
        h, w = frame.shape[:2]
        frame = cv2.resize(frame, (int(w * SCALE), int(h * SCALE)))
        # Show the frame
        window_name = title + " (hit 'q' key to exit)"
        cv2.imshow(window_name, frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()
cv2.destroyAllWindows()
# Entry point
if __name__ == "__main__":
sys.exit(main())
# -*- coding: utf-8 -*-
##------------------------------------------
## OpenVINO™ toolkit
## Track people with Person Re-Identification
##
## model: person-detection-retail-0013
## person-reidentification-retail-0287
##
## 2021.03.10 Masahiro Izutsu
##------------------------------------------
## 2021.03.25 model/device parameter
## 2021.06.23 fps display
import sys
import argparse
import numpy as np
import time
import random
import cv2
from openvino.inference_engine import get_version
from openvino.inference_engine import IECore
from model import Model
import mylib
# Color Escape Code
GREEN = '\033[1;32m'
RED = '\033[1;31m'
NOCOLOR = '\033[0m'
YELLOW = '\033[1;33m'
# Constant definitions
MOVIE = "../../Videos/video003.mp4"
THRESHOLD = 0.8
TRACKING_MAX = 50
SCALE = 1.0
from os.path import expanduser
MODEL_DEF_DETECT = expanduser('~/model/intel/FP32/person-detection-retail-0013.xml')
MODEL_DEF_REIDE = expanduser('~/model/intel/FP32/person-reidentification-retail-0287.xml')
# Title and version information
title = 'Person Tracking 2'
print(GREEN)
print('--- {} ---'.format(title))
print("OpenCV:", cv2.__version__)
print("OpenVINO inference_engine:", get_version())
print(NOCOLOR)
# Parses arguments for the application
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument('-i', '--image', metavar = 'IMAGE_FILE', type=str, default = MOVIE,
help = 'Absolute path to movie file or cam for camera stream.')
parser.add_argument('-m_dt', '--m_detector', type = str,
default = MODEL_DEF_DETECT,
                        help = 'Path to the detector .xml file with a trained model. '
                               'Default value is ' + MODEL_DEF_DETECT)
parser.add_argument('-m_re', '--m_reidentification', type = str,
default = MODEL_DEF_REIDE,
                        help = 'Path to the re-identification .xml file with a trained model. '
                               'Default value is ' + MODEL_DEF_REIDE)
parser.add_argument('-d', '--device', default = 'CPU', type = str,
help = 'Optional. Specify a target device to infer on. CPU, GPU, FPGA, HDDL or MYRIAD is '
'acceptable. The demo will look for a suitable plugin for the device specified. '
'Default value is CPU')
parser.add_argument('--threshold', metavar = 'FLOAT', type = float, default = THRESHOLD,
help = 'Threshold for detection.')
parser.add_argument('-s', '--speed', metavar = 'SPEED',
default = 'y',
                        help = 'Speed display flag. (y/n) Default value is \'y\'')
parser.add_argument('-o', '--out', metavar = 'IMAGE_OUT',
default = 'non',
help = 'Processed image file path. Default value is \'non\'')
return parser
class PersonDetector(Model):
def __init__(self, model_path, device, ie_core, threshold, num_requests):
super().__init__(model_path, device, ie_core, num_requests, None)
_, _, h, w = self.input_size
self.__input_height = h
self.__input_width = w
self.__threshold = threshold
def __prepare_frame(self, frame):
initial_h, initial_w = frame.shape[:2]
scale_h, scale_w = initial_h / float(self.__input_height), initial_w / float(self.__input_width)
in_frame = cv2.resize(frame, (self.__input_width, self.__input_height))
in_frame = in_frame.transpose((2, 0, 1))
in_frame = in_frame.reshape(self.input_size)
return in_frame, scale_h, scale_w
def infer(self, frame):
in_frame, _, _ = self.__prepare_frame(frame)
result = super().infer(in_frame)
detections = []
height, width = frame.shape[:2]
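        # result has shape [1, 1, N, 7]; each row is
        # [image_id, label, conf, x_min, y_min, x_max, y_max] with normalized coordinates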
for r in result[0][0]:
conf = r[2]
if(conf > self.__threshold):
x1 = int(r[3] * width)
y1 = int(r[4] * height)
x2 = int(r[5] * width)
y2 = int(r[6] * height)
detections.append([x1, y1, x2, y2, conf])
return detections
class PersonReidentification(Model):
def __init__(self, model_path, device, ie_core, threshold, num_requests):
super().__init__(model_path, device, ie_core, num_requests, None)
_, _, h, w = self.input_size
self.__input_height = h
self.__input_width = w
self.__threshold = threshold
def __prepare_frame(self, frame):
initial_h, initial_w = frame.shape[:2]
scale_h, scale_w = initial_h / float(self.__input_height), initial_w / float(self.__input_width)
in_frame = cv2.resize(frame, (self.__input_width, self.__input_height))
in_frame = in_frame.transpose((2, 0, 1))
in_frame = in_frame.reshape(self.input_size)
return in_frame, scale_h, scale_w
def infer(self, frame):
in_frame, _, _ = self.__prepare_frame(frame)
result = super().infer(in_frame)
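        # The model outputs a [1, 256] descriptor; np.delete() without an axis
        # flattens it and drops index 1, yielding the 255-element vector that
        # Tracker (and the identifys buffer in main) expects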
return np.delete(result, 1)
class Tracker:
def __init__(self):
        # DB of identification feature vectors
        self.identifysDb = None
        # DB of bounding-box centers
        self.center = []
    def __getCenter(self, person):
        # Center of the bounding box [x1, y1, x2, y2]
        x = (person[0] + person[2]) // 2
        y = (person[1] + person[3]) // 2
        return (x, y)
def __getDistance(self, person, index):
(x1, y1) = self.center[index]
(x2, y2) = self.__getCenter(person)
a = np.array([x1, y1])
b = np.array([x2, y2])
u = b - a
return np.linalg.norm(u)
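    # Axis-aligned overlap test: two boxes intersect when the larger left edge
    # is not to the right of the smaller right edge, and likewise vertically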
def __isOverlap(self, persons, index):
[x1, y1, x2, y2] = persons[index]
for i, person in enumerate(persons):
if(index == i):
continue
if(max(person[0], x1) <= min(person[2], x2) and max(person[1], y1) <= min(person[3], y2)):
return True
return False
def getIds(self, identifys, persons):
if(identifys.size==0):
return []
if self.identifysDb is None:
self.identifysDb = identifys
for person in persons:
self.center.append(self.__getCenter(person))
print("input: {} DB:{}".format(len(identifys), len(self.identifysDb)))
similaritys = self.__cos_similarity(identifys, self.identifysDb)
similaritys[np.isnan(similaritys)] = 0
ids = np.nanargmax(similaritys, axis=1)
        for i, similarity in enumerate(similaritys):
            personId = ids[i]
            d = self.__getDistance(persons[i], personId)
            print("personId:{} {} distance:{}".format(personId, similarity[personId], d))
            # Above 0.95 and not overlapping another person: refresh the stored vector
            if(similarity[personId] > 0.95):
                if(self.__isOverlap(persons, i) == False):
                    self.identifysDb[personId] = identifys[i]
            # Below 0.5 and far from the matched entry: register as a new person
            elif(similarity[personId] < 0.5):
                if(d > 500):
                    print("distance:{} similarity:{}".format(d, similarity[personId]))
                    self.identifysDb = np.vstack((self.identifysDb, identifys[i]))
                    self.center.append(self.__getCenter(persons[i]))
                    ids[i] = len(self.identifysDb) - 1
                    print("> append DB size:{}".format(len(self.identifysDb)))
        print(ids)
        # If two detections share an id, invalidate the one with the lower similarity
        for i, a in enumerate(ids):
            for e, b in enumerate(ids):
                if(e == i):
                    continue
                if(a == b):
                    if(similaritys[i][a] < similaritys[e][b]):
                        ids[i] = -1
                    else:
                        ids[e] = -1
        print(ids)
        return ids
    # Cosine similarity
    # Adapted from: https://github.com/kodamap/person_reidentification
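    # X: (num detections, 255) query vectors; Y: (DB size, 255) stored vectors.
    # Returns a (num detections, DB size) matrix whose [i][j] entry is
    # dot(x_i, y_j) / (|x_i| * |y_j|): 1.0 for identical directions, 0.0 for orthogonal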
def __cos_similarity(self, X, Y):
m = X.shape[0]
Y = Y.T
return np.dot(X, Y) / (
np.linalg.norm(X.T, axis=0).reshape(m, 1) * np.linalg.norm(Y, axis=0)
)
# Display basic model information
def display_info(image, detector, reidentification, device, threshold, speedflg, outpath):
print(YELLOW + title + ': Starting application...' + NOCOLOR)
print(' - ' + YELLOW + 'Image File : ' + NOCOLOR, image)
print(' - ' + YELLOW + 'm_detect : ' + NOCOLOR, detector)
print(' - ' + YELLOW + 'm_redient. : ' + NOCOLOR, reidentification)
print(' - ' + YELLOW + 'Device : ' + NOCOLOR, device)
print(' - ' + YELLOW + 'Threshold : ' + NOCOLOR, threshold)
print(' - ' + YELLOW + 'Speed flag : ' + NOCOLOR, speedflg)
print(' - ' + YELLOW + 'Processed out: ' + NOCOLOR, outpath)
# Determine the input file type
#   Returns: 'jpeg', 'png', ...   image file
#            'None'               not an image file (movie file)
#            'NotFound'           file does not exist
import imghdr
def is_pict(filename):
try:
imgtype = imghdr.what(filename)
except FileNotFoundError as e:
imgtype = 'NotFound'
return str(imgtype)
# ** main function **
def main():
# Argument parsing and parameter setting
ARGS = parse_args().parse_args()
input_stream = ARGS.image
if ARGS.image.lower() == "cam" or ARGS.image.lower() == "camera":
input_stream = 0
else:
filetype = is_pict(input_stream)
        if (filetype == 'NotFound' or filetype != 'None'):
            print(RED + "\ninput file not found or not a movie file." + NOCOLOR)
            quit()
isstream = True
detection_threshold = ARGS.threshold
speedflg = ARGS.speed
model_detector=ARGS.m_detector
model_reidentification=ARGS.m_reidentification
outpath = ARGS.out
device = ARGS.device
cpu_extension = None
ie_core = IECore()
if device == "CPU" and cpu_extension:
ie_core.add_extension(cpu_extension, "CPU")
    # Display settings
display_info(input_stream, model_detector, model_reidentification, device, detection_threshold, speedflg, outpath)
person_detector = PersonDetector(model_detector, device, ie_core, detection_threshold, num_requests=2)
personReidentification = PersonReidentification(model_reidentification, device, ie_core, detection_threshold, num_requests=2)
tracker = Tracker()
    # Prepare the input
    cap = cv2.VideoCapture(input_stream)
    # Record processed results, step 1
if (outpath != 'non'):
if (isstream):
fps = int(cap.get(cv2.CAP_PROP_FPS))
out_w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
out_h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
outvideo = cv2.VideoWriter(outpath, fourcc, fps, (out_w, out_h))
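            # Note: cap.get(cv2.CAP_PROP_FPS) can return 0 for some cameras,
            # in which case the written file will not play back correctly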
colors = []
for i in range(TRACKING_MAX):
b = random.randint(0, 255)
g = random.randint(0, 255)
r = random.randint(0, 255)
colors.append((b,g,r))
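    # mylib.fpsWithTick (imported above; implementation not shown here) is assumed
    # to provide get() for the instantaneous frame rate and get_average() for the
    # run-wide mean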
    # Initialize measurements
fpsWithTick = mylib.fpsWithTick()
frame_count = 0
fps_total = 0
    fpsWithTick.get()    # start fps measurement
    # Main loop
while True:
grabbed, frame = cap.read()
        if not grabbed:  # end of stream
            break
        if(frame is None):
            break
        # Detect persons
persons = []
detections = person_detector.infer(frame)
if(len(detections) > 0):
print("-------------------")
for detection in detections:
x1 = int(detection[0])
y1 = int(detection[1])
x2 = int(detection[2])
y2 = int(detection[3])
conf = detection[4]
print("{:.1f} ({},{})-({},{})".format(conf, x1, y1, x2, y2))
persons.append([x1,y1,x2,y2])
print("====================")
        # Get an identification vector from each person's image
identifys = np.zeros((len(persons), 255))
for i, person in enumerate(persons):
            # Crop the person's image
img = frame[person[1] : person[3], person[0]: person[2]]
h, w = img.shape[:2]
if(h==0 or w==0):
continue
            # Get the identification vector
identifys[i] = personReidentification.infer(img)
        # Get ids
        ids = tracker.getIds(identifys, persons)
        # Draw bounding boxes and ids on the frame
for i, person in enumerate(persons):
if(ids[i]!=-1):
color = colors[int(ids[i])]
                frame = cv2.rectangle(frame, (person[0], person[1]), (person[2], person[3]), color, 2)
                frame = cv2.putText(frame, str(ids[i]), (person[0], person[1]), cv2.FONT_HERSHEY_PLAIN, 2, color, 2, cv2.LINE_AA)
        # Scale the frame
        h, w = frame.shape[:2]
        frame = cv2.resize(frame, (int(w * SCALE), int(h * SCALE)))
        # Compute the FPS
fps = fpsWithTick.get()
st_fps = 'fps: {:>6.2f}'.format(fps)
if (speedflg == 'y'):
cv2.rectangle(frame, (10, 38), (95, 55), (90, 90, 90), -1)
cv2.putText(frame, st_fps, (15, 50), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.4, color=(255, 255, 255), lineType=cv2.LINE_AA)
        # Show the frame
        window_name = title + " (hit 'q' or 'esc' key to exit)"
        cv2.namedWindow(window_name, cv2.WINDOW_AUTOSIZE)    # 2021.08.20
cv2.imshow(window_name, frame)
        # Record processed results, step 2
if (outpath != 'non'):
if (isstream):
outvideo.write(frame)
else:
cv2.imwrite(outpath, frame)
key = cv2.waitKey(1)
ESC_KEY = 27
if key in {ord('q'), ord('Q'), ESC_KEY}:
break
    # Cleanup
if (isstream):
cap.release()
    # Record processed results, step 3
if (outpath != 'non'):
if (isstream):
outvideo.release()
cv2.destroyAllWindows()
print('\nFPS average: {:>10.2f}'.format(fpsWithTick.get_average()))
print('\n Finished.')
# Entry point
if __name__ == "__main__":
sys.exit(main())
# -*- coding: utf-8 -*-
##------------------------------------------
## OpenVINO™ toolkit
## model access base class
##
## 2021.03.10 Masahiro Izutsu
##------------------------------------------
## https://github.com/openvinotoolkit/open_model_zoo/blob/master/demos/python_demos/asl_recognition_demo/asl_recognition_demo/common.py
## model.py
class Model:
def __init__(self, model_path, device, ie_core, num_requests, output_shape=None):
if model_path.endswith((".xml", ".bin")):
model_path = model_path[:-4]
self.net = ie_core.read_network(model_path + ".xml", model_path + ".bin")
self.exec_net = ie_core.load_network(network=self.net, device_name=device, num_requests=num_requests)
self.input_name = next(iter(self.net.input_info))
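        # With multiple outputs, select the single output whose shape matches
        # output_shape; a negative entry in output_shape matches any dimension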
if len(self.net.outputs) > 1:
if output_shape is not None:
candidates = []
for candidate_name in self.net.outputs:
candidate_shape = self.exec_net.requests[0].output_blobs[candidate_name].buffer.shape
if len(candidate_shape) != len(output_shape):
continue
matches = [src == trg or trg < 0
for src, trg in zip(candidate_shape, output_shape)]
if all(matches):
candidates.append(candidate_name)
if len(candidates) != 1:
raise Exception("One output is expected")
self.output_name = candidates[0]
else:
raise Exception("One output is expected")
else:
self.output_name = next(iter(self.net.outputs))
self.input_size = self.net.input_info[self.input_name].input_data.shape
self.output_size = self.exec_net.requests[0].output_blobs[self.output_name].buffer.shape
self.num_requests = num_requests
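    # Synchronous inference: feed `data` to the network's single input and
    # return the buffer of the selected output blob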
def infer(self, data):
input_data = {self.input_name: data}
infer_result = self.exec_net.infer(input_data)
return infer_result[self.output_name]