TinyYolo1

Object Detection with TinyYolo v1

This page works through the TinyYolo v1 object-detection sample program.

※ Last updated: 2021/04/01

Running TinyYolo Inference

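Image-recognition (classification) models such as GoogLeNet and GenderNet produce only one recognition result per image. TinyYolo is an object-detection model: it can recognize several objects in a single image and also gives the position and size of each recognized object.
This page examines how it works using the sample published in the Neural Compute Application Zoo (NC App Zoo).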


Inference results

Program location
```
~/ncappzoo/networks/tiny_yolo_v1/
```

Running the program

```
~/ncappzoo/networks/tiny_yolo_v1 $ python3 tiny_yolo_v1.py
~/ncappzoo/networks/tiny_yolo_v1 $ python3 tiny_yolo_v1.py -i dog.jpg
~/ncappzoo/networks/tiny_yolo_v1 $ python3 tiny_yolo_v1.py -i person.jpg
```

Parameters / options
```
-i (path to the image file)
```


Recognizable object classes (labels.txt)

label	meaning
aeroplane	airplane
bicycle	bicycle
bird	bird
boat	boat
bottle	bottle
bus	bus
car	car
cat	cat
chair	chair
cow	cow
diningtable	dining table
dog	dog
horse	horse
motorbike	motorbike
person	person
pottedplant	potted plant
sheep	sheep
sofa	sofa
train	train
tvmonitor	TV / monitor


The Yolo algorithm

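To aid understanding, this section restates the Yolo object-detection algorithm, based on the articles listed in the references at the end of this page.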

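・Before Yolo, the mainstream approach to object detection was to first detect candidate object regions and then run image recognition on each region. YOLO (You Only Look Once) performs region detection and image recognition at the same time, which makes it faster than earlier detectors. The Tiny version is a model that runs with little memory and prioritizes detection speed over recognition accuracy.
・Region detection results look like the figure: each rectangle is called a bounding box (BB), and the higher the confidence of a detected region, the thicker its outline is drawn.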

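・In pre-Yolo algorithms, image recognition with a CNN (Convolutional Neural Network) was run separately on each BB. CNN inference is expensive, so the number of runs should be kept as small as possible.
Yolo instead divides the image into a 7×7 grid, independent of the BBs, and a single pass of the network produces a classification result for each of the 49 cells.
・Each cell proposes only two candidate BBs.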

・The per-cell classification result looks like the image on the left.

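・Combining the two BBs of each cell with that classification result gives an output like the image on the left.
・Everything up to this point is computed in one pass of the deep-learning model.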

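・As it stands, this output is far too cluttered, so ordinary program code is used afterwards to clean it up.
・Only detections whose confidence is at or above a threshold are kept and overlapping duplicates are removed, giving the "cleaned-up result" shown on the left.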


Reading the TinyYolo inference code

To understand the code, tiny_yolo_v1.py in ~/ncappzoo/networks/tiny_yolo_v1 is used as a reference and rewritten step by step as "tiny_yolo_v1_a.py".


Overall program structure

Imports
Command-line option and parameter definitions
filter_objects function
get_duplicate_box_mask function
boxes_to_pixel_units function
get_intersection_over_union function
display_objects_in_gui function
main function


Imports

```
import sys
import numpy as np
import cv2
import argparse
```


Command-line option and parameter definitions

```
import argparse

# Defaults referenced below; in the full program these are module-level constants
# (values as listed in the table that follows)
DETECTION_THRESHOLD = 0.10
IOU_THRESHOLD = 0.25

def parse_args():
    # Adjacent string literals are joined without the stray spaces a backslash continuation adds
    parser = argparse.ArgumentParser(description='Image classifier using '
                                                 'Intel® Neural Compute Stick 2.')
    parser.add_argument('--ir', metavar='IR_File',
                        type=str, default='tiny-yolo-v1_53000.xml',
                        help='Absolute path to the neural network IR xml file.')
    parser.add_argument('-l', '--labels', metavar='LABEL_FILE',
                        type=str, default='labels.txt',
                        help='Absolute path to labels file.')
    parser.add_argument('-i', '--image', metavar='IMAGE_FILE',
                        type=str, default='../../data/images/nps_chair.png',
                        help='Absolute path to image file.')
    parser.add_argument('--threshold', metavar='FLOAT',
                        type=float, default=DETECTION_THRESHOLD,
                        help='Threshold for detection.')
    parser.add_argument('--iou', metavar='FLOAT',
                        type=float, default=IOU_THRESHOLD,
                        help='Intersection Over Union.')
    # Returns the parser itself; the caller calls .parse_args() on it
    return parser
```

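Option	Default	Meaning
--ir	tiny-yolo-v1_53000.xml	Trained IR file
-l, --labels	labels.txt	Label file
-i, --image	../../data/images/nps_chair.png	Input image file path
--threshold	0.1	Detection threshold; only boxes scoring at or above it are shown
--iou	0.25	IoU threshold above which overlapping boxes are treated as duplicates ※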

※ The source was modified so this value can be set from the command line.


main function

Function structure
```
Prepare the device
Prepare the graph
Allocate the graph
Prepare the input image
Run inference
filter_objects function
display_objects_in_gui function
Clean up
```
After inference, the raw result is not displayed directly; instead, filter_objects and then display_objects_in_gui are run on it.

TinyYolo inference output data

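The input image is divided into a 7×7 grid of cells.
Each cell carries information on two bounding boxes (BBs), shown as the blue and green parts of the figure.
Each BB is described by the following five values:
- P(Object): here called the "region existence probability". The higher it is, the more likely the box is a real object region.
- X: center X position of the BB, relative to its cell
- Y: center Y position of the BB, relative to its cell
- Width: width of the BB relative to the whole image
- Height: height of the BB relative to the whole image

In addition, each cell carries 20 class probabilities (the white part of the figure).
These are the probabilities for each of the 20 classes, just as in other classification models.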

filter_objects is then applied to the inference result.
```
filtered_objs = filter_objects(output.astype(np.float32), input_image.shape[1], input_image.shape[2], label_list, threshold)
```
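The first argument of filter_objects is the inference result output, converted to float32. The second argument is the width of the input image and the third is its height. The return value is stored in filtered_objs.
filtered_objs holds the detections whose class confidence score is at or above the threshold, with duplicate boxes removed.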

display_objects_in_gui is then called with the resulting filtered_objs.
```
display_objects_in_gui(display_image, filtered_objs, input_image.shape[1], input_image.shape[2])
```
It visualizes the input image together with the "cleaned-up result" (filtered_objs): a window opens and shows the image.


filter_objects function

Receiving the inference result
```
# the raw number of floats returned from the inference
num_inference_results = len(inference_result)
```
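The output from main is received as the variable inference_result, and the number of elements in that array is obtained.
It has 1470 elements, which matches the count computed from the output layout described above:
1470 = 7 × 7 × (2 × 5 + 20)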
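The arithmetic can also be checked in a few lines of Python, which at the same time derives the slice boundaries (0, 980, 1078, 1470) used further down in filter_objects. This is a minimal sketch; the constant names are illustrative and not taken from the original program.

```python
# Sizes of the three parts of the 1470-element TinyYolo v1 output
GRID_SIZE = 7        # 7x7 grid of cells
BOXES_PER_CELL = 2   # bounding boxes predicted per cell
NUM_CLASSES = 20     # classes in labels.txt
NUM_COORDS = 4       # x, y, width, height

num_class_probs = GRID_SIZE * GRID_SIZE * NUM_CLASSES                 # 980
num_box_scores = GRID_SIZE * GRID_SIZE * BOXES_PER_CELL               # 98
num_box_coords = GRID_SIZE * GRID_SIZE * BOXES_PER_CELL * NUM_COORDS  # 392
total = num_class_probs + num_box_scores + num_box_coords

# Slice boundaries as used in filter_objects
class_slice = slice(0, num_class_probs)
score_slice = slice(num_class_probs, num_class_probs + num_box_scores)
coord_slice = slice(num_class_probs + num_box_scores, total)

# Same total as 7 x 7 x (2 x 5 + 20): each box has 5 values, each cell 20 classes
assert total == GRID_SIZE * GRID_SIZE * (BOXES_PER_CELL * 5 + NUM_CLASSES)
print(total, class_slice, score_slice, coord_slice)
```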

List of class-name strings that TinyYolo can recognize
```
num_classes = len(labels) # should be 20
```
The label-file data (20 entries) read in main.

Threshold deciding whether a BB is displayed
It can be set freely with a command-line option at run time.

```
# only keep boxes with probabilities greater than or equal to this
probability_threshold = threshold
```

The smaller the value, the more BBs are displayed, but the more noise appears as well.

With --threshold 0.001

```
pi@raspberrypi:~/ncappzoo/networks/tiny_yolo_v1 $ python3 tiny_yolo_v1_a.py -i dog.jpg --threshold 0.001

--- ** tiny-Yolo-v1** Object identification ---
4.5.1-openvino
OpenVINO inference_engine: 2.1.2021.2.0-1877-176bdf51370-releases/2021/2

Running NCS Caffe TinyYolo example...
tiny_yolo_v1_a.py:372: DeprecationWarning: 'inputs' property of IENetwork class is deprecated. To access DataPtrs user need to use 'input_data' property of InputInfoPtr objects which can be accessed by 'input_info' property.
  input_blob = next(iter(net.inputs))

Tiny Yolo v1: Starting application...
   - Plugin:       Myriad
   - IR File:      tiny-yolo-v1_53000.xml
   - Input Shape:  [1, 3, 448, 448]
   - Output Shape: [1, 1470]
   - Labels File:  labels.txt
   - Image File:    dog.jpg
   - Threshold:    0.001
   - Intersection Over Union:    0.25

 Displaying image with objects detected in GUI...
 Click in the GUI window and hit any key to exit.

 Found this many objects in the image: 30
 - object: car is at left: 444, top: 95, right: 614, bottom: 153
 - object: bicycle is at left: 105, top: 93, right: 641, bottom: 475
 - object: dog is at left: 100, top: 219, right: 314, bottom: 529
 - object: bicycle is at left: 242, top: 217, right: 310, bottom: 373
 - object: bicycle is at left: 301, top: 204, right: 439, bottom: 358
 - object: car is at left: 346, top: 93, right: 412, bottom: 129
 - object: bicycle is at left: 138, top: 129, right: 392, bottom: 307
 - object: dog is at left: 146, top: 292, right: 252, bottom: 448
 - object: car is at left: 130, top: 101, right: 228, bottom: 159
 - object: car is at left: 20, top: 59, right: 82, bottom: 143
 - object: car is at left: 473, top: 92, right: 685, bottom: 256
 - object: car is at left: 734, top: 106, right: 758, bottom: 162
 - object: car is at left: 194, top: 88, right: 326, bottom: 148
 - object: person is at left: 673, top: 152, right: 753, bottom: 398
 - object: bicycle is at left: 128, top: 151, right: 226, bottom: 225
 - object: car is at left: 555, top: 139, right: 659, bottom: 191
 - object: chair is at left: 444, top: 310, right: 504, bottom: 402
 - object: car is at left: 661, top: 99, right: 729, bottom: 143
 - object: bicycle is at left: 1, top: 104, right: 31, bottom: 300
 - object: bicycle is at left: 20, top: 82, right: 92, bottom: 316
 - object: car is at left: 0, top: 30, right: 42, bottom: 98
 - object: car is at left: 346, top: 56, right: 446, bottom: 90
 - object: boat is at left: 358, top: 192, right: 422, bottom: 240
 - object: chair is at left: 362, top: 333, right: 628, bottom: 537
 - object: horse is at left: 26, top: 225, right: 118, bottom: 483
 - object: horse is at left: 2, top: 285, right: 16, bottom: 423
 - object: horse is at left: 226, top: 181, right: 294, bottom: 231
 - object: aeroplane is at left: 132, top: 223, right: 200, bottom: 361
 - object: bus is at left: 172, top: 489, right: 364, bottom: 549
 - object: motorbike is at left: 439, top: 158, right: 515, bottom: 220

 Finished.
```

With --threshold 0.10 (the default)

```
pi@raspberrypi:~/ncappzoo/networks/tiny_yolo_v1 $ python3 tiny_yolo_v1_a.py -i dog.jpg

--- ** tiny-Yolo-v1** Object identification ---
4.5.1-openvino
OpenVINO inference_engine: 2.1.2021.2.0-1877-176bdf51370-releases/2021/2

Running NCS Caffe TinyYolo example...
tiny_yolo_v1_a.py:372: DeprecationWarning: 'inputs' property of IENetwork class is deprecated. To access DataPtrs user need to use 'input_data' property of InputInfoPtr objects which can be accessed by 'input_info' property.
  input_blob = next(iter(net.inputs))

Tiny Yolo v1: Starting application...
   - Plugin:       Myriad
   - IR File:      tiny-yolo-v1_53000.xml
   - Input Shape:  [1, 3, 448, 448]
   - Output Shape: [1, 1470]
   - Labels File:  labels.txt
   - Image File:    dog.jpg
   - Threshold:    0.1
   - Intersection Over Union:    0.25

 Displaying image with objects detected in GUI...
 Click in the GUI window and hit any key to exit.

 Found this many objects in the image: 3
 - object: car is at left: 444, top: 95, right: 614, bottom: 153
 - object: bicycle is at left: 105, top: 93, right: 641, bottom: 475
 - object: dog is at left: 100, top: 219, right: 314, bottom: 529

 Finished.
```

Values fixed at training time

```
num_classes = len(labels) # should be 20

grid_size = 7 # the image is a 7x7 grid.  Each box in the grid is 64x64 pixels
anchor_boxes_per_grid_cell = 2 # the number of anchor boxes returned for each grid cell
```

num_classes is the number of classes: 20.
grid_size is the grid size: 7.
anchor_boxes_per_grid_cell is the number of BBs per cell: 2.
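The comment "Each box in the grid is 64x64 pixels" follows from the 448×448 network input (Input Shape [1, 3, 448, 448] in the logs) divided by grid_size = 7. The sketch below assumes that input size; cell_of is a hypothetical helper, not part of the program, that returns which grid cell a given pixel of the network input falls into.

```python
# Assumes a 448x448 network input, as reported in the log output
INPUT_SIZE = 448
GRID_SIZE = 7
CELL_SIZE = INPUT_SIZE // GRID_SIZE  # 64 pixels per cell side

def cell_of(x, y):
    """Return (row, col) of the grid cell containing pixel (x, y) of the network input."""
    col = min(int(x) // CELL_SIZE, GRID_SIZE - 1)  # clamp so x = 448 still maps to column 6
    row = min(int(y) // CELL_SIZE, GRID_SIZE - 1)
    return row, col

print(CELL_SIZE, cell_of(100, 300))
```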

class_probabilities holds the per-cell class probabilities.
Its 980 elements are taken from inference_result with a slice:
980 - 0 = 7 × 7 × 20

```
# Class probabilities: 
# 7x7 = 49 grid cells. 
# 49 grid cells x 20 classes per grid cell = 980 total class probabilities
class_probabilities = inference_result[0:980]
```

box_confidence_scores holds each BB's "region existence probability".
Its 98 elements are taken from inference_result with a slice:
1078 - 980 = 7 × 7 × 2

```
# Box confidence scores: 7x7 = 49 grid cells. "how likely the box contains an object" 
# 49 grid cells x 2 boxes per grid cell = 98 box scores
box_confidence_scores = inference_result[980:1078]
```

box_coordinates holds the BB position information.
```
# Box coordinates for all boxes 
# 98 boxes * 4 box coordinates each = 392
box_coordinates = inference_result[1078:]
```
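The remaining elements of inference_result are taken with a slice:
1470 - 1078 = 7 × 7 × 2 × 4
The position information consists of four values:
- X: center X position of the BB, relative to its cell
- Y: center Y position of the BB, relative to its cell
- Width: width of the BB relative to the whole image
- Height: height of the BB relative to the whole image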

Reshaping

```
# These values are the class probabilities for each grid
# Reshape the probabilities to 7x7x20 (980 total values)
class_probabilities = np.reshape(class_probabilities, (grid_size, grid_size, num_classes))

# These values are how likely each box contains an object
# Reshape the box confidence scores to 7x7x2 (98 total values)
box_confidence_scores = np.reshape(box_confidence_scores, (grid_size, grid_size, anchor_boxes_per_grid_cell))

# These values are the box coordinates for each box
# Reshape the boxes coordinates to 7x7x2x4 (392 total values)
box_coordinates = np.reshape(box_coordinates, (grid_size, grid_size, anchor_boxes_per_grid_cell, num_coordinates))
```
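The slicing and reshaping can be tried on a dummy output to see which flat index ends up where. This sketch assumes, as np.reshape does by default, that the 1470 values are laid out in row-major (C) order; np.arange stands in for real inference output so that each element's value equals its original flat index.

```python
import numpy as np

GRID_SIZE, BOXES, NUM_CLASSES, NUM_COORDS = 7, 2, 20, 4

# Stand-in for the network output: element i simply holds the value i
inference_result = np.arange(1470, dtype=np.float32)

class_probabilities = inference_result[0:980].reshape(GRID_SIZE, GRID_SIZE, NUM_CLASSES)
box_confidence_scores = inference_result[980:1078].reshape(GRID_SIZE, GRID_SIZE, BOXES)
box_coordinates = inference_result[1078:].reshape(GRID_SIZE, GRID_SIZE, BOXES, NUM_COORDS)

# Row-major order: class c of cell (row, col) comes from flat index (row*7 + col)*20 + c
row, col, c = 3, 5, 11
assert class_probabilities[row, col, c] == (row * GRID_SIZE + col) * NUM_CLASSES + c

# Coordinate k of box b in cell (row, col) comes from 1078 + ((row*7 + col)*2 + b)*4 + k
b, k = 1, 2
assert box_coordinates[row, col, b, k] == 1078 + ((row * GRID_SIZE + col) * BOXES + b) * NUM_COORDS + k

print(class_probabilities.shape, box_confidence_scores.shape, box_coordinates.shape)
```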

box_coordinates is passed to boxes_to_pixel_units, which converts its contents in place.
The BB positions become concrete pixel coordinates relative to the input image.

```
# -------------------- Scale the box coordinates to the input image size --------------------
boxes_to_pixel_units(box_coordinates, input_image_width, input_image_height, grid_size)
```

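class_confidence_scores is initialized with zeros and then filled with the product of each cell's class probabilities and each box's "region existence probability".
This value will be called the "BB probability".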

```
# -------------------- Calculate class confidence scores --------------------
# Find the class confidence scores for each grid. 
# This is done by multiplying the class probabilities by the box confidence scores 
# Shape of class confidence scores: 7x7x2x20 (1960 values)
class_confidence_scores = np.zeros((grid_size, grid_size, anchor_boxes_per_grid_cell, num_classes))
for box_index in range(anchor_boxes_per_grid_cell): # loop over boxes
    for class_index in range(num_classes): # loop over classifications
        class_confidence_scores[:,:,box_index,class_index] = np.multiply(class_probabilities[:,:,class_index], box_confidence_scores[:,:,box_index])
```
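The double loop computes, for each cell, every combination (an outer product) of the 2 box confidence scores and the 20 class probabilities. As an alternative formulation, not the one the original program uses, NumPy broadcasting yields the same 7×7×2×20 array without explicit loops. The sketch below uses random stand-in data and checks that both versions agree.

```python
import numpy as np

GRID_SIZE, BOXES, NUM_CLASSES = 7, 2, 20
rng = np.random.default_rng(0)

# Random stand-ins for the reshaped network outputs
class_probabilities = rng.random((GRID_SIZE, GRID_SIZE, NUM_CLASSES))
box_confidence_scores = rng.random((GRID_SIZE, GRID_SIZE, BOXES))

# Loop version, as in filter_objects
loop_scores = np.zeros((GRID_SIZE, GRID_SIZE, BOXES, NUM_CLASSES))
for box_index in range(BOXES):
    for class_index in range(NUM_CLASSES):
        loop_scores[:, :, box_index, class_index] = (
            class_probabilities[:, :, class_index] * box_confidence_scores[:, :, box_index])

# Broadcasting version: (7,7,1,20) * (7,7,2,1) -> (7,7,2,20)
broadcast_scores = (class_probabilities[:, :, np.newaxis, :]
                    * box_confidence_scores[:, :, :, np.newaxis])

print(broadcast_scores.shape, np.allclose(loop_scores, broadcast_scores))
```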

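score_threshold_mask is a boolean mask over class_confidence_scores that is True only where the value is at or above the threshold probability_threshold (0.1 by default, settable with --threshold).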

```
# -------------------- Filter object scores/coordinates/indexes >= threshold --------------------
# Find all scores that are larger than or equal to the threshold using a mask.
# Array of 1960 bools: True if >= threshold. otherwise False. 
score_threshold_mask = np.array(class_confidence_scores>=probability_threshold, dtype='bool')
```

filtered_scores is a 1-D array of the BB probabilities where score_threshold_mask is True (i.e. at or above the threshold).

```
# Using the array of bools, filter all scores >= threshold
filtered_scores = class_confidence_scores[score_threshold_mask]
```

box_threshold_mask is a tuple of index arrays, one per axis, giving the positions of the True elements above.

```
# Get tuple of arrays of indexes from the bool array that have a >= score than the threshold
# These tuple of array indexes will help to filter out our box coordinates and class indexes 
# tuple 0 and 1 are the coordinates of the 7x7 grid (values = 0-6)
# tuple 2 is the anchor box index (values = 0-1)
# tuple 3 is the class indexes (labels) (values = 0-19)
box_threshold_mask = np.nonzero(score_threshold_mask)
```

filtered_box_coordinates holds the BB coordinates at those True positions (at or above the threshold only).
```
# Use those indexes to find the coordinates for box confidence scores >= than the threshold
filtered_box_coordinates = box_coordinates[box_threshold_mask[0], box_threshold_mask[1], box_threshold_mask[2]]
```

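filtered_class_indexes holds, for each position selected by box_threshold_mask, the index of the class with the highest class confidence score (argmax over the class axis).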

```
# Use those indexes to find the class indexes that have a score >= threshold 
filtered_class_indexes = np.argmax(class_confidence_scores, axis=3)[box_threshold_mask[0], box_threshold_mask[1], box_threshold_mask[2]]
```
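To see what the mask, np.nonzero and the index-array lookups produce, the sketch below runs the same statements on a toy-sized array (a 2×2 grid, 2 boxes and 3 classes instead of 7×7×2×20). The scores and coordinates are made-up values chosen only for illustration.

```python
import numpy as np

# Toy stand-in: 2x2 grid, 2 boxes per cell, 3 classes (the real array is 7x7x2x20)
scores = np.zeros((2, 2, 2, 3))
scores[0, 1, 0, 2] = 0.8    # cell (0,1), box 0, class 2
scores[1, 0, 1, 0] = 0.3    # cell (1,0), box 1, class 0
scores[1, 0, 1, 1] = 0.05   # below the threshold, filtered out
box_coordinates = np.arange(2 * 2 * 2 * 4, dtype=float).reshape(2, 2, 2, 4)
threshold = 0.1

score_threshold_mask = scores >= threshold
filtered_scores = scores[score_threshold_mask]          # 1-D, in row-major order
box_threshold_mask = np.nonzero(score_threshold_mask)   # tuple of 4 index arrays
filtered_box_coordinates = box_coordinates[box_threshold_mask[0],
                                           box_threshold_mask[1],
                                           box_threshold_mask[2]]
filtered_class_indexes = np.argmax(scores, axis=3)[box_threshold_mask[0],
                                                   box_threshold_mask[1],
                                                   box_threshold_mask[2]]
print(filtered_scores, filtered_box_coordinates, filtered_class_indexes)
```

Because filtered_class_indexes comes from argmax, if two classes of the same box both passed the threshold, both resulting entries would carry the same (highest-scoring) class index.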

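Sorting in descending order of BB probability
sort_by_highest_score holds the indexes that sort filtered_scores in descending order (np.argsort sorts in ascending order, and [::-1] reverses it).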

```
# -------------------- Sort the filtered scores/coordinates/indexes --------------------
# Sort the indexes from highest score to lowest 
# and then use those indexes to sort box coordinates, scores, class indexes
sort_by_highest_score = np.array(np.argsort(filtered_scores))[::-1]
```

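filtered_box_coordinates: the BB coordinates, sorted in descending order of BB probability (entries below the threshold excluded)
filtered_class_indexes: the indexes of the highest-scoring class, in the same order (entries below the threshold excluded)
filtered_scores: the BB probabilities, in descending order (entries below the threshold excluded)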

```
# Sort the box coordinates, scores, and class indexes to match 
filtered_box_coordinates = filtered_box_coordinates[sort_by_highest_score]
filtered_scores = filtered_scores[sort_by_highest_score]
filtered_class_indexes = filtered_class_indexes[sort_by_highest_score]
```
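A small example of the descending sort with made-up scores: the same index array reorders every parallel array, so scores and class indexes stay matched. np.argsort already returns an ndarray, so the np.array(...) wrapper in the original line is not strictly needed.

```python
import numpy as np

# Made-up example values
filtered_scores = np.array([0.30, 0.80, 0.15])
filtered_class_indexes = np.array([0, 2, 1])

# argsort sorts ascending; [::-1] reverses the index order so the highest score comes first
sort_by_highest_score = np.argsort(filtered_scores)[::-1]

# Apply the same permutation to every parallel array
filtered_scores = filtered_scores[sort_by_highest_score]
filtered_class_indexes = filtered_class_indexes[sort_by_highest_score]
print(sort_by_highest_score, filtered_scores, filtered_class_indexes)
```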

filtered_box_coordinates is passed to get_duplicate_box_mask, which returns duplicate_box_mask.
This mask is used to remove duplicate BBs.

```
# -------------------- Filter out duplicates --------------------
# Get mask for boxes that seem to be the same object by calculating iou (intersection over union)
# these will filter out duplicate objects
duplicate_box_mask = get_duplicate_box_mask(filtered_box_coordinates)
```

Updating the variables
Duplicate BBs are removed from filtered_box_coordinates, filtered_scores and filtered_class_indexes.

```
# Update the boxes, probabilities and classifications removing duplicates.
filtered_box_coordinates = filtered_box_coordinates[duplicate_box_mask]
filtered_scores = filtered_scores[duplicate_box_mask]
filtered_class_indexes = filtered_class_indexes[duplicate_box_mask]
```

A list named filtered_results is created and the following entries are appended to it:

```
# -------------------- Gather the results --------------------
# Set up list and return class labels, coordinates and scores
filtered_results = []
for object_index in range(len(filtered_box_coordinates)):
    filtered_results.append([
        labels[filtered_class_indexes[object_index]],  # label of the object
        filtered_box_coordinates[object_index][0],     # box center x (before scaling to the source image)
        filtered_box_coordinates[object_index][1],     # box center y (before scaling to the source image)
        filtered_box_coordinates[object_index][2],     # box width (before scaling to the source image)
        filtered_box_coordinates[object_index][3],     # box height (before scaling to the source image)
        filtered_scores[object_index]                  # object score
    ])
```

Finally, this list is returned to main.

labels[filtered_class_indexes[object_index]]: class name string (e.g. "car" or "person")
filtered_box_coordinates[object_index][0]: center X of the BB
filtered_box_coordinates[object_index][1]: center Y of the BB
filtered_box_coordinates[object_index][2]: width of the BB
filtered_box_coordinates[object_index][3]: height of the BB
filtered_scores[object_index]: BB probability
Every array contains only BBs at or above the threshold, has had duplicate BBs removed, and is sorted in descending order of BB probability.


boxes_to_pixel_units function

```
# Converts the boxes in box list to pixel units
# assumes box_list is the output from the box output from
# the tiny yolo network and is [grid_size x grid_size x 2 x 4].
def boxes_to_pixel_units(box_list, image_width, image_height, grid_size):

    # number of boxes per grid cell
    boxes_per_cell = 2

    # setup some offset values to map boxes to pixels
    # box_offset will be [[ [0, 0], [1, 1], [2, 2], [3, 3], [4, 4], [5, 5], [6, 6]] ...repeated for 7 ]
    box_offset = np.transpose(np.reshape(np.array([np.arange(grid_size)]*(grid_size*2)),(boxes_per_cell,grid_size, grid_size)),(1,2,0))

    # adjust the box center
    box_list[:,:,:,0] += box_offset
    box_list[:,:,:,1] += np.transpose(box_offset,(1, 0, 2))
    box_list[:,:,:,0:2] = box_list[:,:,:,0:2] / (grid_size * 1.0)

    # adjust the lengths and widths
    box_list[:,:,:,2] = np.multiply(box_list[:,:,:,2], box_list[:,:,:,2])
    box_list[:,:,:,3] = np.multiply(box_list[:,:,:,3], box_list[:,:,:,3])

    #scale the boxes to the image size in pixels
    box_list[:,:,:,0] *= image_width
    box_list[:,:,:,1] *= image_height
    box_list[:,:,:,2] *= image_width
    box_list[:,:,:,3] *= image_height
```

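First, the offset array box_offset is built. Its entries come in pairs because each cell has two BBs.
"adjust the box center" adds the offsets to convert each center position from cell-relative coordinates to coordinates over the whole input image; dividing by grid_size normalizes them to the range 0.0 to 1.0.
"adjust the lengths and widths" squares width and height: the network actually outputs the square roots of the width and height.
"scale the boxes to the image size in pixels" converts the normalized center X, center Y, width and height into pixel values for the image.
The function has no return value. Python passes a reference to the argument object, and box_list (a NumPy array) is modified in place through slice assignment and augmented assignment, so box_coordinates in the calling filter_objects function changes as well.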
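As a cross-check of the explanation above, the sketch below expresses the same three steps with NumPy broadcasting instead of the reshape/transpose trick used to build box_offset. It is a hypothetical rewrite (the name boxes_to_pixel_units_sketch is made up); like the original it returns nothing and modifies the array it is given in place.

```python
import numpy as np

def boxes_to_pixel_units_sketch(box_list, image_width, image_height, grid_size):
    """Hypothetical vectorized variant; modifies box_list (shape SxSx2x4) in place."""
    cols = np.arange(grid_size)[np.newaxis, :, np.newaxis]  # shape (1, S, 1): column index
    rows = np.arange(grid_size)[:, np.newaxis, np.newaxis]  # shape (S, 1, 1): row index
    # cell-relative center -> image-relative center in 0.0..1.0
    box_list[..., 0] = (box_list[..., 0] + cols) / grid_size
    box_list[..., 1] = (box_list[..., 1] + rows) / grid_size
    # the network outputs sqrt(width) and sqrt(height): square them
    box_list[..., 2] **= 2
    box_list[..., 3] **= 2
    # normalized values -> pixels
    box_list[..., 0] *= image_width
    box_list[..., 1] *= image_height
    box_list[..., 2] *= image_width
    box_list[..., 3] *= image_height

boxes = np.zeros((7, 7, 2, 4))
boxes[3, 2, 0] = [0.5, 0.5, 0.5, 0.25]           # cell row 3, column 2, box 0
boxes_to_pixel_units_sketch(boxes, 448, 448, 7)  # no return value: boxes itself changes
print(boxes[3, 2, 0])
```

For that box the center lands in the middle of its 64×64 cell (x = 2.5 × 64, y = 3.5 × 64), and the width becomes 0.5² × 448.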


get_duplicate_box_mask function

```
# creates a mask to remove duplicate objects (boxes) and their related probabilities and classifications
# that should be considered the same object.  This is determined by how similar the boxes are
# based on the intersection-over-union metric.
# box_list is a list of boxes (4 floats for centerX, centerY, width and height)
def get_duplicate_box_mask(box_list):
    # The intersection-over-union threshold to use when determining duplicates.
    # objects/boxes found that are over this threshold will be
    # considered the same object
    max_iou = IOU_THRESHOLD

    box_mask = np.ones(len(box_list))

    for i in range(len(box_list)):
        if box_mask[i] == 0: continue
        for j in range(i + 1, len(box_list)):
            if get_intersection_over_union(box_list[i], box_list[j]) > max_iou:
                box_mask[j] = 0.0

    filter_iou_mask = np.array(box_mask > 0.0, dtype='bool')
    return filter_iou_mask
```

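The argument box_list holds the BB coordinates sorted in descending order of BB probability (entries below the threshold are already excluded).
max_iou is a threshold used only in this function. IoU stands for Intersection Over Union; it is explained in detail under the get_intersection_over_union function.
For each box i that has not already been eliminated, the function compares it with every lower-ranked box j using get_intersection_over_union, and sets box_mask[j] to 0.0 when the result exceeds max_iou.
The returned filter_iou_mask is a boolean array that is True only for the boxes that were kept.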
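This procedure is greedy duplicate removal (commonly called non-maximum suppression): the highest-scoring box is kept, and lower-ranked boxes that overlap it by more than the threshold are dropped. The sketch below re-expresses it in plain Python with its own IoU helper, on hypothetical boxes in (center_x, center_y, width, height) format; iou_center and duplicate_box_mask are made-up names, not functions of the program.

```python
def iou_center(box_1, box_2):
    """IoU of two boxes given as (center_x, center_y, width, height)."""
    overlap_w = (min(box_1[0] + box_1[2] / 2, box_2[0] + box_2[2] / 2)
                 - max(box_1[0] - box_1[2] / 2, box_2[0] - box_2[2] / 2))
    overlap_h = (min(box_1[1] + box_1[3] / 2, box_2[1] + box_2[3] / 2)
                 - max(box_1[1] - box_1[3] / 2, box_2[1] - box_2[3] / 2))
    intersection = max(overlap_w, 0) * max(overlap_h, 0)
    union = box_1[2] * box_1[3] + box_2[2] * box_2[3] - intersection
    return intersection / union if union > 0 else 0.0

def duplicate_box_mask(sorted_boxes, max_iou):
    """Greedy duplicate removal; boxes must already be sorted by score, highest first."""
    keep = [True] * len(sorted_boxes)
    for i in range(len(sorted_boxes)):
        if not keep[i]:
            continue  # already suppressed by a higher-scoring box
        for j in range(i + 1, len(sorted_boxes)):
            if iou_center(sorted_boxes[i], sorted_boxes[j]) > max_iou:
                keep[j] = False  # lower-scoring box overlaps too much
    return keep

boxes = [(100, 100, 80, 80),  # highest score
         (110, 100, 80, 80),  # almost the same place as the first
         (300, 300, 50, 50)]  # somewhere else
print(duplicate_box_mask(boxes, 0.25), duplicate_box_mask(boxes, 1.0))
```

Because IoU never exceeds 1.0, passing max_iou = 1.0 keeps every box, which is what the --iou 1.0 run shows.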

Running with the overlap threshold max_iou set to 1.00

```
pi@raspberrypi:~/ncappzoo/networks/tiny_yolo_v1 $ python3 tiny_yolo_v1_a.py -i dog.jpg --iou 1.0

--- ** tiny-Yolo-v1** Object identification ---
4.5.1-openvino
OpenVINO inference_engine: 2.1.2021.2.0-1877-176bdf51370-releases/2021/2

Running NCS Caffe TinyYolo example...
tiny_yolo_v1_a.py:373: DeprecationWarning: 'inputs' property of IENetwork class is deprecated. To access DataPtrs user need to use 'input_data' property of InputInfoPtr objects which can be accessed by 'input_info' property.
  input_blob = next(iter(net.inputs))

Tiny Yolo v1: Starting application...
   - Plugin:       Myriad
   - IR File:      tiny-yolo-v1_53000.xml
   - Input Shape:  [1, 3, 448, 448]
   - Output Shape: [1, 1470]
   - Labels File:  labels.txt
   - Image File:    dog.jpg
   - Threshold:    0.1
   - Intersection Over Union:    1.0

 Displaying image with objects detected in GUI...
 Click in the GUI window and hit any key to exit.

 Found this many objects in the image: 7
 - object: car is at left: 444, top: 95, right: 614, bottom: 153
 - object: car is at left: 484, top: 103, right: 662, bottom: 161
 - object: bicycle is at left: 105, top: 93, right: 641, bottom: 475
 - object: car is at left: 433, top: 85, right: 549, bottom: 139
 - object: bicycle is at left: 110, top: 84, right: 464, bottom: 500
 - object: car is at left: 525, top: 94, right: 627, bottom: 150
 - object: dog is at left: 100, top: 219, right: 314, bottom: 529

 Finished.
```

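IoU can never exceed 1.0, so with 1.0 the condition "> max_iou" is never true: every overlap is allowed, which is equivalent to not running get_duplicate_box_mask at all.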


get_intersection_over_union function

```
# Evaluate the intersection-over-union for two boxes
# The intersection-over-union metric determines how close
# two boxes are to being the same box.  The closer the boxes
# are to being the same, the closer the metric will be to 1.0
# box_1 and box_2 are arrays of 4 numbers which are the (x, y)
# points that define the center of the box and the length and width of
# the box.
# Returns the intersection-over-union (between 0.0 and 1.0)
# for the two boxes specified.
def get_intersection_over_union(box_1, box_2):

    # one dimension of the intersecting box
    intersection_dim_1 = min(box_1[0]+0.5*box_1[2],box_2[0]+0.5*box_2[2])-\
                         max(box_1[0]-0.5*box_1[2],box_2[0]-0.5*box_2[2])

    # the other dimension of the intersecting box
    intersection_dim_2 = min(box_1[1]+0.5*box_1[3],box_2[1]+0.5*box_2[3])-\
                         max(box_1[1]-0.5*box_1[3],box_2[1]-0.5*box_2[3])

    if intersection_dim_1 < 0 or intersection_dim_2 < 0 :
        # no intersection area
        intersection_area = 0
    else :
        # intersection area is product of intersection dimensions
        intersection_area =  intersection_dim_1*intersection_dim_2

    # calculate the union area which is the area of each box added
    # and then we need to subtract out the intersection area since
    # it is counted twice (by definition it is in each box)
    union_area = box_1[2]*box_1[3] + box_2[2]*box_2[3] - intersection_area

    # now we can return the intersection over union
    iou = intersection_area / union_area

    return iou
```

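Details of intersection_dim_1
box_1[0]+0.5*box_1[2] is box 1's center X plus half of box 1's width, i.e. its right edge. It corresponds to the X coordinate of point a in the figure below.
Likewise, the other three terms correspond to the X coordinates of b, c and d.

Using a, b, c and d, intersection_dim_1 can be written as:
```
intersection_dim_1 = min(a, b) - max(c, d)
```
min returns the smaller value and max the larger, so in the configuration shown in the figure the result is a - d.
intersection_dim_1 is the width of the region where the two boxes overlap (the yellow region).
intersection_dim_2 is the height of that overlapping region.
The if statement in the middle sets intersection_area to 0 when there is no overlap, and otherwise to the area of the overlap.
union_area is the sum of the areas of box 1 and box 2 minus the overlap area, which would otherwise be counted twice.
IoU expresses how much the two boxes overlap as a number between 0.0 and 1.0; the larger it is, the greater the overlap.
Finally, iou is returned.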
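A worked example following the a, b, c, d description above, with two hypothetical boxes in (center_x, center_y, width, height) format. overlap_1d is an illustrative helper, not part of the program; it computes min(a, b) - max(c, d) along one axis.

```python
def overlap_1d(center_1, length_1, center_2, length_2):
    """min(a, b) - max(c, d) along one axis; negative when the boxes do not overlap."""
    a = center_1 + 0.5 * length_1  # right (or bottom) edge of box 1
    b = center_2 + 0.5 * length_2  # right (or bottom) edge of box 2
    c = center_1 - 0.5 * length_1  # left (or top) edge of box 1
    d = center_2 - 0.5 * length_2  # left (or top) edge of box 2
    return min(a, b) - max(c, d)

box_1 = (2.0, 2.0, 4.0, 4.0)  # spans x 0..4, y 0..4
box_2 = (4.0, 3.0, 4.0, 4.0)  # spans x 2..6, y 1..5

dim_1 = overlap_1d(box_1[0], box_1[2], box_2[0], box_2[2])  # min(4, 6) - max(0, 2) = 2
dim_2 = overlap_1d(box_1[1], box_1[3], box_2[1], box_2[3])  # min(4, 5) - max(0, 1) = 3
intersection_area = dim_1 * dim_2 if dim_1 > 0 and dim_2 > 0 else 0.0
union_area = box_1[2] * box_1[3] + box_2[2] * box_2[3] - intersection_area  # 16 + 16 - 6
iou = intersection_area / union_area
print(dim_1, dim_2, intersection_area, union_area, iou)
```

Here IoU = 6/26 ≈ 0.23. With the default --iou 0.25 these two boxes would not be treated as duplicates, since get_duplicate_box_mask only removes a box when the IoU exceeds the threshold.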


display_objects_in_gui function

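A copy of the input image source_image is assigned to display_image.
source_image_width and source_image_height are the width and height of the input image.
x_ratio and y_ratio are the ratios of the input image size to the image size used for inference (the network input).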

```
# Displays a gui window with an image that contains
# boxes and labels for found objects.  will not return until
# user presses a key.
# source_image is the original image for the inference before it was resized or otherwise changed.
# filtered_objects is a list of lists (as returned from filter_objects())
# each of the inner lists represent one found object and contain
# the following 6 values:
#    string that is network classification ie 'cat', or 'chair' etc
#    float value for box center X pixel location within the network input image
#    float value for box center Y pixel location within the network input image
#    float value for box width in pixels within the network input image
#    float value for box height in pixels within the network input image
#    float value that is the probability for the network classification.
def display_objects_in_gui(source_image, filtered_objects, network_input_w, network_input_h):
    # copy image so we can draw on it. Could just draw directly on source image if not concerned about that.
    display_image = source_image.copy()
    source_image_width = source_image.shape[1]
    source_image_height = source_image.shape[0]

    x_ratio = float(source_image_width) / network_input_w
    y_ratio = float(source_image_height) / network_input_h
```

Drawing the bounding boxes on the input image

```
    # loop through each box and draw it on the image along with a classification label
    print('\n Found this many objects in the image: ' + str(len(filtered_objects)))
    for obj_index in range(len(filtered_objects)):
        center_x = int(filtered_objects[obj_index][1] * x_ratio) 
        center_y = int(filtered_objects[obj_index][2] * y_ratio)
        half_width = int(filtered_objects[obj_index][3] * x_ratio)//2
        half_height = int(filtered_objects[obj_index][4] * y_ratio)//2

        # calculate box (left, top) and (right, bottom) coordinates
        box_left = max(center_x - half_width, 0)
        box_top = max(center_y - half_height, 0)
        box_right = min(center_x + half_width, source_image_width)
        box_bottom = min(center_y + half_height, source_image_height)

        print(' - object: ' + YELLOW + str(filtered_objects[obj_index][0]) + NOCOLOR + ' is at left: ' + str(box_left) + ', top: ' + str(box_top) + ', right: ' + str(box_right) + ', bottom: ' + str(box_bottom))  

        #draw the rectangle on the image.  This is hopefully around the object
        box_color = (0, 255, 0)  # green box
        box_thickness = 2
        cv2.rectangle(display_image, (box_left, box_top),(box_right, box_bottom), box_color, box_thickness)
```
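The coordinate arithmetic in this loop can be pulled out into a small function and checked without a GUI. to_source_rect is a hypothetical helper that mirrors the statements above; it assumes each entry of filtered_objects has the [label, center_x, center_y, width, height, score] layout returned by filter_objects, in network-input pixels.

```python
def to_source_rect(obj, network_w, network_h, source_w, source_h):
    """Return (left, top, right, bottom) in source-image pixels, clipped like the loop above."""
    x_ratio = float(source_w) / network_w
    y_ratio = float(source_h) / network_h
    center_x = int(obj[1] * x_ratio)
    center_y = int(obj[2] * y_ratio)
    half_width = int(obj[3] * x_ratio) // 2
    half_height = int(obj[4] * y_ratio) // 2
    # clip to the image boundaries
    left = max(center_x - half_width, 0)
    top = max(center_y - half_height, 0)
    right = min(center_x + half_width, source_w)
    bottom = min(center_y + half_height, source_h)
    return left, top, right, bottom

# Hypothetical detections on a 448x448 network input, shown on an 896x672 image
print(to_source_rect(['dog', 224.0, 300.0, 100.0, 200.0, 0.9], 448, 448, 896, 672))
print(to_source_rect(['car', 20.0, 10.0, 80.0, 40.0, 0.5], 448, 448, 896, 672))
```

The second detection sits near the top-left corner, so its left and top edges are clipped to 0.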

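A label with the class name and BB probability is drawn along the top edge of each bounding box.
The label is drawn as a filled rectangle first, with the text then drawn on top of it.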

```
        # draw the classification label string along the top edge, inside the rectangle
        label_background_color = (70, 120, 70) # greyish green background for text
        label_text_color = (255, 255, 255)   # white text
        cv2.rectangle(display_image,(box_left, box_top+20),(box_right,box_top), label_background_color, -1)
        cv2.putText(display_image,filtered_objects[obj_index][0] + ' : %.2f' % filtered_objects[obj_index][5], (box_left+5, box_top+15), cv2.FONT_HERSHEY_SIMPLEX, 0.5, label_text_color, 1)
```

Show the image in a window and exit when a key is pressed

```
    window_name = 'TinyYolo (hit key to exit)'
    cv2.imshow(window_name, display_image)
    cv2.moveWindow(window_name, 10, 10)

    while (True):
        raw_key = cv2.waitKey(1)

        # check if the window is visible, this means the user hasn't closed
        # the window via the X button (may only work with opencv 3.x)
        prop_val = cv2.getWindowProperty(window_name, cv2.WND_PROP_ASPECT_RATIO)
        if ((raw_key != -1) or (prop_val < 0.0)):
            # the user hit a key or closed the window (in that order)
            break
```


Object detection application with camera / video file input

Having more or less understood the source code through the previous sections, I extend it to support a camera and video files.


Running the program (still image): tiny_yolo_v1_a.py

```
pi@raspberrypi:~/ncappzoo/networks/tiny_yolo_v1 $ python3 tiny_yolo_v1_a.py

--- ** tiny_Yolo_v1_a ** Object identification ---
4.5.1-openvino
OpenVINO inference_engine: 2.1.2021.2.0-1877-176bdf51370-releases/2021/2

Running NCS Caffe TinyYolo example...
tiny_yolo_v1_a.py:366: DeprecationWarning: 'inputs' property of IENetwork class is deprecated. To access DataPtrs user need to use 'input_data' property of InputInfoPtr objects which can be accessed by 'input_info' property.
  input_blob = next(iter(net.inputs))

Tiny Yolo v1: Starting application...
   - Plugin:       Myriad
   - IR File:      tiny-yolo-v1_53000.xml
   - Input Shape:  [1, 3, 448, 448]
   - Output Shape: [1, 1470]
   - Labels File:  labels_jp.txt
   - Image File:    ../../data/images/nps_chair.png
   - Threshold:    0.1
   - Intersection Over Union:    0.25

 Displaying image with objects detected in GUI...
 Click in the GUI window and hit any key to exit.

 Found this many objects in the image: 1
 - object: chair is at left: 221, top: 176, right: 601, bottom: 688

 Finished.
```

Examples with command-line options

```
pi@raspberrypi:~/ncappzoo/networks/tiny_yolo_v1 $ python3 tiny_yolo_v1_a.py -i ./dog.jpg
pi@raspberrypi:~/ncappzoo/networks/tiny_yolo_v1 $ python3 tiny_yolo_v1_a.py -i ./person.jpg
pi@raspberrypi:~/ncappzoo/networks/tiny_yolo_v1 $ python3 tiny_yolo_v1_a.py -i ./dog.jpg --threshold 0.001
```


Running the program (camera / video file): tiny_yolo_v1_b.py

```
pi@raspberrypi-mas:~/ncappzoo/networks/tiny_yolo_v1 $ python3 tiny_yolo_v1_b.py

--- ** tiny_Yolo_v1_b ** Object identification ---
4.5.1-openvino
OpenVINO inference_engine: 2.1.2021.2.0-1877-176bdf51370-releases/2021/2

Running NCS Caffe TinyYolo example...
tiny_yolo_v1_b.py:368: DeprecationWarning: 'inputs' property of IENetwork class is deprecated. To access DataPtrs user need to use 'input_data' property of InputInfoPtr objects which can be accessed by 'input_info' property.
  input_blob = next(iter(net.inputs))

Tiny Yolo v1: Starting application...
   - Plugin:       Myriad
   - IR File:      tiny-yolo-v1_53000.xml
   - Input Shape:  [1, 3, 448, 448]
   - Output Shape: [1, 1470]
   - Labels File:  labels_jp.txt
   - Input File:    0
   - Threshold:    0.1
   - Intersection Over Union:    0.25

 Finished.
```