RevYOLOv5

[Review] Object Detection Algorithm "YOLO V5" †

This page reviews "YOLO V5", which is used in Chapter04 of the book "PyTorch ではじめる AI開発".
It is a full revision of the earlier page "Object Detection Algorithm "YOLO V5"".


※ Last updated: 2024/04/16


Official YOLOv5 Study 1: Inference / Model Conversion †

Download the following project package:
update_20240405.zip (60.7MB) <update file>
After extraction, copy the directories inside "workspace_pylearn/" into the corresponding "yolov5" and "yolov5_demo" directories created by the "git clone" commands described below.


What Is Object Detection? †

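Object detection is a technique for finding specific objects, such as a "dog" or a "bicycle", in an image.
An object detection model takes an image as input and outputs bounding boxes (rectangles that enclose each object) together with the class label for each box.
Object detection is one of the basic tasks in image processing and is the foundation for applied tasks such as object tracking and pose estimation.
In recent years many methods have been proposed, including YOLO, Faster R-CNN, SSD, RetinaNet, and CenterNet, and many researchers have published accurate, fast detection models.
Detection accuracy is approaching a level that is sufficient for practical use, and solutions that apply image processing technology are being announced one after another.
Quoted from → https://www.ariseanalytics.com/activities/report/20210521/
See also → What is Image Recognition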
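
As a rough illustration of the output described above, the following sketch models one detection as a bounding box plus a class label and a confidence score. The `Detection` class and its field names are hypothetical and chosen only for this example; they are not part of YOLOv5.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object (hypothetical structure for illustration)."""

    x1: float          # left edge of the bounding box (pixels)
    y1: float          # top edge
    x2: float          # right edge
    y2: float          # bottom edge
    label: str         # class label, e.g. "dog"
    confidence: float  # score in the range 0.0 - 1.0

    @property
    def area(self) -> float:
        """Area of the bounding box in square pixels."""
        return (self.x2 - self.x1) * (self.y2 - self.y1)


# A detector returns a list of such records for each input image.
detections = [
    Detection(10, 20, 110, 220, "dog", 0.91),
    Detection(150, 40, 400, 300, "bicycle", 0.78),
]
labels = [d.label for d in detections]
```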


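About YOLO †

　The name "YOLO" stands for "You Only Look Once", a play on the phrase "You only live once".

It is an algorithm for real-time image recognition, implemented on a framework called Darknet.
It uses a network called an FCN; this can also be built in machine learning frameworks other than Darknet, and TensorFlow and PyTorch versions of YOLO have already been implemented.
Its stated strength is that it can "detect objects at a single glance, like a human".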

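Label file for the classes the model was trained on (bundled in the project package "update_20240405.zip")
- The model is trained on the 80-class COCO* dataset

Download coco.names from the site
→ https://raw.githubusercontent.com/pjreddie/darknet/master/data/coco.names
Open coco.names in a text editor, delete line 81 (a final line that contains only whitespace), and save the file
Translate coco.names to create coco.names_jp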
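
Instead of editing the file by hand, the blank last line can also be ignored when the labels are loaded. The following is a minimal sketch; `load_labels` is a hypothetical helper name, and the file is assumed to hold one class name per line in UTF-8.

```python
from pathlib import Path


def load_labels(path):
    """Read a label file with one class name per line, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

With the downloaded file, `load_labels("coco.names")` should then return 80 names, and the list index of each name is its class ID.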

Label index list

ID	coco.names	coco.names_jp	ID	coco.names	coco.names_jp
0	person	人	40	wine glass	ワイングラス
1	bicycle	自転車	41	cup	カップ
2	car	車	42	fork	フォーク
3	motorbike	バイク	43	knife	ナイフ
4	aeroplane	飛行機	44	spoon	スプーン
5	bus	バス	45	bowl	丼鉢
6	train	列車	46	banana	バナナ
7	truck	トラック	47	apple	リンゴ
8	boat	ボート	48	sandwich	サンドイッチ
9	traffic light	信号機	49	orange	オレンジ
10	fire hydrant	消火栓	50	broccoli	ブロッコリー
11	stop sign	一時停止標識	51	carrot	人参
12	parking meter	パーキングメーター	52	hot dog	ホットドッグ
13	bench	ベンチ	53	pizza	ピザ
14	bird	鳥	54	donut	ドーナッツ
15	cat	猫	55	cake	ケーキ
16	dog	犬	56	chair	椅子
17	horse	馬	57	sofa	ソファー
18	sheep	羊	58	pottedplant	鉢植え
19	cow	牛	59	bed	ベッド
20	elephant	象	60	diningtable	ダイニングテーブル
21	bear	熊	61	toilet	トイレ
22	zebra	シマウマ	62	tvmonitor	テレビ
23	giraffe	キリン	63	laptop	ラップトップコンピューター
24	backpack	バックパック	64	mouse	マウス
25	umbrella	傘	65	remote	リモコン
26	handbag	ハンドバック	66	keyboard	キーボード
27	tie	ネクタイ	67	cell phone	携帯電話
28	suitcase	スーツケース	68	microwave	電子レンジ
29	frisbee	フリスビー	69	oven	オーブン
30	skis	スキー板	70	toaster	トースター
31	snowboard	スノーボード	71	sink	キッチン・シンク
32	sports ball	スポーツボール	72	refrigerator	冷蔵庫
33	kite	凧	73	book	本
34	baseball bat	野球のバット	74	clock	時計
35	baseball glove	野球のグローブ	75	vase	花瓶
36	skateboard	スケートボード	76	scissors	ハサミ
37	surfboard	サーフボード	77	teddy bear	テディベア
38	tennis racket	テニスラケット	78	hair drier	ヘアドライヤー
39	bottle	瓶	79	toothbrush	歯ブラシ


Installing YOLOv5 on the Local Machine †

Activate the virtual environment "py_learn"
```
(base) conda activate py_learn
```
Change to the project working directory

　On Windows　
```
(py_learn) PS > cd /anaconda_win/workspace_pylearn
```
　On Linux　
```
(py_learn) $ cd ~/workspace_pylearn
```
Clone "YOLOv5" from https://github.com/ultralytics/yolov5 with the following command
```
(py_learn) git clone https://github.com/ultralytics/yolov5
```
・The package list file "requirements.txt" is not used; only the packages missing from the current environment are installed
・Project directory
```
c:\anaconda_win\workspace_pylearn\     ← on Windows
~/workspace_pylearn/                   ← on Linux
  ├ chapter01
  ├ chapter02 
  ├ forest-path-movie-dataset
  ├ sample
  │    :
  └ yolov5
```
Copy the contents of "workspace_pylearn/yolov5", obtained by extracting the update_20240405.zip mentioned at the top of this page, into the "yolov5" directory created by "git clone"


Running the YOLOv5 Inference Program †

Project working directory: "workspace_pylearn/yolov5/"

Run inference on camera images

(py_learn) cd yolov5
(py_learn) python detect.py --source 0

・Result

(py_learn) python detect.py --source 0
Traceback (most recent call last):
  File "C:\anaconda_win\workspace_pylearn\yolov5\detect.py", line 46, in <module>
    from ultralytics.utils.plotting import Annotator, colors, save_one_box
ModuleNotFoundError: No module named 'ultralytics'

The package "ultralytics" appears to be missing, so install it
```
(py_learn) pip install ultralytics
```
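
Because "requirements.txt" is not used here, it can help to check in advance which modules are missing instead of waiting for a ModuleNotFoundError. The following is a minimal sketch; `find_missing` is a hypothetical helper, and the module list is taken from the import statements of detect.py. Note that a module name can differ from its pip package name (for example, `cv2` is provided by opencv-python).

```python
import importlib.util


def find_missing(modules):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


# Top-level third-party modules imported by detect.py.
required = ["torch", "cv2", "ultralytics"]
missing = find_missing(required)
print("missing:", missing)
```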

Run again with camera input
・To quit, press 'Ctrl' + 'c' in the terminal

(py_learn) python detect.py --source 0

・Result

(py_learn) python detect.py --source 0
detect: weights=yolov5s.pt, source=0, data=data\coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_csv=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs\detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
YOLOv5  v7.0-294-gdb125a20 Python-3.11.7 torch-2.2.0+cu121 CUDA:0 (NVIDIA GeForce RTX 4070 Ti, 12282MiB)

Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
1/1: 0...  Success (inf frames 640x480 at 30.00 FPS)

0: 480x640 1 person, 1 chair, 198.1ms
0: 480x640 1 person, 8.0ms
0: 480x640 1 person, 1 chair, 5.0ms
0: 480x640 1 person, 1 chair, 4.0ms
    :
    :
0: 480x640 1 person, 2 chairs, 6.0ms
0: 480x640 1 person, 1 chair, 16.0ms
Traceback (most recent call last):
    :
    :
KeyboardInterrupt


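Modifying the Program: "detect.py" → "detect2.py" †

When the input source is the camera ('0'), there is no way to end the program cleanly, so the results cannot be saved properly.
The program is changed so that it can be ended with the 'Esc' key.
(The modified program is bundled in the project package "update_20240405.zip")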

## Official YOLOv5 https://github.com/ultralytics/yolov5
##
## detect2.py        (original: detect.py)
##  ver 0.01    2024.03.12      'Esc' key Break
        :

        :
    # Run inference
    model.warmup(imgsz=(1 if pt or model.triton else bs, 3, *imgsz))                    # warmup
    seen, windows, dt = 0, [], (Profile(device=device), Profile(device=device), Profile(device=device))

    break_flag = False                                      # 'Esc' key Break     2024/03/12

    for path, im, im0s, vid_cap, s in dataset:

        if break_flag:                                      # 'Esc' key Break     2024/03/12
            break

        with dt[0]:
        :

        :
            # Stream results
            im0 = annotator.result()
            if view_img:
#                if platform.system() == "Linux" and p not in windows:
#                    windows.append(p)
#                    cv2.namedWindow(str(p), cv2.WINDOW_NORMAL | cv2.WINDOW_KEEPRATIO)  # allow window resize (Linux)
#                    cv2.resizeWindow(str(p), im0.shape[1], im0.shape[0])
                cv2.namedWindow(str(p), flags=cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) # 2024/03/12
                cv2.imshow(str(p), im0)
#                cv2.waitKey(1)                             # 1 millisecond

                ## 'Esc' key Break    2023/06/18
                c = cv2.waitKey(1)                          # 1 millisecond
                if c == 27: 
                    break_flag = True
                    break 

            # Save results (image with detections)
        :
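
The change above relies on a flag because `break` only leaves the inner per-image loop; the flag is then checked at the top of the outer loop. The following sketch isolates that pattern. The key codes are simulated so it runs without a camera, and `run_frames` and its arguments are hypothetical names. In the real program the key code comes from `cv2.waitKey(1)`, which returns -1 when no key was pressed.

```python
ESC = 27  # key code of the 'Esc' key


def run_frames(batches, keys):
    """Process images batch by batch and stop as soon as ESC is read.

    batches: list of batches, each a list of images (any objects)
    keys:    simulated key codes, one per displayed image
    """
    processed = []
    key_iter = iter(keys)
    break_flag = False
    for batch in batches:              # outer loop (frames from the dataset)
        if break_flag:                 # set by the inner loop on 'Esc'
            break
        for image in batch:            # inner loop (images in one batch)
            processed.append(image)
            key = next(key_iter, -1)   # -1 stands for "no key pressed"
            if key == ESC:
                break_flag = True
                break                  # leaves only the inner loop
    return processed


result = run_frames([["a", "b"], ["c", "d"], ["e"]], [-1, -1, ESC, -1])
```

Because the loop ends normally rather than through KeyboardInterrupt, the code after the loop (the speed summary and the saving of results) still runs.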

Suppress the per-frame inference log output to the terminal

        # Print time (inference-only)
        # No per-frame output 2024/03/12
#        LOGGER.info(f"{s}{'' if len(det) else '(no detections), '}{dt[1].dt * 1E3:.1f}ms")

Modified source code

▼ "detect2.py"

# -*- coding: utf-8 -*-
##------------------------------------------
##  Object detection YOLO V5      Ver 0.01
##    Inference program
##
##               2024.03.12 Masahiro Izutsu
##------------------------------------------
## Official YOLOv5 https://github.com/ultralytics/yolov5
##
## detect2.py        (original: detect.py)
##  ver 0.01    2024.03.12      'Esc' key Break

# YOLOv5 🚀 by Ultralytics, AGPL-3.0 license
"""
Run YOLOv5 detection inference on images, videos, directories, globs, YouTube, webcam, streams, etc.

Usage - sources:
    $ python detect.py --weights yolov5s.pt --source 0                               # webcam
                                                     img.jpg                         # image
                                                     vid.mp4                         # video
                                                     screen                          # screenshot
                                                     path/                           # directory
                                                     list.txt                        # list of images
                                                     list.streams                    # list of streams
                                                     'path/*.jpg'                    # glob
                                                     'https://youtu.be/LNwODJXcvt4'  # YouTube
                                                     'rtsp://example.com/media.mp4'  # RTSP, RTMP, HTTP stream

Usage - formats:
    $ python detect.py --weights yolov5s.pt                 # PyTorch
                                 yolov5s.torchscript        # TorchScript
                                 yolov5s.onnx               # ONNX Runtime or OpenCV DNN with --dnn
                                 yolov5s_openvino_model     # OpenVINO
                                 yolov5s.engine             # TensorRT
                                 yolov5s.mlmodel            # CoreML (macOS-only)
                                 yolov5s_saved_model        # TensorFlow SavedModel
                                 yolov5s.pb                 # TensorFlow GraphDef
                                 yolov5s.tflite             # TensorFlow Lite
                                 yolov5s_edgetpu.tflite     # TensorFlow Edge TPU
                                 yolov5s_paddle_model       # PaddlePaddle
"""

import argparse
import csv
import os
import platform
import sys
from pathlib import Path

import torch

FILE = Path(__file__).resolve()
ROOT = FILE.parents[0]                                      # YOLOv5 root directory
if str(ROOT) not in sys.path:
    sys.path.append(str(ROOT))                              # add ROOT to PATH
ROOT = Path(os.path.relpath(ROOT, Path.cwd()))              # relative

from ultralytics.utils.plotting import Annotator, colors, save_one_box

from models.common import DetectMultiBackend
from utils.dataloaders import IMG_FORMATS, VID_FORMATS, LoadImages, LoadScreenshots, LoadStreams
from utils.general import (
    LOGGER,
    Profile,
    check_file,
    check_img_size,
    check_imshow,
    check_requirements,
    colorstr,
    cv2,
    increment_path,
    non_max_suppression,
    print_args,
    scale_boxes,
    strip_optimizer,
    xyxy2xywh,
)
from utils.torch_utils import select_device, smart_inference_mode


@smart_inference_mode()
def run(
    weights=ROOT / "yolov5s.pt",                            # model path or triton URL
    source=ROOT / "data/images",                            # file/dir/URL/glob/screen/0(webcam)
    data=ROOT / "data/coco128.yaml",                        # dataset.yaml path
    imgsz=(640, 640),                                       # inference size (height, width)
    conf_thres=0.25,                                        # confidence threshold
    iou_thres=0.45,                                         # NMS IOU threshold
    max_det=1000,                                           # maximum detections per image
    device="",                                              # cuda device, i.e. 0 or 0,1,2,3 or cpu
    view_img=False,                                         # show results
    save_txt=False,                                         # save results to *.txt
    save_csv=False,                                         # save results in CSV format
    save_conf=False,                                        # save confidences in --save-txt labels
    save_crop=False,                                        # save cropped prediction boxes
    nosave=False,                                           # do not save images/videos
    classes=None,                                           # filter by class: --class 0, or --class 0 2 3
    agnostic_nms=False,                                     # class-agnostic NMS
    augment=False,                                          # augmented inference
    visualize=False,                                        # visualize features
    update=False,                                           # update all models
    project=ROOT / "runs/detect",                           # save results to project/name
    name="exp",                                             # save results to project/name
    exist_ok=False,                                         # existing project/name ok, do not increment
    line_thickness=3,                                       # bounding box thickness (pixels)
    hide_labels=False,                                      # hide labels
    hide_conf=False,                                        # hide confidences
    half=False,                                             # use FP16 half-precision inference
    dnn=False,                                              # use OpenCV DNN for ONNX inference
    vid_stride=1,                                           # video frame-rate stride
):
    source = str(source)
    save_img = not nosave and not source.endswith(".txt")   # save inference images
    is_file = Path(source).suffix[1:] in (IMG_FORMATS + VID_FORMATS)
    is_url = source.lower().startswith(("rtsp://", "rtmp://", "http://", "https://"))
    webcam = source.isnumeric() or source.endswith(".streams") or (is_url and not is_file)
    screenshot = source.lower().startswith("screen")
    if is_url and is_file:
        source = check_file(source)                         # download

    # Directories
    save_dir = increment_path(Path(project) / name, exist_ok=exist_ok)                  # increment run
    (save_dir / "labels" if save_txt else save_dir).mkdir(parents=True, exist_ok=True)  # make dir

    # Load model
    device = select_device(device)
    model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data, fp16=half)
    stride, names, pt = model.stride, model.names, model.pt
    imgsz = check_img_size(imgsz, s=stride)                 # check image size

    # Dataloader
    bs = 1  # batch_size
    if webcam:
        view_img = check_imshow(warn=True)
        dataset = LoadStreams(source, img_size=imgsz, stride=stride, auto=pt, vid_stride=vid_stride)
        bs = len(dataset)
    elif screenshot:
        dataset = LoadScreenshots(source, img_size=imgsz, stride=stride, auto=pt)
    else:
        dataset = LoadImages(source, img_size=imgsz, stride=stride, auto=pt, vid_stride=vid_stride)
    vid_path, vid_writer = [None] * bs, [None] * bs

    # Run inference
    model.warmup(imgsz=(1 if pt or model.triton else bs, 3, *imgsz))                    # warmup
    seen, windows, dt = 0, [], (Profile(device=device), Profile(device=device), Profile(device=device))

    break_flag = False                                      # 'Esc' key Break     2024/03/12

    for path, im, im0s, vid_cap, s in dataset:

        if break_flag:                                      # 'Esc' key Break     2024/03/12
            break

        with dt[0]:
            im = torch.from_numpy(im).to(model.device)
            im = im.half() if model.fp16 else im.float()    # uint8 to fp16/32
            im /= 255  # 0 - 255 to 0.0 - 1.0
            if len(im.shape) == 3:
                im = im[None]  # expand for batch dim
            if model.xml and im.shape[0] > 1:
                ims = torch.chunk(im, im.shape[0], 0)

        # Inference
        with dt[1]:
            visualize = increment_path(save_dir / Path(path).stem, mkdir=True) if visualize else False
            if model.xml and im.shape[0] > 1:
                pred = None
                for image in ims:
                    if pred is None:
                        pred = model(image, augment=augment, visualize=visualize).unsqueeze(0)
                    else:
                        pred = torch.cat((pred, model(image, augment=augment, visualize=visualize).unsqueeze(0)), dim=0)
                pred = [pred, None]
            else:
                pred = model(im, augment=augment, visualize=visualize)
        # NMS
        with dt[2]:
            pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)

        # Second-stage classifier (optional)
        # pred = utils.general.apply_classifier(pred, classifier_model, im, im0s)

        # Define the path for the CSV file
        csv_path = save_dir / "predictions.csv"

        # Create or append to the CSV file
        def write_to_csv(image_name, prediction, confidence):
            """Writes prediction data for an image to a CSV file, appending if the file exists."""
            data = {"Image Name": image_name, "Prediction": prediction, "Confidence": confidence}
            file_exists = csv_path.is_file()                # check before open(): mode "a" creates the file
            with open(csv_path, mode="a", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=data.keys())
                if not file_exists:                         # write the header only for a new file
                    writer.writeheader()
                writer.writerow(data)

        # Process predictions
        for i, det in enumerate(pred):                      # per image
            seen += 1
            if webcam:                                      # batch_size >= 1
                p, im0, frame = path[i], im0s[i].copy(), dataset.count
                s += f"{i}: "
            else:
                p, im0, frame = path, im0s.copy(), getattr(dataset, "frame", 0)

            p = Path(p)                                     # to Path
            save_path = str(save_dir / p.name)              # im.jpg
            txt_path = str(save_dir / "labels" / p.stem) + ("" if dataset.mode == "image" else f"_{frame}")  # im.txt
            s += "%gx%g " % im.shape[2:]                    # print string
            gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]      # normalization gain whwh
            imc = im0.copy() if save_crop else im0          # for save_crop
            annotator = Annotator(im0, line_width=line_thickness, example=str(names))
            if len(det):
                # Rescale boxes from img_size to im0 size
                det[:, :4] = scale_boxes(im.shape[2:], det[:, :4], im0.shape).round()

                # Print results
                for c in det[:, 5].unique():
                    n = (det[:, 5] == c).sum()              # detections per class
                    s += f"{n} {names[int(c)]}{'s' * (n > 1)}, "  # add to string

                # Write results
                for *xyxy, conf, cls in reversed(det):
                    c = int(cls)                            # integer class
                    label = names[c] if hide_conf else f"{names[c]}"
                    confidence = float(conf)
                    confidence_str = f"{confidence:.2f}"

                    if save_csv:
                        write_to_csv(p.name, label, confidence_str)

                    if save_txt:                            # Write to file
                        xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
                        line = (cls, *xywh, conf) if save_conf else (cls, *xywh)  # label format
                        with open(f"{txt_path}.txt", "a") as f:
                            f.write(("%g " * len(line)).rstrip() % line + "\n")

                    if save_img or save_crop or view_img:   # Add bbox to image
                        c = int(cls)                        # integer class
                        label = None if hide_labels else (names[c] if hide_conf else f"{names[c]} {conf:.2f}")
                        annotator.box_label(xyxy, label, color=colors(c, True))
                    if save_crop:
                        save_one_box(xyxy, imc, file=save_dir / "crops" / names[c] / f"{p.stem}.jpg", BGR=True)

            # Stream results
            im0 = annotator.result()
            if view_img:
#                if platform.system() == "Linux" and p not in windows:
#                    windows.append(p)
#                    cv2.namedWindow(str(p), cv2.WINDOW_NORMAL | cv2.WINDOW_KEEPRATIO)  # allow window resize (Linux)
#                    cv2.resizeWindow(str(p), im0.shape[1], im0.shape[0])
                cv2.namedWindow(str(p), flags=cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) # 2024/03/12
                cv2.imshow(str(p), im0)
#                cv2.waitKey(1)                             # 1 millisecond

                ## 'Esc' key Break    2023/06/18
                c = cv2.waitKey(1)                          # 1 millisecond
                if c == 27: 
                    break_flag = True
                    break 


            # Save results (image with detections)
            if save_img:
                if dataset.mode == "image":
                    cv2.imwrite(save_path, im0)
                else:                                       # 'video' or 'stream'
                    if vid_path[i] != save_path:            # new video
                        vid_path[i] = save_path
                        if isinstance(vid_writer[i], cv2.VideoWriter):
                            vid_writer[i].release()         # release previous video writer
                        if vid_cap:                         # video
                            fps = vid_cap.get(cv2.CAP_PROP_FPS)
                            w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
                            h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
                        else:                               # stream
                            fps, w, h = 30, im0.shape[1], im0.shape[0]
                        save_path = str(Path(save_path).with_suffix(".mp4"))  # force *.mp4 suffix on results videos
                        vid_writer[i] = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
                    vid_writer[i].write(im0)

        # Print time (inference-only)
        # No per-frame output 2024/03/12
#        LOGGER.info(f"{s}{'' if len(det) else '(no detections), '}{dt[1].dt * 1E3:.1f}ms")

    # Print results
    t = tuple(x.t / seen * 1e3 for x in dt)                 # speeds per image
    LOGGER.info(f"Speed: %.1fms pre-process, %.1fms inference, %.1fms NMS per image at shape {(1, 3, *imgsz)}" % t)
    if save_txt or save_img:
        s = f"\n{len(list(save_dir.glob('labels/*.txt')))} labels saved to {save_dir / 'labels'}" if save_txt else ""
        LOGGER.info(f"Results saved to {colorstr('bold', save_dir)}{s}")
    if update:
        strip_optimizer(weights[0])                         # update model (to fix SourceChangeWarning)


def parse_opt():
    """Parses command-line arguments for YOLOv5 detection, setting inference options and model configurations."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--weights", nargs="+", type=str, default=ROOT / "yolov5s.pt", help="model path or triton URL")
    parser.add_argument("--source", type=str, default=ROOT / "data/images", help="file/dir/URL/glob/screen/0(webcam)")
    parser.add_argument("--data", type=str, default=ROOT / "data/coco128.yaml", help="(optional) dataset.yaml path")
    parser.add_argument("--imgsz", "--img", "--img-size", nargs="+", type=int, default=[640], help="inference size h,w")
    parser.add_argument("--conf-thres", type=float, default=0.25, help="confidence threshold")
    parser.add_argument("--iou-thres", type=float, default=0.45, help="NMS IoU threshold")
    parser.add_argument("--max-det", type=int, default=1000, help="maximum detections per image")
    parser.add_argument("--device", default="", help="cuda device, i.e. 0 or 0,1,2,3 or cpu")
    parser.add_argument("--view-img", action="store_true", help="show results")
    parser.add_argument("--save-txt", action="store_true", help="save results to *.txt")
    parser.add_argument("--save-csv", action="store_true", help="save results in CSV format")
    parser.add_argument("--save-conf", action="store_true", help="save confidences in --save-txt labels")
    parser.add_argument("--save-crop", action="store_true", help="save cropped prediction boxes")
    parser.add_argument("--nosave", action="store_true", help="do not save images/videos")
    parser.add_argument("--classes", nargs="+", type=int, help="filter by class: --classes 0, or --classes 0 2 3")
    parser.add_argument("--agnostic-nms", action="store_true", help="class-agnostic NMS")
    parser.add_argument("--augment", action="store_true", help="augmented inference")
    parser.add_argument("--visualize", action="store_true", help="visualize features")
    parser.add_argument("--update", action="store_true", help="update all models")
    parser.add_argument("--project", default=ROOT / "runs/detect", help="save results to project/name")
    parser.add_argument("--name", default="exp", help="save results to project/name")
    parser.add_argument("--exist-ok", action="store_true", help="existing project/name ok, do not increment")
    parser.add_argument("--line-thickness", default=3, type=int, help="bounding box thickness (pixels)")
    parser.add_argument("--hide-labels", default=False, action="store_true", help="hide labels")
    parser.add_argument("--hide-conf", default=False, action="store_true", help="hide confidences")
    parser.add_argument("--half", action="store_true", help="use FP16 half-precision inference")
    parser.add_argument("--dnn", action="store_true", help="use OpenCV DNN for ONNX inference")
    parser.add_argument("--vid-stride", type=int, default=1, help="video frame-rate stride")
    opt = parser.parse_args()
    opt.imgsz *= 2 if len(opt.imgsz) == 1 else 1  # expand
    print_args(vars(opt))
    return opt


def main(opt):
    """Executes YOLOv5 model inference with given options, checking requirements before running the model."""
    check_requirements(ROOT / "requirements.txt", exclude=("tensorboard", "thop"))
    run(**vars(opt))


if __name__ == "__main__":
    opt = parse_opt()
    main(opt)


Running the Inference Program "detect2.py" †

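The working directory is "workspace_pylearn/yolov5/"
Results are saved in the "yolov5/runs/detect/exp" directory (then exp2, exp3, exp4, ...)
A new "exp*" directory with an incremented number is created on every run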
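
The numbering of the result directories can be pictured with the following simplified sketch. It is not the `increment_path` implementation used by YOLOv5, only an illustration of the naming rule seen in the logs; `next_run_dir` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def next_run_dir(base):
    """Return base if it does not exist yet, otherwise base2, base3, ..."""
    base = Path(base)
    if not base.exists():
        return base
    n = 2
    while Path(f"{base}{n}").exists():
        n += 1
    return Path(f"{base}{n}")


# Demonstration in a temporary directory: three consecutive "runs".
with tempfile.TemporaryDirectory() as tmp:
    created = []
    for _ in range(3):
        run_dir = next_run_dir(Path(tmp) / "runs" / "detect" / "exp")
        run_dir.mkdir(parents=True)
        created.append(run_dir.name)
```

Specifying `--exist-ok` skips this numbering and reuses the existing directory.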

Camera input (press the 'Esc' key to quit)

(py_learn) python detect2.py --source 0

・Result

(py_learn) python detect2.py --source 0
detect2: weights=yolov5s.pt, source=0, data=data\coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_csv=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs\detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
YOLOv5  v7.0-294-gdb125a20 Python-3.11.7 torch-2.2.0+cu121 CUDA:0 (NVIDIA GeForce RTX 4070 Ti, 12282MiB)

Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
1/1: 0...  Success (inf frames 640x480 at 30.00 FPS)

Speed: 0.4ms pre-process, 8.9ms inference, 3.6ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs\detect\exp10
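
As a rough reading of the "Speed:" line, the three per-image times can be added up to estimate an upper bound on throughput. This is only a back-of-the-envelope sketch: it ignores camera capture, display, and file saving.

```python
# Per-image times from the "Speed:" line above (milliseconds).
pre_ms, infer_ms, nms_ms = 0.4, 8.9, 3.6

total_ms = pre_ms + infer_ms + nms_ms
fps_upper_bound = 1000.0 / total_ms
print(f"{total_ms:.1f} ms per image, about {fps_upper_bound:.1f} images/s")
```

Since the log reports the camera stream at 30.00 FPS, the camera rather than the model is probably the limiting factor in this run.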

Still-image sample input

(py_learn) python detect2.py

・Result

(py_learn) python detect2.py
detect: weights=yolov5s.pt, source=data\images, data=data\coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_csv=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs\detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
YOLOv5  v7.0-294-gdb125a20 Python-3.11.7 torch-2.2.0+cu121 CUDA:0 (NVIDIA GeForce RTX 4070 Ti, 12282MiB)

Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
image 1/2 C:\anaconda_win\workspace_pylearn\yolov5\data\images\bus.jpg: 640x480 4 persons, 1 bus, 48.9ms
image 2/2 C:\anaconda_win\workspace_pylearn\yolov5\data\images\zidane.jpg: 384x640 2 persons, 2 ties, 52.8ms
Speed: 0.0ms pre-process, 50.8ms inference, 74.6ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs\detect\exp3

Command parameters for "detect2.py" (1)
・--source <input source>

Input source	Type
0	webcam(0,1,...)
img.jpg	image
vid.mp4	video
screen	screenshot
path/	directory
list.txt	list of images
list.streams	list of streams
'path/*.jpg'	glob
'https://youtu.be/LNwODJXcvt4'	YouTube
'rtsp://example.com/media.mp4'	RTSP, RTMP, HTTP stream
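
How a `--source` value is routed to a loader can be sketched by repeating the checks at the start of `run()` in the listing above. `classify_source` is a hypothetical helper, and the two extension tuples are shortened stand-ins for the `IMG_FORMATS` and `VID_FORMATS` lists of YOLOv5.

```python
from pathlib import Path

# Shortened stand-ins for the real extension lists (assumption).
IMG_FORMATS = ("bmp", "jpeg", "jpg", "png")
VID_FORMATS = ("avi", "mkv", "mov", "mp4")


def classify_source(source):
    """Classify a --source value with the same checks as the start of run()."""
    source = str(source)
    is_file = Path(source).suffix[1:] in (IMG_FORMATS + VID_FORMATS)
    is_url = source.lower().startswith(("rtsp://", "rtmp://", "http://", "https://"))
    webcam = source.isnumeric() or source.endswith(".streams") or (is_url and not is_file)
    screenshot = source.lower().startswith("screen")
    if webcam:
        return "stream"      # handled by LoadStreams
    if screenshot:
        return "screenshot"  # handled by LoadScreenshots
    return "images"          # handled by LoadImages (files, directories, globs)


for src in ["0", "img.jpg", "screen", "path/", "list.streams"]:
    print(f"{src!r:16} -> {classify_source(src)}")
```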

・--weights <model file>

Model file	Type
yolov5s.pt	PyTorch
yolov5s.torchscript	TorchScript
yolov5s.onnx	ONNX Runtime or OpenCV DNN with --dnn
yolov5s_openvino_model	OpenVINO
yolov5s.engine	TensorRT
yolov5s.mlmodel	CoreML (macOS-only)
yolov5s_saved_model	TensorFlow SavedModel
yolov5s.pb	TensorFlow GraphDef
yolov5s.tflite	TensorFlow Lite
yolov5s_edgetpu.tflite	TensorFlow Edge TPU
yolov5s_paddle_model	PaddlePaddle

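Command parameters for "detect2.py" (2) (details)

Option	Argument	Default	Meaning
--weights	str	yolov5s.pt	Trained weights (model) file
--source	str	data/images	Path of the input source for inference (file/folder) (0,1,... = webcam)
--data	str	data/coco128.yaml	(optional) dataset.yaml path
--imgsz	int	(640, 640)	Inference image size (pixels)
--conf-thres	float	0.25	Confidence threshold for detections (lower values yield more objects but also more noise)
--iou-thres	float	0.45	NMS IoU threshold; IoU is Intersection over Union (the overlap ratio of detected regions; larger values mean more overlap)
--max-det	int	1000	maximum detections per image
--device	str		Processor to use (0 or 0,1,2,3 or cpu) (if omitted, CUDA is used when available)
--view-img	none	False	Show the inference results (displayed when specified)
--save-txt	none	False	Save the inference results (box coordinates and predicted class) to text files (*.txt)
--save-csv	none	False	save results in CSV format
--save-conf	none	False	Also write the confidence to the --save-txt label files (*.txt)
--save-crop	none	False	save cropped prediction boxes
--nosave	none	False	Do not save images/videos (not saved when specified)
--classes	int	None	Class filter (--classes 0, or --classes 0 2 3)
--agnostic-nms	none	False	class-agnostic NMS
--augment	none	False	Augmented inference
--visualize	none	False	visualize features
--update	none	False	Update the model
--project	str	runs/detect	Folder path where inference results are saved
--name	str	exp	Subfolder name under the results folder (incremented on each run)
--exist-ok	none	False	Reuse the existing results folder instead of incrementing (overwritten when specified)
--line-thickness	int	3	bounding box thickness (pixels)
--hide-labels	none	False	hide labels
--hide-conf	none	False	hide confidences
--half	none	False	use FP16 half-precision inference
--dnn	none	False	use OpenCV DNN for ONNX inference
--vid-stride	int	1	video frame-rate stride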
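
The roles of `--conf-thres` and `--iou-thres` can be illustrated with a small greedy non-maximum suppression (NMS) sketch in plain Python. This is a simplified illustration, not the `non_max_suppression` function of YOLOv5 (which also handles per-class processing, `--max-det`, and tensors); `iou` and `simple_nms` are hypothetical names.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def simple_nms(boxes, scores, conf_thres=0.25, iou_thres=0.45):
    """Greedy NMS: return the indices of the kept boxes, highest score first."""
    # 1) Drop low-confidence candidates, 2) visit the rest by descending score.
    order = sorted(
        (i for i, s in enumerate(scores) if s > conf_thres),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_thres for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 100, 100), (5, 5, 105, 105), (200, 200, 300, 300), (0, 0, 50, 50)]
scores = [0.90, 0.80, 0.70, 0.10]
kept = simple_nms(boxes, scores)
```

In this example the second box overlaps the first with an IoU of about 0.82 and is suppressed, and the last box falls below the confidence threshold. Raising `iou_thres` keeps more overlapping boxes; lowering `conf_thres` admits more weak candidates.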


Converting the Trained Model Format with "export.py" †

Formats supported by "export.py"

Format	`export.py --include` argument	Converted model file name
PyTorch	-	yolov5s.pt
TorchScript	`torchscript`	yolov5s.torchscript
ONNX	`onnx`	yolov5s.onnx
OpenVINO	`openvino`	yolov5s_openvino_model/ ※
TensorRT	`engine`	yolov5s.engine
CoreML	`coreml`	yolov5s.mlmodel
TensorFlow SavedModel	`saved_model`	yolov5s_saved_model/ ※
TensorFlow GraphDef	`pb`	yolov5s.pb
TensorFlow Lite	`tflite`	yolov5s.tflite
TensorFlow Edge TPU	`edgetpu`	yolov5s_edgetpu.tflite
TensorFlow.js	`tfjs`	yolov5s_web_model/ ※
PaddlePaddle	`paddle`	yolov5s_paddle_model/ ※

　※ Folder name (the converted model files are inside the folder)

Convert to ONNX and OpenVINO™

(py_learn2) python export.py --weights yolov5s.pt --include onnx openvino

・Result

(py_learn2) python export.py --weights yolov5s.pt --include onnx openvino
export: data=C:\anaconda_win\workspace_pylearn\yolov5\data\coco128.yaml, weights=['yolov5s.pt'], imgsz=[640, 640], batch_size=1, device=cpu, half=False, inplace=False, keras=False, optimize=False, int8=False, per_tensor=False, dynamic=False, simplify=False, opset=17, verbose=False, workspace=4, nms=False, agnostic_nms=False, topk_per_class=100, topk_all=100, iou_thres=0.45, conf_thres=0.25, include=['onnx', 'openvino']
YOLOv5  v7.0-294-gdb125a20 Python-3.11.8 torch-2.2.1+cu121 CPU

Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs

PyTorch: starting from yolov5s.pt with output shape (1, 25200, 85) (14.1 MB)

ONNX: starting export with onnx 1.15.0...
ONNX: export success  0.8s, saved as yolov5s.onnx (28.0 MB)

OpenVINO: starting export with openvino 2024.0.0-14509-34caeefd078-releases/2024/0...
OpenVINO: export success  1.4s, saved as yolov5s_openvino_model\ (28.2 MB)

Export complete (2.8s)
Results saved to C:\anaconda_win\workspace_pylearn\yolov5
Detect:          python detect.py --weights yolov5s_openvino_model\
Validate:        python val.py --weights yolov5s_openvino_model\
PyTorch Hub:     model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5s_openvino_model\')
Visualize:       https://netron.app
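
The output shape (1, 25200, 85) reported in the log above can be reproduced with a short calculation, assuming the standard YOLOv5s detection head with three scales (strides 8, 16, 32) and three anchors per grid cell.

```python
img_size = 640           # --imgsz used for the export above
strides = (8, 16, 32)    # strides of the three detection scales
anchors_per_cell = 3     # anchors per grid cell on each scale
num_classes = 80         # COCO classes

# One prediction per anchor per grid cell on each of the three scales.
num_predictions = sum(anchors_per_cell * (img_size // s) ** 2 for s in strides)
# Each prediction holds 4 box values, 1 objectness score, and the class scores.
values_per_prediction = 4 + 1 + num_classes

print((1, num_predictions, values_per_prediction))
```

The grids are 80x80, 40x40, and 20x20 cells, so 3 x (6400 + 1600 + 400) = 25200 candidate boxes are passed to NMS for each image.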

Package environment (the export may not work with some OpenVINO™ versions)

(py_learn2) python -V
Python 3.11.8

(py_learn2) conda list
    :
onnx                      1.15.0                   pypi_0    pypi
onnxruntime               1.17.1                   pypi_0    pypi
opencv                    4.6.0           py311h5d08a89_5
opencv-python             4.9.0.80                 pypi_0    pypi
openjpeg                  2.4.0                h4fc8c34_0
openssl                   3.0.13               h2bbff1b_0
openvino                  2024.0.0                 pypi_0    pypi
openvino-dev              2024.0.0                 pypi_0    pypi
openvino-telemetry        2023.2.1                 pypi_0    pypi
    :

(Reference) Converting with the OpenVINO™ conversion command

(py_learn) mo  --input_model yolov5s.onnx

・Execution result

(py_learn) mo  --input_model yolov5s.onnx
[ INFO ] Generated IR will be compressed to FP16. If you get lower accuracy, please consider disabling compression explicitly by adding argument --compress_to_fp16=False.
Find more information about compression to FP16 at https://docs.openvino.ai/2023.0/openvino_docs_MO_DG_FP16_Compression.html
[ INFO ] The model was converted to IR v11, the latest model format that corresponds to the source DL framework input/output format. While IR v11 is backwards compatible with OpenVINO Inference Engine API v1.0, please use API v2.0 (as of 2022.1) to take advantage of the latest improvements in IR v11.
Find more information about API v2.0 and IR v11 at https://docs.openvino.ai/2023.0/openvino_2_0_transition_guide.html
[ INFO ] MO command line tool is considered as the legacy conversion API as of OpenVINO 2023.2 release. Please use OpenVINO Model Converter (OVC). OVC represents a lightweight alternative of MO and provides simplified model conversion API.
Find more information about transition from MO to OVC at https://docs.openvino.ai/2023.2/openvino_docs_OV_Converter_UG_prepare_model_convert_model_MO_OVC_transition.html
[ SUCCESS ] Generated IR version 11 model.
[ SUCCESS ] XML file: C:\anaconda_win\workspace_pylearn\yolov5\yolov5s.xml
[ SUCCESS ] BIN file: C:\anaconda_win\workspace_pylearn\yolov5\yolov5s.bin
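The log above notes that `mo` is treated as a legacy tool as of OpenVINO 2023.2 and points to the OpenVINO Model Converter (OVC). A minimal sketch of the same conversion through the Python API might look as follows. It assumes an `openvino` package of version 2023.1 or later; the helper names `ir_paths` and `convert_onnx_to_ir` are placeholders chosen for this example and are not part of OpenVINO.

```python
from pathlib import Path


def ir_paths(onnx_path):
    """Return the .xml/.bin paths the IR is written to (next to the ONNX file)."""
    xml_path = Path(onnx_path).with_suffix(".xml")
    return xml_path, xml_path.with_suffix(".bin")


def convert_onnx_to_ir(onnx_path, compress_to_fp16=True):
    """Convert an ONNX model to OpenVINO IR with the OVC Python API."""
    import openvino as ov  # imported lazily so ir_paths() works without OpenVINO

    xml_path, _ = ir_paths(onnx_path)
    model = ov.convert_model(str(onnx_path))
    # save_model writes the .xml file and the matching .bin file.
    ov.save_model(model, str(xml_path), compress_to_fp16=compress_to_fp16)
    return xml_path


# Usage (requires OpenVINO and an existing yolov5s.onnx):
#   convert_onnx_to_ir("yolov5s.onnx")
print(ir_paths("yolov5s.onnx"))
```

The `compress_to_fp16` argument plays the same role as the `--compress_to_fp16` option mentioned in the first INFO line of the log.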

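About OpenVINO™-compatible programs
・A model converted this way does not work with access programs written in the conventional way → run the sample demo
・Conversion with the Model Optimizer bundled with the "openvino-dev" package is required
　→ Convert the ONNX file produced by "export.py" to OpenVINO™ IR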

↑

Running the inference program "detect2.py" with the converted pretrained models †

PyTorch (original) model "yolov5s.pt"

(py_learn2) python detect2.py --source ../../Videos/car_m.mp4 --view-img

・Execution result

(py_learn2) python detect2.py --source ../../Videos/car_m.mp4 --view-img

detect2: weights=yolov5s.pt, source=../../Videos/car_m.mp4, data=data\coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=True, save_txt=False, save_csv=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs\detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
YOLOv5  v7.0-294-gdb125a20 Python-3.11.8 torch-2.2.1+cu121 CUDA:0 (NVIDIA GeForce RTX 4070 Ti, 12282MiB)

Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
Speed: 0.4ms pre-process, 5.7ms inference, 2.6ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs\detect\exp17

ONNX model "yolov5s.onnx"

(py_learn2) python detect2.py --source ../../Videos/car_m.mp4 --view-img --weights yolov5s.onnx

・Execution result

(py_learn2) python detect2.py --source ../../Videos/car_m.mp4 --view-img --weights yolov5s.onnx

detect2: weights=['yolov5s.onnx'], source=../../Videos/car_m.mp4, data=data\coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=True, save_txt=False, save_csv=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs\detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
YOLOv5  v7.0-294-gdb125a20 Python-3.11.8 torch-2.2.1+cu121 CUDA:0 (NVIDIA GeForce RTX 4070 Ti, 12282MiB)

Loading yolov5s.onnx for ONNX Runtime inference...
requirements: Ultralytics requirement ['onnxruntime-gpu'] not found, attempting AutoUpdate...
ERROR: Could not install packages due to an OSError: [WinError 5] アクセスが拒否されました。: 'C:\\Users\\izuts\\anaconda3\\envs\\py_learn2\\Lib\\site-packages\\onnxruntime\\capi\\onnxruntime_providers_shared.dll'
Consider using the `--user` option or check the permissions.

requirements: ❌ Command 'pip install --no-cache "onnxruntime-gpu" ' returned non-zero exit status 1.
Speed: 1.3ms pre-process, 29.4ms inference, 4.1ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs\detect\exp18

OpenVINO™ model "yolov5s_openvino_model"

(py_learn2) python detect2.py --source ../../Videos/car_m.mp4 --view-img --weights yolov5s_openvino_model

・Execution result

(py_learn2) python detect2.py --source ../../Videos/car_m.mp4 --view-img --weights yolov5s_openvino_model

detect2: weights=['yolov5s_openvino_model'], source=../../Videos/car_m.mp4, data=data\coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=True, save_txt=False, save_csv=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs\detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
YOLOv5  v7.0-294-gdb125a20 Python-3.11.8 torch-2.2.1+cu121 CUDA:0 (NVIDIA GeForce RTX 4070 Ti, 12282MiB)

Loading yolov5s_openvino_model for OpenVINO inference...
Speed: 1.3ms pre-process, 29.3ms inference, 3.8ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs\detect\exp19
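The three "Speed:" lines above can be put side by side with a few lines of Python. The figures are copied from the logs; `summarize` is a helper name introduced only for this comparison. Note that in the ONNX run the log shows that `onnxruntime-gpu` could not be installed, so that run presumably used the CPU provider, which would explain why it is close to the OpenVINO™ result rather than to the PyTorch (CUDA) result.

```python
# Per-image timings in milliseconds, copied from the "Speed:" lines above:
# (pre-process, inference, NMS)
timings_ms = {
    "yolov5s.pt": (0.4, 5.7, 2.6),
    "yolov5s.onnx": (1.3, 29.4, 4.1),
    "yolov5s_openvino_model": (1.3, 29.3, 3.8),
}


def summarize(timings):
    """Return {model: (total_ms, frames_per_second)} over all three stages."""
    summary = {}
    for name, stages in timings.items():
        total = sum(stages)
        summary[name] = (round(total, 1), round(1000.0 / total, 1))
    return summary


for name, (total, fps) in summarize(timings_ms).items():
    print(f"{name:24s} {total:5.1f} ms/image  {fps:6.1f} fps")
```

These are processing-time upper bounds only; the on-screen frame rate also includes video decoding and drawing.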

↑

Using YOLO V5 with "PyTorch" †

↑

YOLO V5 test program †

Included in the project package "update_20240405.zip"

Project working directory: "workspace_pylearn/yolov5/"

(py_learn) python yolov5-test2.py

・Execution result

(py_learn) python yolov5-test2.py
Using cache found in C:\Users\<User>/.cache\torch\hub\ultralytics_yolov5_master
YOLOv5  2024-3-13 Python-3.11.7 torch-2.2.0+cu121 CUDA:0 (NVIDIA GeForce RTX 4070 Ti, 12282MiB)

Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
Adding AutoShape...
Saved 2 images to runs\detect\exp13
image 1/2: 720x1280 2 persons, 2 ties
image 2/2: 1080x810 4 persons, 1 bus
Speed: 8.7ms pre-process, 30.0ms inference, 79.0ms NMS per image at shape (2, 3, 640, 640)
最初の画像からの検出
tensor([[7.42863e+02, 4.79508e+01, 1.14113e+03, 7.16857e+02, 8.80750e-01, 0.00000e+00],
        [4.42037e+02, 4.37341e+02, 4.96715e+02, 7.09926e+02, 6.87170e-01, 2.70000e+01],
        [1.25252e+02, 1.93575e+02, 7.10963e+02, 7.13103e+02, 6.41552e-01, 0.00000e+00],
        [9.82882e+02, 3.08400e+02, 1.02733e+03, 4.20228e+02, 2.62887e-01, 2.70000e+01]], device='cuda:0')
２番目の画像からの検出
tensor([[2.20872e+02, 4.07374e+02, 3.45721e+02, 8.74728e+02, 8.35223e-01, 0.00000e+00],
        [6.62591e+02, 3.86202e+02, 8.10000e+02, 8.80324e+02, 8.28926e-01, 0.00000e+00],
        [5.75802e+01, 3.97293e+02, 2.14777e+02, 9.18263e+02, 7.85060e-01, 0.00000e+00],
        [1.47090e+01, 2.22154e+02, 7.98415e+02, 7.84966e+02, 7.81528e-01, 5.00000e+00],
        [0.00000e+00, 5.53392e+02, 7.24685e+01, 8.74691e+02, 4.64727e-01, 0.00000e+00]], device='cuda:0')
全てのクラス
{0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle', 4: 'airplane', 5: 'bus', 6: 'train', 7: 'truck', 8: 'boat', 9: 'traffic light', 10: 'fire hydrant', 11: 'stop sign', 12: 'parking meter', 13: 'bench', 14: 'bird', 15: 'cat', 16: 'dog', 17: 'horse', 18: 'sheep', 19: 'cow', 20: 'elephant', 21: 'bear', 22: 'zebra', 23: 'giraffe', 24: 'backpack', 25: 'umbrella', 26: 'handbag', 27: 'tie', 28: 'suitcase', 29: 'frisbee', 30: 'skis', 31: 'snowboard', 32: 'sports ball', 33: 'kite', 34: 'baseball bat', 35: 'baseball glove', 36: 'skateboard', 37: 'surfboard', 38: 'tennis racket', 39: 'bottle', 40: 'wine glass', 41: 'cup', 42: 'fork', 43: 'knife', 44: 'spoon', 45: 'bowl', 46: 'banana', 47: 'apple', 48: 'sandwich', 49: 'orange', 50: 'broccoli', 51: 'carrot', 52: 'hot dog', 53: 'pizza', 54: 'donut', 55: 'cake', 56: 'chair', 57: 'couch', 58: 'potted plant', 59: 'bed', 60: 'dining table', 61: 'toilet', 62: 'tv', 63: 'laptop', 64: 'mouse', 65: 'remote', 66: 'keyboard', 67: 'cell phone', 68: 'microwave', 69: 'oven', 70: 'toaster', 71: 'sink', 72: 'refrigerator', 73: 'book', 74: 'clock', 75: 'vase', 76: 'scissors', 77: 'teddy bear', 78: 'hair drier', 79: 'toothbrush'}

・The result images are saved in the "runs/detect/exp(2, 3, 4 …)" directories.
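Each row of the tensors printed above holds six values: the box corners `xmin, ymin, xmax, ymax` in pixels, the confidence, and the class index into `model.names`. The sketch below decodes the rows of the first image, copied from the log; the helper `describe` and the shortened `names` dictionary are introduced only for this illustration.

```python
# Rows copied from the first image's tensor in the log above.
rows = [
    [742.863, 47.9508, 1141.13, 716.857, 0.880750, 0.0],
    [442.037, 437.341, 496.715, 709.926, 0.687170, 27.0],
    [125.252, 193.575, 710.963, 713.103, 0.641552, 0.0],
    [982.882, 308.400, 1027.33, 420.228, 0.262887, 27.0],
]
# Only the classes that occur here (taken from the model.names printout).
names = {0: "person", 27: "tie"}


def describe(row, names):
    """Turn one [xmin, ymin, xmax, ymax, confidence, class] row into a dict."""
    xmin, ymin, xmax, ymax, conf, cls = row
    return {
        "label": names[int(cls)],
        "confidence": round(conf, 2),
        "box": (int(xmin), int(ymin), int(xmax), int(ymax)),
    }


for row in rows:
    print(describe(row, names))
```

The four rows decode to two persons and two ties, which agrees with the summary line "image 1/2: 720x1280 2 persons, 2 ties".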

Source file

▼ "yolov5-test2.py"

# -*- coding: utf-8 -*-
##------------------------------------------
## [Review] "PyTorch で始める AI開発"
##   exercise / Object detection with YOLOv5    Ver. 0.02
##       (downloaded from PyTorch Hub)
##
##               2024.09.13 Masahiro Izutsu
##------------------------------------------
## https://kikaben.com/yolov5-starter/
## yolov5-test2.py

import torch

# Download YOLO V5 from Torch Hub
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
# Directory that holds the sample images (a local path, not a URL)
base_url = 'data/images/'

# Batch of two images
imgs = [base_url + f for f in ('zidane.jpg', 'bus.jpg')]

# Run inference
results = model(imgs)

# Display the results
results.show()

# Save the images
results.save()

# Print the detected classes and their counts
results.print()

# Print the bounding boxes etc.
print('最初の画像からの検出')      # "Detections from the first image"
print(results.xyxy[0])

print('２番目の画像からの検出')    # "Detections from the second image"
print(results.xyxy[1])

# Supported classes
print('全てのクラス')              # "All classes"
print(model.names)

↑

Creating the YOLO V5 object detection program "detect2_yolov5.py" †

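Specification of the program
・Works both online and offline (local).
・Displays the detected objects (80 classes) with a bounding region and a text label.
・Labels can be shown in Japanese or English.
・Objects are color-coded by class.
・Input sources: web camera, video file, or still-image file.
・The result can be written out as an image.
・Included in the project package "update_20240405.zip"
Prepare the label files directly under the "yolov5" directory
```
coco.names      ← English version
coco.names_jp   ← Japanese version
```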
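The label files are plain text with one class name per line, and the line number (starting at 0) is the class ID. The following is a minimal sketch of a loader, not the code used by "detect2_yolov5.py" (which reads the file with `read().splitlines()`): it additionally skips whitespace-only lines and falls back to a placeholder for unknown IDs. The function names `load_labels` and `label_for` are assumptions made for this example.

```python
import os
import tempfile


def load_labels(path):
    """Read one label per line, dropping blank or whitespace-only lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def label_for(class_id, labels):
    """Return the label text, or a placeholder when the ID is out of range."""
    if 0 <= class_id < len(labels):
        return labels[class_id]
    return f"id{class_id}"


# Self-contained demo with a three-line stand-in for coco.names_jp,
# including a trailing line that contains only a space.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "coco.names_jp")
    with open(demo_path, "w", encoding="utf-8") as f:
        f.write("人\n自転車\n車\n \n")
    labels = load_labels(demo_path)

print(labels, label_for(2, labels), label_for(99, labels))
```

Skipping whitespace-only lines keeps a stray trailing line from becoming an extra class.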

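Command parameters

Command option	Default	Meaning
-i , --image	'../../Videos/car_m.mp4'	Path of the input source, or a camera (cam/cam0 to cam9)
-y , --yolov5	'ultralytics/yolov5'	Path of the yolov5 directory (for local use, the path of the local yolov5 directory)
-m , --models	'yolov5s'	Model name (for local use, the path of the model file) ※1
-l , --labels	'coco.names_jp'	Path of the label file (coco.names, coco.names_jp)
-c , --conf	0.25	Confidence threshold for object detection
-t , --title	'y'	Show the title (y/n)
-s , --speed	'y'	Show the speed (y/n)
-o , --out	'non'	Path for saving the output <path/filename> ※2
-cpu	-	CPU flag (always run on the CPU when specified)

※1 Models available when running online: "yolov5n", "yolov5s", "yolov5m", "yolov5l", "yolov5x"
※2 The directory path leading to the output file name must already exist (otherwise nothing is saved)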
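The `-i` option accepts either a camera name or a file path. The sketch below shows one way such a value could be classified; it is a simplified illustration and not the logic of "detect2_yolov5.py" itself (it does not check that the file exists, and the name `classify_source` is an assumption).

```python
import os

IMAGE_EXTENSIONS = (".bmp", ".png", ".jpg", ".jpeg", ".tif")


def classify_source(source):
    """Return ('camera', index), ('image', path) or ('video', path)."""
    # 'cam' means camera 0; 'cam0' to 'cam9' select a camera by index.
    if source.startswith("cam") and len(source) <= 4:
        suffix = source[3:]
        if suffix == "":
            return ("camera", 0)
        if suffix.isdigit():
            return ("camera", int(suffix))
    # Compare the extension case-insensitively so '.JPG' also counts.
    if os.path.splitext(source)[1].lower() in IMAGE_EXTENSIONS:
        return ("image", source)
    return ("video", source)


for src in ("cam", "cam3", "data/images/zidane.jpg", "../../Videos/car_m.mp4"):
    print(src, "->", classify_source(src))
```

Anything that is neither a camera name nor a known image extension is treated as a video file.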

・Examples of the "-y , --yolov5" parameter

-y ultralytics/yolov5                                       ← online (TorchHub) <default>
-y ./                                                       ← offline (local)

　※ On first launch the repository is downloaded into the cache; after that it runs from the cache

・Examples of the "-m , --models" parameter

-m yolov5s                                                  ← online (TorchHub) <default>
-m ./test/yolov5s.pt                                        ← offline (local)

　※ If the model is not at the specified location, it is downloaded automatically on the first run
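How the `-y` and `-m` values translate into a `torch.hub.load` call can be sketched as follows. This is a simplified illustration intended to behave like the switching in "detect2_yolov5.py", not a copy of it; `hub_load_args` is a hypothetical helper, and it only builds the arguments so that it can be tried without PyTorch installed.

```python
def hub_load_args(yolov5, models):
    """Return (args, kwargs) to pass to torch.hub.load."""
    # A model file path goes through the 'custom' entry point;
    # a bare name such as 'yolov5s' is an entry point itself.
    is_file = models.endswith(".pt") or "/" in models or "\\" in models
    args = (yolov5, "custom", models) if is_file else (yolov5, models)
    # Anything other than the GitHub repository name is treated as a local clone.
    kwargs = {} if yolov5 == "ultralytics/yolov5" else {"source": "local"}
    return args, kwargs


for y, m in (("ultralytics/yolov5", "yolov5s"), ("./", "./test/yolov5s.pt")):
    args, kwargs = hub_load_args(y, m)
    print(args, kwargs)
    # With PyTorch installed: model = torch.hub.load(*args, **kwargs)
```

`source='local'` tells `torch.hub.load` to use the given directory directly instead of fetching the repository from GitHub.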

▼ Command parameter details

(py_learn) python detect2_yolov5.py -h
usage: detect2_yolov5.py [-h] [-i IMAGE_FILE] [-y YOLOV5] [-m MODELS] [-c CONFIDENCE] [-l LABELS] [-t TITLE]
                         [-s SPEED] [-o IMAGE_OUT] [-cpu]

options:
  -h, --help            show this help message and exit
  -i IMAGE_FILE, --image IMAGE_FILE
                        Absolute path to image file or cam/cam0/cam1 for camera stream.
  -y YOLOV5, --yolov5 YOLOV5
                        YOLO V5 directry absolute path.
  -m MODELS, --models MODELS
                        yolov5n/yolov5m/yolov5l/yolov5x or model file absolute path.
  -c CONFIDENCE, --conf CONFIDENCE
                        confidences labels Default value is 0.25
  -l LABELS, --labels LABELS
                        Language.(jp/en) Default value is 'jp'
  -t TITLE, --title TITLE
                        Program title flag.(y/n) Default value is 'y'
  -s SPEED, --speed SPEED
                        Speed display flag.(y/n) Default calue is 'y'
  -o IMAGE_OUT, --out IMAGE_OUT
                        Processed image file path. Default value is 'non'
  -cpu                  Optional. CPU only!

Online execution example

(py_learn) python detect2_yolov5.py

・Execution result

(py_learn) python detect2_yolov5.py

Object detection YoloV5 in PyTorch Ver. 0.05: Starting application...
   OpenCV virsion : 4.9.0

   - Image File   :  ../../Videos/car_m.mp4
   - YOLO v5      :  ultralytics/yolov5
   - Pretrained   :  yolov5s
   - Confidence lv:  0.25
   - Label file   :  coco.names_jp
   - Program Title:  y
   - Speed flag   :  y
   - Processed out:  non
   - Use device   :  cuda:0

Using cache found in C:\Users\izuts/.cache\torch\hub\ultralytics_yolov5_master
YOLOv5  2024-3-13 Python-3.11.8 torch-2.2.1+cu121 CUDA:0 (NVIDIA GeForce RTX 4070 Ti, 12282MiB)

Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
Adding AutoShape...

FPS average:      30.90

 Finished.

Offline execution example

(py_learn) python detect2_yolov5.py -y ./

・Execution result

(py_learn) python detect2_yolov5.py -y ./

Object detection YoloV5 in PyTorch Ver. 0.05: Starting application...
   OpenCV virsion : 4.9.0

   - Image File   :  ../../Videos/car_m.mp4
   - YOLO v5      :  ./
   - Pretrained   :  yolov5s
   - Confidence lv:  0.25
   - Label file   :  coco.names_jp
   - Program Title:  y
   - Speed flag   :  y
   - Processed out:  non
   - Use device   :  cuda:0

YOLOv5  v7.0-294-gdb125a20 Python-3.11.8 torch-2.2.1+cu121 CUDA:0 (NVIDIA GeForce RTX 4070 Ti, 12282MiB)

Fusing layers...
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients, 16.4 GFLOPs
Adding AutoShape...

FPS average:      20.80

 Finished.

Source code

▼ "detect2_yolov5.py"

# -*- coding: utf-8 -*-
##------------------------------------------
## [Review] "PyTorch で始める AI開発"
##   Chapter 04 / Extra edition     Ver. 0.05
##       Object detection with YoloV5 in PyTorch
##
##               2024.09.13 Masahiro Izutsu
##------------------------------------------
## detect2_yolov5.py
##  Ver. 0.03   2024/04/09  support class IDs up to 119
##  Ver. 0.04   2024/04/13  cloud/local switching
##  Ver. 0.05   2024/04/15  confidence threshold setting / camera input (cam0-cam9)

# -y <YOLOv5>                                   -m <Pretrained model>
#    'ultralytics/yolov5'                          'yolov5s' [yolov5n][yolov5m][yolov5l][yolov5x]      Torch Hub on line
#    '/anaconda_win/workspace_pylearn/yolov5'      '/anaconda_win/workspace_pylearn/yolov5/yolov5s'              off line
#
# Example: Windows
#       python detect2_yolov5.py                (Torch Hub on line )
#       python detect2_yolov5.py -y '/anaconda_win/workspace_pylearn/yolov5' -m '/anaconda_win/workspace_pylearn/yolov5/yolov5s'
#
# Example: Linux
#       python detect2_yolov5.py                (Torch Hub on line)
#       python detect2_yolov5.py -y '~/workspace_pylearn/yolov5' -m '~/workspace_pylearn/yolov5/yolov5s'

# Color Escape Code
GREEN = '\033[1;32m'
RED = '\033[1;31m'
NOCOLOR = '\033[0m'
YELLOW = '\033[1;33m'

# Constant definitions
WINDOW_WIDTH = 640

from os.path import expanduser
INPUT_DEF = expanduser('../../Videos/car_m.mp4')
LANG_DEF = 'coco.names_jp'                                    # 2024/04/09

# Imports
import sys
import cv2
import numpy as np
import argparse
import torch
from torch import nn
from torchvision import transforms, models
from PIL import Image
import platform

import my_puttext                                           # my library 2024.03.13
import my_fps                                               # my library 2024.03.13
import my_color80                                           # my library 2024.03.13

TEXT_COLOR = my_color80.CR_white

# Title
title = 'Object detection YoloV5 in PyTorch Ver. 0.05'

# Parses arguments for the application
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', '--image', metavar = 'IMAGE_FILE', type=str,
            default = INPUT_DEF,
            help = 'Absolute path to image file or cam/cam0/cam1 for camera stream.')
    parser.add_argument('-y', '--yolov5', metavar = 'YOLOV5', type=str,
            default = 'ultralytics/yolov5',
            help = 'YOLO V5 directry absolute path.')
    parser.add_argument('-m', '--models', metavar = 'MODELS', type=str,
            default = 'yolov5s',
            help = 'yolov5n/yolov5m/yolov5l/yolov5x or model file absolute path.')
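    # type=float so that a value given on the command line can be compared with the model's confidence
    parser.add_argument('-c', '--conf', metavar = 'CONFIDENCE', type=float,
            default = 0.25,                                 # 2024/04/14
            help = 'confidences labels Default value is 0.25')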
    parser.add_argument('-l', '--labels', metavar = 'LABELS',
            default = LANG_DEF,                             # 2024/04/09
            help = 'Language.(jp/en) Default value is \'jp\'')
    parser.add_argument('-t', '--title', metavar = 'TITLE',
            default = 'y',
            help = 'Program title flag.(y/n) Default value is \'y\'')
    parser.add_argument('-s', '--speed', metavar = 'SPEED',
            default = 'y',
            help = 'Speed display flag.(y/n) Default calue is \'y\'')
    parser.add_argument('-o', '--out', metavar = 'IMAGE_OUT',
            default = 'non',
            help = 'Processed image file path. Default value is \'non\'')
    parser.add_argument("-cpu", default = False, action = 'store_true',
            help="Optional. CPU only!")
    return parser

# Display basic model information
def display_info(image, yolov5, models, conf, labels, titleflg, speedflg, outpath, use_device):
    print('\n' + GREEN + title + ': Starting application...' + NOCOLOR)
    print('   OpenCV virsion :',cv2.__version__)
    print('\n   - ' + YELLOW + 'Image File   : ' + NOCOLOR, image)
    print('   - ' + YELLOW + 'YOLO v5      : ' + NOCOLOR, yolov5)
    print('   - ' + YELLOW + 'Pretrained   : ' + NOCOLOR, models)
    print('   - ' + YELLOW + 'Confidence lv: ' + NOCOLOR, conf)
    print('   - ' + YELLOW + 'Label file   : ' + NOCOLOR, labels)
    print('   - ' + YELLOW + 'Program Title: ' + NOCOLOR, titleflg)
    print('   - ' + YELLOW + 'Speed flag   : ' + NOCOLOR, speedflg)
    print('   - ' + YELLOW + 'Processed out: ' + NOCOLOR, outpath)
    print('   - ' + YELLOW + 'Use device   : ' + NOCOLOR, use_device, '\n')

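# Determine the kind of input file
#   Return value: '.jpg', '.png', ...  image file (the matching extension)
#                 'None'               not an image file (treated as a video file)
#                 'NotFound'           the file does not exist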
import os
def is_pict(filename):
    '''
    try:
        imgtype = imghdr.what(filename)
    except FileNotFoundError as e:
        imgtype = 'NotFound'
    return str(imgtype)
    '''
    if not os.path.isfile(filename):
        return 'NotFound'

    types = ['.bmp','.png','.jpg','.jpeg','.JPG','.tif']
    for ss in types:
        if filename.endswith(ss):
            return ss
    return 'None'

# ** main function **
def main():
    # Select the Japanese font
    fontPIL = my_puttext.get_font()                         # 2024.03.13

    # Argument parsing and parameter setting
    ARGS = parse_args().parse_args()
    input_stream = ARGS.image
    labels = ARGS.labels                                    # 2024/04/09
    titleflg = ARGS.title
    speedflg = ARGS.speed

    # Input: handle cam/cam0-cam9                           # 2024/04/15
    if input_stream.find('cam') == 0 and len(input_stream) < 5:
        input_stream = 0 if input_stream == 'cam' else int(input_stream[3])
        isstream = True
    else:
        filetype = is_pict(input_stream)
        isstream = filetype == 'None'
        if (filetype == 'NotFound'):
            print(RED + "\ninput file Not found." + NOCOLOR)
            quit()
    outpath = ARGS.out
    conf = ARGS.conf
    yolov5 = ARGS.yolov5 if platform.system()=='Windows' else expanduser(ARGS.yolov5)
    models = ARGS.models if platform.system()=='Windows' else expanduser(ARGS.models)
    
    # Class labels
    with open(labels, 'r', encoding="utf-8") as labels_file:
        label_list = labels_file.read().splitlines()

    # Check whether a GPU can be used
    use_device = 'cuda:0' if not ARGS.cpu and torch.cuda.is_available() else 'cpu'

    # Show the settings
    display_info(input_stream, yolov5, models, conf, labels, titleflg, speedflg, outpath, use_device)

    # Load the model through TorchHub (cloud/local switching)    2024/04/13
    # 'custom' when 'yolo' appears after the first character, i.e. when a path such as ./test/yolov5s.pt is given
    cust = 'custom' if 0 < models.find('yolo') else ''
    if yolov5 == 'ultralytics/yolov5':
        if cust == '':
            if -1 == models.find('.'):
                model = torch.hub.load(yolov5, models)
            else:
                model = torch.hub.load(yolov5, 'custom', models)
        else:
            model = torch.hub.load(yolov5, cust, models)
    else:
        if cust == '':
            if -1 == models.find('.'):
                model = torch.hub.load(yolov5, models, source='local')
            else:
                model = torch.hub.load(yolov5, 'custom', models, source='local')
        else:
            model = torch.hub.load(yolov5, cust, models, source='local')

    # Put the model into inference mode
    model.eval()
    model.to(use_device)

    # Prepare the input
    if (isstream):
        # Camera or video file
        cap = cv2.VideoCapture(input_stream)
        ret, frame = cap.read()
        loopflg = cap.isOpened()
    else:
        # Read the image file
        frame = cv2.imread(input_stream)
        if frame is None:
            print(RED + "\nUnable to read the input." + NOCOLOR)
            quit()

        # Resize while keeping the aspect ratio
        img_h, img_w = frame.shape[:2]
        if (img_w > WINDOW_WIDTH):
            height = round(img_h * (WINDOW_WIDTH / img_w))
            frame = cv2.resize(frame, dsize = (WINDOW_WIDTH, height))
        loopflg = True                                      # loop once

    # Record the result, step 1
    if (outpath != 'non'):
        if (isstream):
            fps = int(cap.get(cv2.CAP_PROP_FPS))
            out_w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
            out_h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
            fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
            outvideo = cv2.VideoWriter(outpath, fourcc, fps, (out_w, out_h))

    # Initialize the measurements
    fpsWithTick = my_fps.fpsWithTick()
    fps_total = 0
    fpsWithTick.get()                                       # start measuring fps

    # Main loop
    while (loopflg):
        if frame is None:
            print(RED + "\nUnable to read the input." + NOCOLOR)
            quit()

        # Run the neural network (AutoShape expects RGB for NumPy input, so reverse OpenCV's BGR order)
        results = model(frame[..., ::-1], size=640)
        message = []                                        # display messages
        bbox = results.xyxy[0].detach().cpu().numpy()
        for preds in bbox:
            xmin = int(preds[0])
            ymin = int(preds[1])
            xmax = int(preds[2])
            ymax = int(preds[3])
            confidence  = preds[4]
            class_id  = int(preds[5])
            color_id = class_id if class_id < 80 else class_id - 40 # 2024/04/09
            
            if (confidence > conf):                         # skip low-confidence detections
                # Pick the colors for this object class
                BOX_COLOR = my_color80.get_boder_bgr80(color_id)
                LABEL_BG_COLOR = my_color80.get_back_bgr80(color_id)

                # Get the area the label text will occupy
                x0,y0,x1,y1 = my_puttext.cv2_putText(img = frame,
                                       text = label_list[class_id] + ': %.2f' % confidence,
                                       org = (xmin+5, ymin+18), fontFace = fontPIL,
                                       fontScale = 14,
                                       color = TEXT_COLOR,
                                       mode = 0,
                                       areaf=True)
                xx = xmax if xmax > x1 else x1              # use the label's right edge when it extends past the box
                cv2.rectangle(frame,(xmin, ymin), (xx, ymin+20), LABEL_BG_COLOR, -1)
                my_puttext.cv2_putText(img = frame,
                                       text = label_list[class_id] + ': %.2f' % confidence,
                                       org = (xmin+5, ymin+18), fontFace = fontPIL,
                                       fontScale = 14,
                                       color = TEXT_COLOR,
                                       mode = 0)
                # Draw the box on the image
                cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), BOX_COLOR, 1)

        # Compute the FPS
        fps = fpsWithTick.get()
        st_fps = 'fps: {:>6.2f}'.format(fps)
        if (speedflg == 'y'):
            cv2.rectangle(frame, (10, 38), (95, 55), (90, 90, 90), -1)
            cv2.putText(frame, st_fps, (15, 50), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.4, color=(255, 255, 255), lineType=cv2.LINE_AA)

        # Draw the title
        if (titleflg == 'y'):
            cv2.putText(frame, title, (12, 32), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.8, color=(0, 0, 0), lineType=cv2.LINE_AA)
            cv2.putText(frame, title, (10, 30), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.8, color=(200, 200, 0), lineType=cv2.LINE_AA)

        # Show the image
        window_name = title + "  (hit 'q' or 'esc' key to exit)"
        cv2.namedWindow(window_name, flags=cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) 
        cv2.imshow(window_name, frame)

        # Record the result, step 2
        if (outpath != 'non'):
            if (isstream):
                outvideo.write(frame)
            else:
                cv2.imwrite(outpath, frame)

        # Exit on 'esc' or 'q' (for a still image, wait here until one of them is pressed)
        breakflg = False
        while(True):
            key = cv2.waitKey(1)
            prop_val = cv2.getWindowProperty(window_name, cv2.WND_PROP_ASPECT_RATIO)
            if cv2.getWindowProperty(window_name, cv2.WND_PROP_VISIBLE) < 1:        
                print('\n Window close !!')
                sys.exit(0)
            if key == 27 or key == 113 or (prop_val < 0.0):     # 'esc' or 'q'
                breakflg = True
                break
            if (isstream):
                break

        if ((breakflg == False) and isstream):
            # Read the next frame
            ret, frame = cap.read()
            if ret == False:
                break
            loopflg = cap.isOpened()
        else:
            loopflg = False

    # Cleanup
    if (isstream):
        cap.release()

        # Record the result, step 3
        if (outpath != 'non'):
            if (isstream):
                outvideo.release()

    cv2.destroyAllWindows()

    print('\nFPS average: {:>10.2f}'.format(fpsWithTick.get_average()))
    print('\n Finished.')

# Entry point of the main function (execution starts here)
if __name__ == "__main__":
    sys.exit(main())
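
The listing relies on the author's helper module `my_fps`, whose source is not shown on this page. To try the measurement part on its own, a stand-in with the same two methods used above (`get()` and `get_average()`) could look like the sketch below. This is an assumption about the behavior, not the author's implementation; the class name `FpsCounter` and the injectable `clock` parameter are made up for the example.

```python
import time


class FpsCounter:
    """Hypothetical stand-in for my_fps.fpsWithTick (its source is not shown here)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock      # injectable so the class can be tested without waiting
        self._last = None        # timestamp of the previous get() call
        self._samples = []       # all FPS values measured so far

    def get(self):
        """Return the FPS since the previous call (0.0 on the first call)."""
        now = self._clock()
        fps = 0.0
        if self._last is not None and now > self._last:
            fps = 1.0 / (now - self._last)
            self._samples.append(fps)
        self._last = now
        return fps

    def get_average(self):
        """Return the mean of all measured FPS values (0.0 if there are none)."""
        return sum(self._samples) / len(self._samples) if self._samples else 0.0


# Demo with a fake clock that advances 50 ms per frame.
ticks = iter([0.0, 0.05, 0.10, 0.15])
counter = FpsCounter(clock=lambda: next(ticks))
counter.get()                      # first call only starts the measurement
for _ in range(3):
    print(round(counter.get(), 2))
print(round(counter.get_average(), 2))
```

As in the listing, the first `get()` call starts the measurement and each later call reports the rate for one frame.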

↑

PyTorch model execution speed †

Program run: "python detect2_yolov5.py" (unit: fps)

Machine / OS	Model	car_m.mp4		car1_m.mp4		car2_m.mp4
		GPU	CPU	GPU	CPU	GPU	CPU
HP ENVY Windows 11	yolov5n	32.2	15.9	49.7	17.9	56.0	20.7
	yolov5s	31.3	12.7	38.7	14.8	48.9	15.3
	yolov5m	28.8	8.7	31.8	9.4	42.4	9.5
	yolov5l	25.1	5.7	31.5	5.9	32.0	6.0
	yolov5x	23.8	3.9	30.8	4.0	31.8	4.1
HP ENVY Ubuntu 22.04LTS	yolov5n	64.0	30.0	86.0	34.0	91.0	37.0
	yolov5s	53.9	21.1	72.3	25.5	87.5	27.1
	yolov5m	49.8	13.0	63.2	14.7	78.0	16.0
	yolov5l	44.3	8.3	54.7	8.1	70.7	8.9
	yolov5x	37.7	5.4	46.4	4.9	57.4	5.1
HP ELITE Windows 10	yolov5n	27.3	9.4	39.4	10.6	48.3	11.1
	yolov5s	19.6	5.4	27.9	5.9	30.2	6.3
	yolov5m	15.4	2.9	18.5	3.1	22.3	3.2
	yolov5l	11.1	1.7	12.7	1.7	14.8	1.8
	yolov5x	7.6	1.0	8.3	1.0	9.2	1.4
DELL Latitude Ubuntu 20.04LTS	yolov5n	-	10.8	-	13.7	-	14.7
	yolov5s	-	8.9	-	8.6	-	9.4
	yolov5m	-	3.9	-	4.2	-	4.5
	yolov5l	-	2.3	-	2.4	-	2.7
	yolov5x	-	1.5	-	1.5	-	1.6
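
To read the table more easily, the GPU/CPU ratio can be computed from a few of its cells. The numbers below are the car_m.mp4 values for yolov5s, copied from the table; `gpu_speedup` is a helper name introduced for this example.

```python
# (GPU fps, CPU fps) for yolov5s on car_m.mp4, copied from the table above.
car_m_yolov5s = {
    "HP ENVY Windows 11": (31.3, 12.7),
    "HP ENVY Ubuntu 22.04LTS": (53.9, 21.1),
    "HP ELITE Windows 10": (19.6, 5.4),
}


def gpu_speedup(gpu_fps, cpu_fps):
    """Ratio of GPU to CPU throughput, rounded to two decimals."""
    return round(gpu_fps / cpu_fps, 2)


for machine, (gpu, cpu) in car_m_yolov5s.items():
    print(f"{machine:26s} GPU/CPU = {gpu_speedup(gpu, cpu):.2f}x")

# Same hardware, different OS: Ubuntu 22.04 vs Windows 11 on the GPU.
print(round(53.9 / 31.3, 2))
```

On the HP ENVY the measured frame rate under Ubuntu is noticeably higher than under Windows 11 for both GPU and CPU; the table alone does not show the cause.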

・Test commands: example for the yolov5n model

(py_learn) python detect2_yolov5.py -i ../../Videos/car_m.mp4 -m yolov5n
(py_learn) python detect2_yolov5.py -i ../../Videos/car_m.mp4 -m yolov5n -cpu
(py_learn) python detect2_yolov5.py -i ../../Videos/car1_m.mp4 -m yolov5n
(py_learn) python detect2_yolov5.py -i ../../Videos/car1_m.mp4 -m yolov5n -cpu
(py_learn) python detect2_yolov5.py -i ../../Videos/car2_m.mp4 -m yolov5n
(py_learn) python detect2_yolov5.py -i ../../Videos/car2_m.mp4 -m yolov5n -cpu

Test environment (Intel® CPU / NVIDIA GPU)

Machine	OS	CPU	GPU
HP ENVY Desktop TE02-1097jp	Windows11/Ubuntu22.04LTS	13th Gen Core™ i9-13900	GeForce RTX 4070 Ti 12GB
HP EliteDesk 800 G2 SFF	Windows10	6th Gen Core™ i7-6700	GeForce GTX 1050 Ti 4GB
DELL Latitude 7520 NoteBook	Ubuntu20.04LTS	11th Gen Core™ i7-1185G7	-

↑

Differences in inference results between models †

Pretrained models from "yolov5n" (light) to "yolov5x" (heavy) (example runs with "detect2.py")

(Images: original, yolov5n, yolov5s, yolov5m, yolov5l, yolov5x)

↑

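YOLO V5 / YOLO V3 comparison †

The results obtained this time (V5) are compared with the earlier results (V3).

Objects that V3 did not detect, such as traffic lights, are now detected.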

↑

Using YOLO V5 with "OpenVINO™" †

↑

Investigating how to support OpenVINO™ API 2.0 †

Installing the sample demo
・Working directory: "workspace_pylearn/"

(py_learn) git clone https://github.com/violet17/yolov5_demo.git

・Execution log

(py_learn) git clone https://github.com/violet17/yolov5_demo.git
Cloning into 'yolov5_demo'...
remote: Enumerating objects: 31, done.
remote: Counting objects: 100% (31/31), done.
remote: Compressing objects: 100% (30/30), done.
remote: Total 31 (delta 13), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (31/31), 59.87 KiB | 5.99 MiB/s, done.
Resolving deltas: 100% (13/13), done.

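Copy the contents of "workspace_pylearn/yolov5_demo", extracted from the update_20240405.zip mentioned at the top of the page, into the "yolov5_demo" directory created by "git clone"

Running the original demo program
・Working directory: "workspace_pylearn/yolov5_demo/"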

(py_learn2) python yolov5_demo_sync_ov2023.py -i ../yolov5/data/images/zidane.jpg -m ../yolov5/yolov5s_openvino_model/yolov5s.xml

・Execution result of the original API 2.0 demo program "yolov5_demo_sync_ov2023.py"

(py_learn2) python yolov5_demo_sync_ov2023.py -i ../yolov5/data/images/zidane.jpg -m ../yolov5/yolov5s_openvino_model/yolov5s.xml
[ INFO ] Creating OpenVINO Runtime Core...
[ INFO ] Reading the model:
        ../yolov5/yolov5s_openvino_model/yolov5s.xml
[ INFO ] Preparing inputs
*********** [1,3,640,640]
--------- ../yolov5/data/images/zidane.jpg
[ INFO ] Loading model to the plugin
[ INFO ] Starting inference...
[ INFO ]          classes : 80
[ INFO ]          num     : 3
[ INFO ]          coords  : 4
[ INFO ]          anchors : [10.0, 13.0, 16.0, 30.0, 33.0, 23.0, 30.0, 61.0, 62.0, 45.0, 59.0, 119.0, 116.0, 90.0, 156.0, 198.0, 373.0, 326.0]
Traceback (most recent call last):
  File "C:\anaconda_win\workspace_pylearn\yolov5_demo\yolov5_demo_sync_ov2023.py", line 349, in <module>
    sys.exit(main() or 0)
             ^^^^^^
  File "C:\anaconda_win\workspace_pylearn\yolov5_demo\yolov5_demo_sync_ov2023.py", line 281, in main
    objects += parse_yolo_region(out_blob, in_frame.shape[2:],
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\anaconda_win\workspace_pylearn\yolov5_demo\yolov5_demo_sync_ov2023.py", line 153, in parse_yolo_region
    out_blob_n, out_blob_c, out_blob_h, out_blob_w = blob.shape
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: not enough values to unpack (expected 4, got 3)

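　※ The demo does not appear to handle the converted pretrained model: it unpacks each output blob into 4 dimensions, whereas this model's output has 3 (the export log reports the shape (1, 25200, 85))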
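
The traceback shows `parse_yolo_region` unpacking `blob.shape` into four values, while the model exported by "export.py" reports a single output of shape `(1, 25200, 85)`. Assuming the usual YOLOv5 export layout, each of the 25200 rows is `[cx, cy, w, h, objectness, class scores...]` with the activations already applied, so post-processing for this output would look roughly like the sketch below. `decode_row` is a hypothetical helper, the row layout is an assumption not confirmed on this page, and non-maximum suppression is left out.

```python
def decode_row(row, conf_thres=0.25):
    """Decode one [cx, cy, w, h, objectness, class scores...] row.

    Returns (xmin, ymin, xmax, ymax, confidence, class_id), or None when the
    combined confidence does not exceed conf_thres.
    """
    cx, cy, w, h, objectness = row[:5]
    class_scores = row[5:]
    # Best class and its combined confidence (objectness x class score).
    class_id = max(range(len(class_scores)), key=class_scores.__getitem__)
    confidence = objectness * class_scores[class_id]
    if confidence <= conf_thres:
        return None
    # Convert center/size to corner coordinates.
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, confidence, class_id)


# Toy rows with two classes instead of 80, to keep the example short.
print(decode_row([320.0, 240.0, 100.0, 50.0, 0.9, 0.1, 0.8]))
print(decode_row([10.0, 10.0, 4.0, 4.0, 0.2, 0.5, 0.5]))
```

The default threshold of 0.25 matches the `conf_thres` value shown in the "detect2.py" logs.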

Try running it with the model (V3) used earlier on the page "Object detection algorithm YOLO V5"
・Prepare the pretrained model in "workspace_pylearn/yolov5_demo/" under the names "yolov5s_v3.xml" and "yolov5s_v3.bin"
　→ GitHub: ultralytics/yolov5 V3

(py_learn) python yolov5_demo_sync_ov2023.py -i ../yolov5/data/images/zidane.jpg -m yolov5s_v3.xml -show

・Execution result

(py_learn) python yolov5_demo_sync_ov2023.py -i ../yolov5/data/images/zidane.jpg -m yolov5s_v3.xml -show
[ INFO ] Creating OpenVINO Runtime Core...
[ INFO ] Reading the model:
        yolov5s_v3.xml
[ INFO ] Preparing inputs
*********** [1,3,640,640]
--------- ../yolov5/data/images/zidane.jpg
[ INFO ] Loading model to the plugin
[ INFO ] Starting inference...
[ INFO ]          classes : 80
[ INFO ]          num     : 3
[ INFO ]          coords  : 4
[ INFO ]          anchors : [10.0, 13.0, 16.0, 30.0, 33.0, 23.0, 30.0, 61.0, 62.0, 45.0, 59.0, 119.0, 116.0, 90.0, 156.0, 198.0, 373.0, 326.0]
[ INFO ]          classes : 80
[ INFO ]          num     : 3
[ INFO ]          coords  : 4
[ INFO ]          anchors : [10.0, 13.0, 16.0, 30.0, 33.0, 23.0, 30.0, 61.0, 62.0, 45.0, 59.0, 119.0, 116.0, 90.0, 156.0, 198.0, 373.0, 326.0]
[ INFO ]          classes : 80
[ INFO ]          num     : 3
[ INFO ]          coords  : 4
[ INFO ]          anchors : [10.0, 13.0, 16.0, 30.0, 33.0, 23.0, 30.0, 61.0, 62.0, 45.0, 59.0, 119.0, 116.0, 90.0, 156.0, 198.0, 373.0, 326.0]
(720, 1280)

　※ The earlier pretrained model (V3) runs without problems

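Improving the program: "yolov5_demo_sync_ov2023x.py"
・Copy "yolov5_demo_sync_ov2023.py" to "yolov5_demo_sync_ov2023x.py" and modify it
　(included in the project package "update_20240405.zip")
・Make the display output support Japanese
・Fix defects (such as interruption by key input)
・Revise the command parameters to make them easier to use

・Copy the label files beforehand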

(py_learn) cp ../yolov5/coco.names ./
(py_learn) cp ../yolov5/coco.names_jp ./

・Running the modified "yolov5_demo_sync_ov2023x.py"

(py_learn) python yolov5_demo_sync_ov2023x.py -i ../yolov5/data/images/zidane.jpg -r

・実行結果

(py_learn) python yolov5_demo_sync_ov2023x.py -i ../yolov5/data/images/zidane.jpg -r

--- YOLO V5 OpenVINO(API 2.0) demoprogram Ver 0.01 ---
OpenCV: 4.9.0
OpenVINO inference_engine: 2024.0.0-14509-34caeefd078-releases/2024/0

 Creating OpenVINO Runtime Core...
 Reading the model: yolov5s_v3.xml
 Label file  : coco.names_jp
 Input source: ../yolov5/data/images/zidane.jpg
 Starting inference...
[ INFO ]
Detected boxes for batch 1:
[ INFO ]  Class ID      | Confidence | XMIN | YMIN | XMAX | YMAX | COLOR
[ INFO ]     人         |   0.873057 |  747 |   39 | 1148 |  711 | (0, 80, 0)
[ INFO ]     人         |   0.816089 |  116 |  197 | 1003 |  711 | (0, 80, 0)
[ INFO ]   ネクタイ     |   0.778782 |  422 |  430 |  517 |  719 | (128, 0, 128)

 FPS average:      11.80

 Finished.
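The three boxes listed in this output are what remains after the overlap filter controlled by --iou_threshold. A standalone sketch of that greedy filter, using the two "person" boxes from the log plus one hypothetical near-duplicate (the third box is invented purely for illustration); the function names are assumptions, not the script's own:

```python
def area(b):
    return (b["xmax"] - b["xmin"]) * (b["ymax"] - b["ymin"])


def iou(a, b):
    """Intersection over union of two boxes given as xmin/ymin/xmax/ymax dicts."""
    w = min(a["xmax"], b["xmax"]) - max(a["xmin"], b["xmin"])
    h = min(a["ymax"], b["ymax"]) - max(a["ymin"], b["ymin"])
    inter = w * h if w > 0 and h > 0 else 0
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, iou_threshold=0.4):
    """Keep boxes in descending confidence, dropping any that overlap a kept box."""
    kept = []
    for box in sorted(boxes, key=lambda b: b["confidence"], reverse=True):
        if all(iou(box, k) <= iou_threshold for k in kept):
            kept.append(box)
    return kept


boxes = [
    dict(xmin=747, ymin=39, xmax=1148, ymax=711, confidence=0.873057),
    dict(xmin=116, ymin=197, xmax=1003, ymax=711, confidence=0.816089),
    dict(xmin=750, ymin=45, xmax=1140, ymax=700, confidence=0.60),  # hypothetical duplicate
]
kept = nms(boxes)
for b in kept:
    print(b)
```

The two people overlap by an IoU of roughly 0.22, below the default 0.4, so both survive; the invented duplicate overlaps the first box almost completely and is dropped.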

・Running with video input

(py_learn) python yolov5_demo_sync_ov2023x.py -i ../../Videos/car1_m.mp4

・Result

(py_learn) python yolov5_demo_sync_ov2023x.py -i ../../Videos/car1_m.mp4

--- YOLO V5 OpenVINO(API 2.0) demoprogram Ver 0.01 ---
OpenCV: 4.9.0
OpenVINO inference_engine: 2024.0.0-14509-34caeefd078-releases/2024/0

 Creating OpenVINO Runtime Core...
 Reading the model: yolov5s_v3.xml
 Label file  : coco.names_jp
 Input source: ../../Videos/car1_m.mp4
 Starting inference...

 FPS average:       9.20

 Finished.

・Running with camera input

(py_learn) python yolov5_demo_sync_ov2023x.py

・Result

(py_learn) python yolov5_demo_sync_ov2023x.py

--- YOLO V5 OpenVINO(API 2.0) demoprogram Ver 0.01 ---
OpenCV: 4.9.0
OpenVINO inference_engine: 2024.0.0-14509-34caeefd078-releases/2024/0

 Creating OpenVINO Runtime Core...
 Reading the model: yolov5s_v3.xml
 Label file  : coco.names_jp
 Input source: 0
 Starting inference...

 FPS average:      10.40

 Finished.

Main command parameters

Option	Default	Meaning
-i, --input	'cam'	Path to the input source, or cam/cam0/cam1
-m, --model	'yolov5s_v3.xml'	Path to the pretrained model
-d, --device	'CPU'	Inference device (CPU, GPU, FPGA, HDDL or MYRIAD)
--labels	'coco.names_jp'	Path to the label file (coco.names, coco.names_jp)
-show	-	Display-off flag (when given, no window is shown)
-r, --raw_output_message	-	Message output flag
-x, --debug_message	-	Debug message output flag
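The -show flag is easy to misread: it is declared with default=True and action='store_false', so passing it turns the window off. A minimal argparse sketch limited to two of the flags from the table:

```python
from argparse import ArgumentParser

parser = ArgumentParser()
# Passing -show flips the stored value from True to False (display off).
parser.add_argument("-show", default=True, action="store_false", help="Hide output view")
parser.add_argument("-r", "--raw_output_message", default=False, action="store_true")

print(parser.parse_args([]).show)          # True: the result window is shown
print(parser.parse_args(["-show"]).show)   # False: the result window is hidden
```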

▼　Command parameter details

(py_learn) python yolov5_demo_sync_ov2023x.py -h

--- YOLO V5 OpenVINO(API 2.0) demoprogram Ver 0.01 ---
OpenCV: 4.9.0
OpenVINO inference_engine: 2024.0.0-14509-34caeefd078-releases/2024/0

usage: yolov5_demo_sync_ov2023x.py [-h] [-m MODEL] [-i INPUT] [-l CPU_EXTENSION] [-d DEVICE] [--labels LABELS]
                                   [-t PROB_THRESHOLD] [-iout IOU_THRESHOLD] [-ni NUMBER_ITER] [-pc] [-r] [-x] [-show]

Options:
  -h, --help            Show this help message and exit.
  -m MODEL, --model MODEL
                        Required. Path to an .xml file with a trained model.
  -i INPUT, --input INPUT
                        Required. Path to an image/video file. (Specify 'cam' to work with camera)
  -l CPU_EXTENSION, --cpu_extension CPU_EXTENSION
                        Optional. Required for CPU custom layers. Absolute path to a shared library with the kernels
                        implementations.
  -d DEVICE, --device DEVICE
                        Optional. Specify the target device to infer on; CPU, GPU, FPGA, HDDL or MYRIAD is acceptable.
                        The sample will look for a suitable plugin for device specified. Default value is CPU
  --labels LABELS       Optional. Labels mapping file
  -t PROB_THRESHOLD, --prob_threshold PROB_THRESHOLD
                        Optional. Probability threshold for detections filtering
  -iout IOU_THRESHOLD, --iou_threshold IOU_THRESHOLD
                        Optional. Intersection over union threshold for overlapping detections filtering
  -ni NUMBER_ITER, --number_iter NUMBER_ITER
                        Optional. Number of inference iterations
  -pc, --perf_counts    Optional. Report performance counters
  -r, --raw_output_message
                        Optional. Output inference results raw values showing
  -x, --debug_message   Optional. Output debug values showing
  -show                 Optional. Hide output view

Source code

▼　"yolov5_demo_sync_ov2023x.py"

#!/usr/bin/env python
# -*- coding: utf-8 -*-
##------------------------------------------
## YOLO V5 OpenVINO demoprogram  Ver 0.01 
##   GitHub https://github.com/violet17/yolov5_demo
##
##               2024.03.18 Masahiro Izutsu
##------------------------------------------
## yolov5_demo_sync_ov2023x.py  (original: yolov5_demo_sync_ov2023.py)
##
## Changes:
## ・Japanese label display and per-class colors for detected objects
## ・Fixed the key-input interruption bug
## ・Changed console and log output
## ・Improved input parameters

## from (original: yolov5_demo_sync_ov2023.py)
"""
 Copyright (C) 2018-2019 Intel Corporation

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
"""

## --- yolov5_demo_sync_ov2023x.py ---

# Imports
from __future__ import print_function, division

import logging
import os
import sys
from argparse import ArgumentParser, SUPPRESS
from math import exp as exp
from time import time
import numpy as np

import cv2
from openvino.preprocess import PrePostProcessor, ResizeAlgorithm
from openvino.runtime import Core, Layout, Type
import openvino.runtime as ov

import my_puttext                                               # 2024/03/18
import my_color80                                               # 2024/03/18
import my_fps                                                   # 2024/03/18

#import object_check                                             # 2024/03/20

# Color Escape Code
GREEN = '\033[1;32m'
RED = '\033[1;31m'
NOCOLOR = '\033[0m'
YELLOW = '\033[1;33m'
CYAN = '\033[1;36m'

# Constants
TEXT_COLOR = my_color80.CR_white                                # 2024/03/18
DEF_MODEL_PATH = os.path.expanduser('yolov5s_v3.xml')
DEF_LABEL_PATH = os.path.expanduser('coco.names_jp')
DEF_INPUT_PATH = os.path.expanduser('cam')

# Title and version information
title = 'YOLO V5 OpenVINO(API 2.0) demoprogram Ver 0.01'
print(CYAN + '\n--- {} ---'.format(title))
print(GREEN + 'OpenCV:',cv2.__version__)
print("OpenVINO inference_engine:", ov.get_version())
print(NOCOLOR)

logging.basicConfig(format="[ %(levelname)s ] %(message)s", level=logging.INFO, stream=sys.stdout)
log = logging.getLogger()

def build_argparser():
    parser = ArgumentParser(add_help=False)
    args = parser.add_argument_group('Options')
    args.add_argument('-h', '--help', action='help', default=SUPPRESS, help='Show this help message and exit.')
    args.add_argument("-m", "--model", help="Required. Path to an .xml file with a trained model.",
                      default=DEF_MODEL_PATH, type=str)
    args.add_argument("-i", "--input", help="Required. Path to an image/video file. (Specify 'cam' to work with "
                                            "camera)", default=DEF_INPUT_PATH, type=str)
    args.add_argument("-l", "--cpu_extension",
                      help="Optional. Required for CPU custom layers. Absolute path to a shared library with "
                           "the kernels implementations.", type=str, default=None)
    args.add_argument("-d", "--device",
                      help="Optional. Specify the target device to infer on; CPU, GPU, FPGA, HDDL or MYRIAD is"
                           " acceptable. The sample will look for a suitable plugin for device specified. "
                           "Default value is CPU", default="CPU", type=str)
    args.add_argument("--labels", help="Optional. Labels mapping file", default=DEF_LABEL_PATH, type=str)
    args.add_argument("-t", "--prob_threshold", help="Optional. Probability threshold for detections filtering",
                      default=0.5, type=float)
    args.add_argument("-iout", "--iou_threshold", help="Optional. Intersection over union threshold for overlapping "
                                                       "detections filtering", default=0.4, type=float)
    args.add_argument("-ni", "--number_iter", help="Optional. Number of inference iterations", default=1, type=int)
    args.add_argument("-pc", "--perf_counts", help="Optional. Report performance counters", default=False,
                      action="store_true")
    args.add_argument("-r", "--raw_output_message", help="Optional. Output inference results raw values showing",
                      default=False, action="store_true")                   # 2024/03/18
    args.add_argument("-x", "--debug_message", help="Optional. Output debug values showing",
                      default=False, action="store_true")
    args.add_argument("-show", help="Optional. Hide output view", default=True, action='store_false')
    return parser


class YoloParams:
    # ------------------------------------------- Extracting layer parameters ------------------------------------------
    # Magic numbers are copied from yolo samples
    def __init__(self,  side):
        self.num = 3 #if 'num' not in param else int(param['num'])
        self.coords = 4 #if 'coords' not in param else int(param['coords'])
        self.classes = 80 #if 'classes' not in param else int(param['classes'])
        self.side = side
        self.anchors = [10.0, 13.0, 16.0, 30.0, 33.0, 23.0, 30.0, 61.0, 62.0, 45.0, 59.0, 119.0, 116.0, 90.0, 156.0,
                        198.0,
                        373.0, 326.0] #if 'anchors' not in param else [float(a) for a in param['anchors'].split(',')]

    def log_params(self):
        params_to_print = {'classes': self.classes, 'num': self.num, 'coords': self.coords, 'anchors': self.anchors}
        [log.info("         {:8}: {}".format(param_name, param)) for param_name, param in params_to_print.items()]


def letterbox(img, size=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True):
    # Resize image to a 32-pixel-multiple rectangle https://github.com/ultralytics/yolov3/issues/232
    shape = img.shape[:2]  # current shape [height, width]
    w, h = size

    # Scale ratio (new / old)
    r = min(h / shape[0], w / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better test mAP)
        r = min(r, 1.0)

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = w - new_unpad[0], h - new_unpad[1]  # wh padding
    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, 64), np.mod(dh, 64)  # wh padding
    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = (w, h)
        ratio = w / shape[1], h / shape[0]  # width, height ratios

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border

    top2, bottom2, left2, right2 = 0, 0, 0, 0
    if img.shape[0] != h:
        top2 = (h - img.shape[0])//2
        bottom2 = top2
        img = cv2.copyMakeBorder(img, top2, bottom2, left2, right2, cv2.BORDER_CONSTANT, value=color)  # add border
    elif img.shape[1] != w:
        left2 = (w - img.shape[1])//2
        right2 = left2
        img = cv2.copyMakeBorder(img, top2, bottom2, left2, right2, cv2.BORDER_CONSTANT, value=color)  # add border
    return img


def scale_bbox(x, y, height, width, class_id, confidence, im_h, im_w, resized_im_h=640, resized_im_w=640):
    gain = min(resized_im_w / im_w, resized_im_h / im_h)  # gain  = old / new
    pad = (resized_im_w - im_w * gain) / 2, (resized_im_h - im_h * gain) / 2  # wh padding
    x = int((x - pad[0])/gain)
    y = int((y - pad[1])/gain)

    w = int(width/gain)
    h = int(height/gain)
 
    xmin = max(0, int(x - w / 2))
    ymin = max(0, int(y - h / 2))
    xmax = min(im_w, int(xmin + w))
    ymax = min(im_h, int(ymin + h))
    # Method item() used here to convert NumPy types to native types for compatibility with functions, which don't
    # support Numpy types (e.g., cv2.rectangle doesn't support int64 in color parameter)
    return dict(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax, class_id=class_id.item(), confidence=confidence.item())


def entry_index(side, coord, classes, location, entry):
    side_power_2 = side ** 2
    n = location // side_power_2
    loc = location % side_power_2
    return int(side_power_2 * (n * (coord + classes + 1) + entry) + loc)


def parse_yolo_region(blob, resized_image_shape, original_im_shape, params, threshold):
    # --- Validating output parameters ---
    out_blob_n, out_blob_c, out_blob_h, out_blob_w = blob.shape
    predictions = 1.0/(1.0+np.exp(np.zeros(blob.shape)-blob)) 
                   
    assert out_blob_w == out_blob_h, "Invalid size of output blob. It should be in NCHW layout and height should " \
                                     "be equal to width. Current height = {}, current width = {}" \
                                     "".format(out_blob_h, out_blob_w)

    # --- Extracting layer parameters ---
    orig_im_h, orig_im_w = original_im_shape
    resized_image_h, resized_image_w = resized_image_shape
    objects = list()

    side_square = params.side * params.side

    # --- Parsing YOLO Region output ---
    bbox_size = int(out_blob_c/params.num) #4+1+num_classes

    for row, col, n in np.ndindex(params.side, params.side, params.num):
        bbox = predictions[0, n*bbox_size:(n+1)*bbox_size, row, col]
        
        x, y, width, height, object_probability = bbox[:5]
        class_probabilities = bbox[5:]
        if object_probability < threshold:
            continue
        x = (2*x - 0.5 + col)*(resized_image_w/out_blob_w)
        y = (2*y - 0.5 + row)*(resized_image_h/out_blob_h)
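        # Select the anchor group from the stride. Use logical "and": the bitwise "&"
        # binds tighter than "==", so the original test only worked by accident.
        stride_w = int(resized_image_w/out_blob_w)
        stride_h = int(resized_image_h/out_blob_h)
        if stride_w == 8 and stride_h == 8:  # 80x80
            idx = 0
        elif stride_w == 16 and stride_h == 16:  # 40x40
            idx = 1
        elif stride_w == 32 and stride_h == 32:  # 20x20
            idx = 2
        else:
            raise ValueError("Unsupported output stride: {}x{}".format(stride_w, stride_h))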

        width = (2*width)**2* params.anchors[idx * 6 + 2 * n]
        height = (2*height)**2 * params.anchors[idx * 6 + 2 * n + 1]
        class_id = np.argmax(class_probabilities)
        confidence = object_probability
        objects.append(scale_bbox(x=x, y=y, height=height, width=width, class_id=class_id, confidence=confidence,
                                  im_h=orig_im_h, im_w=orig_im_w, resized_im_h=resized_image_h, resized_im_w=resized_image_w))
    return objects


def intersection_over_union(box_1, box_2):
    width_of_overlap_area = min(box_1['xmax'], box_2['xmax']) - max(box_1['xmin'], box_2['xmin'])
    height_of_overlap_area = min(box_1['ymax'], box_2['ymax']) - max(box_1['ymin'], box_2['ymin'])
    if width_of_overlap_area < 0 or height_of_overlap_area < 0:
        area_of_overlap = 0
    else:
        area_of_overlap = width_of_overlap_area * height_of_overlap_area
    box_1_area = (box_1['ymax'] - box_1['ymin']) * (box_1['xmax'] - box_1['xmin'])
    box_2_area = (box_2['ymax'] - box_2['ymin']) * (box_2['xmax'] - box_2['xmin'])
    area_of_union = box_1_area + box_2_area - area_of_overlap
    if area_of_union == 0:
        return 0
    return area_of_overlap / area_of_union


def main():
    # Select the Japanese font
    fontPIL = my_puttext.get_font()                             # 2024/03/18

    args = build_argparser().parse_args()


    # --- 1. Plugin initialization for specified device and load extensions library if specified ---
    print(' Creating OpenVINO Runtime Core...')
    core = Core()

    # --- 2. Reading the IR generated by the Model Optimizer (.xml and .bin files) ---
    model = args.model
    print(f" Reading the model: {model}")
    model = core.read_model(model)

    assert len(model.inputs) == 1, "Sample supports only single input topologies"

    # --- 4. Preparing inputs ---
    if args.debug_message:                                      # 2024/03/18
        log.info("Preparing inputs")

    # Read and pre-process input images
    n, c, h, w = model.inputs[0].shape

    if args.labels and os.path.isfile(args.labels):
        with open(args.labels, 'r', encoding="utf-8") as f:     # 2024/03/18
            labels_map = [x.strip() for x in f]
    else:
        labels_map = None
    print(f" Label file  : {args.labels}")                     # 2024/03/18

#    input_stream = 0 if args.input == "cam" else args.input
    if args.input.lower() == "cam" or args.input.lower() == "cam0":
        input_stream = 0
    elif args.input.lower() == "cam1":
        input_stream = 1
    else:
        input_stream = args.input

    print(f" Input source: {input_stream}")                     # 2024/03/18
    cap = cv2.VideoCapture(input_stream)
    number_input_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    number_input_frames = 1 if number_input_frames != -1 and number_input_frames < 0 else number_input_frames

    wait_key_code = 1                                           # 2024/03/18

    # Number of frames in picture is 1 and this will be read in cycle. Sync mode is default value for this case
    if number_input_frames != 1:
        ret, frame = cap.read()
    else:
        wait_key_code = 0                                       # 2024/03/18

    # --- 5. Loading model to the plugin ---
    if args.debug_message:                                      # 2024/03/18
        log.info("Loading model to the plugin")
    compiled_model = core.compile_model(model, device_name=args.device)

    render_time = 0
    parsing_time = 0

    # --- 6. Doing inference ---
    print(" Starting inference...")

    # Initialize the FPS counter
    fpsWithTick = my_fps.fpsWithTick()
    fpsWithTick.get()                                            # start FPS measurement

    # Main loop
    while cap.isOpened():

        ret, frame = cap.read()
        if not ret:
            break
        in_frame = letterbox(frame, (w, h))

        in_frame0 = in_frame
        # resize input_frame to network size
        in_frame = in_frame.transpose((2, 0, 1))  # Change data layout from HWC to CHW
        in_frame = in_frame.reshape((n, c, h, w))

        # Start inference
        start_time = time()
        results = compiled_model.infer_new_request({0: in_frame})
        det_time = time() - start_time

        objects = list()
        for idx in range(len(results)):
            out_blob = results[idx]
            layer_params = YoloParams(side=out_blob.shape[2])

            # Object check (DEBUG)                              # 2024/03/20
#            if args.debug_message:
#                object_check.chk_object(results, 'results')
#                object_check.chk_object(out_blob, 'out_blob')


            # Show the log    2024/03/18
            if args.debug_message:                              # 2024/03/18
                layer_params.log_params()

            objects += parse_yolo_region(out_blob, in_frame.shape[2:],
                                            frame.shape[:-1], layer_params,
                                            args.prob_threshold)
            parsing_time = time() - start_time

        # Filtering overlapping boxes with respect to the --iou_threshold CLI parameter
        objects = sorted(objects, key=lambda obj : obj['confidence'], reverse=True)
        for i in range(len(objects)):
            if objects[i]['confidence'] == 0:
                continue
            for j in range(i + 1, len(objects)):
                if intersection_over_union(objects[i], objects[j]) > args.iou_threshold:
                    objects[j]['confidence'] = 0

        # Drawing objects with respect to the --prob_threshold CLI parameter
        objects = [obj for obj in objects if obj['confidence'] >= args.prob_threshold]

        if len(objects) and args.raw_output_message:
            log.info("\nDetected boxes for batch {}:".format(1))
            log.info(" Class ID \t| Confidence | XMIN | YMIN | XMAX | YMAX | COLOR ")

        origin_im_size = frame.shape[:-1]
        for obj in objects:
            # Validation bbox of detected object
            if obj['xmax'] > origin_im_size[1] or obj['ymax'] > origin_im_size[0] or obj['xmin'] < 0 or obj['ymin'] < 0:
                continue
#            color = (int(min(obj['class_id'] * 12.5, 255)),
#                     min(obj['class_id'] * 7, 255), min(obj['class_id'] * 5, 255))

            # Per-class colors
            BOX_COLOR = my_color80.get_boder_bgr80(obj['class_id'])
            LABEL_BG_COLOR = my_color80.get_back_bgr80(obj['class_id'])

            # class_id must be strictly smaller than the number of labels to be a valid index
            det_label = labels_map[obj['class_id']] if labels_map and len(labels_map) > obj['class_id'] else \
                str(obj['class_id'])

            if args.raw_output_message:
                log.info(
                    "{:^9} \t| {:10f} | {:4} | {:4} | {:4} | {:4} | {} ".format(det_label, obj['confidence'], obj['xmin'],
                                                                              obj['ymin'], obj['xmax'], obj['ymax'],
                                                                              BOX_COLOR))
            # Get the label drawing area
            x0,y0,x1,y1 = my_puttext.cv2_putText(img = frame,
                   text = det_label + ' ' + str(round(obj['confidence'] * 100, 1)) + ' %',
                   org = (obj['xmin'], obj['ymin'] - 7), fontFace = fontPIL,
                   fontScale = 14,
                   color = TEXT_COLOR,
                   mode = 0,
                   areaf=True)
            xx = obj['xmax'] if obj['xmax'] > x1 else x1              # if the label is wider than the box, extend to the label width
            cv2.rectangle(frame, (obj['xmin'], obj['ymin']-26), (xx, obj['ymin']), LABEL_BG_COLOR, -1)

            my_puttext.cv2_putText(img = frame,
                   text = det_label + ' ' + str(round(obj['confidence'] * 100, 1)) + ' %',
                   org = (obj['xmin'], obj['ymin'] - 7), fontFace = fontPIL,
                   fontScale = 14,
                   color = TEXT_COLOR,
                   mode = 0)

            # Draw the bounding box on the image
            cv2.rectangle(frame, (obj['xmin'], obj['ymin']), (obj['xmax'], obj['ymax']), BOX_COLOR, 2)

        # Compute FPS
        fps = fpsWithTick.get()

        # Draw performance stats over frame
        inf_time_message = "Inference time: {:.3f} ms".format(det_time * 1e3)
        render_time_message = "OpenCV rendering time: {:.3f} ms".format(render_time * 1e3)
        parsing_message = "YOLO parsing time is {:.3f} ms".format(parsing_time * 1e3)

        # Text shadow
        cv2.putText(frame, inf_time_message, (15+1, 15+1), cv2.FONT_HERSHEY_COMPLEX, 0.5, (255, 255, 255), 1)
        cv2.putText(frame, render_time_message, (15+1, 45+1), cv2.FONT_HERSHEY_COMPLEX, 0.5, (255, 255, 255), 1)
        cv2.putText(frame, parsing_message, (15+1, 30+1), cv2.FONT_HERSHEY_COMPLEX, 0.5, (255, 255, 255), 1)
        # Draw text
        cv2.putText(frame, inf_time_message, (15, 15), cv2.FONT_HERSHEY_COMPLEX, 0.5, (200, 10, 10), 1)
        cv2.putText(frame, render_time_message, (15, 45), cv2.FONT_HERSHEY_COMPLEX, 0.5, (10, 10, 200), 1)
        cv2.putText(frame, parsing_message, (15, 30), cv2.FONT_HERSHEY_COMPLEX, 0.5, (10, 10, 200), 1)

        start_time = time()
        if args.show:
            window_name = "DetectionResults (hit 'q' or 'esc' key to exit)"
            cv2.namedWindow(window_name, flags=cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) 
            cv2.imshow(window_name, frame)
        render_time = time() - start_time
        cv2.imwrite("results.jpg", frame)

        if args.show:
            key = cv2.waitKey(wait_key_code)        # 2024/03/18

            # ESC key
            if key == 27 or key == 113:             # 'esc' or 'q'
                break

    cv2.destroyAllWindows()

    print('\n FPS average: {:>10.2f}'.format(fpsWithTick.get_average()))
    print('\n Finished.')


if __name__ == '__main__':
    sys.exit(main() or 0)
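The decoding step inside parse_yolo_region() can be isolated as a small sketch. It assumes raw (pre-sigmoid) head outputs, as produced by the V3 IR; the function name `decode_cell` and the sample cell are hypothetical:

```python
from math import exp


def sigmoid(v):
    return 1.0 / (1.0 + exp(-v))


def decode_cell(raw, row, col, stride, anchor_w, anchor_h):
    """Decode raw (tx, ty, tw, th) of one anchor in one grid cell to a box in network pixels."""
    sx, sy, sw, sh = (sigmoid(v) for v in raw)
    cx = (2.0 * sx - 0.5 + col) * stride   # box centre x
    cy = (2.0 * sy - 0.5 + row) * stride   # box centre y
    w = (2.0 * sw) ** 2 * anchor_w         # box width
    h = (2.0 * sh) ** 2 * anchor_h         # box height
    return cx, cy, w, h


# All-zero raw values give sigmoid = 0.5: the centre sits in the middle of the
# cell and the size equals the anchor itself.
print(decode_cell((0.0, 0.0, 0.0, 0.0), row=10, col=10, stride=32,
                  anchor_w=116.0, anchor_h=90.0))   # (336.0, 336.0, 116.0, 90.0)
```

The result is still in the letterboxed 640x640 coordinate system; scale_bbox() then removes the padding and rescales to the original frame.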


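YOLO V5: differences between pretrained model versions

　Investigate why "yolov5_demo_sync_ov2023x.py" fails with an error

The failure looks like an object attribute problem, so write a small checking program
　(included in the project package "update_20240405.zip")

▼　"object_check.py"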

# -*- coding: utf-8 -*-
##------------------------------------------
##   My Library Object Check  Ver 0.01
##
##               2024.03.15 Masahiro Izutsu
##------------------------------------------
## object_check.py

# Print the object itself
def chk_obj(obj):
    print(f'\n■■ obj  ■■\n{obj}')

# Print the object's type
def chk_type(obj):
    print(f'\n■■ type ■■\n{type(obj)}')

# Print the shape (dimensions), if the object has one
def chk_shape(obj):
    print('\n■■ shape ■■')
    try:
        print(f'\n{obj.shape}')
    except AttributeError as e:
        print(e)

# Print the size, if the object has one
def chk_size(obj):
    print('\n■■ size ■■')
    try:
        print(f'\n{obj.size}')
    except AttributeError as e:
        print(e)

# Print the dictionary keys, if the object has any
def chk_keys(obj):
    print('\n■■ keys ■■')
    try:
        print(f'\n{list(obj.keys())}')      # call keys(); printing obj.keys only shows the bound method
    except AttributeError as e:
        print(e)

# Print all attributes of the object (methods and instance variables)
def chk_dir(obj):
    print(f'\n■■ dir ■■\n{dir(obj)}')

# Print the instance attributes and their values
def chk_vars(obj):
    print('\n■■ vars ■■')
    try:
        print(f'\n{vars(obj)}')             # built-in vars(); objects have no .vars attribute
    except TypeError as e:                  # raised for objects without __dict__
        print(e)

# Run every check on the object
def chk_object(obj, obj_str):
    print(f'↓↓↓↓↓↓↓↓↓↓ 「{obj_str} 」CHECK START... ↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓↓')
    chk_obj(obj)
    chk_type(obj)
    chk_shape(obj)
    chk_size(obj)
    chk_keys(obj)
    chk_dir(obj)
    chk_vars(obj)
    print(f'↑↑↑↑↑↑↑↑↑↑ 「{obj_str} 」CHECK END ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑')

if __name__ == '__main__':
    class Sample:
        def __init__(self, value):
            self.value = value

        def show_value(self):
            print(f'Value: {self.value}')

    sample_object = Sample(3)

    chk_object(sample_object, 'sample_object')

Insert the check before the failing line in "yolov5_demo_sync_ov2023x.py"

    :
# Imports

import object_check                                             # 2024/03/20
    :

    :
        objects = list()
        for idx in range(len(results)):
            out_blob = results[idx]
            layer_params = YoloParams(side=out_blob.shape[2])

            # Object check (DEBUG)                              # 2024/03/20
            if args.debug_message:
                object_check.chk_object(results, 'results')
                object_check.chk_object(out_blob, 'out_blob')
    :

・Object returned by inference with the pretrained model (V3) "yolov5s_v3.xml"

(py_learn) python yolov5_demo_sync_ov2023x.py -i ../../Images/cat.jpg -x

　 'results' object

{<ConstOutput: names[668, Conv_487] shape[1,255,20,20] type: f32>: array([[[[ ..., ]]]], dtype=float32),
 <ConstOutput: names[648, Conv_471] shape[1,255,40,40] type: f32>: array([[[[ ..., ]]]], dtype=float32),
 <ConstOutput: names[628, Conv_455] shape[1,255,80,80] type: f32>: array([[[[ ..., ]]]], dtype=float32)}

・Object returned by inference with the pretrained model (V7) "yolov5s.xml"

(py_learn) python yolov5_demo_sync_ov2023x.py -i ../../Images/cat.jpg -m ../yolov5/yolov5s.xml -x

　 'results' object

{<ConstOutput: names[output0] shape[1,25200,85] type: f32>: array([[[ ..., ]]], dtype=float32)}
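The two dumps are consistent with each other once the numbers are decomposed. A short arithmetic sketch, assuming a 640x640 input, 3 anchors per cell, 80 classes and strides 8/16/32; reading the single [1,25200,85] tensor as the three scales concatenated into one list of boxes is an interpretation of the shapes, not something the dump itself states:

```python
NUM_ANCHORS = 3
NUM_CLASSES = 80
PER_BOX = 4 + 1 + NUM_CLASSES            # x, y, w, h + objectness + class scores
INPUT_SIZE = 640
STRIDES = (8, 16, 32)

grids = [INPUT_SIZE // s for s in STRIDES]            # grid side per scale
channels = NUM_ANCHORS * PER_BOX                      # channel count of each V3 head
total_boxes = sum(NUM_ANCHORS * g * g for g in grids) # boxes over all three scales

print("values per box :", PER_BOX)      # 85
print("head channels  :", channels)     # 255
print("grid sides     :", grids)        # [80, 40, 20]
print("total boxes    :", total_boxes)  # 25200
```

So the V3 IR exposes three [1,255,S,S] heads that the script decodes itself, while the newer model exposes one 3-D tensor, which is why the four-way shape unpack fails.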

Visualize and compare "yolov5s_v3.onnx" and "yolov5s.onnx" in Netron (web version)


Converting the ONNX file produced by "export.py" to OpenVINO™ IR

　　Reference site: → Object Detection & YOLOs

An ONNX file can be converted to IR files with Model Optimizer
・When converting a YOLOv5 model with Model Optimizer, the output nodes of the IR must be specified
・YOLOv5 has three output nodes
Visualize the YOLOv5 ONNX weights in Netron (web version)
・Search Netron for the keyword "Transpose" to find the output nodes
・Double-click the convolution node marked ① in red in the figure above; its name, "/model.24/m.0/Conv", can be read in the properties panel on the right
・In the same way, obtain ② "/model.24/m.1/Conv" and ③ "/model.24/m.2/Conv"
・Pass "/model.24/m.0/Conv", "/model.24/m.1/Conv" and "/model.24/m.2/Conv" as the output-node parameter of Model Optimizer

Convert with Model Optimizer
・Run in the "workspace_pylearn/yolov5/" directory
・"yolov5s.onnx" → "yolov5s_v7.xml", "yolov5s_v7.bin"

(py_learn) mo --input_model yolov5s.onnx --model_name yolov5s_v7 -s 255 --reverse_input_channels --output '/model.24/m.0/Conv','/model.24/m.1/Conv','/model.24/m.2/Conv'

・Result

(py_learn) mo --input_model yolov5s.onnx --model_name yolov5s_v7 -s 255 --reverse_input_channels --output '/model.24/m.0/Conv','/model.24/m.1/Conv','/model.24/m.2/Conv'
[ INFO ] Generated IR will be compressed to FP16. If you get lower accuracy, please consider disabling compression explicitly by adding argument --compress_to_fp16=False.
Find more information about compression to FP16 at https://docs.openvino.ai/2023.0/openvino_docs_MO_DG_FP16_Compression.html
[ INFO ] MO command line tool is considered as the legacy conversion API as of OpenVINO 2023.2 release. Please use OpenVINO Model Converter (OVC). OVC represents a lightweight alternative of MO and provides simplified model conversion API.
Find more information about transition from MO to OVC at https://docs.openvino.ai/2023.2/openvino_docs_OV_Converter_UG_prepare_model_convert_model_MO_OVC_transition.html
[ SUCCESS ] Generated IR version 11 model.
[ SUCCESS ] XML file: C:\anaconda_win\workspace_pylearn\yolov5\yolov5s_v7.xml
[ SUCCESS ] BIN file: C:\anaconda_win\workspace_pylearn\yolov5\yolov5s_v7.bin
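The two preprocessing flags on the mo command line explain why the demo script can pass OpenCV frames to the model without normalizing them: -s 255 divides every input value by 255, and --reverse_input_channels swaps the channel order, and both steps are embedded in the generated IR. A rough per-pixel sketch of that embedded step (the function name is hypothetical, and the real preprocessing operates on whole tensors):

```python
def preprocess_pixel_in_ir(bgr_pixel, scale=255.0):
    """Mimic '--reverse_input_channels' and '-s 255' for one 8-bit BGR pixel."""
    b, g, r = bgr_pixel
    # Reverse the channel order (BGR -> RGB), then divide by the scale value.
    return (r / scale, g / scale, b / scale)


# An OpenCV pixel is stored as (B, G, R) with values 0..255.
print(preprocess_pixel_in_ir((255, 0, 51)))   # (0.2, 0.0, 1.0)
```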

Run "yolov5_demo_sync_ov2023x.py" created in the previous section
・Run in the "workspace_pylearn/yolov5_demo/" directory

(py_learn) python yolov5_demo_sync_ov2023x.py -i ../../Images/cat.jpg -m ../yolov5/yolov5s_v7.xml

・Result

(py_learn) python yolov5_demo_sync_ov2023x.py -i ../../Images/cat.jpg -m ../yolov5/yolov5s_v7.xml

--- YOLO V5 OpenVINO(API 2.0) demoprogram Ver 0.01 ---
OpenCV: 4.9.0
OpenVINO inference_engine: 2024.0.0-14509-34caeefd078-releases/2024/0

 Creating OpenVINO Runtime Core...
 Reading the model: ../yolov5/yolov5s_v7.xml
 Label file  : coco.names_jp
 Input source: ../../Images/cat.jpg
 Starting inference...

 FPS average:      11.90

 Finished.


Creating an OpenVINO™ API 2.0 program

Program overview
・Write an object recognition program based on the sample program from the YOLOv5_OpenVINO_demo site
　(included in the revised project package "update_20240405.zip")
・The input source can be a single still image or video file, or a camera (0/1)
・Conforms to OpenVINO™ API 2.0

Project working directory

　On Windows

(py_learn) PS > cd /anaconda_win/workspace_pylearn/yolov5

　On Linux

(py_learn) $ cd ~/workspace_pylearn/yolov5

Procedure
・Start from the command line

(py_learn) python yolov5_OV2.py

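・Command-line arguments

Option	Default	Meaning
-h, --help	-	Show help
-i, --input	cam	Camera (cam/cam0 to cam9) or a video/still image file ※
-m, --model	yolov5s_v7.xml	Pretrained model (IR)
-d, --device	CPU	Target device (CPU/GPU/MYRIAD)
--labels	coco.names_jp	Label file
-t, --prob_threshold	0.5	Class decision threshold (a smaller value yields more objects, but also more noise)
-iout, --iou_threshold	0.4	Intersection over Union threshold (the fraction by which detection regions overlap; a larger value means a higher degree of overlap)
--titlef	y	Show program title (y/n)
--speedf	y	Show speed measurement (y/n)
-o, --out	non	File name for saving the processed result

　※ Specifying the input source
　　　・cam　　　　: camera input
　　　・file path : video file (.mp4) / still image file (.jpg, .png, .bmp, ....)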
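How the -i value could be mapped to something cv2.VideoCapture accepts can be sketched as follows. This is only an illustration of the cam/cam0 to cam9 convention described above, not the code of yolov5_OV2.py; the function name `parse_input_source` is hypothetical:

```python
import re


def parse_input_source(value):
    """Map 'cam' / 'cam0'..'cam9' to a camera index; return any other value as a file path."""
    v = value.lower()
    if v == "cam":
        return 0                      # plain 'cam' means the default camera
    m = re.fullmatch(r"cam([0-9])", v)
    if m:
        return int(m.group(1))        # 'cam3' -> camera index 3
    return value                      # anything else is treated as a file path


for src in ("cam", "cam1", "CAM9", "../../Videos/car1_m.mp4"):
    print(repr(src), "->", repr(parse_input_source(src)))
```

An integer result selects a camera device and a string result is opened as a video or image file.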

(py_learn) python yolov5_OV2.py -h
usage: yolov5_OV2.py [-h] [-i INPUT] [-m MODEL] [-d DEVICE] [--labels LABELS] [-t PROB_THRESHOLD]
                     [-iout IOU_THRESHOLD] [--titlef TITLE] [--speedf SPEED] [-o IMAGE_OUT]

Options:
  -h, --help            Show this help message and exit.
  -i INPUT, --input INPUT
                        Required. Path to an image/video file. (Specify 'cam','cam0','cam1')
  -m MODEL, --model MODEL
                        Required. Path to an .xml file with a trained model.
  -d DEVICE, --device DEVICE
                        Optional. Specify the target device to infer on; CPU, GPU, FPGA, HDDL or MYRIAD is acceptable.
                        The sample will look for a suitable plugin for device specified. Default value is CPU
  --labels LABELS       Optional. Labels mapping file
  -t PROB_THRESHOLD, --prob_threshold PROB_THRESHOLD
                        Optional. Probability threshold for detections filtering
  -iout IOU_THRESHOLD, --iou_threshold IOU_THRESHOLD
                        Optional. Intersection over union threshold for overlapping detections filtering
  --titlef TITLE        Program title flag.(y/n) Default value is 'y'
  --speedf SPEED        Speed display flag.(y/n) Default calue is 'y'
  -o IMAGE_OUT, --out IMAGE_OUT
                        Output image file path. Default value is 'non'
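
How the two thresholds interact can be illustrated with a small, library-free sketch. It follows the same idea as the program (sort by confidence, then drop any box that overlaps an already accepted box by more than the IoU threshold), but the function names `iou` and `filter_detections` and the sample boxes are assumptions made for this illustration.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as dicts with xmin/ymin/xmax/ymax."""
    w = min(a["xmax"], b["xmax"]) - max(a["xmin"], b["xmin"])
    h = min(a["ymax"], b["ymax"]) - max(a["ymin"], b["ymin"])
    inter = w * h if w > 0 and h > 0 else 0
    area_a = (a["xmax"] - a["xmin"]) * (a["ymax"] - a["ymin"])
    area_b = (b["xmax"] - b["xmin"]) * (b["ymax"] - b["ymin"])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets, prob_threshold=0.5, iou_threshold=0.4):
    """Greedy, class-agnostic non-maximum suppression."""
    kept = []
    for det in sorted(dets, key=lambda d: d["confidence"], reverse=True):
        if det["confidence"] < prob_threshold:
            continue  # rejected by -t / --prob_threshold
        if all(iou(det, k) <= iou_threshold for k in kept):
            kept.append(det)  # not suppressed by -iout / --iou_threshold
    return kept


# Hypothetical sample boxes (not program output)
SAMPLE = [
    dict(name="A", xmin=0, ymin=0, xmax=100, ymax=100, confidence=0.9),
    dict(name="B", xmin=10, ymin=10, xmax=110, ymax=110, confidence=0.8),
    dict(name="C", xmin=200, ymin=200, xmax=300, ymax=300, confidence=0.7),
    dict(name="D", xmin=400, ymin=400, xmax=450, ymax=450, confidence=0.3),
]

if __name__ == "__main__":
    print([d["name"] for d in filter_detections(SAMPLE)])
```

Boxes A and B overlap with an IoU of about 0.68, so with the default 0.4 only A survives; raising the IoU threshold to 0.7 keeps both. D is dropped by the probability threshold in either case.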

Examples
・With no input parameters

(py_learn) python yolov5_OV2.py

・Result

(py_learn) python yolov5_OV2.py

YOLO V5 in OpenVINO(API 2.0)  Ver 0.01: Starting application...
   OpenVINO inference_engine: 2024.0.0-14509-34caeefd078-releases/2024/0
   OpenCV virsion : 4.9.0

   - Input source   :  cam
   - Pretrained     :  yolov5s_v7.xml
   - Label file     :  coco.names_jp
   - Use device     :  CPU
   - prob threshold :  0.5
   - iou  threshold :  0.4

   - Output path    :  non
   - Program Title  :  y
   - Speed flag     :  y

 FPS average:       5.80

 Finished.

・Using camera device 1

(py_learn) python yolov5_OV2.py -i cam1

・Specifying a video file

(py_learn) python yolov5_OV2.py -i ../../Videos/car1_m.mp4

(py_learn) python yolov5_OV2.py -i ../../Videos/car2_m.mp4

・Specifying a still-image file

(py_learn) python yolov5_OV2.py -i ../../Images/desk-image.jpg

(py_learn) python yolov5_OV2.py -i ../../Images/car-person.jpg

(py_learn) python yolov5_OV2.py -i ../../Images/bus.jpg

(py_learn) python yolov5_OV2.py -i ../../Images/zidane.jpg

Source code
・File location: /workspace_pylearn/yolov5

▼ "yolov5_OV2.py"

# -*- coding: utf-8 -*-
##------------------------------------------
## YOLO V5 in OpenVINO(API 2.0)  Ver 0.03
##
##               2024.03.31 Masahiro Izutsu
##------------------------------------------
## yolov5_OV2.py  (original: yolov5_demo_sync_ov2023.py)
##  Ver. 0.02   2024/04/09  Support class IDs up to 119
##  Ver. 0.03   2024/04/15  Camera input (cam0-cam9)

# Imports
from __future__ import print_function, division

import logging
import os
import sys
from argparse import ArgumentParser, SUPPRESS
from math import exp as exp
from time import time
import numpy as np

import cv2
from openvino.preprocess import PrePostProcessor, ResizeAlgorithm
from openvino.runtime import Core, Layout, Type
import openvino.runtime as ov

import my_puttext
import my_color80
import my_fps

# Color Escape Code
GREEN = '\033[1;32m'
RED = '\033[1;31m'
NOCOLOR = '\033[0m'
YELLOW = '\033[1;33m'
CYAN = '\033[1;36m'

# Constants
TEXT_COLOR = my_color80.CR_white
DEF_MODEL_PATH = os.path.expanduser('yolov5s_v7.xml')
DEF_LABEL_PATH = os.path.expanduser('coco.names_jp')
DEF_INPUT_PATH = os.path.expanduser('cam')

# Title and version information
title = 'YOLO V5 in OpenVINO(API 2.0)  Ver 0.03'

# 'log' is referenced by YoloParams.log_params(), so it has to be defined
logging.basicConfig(format="[ %(levelname)s ] %(message)s", level=logging.INFO, stream=sys.stdout)
log = logging.getLogger()

def build_argparser():
    parser = ArgumentParser(add_help=False)
    args = parser.add_argument_group('Options')
    args.add_argument('-h', '--help', action = 'help', default = SUPPRESS, help = 'Show this help message and exit.')
    args.add_argument("-i", "--input", default = DEF_INPUT_PATH, type=str,
                        help="Required. Path to an image/video file. (Specify 'cam','cam0','cam1')")
    args.add_argument("-m", "--model", help = "Required. Path to an .xml file with a trained model.",
                        default=DEF_MODEL_PATH, type=str)
    args.add_argument("-d", "--device", default = "CPU", type = str,
                        help="Optional. Specify the target device to infer on; CPU, GPU, FPGA, HDDL or MYRIAD is"
                           " acceptable. The sample will look for a suitable plugin for device specified. "
                           "Default value is CPU")
    args.add_argument("--labels", help = "Optional. Labels mapping file", default = DEF_LABEL_PATH, type = str)
    args.add_argument("-t", "--prob_threshold", default = 0.5, type = float,
                        help = "Optional. Probability threshold for detections filtering")
    args.add_argument("-iout", "--iou_threshold", default=0.4, type=float,
                        help="Optional. Intersection over union threshold for overlapping detections filtering")
    args.add_argument('--titlef', metavar = 'TITLE', default = 'y',
                        help = 'Program title flag.(y/n) Default value is \'y\'')
    args.add_argument('--speedf', metavar = 'SPEED', default = 'y',
                        help = 'Speed display flag.(y/n) Default calue is \'y\'')
    args.add_argument('-o', '--out', metavar = 'IMAGE_OUT', default = 'non',
                        help = 'Output image file path. Default value is \'non\'')
    return parser

# Display basic information
def display_info(args):
    print('\n' + GREEN + title + ': Starting application...' + NOCOLOR)
    print("   OpenVINO inference_engine:", ov.get_version())
    print('   OpenCV virsion :',cv2.__version__)
    print('\n   - ' + YELLOW + 'Input source   : ' + NOCOLOR, args.input)
    print('   - ' + YELLOW + 'Pretrained     : ' + NOCOLOR, args.model)
    print('   - ' + YELLOW + 'Label file     : ' + NOCOLOR, args.labels)
    print('   - ' + YELLOW + 'Use device     : ' + NOCOLOR, args.device)
    print('   - ' + YELLOW + 'prob threshold : ' + NOCOLOR, args.prob_threshold)
    print('   - ' + YELLOW + 'iou  threshold : ' + NOCOLOR, args.iou_threshold, '\n')

    print('   - ' + YELLOW + 'Output path    : ' + NOCOLOR, args.out)
    print('   - ' + YELLOW + 'Program Title  : ' + NOCOLOR, args.titlef)
    print('   - ' + YELLOW + 'Speed flag     : ' + NOCOLOR, args.speedf)


class YoloParams:
    # ------------------------------------------- Extracting layer parameters ------------------------------------------
    # Magic numbers are copied from yolo samples
    def __init__(self,  side):
        self.num = 3 #if 'num' not in param else int(param['num'])
        self.coords = 4 #if 'coords' not in param else int(param['coords'])
        self.classes = 80 #if 'classes' not in param else int(param['classes'])
        self.side = side
        self.anchors = [10.0, 13.0, 16.0, 30.0, 33.0, 23.0, 30.0, 61.0, 62.0, 45.0, 59.0, 119.0, 116.0, 90.0, 156.0,
                        198.0,
                        373.0, 326.0] #if 'anchors' not in param else [float(a) for a in param['anchors'].split(',')]

    def log_params(self):
        params_to_print = {'classes': self.classes, 'num': self.num, 'coords': self.coords, 'anchors': self.anchors}
        for param_name, param in params_to_print.items():
            log.info("         {:8}: {}".format(param_name, param))


def letterbox(img, size=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True):
    # Resize image to a 32-pixel-multiple rectangle https://github.com/ultralytics/yolov3/issues/232
    shape = img.shape[:2]  # current shape [height, width]
    w, h = size

    # Scale ratio (new / old)
    r = min(h / shape[0], w / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better test mAP)
        r = min(r, 1.0)

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = w - new_unpad[0], h - new_unpad[1]  # wh padding
    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, 64), np.mod(dh, 64)  # wh padding
    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = (w, h)
        ratio = w / shape[1], h / shape[0]  # width, height ratios

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border

    top2, bottom2, left2, right2 = 0, 0, 0, 0
    if img.shape[0] != h:
        top2 = (h - img.shape[0])//2
        bottom2 = top2
        img = cv2.copyMakeBorder(img, top2, bottom2, left2, right2, cv2.BORDER_CONSTANT, value=color)  # add border
    elif img.shape[1] != w:
        left2 = (w - img.shape[1])//2
        right2 = left2
        img = cv2.copyMakeBorder(img, top2, bottom2, left2, right2, cv2.BORDER_CONSTANT, value=color)  # add border
    return img


def scale_bbox(x, y, height, width, class_id, confidence, im_h, im_w, resized_im_h=640, resized_im_w=640):
    gain = min(resized_im_w / im_w, resized_im_h / im_h)  # gain  = old / new
    pad = (resized_im_w - im_w * gain) / 2, (resized_im_h - im_h * gain) / 2  # wh padding
    x = int((x - pad[0])/gain)
    y = int((y - pad[1])/gain)

    w = int(width/gain)
    h = int(height/gain)

    xmin = max(0, int(x - w / 2))
    ymin = max(0, int(y - h / 2))
    xmax = min(im_w, int(xmin + w))
    ymax = min(im_h, int(ymin + h))
    # Method item() used here to convert NumPy types to native types for compatibility with functions, which don't
    # support Numpy types (e.g., cv2.rectangle doesn't support int64 in color parameter)
    return dict(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax, class_id=class_id.item(), confidence=confidence.item())


def entry_index(side, coord, classes, location, entry):
    side_power_2 = side ** 2
    n = location // side_power_2
    loc = location % side_power_2
    return int(side_power_2 * (n * (coord + classes + 1) + entry) + loc)


def parse_yolo_region(blob, resized_image_shape, original_im_shape, params, threshold):
    # --- Validating output parameters ---
    out_blob_n, out_blob_c, out_blob_h, out_blob_w = blob.shape
    predictions = 1.0 / (1.0 + np.exp(-blob))  # element-wise sigmoid

    assert out_blob_w == out_blob_h, "Invalid size of output blob. It should be in NCHW layout and height should " \
                                     "be equal to width. Current height = {}, current width = {}" \
                                     "".format(out_blob_h, out_blob_w)

    # --- Extracting layer parameters ---
    orig_im_h, orig_im_w = original_im_shape
    resized_image_h, resized_image_w = resized_image_shape
    objects = list()

    side_square = params.side * params.side

    # --- Parsing YOLO Region output ---
    bbox_size = int(out_blob_c/params.num) #4+1+num_classes

    for row, col, n in np.ndindex(params.side, params.side, params.num):
        bbox = predictions[0, n*bbox_size:(n+1)*bbox_size, row, col]
        
        x, y, width, height, object_probability = bbox[:5]
        class_probabilities = bbox[5:]
        if object_probability < threshold:
            continue
        x = (2*x - 0.5 + col)*(resized_image_w/out_blob_w)
        y = (2*y - 0.5 + row)*(resized_image_h/out_blob_h)
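        # Use logical 'and' here: bitwise '&' binds tighter than '==' and only worked by accident
        if int(resized_image_w/out_blob_w) == 8 and int(resized_image_h/out_blob_h) == 8:  # 80x80
            idx = 0
        elif int(resized_image_w/out_blob_w) == 16 and int(resized_image_h/out_blob_h) == 16:  # 40x40
            idx = 1
        elif int(resized_image_w/out_blob_w) == 32 and int(resized_image_h/out_blob_h) == 32:  # 20x20
            idx = 2
        else:
            raise ValueError("Unsupported output stride: {}x{}".format(out_blob_w, out_blob_h))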

        width = (2*width)**2* params.anchors[idx * 6 + 2 * n]
        height = (2*height)**2 * params.anchors[idx * 6 + 2 * n + 1]
        class_id = np.argmax(class_probabilities)
        confidence = object_probability
        objects.append(scale_bbox(x=x, y=y, height=height, width=width, class_id=class_id, confidence=confidence,
                                  im_h=orig_im_h, im_w=orig_im_w, resized_im_h=resized_image_h, resized_im_w=resized_image_w))
    return objects


def intersection_over_union(box_1, box_2):
    width_of_overlap_area = min(box_1['xmax'], box_2['xmax']) - max(box_1['xmin'], box_2['xmin'])
    height_of_overlap_area = min(box_1['ymax'], box_2['ymax']) - max(box_1['ymin'], box_2['ymin'])
    if width_of_overlap_area < 0 or height_of_overlap_area < 0:
        area_of_overlap = 0
    else:
        area_of_overlap = width_of_overlap_area * height_of_overlap_area
    box_1_area = (box_1['ymax'] - box_1['ymin']) * (box_1['xmax'] - box_1['xmin'])
    box_2_area = (box_2['ymax'] - box_2['ymin']) * (box_2['xmax'] - box_2['xmin'])
    area_of_union = box_1_area + box_2_area - area_of_overlap
    if area_of_union == 0:
        return 0
    return area_of_overlap / area_of_union

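# Determine the type of the input file
#   Returns: '.jpg', '.png', ...  image file (the matched extension)
#            'None'               not an image file (treated as a video file)
#            'NotFound'           the file does not exist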
def is_pict(filename):
    if not os.path.isfile(filename):
        return 'NotFound'

    types = ['.bmp','.png','.jpg','.jpeg','.JPG','.tif']
    for ss in types:
        if filename.endswith(ss):
            return ss
    return 'None'

def main():
    # Select a font that can render Japanese labels
    fontPIL = my_puttext.get_font()

    # Input parameters
    args = build_argparser().parse_args()
    display_info(args)
    outpath = args.out

    # --- 1. Plugin initialization for specified device and load extensions library if specified ---
    core = Core()

    # --- 2. Reading the IR generated by the Model Optimizer (.xml and .bin files) ---
    model = args.model
    model = core.read_model(model)

    assert len(model.inputs) == 1, "Sample supports only single input topologies"

    # --- 4. Preparing inputs ---

    # Read and pre-process input images
    n, c, h, w = model.inputs[0].shape

    # Class labels
    if args.labels and os.path.isfile(args.labels):
        with open(args.labels, 'r', encoding="utf-8") as f:     # 2024/03/18
            labels_map = [x.strip() for x in f]
    else:
        labels_map = None

    # Input: handle cam/cam0-cam9                           # 2024/04/15
    input_stream = args.input
    if input_stream.find('cam') == 0 and len(input_stream) < 5:
        input_stream = 0 if input_stream == 'cam' else int(input_stream[3])
        isstream = True
    else:
        filetype = is_pict(input_stream)
        isstream = filetype == 'None'
        if filetype == 'NotFound':
            print(RED + "\ninput file Not found." + NOCOLOR)
            sys.exit(1)

    # Prepare the input
    cap = cv2.VideoCapture(input_stream)
    number_input_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    number_input_frames = 1 if number_input_frames != -1 and number_input_frames < 0 else number_input_frames

    wait_key_code = 1

    # Number of frames in picture is 1 and this will be read in cycle. Sync mode is default value for this case
    if number_input_frames != 1:
        ret, frame = cap.read()
    else:
        wait_key_code = 0

    # --- 5. Loading model to the plugin ---
    compiled_model = core.compile_model(model, device_name=args.device)

    parsing_time = 0

    # --- 6. Doing inference ---

    # Record the result, step 1
    if (outpath != 'non'):
        if (isstream):
            fps = int(cap.get(cv2.CAP_PROP_FPS))
            out_w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
            out_h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
            fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
            outvideo = cv2.VideoWriter(outpath, fourcc, fps, (out_w, out_h))

    # Initialize the measurement
    fpsWithTick = my_fps.fpsWithTick()
    fpsWithTick.get()                                            # start the fps measurement

    # Main loop
    while cap.isOpened():

        ret, frame = cap.read()
        if not ret:
            break
        in_frame = letterbox(frame, (w, h))

        in_frame0 = in_frame
        # resize input_frame to network size
        in_frame = in_frame.transpose((2, 0, 1))  # Change data layout from HWC to CHW
        in_frame = in_frame.reshape((n, c, h, w))

        # Start inference
        start_time = time()
        results = compiled_model.infer_new_request({0: in_frame})
        det_time = time() - start_time

        objects = list()
        for idx in range(len(results)):
            out_blob = results[idx]
            layer_params = YoloParams(side=out_blob.shape[2])

            objects += parse_yolo_region(out_blob, in_frame.shape[2:],
                                            frame.shape[:-1], layer_params,
                                            args.prob_threshold)
            parsing_time = time() - start_time

        # Filtering overlapping boxes with respect to the --iou_threshold CLI parameter
        objects = sorted(objects, key=lambda obj : obj['confidence'], reverse=True)
        for i in range(len(objects)):
            if objects[i]['confidence'] == 0:
                continue
            for j in range(i + 1, len(objects)):
                if intersection_over_union(objects[i], objects[j]) > args.iou_threshold:
                    objects[j]['confidence'] = 0

        # Drawing objects with respect to the --prob_threshold CLI parameter
        objects = [obj for obj in objects if obj['confidence'] >= args.prob_threshold]

        origin_im_size = frame.shape[:-1]
        for obj in objects:
            # Validation bbox of detected object
            if obj['xmax'] > origin_im_size[1] or obj['ymax'] > origin_im_size[0] or obj['xmin'] < 0 or obj['ymin'] < 0:
                continue

            # Pick a color per object class
            color_id = obj['class_id'] if obj['class_id'] < 80 else obj['class_id'] - 40    # 2024/04/09
            BOX_COLOR = my_color80.get_boder_bgr80(color_id)
            LABEL_BG_COLOR = my_color80.get_back_bgr80(color_id)

            # Strict '>' avoids an IndexError when class_id equals len(labels_map)
            det_label = labels_map[obj['class_id']] if labels_map and len(labels_map) > obj['class_id'] else \
                str(obj['class_id'])

            # Get the area needed to draw the label
            x0,y0,x1,y1 = my_puttext.cv2_putText(img = frame,
                   text = det_label + ' ' + str(round(obj['confidence'] * 100, 1)) + ' %',
                   org = (obj['xmin'], obj['ymin'] - 7), fontFace = fontPIL,
                   fontScale = 14,
                   color = TEXT_COLOR,
                   mode = 0,
                   areaf=True)
            xx = obj['xmax'] if obj['xmax'] > x1 else x1              # if the label is wider than the box, extend to the label's right edge
            cv2.rectangle(frame, (obj['xmin'], obj['ymin']-26), (xx, obj['ymin']), LABEL_BG_COLOR, -1)

            my_puttext.cv2_putText(img = frame,
                   text = det_label + ' ' + str(round(obj['confidence'] * 100, 1)) + ' %',
                   org = (obj['xmin'], obj['ymin'] - 7), fontFace = fontPIL,
                   fontScale = 14,
                   color = TEXT_COLOR,
                   mode = 0)

            # Draw the bounding box on the image
            cv2.rectangle(frame, (obj['xmin'], obj['ymin']), (obj['xmax'], obj['ymax']), BOX_COLOR, 2)

        # Compute the FPS
        fps = fpsWithTick.get()
        st_fps = 'fps: {:>6.2f}'.format(fps)
        if (args.speedf == 'y'):
            cv2.rectangle(frame, (10, 38), (95, 55), (90, 90, 90), -1)
            cv2.putText(frame, st_fps, (15, 50), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.4, color=(255, 255, 255), lineType=cv2.LINE_AA)

        # Draw the title
        if (args.titlef == 'y'):
            cv2.putText(frame, title, (12, 32), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.8, color=(0, 0, 0), lineType=cv2.LINE_AA)
            cv2.putText(frame, title, (10, 30), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.8, color=(200, 200, 0), lineType=cv2.LINE_AA)

        # Show the image
        window_name = title + "  (hit 'q' or 'esc' key to exit)"
        cv2.namedWindow(window_name, flags=cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) 
        cv2.imshow(window_name, frame)

        # Record the result, step 2
        if (outpath != 'non'):
            if (isstream):
                outvideo.write(frame)
            else:
                cv2.imwrite(outpath, frame)

        # Exit when the 'esc' or 'q' key is pressed
        key = cv2.waitKey(wait_key_code)
        if key == 27 or key == 113:             # 'esc' or 'q'
            break

        # Window close button
        if cv2.getWindowProperty(window_name, cv2.WND_PROP_VISIBLE) < 1:        
            print('\n Window close !!')
            break

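    # Release the capture device and finalize the output video, if one was opened
    cap.release()
    if outpath != 'non' and isstream:
        outvideo.release()
    cv2.destroyAllWindows()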

    print('\n FPS average: {:>10.2f}'.format(fpsWithTick.get_average()))
    print('\n Finished.\n')

if __name__ == '__main__':
    sys.exit(main() or 0)
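
The box decoding done by `parse_yolo_region` and `scale_bbox` in the listing can be traced by hand for a single grid cell. The following is a minimal sketch using only the formulas and anchor values that appear in the listing; the helper names `decode_cell` and `to_original` and the sample values are assumptions for illustration.

```python
import math

# Anchor sizes copied from YoloParams in the listing (three w,h pairs per output scale)
ANCHORS = [10, 13, 16, 30, 33, 23, 30, 61, 62, 45, 59, 119, 116, 90, 156, 198, 373, 326]
STRIDE_TO_SCALE = {8: 0, 16: 1, 32: 2}  # 80x80, 40x40, 20x20 grids for a 640x640 input


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def decode_cell(raw, row, col, anchor_idx, stride):
    """Turn raw (tx, ty, tw, th) of one cell into (cx, cy, w, h) in network-input pixels."""
    tx, ty, tw, th = (sigmoid(v) for v in raw)
    scale = STRIDE_TO_SCALE[stride]
    anchor_w = ANCHORS[scale * 6 + 2 * anchor_idx]
    anchor_h = ANCHORS[scale * 6 + 2 * anchor_idx + 1]
    cx = (2 * tx - 0.5 + col) * stride
    cy = (2 * ty - 0.5 + row) * stride
    return cx, cy, (2 * tw) ** 2 * anchor_w, (2 * th) ** 2 * anchor_h


def to_original(cx, cy, w, h, im_w, im_h, net_w=640, net_h=640):
    """Undo the letterbox: remove the padding, then divide by the resize gain."""
    gain = min(net_w / im_w, net_h / im_h)
    pad_x = (net_w - im_w * gain) / 2
    pad_y = (net_h - im_h * gain) / 2
    return (cx - pad_x) / gain, (cy - pad_y) / gain, w / gain, h / gain


if __name__ == "__main__":
    box = decode_cell((0.0, 0.0, 0.0, 0.0), row=40, col=20, anchor_idx=0, stride=8)
    print(box)                                        # box in the 640x640 letterboxed frame
    print(to_original(*box, im_w=1280, im_h=720))     # same box in a 1280x720 frame
```

With all raw outputs at 0 the sigmoid gives 0.5, so the center sits in the middle of the cell and the size equals the anchor itself: (164.0, 324.0, 10.0, 13.0). For a 1280x720 frame the gain is 0.5 and the vertical padding is 140 pixels, which maps the box to (328.0, 368.0, 20.0, 26.0).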

・File location: /workspace_py37/mylib
□ Python personal general-purpose library


Execution speed of the OpenVINO™ API 2.0 program †

Program: "python yolov5_OV2.py" (unit: fps), trained model: "yolov5s_v7.xml"

Machine / OS	car_m.mp4 GPU	car_m.mp4 CPU	car1_m.mp4 GPU	car1_m.mp4 CPU	car2_m.mp4 GPU	car2_m.mp4 CPU
HP ENVY Windows11	11.5	11.9	12.7	12.4	14.2	12.9
HP ENVY Ubuntu22.04LTS	10.6	10.3	11.3	10.5	12.4	11.0
DELL XPS Windows11	10.7	7.4	12.0	7.9	12.7	8.0
DELL Latitude Ubuntu20.04LTS	9.7	6.5	10.7	6.8	11.2	7.1
HP ELITE Windows10	6.6	4.5	7.2	4.7	7.7	5.0

・Test commands

(py_learn) python yolov5_OV2.py -i ../../Videos/car_m.mp4
(py_learn) python yolov5_OV2.py -i ../../Videos/car_m.mp4 -d GPU
(py_learn) python yolov5_OV2.py -i ../../Videos/car1_m.mp4
(py_learn) python yolov5_OV2.py -i ../../Videos/car1_m.mp4 -d GPU
(py_learn) python yolov5_OV2.py -i ../../Videos/car2_m.mp4
(py_learn) python yolov5_OV2.py -i ../../Videos/car2_m.mp4 -d GPU
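
To compare the devices at a glance, the measured values can be reduced to one GPU/CPU ratio per machine. This is a small sketch over the numbers in the table above; the dictionary layout and the function name `gpu_cpu_ratio` are assumptions for illustration.

```python
# fps values copied from the table: (GPU, CPU) for car_m.mp4, car1_m.mp4, car2_m.mp4
FPS = {
    "HP ENVY Windows11": [(11.5, 11.9), (12.7, 12.4), (14.2, 12.9)],
    "HP ENVY Ubuntu22.04LTS": [(10.6, 10.3), (11.3, 10.5), (12.4, 11.0)],
    "DELL XPS Windows11": [(10.7, 7.4), (12.0, 7.9), (12.7, 8.0)],
    "DELL Latitude Ubuntu20.04LTS": [(9.7, 6.5), (10.7, 6.8), (11.2, 7.1)],
    "HP ELITE Windows10": [(6.6, 4.5), (7.2, 4.7), (7.7, 5.0)],
}


def gpu_cpu_ratio(pairs):
    """Mean GPU fps divided by mean CPU fps over the three videos."""
    gpu = sum(g for g, _ in pairs) / len(pairs)
    cpu = sum(c for _, c in pairs) / len(pairs)
    return gpu / cpu


if __name__ == "__main__":
    for machine, pairs in FPS.items():
        print("{:30s} GPU/CPU = {:.2f}".format(machine, gpu_cpu_ratio(pairs)))
```

On the HP ENVY the ratio stays close to 1, while on the other three machines the integrated GPU is roughly 1.5 times faster than the CPU.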

Test environment (Intel® CPU / GPU)

Model	OS	CPU	GPU
HP ENVY Desktop TE02-1097jp	Windows11/Ubuntu22.04LTS	13th Gen Core™ i9-13900	UHD Graphics 770
DELL XPS Plus 9320 NoteBook	Windows11	12th Gen Core™ i7-1260P	Iris® Xe Graphics
DELL Latitude 7520 NoteBook	Ubuntu20.04LTS	11th Gen Core™ i7-1185G7	Iris® Xe Graphics
HP EliteDesk 800 G2 SFF	Windows10	6th Gen Core™ i7-6700	HD Graphics 530


Details of the errors encountered and their fixes †


UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. †

Error output

(py_test) python detect2_yolov5.py -i ../../Videos/car_m.mp4 -m yolov5x
    :
Using cache found in /home/USER/.cache/torch/hub/ultralytics_yolov5_master
YOLOv5 噫 2021-9-16 torch 2.2.1+cpu CPU

Fusing layers... 
/home/USER/anaconda3/envs/py_learn/lib/python3.11/site-packages/torch/functional.py:507: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3549.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Model Summary: 444 layers, 86705005 parameters, 0 gradients
Adding AutoShape... 
Traceback (most recent call last):
    :

Fix
1. Find the cache directory named in the error message and delete it
```
/home/USER/.cache/torch/hub/ultralytics_yolov5_master
```
2. Run the command again

Comment
A cache left over from an earlier run can apparently become inconsistent with the current code
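
The deletion in step 1 can also be done from Python. The following is a minimal sketch, assuming the cache lives under `~/.cache/torch/hub` as in the message above; `clear_hub_cache` is a hypothetical helper name. As an alternative, `torch.hub.load()` accepts `force_reload=True`, which should refresh the cached repository without deleting it by hand.

```python
import shutil
from pathlib import Path


def clear_hub_cache(repo_dir_name="ultralytics_yolov5_master", hub_dir=None):
    """Delete one cached torch.hub repository; return True if something was removed.

    hub_dir defaults to ~/.cache/torch/hub (assumed from the error message).
    """
    base = Path(hub_dir) if hub_dir else Path.home() / ".cache" / "torch" / "hub"
    target = base / repo_dir_name
    if target.is_dir():
        shutil.rmtree(target)
        return True
    return False


if __name__ == "__main__":
    removed = clear_hub_cache()
    print("cache removed" if removed else "no cache found")
```

After the cache is removed, the next run downloads the repository again, so a network connection is needed.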


ImportError: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.29' not found †

　[Occurred on Ubuntu 20.04 LTS]

Error output

(py_test) python detect2.py --source 0
Traceback (most recent call last):
  File "/home/mizutu/workspace_pylearn/yolov5/detect2.py", line 58, in <module>
    :
ImportError: /lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /home/mizutu/anaconda3/envs/py_test/lib/python3.11/site-packages/cv2/python-3.11/cv2.cpython-311-x86_64-linux-gnu.so)

Fix
1. Inspect "libstdc++.so.6" in the conda environment and in the system directory

$ ls -l ~/anaconda3/envs/py_test/lib/
    :
lrwxrwxrwx   1 mizutu mizutu        19  3月 16 17:01 libstdc++.so.6 -> libstdc++.so.6.0.29
-rwxrwxr-x   4 mizutu mizutu  17981480  6月  1  2022 libstdc++.so.6.0.29
    :

$ ls -l /lib/x86_64-linux-gnu/
    :
lrwxrwxrwx  1 root root       19  7月  9  2023 libstdc++.so.6 -> libstdc++.so.6.0.28
-rw-r--r--  1 root root  1956992  7月  9  2023 libstdc++.so.6.0.28
    :

2. Copy "libstdc++.so.6.0.29" into the system directory and recreate the symbolic link

$ sudo cp /home/mizutu/anaconda3/envs/py_test/lib/libstdc++.so.6.0.29 /lib/x86_64-linux-gnu
$ cd /lib/x86_64-linux-gnu
$ sudo ln -sb libstdc++.so.6.0.29 libstdc++.so.6
$ sudo chmod 644 libstdc++.so.6.0.29

3. Verify the files

$ ls -l /lib/x86_64-linux-gnu/
    :
lrwxrwxrwx  1 root root       19  3月 17 05:37 libstdc++.so.6 -> libstdc++.so.6.0.29
-rw-r--r--  1 root root  1956992  7月  9  2023 libstdc++.so.6.0.28
-rw-r--r--  1 root root 17981480  3月 17 05:34 libstdc++.so.6.0.29
lrwxrwxrwx  1 root root       19  7月  9  2023 libstdc++.so.6~ -> libstdc++.so.6.0.28
    :

Comment
If a system update rewrites the symbolic link, recreate the link

Reference:
・https://github.com/pybind/pybind11/discussions/3453

"This undoes exactly what a virtual environment like anaconda is meant to achieve: not having to replace system libraries in order to satisfy dependencies."
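
Before replacing a system library, it may help to confirm which GLIBCXX versions each copy of libstdc++ actually provides. The sketch below scans the file for version strings, much like `strings libstdc++.so.6 | grep GLIBCXX` would; the function names `glibcxx_versions` and `provides` are assumptions for illustration, and the scan is a heuristic, not a proper ELF symbol-version lookup.

```python
import re


def glibcxx_versions(lib_path):
    """Return the sorted GLIBCXX_x.y.z version strings embedded in a library file."""
    with open(lib_path, "rb") as f:
        data = f.read()
    found = {m.decode() for m in re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", data)}
    return sorted(found, key=lambda v: tuple(int(p) for p in v.split(".")))


def provides(lib_path, required="3.4.29"):
    """True if the library file mentions the required GLIBCXX version."""
    return required in glibcxx_versions(lib_path)


if __name__ == "__main__":
    import sys
    for path in sys.argv[1:]:
        versions = glibcxx_versions(path)
        print(path, "->", versions[-1] if versions else "no GLIBCXX strings found")
```

Run against both files from step 1, this would show whether only the conda copy provides `GLIBCXX_3.4.29`. In that case, a less invasive option in the spirit of the quote above might be to make the conda environment's `lib` directory take precedence (for example via `LD_LIBRARY_PATH`) rather than overwriting the system link; that variant was not tested here.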
