YOLOv7
Let's try YOLO v7, a newer release of YOLO, the family of object-detection models.
Project home (c:/anaconda_win or ~/)
 ├─Images                ← image data
 ├─model                 ← trained models
 ├─Videos                ← video data
 │   :
 ├─work                  ← the project directory used in this article
 │   ├─openvino
 │   └─yolov7
 ├─workspace_py37
 │   └─mylib             ← shared Python library (must be on the path)
 └─workspace_py38        ← application projects under the anaconda environment
     ├─ :
     └─stable_diffusion
| Command | Description |
|---|---|
| pip list | List the installed packages and their versions |
| pip freeze | List the installed packages and versions (excludes package-management tools) |
| pip install <package> | Install a package |
| pip install <package>==<version> | Install a specific version (omit the version number after == to list the installable versions) |
| pip install -r requirements.txt | Install packages according to a requirements file |
| pip uninstall <package> | Uninstall an installed package |
| pip uninstall <package> <package> ... | Uninstall several installed packages at once |
| Command | Description |
|---|---|
| conda update -n base conda | Update conda itself |
| conda create -n py37 python=3.7 | Create a virtual environment (specifying the Python version) |
| conda env create -f env1.yaml | Create a virtual environment from a YAML file |
| conda create -n <new-env> --clone <base-env> | Clone an existing virtual environment |
| conda info -e | List the virtual environments that have been created |
| conda activate py37 | Switch to a virtual environment |
| conda deactivate | Leave the current virtual environment |
| conda env remove -n py37 | Delete a virtual environment |
| conda remove -n py37 --all | Remove every package in the named environment (effectively deleting it) |
| conda env export > env1.yaml | Export the current environment's settings to a YAML file |
| conda list | List the packages in the current environment |
(py38) PS C:\anaconda_win\work> conda info -e
(py38) PS C:\anaconda_win\workspace_py38> cd ..\work\
(py38) PS C:\anaconda_win\work> ls            ← the project folder for this article

    ディレクトリ: N:\anaconda_win\work

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d-----        2023/04/13      4:39                openvino
d-----        2023/04/13      4:39                yolov7
------        2023/04/10     13:36            294 requirements.txt

(py38) PS C:\anaconda_win\work> conda info -e
# conda environments:
#
base                     C:\Users\(xxxxx)\anaconda3
  :
py37                     C:\Users\(xxxxx)\anaconda3\envs\py37
py38                  *  C:\Users\(xxxxx)\anaconda3\envs\py38   ← currently active virtual environment
(py38) PS C:\anaconda_win\work> conda create -n py38a python=3.8
(py38) PS C:\anaconda_win\work> conda create -n py38a python=3.8
Collecting package metadata (current_repodata.json): done
Solving environment: done

==> WARNING: A newer version of conda exists. <==
  current version: 4.13.0
  latest version: 23.3.1

Please update conda by running

    $ conda update -n base -c defaults conda
      ← if this message appears, update Anaconda afterwards

## Package Plan ##

  environment location: C:\Users\izuts\anaconda3\envs\py38a

  added / updated specs:
    - python=3.8

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2023.01.10 |       haa95532_0         121 KB
    certifi-2022.12.7          |   py38haa95532_0         148 KB
    libffi-3.4.2               |       hd77b12b_6         109 KB
    openssl-1.1.1t             |       h2bbff1b_0         5.5 MB
    pip-23.0.1                 |   py38haa95532_0         2.7 MB
    python-3.8.16              |       h6244533_3        18.9 MB
    setuptools-65.6.3          |   py38haa95532_0         1.1 MB
    sqlite-3.41.1              |       h2bbff1b_0         897 KB
    wheel-0.38.4               |   py38haa95532_0          83 KB
    ------------------------------------------------------------
                                           Total:        29.6 MB

The following NEW packages will be INSTALLED:

  ca-certificates    pkgs/main/win-64::ca-certificates-2023.01.10-haa95532_0
  certifi            pkgs/main/win-64::certifi-2022.12.7-py38haa95532_0
  libffi             pkgs/main/win-64::libffi-3.4.2-hd77b12b_6
  openssl            pkgs/main/win-64::openssl-1.1.1t-h2bbff1b_0
  pip                pkgs/main/win-64::pip-23.0.1-py38haa95532_0
  python             pkgs/main/win-64::python-3.8.16-h6244533_3
  setuptools         pkgs/main/win-64::setuptools-65.6.3-py38haa95532_0
  sqlite             pkgs/main/win-64::sqlite-3.41.1-h2bbff1b_0
  vc                 pkgs/main/win-64::vc-14.2-h21ff451_1
  vs2015_runtime     pkgs/main/win-64::vs2015_runtime-14.27.29016-h5e58377_2
  wheel              pkgs/main/win-64::wheel-0.38.4-py38haa95532_0
  wincertstore       pkgs/main/win-64::wincertstore-0.2-py38haa95532_2

Proceed ([y]/n)?

Downloading and Extracting Packages
python-3.8.16        | 18.9 MB  | #################################### | 100%
wheel-0.38.4         | 83 KB    | #################################### | 100%
pip-23.0.1           | 2.7 MB   | #################################### | 100%
libffi-3.4.2         | 109 KB   | #################################### | 100%
sqlite-3.41.1        | 897 KB   | #################################### | 100%
ca-certificates-2023 | 121 KB   | #################################### | 100%
setuptools-65.6.3    | 1.1 MB   | #################################### | 100%
certifi-2022.12.7    | 148 KB   | #################################### | 100%
openssl-1.1.1t       | 5.5 MB   | #################################### | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate py38a
#
# To deactivate an active environment, use
#
#     $ conda deactivate
(py38) PS C:\anaconda_win\work> conda activate py38a
(py38) PS C:\anaconda_win\work> conda activate py38a
(py38a) PS C:\anaconda_win\work> conda list
# packages in environment at C:\Users\izuts\anaconda3\envs\py38a:
#
# Name                    Version                   Build  Channel
ca-certificates           2023.01.10           haa95532_0
certifi                   2022.12.7        py38haa95532_0
libffi                    3.4.2                hd77b12b_6
openssl                   1.1.1t               h2bbff1b_0
pip                       23.0.1           py38haa95532_0
python                    3.8.16               h6244533_3
setuptools                65.6.3           py38haa95532_0
sqlite                    3.41.1               h2bbff1b_0
vc                        14.2                 h21ff451_1
vs2015_runtime            14.27.29016          h5e58377_2
wheel                     0.38.4           py38haa95532_0
wincertstore              0.2              py38haa95532_2

(py38a) PS N:\anaconda_win\work> conda info -e
# conda environments:
#
base                     C:\Users\izuts\anaconda3
  :
py37                     C:\Users\izuts\anaconda3\envs\py37
py38                     C:\Users\izuts\anaconda3\envs\py38
py38a                 *  C:\Users\izuts\anaconda3\envs\py38a   ← currently active virtual environment
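As an extra cross-check (a minimal sketch, not part of the original procedure), you can confirm from inside Python that the interpreter really belongs to the new environment:

```python
# Quick check that the active interpreter is the one in the py38a environment.
import sys

print(sys.executable)   # should point under ...\anaconda3\envs\py38a
print(sys.version)      # should report Python 3.8.x
```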
● Get both "Stable Diffusion" and "YOLOv7" running on the OpenVINO™ runtime.
「requirements.txt」
# Stable Diffusion & YOLOv7_OpenVINO
openvino==2022.1.0
numpy==1.19.5
opencv-python==4.5.5.64
transformers==4.16.2
diffusers==0.2.4
tqdm==4.64.0
huggingface_hub==0.9.0
scipy==1.9.0
streamlit==1.12.0
watchdog==2.1.9
ftfy==6.1.1
PyMuPDF
torchvision
matplotlib
seaborn
onnx
googletrans==4.0.0-rc1
(py38a) PS C:\anaconda_win\work> pip install -r requirements.txt
(py38a) PS C:\anaconda_win\work> pip install -r requirements.txt Collecting openvino==2022.1.0 Using cached openvino-2022.1.0-7019-cp38-cp38-win_amd64.whl (22.9 MB) Collecting numpy==1.19.5 Using cached numpy-1.19.5-cp38-cp38-win_amd64.whl (13.3 MB) Collecting opencv-python==4.5.5.64 Using cached opencv_python-4.5.5.64-cp36-abi3-win_amd64.whl (35.4 MB) Collecting transformers==4.16.2 Using cached transformers-4.16.2-py3-none-any.whl (3.5 MB) Collecting diffusers==0.2.4 Using cached diffusers-0.2.4-py3-none-any.whl (112 kB) Collecting tqdm==4.64.0 Using cached tqdm-4.64.0-py2.py3-none-any.whl (78 kB) Collecting huggingface_hub==0.9.0 Using cached huggingface_hub-0.9.0-py3-none-any.whl (120 kB) Collecting scipy==1.9.0 Using cached scipy-1.9.0-cp38-cp38-win_amd64.whl (38.6 MB) Collecting streamlit==1.12.0 Using cached streamlit-1.12.0-py2.py3-none-any.whl (9.1 MB) Collecting watchdog==2.1.9 Using cached watchdog-2.1.9-py3-none-win_amd64.whl (78 kB) Collecting ftfy==6.1.1 Using cached ftfy-6.1.1-py3-none-any.whl (53 kB) Collecting PyMuPDF Downloading PyMuPDF-1.21.1-cp38-cp38-win_amd64.whl (11.7 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.7/11.7 MB 31.2 MB/s eta 0:00:00 Collecting torchvision Downloading torchvision-0.15.1-cp38-cp38-win_amd64.whl (1.2 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.2/1.2 MB 12.6 MB/s eta 0:00:00 Collecting matplotlib Downloading matplotlib-3.7.1-cp38-cp38-win_amd64.whl (7.6 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.6/7.6 MB 27.2 MB/s eta 0:00:00 Collecting seaborn Downloading seaborn-0.12.2-py3-none-any.whl (293 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 293.3/293.3 kB ? eta 0:00:00 Collecting onnx Downloading onnx-1.13.1-cp38-cp38-win_amd64.whl (12.2 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.2/12.2 MB 34.4 MB/s eta 0:00:00 Collecting googletrans==4.0.0-rc1 Using cached googletrans-4.0.0rc1-py3-none-any.whl Collecting pyyaml>=5.1 Using cached PyYAML-6.0-cp38-cp38-win_amd64.whl (155 kB) Collecting filelock Downloading filelock-3.11.0-py3-none-any.whl (10.0 kB) Collecting tokenizers!=0.11.3,>=0.10.1 Downloading tokenizers-0.13.3-cp38-cp38-win_amd64.whl (3.5 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.5/3.5 MB 31.7 MB/s eta 0:00:00 Collecting requests Downloading requests-2.28.2-py3-none-any.whl (62 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.8/62.8 kB ? eta 0:00:00 Collecting sacremoses Using cached sacremoses-0.0.53-py3-none-any.whl Collecting packaging>=20.0 Downloading packaging-23.1-py3-none-any.whl (48 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 48.9/48.9 kB 2.4 MB/s eta 0:00:00 Collecting regex!=2019.12.17 Downloading regex-2023.3.23-cp38-cp38-win_amd64.whl (267 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 267.9/267.9 kB ? 
eta 0:00:00 Collecting importlib-metadata Downloading importlib_metadata-6.3.0-py3-none-any.whl (22 kB) Collecting Pillow Downloading Pillow-9.5.0-cp38-cp38-win_amd64.whl (2.5 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.5/2.5 MB 20.2 MB/s eta 0:00:00 Collecting torch>=1.4 Downloading torch-2.0.0-cp38-cp38-win_amd64.whl (172.3 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 172.3/172.3 MB 7.5 MB/s eta 0:00:00 Requirement already satisfied: colorama in c:\users\izuts\appdata\roaming\python\python38\site-packages (from tqdm==4.64.0->-r requirements.txt (line 7)) (0.4.3) Collecting typing-extensions>=3.7.4.3 Downloading typing_extensions-4.5.0-py3-none-any.whl (27 kB) Collecting protobuf<4,>=3.12 Using cached protobuf-3.20.3-cp38-cp38-win_amd64.whl (904 kB) Collecting pympler>=0.9 Using cached Pympler-1.0.1-py3-none-any.whl (164 kB) Collecting click>=7.0 Using cached click-8.1.3-py3-none-any.whl (96 kB) Collecting pydeck>=0.1.dev5 Using cached pydeck-0.8.0-py2.py3-none-any.whl (4.7 MB) Collecting blinker>=1.0.0 Downloading blinker-1.6.1-py3-none-any.whl (13 kB) Collecting semver Downloading semver-3.0.0-py3-none-any.whl (17 kB) Collecting rich>=10.11.0 Downloading rich-13.3.4-py3-none-any.whl (238 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 238.7/238.7 kB 7.4 MB/s eta 0:00:00 Collecting pandas>=0.21.0 Downloading pandas-2.0.0-cp38-cp38-win_amd64.whl (11.3 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.3/11.3 MB 28.5 MB/s eta 0:00:00 Collecting tzlocal>=1.1 Downloading tzlocal-4.3-py3-none-any.whl (20 kB) Collecting tornado>=5.0 Using cached tornado-6.2-cp37-abi3-win_amd64.whl (425 kB) Collecting python-dateutil Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB) Collecting cachetools>=4.0 Downloading cachetools-5.3.0-py3-none-any.whl (9.3 kB) Collecting pyarrow>=4.0 Downloading pyarrow-11.0.0-cp38-cp38-win_amd64.whl (20.6 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20.6/20.6 MB 26.1 MB/s eta 0:00:00 Collecting gitpython!=3.1.19 Downloading GitPython-3.1.31-py3-none-any.whl (184 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 184.3/184.3 kB 11.6 MB/s eta 0:00:00 Collecting validators>=0.2 Using cached validators-0.20.0-py3-none-any.whl Collecting toml Using cached toml-0.10.2-py2.py3-none-any.whl (16 kB) Collecting altair>=3.2.0 Downloading altair-4.2.2-py3-none-any.whl (813 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 813.6/813.6 kB 25.1 MB/s eta 0:00:00 Collecting wcwidth>=0.2.5 Downloading wcwidth-0.2.6-py2.py3-none-any.whl (29 kB) Collecting httpx==0.13.3 Using cached httpx-0.13.3-py3-none-any.whl (55 kB) Collecting sniffio Using cached sniffio-1.3.0-py3-none-any.whl (10 kB) Collecting httpcore==0.9.* Using cached httpcore-0.9.1-py3-none-any.whl (42 kB) Collecting chardet==3.* Using cached chardet-3.0.4-py2.py3-none-any.whl (133 kB) Collecting rfc3986<2,>=1.3 Using cached rfc3986-1.5.0-py2.py3-none-any.whl (31 kB) Collecting idna==2.* Using cached idna-2.10-py2.py3-none-any.whl (58 kB) Collecting hstspreload Downloading hstspreload-2023.1.1-py3-none-any.whl (1.5 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 32.6 MB/s eta 0:00:00 Requirement already satisfied: certifi in c:\users\izuts\anaconda3\envs\py38a\lib\site-packages (from httpx==0.13.3->googletrans==4.0.0-rc1->-r requirements.txt (line 18)) (2022.12.7) Collecting h11<0.10,>=0.8 Using cached h11-0.9.0-py2.py3-none-any.whl (53 kB) Collecting h2==3.* Using cached h2-3.2.0-py2.py3-none-any.whl (65 kB) Collecting hyperframe<6,>=5.2.0 Using cached hyperframe-5.2.0-py2.py3-none-any.whl (12 kB) Collecting hpack<4,>=3.0 Using 
cached hpack-3.0.0-py2.py3-none-any.whl (38 kB) Collecting sympy Downloading sympy-1.11.1-py3-none-any.whl (6.5 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.5/6.5 MB 29.5 MB/s eta 0:00:00 Collecting jinja2 Using cached Jinja2-3.1.2-py3-none-any.whl (133 kB) Collecting networkx Downloading networkx-3.1-py3-none-any.whl (2.1 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 21.9 MB/s eta 0:00:00 Collecting importlib-resources>=3.2.0 Downloading importlib_resources-5.12.0-py3-none-any.whl (36 kB) Collecting kiwisolver>=1.0.1 Downloading kiwisolver-1.4.4-cp38-cp38-win_amd64.whl (55 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 55.4/55.4 kB 3.0 MB/s eta 0:00:00 Collecting contourpy>=1.0.1 Downloading contourpy-1.0.7-cp38-cp38-win_amd64.whl (162 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 163.0/163.0 kB ? eta 0:00:00 Collecting cycler>=0.10 Downloading cycler-0.11.0-py3-none-any.whl (6.4 kB) Collecting fonttools>=4.22.0 Downloading fonttools-4.39.3-py3-none-any.whl (1.0 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.0/1.0 MB 62.4 MB/s eta 0:00:00 Collecting matplotlib Downloading matplotlib-3.7.0-cp38-cp38-win_amd64.whl (7.7 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.7/7.7 MB 7.0 MB/s eta 0:00:00 Downloading matplotlib-3.6.3-cp38-cp38-win_amd64.whl (7.2 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.2/7.2 MB 32.9 MB/s eta 0:00:00 Collecting pyparsing>=2.2.1 Using cached pyparsing-3.0.9-py3-none-any.whl (98 kB) Collecting entrypoints Using cached entrypoints-0.4-py3-none-any.whl (5.3 kB) Collecting toolz Using cached toolz-0.12.0-py3-none-any.whl (55 kB) Collecting jsonschema>=3.0 Downloading jsonschema-4.17.3-py3-none-any.whl (90 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 90.4/90.4 kB 1.7 MB/s eta 0:00:00 Collecting gitdb<5,>=4.0.1 Downloading gitdb-4.0.10-py3-none-any.whl (62 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.7/62.7 kB ? eta 0:00:00 Collecting zipp>=0.5 Downloading zipp-3.15.0-py3-none-any.whl (6.8 kB) Collecting pandas>=0.21.0 Downloading pandas-1.5.3-cp38-cp38-win_amd64.whl (11.0 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.0/11.0 MB 23.4 MB/s eta 0:00:00 Downloading pandas-1.5.2-cp38-cp38-win_amd64.whl (11.0 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.0/11.0 MB 8.7 MB/s eta 0:00:00 Using cached pandas-1.5.1-cp38-cp38-win_amd64.whl (11.0 MB) Using cached pandas-1.5.0-cp38-cp38-win_amd64.whl (11.0 MB) Using cached pandas-1.4.4-cp38-cp38-win_amd64.whl (10.6 MB) Collecting pytz>=2020.1 Downloading pytz-2023.3-py2.py3-none-any.whl (502 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 502.3/502.3 kB 32.8 MB/s eta 0:00:00 Requirement already satisfied: six>=1.5 in c:\users\izuts\appdata\roaming\python\python38\site-packages (from python-dateutil->streamlit==1.12.0->-r requirements.txt (line 10)) (1.14.0) Collecting urllib3<1.27,>=1.21.1 Downloading urllib3-1.26.15-py2.py3-none-any.whl (140 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 140.9/140.9 kB 8.2 MB/s eta 0:00:00 Collecting charset-normalizer<4,>=2 Downloading charset_normalizer-3.1.0-cp38-cp38-win_amd64.whl (96 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 96.4/96.4 kB 5.4 MB/s eta 0:00:00 Collecting markdown-it-py<3.0.0,>=2.2.0 Downloading markdown_it_py-2.2.0-py3-none-any.whl (84 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 84.5/84.5 kB ? 
eta 0:00:00 Collecting pygments<3.0.0,>=2.13.0 Downloading Pygments-2.15.0-py3-none-any.whl (1.1 MB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.1/1.1 MB 24.2 MB/s eta 0:00:00 Collecting backports.zoneinfo Using cached backports.zoneinfo-0.2.1-cp38-cp38-win_amd64.whl (38 kB) Collecting tzdata Downloading tzdata-2023.3-py2.py3-none-any.whl (341 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 341.8/341.8 kB 22.1 MB/s eta 0:00:00 Collecting pytz-deprecation-shim Using cached pytz_deprecation_shim-0.1.0.post0-py2.py3-none-any.whl (15 kB) Collecting decorator>=3.4.0 Using cached decorator-5.1.1-py3-none-any.whl (9.1 kB) Collecting joblib Using cached joblib-1.2.0-py3-none-any.whl (297 kB) Collecting smmap<6,>=3.0.1 Using cached smmap-5.0.0-py3-none-any.whl (24 kB) Collecting MarkupSafe>=2.0 Downloading MarkupSafe-2.1.2-cp38-cp38-win_amd64.whl (16 kB) Collecting pkgutil-resolve-name>=1.3.10 Using cached pkgutil_resolve_name-1.3.10-py3-none-any.whl (4.7 kB) Collecting pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 Downloading pyrsistent-0.19.3-cp38-cp38-win_amd64.whl (62 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.7/62.7 kB 845.9 kB/s eta 0:00:00 Collecting attrs>=17.4.0 Downloading attrs-22.2.0-py3-none-any.whl (60 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 60.0/60.0 kB ? eta 0:00:00 Collecting mdurl~=0.1 Downloading mdurl-0.1.2-py3-none-any.whl (10.0 kB) Collecting mpmath>=0.19 Downloading mpmath-1.3.0-py3-none-any.whl (536 kB) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 536.2/536.2 kB 32.9 MB/s eta 0:00:00 Installing collected packages: wcwidth, tokenizers, rfc3986, pytz, mpmath, hyperframe, hpack, h11, chardet, zipp, watchdog, urllib3, tzdata, typing-extensions, tqdm, tornado, toolz, toml, sympy, sniffio, smmap, semver, regex, pyyaml, python-dateutil, pyrsistent, pyparsing, PyMuPDF, pympler, pygments, protobuf, pkgutil-resolve-name, Pillow, packaging, numpy, networkx, mdurl, MarkupSafe, kiwisolver, joblib, idna, hstspreload, h2, ftfy, fonttools, filelock, entrypoints, decorator, cycler, click, charset-normalizer, cachetools, backports.zoneinfo, attrs, validators, scipy, sacremoses, requests, pytz-deprecation-shim, pyarrow, pandas, openvino, opencv-python, onnx, markdown-it-py, jinja2, importlib-resources, importlib-metadata, httpcore, gitdb, contourpy, blinker, tzlocal, torch, rich, pydeck, matplotlib, jsonschema, huggingface_hub, httpx, gitpython, transformers, torchvision, seaborn, googletrans, diffusers, altair, streamlit Successfully installed MarkupSafe-2.1.2 Pillow-9.5.0 PyMuPDF-1.21.1 altair-4.2.2 attrs-22.2.0 backports.zoneinfo-0.2.1 blinker-1.6.1 cachetools-5.3.0 chardet-3.0.4 charset-normalizer-3.1.0 click-8.1.3 contourpy-1.0.7 cycler-0.11.0 decorator-5.1.1 diffusers-0.2.4 entrypoints-0.4 filelock-3.11.0 fonttools-4.39.3 ftfy-6.1.1 gitdb-4.0.10 gitpython-3.1.31 googletrans-4.0.0rc1 h11-0.9.0 h2-3.2.0 hpack-3.0.0 hstspreload-2023.1.1 httpcore-0.9.1 httpx-0.13.3 huggingface_hub-0.9.0 hyperframe-5.2.0 idna-2.10 importlib-metadata-6.3.0 importlib-resources-5.12.0 jinja2-3.1.2 joblib-1.2.0 jsonschema-4.17.3 kiwisolver-1.4.4 markdown-it-py-2.2.0 matplotlib-3.6.3 mdurl-0.1.2 mpmath-1.3.0 networkx-3.1 numpy-1.19.5 onnx-1.13.1 opencv-python-4.5.5.64 openvino-2022.1.0 packaging-23.1 pandas-1.4.4 pkgutil-resolve-name-1.3.10 protobuf-3.20.3 pyarrow-11.0.0 pydeck-0.8.0 pygments-2.15.0 pympler-1.0.1 pyparsing-3.0.9 pyrsistent-0.19.3 python-dateutil-2.8.2 pytz-2023.3 pytz-deprecation-shim-0.1.0.post0 pyyaml-6.0 regex-2023.3.23 requests-2.28.2 rfc3986-1.5.0 rich-13.3.4 sacremoses-0.0.53 scipy-1.9.0 
seaborn-0.12.2 semver-3.0.0 smmap-5.0.0 sniffio-1.3.0 streamlit-1.12.0 sympy-1.11.1 tokenizers-0.13.3 toml-0.10.2 toolz-0.12.0 torch-2.0.0 torchvision-0.15.1 tornado-6.2 tqdm-4.64.0 transformers-4.16.2 typing-extensions-4.5.0 tzdata-2023.3 tzlocal-4.3 urllib3-1.26.15 validators-0.20.0 watchdog-2.1.9 wcwidth-0.2.6 zipp-3.15.0
(py38a) PS C:\anaconda_win\work> pip list
(py38a) PS C:\anaconda_win\work> pip list
Package               Version
--------------------- -----------
altair                4.2.2
astroid               2.3.3
attrs                 22.2.0
backports.zoneinfo    0.2.1
blinker               1.6.1
cachetools            5.3.0
certifi               2022.12.7
chardet               3.0.4
charset-normalizer    3.1.0
click                 8.1.3
colorama              0.4.3
contourpy             1.0.7
cycler                0.11.0
decorator             5.1.1
diffusers             0.2.4
entrypoints           0.4
filelock              3.11.0
fonttools             4.39.3
ftfy                  6.1.1
gitdb                 4.0.10
GitPython             3.1.31
googletrans           4.0.0rc1
h11                   0.9.0
h2                    3.2.0
hpack                 3.0.0
hstspreload           2023.1.1
httpcore              0.9.1
httpx                 0.13.3
huggingface-hub       0.9.0
hyperframe            5.2.0
idna                  2.10
importlib-metadata    6.3.0
importlib-resources   5.12.0
isort                 4.3.21
Jinja2                3.1.2
joblib                1.2.0
jsonschema            4.17.3
kiwisolver            1.4.4
lazy-object-proxy     1.4.3
markdown-it-py        2.2.0
MarkupSafe            2.1.2
matplotlib            3.6.3
mccabe                0.6.1
mdurl                 0.1.2
mpmath                1.3.0
networkx              3.1
numpy                 1.19.5
onnx                  1.13.1
opencv-python         4.5.5.64
openvino              2022.1.0
packaging             23.1
pandas                1.4.4
Pillow                9.5.0
pip                   23.0.1
pkgutil_resolve_name  1.3.10
protobuf              3.20.3
pyarrow               11.0.0
pydeck                0.8.0
Pygments              2.15.0
pylint                2.4.4
Pympler               1.0.1
PyMuPDF               1.21.1
pyparsing             3.0.9
pyrsistent            0.19.3
python-dateutil       2.8.2
pytz                  2023.3
pytz-deprecation-shim 0.1.0.post0
PyYAML                6.0
regex                 2023.3.23
requests              2.28.2
rfc3986               1.5.0
rich                  13.3.4
sacremoses            0.0.53
scipy                 1.9.0
seaborn               0.12.2
semver                3.0.0
setuptools            65.6.3
six                   1.14.0
smmap                 5.0.0
sniffio               1.3.0
streamlit             1.12.0
sympy                 1.11.1
tokenizers            0.13.3
toml                  0.10.2
toolz                 0.12.0
torch                 2.0.0
torchvision           0.15.1
tornado               6.2
tqdm                  4.64.0
transformers          4.16.2
typing_extensions     4.5.0
tzdata                2023.3
tzlocal               4.3
urllib3               1.26.15
validators            0.20.0
watchdog              2.1.9
wcwidth               0.2.6
wheel                 0.38.4
wincertstore          0.2
wrapt                 1.11.2
zipp                  3.15.0
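To confirm that the pinned versions from requirements.txt actually resolved, a short sketch using only the standard library can help (the three packages checked below are arbitrary examples, not part of the original procedure):

```python
# Compare a few installed versions against the pins in requirements.txt.
from importlib.metadata import version  # Python 3.8+

for pkg, expected in [("openvino", "2022.1.0"),
                      ("numpy", "1.19.5"),
                      ("opencv-python", "4.5.5.64")]:
    installed = version(pkg)
    status = "OK" if installed == expected else "MISMATCH"
    print(f"{pkg:15s} {installed:12s} ({status})")
```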
(py38a) PS C:\anaconda_win\work> cd ../workspace_py38/stable_diffusion
(py38a) PS C:\anaconda_win\workspace_py38\stable_diffusion> python demo.py --prompt "Street-art painting of Tom Cruise in style of Gogh, photorealism"
32it [04:03,  7.61s/it]
              ↑ processing speed per iteration (seconds)
(py38a) PS C:\anaconda_win\workspace_py38\stable_diffusion> python stable_diffusion2.py
Starting..
 - Program title  : Stable Diffusion2 OpenVINO™ Ver 0.02
 - OpenVINO engine: 2022.1.0-7019-cdb9bec7210-releases/2022/1
 - OpenCV version : 4.5.5
 - Log level      : 3
 - Output dir     : result/
 - Output header  : output_
 - model          : bes-dev/stable-diffusion-v1-4-openvino
 - beta_start     : 0.00085
 - beta_end       : 0.012
 - beta_schedule  : scaled_linear
 - eta            : 0.0
 - tokenizer      : openai/clip-vit-large-patch14

Prompt: Beautiful green forest where light shines
(和訳): 光が輝く美しい緑の森

** start 0 **  4029269746
32it [03:26,  6.46s/it]
-Output-: result/output_4029269746.png
** end **  00:03:49
Finished.
/work
 ├─openvino
 ├─yolov7
 ├─YOLOv7_OpenVINO_cpp-python-main
 └─yolov7-main
     ├─ :
     └─yolov7.pt
(py38a) PS > cd /anaconda_win/work/yolov7-main
(py38a) PS > python export.py --weights yolov7.pt
(py38a) PS > cd /anaconda_win/work/yolov7-main
(py38a) PS > python export.py --weights yolov7.pt
Import onnx_graphsurgeon failure: No module named 'onnx_graphsurgeon'
Namespace(batch_size=1, conf_thres=0.25, device='cpu', dynamic=False, dynamic_batch=False, end2end=False, fp16=False, grid=False, img_size=[640, 640], include_nms=False, int8=False, iou_thres=0.45, max_wh=None, simplify=False, topk_all=100, weights='yolov7.pt')
YOLOR  2023-4-13 torch 2.0.0+cpu CPU

Fusing layers...
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
RepConv.fuse_repvgg_block
Model Summary: 306 layers, 36905341 parameters, 36905341 gradients

Starting TorchScript export with torch 2.0.0+cpu...
TorchScript export success, saved as yolov7.torchscript.pt
CoreML export failure: No module named 'coremltools'

Starting TorchScript-Lite export with torch 2.0.0+cpu...
TorchScript-Lite export success, saved as yolov7.torchscript.ptl

Starting ONNX export with onnx 1.13.1...
N:\anaconda_win\work\yolov7-main\models\yolo.py:582: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if augment:
N:\anaconda_win\work\yolov7-main\models\yolo.py:614: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if profile:
N:\anaconda_win\work\yolov7-main\models\yolo.py:629: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if profile:
============== Diagnostic Run torch.onnx.export version 2.0.0+cpu ==============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

ONNX export success, saved as yolov7.onnx

Export complete (27.27s). Visualize with https://github.com/lutzroeder/netron.
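Before wiring the exported file into the demos, a quick sanity check can confirm that yolov7.onnx loads and exposes the expected 1x3x640x640 input. This is a sketch assuming the same openvino.runtime API that the scripts later on this page use:

```python
# Load the exported ONNX model with the OpenVINO runtime and report its I/O.
from openvino.runtime import Core

core = Core()
model = core.read_model("yolov7.onnx")
print("inputs :", [(i.any_name, i.partial_shape) for i in model.inputs])
print("outputs:", [(o.any_name, o.partial_shape) for o in model.outputs])
```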
(py38a) PS > cd /anaconda_win/work/YOLOv7_OpenVINO_cpp-python-main/
(py38a) PS > python python/image.py -m yolov7.onnx -i data/horses.jpg -p
(py38a) PS C:\anaconda_win\work\YOLOv7_OpenVINO_cpp-python-main> python python/image.py -m yolov7.onnx -i data/horses.jpg -p
Dump preprocessor: Input "images" (color BGR):
    User's input tensor: {1,640,640,3}, [N,H,W,C], f32
    Model's expected tensor: {1,3,640,640}, [N,C,H,W], f32
    Pre-processing steps (2):
      convert color (RGB): ({1,640,640,3}, [N,H,W,C], f32, BGR) -> ({1,640,640,3}, [N,H,W,C], f32, RGB)
      scale (255,255,255): ({1,640,640,3}, [N,H,W,C], f32, RGB) -> ({1,640,640,3}, [N,H,W,C], f32, RGB)
    Implicit pre-processing steps (1):
      convert layout [N,C,H,W]: ({1,640,640,3}, [N,H,W,C], f32, RGB) -> ({1,3,640,640}, [N,C,H,W], f32, RGB)
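The dump above is produced by the preprocessing pipeline that the -p option builds. Condensed from the pre_api branch of yolo7s.py (quoted in full later on this page), the pipeline looks roughly like this:

```python
# Rebuild the model so that layout conversion, BGR->RGB and 1/255 scaling
# run inside the OpenVINO runtime instead of in Python.
from openvino.runtime import Core, Layout
from openvino.preprocess import PrePostProcessor, ColorFormat

core = Core()
model = core.read_model("yolov7.onnx")

ppp = PrePostProcessor(model)
ppp.input().tensor().set_layout(Layout("NHWC")).set_color_format(ColorFormat.BGR)
ppp.input().model().set_layout(Layout("NCHW"))
ppp.input().preprocess().convert_color(ColorFormat.RGB).scale([255.0, 255.0, 255.0])
model = ppp.build()  # the model now accepts a {1,640,640,3} BGR f32 tensor
```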
(py38a) PS > cd /anaconda_win/work/YOLOv7_OpenVINO_cpp-python-main/
(py38a) PS > python python/webcam.py -m yolov7.onnx -i 0 -p
(py38a) PS > cd /anaconda_win/work/YOLOv7_OpenVINO_cpp-python-main/
(py38a) PS > python python/webcam.py -m yolov7.onnx -i 0 -p
Dump preprocessor: Input "images" (color BGR):
    User's input tensor: {1,640,640,3}, [N,H,W,C], f32
    Model's expected tensor: {1,3,640,640}, [N,C,H,W], f32
    Pre-processing steps (2):
      convert color (RGB): ({1,640,640,3}, [N,H,W,C], f32, BGR) -> ({1,640,640,3}, [N,H,W,C], f32, RGB)
      scale (255,255,255): ({1,640,640,3}, [N,H,W,C], f32, RGB) -> ({1,640,640,3}, [N,H,W,C], f32, RGB)
    Implicit pre-processing steps (1):
      convert layout [N,C,H,W]: ({1,640,640,3}, [N,H,W,C], f32, RGB) -> ({1,3,640,640}, [N,C,H,W], f32, RGB)
Traceback (most recent call last):
  File "python/webcam.py", line 24, in <module>
    yolov7_detector.infer_cam(args.input)
  File "N:\anaconda_win\work\YOLOv7_OpenVINO_cpp-python-main\python\yolov7.py", line 262, in infer_cam
    self.infer_queue.start_async({self.input_layer.any_name: img_batch}, (src_img_list, src_size))
  File "C:\Users\izuts\anaconda3\envs\py38a\lib\site-packages\openvino\runtime\ie_api.py", line 241, in start_async
    inputs, get_input_types(self[self.get_idle_request_id()])
KeyboardInterrupt
[ WARN:0@166.632] global D:\a\opencv-python\opencv-python\opencv\modules\videoio\src\cap_msmf.cpp (539) `anonymous-namespace'::SourceReaderCB::~SourceReaderCB terminating async callback
(py38a) PS > cd /anaconda_win/work/YOLOv7_OpenVINO_cpp-python-main/
(py38a) PS> python python/webcam.py -m yolov7.onnx -i 0
Traceback (most recent call last):
  File "python/webcam.py", line 24, in <module>
    yolov7_detector.infer_cam(args.input)
  File "N:\anaconda_win\work\YOLOv7_OpenVINO_cpp-python-main\python\yolov7.py", line 262, in infer_cam
    self.infer_queue.start_async({self.input_layer.any_name: img_batch}, (src_img_list, src_size))
  File "C:\Users\izuts\anaconda3\envs\py38a\lib\site-packages\openvino\runtime\ie_api.py", line 237, in start_async
    super().start_async(
RuntimeError: Can't SetBlob with name: images, because model input (shape={1,3,640,640}) and blob (shape=(1.640.640.3)) are incompatible
[ WARN:1@7.773] global D:\a\opencv-python\opencv-python\opencv\modules\videoio\src\cap_msmf.cpp (539) `anonymous-namespace'::SourceReaderCB::~SourceReaderCB terminating async callback
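This error is the expected failure mode when -p is omitted: the raw {1,640,640,3} NHWC frame is handed to a model that expects {1,3,640,640} NCHW. Without the preprocessing API, the conversion has to happen in Python first. The sketch below mirrors the non-pre_api path in object_detect_yolo7.py (the helper name to_nchw is illustrative, not from the original code):

```python
import cv2
import numpy as np

def to_nchw(frame_bgr_640):
    """Convert a letterboxed 640x640 BGR frame into the {1,3,640,640} f32
    tensor the ONNX model expects (mirrors object_detect_yolo7.py)."""
    img = frame_bgr_640.astype(np.float32)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # BGR -> RGB
    img /= 255.0                                # scale to 0..1
    img = img.transpose(2, 0, 1)                # HWC -> CHW
    return np.expand_dims(img, 0)               # add the batch dimension
```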
(py38a) PS > cd /anaconda_win/work/yolov7
(py38a) PS C:\anaconda_win\work\yolov7> python object_detect_yolo7.py
| Option | Default | Description | Note |
|---|---|---|---|
| -h, --help | - | Show the help message | |
| -i, --input | ../../Videos/car_m.mp4 | Camera (cam) or a video / still-image file | ※1 |
| -m, --model | ../../model/yolov7.onnx | Trained model (IR/onnx) | |
| -d, --device | CPU | Target device (CPU/GPU/MYRIAD) | |
| --conf | 0.25 | Class-confidence threshold (the smaller the value, the more objects are detected, but also the more noise) | ※2 |
| --iou | 0.45 | Intersection over Union threshold: how much two detections overlap; larger values mean a higher degree of overlap (a worked example follows this table) | |
| -lb, --labels | coco.names_jp | Label file | |
| --log | 3 | Log output level (-1/0/1/2/3/4/5) | |
| -t, --title | y | Show the title (y/n) | |
| -s, --speed | y | Show the speed measurement (y/n) | |
| -o, --out | non | File name for the processed output (non = no output) | |
| -p, --pre_api | False | Use the OpenVINO preprocessing API | |
| -bs, --batchsize | 1 | Batch size | |
| -n, --nireq | 1 | Number of infer requests | |
| -g, --grid | False | The model includes the grid | |
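To make the --iou threshold concrete, here is a small sketch (not from the original scripts, though it mirrors the intersection_over_union() helper in object_detect_yolo5.py later on this page; the box coordinates are a made-up example):

```python
# IoU of two boxes given as (xmin, ymin, xmax, ymax).
def iou(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])   # width of the overlap
    h = min(a[3], b[3]) - max(a[1], b[1])   # height of the overlap
    inter = w * h if w > 0 and h > 0 else 0
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

# IoU = 0.333... : below the default --iou 0.45, so NMS keeps both boxes.
print(iou((0, 0, 100, 100), (50, 0, 150, 100)))
```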
(py38a) PS N:\anaconda_win\work\yolov7> python object_detect_yolo7.py -h
usage: object_detect_yolo7.py [-h] [-i INPUT_IMAGE] [-m MODEL] [-d DEVICE]
                              [-lb LABEL_FILE] [--log LOG] [-t TITLE]
                              [-s SPEED] [-o IMAGE_OUT] [-p] [-bs BATCHSIZE]
                              [-n NIREQ] [-g]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_IMAGE, --input INPUT_IMAGE
                        Absolute path to image file or cam for camera stream.
                        Default value is '../../Videos/car_m.mp4'
  -m MODEL, --model MODEL
                        Model Path to an .xml file with a trained model.
                        Default value is ../../model/yolov7.onnx
  -d DEVICE, --device DEVICE
                        Optional. Specify a target device to infer on. CPU,
                        GPU, FPGA, HDDL or MYRIAD is acceptable. The demo
                        will look for a suitable plugin for the device
                        specified. Default value is CPU
  --conf CONF           object confidence threshold
  --iou IOU             IOU threshold for NMS
  -lb LABEL_FILE, --labels LABEL_FILE
                        Absolute path to labels file.
                        Default value is 'coco.names_jp'
  --log LOG             Log level(-1/0/1/2/3/4/5) Default value is '3'
  -t TITLE, --title TITLE
                        Program title flag.(y/n) Default value is 'y'
  -s SPEED, --speed SPEED
                        Speed display flag.(y/n) Default calue is 'y'
  -o IMAGE_OUT, --out IMAGE_OUT
                        Processed image file path. Default value is 'non'
  -p, --pre_api         Preprocessing api.
  -bs BATCHSIZE, --batchsize BATCHSIZE
                        Batch size.
  -n NIREQ, --nireq NIREQ
                        number of infer request.
  -g, --grid            With grid in model.
(py38a) PS C:\anaconda_win\work\YOLOv7_OpenVINO_cpp-python-main> cd ../yolov7
(py38a) PS C:\anaconda_win\work\yolov7> python object_detect_yolo7.py
Starting..
 - Program title  : Object detection YOLO V7
 - OpenCV version : 4.5.5
 - OpenVINO engine: 2022.1.0-7019-cdb9bec7210-releases/2022/1
 - Input image    : ../../Videos/car_m.mp4
 - Model          : ../../model/yolov7.onnx
 - Device         : CPU
 - Confidence thr : 0.1
 - IOU threshold  : 0.6
 - Label          : coco.names_jp
 - Log level      : 3
 - Title flag     : y
 - Speed flag     : y
 - Processed out  : non
 - Preprocessing  : False
 - Batch size     : 1
 - number of inf  : 1
 - With grid      : False
FPS average:       1.80
Finished.
(py38a) PS C:\anaconda_win\work\yolov7> python object_detect_yolo7.py -i ../../Videos/car1_m.mp4
(py38a) PS C:\anaconda_win\work\yolov7> python object_detect_yolo7.py -i ../../Videos/car2_m.mp4
(py38a) PS C:\anaconda_win\work\yolov7> python object_detect_yolo7.py -i ../../Images/desk-image.jpg
(py38a) PS C:\anaconda_win\work\yolov7> python object_detect_yolo7.py -i ../../Images/car-person.jpg
(py38a) PS C:\anaconda_win\work\yolov7> python object_detect_yolo7.py -i ../../Images/bus.jpg
(py38a) PS C:\anaconda_win\work\yolov7> python object_detect_yolo7.py -i ../../Images/zidane.jpg
(py38a) PS C:\anaconda_win\work\yolov7> python object_detect_yolo7.py -i 0

※ Selectable file extensions: .mp4 .bmp .png .jpg .jpeg .JPG .tif
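The extension check itself is handled by the shared my_file.FileTreatment.is_pict() helper in mylib; a hypothetical equivalent (illustrative only, not the actual library code) might look like:

```python
import os

# The extension list from the note above.
SUPPORTED_EXT = {'.mp4', '.bmp', '.png', '.jpg', '.jpeg', '.JPG', '.tif'}

def is_supported(path):
    """True if the file exists and carries one of the supported extensions."""
    return os.path.isfile(path) and os.path.splitext(path)[1] in SUPPORTED_EXT
```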
# -*- coding: utf-8 -*- ##------------------------------------------ ## Object detection YOLO V7 Ver 0.03 ## with OpenVINO™ runtime ## ## 2023.04.12 Masahiro Izutsu ##------------------------------------------ ## GitHub https://github.com/OpenVINO-dev-contest/YOLOv7_OpenVINO_cpp-python ## ## yolo7_ov.py ## ver 0.03 2023.07.18 conf, iou コマンドパラメータによる設定追加 # import処理 import sys import os import yolov7s import argparse import cv2 import numpy as np from openvino.inference_engine import get_version import my_logging import my_file import my_color80 import my_process import my_movedlg import my_dialog import my_winstat from my_print import printG, printR, printY, printC, printB, printN # 定数定義 from os.path import expanduser MODEL_DEF = expanduser('../../model/yolov7.onnx') INPUT_IMAGE_DEF = '../../Videos/car_m.mp4' # 入力イメージ・パラメータ初期値 LABEL_DEF = 'coco.names_jp' # Parses arguments for the application def parse_args(): parser = argparse.ArgumentParser() parser.add_argument('-i', '--input', metavar = 'INPUT_IMAGE', type=str, default = INPUT_IMAGE_DEF, help = 'Absolute path to image file or cam for camera stream.' 'Default value is \''+INPUT_IMAGE_DEF+'\'') parser.add_argument('-m', '--model', type=str, default = MODEL_DEF, help = 'Model Path to an .xml file with a trained model.' 'Default value is '+MODEL_DEF) parser.add_argument('-d', '--device', default='CPU', type = str, help = 'Optional. Specify a target device to infer on. CPU, GPU, FPGA, HDDL or MYRIAD is ' 'acceptable. The demo will look for a suitable plugin for the device specified. ' 'Default value is CPU') parser.add_argument('--conf', default = 0.25, type = float, help = 'object confidence threshold') parser.add_argument('--iou', default = 0.45, type = float, help = 'IOU threshold for NMS') parser.add_argument('-lb', '--labels', metavar = 'LABEL_FILE', type = str, default = LABEL_DEF, help = 'Absolute path to labels file.' 'Default value is \''+LABEL_DEF+'\'') parser.add_argument('--log', metavar = 'LOG', default = '3', help = 'Log level(-1/0/1/2/3/4/5) Default value is \'3\'') parser.add_argument('-t', '--title', metavar = 'TITLE', default = 'y', help = 'Program title flag.(y/n) Default value is \'y\'') parser.add_argument('-s', '--speed', metavar = 'SPEED', default = 'y', help = 'Speed display flag.(y/n) Default calue is \'y\'') parser.add_argument('-o', '--out', metavar = 'IMAGE_OUT', default = 'non', help = 'Processed image file path. 
Default value is \'non\'') parser.add_argument('-p', '--pre_api', required = False, action='store_true', help = 'Preprocessing api.') parser.add_argument('-bs', '--batchsize', required = False, default = 1, type=int, help = 'Batch size.') parser.add_argument('-n', '--nireq', required = False, default = 1, type=int, help = 'number of infer request.') parser.add_argument('-g', '--grid', required = False, action = 'store_true', help = 'With grid in model.') return parser # モデル基本情報の表示 def display_info(title, input_str, model, device, conf, iou, labels, log, titleflg, speedflg, outpath, pre_api, batchsize, nireq, grid): printG(f' - Program title : {title}') printG(f' - OpenCV version : {cv2.__version__}') printG(f' - OpenVINO engine: {get_version()}') printY(f' - Input image : {input_str}') printY(f' - Model : {model}') printY(f' - Device : {device}') printY(f' - Confidence thr : {conf}') printY(f' - IOU threshold : {iou}') printY(f' - Label : {labels}') printY(f' - Log level : {log}') printY(f' - Title flag : {titleflg}') printY(f' - Speed flag : {speedflg}') printY(f' - Processed out : {outpath}') printB(f' - Preprocessing : {pre_api}') printB(f' - Batch size : {batchsize}') printB(f' - number of inf : {nireq}') printB(f' - With grid : {grid}') # ** main関数 ** def main(): # 入力パラメータの処理 args = parse_args().parse_args() input_str = args.input log = args.log logsel = int(log) labels = args.labels titleflg = args.title speedflg = args.speed model = args.model device = args.device outpath = args.out conf = args.conf iou = args.iou pre_api = args.pre_api batchsize = args.batchsize nireq = args.nireq grid = args.grid # ファイル処理クラスのインスタンス my_file_treatment = my_file.FileTreatment(logsel) # 入力ソースの違いによる前処理 if input_str == '0': input_str = my_dialog.select_image_movie_file(initdir = '../../Videos') if len(input_str) <= 0: printR('Program canceled.') return 0 input_stream = input_str if input_str.lower() == "cam" or input_str.lower() == "camera": input_stream = 0 isstream = True else: filetype = my_file_treatment.is_pict(input_stream) isstream = filetype == 'None' if (filetype == 'NotFound'): printR(f"'{input_stream}': Input file Not found.") return 0 if isstream and outpath != 'non' and os.path.splitext(outpath)[1] != '.mp4': printR(f"'{outpath}': Output file type is '.mp4'") return 0 # アプリケーション・ログ設定 module = os.path.basename(__file__) module_name = os.path.splitext(module)[0] logger = my_logging.get_module_logger_sel(module_name, logsel) logger.info('Starting..') # YOLO7 オブジェクト yolov7_detector=yolov7s.YOLOV7_OPENVINO(model, device, conf, iou, pre_api, batchsize, nireq, grid) title = yolov7_detector.title # 分類ラベルの取得 with open(labels, encoding='utf-8') as labels_file: label_list = labels_file.read().splitlines() yolov7_detector.set_classes(label_list) # 情報表示 display_info(title, input_str, model, device, conf, iou, labels, log, titleflg, speedflg, outpath, pre_api, batchsize, nireq, grid) logger.debug(f'Object list\n {label_list}') # 関数内パラメータ window_name = yolov7_detector.title + " (hit 'q' or 'esc' key to exit)" # 入力準備 if (isstream): # カメラ cap = cv2.VideoCapture(input_stream) success, img_frame = cap.read() if success == False: logger.error('camera read error !!') loopflg = cap.isOpened() img_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) img_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) else: # 画像ファイル読み込み img_frame = cv2.imread(input_stream) if img_frame is None: logger.error('unable to read the input.') return 0 # アスペクト比を固定してリサイズ(スクリーンに収まるサイズ) h, w = my_movedlg.get_display_size() mx = h - 10 if h > w 
else w - 10 img_frame = my_process.frame_resize(img_frame, mx) loopflg = True # 1回ループ img_width = img_frame.shape[1] img_height = img_frame.shape[0] # ウインドウ初期設定 cv2.namedWindow(window_name, flags=cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) cv2.moveWindow(window_name, 20, 0) # 処理結果の記録 step1 if (outpath != 'non'): if (isstream): fps = int(cap.get(cv2.CAP_PROP_FPS)) fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v') outvideo = cv2.VideoWriter(outpath, fourcc, fps, (img_width, img_height)) # YOLO7 Set callback function for postprocess yolov7_detector.infer_queue.set_callback(yolov7_detector.postprocess) # fps計測開始 yolov7_detector.fpsWithTick.get() # メインループ while (loopflg): if img_frame is None: logger.error('unable to read the input.') return 0 src_img_list = [] src_img_list.append(img_frame) img = yolov7_detector.letterbox(img_frame, yolov7_detector.img_size) src_size = img_frame.shape[:2] img = img.astype(dtype=np.float32) if (yolov7_detector.pre_api == False): img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) # BGR to RGB img /= 255.0 img = img.transpose(2, 0, 1) # NHWC to NCHW input_image = np.expand_dims(img, 0) # Do inference yolov7_detector.infer_queue.start_async({yolov7_detector.input_layer.any_name: input_image}, (src_img_list, src_size)) # ウインドウの更新/画像表示 yolov7_detector.infer_queue.wait_all() cv2.imshow(window_name, img_frame) # 処理結果の記録 step2 if (outpath != 'non'): if (isstream): outvideo.write(img_frame) else: cv2.imwrite(outpath, img_frame) # 何らかのキーが押されたら終了 breakflg = False while(True): key = cv2.waitKey(1) # (カメラ・アクセス速度調整) if key == 27 or key == 113: # 'esc' or 'q' breakflg = True break if not my_winstat._is_visible(window_name): # 'Close' button breakflg = True break if (isstream): break if ((breakflg == False) and isstream): # 次のフレームを読み出す success, img_frame = cap.read() if success == False: # 動画終了 break loopflg = cap.isOpened() else: loopflg = False # 終了処理 if (isstream): cap.release() # 処理結果の記録 step3 if (outpath != 'non'): if (isstream): outvideo.release() cv2.destroyAllWindows() printC('FPS average: {:>10.2f}'.format(yolov7_detector.fpsWithTick.get_average())) logger.info('Finished.\n') if __name__ == "__main__": sys.exit(main() or 0)
# -*- coding: utf-8 -*- ##------------------------------------------ ## YOLOV7_OPENVINO Ver 0.03 ## ## 2023.04.12 Masahiro Izutsu ##------------------------------------------ ## GitHub https://github.com/OpenVINO-dev-contest/YOLOv7_OpenVINO_cpp-python ## ## yolo7s.py ## ver 0.02 2023.05.19 ラベルファイル数を取得するように変更 ## ver 0.03 2023.07.18 conf, iou コマンドパラメータによる設定追加 from openvino.runtime import Core import cv2 import numpy as np import random import time from openvino.preprocess import PrePostProcessor, ColorFormat from openvino.runtime import Layout, AsyncInferQueue, PartialShape import my_fps import my_puttext import my_color80 TEXT_COLOR = my_color80.CR_white class YOLOV7_OPENVINO(object): def __init__(self, model_path, device, conf, iou, pre_api, batchsize, nireq, grid): self.title = 'Object detection YOLO V7' # 日本語フォント指定 self.fontFace = my_puttext.get_font() # 計測値 self.fpsWithTick = my_fps.fpsWithTick() self.speedflg = 'y' # set the hyperparameters self.classes = [ "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch", "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush" ] self.batchsize = batchsize self.grid = grid self.img_size = (640, 640) self.conf_thres = conf self.iou_thres = iou self.class_num = len(self.classes) self.colors = [[random.randint(0, 255) for _ in range(3)] for _ in self.classes] self.stride = [8, 16, 32] self.anchor_list = [[12, 16, 19, 36, 40, 28], [36, 75, 76, 55, 72, 146], [142, 110, 192, 243, 459, 401]] self.anchor = np.array(self.anchor_list).astype(float).reshape(3, -1, 2) area = self.img_size[0] * self.img_size[1] self.size = [int(area / self.stride[0] ** 2), int(area / self.stride[1] ** 2), int(area / self.stride[2] ** 2)] self.feature = [[int(j / self.stride[i]) for j in self.img_size] for i in range(3)] ie = Core() self.model = ie.read_model(model_path) self.input_layer = self.model.input(0) new_shape = PartialShape([self.batchsize, 3, self.img_size[0], self.img_size[1]]) self.model.reshape({self.input_layer.any_name: new_shape}) self.pre_api = pre_api if (self.pre_api == True): # Preprocessing API ppp = PrePostProcessor(self.model) # Declare section of desired application's input format ppp.input().tensor() \ .set_layout(Layout("NHWC")) \ .set_color_format(ColorFormat.BGR) # Here, it is assumed that the model has "NCHW" layout for input. 
ppp.input().model().set_layout(Layout("NCHW")) # Convert current color format (BGR) to RGB ppp.input().preprocess() \ .convert_color(ColorFormat.RGB) \ .scale([255.0, 255.0, 255.0]) self.model = ppp.build() print(f'Dump preprocessor: {ppp}') self.compiled_model = ie.compile_model(model=self.model, device_name=device) self.infer_queue = AsyncInferQueue(self.compiled_model, nireq) def set_classes(self, classes): self.classes = classes self.class_num = len(self.classes) def letterbox(self, img, new_shape=(640, 640), color=(114, 114, 114)): # Resize and pad image while meeting stride-multiple constraints shape = img.shape[:2] # current shape [height, width] if isinstance(new_shape, int): new_shape = (new_shape, new_shape) # Scale ratio (new / old) r = min(new_shape[0] / shape[0], new_shape[1] / shape[1]) new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r)) dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - \ new_unpad[1] # wh padding # divide padding into 2 sides dw /= 2 dh /= 2 # resize if shape[::-1] != new_unpad: img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR) top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1)) left, right = int(round(dw - 0.1)), int(round(dw + 0.1)) # add border img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color) return img def xywh2xyxy(self, x): # Convert nx4 boxes from [x, y, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right y = np.copy(x) y[:, 0] = x[:, 0] - x[:, 2] / 2 # top left x y[:, 1] = x[:, 1] - x[:, 3] / 2 # top left y y[:, 2] = x[:, 0] + x[:, 2] / 2 # bottom right x y[:, 3] = x[:, 1] + x[:, 3] / 2 # bottom right y return y def nms(self, prediction, conf_thres, iou_thres): predictions = np.squeeze(prediction[0]) # Filter out object confidence scores below threshold obj_conf = predictions[:, 4] predictions = predictions[obj_conf > conf_thres] obj_conf = obj_conf[obj_conf > conf_thres] # Multiply class confidence with bounding box confidence predictions[:, 5:] *= obj_conf[:, np.newaxis] # Get the scores scores = np.max(predictions[:, 5:], axis=1) # Filter out the objects with a low score valid_scores = scores > conf_thres predictions = predictions[valid_scores] scores = scores[valid_scores] # Get the class with the highest confidence class_ids = np.argmax(predictions[:, 5:], axis=1) # Get bounding boxes for each object boxes = self.xywh2xyxy(predictions[:, :4]) # Apply non-maxima suppression to suppress weak, overlapping bounding boxes # indices = nms(boxes, scores, self.iou_threshold) indices = cv2.dnn.NMSBoxes(boxes.tolist(), scores.tolist(), conf_thres, iou_thres) return boxes[indices], scores[indices], class_ids[indices] def clip_coords(self, boxes, img_shape): # Clip bounding xyxy bounding boxes to image shape (height, width) boxes[:, 0].clip(0, img_shape[1]) # x1 boxes[:, 1].clip(0, img_shape[0]) # y1 boxes[:, 2].clip(0, img_shape[1]) # x2 boxes[:, 3].clip(0, img_shape[0]) # y2 def scale_coords(self, img1_shape, img0_shape, coords, ratio_pad=None): # Rescale coords (xyxy) from img1_shape to img0_shape # gain = old / new if ratio_pad is None: gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1]) padding = (img1_shape[1] - img0_shape[1] * gain) / \ 2, (img1_shape[0] - img0_shape[0] * gain) / 2 else: gain = ratio_pad[0][0] padding = ratio_pad[1] coords[:, [0, 2]] -= padding[0] # x padding coords[:, [1, 3]] -= padding[1] # y padding coords[:, :4] /= gain self.clip_coords(coords, img0_shape) def sigmoid(self, x): return 1 / (1 + np.exp(-x)) def draw(self, img, 
boxinfo): # タイトル描画 if len(self.title) != 0: cv2.putText(img, self.title, (10 + 2, 30 + 2), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.8, color=(0, 0, 0), lineType=cv2.LINE_AA) cv2.putText(img, self.title, (10, 30), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.8, color=(200, 200, 0), lineType=cv2.LINE_AA) # FPSを表示 fps = self.fpsWithTick.get() st_fps = 'fps: {:>6.2f}'.format(fps) if (self.speedflg == 'y'): cv2.rectangle(img, (10, 38), (95, 55), (90, 90, 90), -1) cv2.putText(img, st_fps, (15, 50), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.4, color=(255, 255, 255), lineType=cv2.LINE_AA) for xyxy, conf, cls in boxinfo: xmin = int(xyxy[0]) ymin = int(xyxy[1]) xmax = int(xyxy[2]) ymax = int(xyxy[3]) label = self.classes[int(cls)] + ' ' + str(round(conf * 100, 1)) + '%' # Plots one bounding box on image img color_boder = my_color80.get_boder_bgr80(int(cls)) color_back = my_color80.get_back_bgr80(int(cls)) x1, y1, x2, y2 = my_puttext.cv2_putText(img, label, (xmin + 4, ymin - 7), self.fontFace, 14, TEXT_COLOR, 0, True) # 描画範囲の取得 cv2.rectangle(img, (xmin, ymin -28), (max(xmax, x2), ymin), color_back, -1) cv2.rectangle(img, (xmin, ymin), (xmax, ymax), color_boder, 2) my_puttext.cv2_putText(img, label, (xmin + 4, ymin - 7), self.fontFace, 14, TEXT_COLOR, 0) def postprocess(self, infer_request, info): src_img_list, src_size = info for batch_id in range(self.batchsize): if self.grid: results = np.expand_dims(infer_request.get_output_tensor(0).data[batch_id], axis=0) else: output = [] # Get the each feature map's output data output.append(self.sigmoid(infer_request.get_output_tensor(0).data[batch_id].reshape(-1, self.size[0]*3, 5+self.class_num))) output.append(self.sigmoid(infer_request.get_output_tensor(1).data[batch_id].reshape(-1, self.size[1]*3, 5+self.class_num))) output.append(self.sigmoid(infer_request.get_output_tensor(2).data[batch_id].reshape(-1, self.size[2]*3, 5+self.class_num))) # Postprocessing grid = [] for _, f in enumerate(self.feature): grid.append([[i, j] for j in range(f[0]) for i in range(f[1])]) result = [] for i in range(3): src = output[i] xy = src[..., 0:2] * 2. - 0.5 wh = (src[..., 2:4] * 2) ** 2 dst_xy = [] dst_wh = [] for j in range(3): dst_xy.append((xy[:, j * self.size[i]:(j + 1) * self.size[i], :] + grid[i]) * self.stride[i]) dst_wh.append(wh[:, j * self.size[i]:(j + 1) *self.size[i], :] * self.anchor[i][j]) src[..., 0:2] = np.concatenate((dst_xy[0], dst_xy[1], dst_xy[2]), axis=1) src[..., 2:4] = np.concatenate((dst_wh[0], dst_wh[1], dst_wh[2]), axis=1) result.append(src) results = np.concatenate(result, 1) boxes, scores, class_ids = self.nms(results, self.conf_thres, self.iou_thres) img_shape = self.img_size self.scale_coords(img_shape, src_size, boxes) # Draw the results self.draw(src_img_list[batch_id], zip(boxes, scores, class_ids))
※ Relocated into this project with the modifications needed to run in the "py38a" virtual environment.
(py38a) PS C:\anaconda_win\work\YOLOv7_OpenVINO_cpp-python-main> cd ../openvino
(py38a) PS C:\anaconda_win\work\openvino> python object_detect_yolo5.py
Starting..
 - Program title  : Object detection YOLO V5
 - OpenCV version : 4.5.5
 - OpenVINO engine: 2022.1.0-7019-cdb9bec7210-releases/2022/1
 - Input image    : ../../Videos/car_m.mp4
 - Model          : ../../model/yolov5s.xml
 - Device         : CPU
 - Label          : coco.names_jp
 - Threshold      : 0.5
 - Threshold (IOU): 0.4
 - Log level      : 3
 - Program Title  : y
 - Speed flag     : y
 - Processed out  : non
Loading model to the plugin
Starting inference...
FPS average:       2.70
Finished.
(py38a) PS C:\anaconda_win\work\openvino> python object_detect_yolo3.py
Starting..
 - Program title  : Object detection TinyYOLO V3
 - OpenCV version : 4.5.5
 - OpenVINO engine: 2022.1.0-7019-cdb9bec7210-releases/2022/1
 - Input image    : ../../Videos/car_m.mp4
 - Model          : ../../model/public/FP32/yolo-v3-tiny-tf.xml
 - Device         : CPU
 - Labels File    : coco.names_jp
 - Threshold      : 0.6
 - Intersection Over Union: 0.25
 - Log level      : 3
 - Program Title  : y
 - Speed flag     : y
 - Processed out  : non
 - Input Shape    : [1, 3, 416, 416]
 - Output Shapes  :
 -  output #0 name : conv2d_12/Conv2D/YoloRegion  - output shape: [1, 255, 26, 26]
 -  output #1 name : conv2d_9/Conv2D/YoloRegion   - output shape: [1, 255, 13, 13]
FPS average:      17.30
Finished.
# -*- coding: utf-8 -*- ##------------------------------------------ ## Object detection YOLO V5 Ver 0.05 ## with OpenVINO™ toolkit ## ## 2022.11.01 Masahiro Izutsu ##------------------------------------------ ## ver 0.00 2021.10.02 yolov5_demo_OV.py (original: yolov5_demo_OV2021.3.py) ## YOLO V5 OpenVINO demoprogram ## for https://github.com/ultralytics/yolov5/tree/v5.0 ## GitHub https://github.com/violet17/yolov5_demo ## ## object_detect_yolo5.py (yolov5_demo_OV.py 2021.10.02) ## ver 0.05 2023.04.12 クラス描画範囲の出力対応 # 定数定義 title = 'Object detection YOLO V5' from os.path import expanduser MODEL_DEF = expanduser('../../model/yolov5s.xml') INPUT_IMAGE_DEF = '../../Videos/car_m.mp4' # 入力イメージ・パラメータ初期値 LABEL_DEF = 'coco.names_jp' # モジュール読み込み from openvino.inference_engine import IECore from openvino.inference_engine import get_version # import処理 import sys import cv2 import numpy as np import os import argparse import my_logging import my_puttext import my_file import my_movedlg import my_dialog import my_process import my_winstat import my_fps import my_color80 from my_print import printG, printR, printY, printC, printB, printN from math import exp as exp # Adjust these thresholds PROB_THRESHOLD = 0.5 IOU_THRESHOLD = 0.4 # Used for display TEXT_COLOR = my_color80.CR_white # Parses arguments for the application def parse_args(): parser = argparse.ArgumentParser() parser.add_argument('-i', '--input', metavar = 'INPUT_IMAGE', type=str, default = INPUT_IMAGE_DEF, help = 'Absolute path to image file or cam for camera stream.' 'Default value is \''+INPUT_IMAGE_DEF+'\'') parser.add_argument('-m', '--model', type=str, default = MODEL_DEF, help = 'Model Path to an .xml file with a trained model.' 'Default value is '+MODEL_DEF) parser.add_argument('-d', '--device', default='CPU', type = str, help = 'Optional. Specify a target device to infer on. CPU, GPU, FPGA, HDDL or MYRIAD is ' 'acceptable. The demo will look for a suitable plugin for the device specified. ' 'Default value is CPU') parser.add_argument('-lb', '--labels', metavar = 'LABEL_FILE', type = str, default = LABEL_DEF, help = 'Absolute path to labels file.' 'Default value is \''+LABEL_DEF+'\'') parser.add_argument('--threshold', metavar = 'FLOAT', type = float, default = PROB_THRESHOLD, help = 'Optional. Probability threshold for detections filtering' 'Default value is \''+ str(PROB_THRESHOLD) +'\'') parser.add_argument('--iou', metavar = 'FLOAT', type = float, default = IOU_THRESHOLD, help = 'Optional. Intersection over union threshold for overlapping detections filtering' 'Default value is \''+ str(IOU_THRESHOLD) +'\'') parser.add_argument('--log', metavar = 'LOG', default = '3', help = 'Log level(-1/0/1/2/3/4/5) Default value is \'3\'') parser.add_argument('-t', '--title', metavar = 'TITLE', default = 'y', help = 'Program title flag.(y/n) Default value is \'y\'') parser.add_argument('-s', '--speed', metavar = 'SPEED', default = 'y', help = 'Speed display flag.(y/n) Default calue is \'y\'') parser.add_argument('-o', '--out', metavar = 'IMAGE_OUT', default = 'non', help = 'Processed image file path. 
Default value is \'non\'') return parser # モデル基本情報の表示 def display_info(input_str, model, device, labels, threshold, iou_threshold, log, titleflg, speedflg, outpath): printG(f' - Program title : {title}') printG(f' - OpenCV version : {cv2.__version__}') printG(f' - OpenVINO engine: {get_version()}') printY(f' - Input image : {input_str}') printY(f' - Model : {model}') printY(f' - Device : {device}') printY(f' - Label : {labels}') printY(f' - Threshold : {threshold}') printY(f' - Threshold (IOU): {iou_threshold}') printY(f' - Log level : {log}') printY(f' - Program Title : {titleflg}') printY(f' - Speed flag : {speedflg}') printY(f' - Processed out : {outpath}') class YoloParams: # ------------------------------------------- Extracting layer parameters ------------------------------------------ # Magic numbers are copied from yolo samples def __init__(self, side): self.num = 3 #if 'num' not in param else int(param['num']) self.coords = 4 #if 'coords' not in param else int(param['coords']) self.classes = 80 #if 'classes' not in param else int(param['classes']) self.side = side self.anchors = [10.0, 13.0, 16.0, 30.0, 33.0, 23.0, 30.0, 61.0, 62.0, 45.0, 59.0, 119.0, 116.0, 90.0, 156.0, 198.0, 373.0, 326.0] #if 'anchors' not in param else [float(a) for a in param['anchors'].split(',')] #self.isYoloV3 = False #if param.get('mask'): # mask = [int(idx) for idx in param['mask'].split(',')] # self.num = len(mask) # maskedAnchors = [] # for idx in mask: # maskedAnchors += [self.anchors[idx * 2], self.anchors[idx * 2 + 1]] # self.anchors = maskedAnchors # self.isYoloV3 = True # Weak way to determine but the only one. def log_params(self, logger): params_to_print = {'classes': self.classes, 'num': self.num, 'coords': self.coords, 'anchors': self.anchors} [logger.debug(" {:8}: {}".format(param_name, param)) for param_name, param in params_to_print.items()] def letterbox(img, size=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True): # Resize image to a 32-pixel-multiple rectangle https://github.com/ultralytics/yolov3/issues/232 shape = img.shape[:2] # current shape [height, width] w, h = size # Scale ratio (new / old) r = min(h / shape[0], w / shape[1]) if not scaleup: # only scale down, do not scale up (for better test mAP) r = min(r, 1.0) # Compute padding ratio = r, r # width, height ratios new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r)) dw, dh = w - new_unpad[0], h - new_unpad[1] # wh padding if auto: # minimum rectangle dw, dh = np.mod(dw, 64), np.mod(dh, 64) # wh padding elif scaleFill: # stretch dw, dh = 0.0, 0.0 new_unpad = (w, h) ratio = w / shape[1], h / shape[0] # width, height ratios dw /= 2 # divide padding into 2 sides dh /= 2 if shape[::-1] != new_unpad: # resize img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR) top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1)) left, right = int(round(dw - 0.1)), int(round(dw + 0.1)) img = cv2.copyMakeBorder(img, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color) # add border top2, bottom2, left2, right2 = 0, 0, 0, 0 if img.shape[0] != h: top2 = (h - img.shape[0])//2 bottom2 = top2 img = cv2.copyMakeBorder(img, top2, bottom2, left2, right2, cv2.BORDER_CONSTANT, value=color) # add border elif img.shape[1] != w: left2 = (w - img.shape[1])//2 right2 = left2 img = cv2.copyMakeBorder(img, top2, bottom2, left2, right2, cv2.BORDER_CONSTANT, value=color) # add border return img def scale_bbox(x, y, height, width, class_id, confidence, im_h, im_w, resized_im_h=640, resized_im_w=640): gain = 
min(resized_im_w / im_w, resized_im_h / im_h) # gain = old / new pad = (resized_im_w - im_w * gain) / 2, (resized_im_h - im_h * gain) / 2 # wh padding x = int((x - pad[0])/gain) y = int((y - pad[1])/gain) w = int(width/gain) h = int(height/gain) xmin = max(0, int(x - w / 2)) ymin = max(0, int(y - h / 2)) xmax = min(im_w, int(xmin + w)) ymax = min(im_h, int(ymin + h)) # Method item() used here to convert NumPy types to native types for compatibility with functions, which don't # support Numpy types (e.g., cv2.rectangle doesn't support int64 in color parameter) return dict(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax, class_id=class_id.item(), confidence=confidence.item()) def entry_index(side, coord, classes, location, entry): side_power_2 = side ** 2 n = location // side_power_2 loc = location % side_power_2 return int(side_power_2 * (n * (coord + classes + 1) + entry) + loc) def parse_yolo_region(blob, resized_image_shape, original_im_shape, params, threshold): # ------------------------------------------ Validating output parameters ------------------------------------------ out_blob_n, out_blob_c, out_blob_h, out_blob_w = blob.shape predictions = 1.0/(1.0+np.exp(-blob)) assert out_blob_w == out_blob_h, "Invalid size of output blob. It sould be in NCHW layout and height should " \ "be equal to width. Current height = {}, current width = {}" \ "".format(out_blob_h, out_blob_w) # ------------------------------------------ Extracting layer parameters ------------------------------------------- orig_im_h, orig_im_w = original_im_shape resized_image_h, resized_image_w = resized_image_shape objects = list() side_square = params.side * params.side # ------------------------------------------- Parsing YOLO Region output ------------------------------------------- bbox_size = int(out_blob_c/params.num) #4+1+num_classes for row, col, n in np.ndindex(params.side, params.side, params.num): bbox = predictions[0, n*bbox_size:(n+1)*bbox_size, row, col] x, y, width, height, object_probability = bbox[:5] class_probabilities = bbox[5:] if object_probability < threshold: continue x = (2*x - 0.5 + col)*(resized_image_w/out_blob_w) y = (2*y - 0.5 + row)*(resized_image_h/out_blob_h) if int(resized_image_w/out_blob_w) == 8 & int(resized_image_h/out_blob_h) == 8: #80x80, idx = 0 elif int(resized_image_w/out_blob_w) == 16 & int(resized_image_h/out_blob_h) == 16: #40x40 idx = 1 elif int(resized_image_w/out_blob_w) == 32 & int(resized_image_h/out_blob_h) == 32: # 20x20 idx = 2 width = (2*width)**2* params.anchors[idx * 6 + 2 * n] height = (2*height)**2 * params.anchors[idx * 6 + 2 * n + 1] class_id = np.argmax(class_probabilities) confidence = object_probability objects.append(scale_bbox(x=x, y=y, height=height, width=width, class_id=class_id, confidence=confidence, im_h=orig_im_h, im_w=orig_im_w, resized_im_h=resized_image_h, resized_im_w=resized_image_w)) return objects def intersection_over_union(box_1, box_2): width_of_overlap_area = min(box_1['xmax'], box_2['xmax']) - max(box_1['xmin'], box_2['xmin']) height_of_overlap_area = min(box_1['ymax'], box_2['ymax']) - max(box_1['ymin'], box_2['ymin']) if width_of_overlap_area < 0 or height_of_overlap_area < 0: area_of_overlap = 0 else: area_of_overlap = width_of_overlap_area * height_of_overlap_area box_1_area = (box_1['ymax'] - box_1['ymin']) * (box_1['xmax'] - box_1['xmin']) box_2_area = (box_2['ymax'] - box_2['ymin']) * (box_2['xmax'] - box_2['xmin']) area_of_union = box_1_area + box_2_area - area_of_overlap if area_of_union == 0: return 0 return area_of_overlap / 
# ** main function **
def main():
    # Japanese font selection
    fontFace = my_puttext.get_font()

    # Process the input parameters
    ARGS = parse_args().parse_args()
    input_str = ARGS.input
    log = ARGS.log
    logsel = int(log)
    labels = ARGS.labels
    titleflg = ARGS.title
    speedflg = ARGS.speed
    model = ARGS.model
    threshold = ARGS.threshold
    iou_threshold = ARGS.iou
    device = ARGS.device
    outpath = ARGS.out

    # Instantiate the file-handling class
    my_file_treatment = my_file.FileTreatment(logsel)

    # Pre-processing depending on the input source
    if input_str == '0':
        input_str = my_dialog.select_image_movie_file(initdir = '../../Videos')
        if len(input_str) <= 0:
            printR('Program canceled.')
            return 0

    input_stream = input_str
    if input_str.lower() == "cam" or input_str.lower() == "camera":
        input_stream = 0
        isstream = True
    else:
        filetype = my_file_treatment.is_pict(input_stream)
        isstream = filetype == 'None'
        if (filetype == 'NotFound'):
            printR(f"'{input_stream}': Input file not found.")
            return 0
    if isstream and outpath != 'non' and os.path.splitext(outpath)[1] != '.mp4':
        printR(f"'{outpath}': Output file type must be '.mp4'")
        return 0

    # Application log settings
    module = os.path.basename(__file__)
    module_name = os.path.splitext(module)[0]
    logger = my_logging.get_module_logger_sel(module_name, logsel)
    logger.info('Starting..')

    # Read the classification labels
    with open(labels, encoding='utf-8') as labels_file:
        label_list = labels_file.read().splitlines()

    # Display the settings
    display_info(input_str, model, device, labels, threshold, iou_threshold, log, titleflg, speedflg, outpath)
    logger.debug(f'Object list\n {label_list}')

    ## 1. Plugin initialization for specified device and load extensions library if specified
    logger.debug("Creating Inference Engine...")
    ie = IECore()

    ## 2. Reading the IR generated by the Model Optimizer (.xml and .bin files)
    logger.debug(f"Loading network:\n\t{model}")
    net = ie.read_network(model=model)

    ## 3. Load CPU extension to support specific layers

    ## 4. Preparing inputs
    logger.debug("Preparing inputs")
    input_blob = next(iter(net.input_info))

    # Default batch_size is 1
    net.batch_size = 1

    # Read and pre-process input images
    in_n, in_c, in_h, in_w = net.input_info[input_blob].input_data.shape

    # Parameters local to this function
    window_name = title + " (hit 'q' or 'esc' key to exit)"

    # Prepare the input
    if (isstream):
        # Camera
        cap = cv2.VideoCapture(input_stream)
        success, img_frame = cap.read()
        if success == False:
            logger.error('camera read error !!')
        loopflg = cap.isOpened()
        img_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        img_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    else:
        # Read an image file
        img_frame = cv2.imread(input_stream)
        if img_frame is None:
            logger.error('unable to read the input.')
            return 0
        # Resize with a fixed aspect ratio (sized to fit the screen)
        h, w = my_movedlg.get_display_size()
        mx = h - 10 if h > w else w - 10
        img_frame = my_process.frame_resize(img_frame, mx)
        loopflg = True    # single-pass loop
        img_width = img_frame.shape[1]
        img_height = img_frame.shape[0]

    # Initial window settings
    cv2.namedWindow(window_name, flags=cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL)
    cv2.moveWindow(window_name, 20, 0)

    # Record the processing results, step 1
    if (outpath != 'non'):
        if (isstream):
            fps = int(cap.get(cv2.CAP_PROP_FPS))
            fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
            outvideo = cv2.VideoWriter(outpath, fourcc, fps, (img_width, img_height))

    ## 5. Loading model to the plugin
    logger.info("Loading model to the plugin")
    exec_net = ie.load_network(network=net, num_requests=2, device_name=device)
    cur_request_id = 0
    render_time = 0
    parsing_time = 0

    ## 6. Doing inference
    logger.info("Starting inference...")

    # Initialize the measurement values
    fpsWithTick = my_fps.fpsWithTick()
    frame_count = 0
    fps_total = 0
    fpsWithTick.get()    # start fps measurement

    # Main loop
    while (loopflg):
        if img_frame is None:
            logger.error('unable to read the input.')
            return 0

        frame = img_frame
        in_frame = letterbox(frame, (in_w, in_h))    # resize input frame to the network size
        in_frame0 = in_frame
        in_frame = in_frame.transpose((2, 0, 1))     # Change data layout from HWC to CHW
        in_frame = in_frame.reshape((in_n, in_c, in_h, in_w))

        # Start inference
        req_handle = exec_net.start_async(request_id=0, inputs={input_blob: in_frame})
        status = req_handle.wait()

        # Collecting object detection results
        objects = list()
        if exec_net.requests[cur_request_id].wait(-1) == 0:
            output = exec_net.requests[cur_request_id].output_blobs
            for layer_name, out_blob in output.items():
                layer_params = YoloParams(side=out_blob.buffer.shape[2])
                logger.debug("Layer {} parameters: ".format(layer_name))
                # Log the layer parameters
                layer_params.log_params(logger)
                objects += parse_yolo_region(out_blob.buffer, in_frame.shape[2:],
                                             frame.shape[:-1], layer_params, threshold)

        # Filtering overlapping boxes with respect to the --iou_threshold CLI parameter
        objects = sorted(objects, key=lambda obj: obj['confidence'], reverse=True)
        for i in range(len(objects)):
            if objects[i]['confidence'] == 0:
                continue
            for j in range(i + 1, len(objects)):
                if intersection_over_union(objects[i], objects[j]) > iou_threshold:
                    objects[j]['confidence'] = 0

        # Drawing objects with respect to the --prob_threshold CLI parameter
        objects = [obj for obj in objects if obj['confidence'] >= threshold]

        if (titleflg == 'y'):
            logger.debug("\nDetected boxes for batch {}:".format(1))
            logger.debug(" Class ID | Confidence | XMIN | YMIN | XMAX | YMAX | COLOR ")

        origin_im_size = frame.shape[:-1]
        for obj in objects:
            # Validate the bbox of the detected object
            if obj['xmax'] > origin_im_size[1] or obj['ymax'] > origin_im_size[0] or obj['xmin'] < 0 or obj['ymin'] < 0:
                continue

            # Per-object color selection
            BOX_COLOR = my_color80.get_boder_bgr80(obj['class_id'])
            LABEL_BG_COLOR = my_color80.get_back_bgr80(obj['class_id'])

            det_label = label_list[obj['class_id']] if label_list and len(label_list) > obj['class_id'] else \
                str(obj['class_id'])
            if (titleflg == 'y'):
                logger.debug(
                    "{:^9} | {:10f} | {:4} | {:4} | {:4} | {:4} | {} ".format(det_label, obj['confidence'],
                                                                              obj['xmin'], obj['ymin'],
                                                                              obj['xmax'], obj['ymax'], BOX_COLOR))
            lablstr = det_label + ' ' + str(round(obj['confidence'] * 100, 1)) + ' %'
            x1, y1, x2, y2 = my_puttext.cv2_putText(frame, lablstr, (obj['xmin'] + 4, obj['ymin'] - 7),
                                                    fontFace, 14, TEXT_COLOR, 0, True)    # get the drawing extent
            cv2.rectangle(frame, (obj['xmin'], obj['ymin'] - 26), (max(obj['xmax'], x2), obj['ymin']),
                          LABEL_BG_COLOR, -1)
            cv2.rectangle(frame, (obj['xmin'], obj['ymin']), (obj['xmax'], obj['ymax']), BOX_COLOR, 2)
            my_puttext.cv2_putText(frame, lablstr, (obj['xmin'] + 4, obj['ymin'] - 7), fontFace, 14, TEXT_COLOR, 0)

        # Calculate the FPS
        fps = fpsWithTick.get()
        st_fps = 'fps: {:>6.2f}'.format(fps)
        if (speedflg == 'y'):
            cv2.rectangle(frame, (10, 38), (95, 55), (90, 90, 90), -1)
            cv2.putText(frame, st_fps, (15, 50), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.4,
                        color=(255, 255, 255), lineType=cv2.LINE_AA)

        # Draw the title
        if (titleflg == 'y'):
            cv2.putText(frame, title, (10 + 2, 30 + 2), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.8,
                        color=(0, 0, 0), lineType=cv2.LINE_AA)
            cv2.putText(frame, title, (10, 30), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.8,
                        color=(200, 200, 0), lineType=cv2.LINE_AA)

        # Update the window / display the image
        cv2.imshow(window_name, frame)

        # Record the processing results, step 2
        if (outpath != 'non'):
            if (isstream):
                outvideo.write(frame)
            else:
                cv2.imwrite(outpath, frame)

        # Exit when a key is pressed
        breakflg = False
        while (True):
            key = cv2.waitKey(1)    # (adjusts the camera access speed)
            if key == 27 or key == 113:    # 'esc' or 'q'
                breakflg = True
                break
            if not my_winstat._is_visible(window_name):    # 'Close' button
                breakflg = True
                break
            if (isstream):
                break

        if ((breakflg == False) and isstream):
            # Read the next frame
            success, img_frame = cap.read()
            if success == False:    # end of the movie
                break
            loopflg = cap.isOpened()
        else:
            loopflg = False

    # Termination
    if (isstream):
        cap.release()

    # Record the processing results, step 3
    if (outpath != 'non'):
        if (isstream):
            outvideo.release()

    cv2.destroyAllWindows()
    del net
    del exec_net

    printC('FPS average: {:>10.2f}'.format(fpsWithTick.get_average()))
    logger.info('Finished.\n')


if __name__ == '__main__':
    sys.exit(main() or 0)
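The NMS loop in main() zeroes out any box whose IoU with a higher-confidence box exceeds --iou, so it is worth having a feel for the scale of the metric. The following minimal sketch restates the same dict-based formula as intersection_over_union above (restated so it runs standalone, without importing the script) and evaluates it on two hand-made boxes:

# Standalone restatement of the dict-based IoU used by the script above.
def iou(b1, b2):
    w = min(b1['xmax'], b2['xmax']) - max(b1['xmin'], b2['xmin'])
    h = min(b1['ymax'], b2['ymax']) - max(b1['ymin'], b2['ymin'])
    overlap = 0 if w < 0 or h < 0 else w * h
    union = ((b1['xmax'] - b1['xmin']) * (b1['ymax'] - b1['ymin'])
             + (b2['xmax'] - b2['xmin']) * (b2['ymax'] - b2['ymin']) - overlap)
    return overlap / union if union else 0

a = dict(xmin=0, ymin=0, xmax=10, ymax=10)    # 10x10 box
b = dict(xmin=5, ymin=5, xmax=15, ymax=15)    # same size, shifted by half
print(iou(a, b))    # overlap 5*5=25, union 100+100-25=175 -> ~0.143

The half-shifted pair scores about 0.143, so at an --iou setting of 0.25 (the default used by the tiny-YOLO v3 script below) both boxes survive as separate detections; only a threshold below about 0.14 would make NMS treat them as duplicates.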
# -*- coding: utf-8 -*-
##------------------------------------------
## Object detection          Ver 0.06
##   with OpenVINO™ toolkit
##
##   model: yolo-v3-tiny-tf
##
##               2022.10.30 Masahiro Izutsu
##------------------------------------------
## ver 0.00  2021.02.24
## ver 0.01  2021.03.25  device parameter
## ver 0.02  2021.06.23  fps display
## ver 0.03  2022.01.04  linux/windows
##      object_detect_yolo3_3.py (object_detect_yolo3_2.py ver 0.03 2022.01.04)
## ver 0.05  2022.11.26  object has no attribute 'outputs'
##      object_detect_yolo3.py (object_detect_yolo3_3.py ver 0.05 2022.11.26)
## ver 0.06  2023.04.12  support for outputting the class drawing extent

# Constant definitions
title = 'Object detection TinyYOLO V3'

from os.path import expanduser
MODEL_DEF_DETECT = expanduser('../../model/public/FP32/yolo-v3-tiny-tf.xml')
INPUT_IMAGE_DEF = '../../Videos/car_m.mp4'    # default input image parameter
LABEL_DEF = 'coco.names_jp'

# Load the modules
from openvino.inference_engine import IECore
from openvino.inference_engine import get_version

# import
import sys
import cv2
import numpy as np
import os
import argparse
import my_logging
import my_puttext
import my_file
import my_movedlg
import my_dialog
import my_process
import my_winstat
import my_fps
import my_color80
from my_print import printG, printR, printY, printC, printB, printN

# Adjust these thresholds
DETECTION_THRESHOLD = 0.60
IOU_THRESHOLD = 0.25

# Tiny yolo anchor box values
anchors = [10,14, 23,27, 37,58, 81,82, 135,169, 344,319]

# Used for display
TEXT_COLOR = my_color80.CR_white    # white text

# Parses arguments for the application
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', '--input', metavar = 'INPUT_IMAGE', type=str, default = INPUT_IMAGE_DEF,
                        help = 'Absolute path to image file or cam for camera stream. '
                               'Default value is \'' + INPUT_IMAGE_DEF + '\'')
    parser.add_argument('--ir', metavar = 'IR_File', type=str, default = MODEL_DEF_DETECT,
                        help = 'Absolute path to the neural network IR xml file. '
                               'Default value is ' + MODEL_DEF_DETECT)
    parser.add_argument('-d', '--device', default='CPU', type = str,
                        help = 'Optional. Specify a target device to infer on. CPU, GPU, FPGA, HDDL or MYRIAD is '
                               'acceptable. The demo will look for a suitable plugin for the device specified. '
                               'Default value is CPU')
    parser.add_argument('-lb', '--labels', metavar = 'LABEL_FILE', type = str, default = LABEL_DEF,
                        help = 'Absolute path to labels file. '
                               'Default value is \'' + LABEL_DEF + '\'')
    parser.add_argument('--threshold', metavar = 'FLOAT', type = float, default = DETECTION_THRESHOLD,
                        help = 'Threshold for detection. '
                               'Default value is \'' + str(DETECTION_THRESHOLD) + '\'')
    parser.add_argument('--iou', metavar = 'FLOAT', type = float, default = IOU_THRESHOLD,
                        help = 'Intersection Over Union. '
                               'Default value is \'' + str(IOU_THRESHOLD) + '\'')
    parser.add_argument('--log', metavar = 'LOG', default = '3',
                        help = 'Log level(-1/0/1/2/3/4/5) Default value is \'3\'')
    parser.add_argument('-t', '--title', metavar = 'TITLE', default = 'y',
                        help = 'Program title flag.(y/n) Default value is \'y\'')
    parser.add_argument('-s', '--speed', metavar = 'SPEED', default = 'y',
                        help = 'Speed display flag.(y/n) Default value is \'y\'')
    parser.add_argument('-o', '--out', metavar = 'IMAGE_OUT', default = 'non',
                        help = 'Processed image file path. Default value is \'non\'')
    return parser

# Creates a mask to remove duplicate objects (boxes) and their related probabilities and classifications
# that should be considered the same object. This is determined by how similar the boxes are
# based on the intersection-over-union metric.
# box_list is a list of boxes (4 floats for centerX, centerY and Length and Width)
def get_duplicate_box_mask(box_list, iou_threshold):
    # The intersection-over-union threshold to use when determining duplicates.
    # Objects/boxes found that are over this threshold will be
    # considered the same object.
    max_iou = iou_threshold
    box_mask = np.ones(len(box_list))

    for i in range(len(box_list)):
        if box_mask[i] == 0:
            continue
        for j in range(i + 1, len(box_list)):
            if get_intersection_over_union(box_list[i], box_list[j]) >= max_iou:
                if box_list[i][4] < box_list[j][4]:
                    box_list[i], box_list[j] = box_list[j], box_list[i]
                box_mask[j] = 0.0

    filter_iou_mask = np.array(box_mask > 0.0, dtype='bool')
    return filter_iou_mask

# Evaluate the intersection-over-union for two boxes.
# The intersection-over-union metric determines how close
# two boxes are to being the same box. The closer the boxes
# are to being the same, the closer the metric will be to 1.0.
# box_1 and box_2 are arrays of 4 numbers which are the (x, y)
# points that define the center of the box and the length and width of
# the box.
# Returns the intersection-over-union (between 0.0 and 1.0)
# for the two boxes specified.
def get_intersection_over_union(box_1, box_2):
    # one dimension of the intersecting box
    intersection_dim_1 = min(box_1[0]+0.5*box_1[2], box_2[0]+0.5*box_2[2]) - \
                         max(box_1[0]-0.5*box_1[2], box_2[0]-0.5*box_2[2])

    # the other dimension of the intersecting box
    intersection_dim_2 = min(box_1[1]+0.5*box_1[3], box_2[1]+0.5*box_2[3]) - \
                         max(box_1[1]-0.5*box_1[3], box_2[1]-0.5*box_2[3])

    if intersection_dim_1 < 0 or intersection_dim_2 < 0:
        # no intersection area
        intersection_area = 0
    else:
        # intersection area is the product of the intersection dimensions
        intersection_area = intersection_dim_1 * intersection_dim_2

    # Calculate the union area, which is the area of each box added together
    # minus the intersection area, since the intersection is counted twice
    # (by definition it is in each box)
    union_area = box_1[2]*box_1[3] + box_2[2]*box_2[3] - intersection_area

    # now we can return the intersection over union
    iou = intersection_area / union_area
    return iou

# Display the basic model information
def display_info(input_shape, net_outputs, input_str, ir, labels, threshold, iou_threshold, device, log,
                 titleflg, speedflg, outpath):
    output_nodes = []
    output_iter = iter(net_outputs)
    for i in range(len(net_outputs)):
        output_nodes.append(next(output_iter))

    printG(f' - Program title  : {title}')
    printG(f' - OpenCV version : {cv2.__version__}')
    printG(f' - OpenVINO engine: {get_version()}')
    printY(f' - Input image    : {input_str}')
    printY(f' - Model          : {ir}')
    printY(f' - Device         : {device}')
    printY(f' - Labels File    : {labels}')
    printY(f' - Threshold      : {threshold}')
    printY(f' - Intersection Over Union: {iou_threshold}')
    printY(f' - Log level      : {log}')
    printY(f' - Program Title  : {titleflg}')
    printY(f' - Speed flag     : {speedflg}')
    printY(f' - Processed out  : {outpath}')
    printN(f' - Input Shape    : {input_shape}')
    printN(' - Output Shapes :')
    for j in range(len(output_nodes)):
        printN(f'   - output #{j} name : {output_nodes[j]}')
        printN(f'     - output shape: {net_outputs[output_nodes[j]].shape}')

# This function parses the output results from tiny yolo v3.
# The results are transposed so the output shape is (1, 13, 13, 255) or (1, 26, 26, 255);
# the original shape is (1, 255, w, h).
# Tiny yolo does detection on two different scales, using a 13x13 grid and a 26x26 grid.
# This is how the output is parsed:
# Imagine the image being split up into a 13x13 or 26x26 grid. Each grid cell contains 3 anchor boxes.
# For each of those 3 anchor boxes, there are 85 values:
#   80 class probabilities + 4 coordinate values + 1 box confidence score = 85 values
# So each grid cell ends up with 255 values (85 values x 3 anchor boxes = 255 values).
def parseTinyYoloV3Output(output_node_results, filtered_objects, source_image_width, source_image_height,
                          scaled_w, scaled_h, detection_threshold, num_labels):
    # transpose the output node results
    output_node_results = output_node_results.transpose(0, 2, 3, 1)
    output_h = output_node_results.shape[1]
    output_w = output_node_results.shape[2]

    # 80 class scores + 4 coordinate values + 1 objectness score = 85 values
    # 85 values * 3 prior box scores per grid cell = 255 values
    # 255 values * either 26 or 13 grid cells
    num_of_classes = num_labels
    num_anchor_boxes_per_cell = 3

    # Set the anchor offset depending on the output result shape
    anchor_offset = 0
    if output_w == 13:
        anchor_offset = 2 * 3
    elif output_w == 26:
        anchor_offset = 2 * 0

    # used to calculate approximate coordinates of the bounding box
    x_ratio = float(source_image_width) / scaled_w
    y_ratio = float(source_image_height) / scaled_h

    # Filter out low scoring results
    output_size = output_w * output_h
    for result_counter in range(output_size):
        row = int(result_counter / output_w)
        col = int(result_counter % output_w)
        for anchor_boxes in range(num_anchor_boxes_per_cell):
            # Check the box confidence score of the anchor box; this is how likely the box contains an object.
            # Each anchor occupies a stride of (num_of_classes + 5) values within a cell, hence the indexing.
            box_confidence_score = output_node_results[0][row][col][anchor_boxes * (num_of_classes + 5) + 4]
            if box_confidence_score < detection_threshold:
                continue

            # Calculate the x, y, width, and height of the box
            x_center = (col + output_node_results[0][row][col][anchor_boxes * (num_of_classes + 5) + 0]) / output_w * scaled_w
            y_center = (row + output_node_results[0][row][col][anchor_boxes * (num_of_classes + 5) + 1]) / output_h * scaled_h
            width = np.exp(output_node_results[0][row][col][anchor_boxes * (num_of_classes + 5) + 2]) * anchors[anchor_offset + 2 * anchor_boxes]
            height = np.exp(output_node_results[0][row][col][anchor_boxes * (num_of_classes + 5) + 3]) * anchors[anchor_offset + 2 * anchor_boxes + 1]

            # Now we check the anchor box for the highest class probabilities.
            # If the probability exceeds the threshold, we save the box coordinates, class score and class id.
            for class_id in range(num_of_classes):
                class_probability = output_node_results[0][row][col][anchor_boxes * (num_of_classes + 5) + 5 + class_id]
                # Calculate the class's confidence score by multiplying the box confidence score
                # by the class probability
                class_confidence_score = class_probability * box_confidence_score
                if (class_confidence_score) < detection_threshold:
                    continue

                # Calculate the bounding box top left and bottom right vertexes
                xmin = max(int((x_center - width / 2) * x_ratio), 0)
                ymin = max(int((y_center - height / 2) * y_ratio), 0)
                xmax = min(int(xmin + width * x_ratio), source_image_width - 1)
                ymax = min(int(ymin + height * y_ratio), source_image_height - 1)

                filtered_objects.append((xmin, ymin, xmax, ymax, class_confidence_score, class_id))


# ** main function **
def main():
    # Japanese font selection
    fontFace = my_puttext.get_font()

    # Process the input parameters
    ARGS = parse_args().parse_args()
    input_str = ARGS.input
    log = ARGS.log
    logsel = int(log)
    labels = ARGS.labels
    titleflg = ARGS.title
    speedflg = ARGS.speed
    ir = ARGS.ir
    detection_threshold = ARGS.threshold
    iou_threshold = ARGS.iou
    device = ARGS.device
    outpath = ARGS.out

    # Instantiate the file-handling class
    my_file_treatment = my_file.FileTreatment(logsel)

    # Pre-processing depending on the input source
    if input_str == '0':
        input_str = my_dialog.select_image_movie_file(initdir = '../../Videos')
        if len(input_str) <= 0:
            printR('Program canceled.')
            return 0

    input_stream = input_str
    if input_str.lower() == "cam" or input_str.lower() == "camera":
        input_stream = 0
        isstream = True
    else:
        filetype = my_file_treatment.is_pict(input_stream)
        isstream = filetype == 'None'
        if (filetype == 'NotFound'):
            printR(f"'{input_stream}': Input file not found.")
            return 0
    if isstream and outpath != 'non' and os.path.splitext(outpath)[1] != '.mp4':
        printR(f"'{outpath}': Output file type must be '.mp4'")
        return 0

    # Application log settings
    module = os.path.basename(__file__)
    module_name = os.path.splitext(module)[0]
    logger = my_logging.get_module_logger_sel(module_name, logsel)
    logger.info('Starting..')

    # Read the classification labels
    with open(labels, encoding='utf-8') as labels_file:
        label_list = labels_file.read().splitlines()

    ## 1. Create ie core and network ##
    # Select the plugin and IRs to be used
    ie = IECore()
    net = ie.read_network(model = ir, weights = ir[:-3] + 'bin')

    # Set up the input blobs
    input_key = list(net.input_info.keys())[0]    # input data key name
    input_blob_name = net.input_info[input_key].name
    input_blob = net.input_info[input_blob_name].name
    input_shape = net.input_info[input_blob].input_data.shape

    # Display the settings
    display_info(input_shape, net.outputs, input_str, ir, labels, detection_threshold, iou_threshold,
                 device, log, titleflg, speedflg, outpath)

    # Load the network and get the network input shape information
    exec_net = ie.load_network(network = net, device_name = device)
    n, c, network_input_h, network_input_w = input_shape

    # Parameters local to this function
    window_name = title + " (hit 'q' or 'esc' key to exit)"

    # Prepare the input
    if (isstream):
        # Camera
        cap = cv2.VideoCapture(input_stream)
        success, img_frame = cap.read()
        if success == False:
            logger.error('camera read error !!')
        loopflg = cap.isOpened()
        img_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        img_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    else:
        # Read an image file
        img_frame = cv2.imread(input_stream)
        if img_frame is None:
            logger.error('unable to read the input.')
            return 0
        # Resize with a fixed aspect ratio (sized to fit the screen)
        h, w = my_movedlg.get_display_size()
        mx = h - 10 if h > w else w - 10
        img_frame = my_process.frame_resize(img_frame, mx)
        loopflg = True    # single-pass loop
        img_width = img_frame.shape[1]
        img_height = img_frame.shape[0]

    # Initial window settings
    cv2.namedWindow(window_name, flags=cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL)
    cv2.moveWindow(window_name, 20, 0)

    # Record the processing results, step 1
    if (outpath != 'non'):
        if (isstream):
            fps = int(cap.get(cv2.CAP_PROP_FPS))
            fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
            outvideo = cv2.VideoWriter(outpath, fourcc, fps, (img_width, img_height))

    # Width and height calculations. These will be used to scale the bounding boxes
    source_image_width = img_frame.shape[1]
    source_image_height = img_frame.shape[0]
    scaled_w = int(source_image_width * min(network_input_w / source_image_width,
                                            network_input_h / source_image_height))
    scaled_h = int(source_image_height * min(network_input_w / source_image_width,
                                             network_input_h / source_image_height))

    # Initialize the measurement values
    fpsWithTick = my_fps.fpsWithTick()
    frame_count = 0
    fps_total = 0
    fpsWithTick.get()    # start fps measurement

    # Main loop
    while (loopflg):
        if img_frame is None:
            logger.error('unable to read the input.')
            return 0

        frame = img_frame

        ## 2. Preprocessing ##
        # Image preprocessing (resize, transpose, reshape)
        input_image = cv2.resize(frame, (network_input_w, network_input_h), interpolation=cv2.INTER_LINEAR)
        input_image = input_image.astype(np.float32)
        input_image = np.transpose(input_image, (2, 0, 1))
        reshaped_image = input_image.reshape((n, c, network_input_h, network_input_w))

        ## 3. Perform Inference ##
        ## 4. Get results ##
        all_output_results = exec_net.infer(inputs={input_blob: reshaped_image})

        ## 5. Post processing for results ##
        # Post-processing for tiny yolo v3 consists of the following steps:
        #   1. Parse the output and filter out low scores
        #   2. Filter out duplicates using intersection over union
        #   3. Draw boxes and text

        ### 1. Tiny yolo v3 has two outputs and we check/parse both outputs
        filtered_objects = []
        for output_node_results in all_output_results.values():
            parseTinyYoloV3Output(output_node_results, filtered_objects, source_image_width, source_image_height,
                                  scaled_w, scaled_h, detection_threshold, len(label_list))

        ### 2. Filter out duplicate objects from all detected objects
        filtered_mask = get_duplicate_box_mask(filtered_objects, iou_threshold)

        ### 3. Draw rectangles and set up display texts
        for object_index in range(len(filtered_objects)):
            if filtered_mask[object_index]:
                # get all values from the filtered object list
                xmin = filtered_objects[object_index][0]
                ymin = filtered_objects[object_index][1]
                xmax = filtered_objects[object_index][2]
                ymax = filtered_objects[object_index][3]
                confidence = filtered_objects[object_index][4]
                class_id = filtered_objects[object_index][5]

                # Per-object color selection
                BOX_COLOR = my_color80.get_boder_bgr80(class_id)
                LABEL_BG_COLOR = my_color80.get_back_bgr80(class_id)

                # Set up the text for display
                lablstr = label_list[class_id] + ' ' + str(round(confidence * 100, 1)) + ' %'
                x1, y1, x2, y2 = my_puttext.cv2_putText(frame, lablstr, (xmin + 5, ymin + 18),
                                                        fontFace, 14, TEXT_COLOR, 0, True)    # get the drawing extent
                cv2.rectangle(frame, (xmin, ymin), (max(xmax, x2), ymin + 20), LABEL_BG_COLOR, -1)
                my_puttext.cv2_putText(frame, lablstr, (xmin + 5, ymin + 18), fontFace, 14, TEXT_COLOR, 0)

                # Set up the bounding box
                cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), BOX_COLOR, 1)

        # Calculate the FPS
        fps = fpsWithTick.get()
        st_fps = 'fps: {:>6.2f}'.format(fps)
        if (speedflg == 'y'):
            cv2.rectangle(frame, (10, 38), (95, 55), (90, 90, 90), -1)
            cv2.putText(frame, st_fps, (15, 50), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.4,
                        color=(255, 255, 255), lineType=cv2.LINE_AA)

        # Draw the title
        if (titleflg == 'y'):
            cv2.putText(frame, title, (10 + 2, 30 + 2), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.8,
                        color=(0, 0, 0), lineType=cv2.LINE_AA)
            cv2.putText(frame, title, (10, 30), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.8,
                        color=(200, 200, 0), lineType=cv2.LINE_AA)

        # Update the window / display the image
        cv2.imshow(window_name, frame)

        # Record the processing results, step 2
        if (outpath != 'non'):
            if (isstream):
                outvideo.write(frame)
            else:
                cv2.imwrite(outpath, frame)

        # Exit when a key is pressed
        breakflg = False
        while (True):
            key = cv2.waitKey(1)    # (adjusts the camera access speed)
            if key == 27 or key == 113:    # 'esc' or 'q'
                breakflg = True
                break
            if not my_winstat._is_visible(window_name):    # 'Close' button
                breakflg = True
                break
            if (isstream):
                break

        if ((breakflg == False) and isstream):
            # Read the next frame
            success, img_frame = cap.read()
            if success == False:    # end of the movie
                break
            loopflg = cap.isOpened()
        else:
            loopflg = False

    # Termination
    if (isstream):
        cap.release()

    # Record the processing results, step 3
    if (outpath != 'non'):
        if (isstream):
            outvideo.release()

    cv2.destroyAllWindows()
    del net
    del exec_net

    printC('FPS average: {:>10.2f}'.format(fpsWithTick.get_average()))
    logger.info('Finished.\n')


# main function entry point (start of execution)
if __name__ == "__main__":
    sys.exit(main())
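Because parse_args() gives every flag a default, object_detect_yolo3.py can be launched with no arguments: it then processes ../../Videos/car_m.mp4 on the CPU with the FP32 yolo-v3-tiny-tf IR. The invocations below are illustrative; the working directory is assumed to be work\openvino so that the relative ../../ defaults resolve against the project home, and should be adjusted if the script lives elsewhere.

(py38) PS C:\anaconda_win\work\openvino> python object_detect_yolo3.py
(py38) PS C:\anaconda_win\work\openvino> python object_detect_yolo3.py -i cam --threshold 0.5 -o result.mp4

The second line switches to the camera, lowers the detection threshold to 0.5, and records the annotated stream; note that for stream inputs the -o path must end in '.mp4', as checked at the top of main().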