Install "Anaconda" directly on Windows, without using VirtualBox, so that the applications developed so far can be run.
The environment setup procedure has been updated. Please refer to the new page → AI開発プロジェクト環境構築(Windows編)
X:/anaconda_win
├─anaconda          ← files for building the Windows environment
├─Images            ← image files used by the applications
├─model             ← trained models used by the OpenVINO applications
│  ├─intel
│  │  └─FP32
│  └─public
│     └─FP32
├─Videos            ← video files used by the applications
├─workspace         ← application projects that use OpenVINO
│  └─_apps          ← applications that support the Anaconda (Windows/Linux) environment ※
└─workspace_py37    ← application projects for the Anaconda environment
   ├─exercise       ← tests for building the GUI environment
   │  ├─cvui
   │  ├─cvui-master
   │  ├─for_pycon_shizu-master
   │  ├─PySimpleGUI-master
   │  └─PySimpleGUI-Photo-Colorizer-master
   ├─pyocr          ← OCR test application ※
   └─tryocr         ← OCR application development project ※

※ Projects that support the Anaconda (Windows/Linux) environment
Command | Parameter | Purpose |
cd (chdir) |  | Show the current directory |
 | path | Change the current directory |
ls (dir) |  | List the files and folders in the current directory |
 | path | List the files at the specified path |
tree |  | Show the current directory as a tree |
 | path | Show the specified path as a tree |
ren | current-filename new-filename | Rename a file |
move | filename destination-path | Move a file |
 | old-directory new-directory | Rename a directory |
copy | source destination | Copy a file |
del | filename | Delete a file or folder |
help | command-name | Show help for a command |
cls |  | Clear the console |
md (mkdir) | (path)directory-name | Create a directory |
rd (rmdir) | (path)directory-name | Delete a directory |
type | (path)filename | Display the contents of a text file |
more | (path)filename | Display a text file one screen at a time |
ipconfig |  | Check the network configuration |
exit |  | Exit the Command Prompt |
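As a short worked example, the following Command Prompt session creates a working folder, copies a file into it, renames it, and displays its contents (sample.txt is a hypothetical file name):

> md work
> cd work
> copy ..\sample.txt .
> ren sample.txt memo.txt
> type memo.txt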
If the project package "anaconda_win_XXXXXXXXX.zip" has already been obtained, skip this step and carry out the procedure starting here.
(base) PS C:\Users\izuts>conda create -n py37w python=3.7
  :
done
#
# To activate this environment, use
#
#     $ conda activate py37w
#
# To deactivate an active environment, use
#
#     $ conda deactivate

(base) PS C:\Users\izuts>conda activate py37w
(py37w) PS C:\Users\izuts>
(py37w) PS C:\Users\izuts>conda info -e
# conda environments:
#
base                     C:\Users\izuts\anaconda3
py37w                 *  C:\Users\izuts\anaconda3\envs\py37w

・Perform all subsequent operations from an "Anaconda Prompt" with the "py37w" virtual environment activated.
(py37w) PS C:\Users\izuts>conda install pytorch torchvision torchaudio cpuonly -c pytorch
Collecting package metadata (current_repodata.json): done
Solving environment: done
  :
(py37w) PS C:\Users\izuts>conda install -c conda-forge opencv
  :
(py37w) PS C:\Users\izuts>conda install -c conda-forge pandas
  :
(py37w) PS C:\Users\izuts>conda install -c conda-forge tqdm
  :
(py37w) PS C:\Users\izuts>conda install -c conda-forge matplotlib
  :
(py37w) PS C:\Users\izuts>conda install -c conda-forge PyYAML
  :
(py37w) PS C:\Users\izuts>conda install openvino-ie4py -c intel
  :
(py37w) PS > pip install scikit-learn
  :
(py37w) PS > pip install facenet-pytorch
  :
(py37w) PS $ conda install -c conda-forge tesseract
  :
(py37w) PS $ pip install pyocr
  :
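As a quick, informal sanity check that the main packages above landed in the py37w environment, a few version numbers can be printed from Python:

# Quick check of the packages installed above (run inside the py37w environment).
import torch, torchvision, cv2, sklearn
print('PyTorch     :', torch.__version__)
print('torchvision :', torchvision.__version__)
print('OpenCV      :', cv2.__version__)
print('scikit-learn:', sklearn.__version__)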
Unfortunately this does not work between different operating systems such as Linux and Windows, but under the same OS a virtual environment can easily be migrated with the following procedure.
(py37w) PS $ conda env export > environment_py37w.yml
$ vi environment_py37w.yml

name: py37w                          → change to py37x (the new virtual-environment name)
channels:
  - intel
  - loopbio
  :
prefix: /XXXXXX/anaconda3/envs/py37  → delete this line
(base) PS $ cd /anaconda_win/workspace_py37      ← move to the folder that contains "environment_py37w.yml"
(base) PS $ conda env create -f environment_py37w.yml
Collecting package metadata (repodata.json): done
  :
(base) PS $ conda info -e
# conda environments:
#
base                  *  C:\Users\XXXXX\anaconda3
py37w                    C:\Users\XXXXX\anaconda3\envs\py37w

(base) PS > conda activate py37w
(py37w) PS >
(py37w) PS > python -c "import torch"※「python3」ではエラーとなるので「python」コマンドを使用する。
(py37w) PS > python -c "import tkinter"
(py37w) PS > python -c "from openvino.inference_engine import IECore"
(py37w) PS > cd anaconda_win\workspace_py37\chapter01
(py37w) PS > python chapt01_1.py
(py37w) PS > pip install pyocr
jpn.traineddata                 ← data for Japanese
jpn_vert.traineddata            ← data for vertical Japanese text
/script
  Japanese.traineddata          ← script data for Japanese
  Japanese_vert.traineddata     ← script data for vertical Japanese text
(py37w) PS > echo $env:TESSDATA_PREFIX
C:\Program Files\Tesseract-OCR\tessdata
(py37w) PS > echo $env:PYTHONPATH
X:\anaconda_win\workspace\lib

・For the Command Prompt:
(py37w) > echo %PYTHONPATH%
X:\anaconda_win\workspace\lib
(py37w) > echo %TESSDATA_PREFIX%
C:\Program Files\Tesseract-OCR\tessdata
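With TESSDATA_PREFIX set as above, pyocr can be given a minimal smoke test like the following (sample.png is a hypothetical image containing Japanese text):

# Minimal pyocr / Tesseract-OCR check (Japanese recognition).
from PIL import Image
import pyocr
import pyocr.builders

tools = pyocr.get_available_tools()
if not tools:
    raise RuntimeError('Tesseract-OCR not found (check PATH and TESSDATA_PREFIX).')
tool = tools[0]

text = tool.image_to_string(Image.open('sample.png'),
                            lang='jpn',
                            builder=pyocr.builders.TextBuilder(tesseract_layout=6))
print(text)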
%windir%\System32\WindowsPowerShell\v1.0\powershell.exe -ExecutionPolicy ByPass -NoExit -Command "& 'C:\ProgramData\Anaconda3\shell\condabin\conda-hook.ps1' ; conda activate 'C:\Users\<User>\anaconda3' "

・Change it as follows (<User> = user name, X: = the drive where anaconda_win is placed):
%windir%\System32\WindowsPowerShell\v1.0\powershell.exe -ExecutionPolicy ByPass -NoExit -Command "& 'C:\ProgramData\Anaconda3\shell\condabin\conda-hook.ps1' ; conda activate 'C:\Users\<User>\anaconda3\envs\py37w' ; Set-Location 'X:\anaconda_win\workspace_py37' "
%windir%\System32\cmd.exe "/K" C:\Users\<User>\anaconda3\Scripts\activate.bat C:\Users\<User>\anaconda3

・Change it as follows (<User> = user name, X: = the drive where anaconda_win is placed):
%windir%\System32\cmd.exe "/K" C:\Users\<User>\anaconda3\Scripts\activate.bat C:\Users\<User>\anaconda3\envs\py37w & cd /d X:\anaconda_win\workspace_py37
Location of the source code
(py37w) PS > cd \anaconda_win\workspace\_apps
(py37w) PS > python .\emotion2.py

--- Emotion Recognition 2 ---
3.4.2
OpenVINO inference_engine: 2021.4.2-3974-e2a469a3450-releases/2021/4

Emotion Recognition 2: Starting application...
   - Image File   :  0
   - m_detect     :  ../../model/intel/FP32/face-detection-adas-0001.xml
   - m_recognition:  ../../model/intel/FP32/emotions-recognition-retail-0003.xml
   - Device       :  CPU
   - Language     :  jp
   - Input Shape1 :  data
   - Output Shape1:  detection_out
   - Input Shape2 :  data
   - Output Shape2:  prob_emotion
   - Program Title:  y
   - Speed flag   :  y
   - Processed out:  non

FPS average:      21.50

 Finished.
# -*- coding: utf-8 -*- ##------------------------------------------ ## OpenVINO™ toolkit ## Emotion Recognition ## ## model: face-detection-adas-0001 ## emotions-recognition-retail-0003 ## ## 2021.02.24 Masahiro Izutsu ##------------------------------------------ ## 2021.03.25 model/device parameter ## 2021.06.23 fps display ## 2021.12.24 linux/windows # Color Escape Code GREEN = '\033[1;32m' RED = '\033[1;31m' NOCOLOR = '\033[0m' YELLOW = '\033[1;33m' # 定数定義 WINDOW_WIDTH = 640 TEXT_COLOR = (255, 255, 255) # white text from os.path import expanduser MODEL_DEF_FACE = expanduser('../../model/intel/FP32/face-detection-adas-0001.xml') MODEL_DEF_EMO = expanduser('../../model/intel/FP32/emotions-recognition-retail-0003.xml') # モジュール読み込み from openvino.inference_engine import IECore from openvino.inference_engine import get_version # import処理 import sys import cv2 import numpy as np import argparse import myfunction import mylib import mylib_gui import platform # タイトル・バージョン情報 title = 'Emotion Recognition 2' print(GREEN) print('--- {} ---'.format(title)) print(cv2.__version__) print("OpenVINO inference_engine:", get_version()) print(NOCOLOR) # Parses arguments for the application def parse_args(): parser = argparse.ArgumentParser() parser.add_argument('-i', '--image', metavar = 'IMAGE_FILE', type = str, default = 'cam', help = 'Absolute path to image file or cam for camera stream.') parser.add_argument('-m_dt', '--m_detector', type=str, default = MODEL_DEF_FACE, help = 'Detector Path to an .xml file with a trained model.' 'Default value is '+MODEL_DEF_FACE) parser.add_argument('-m_re', '--m_recognition', type=str, default = MODEL_DEF_EMO, help = 'Emotion Path to an .xml file with a trained model.' 'Default value is '+MODEL_DEF_EMO) parser.add_argument('-d', '--device', default='CPU', type=str, help = 'Optional. Specify a target device to infer on. CPU, GPU, FPGA, HDDL or MYRIAD is ' 'acceptable. The demo will look for a suitable plugin for the device specified. ' 'Default value is CPU') parser.add_argument('-l', '--language', metavar = 'LANGUAGE', default = 'jp', help = 'Language.(jp/en) Default value is \'jp\'') parser.add_argument('-t', '--title', metavar = 'TITLE', default = 'y', help = 'Program title flag.(y/n) Default value is \'y\'') parser.add_argument('-s', '--speed', metavar = 'SPEED', default = 'y', help = 'Speed display flag.(y/n) Default calue is \'y\'') parser.add_argument('-o', '--out', metavar = 'IMAGE_OUT', default = 'non', help = 'Processed image file path. Default value is \'non\'') return parser # モデル基本情報の表示 def display_info(image, detector, recognition, device, lang, input_blob, out_blob, input_blob_emo, out_blob_emo, titleflg, speedflg, outpath): print(YELLOW + title + ': Starting application...' + NOCOLOR) print(' - ' + YELLOW + 'Image File : ' + NOCOLOR, image) print(' - ' + YELLOW + 'm_detect : ' + NOCOLOR, detector) print(' - ' + YELLOW + 'm_recognition: ' + NOCOLOR, recognition) print(' - ' + YELLOW + 'Device : ' + NOCOLOR, device) print(' - ' + YELLOW + 'Language : ' + NOCOLOR, lang) print(' - ' + YELLOW + 'Input Shape1 : ' + NOCOLOR, input_blob) print(' - ' + YELLOW + 'Output Shape1: ' + NOCOLOR, out_blob) print(' - ' + YELLOW + 'Input Shape2 : ' + NOCOLOR, input_blob_emo) print(' - ' + YELLOW + 'Output Shape2: ' + NOCOLOR, out_blob_emo) print(' - ' + YELLOW + 'Program Title: ' + NOCOLOR, titleflg) print(' - ' + YELLOW + 'Speed flag : ' + NOCOLOR, speedflg) print(' - ' + YELLOW + 'Processed out: ' + NOCOLOR, outpath) # 画像の種類を判別する # 戻り値: 'jeg''png'... 
画像ファイル # 'None' 画像ファイル以外 (動画ファイル) # 'NotFound' ファイルが存在しない import imghdr def is_pict(filename): try: imgtype = imghdr.what(filename) except FileNotFoundError as e: imgtype = 'NotFound' return str(imgtype) # ** main関数 ** def main(): # 日本語フォント指定 if platform.system()=='Windows': fontPIL = 'meiryo.ttc' # ゴシック体 else: fontPIL = 'NotoSansCJK-Bold.ttc' # ゴシック体 # Argument parsing and parameter setting ARGS = parse_args().parse_args() input_stream = ARGS.image lang = ARGS.language titleflg = ARGS.title speedflg = ARGS.speed if ARGS.image.lower() == "cam" or ARGS.image.lower() == "camera": input_stream = 0 isstream = True else: filetype = is_pict(input_stream) isstream = filetype == 'None' if (filetype == 'NotFound'): print(RED + "\ninput file Not found." + NOCOLOR) quit() model_detector=ARGS.m_detector model_recognition=ARGS.m_recognition device = ARGS.device outpath = ARGS.out # 感情ラベル if (lang == 'jp'): list_emotion = ['平静', '嬉しい', '悲しい', '驚き', '怒り'] else: list_emotion = ['neutral', 'happy', 'sad', 'surprise', 'anger'] # 感情色ラベル color_emotion = [(255, 255, 0), ( 0, 255, 0), ( 0, 255, 255), (255, 0, 255), ( 0, 0, 255)] bkcolor_emotion = [(120, 120, 70), ( 70, 120, 70), ( 70, 120, 120), (120, 70, 120), ( 70, 70, 120)] textcolor_emotion = [(255, 255, 255), (255, 255, 255), (255, 255, 255), (255, 255, 255), (255, 255, 255)] # モデルの読み込み (顔検出)face-detection-adas-0001 ie = IECore() net = ie.read_network(model = model_detector, weights = model_detector[:-4] + '.bin') exec_net = ie.load_network(network = net, device_name = device) # 入出力設定(顔検出) input_blob = net.input_info['data'].name out_blob = next(iter(net.outputs)) n, c, h, w = net.input_info[input_blob].input_data.shape # モデルの読み込み(感情検出)emotions-recognition-retail-0003 net_emo = ie.read_network(model = model_recognition, weights = model_recognition[:-4] + '.bin') exec_net_emo = ie.load_network(network = net_emo, device_name=device) # 入出力設定(感情) input_blob_emo = net.input_info['data'].name out_blob_emo = next(iter(net_emo.outputs)) n_emo, c_emo, h_emo, w_emo = net.input_info[input_blob_emo].input_data.shape # 情報表示 display_info(input_stream, model_detector, model_recognition, device, lang, input_blob, out_blob, input_blob_emo, out_blob_emo, titleflg, speedflg, outpath) # 入力準備 if (isstream): # カメラ cap = cv2.VideoCapture(input_stream) ret, frame = cap.read() loopflg = cap.isOpened() else: # 画像ファイル読み込み frame = cv2.imread(input_stream) if frame is None: print(RED + "\nUnable to read the input." + NOCOLOR) quit() # アスペクト比を固定してリサイズ img_h, img_w = frame.shape[:2] if (img_w > WINDOW_WIDTH): height = round(img_h * (WINDOW_WIDTH / img_w)) frame = cv2.resize(frame, dsize = (WINDOW_WIDTH, height)) loopflg = True # 1回ループ # 処理結果の記録 step1 if (outpath != 'non'): if (isstream): fps = int(cap.get(cv2.CAP_PROP_FPS)) out_w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) out_h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v') outvideo = cv2.VideoWriter(outpath, fourcc, fps, (out_w, out_h)) # 計測値初期化 fpsWithTick = mylib.fpsWithTick() frame_count = 0 fps_total = 0 fpsWithTick.get() # fps計測開始 # メインループ while (loopflg): if frame is None: print(RED + "\nUnable to read the input." 
+ NOCOLOR) quit() # 入力データフォーマットへ変換 img = cv2.resize(frame, (w, h)) # サイズ変更 img = img.transpose((2, 0, 1)) # HWC > CHW img = np.expand_dims(img, axis=0) # 次元合せ # 推論実行 out = exec_net.infer(inputs={'data': img}) # 出力から必要なデータのみ取り出し out = out['detection_out'] out = np.squeeze(out) #サイズ1の次元を全て削除 # 検出されたすべての顔領域に対して1つずつ処理 for detection in out: # conf値の取得 confidence = float(detection[2]) # バウンディングボックス座標を入力画像のスケールに変換 xmin = int(detection[3] * frame.shape[1]) ymin = int(detection[4] * frame.shape[0]) xmax = int(detection[5] * frame.shape[1]) ymax = int(detection[6] * frame.shape[0]) # conf値が0.5より大きい場合のみバウンディングボックス表示 if confidence > 0.5: # 顔検出領域はカメラ範囲内に補正する。特にminは補正しないとエラーになる if xmin < 0: xmin = 0 if ymin < 0: ymin = 0 if xmax > frame.shape[1]: xmax = frame.shape[1] if ymax > frame.shape[0]: ymax = frame.shape[0] # 顔領域のみ切り出し frame_face = frame[ ymin:ymax, xmin:xmax ] # 入力データフォーマットへ変換 img = cv2.resize(frame_face, (64, 64)) # サイズ変更 img = img.transpose((2, 0, 1)) # HWC > CHW img = np.expand_dims(img, axis=0) # 次元合せ # 推論実行 out = exec_net_emo.infer(inputs={'data': img}) # 出力から必要なデータのみ取り出し out = out['prob_emotion'] out = np.squeeze(out) # 不要な次元の削減 # 出力値が最大のインデックスを得る emoid = np.argmax(out) emotion = list_emotion[emoid] # バウンディングボックス(顔領域)表示 cv2.rectangle(frame, (xmin, ymin-20), (xmax, ymin), bkcolor_emotion[emoid], -1) # cv2.putText(frame, emotion, (xmin, ymin-4), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.6, color=cor, lineType=cv2.LINE_AA) myfunction.cv2_putText(img = frame, text = emotion, org = (xmin+2, ymin-4), fontFace = fontPIL, fontScale = 12, color = textcolor_emotion[emoid], mode = 0) cv2.rectangle(frame, (xmin, ymin-20), (xmax, ymax), color_emotion[emoid], thickness = 1) # FPSを計算する fps = fpsWithTick.get() st_fps = 'fps: {:>6.2f}'.format(fps) if (speedflg == 'y'): cv2.rectangle(frame, (10, 38), (95, 55), (90, 90, 90), -1) cv2.putText(frame, st_fps, (15, 50), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.4, color=(255, 255, 255), lineType=cv2.LINE_AA) # タイトル描画 if (titleflg == 'y'): cv2.putText(frame, title, (10, 30), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.8, color=(200, 200, 0), lineType=cv2.LINE_AA) # 画像表示 window_name = title + ' (hit key to exit)' cv2.namedWindow(window_name, cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) cv2.imshow(window_name, frame) # 処理結果の記録 step2 if (outpath != 'non'): if (isstream): outvideo.write(frame) else: cv2.imwrite(outpath, frame) # 何らかのキーが押されたら終了 breakflg = False while(True): key = cv2.waitKey(1) if key == 27 or key == 113: # 'esc' or 'q' breakflg = True break if not mylib_gui._is_visible(window_name): # 'Close' button breakflg = True break if (isstream): break if ((breakflg == False) and isstream): # 次のフレームを読み出す ret, frame = cap.read() if ret == False: break loopflg = cap.isOpened() else: loopflg = False # 終了処理 if (isstream): cap.release() # 処理結果の記録 step3 if (outpath != 'non'): if (isstream): outvideo.release() cv2.destroyAllWindows() print('\nFPS average: {:>10.2f}'.format(fpsWithTick.get_average())) print('\n Finished.') # main関数エントリーポイント(実行開始) if __name__ == "__main__": sys.exit(main())
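Besides the default camera input, the options defined in parse_args() above allow running the recognizer on a still image and saving the processed result, for example (photo.jpg and result.jpg are hypothetical file names):

(py37w) PS > python .\emotion2.py -i ..\..\Images\photo.jpg -o result.jpg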
(py37w) PS > python .\age_gender2.py

--- Age/Gender Recognition 2 ---
3.4.2
OpenVINO inference_engine: 2021.4.2-3974-e2a469a3450-releases/2021/4

Age/Gender Recognition 2: Starting application...
   - Image File   :  0
   - m_detect     :  ../../model/intel/FP32/face-detection-adas-0001.xml
   - m_recognition:  ../../model/intel/FP32/age-gender-recognition-retail-0013.xml
   - Device       :  CPU
   - Language     :  jp
   - Input Shape1 :  data
   - Output Shape1:  detection_out
   - Input Shape2 :  data
   - Output Shape2:  age_conv3
   - Program Title:  y
   - Speed flag   :  y
   - Processed out:  non

FPS average:      24.60

 Finished.
# -*- coding: utf-8 -*- ##------------------------------------------ ## OpenVINO™ toolkit ## Age/Gender Recognition ## ## model: face-detection-adas-0001 ## age-gender-recognition-retail-0013 ## ## 2021.02.24 Masahiro Izutsu ##------------------------------------------ ## 2021.03.25 model/device parameter ## 2021.06.23 fps display ## 2021.12.24 linux/windows # Color Escape Code GREEN = '\033[1;32m' RED = '\033[1;31m' NOCOLOR = '\033[0m' YELLOW = '\033[1;33m' # 定数定義 WINDOW_WIDTH = 640 BOX_COLOR_M = ( 0,255, 0) BOX_COLOR_F = ( 0, 0, 255) LABEL_BG_COLOR_M = ( 70, 120, 70) # greyish green background for text LABEL_BG_COLOR_F = ( 70, 70, 120) # greyish red background for text TEXT_COLOR = (255, 255, 255) # white text from os.path import expanduser MODEL_DEF_FACE = expanduser('../../model/intel/FP32/face-detection-adas-0001.xml') MODEL_DEF_AGE = expanduser('../../model/intel/FP32/age-gender-recognition-retail-0013.xml') # モジュール読み込み from openvino.inference_engine import IECore from openvino.inference_engine import get_version # import処理 import sys import cv2 import numpy as np import argparse import myfunction import mylib import mylib_gui import platform # タイトル・バージョン情報 title = 'Age/Gender Recognition 2' print(GREEN) print('--- {} ---'.format(title)) print(cv2.__version__) print("OpenVINO inference_engine:", get_version()) print(NOCOLOR) # Parses arguments for the application def parse_args(): parser = argparse.ArgumentParser() parser.add_argument('-i', '--image', metavar = 'IMAGE_FILE', type=str, default = 'cam', help = 'Absolute path to image file or cam for camera stream.') parser.add_argument('-m_dt', '--m_detector', type=str, default = MODEL_DEF_FACE, help = 'Detector Path to an .xml file with a trained model.' 'Default value is '+MODEL_DEF_FACE) parser.add_argument('-m_re', '--m_recognition', type=str, default = MODEL_DEF_AGE, help = 'Recognition Path to an .xml file with a trained model.' 'Default value is '+MODEL_DEF_AGE) parser.add_argument('-d', '--device', default = 'CPU', type=str, help = 'Optional. Specify a target device to infer on. CPU, GPU, FPGA, HDDL or MYRIAD is ' 'acceptable. The demo will look for a suitable plugin for the device specified. ' 'Default value is CPU') parser.add_argument('-l', '--language', metavar = 'LANGUAGE', default = 'jp', help = 'Language.(jp/en) Default value is \'jp\'') parser.add_argument('-t', '--title', metavar = 'TITLE', default = 'y', help = 'Program title flag.(y/n) Default value is \'y\'') parser.add_argument('-s', '--speed', metavar = 'SPEED', default = 'y', help = 'Speed display flag.(y/n) Default calue is \'y\'') parser.add_argument('-o', '--out', metavar = 'IMAGE_OUT', default = 'non', help = 'Processed image file path. Default value is \'non\'') return parser # モデル基本情報の表示 def display_info(image, detector, recognition, device, lang, input_blob, out_blob, input_blob_age, out_blob_age, titleflg, speedflg, outpath): print(YELLOW + title + ': Starting application...' 
+ NOCOLOR) print(' - ' + YELLOW + 'Image File : ' + NOCOLOR, image) print(' - ' + YELLOW + 'm_detect : ' + NOCOLOR, detector) print(' - ' + YELLOW + 'm_recognition: ' + NOCOLOR, recognition) print(' - ' + YELLOW + 'Device : ' + NOCOLOR, device) print(' - ' + YELLOW + 'Language : ' + NOCOLOR, lang) print(' - ' + YELLOW + 'Input Shape1 : ' + NOCOLOR, input_blob) print(' - ' + YELLOW + 'Output Shape1: ' + NOCOLOR, out_blob) print(' - ' + YELLOW + 'Input Shape2 : ' + NOCOLOR, input_blob_age) print(' - ' + YELLOW + 'Output Shape2: ' + NOCOLOR, out_blob_age) print(' - ' + YELLOW + 'Program Title: ' + NOCOLOR, titleflg) print(' - ' + YELLOW + 'Speed flag : ' + NOCOLOR, speedflg) print(' - ' + YELLOW + 'Processed out: ' + NOCOLOR, outpath) # 画像の種類を判別する # 戻り値: 'jeg''png'... 画像ファイル # 'None' 画像ファイル以外 (動画ファイル) # 'NotFound' ファイルが存在しない import imghdr def is_pict(filename): try: imgtype = imghdr.what(filename) except FileNotFoundError as e: imgtype = 'NotFound' return str(imgtype) # ** main関数 ** def main(): # 日本語フォント指定 if platform.system()=='Windows': fontPIL = 'meiryo.ttc' # ゴシック体 else: fontPIL = 'NotoSansCJK-Bold.ttc' # ゴシック体 # Argument parsing and parameter setting ARGS = parse_args().parse_args() input_stream = ARGS.image lang = ARGS.language titleflg = ARGS.title speedflg = ARGS.speed if ARGS.image.lower() == "cam" or ARGS.image.lower() == "camera": input_stream = 0 isstream = True else: filetype = is_pict(input_stream) isstream = filetype == 'None' if (filetype == 'NotFound'): print(RED + "\ninput file Not found." + NOCOLOR) quit() model_detector=ARGS.m_detector model_recognition=ARGS.m_recognition device = ARGS.device outpath = ARGS.out # 性別ラベル if (lang == 'jp'): label = ('女性: ', '男性: ') else: label = ('Female: ', 'Male: ') # モデルの読み込み (顔検出)face-detection-adas-0001 ie = IECore() net = ie.read_network(model = model_detector, weights = model_detector[:-4] + '.bin') exec_net = ie.load_network(network = net, device_name = device) # 入出力設定(顔検出) input_blob = net.input_info['data'].name out_blob = next(iter(net.outputs)) n, c, h, w = net.input_info[input_blob].input_data.shape # モデルの読み込み(年齢/性別)age-gender-recognition-retail-0013 net_age = ie.read_network(model = model_recognition, weights = model_recognition[:-4] + '.bin') exec_net_age = ie.load_network(network = net_age, device_name=device) # 入出力設定(年齢/性別) input_blob_age = net.input_info['data'].name out_blob_age = next(iter(net_age.outputs)) n_age, c_age, h_age, w_age = net.input_info[input_blob_age].input_data.shape # 情報表示 display_info(input_stream, model_detector, model_recognition, device, lang, input_blob, out_blob, input_blob_age, out_blob_age, titleflg, speedflg, outpath) # 入力準備 if (isstream): # カメラ cap = cv2.VideoCapture(input_stream) ret, frame = cap.read() loopflg = cap.isOpened() else: # 画像ファイル読み込み frame = cv2.imread(input_stream) if frame is None: print(RED + "\nUnable to read the input." 
+ NOCOLOR) quit() # アスペクト比を固定してリサイズ img_h, img_w = frame.shape[:2] if (img_w > WINDOW_WIDTH): height = round(img_h * (WINDOW_WIDTH / img_w)) frame = cv2.resize(frame, dsize = (WINDOW_WIDTH, height)) loopflg = True # 1回ループ # 処理結果の記録 step1 if (outpath != 'non'): if (isstream): fps = int(cap.get(cv2.CAP_PROP_FPS)) out_w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) out_h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v') outvideo = cv2.VideoWriter(outpath, fourcc, fps, (out_w, out_h)) # 計測値初期化 fpsWithTick = mylib.fpsWithTick() frame_count = 0 fps_total = 0 fpsWithTick.get() # fps計測開始 # メインループ while (loopflg): if frame is None: print(RED + "\nUnable to read the input." + NOCOLOR) quit() # 入力データフォーマットへ変換 img = cv2.resize(frame, (w, h)) # サイズ変更 img = img.transpose((2, 0, 1)) # HWC > CHW img = np.expand_dims(img, axis=0) # 次元合せ # 推論実行 out = exec_net.infer(inputs={'data': img}) # 出力から必要なデータのみ取り出し out = out['detection_out'] out = np.squeeze(out) # サイズ1の次元を全て削除 # 検出されたすべての顔領域に対して1つずつ処理 for detection in out: # conf値の取得 confidence = float(detection[2]) # バウンディングボックス座標を入力画像のスケールに変換 xmin = int(detection[3] * frame.shape[1]) ymin = int(detection[4] * frame.shape[0]) xmax = int(detection[5] * frame.shape[1]) ymax = int(detection[6] * frame.shape[0]) # conf値が0.5より大きい場合のみバウンディングボックス表示 if confidence > 0.5: # 顔検出領域はカメラ範囲内に補正する。特にminは補正しないとエラーになる if xmin < 0: xmin = 0 if ymin < 0: ymin = 0 if xmax > frame.shape[1]: xmax = frame.shape[1] if ymax > frame.shape[0]: ymax = frame.shape[0] # 顔領域のみ切り出し frame_face = frame[ ymin:ymax, xmin:xmax ] # 入力データフォーマットへ変換 img = cv2.resize(frame_face, (62, 62)) # サイズ変更 img = img.transpose((2, 0, 1)) # HWC > CHW img = np.expand_dims(img, axis=0) # 次元合せ # 推論実行 out = exec_net_age.infer(inputs={'data': img}) # 出力から必要なデータのみ取り出し age = out['age_conv3'] prob = out['prob'] age = age[0][0][0][0] * 100 gender = label[np.argmax(prob[0])] if gender == label[0]: box_color = BOX_COLOR_F label_bgcolor = LABEL_BG_COLOR_F else: box_color = BOX_COLOR_M label_bgcolor = LABEL_BG_COLOR_M out_str = gender+':'+'{:>5.1f}'.format(age) label_text_color = TEXT_COLOR # バウンディングボックス(顔領域)表示 cv2.rectangle(frame, (xmin, ymin-20), (xmax, ymin), label_bgcolor, -1) # cv2.putText(frame, out_str, (xmin, ymin-4), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.6, color=cor, lineType=cv2.LINE_AA) myfunction.cv2_putText(img = frame, text = out_str, org = (xmin+2, ymin-4), fontFace = fontPIL, fontScale = 12, color = label_text_color, mode = 0) cv2.rectangle(frame, (xmin, ymin-20), (xmax, ymax), box_color, thickness = 1) # FPSを計算する fps = fpsWithTick.get() st_fps = 'fps: {:>6.2f}'.format(fps) if (speedflg == 'y'): cv2.rectangle(frame, (10, 38), (95, 55), (90, 90, 90), -1) cv2.putText(frame, st_fps, (15, 50), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.4, color=(255, 255, 255), lineType=cv2.LINE_AA) # タイトル描画 if (titleflg == 'y'): cv2.putText(frame, title, (10, 30), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.8, color=(200, 200, 0), lineType=cv2.LINE_AA) # 画像表示 window_name = title + ' (hit key to exit)' cv2.namedWindow(window_name, cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) cv2.imshow(window_name, frame) # 処理結果の記録 step2 if (outpath != 'non'): if (isstream): outvideo.write(frame) else: cv2.imwrite(outpath, frame) # 何らかのキーが押されたら終了 breakflg = False while(True): key = cv2.waitKey(1) if key == 27 or key == 113: # 'esc' or 'q' breakflg = True break if not mylib_gui._is_visible(window_name): # 'Close' button breakflg = True break if (isstream): break if ((breakflg == False) and isstream): # 次のフレームを読み出す ret, frame = 
cap.read() if ret == False: break loopflg = cap.isOpened() else: loopflg = False # 終了処理 if (isstream): cap.release() # 処理結果の記録 step3 if (outpath != 'non'): if (isstream): outvideo.release() cv2.destroyAllWindows() print('\nFPS average: {:>10.2f}'.format(fpsWithTick.get_average())) print('\n Finished.') # main関数エントリーポイント(実行開始) if __name__ == "__main__": sys.exit(main())
(py37w) PS > python .\object_detect_yolo3_2.py

--- TinyYOLO V3 Object detection ---
3.4.2
OpenVINO inference_engine: 2021.4.2-3974-e2a469a3450-releases/2021/4

Running OpenVINO NCS Tensorflow TinyYolo v3 example...

 Displaying image with objects detected in GUI...
 Click in the GUI window and hit any key to exit.
.\object_detect_yolo3_2.py:295: DeprecationWarning: 'inputs' property of IENetwork class is deprecated. To access DataPtrs user need to use 'input_data' property of InputInfoPtr objects which can be accessed by 'input_info' property.
  input_blob = next(iter(net.inputs))
Tiny Yolo v3: Starting application...
   - IR File      :  ../../model/public/FP32/yolo-v3-tiny-tf.xml
   - Input Shape  :  [1, 3, 416, 416]
   - Output Shapes:
     - output #0 name: conv2d_12/Conv2D/YoloRegion
       - output shape: [1, 255, 26, 26]
     - output #1 name: conv2d_9/Conv2D/YoloRegion
       - output shape: [1, 255, 13, 13]
   - Labels File  :  coco.names_jp
   - Image File   :  0
   - Threshold    :  0.6
   - Intersection Over Union:  0.25
   - Device       :  CPU
   - Program Title:  y
   - Speed flag   :  y
   - Processed out:  non
.\object_detect_yolo3_2.py:373: DeprecationWarning: 'outputs' property of InferRequest is deprecated. Please instead use 'output_blobs' property.
  all_output_results = req_handle.outputs

FPS average:      15.80

 Finished.
# -*- coding: utf-8 -*- ##------------------------------------------ ## OpenVINO™ toolkit ## Object detection ## ## model: yolo-v3-tiny-tf ## ## 2021.02.24 Masahiro Izutsu ##------------------------------------------ ## 2021.03.25 device parameter ## 2021.06.23 fps display ## 2022.01.04 linux/windows # Color Escape Code GREEN = '\033[1;32m' RED = '\033[1;31m' NOCOLOR = '\033[0m' YELLOW = '\033[1;33m' # 定数定義 WINDOW_WIDTH = 640 from os.path import expanduser MODEL_DEF_DETECT = expanduser('../../model/public/FP32/yolo-v3-tiny-tf.xml') # モジュール読み込み from openvino.inference_engine import IECore from openvino.inference_engine import get_version # import処理 import sys import numpy as np import cv2 import argparse import myfunction import mylib import mylib_gui import platform # タイトル・バージョン情報 title = 'TinyYOLO V3 Object detection' print(GREEN) print('--- {} ---'.format(title)) print(cv2.__version__) print("OpenVINO inference_engine:", get_version()) print(NOCOLOR) # Adjust these thresholds DETECTION_THRESHOLD = 0.60 IOU_THRESHOLD = 0.25 # Tiny yolo anchor box values anchors = [10,14, 23,27, 37,58, 81,82, 135,169, 344,319] # Used for display BOX_COLOR = (0,255,0) LABEL_BG_COLOR = (70, 120, 70) # greyish green background for text TEXT_COLOR = (255, 255, 255) # white text TEXT_FONT = cv2.FONT_HERSHEY_SIMPLEX WINDOW_SIZE_W = 640 WINDOW_SIZE_H = 480 # Parses arguments for the application def parse_args(): parser = argparse.ArgumentParser() parser.add_argument('--ir', metavar = 'IR_File', type=str, default = MODEL_DEF_DETECT, help = 'Absolute path to the neural network IR xml file.' 'Default value is '+MODEL_DEF_DETECT) parser.add_argument('-lb', '--labels', metavar = 'LABEL_FILE', type = str, default = 'coco.names_jp', help = 'Absolute path to labels file.' 'Default value is coco.names_jp') parser.add_argument('-i', '--input', metavar = 'IMAGE_FILE or cam', type = str, default = 'cam', help = 'Absolute path to image file or cam for camera stream.') parser.add_argument('-d', '--device', default='CPU', type = str, help = 'Optional. Specify a target device to infer on. CPU, GPU, FPGA, HDDL or MYRIAD is ' 'acceptable. The demo will look for a suitable plugin for the device specified. ' 'Default value is CPU') parser.add_argument('--threshold', metavar = 'FLOAT', type = float, default = DETECTION_THRESHOLD, help = 'Threshold for detection.') parser.add_argument('--iou', metavar = 'FLOAT', type = float, default = IOU_THRESHOLD, help = 'Intersection Over Union.') parser.add_argument('-t', '--title', metavar = 'TITLE', default = 'y', help = 'Program title flag.(y/n) Default value is \'y\'') parser.add_argument('-s', '--speed', metavar = 'SPEED', default = 'y', help = 'Speed display flag.(y/n) Default calue is \'y\'') parser.add_argument('-o', '--out', metavar = 'IMAGE_OUT', default = 'non', help = 'Processed image file path. Default value is \'non\'') return parser # creates a mask to remove duplicate objects (boxes) and their related probabilities and classifications # that should be considered the same object. This is determined by how similar the boxes are # based on the intersection-over-union metric. # box_list is as list of boxes (4 floats for centerX, centerY and Length and Width) def get_duplicate_box_mask(box_list, iou_threshold): # The intersection-over-union threshold to use when determining duplicates. 
# objects/boxes found that are over this threshold will be # considered the same object max_iou = iou_threshold box_mask = np.ones(len(box_list)) for i in range(len(box_list)): if box_mask[i] == 0: continue for j in range(i + 1, len(box_list)): if get_intersection_over_union(box_list[i], box_list[j]) >= max_iou: if box_list[i][4] < box_list[j][4]: box_list[i], box_list[j] = box_list[j], box_list[i] box_mask[j] = 0.0 filter_iou_mask = np.array(box_mask > 0.0, dtype='bool') return filter_iou_mask # Evaluate the intersection-over-union for two boxes # The intersection-over-union metric determines how close # two boxes are to being the same box. The closer the boxes # are to being the same, the closer the metric will be to 1.0 # box_1 and box_2 are arrays of 4 numbers which are the (x, y) # points that define the center of the box and the length and width of # the box. # Returns the intersection-over-union (between 0.0 and 1.0) # for the two boxes specified. def get_intersection_over_union(box_1, box_2): # one diminsion of the intersecting box intersection_dim_1 = min(box_1[0]+0.5*box_1[2],box_2[0]+0.5*box_2[2])-\ max(box_1[0]-0.5*box_1[2],box_2[0]-0.5*box_2[2]) # the other dimension of the intersecting box intersection_dim_2 = min(box_1[1]+0.5*box_1[3],box_2[1]+0.5*box_2[3])-\ max(box_1[1]-0.5*box_1[3],box_2[1]-0.5*box_2[3]) if intersection_dim_1 < 0 or intersection_dim_2 < 0 : # no intersection area intersection_area = 0 else : # intersection area is product of intersection dimensions intersection_area = intersection_dim_1*intersection_dim_2 # calculate the union area which is the area of each box added # and then we need to subtract out the intersection area since # it is counted twice (by definition it is in each box) union_area = box_1[2]*box_1[3] + box_2[2]*box_2[3] - intersection_area; # now we can return the intersection over union iou = intersection_area / union_area #print("iou: ", iou) return iou # モデル基本情報の表示 def display_info(input_shape, net_outputs, image, ir, labels, threshold, iou_threshold, device, titleflg, speedflg, outpath): output_nodes = [] output_iter = iter(net_outputs) for i in range(len(net_outputs)): output_nodes.append(next(output_iter)) print(YELLOW + 'Tiny Yolo v3: Starting application...' + NOCOLOR) print(' - ' + YELLOW + 'IR File : ' + NOCOLOR, ir) print(' - ' + YELLOW + 'Input Shape : ' + NOCOLOR, input_shape) print(' - ' + YELLOW + 'Output Shapes: ' + NOCOLOR) for j in range(len(output_nodes)): print(' - '+YELLOW+'output #' + str(j) + ' name: ' + NOCOLOR + output_nodes[j]) print(' - output shape: ' + NOCOLOR + str(net_outputs[output_nodes[j]].shape)) print(' - ' + YELLOW + 'Labels File : ' + NOCOLOR, labels) print(' - ' + YELLOW + 'Image File : ' + NOCOLOR, image) print(' - ' + YELLOW + 'Threshold : ' + NOCOLOR, threshold) print(' - ' + YELLOW + 'Intersection Over Union: ' + NOCOLOR, iou_threshold) print(' - ' + YELLOW + 'Device : ' + NOCOLOR, device) print(' - ' + YELLOW + 'Program Title: ' + NOCOLOR, titleflg) print(' - ' + YELLOW + 'Speed flag : ' + NOCOLOR, speedflg) print(' - ' + YELLOW + 'Processed out: ' + NOCOLOR, outpath) # This function parses the output results from tiny yolo v3. # The results are transposed so the output shape is (1, 13, 13, 255) or (1, 26, 26, 255). Original will be (1, 255, w, h). # Tiny yolo does detection on two different scales using 13x13 grid and 26x26 grid. # This is how the output is parsed: # Imagine the image being split up into 13x13 or 26x26 grid. Each grid cell contains 3 anchor boxes. 
# For each of those 3 anchor boxes, there are 85 values. # 80 class probabilities + 4 coordinate values + 1 box confidence score = 85 values # So that results in each grid cell having 255 values (85 values x 3 anchor boxes = 255 values) def parseTinyYoloV3Output(output_node_results, filtered_objects, source_image_width, source_image_height, scaled_w, scaled_h, detection_threshold, num_labels): # transpose the output node results output_node_results = output_node_results.transpose(0,2,3,1) output_h = output_node_results.shape[1] output_w = output_node_results.shape[2] # 80 class scores + 4 coordinate values + 1 objectness score = 85 values # 85 values * 3 prior box scores per grid cell= 255 values # 255 values * either 26 or 13 grid cells num_of_classes = num_labels num_anchor_boxes_per_cell = 3 # Set the anchor offset depending on the output result shape anchor_offset = 0 if output_w == 13: anchor_offset = 2 * 3 elif output_w == 26: anchor_offset = 2 * 0 # used to calculate approximate coordinates of bounding box x_ratio = float(source_image_width) / scaled_w y_ratio = float(source_image_height) / scaled_h # Filter out low scoring results output_size = output_w * output_h for result_counter in range(output_size): row = int(result_counter / output_w) col = int(result_counter % output_h) for anchor_boxes in range(num_anchor_boxes_per_cell): # check the box confidence score of the anchor box. This is how likely the box contains an object box_confidence_score = output_node_results[0][row][col][anchor_boxes * num_of_classes + 5 + 4] if box_confidence_score < detection_threshold: continue # Calculate the x, y, width, and height of the box x_center = (col + output_node_results[0][row][col][anchor_boxes * num_of_classes + 5 + 0]) / output_w * scaled_w y_center = (row + output_node_results[0][row][col][anchor_boxes * num_of_classes + 5 + 1]) / output_h * scaled_h width = np.exp(output_node_results[0][row][col][anchor_boxes * num_of_classes + 5 + 2]) * anchors[anchor_offset + 2 * anchor_boxes] height = np.exp(output_node_results[0][row][col][anchor_boxes * num_of_classes + 5 + 3]) * anchors[anchor_offset + 2 * anchor_boxes + 1] # Now we check for anchor box for the highest class probabilities. # If the probability exceeds the threshold, we save the box coordinates, class score and class id for class_id in range(num_of_classes): class_probability = output_node_results[0][row][col][anchor_boxes * num_of_classes + 5 + 5 + class_id] # Calculate the class's confidence score by multiplying the box_confidence score by the class probabiity class_confidence_score = class_probability * box_confidence_score if (class_confidence_score) < detection_threshold: continue # Calculate the bounding box top left and bottom right vertexes xmin = max(int((x_center - width / 2) * x_ratio), 0) ymin = max(int((y_center - height / 2) * y_ratio), 0) xmax = min(int(xmin + width * x_ratio), source_image_width-1) ymax = min(int(ymin + height * y_ratio), source_image_height-1) filtered_objects.append((xmin, ymin, xmax, ymax, class_confidence_score, class_id)) # 画像の種類を判別する # 戻り値: 'jeg''png'... 
画像ファイル # 'None' 画像ファイル以外 (動画ファイル) # 'NotFound' ファイルが存在しない import imghdr def is_pict(filename): try: imgtype = imghdr.what(filename) except FileNotFoundError as e: imgtype = 'NotFound' return str(imgtype) # ** main関数 ** def main(): # 日本語フォント指定 if platform.system()=='Windows': fontPIL = 'meiryo.ttc' # ゴシック体 else: fontPIL = 'NotoSansCJK-Bold.ttc' # ゴシック体 # Argument parsing and parameter setting ARGS = parse_args().parse_args() input_stream = ARGS.input labels = ARGS.labels titleflg = ARGS.title speedflg = ARGS.speed if ARGS.input.lower() == "cam" or ARGS.input.lower() == "camera": input_stream = 0 isstream = True else: filetype = is_pict(input_stream) isstream = filetype == 'None' if (filetype == 'NotFound'): print(RED + "\ninput file Not found." + NOCOLOR) quit() ir = ARGS.ir detection_threshold = ARGS.threshold iou_threshold = ARGS.iou device = ARGS.device outpath = ARGS.out # Prepare Categories with open(labels, encoding='utf-8') as labels_file: label_list = labels_file.read().splitlines() print(YELLOW + 'Running OpenVINO NCS Tensorflow TinyYolo v3 example...' + NOCOLOR) print('\n Displaying image with objects detected in GUI...') print(' Click in the GUI window and hit any key to exit.') ####################### 1. Create ie core and network ####################### # Select the myriad plugin and IRs to be used ie = IECore() net = ie.read_network(model = ir, weights = ir[:-3] + 'bin') # Set up the input blobs input_blob = next(iter(net.inputs)) input_shape = net.inputs[input_blob].shape # Display model information display_info(input_shape, net.outputs, input_stream, ir, labels, detection_threshold, iou_threshold, device, titleflg, speedflg, outpath) # Load the network and get the network input shape information exec_net = ie.load_network(network = net, device_name = device) n, c, network_input_h, network_input_w = input_shape # 入力準備 if (isstream): # カメラ cap = cv2.VideoCapture(input_stream) cap.set(cv2.CAP_PROP_FPS, 30) cap.set(cv2.CAP_PROP_FRAME_WIDTH, WINDOW_SIZE_W) cap.set(cv2.CAP_PROP_FRAME_HEIGHT, WINDOW_SIZE_H) ret, frame = cap.read() loopflg = cap.isOpened() else: # 画像ファイル読み込み frame = cv2.imread(input_stream) if frame is None: print(RED + "\nUnable to read the input." + NOCOLOR) quit() # アスペクト比を固定してリサイズ img_h, img_w = frame.shape[:2] if (img_w > WINDOW_WIDTH): height = round(img_h * (WINDOW_WIDTH / img_w)) frame = cv2.resize(frame, dsize = (WINDOW_WIDTH, height)) loopflg = True # 1回ループ # 処理結果の記録 step1 if (outpath != 'non'): if (isstream): fps = int(cap.get(cv2.CAP_PROP_FPS)) out_w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) out_h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v') outvideo = cv2.VideoWriter(outpath, fourcc, fps, (out_w, out_h)) # Width and height calculations. These will be used to scale the bounding boxes source_image_width = frame.shape[1] source_image_height = frame.shape[0] scaled_w = int(source_image_width * min(network_input_w/source_image_width, network_input_w/source_image_height)) scaled_h = int(source_image_height * min(network_input_h/source_image_width, network_input_h/source_image_height)) # 計測値初期化 fpsWithTick = mylib.fpsWithTick() frame_count = 0 fps_total = 0 fpsWithTick.get() # fps計測開始 # メインループ while (loopflg): # Make a copy of the original frame. Get the frame's width and height. if frame is None: print(RED + "\nUnable to read the input." + NOCOLOR) quit() ####################### 2. 
Preprocessing ####################### # Image preprocessing # frame = cv2.flip(frame, 1) display_image = frame # Image preprocessing (resize, transpose, reshape) input_image = cv2.resize(frame, (network_input_w, network_input_h), cv2.INTER_LINEAR) input_image = input_image.astype(np.float32) input_image = np.transpose(input_image, (2,0,1)) reshaped_image = input_image.reshape((n, c, network_input_h, network_input_w)) ####################### 3. Perform Inference ####################### # Perform the inference asynchronously req_handle = exec_net.start_async(request_id=0, inputs={input_blob: reshaped_image}) status = req_handle.wait() ####################### 4. Get results ####################### all_output_results = req_handle.outputs ####################### 5. Post processing for results ####################### # Post-processing for tiny yolo v3 # The post process consists of the following steps: # 1. Parse the output and filter out low scores # 2. Filter out duplicates using intersection over union # 3. Draw boxes and text ## 1. Tiny yolo v3 has two outputs and we check/parse both outputs filtered_objects = [] for output_node_results in all_output_results.values(): parseTinyYoloV3Output(output_node_results, filtered_objects, source_image_width, source_image_height, scaled_w, scaled_h, detection_threshold, len(label_list)) ## 2. Filter out duplicate objects from all detected objects filtered_mask = get_duplicate_box_mask(filtered_objects, iou_threshold) ## 3. Draw rectangles and set up display texts for object_index in range(len(filtered_objects)): if filtered_mask[object_index] == True: # get all values from the filtered object list xmin = filtered_objects[object_index][0] ymin = filtered_objects[object_index][1] xmax = filtered_objects[object_index][2] ymax = filtered_objects[object_index][3] confidence = filtered_objects[object_index][4] class_id = filtered_objects[object_index][5] # Set up the text for display cv2.rectangle(display_image,(xmin, ymin), (xmax, ymin+20), LABEL_BG_COLOR, -1) # cv2.putText(display_image, label_list[class_id] + ': %.2f' % confidence, (xmin+5, ymin+15), TEXT_FONT, 0.5, TEXT_COLOR, 1) myfunction.cv2_putText(img = display_image, text = label_list[class_id] + ': %.2f' % confidence, org = (xmin+5, ymin+18), fontFace = fontPIL, fontScale = 14, color = TEXT_COLOR, mode = 0) # Set up the bounding box cv2.rectangle(display_image, (xmin, ymin), (xmax, ymax), BOX_COLOR, 1) # FPSを計算する fps = fpsWithTick.get() st_fps = 'fps: {:>6.2f}'.format(fps) if (speedflg == 'y'): cv2.rectangle(frame, (10, 38), (95, 55), (90, 90, 90), -1) cv2.putText(frame, st_fps, (15, 50), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.4, color=(255, 255, 255), lineType=cv2.LINE_AA) # タイトル描画 if (titleflg == 'y'): cv2.putText(display_image, title, (10, 30), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.8, color=(200, 200, 0), lineType=cv2.LINE_AA) # 画像表示 window_name = title + ' (hit key to exit)' cv2.namedWindow(window_name, cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) cv2.imshow(window_name, display_image) # 処理結果の記録 step2 if (outpath != 'non'): if (isstream): outvideo.write(display_image) else: cv2.imwrite(outpath, display_image) # 何らかのキーが押されたら終了 breakflg = False while(True): key = cv2.waitKey(1) if key == 27 or key == 113: # 'esc' or 'q' breakflg = True break if not mylib_gui._is_visible(window_name): # 'Close' button breakflg = True break if (isstream): break if ((breakflg == False) and isstream): # 次のフレームを読み出す ret, frame = cap.read() if ret == False: break loopflg = cap.isOpened() else: loopflg = False # 終了処理 if 
(isstream): cap.release() # 処理結果の記録 step3 if (outpath != 'non'): if (isstream): outvideo.release() cv2.destroyAllWindows() del net del exec_net print('\nFPS average: {:>10.2f}'.format(fpsWithTick.get_average())) print('\n Finished.') # main関数エントリーポイント(実行開始) if __name__ == "__main__": sys.exit(main())
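The input source and thresholds of the YOLO v3 demo can likewise be changed on the command line, for example (video001.mp4 is a hypothetical file in the Videos folder):

(py37w) PS > python .\object_detect_yolo3_2.py -i ..\..\Videos\video001.mp4 --threshold 0.5 --iou 0.3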
(py37w) > python .\person-tracking2.py

--- Person Tracking 2 ---
3.4.2
OpenVINO inference_engine: 2021.4.2-3974-e2a469a3450-releases/2021/4

Person Tracking 2: Starting application...
   - Image File   :  ../../Videos/video003.mp4
   - m_detect     :  ../../model/intel/FP32/person-detection-retail-0013.xml
   - m_redient.   :  ../../model/intel/FP32/person-reidentification-retail-0287.xml
   - Device       :  CPU
   - Threshold    :  0.8
   - Speed flag   :  y
   - Processed out:  non
-------------------
1.0 (239,60)-(272,147)
1.0 (519,148)-(561,231)
1.0 (44,107)-(80,192)
1.0 (333,136)-(364,213)
1.0 (200,62)-(229,148)
1.0 (298,188)-(348,316)
1.0 (337,71)-(365,146)
  :

FPS average:       6.10

 Finished.
# -*- coding: utf-8 -*- ##------------------------------------------ ## OpenVINO™ toolkit ## Person Re-Identificationで人物を追跡 ## ## model: person-detection-retail-0013 ## person-reidentification-retail-0287 ## ## 2021.03.10 Masahiro Izutsu ##------------------------------------------ ## 2021.03.25 model/device parameter ## 2021.06.23 fps display ## 2021.12.24 linux/windows import sys import argparse import numpy as np import time import random import cv2 from openvino.inference_engine import get_version from openvino.inference_engine import IECore from model import Model import mylib import mylib_gui # Color Escape Code GREEN = '\033[1;32m' RED = '\033[1;31m' NOCOLOR = '\033[0m' YELLOW = '\033[1;33m' # 定数定義 MOVIE = "../../Videos/video003.mp4" THRESHOLD= 0.8 TRACKING_MAX=50 SCALE = 1.0 from os.path import expanduser MODEL_DEF_DETECT = expanduser('../../model/intel/FP32/person-detection-retail-0013.xml') MODEL_DEF_REIDE = expanduser('../../model/intel/FP32/person-reidentification-retail-0287.xml') # タイトル・バージョン情報 title = 'Person Tracking 2' print(GREEN) print('--- {} ---'.format(title)) print(cv2.__version__) print("OpenVINO inference_engine:", get_version()) print(NOCOLOR) # Parses arguments for the application def parse_args(): parser = argparse.ArgumentParser() parser.add_argument('-i', '--image', metavar = 'IMAGE_FILE', type=str, default = MOVIE, help = 'Absolute path to movie file or cam for camera stream.') parser.add_argument('-m_dt', '--m_detector', type = str, default = MODEL_DEF_DETECT, help = 'Detector Path to an .xml file with a trained model.' 'Default value is '+MODEL_DEF_DETECT) parser.add_argument('-m_re', '--m_reidentification', type = str, default = MODEL_DEF_REIDE, help = 'Reidentification Path to an .xml file with a trained model.' 'Default value is '+MODEL_DEF_REIDE) parser.add_argument('-d', '--device', default = 'CPU', type = str, help = 'Optional. Specify a target device to infer on. CPU, GPU, FPGA, HDDL or MYRIAD is ' 'acceptable. The demo will look for a suitable plugin for the device specified. ' 'Default value is CPU') parser.add_argument('--threshold', metavar = 'FLOAT', type = float, default = THRESHOLD, help = 'Threshold for detection.') parser.add_argument('-s', '--speed', metavar = 'SPEED', default = 'y', help = 'Speed display flag.(y/n) Default calue is \'y\'') parser.add_argument('-o', '--out', metavar = 'IMAGE_OUT', default = 'non', help = 'Processed image file path. 
Default value is \'non\'') return parser class PersonDetector(Model): def __init__(self, model_path, device, ie_core, threshold, num_requests): super().__init__(model_path, device, ie_core, num_requests, None) _, _, h, w = self.input_size self.__input_height = h self.__input_width = w self.__threshold = threshold def __prepare_frame(self, frame): initial_h, initial_w = frame.shape[:2] scale_h, scale_w = initial_h / float(self.__input_height), initial_w / float(self.__input_width) in_frame = cv2.resize(frame, (self.__input_width, self.__input_height)) in_frame = in_frame.transpose((2, 0, 1)) in_frame = in_frame.reshape(self.input_size) return in_frame, scale_h, scale_w def infer(self, frame): in_frame, _, _ = self.__prepare_frame(frame) result = super().infer(in_frame) detections = [] height, width = frame.shape[:2] for r in result[0][0]: conf = r[2] if(conf > self.__threshold): x1 = int(r[3] * width) y1 = int(r[4] * height) x2 = int(r[5] * width) y2 = int(r[6] * height) detections.append([x1, y1, x2, y2, conf]) return detections class PersonReidentification(Model): def __init__(self, model_path, device, ie_core, threshold, num_requests): super().__init__(model_path, device, ie_core, num_requests, None) _, _, h, w = self.input_size self.__input_height = h self.__input_width = w self.__threshold = threshold def __prepare_frame(self, frame): initial_h, initial_w = frame.shape[:2] scale_h, scale_w = initial_h / float(self.__input_height), initial_w / float(self.__input_width) in_frame = cv2.resize(frame, (self.__input_width, self.__input_height)) in_frame = in_frame.transpose((2, 0, 1)) in_frame = in_frame.reshape(self.input_size) return in_frame, scale_h, scale_w def infer(self, frame): in_frame, _, _ = self.__prepare_frame(frame) result = super().infer(in_frame) return np.delete(result, 1) class Tracker: def __init__(self): # 識別情報のDB self.identifysDb = None # 中心位置のDB self.center = [] def __getCenter(self, person): x = person[0] - person[2] y = person[1] - person[3] return (x,y) def __getDistance(self, person, index): (x1, y1) = self.center[index] (x2, y2) = self.__getCenter(person) a = np.array([x1, y1]) b = np.array([x2, y2]) u = b - a return np.linalg.norm(u) def __isOverlap(self, persons, index): [x1, y1, x2, y2] = persons[index] for i, person in enumerate(persons): if(index == i): continue if(max(person[0], x1) <= min(person[2], x2) and max(person[1], y1) <= min(person[3], y2)): return True return False def getIds(self, identifys, persons): if(identifys.size==0): return [] if self.identifysDb is None: self.identifysDb = identifys for person in persons: self.center.append(self.__getCenter(person)) print("input: {} DB:{}".format(len(identifys), len(self.identifysDb))) similaritys = self.__cos_similarity(identifys, self.identifysDb) similaritys[np.isnan(similaritys)] = 0 ids = np.nanargmax(similaritys, axis=1) for i, similarity in enumerate(similaritys): persionId = ids[i] d = self.__getDistance(persons[i], persionId) print("persionId:{} {} distance:{}".format(persionId,similarity[persionId], d)) # 0.95以上で、重なりの無い場合、識別情報を更新する if(similarity[persionId] > 0.95): if(self.__isOverlap(persons, i) == False): self.identifysDb[persionId] = identifys[i] # 0.5以下で、距離が離れている場合、新規に登録する elif(similarity[persionId] < 0.5): if(d > 500): print("distance:{} similarity:{}".format(d, similarity[persionId])) self.identifysDb = np.vstack((self.identifysDb, identifys[i])) self.center.append(self.__getCenter(persons[i])) ids[i] = len(self.identifysDb) - 1 print("> append DB size:{}".format(len(self.identifysDb))) 
print(ids) # 重複がある場合は、信頼度の低い方を無効化する for i, a in enumerate(ids): for e, b in enumerate(ids): if(e == i): continue if(a == b): if(similarity[a] > similarity[b]): ids[i] = -1 else: ids[e] = -1 print(ids) return ids # コサイン類似度 # 参考にさせて頂きました: https://github.com/kodamap/person_reidentification def __cos_similarity(self, X, Y): m = X.shape[0] Y = Y.T return np.dot(X, Y) / ( np.linalg.norm(X.T, axis=0).reshape(m, 1) * np.linalg.norm(Y, axis=0) ) # モデル基本情報の表示 def display_info(image, detector, reidentification, device, threshold, speedflg, outpath): print(YELLOW + title + ': Starting application...' + NOCOLOR) print(' - ' + YELLOW + 'Image File : ' + NOCOLOR, image) print(' - ' + YELLOW + 'm_detect : ' + NOCOLOR, detector) print(' - ' + YELLOW + 'm_redient. : ' + NOCOLOR, reidentification) print(' - ' + YELLOW + 'Device : ' + NOCOLOR, device) print(' - ' + YELLOW + 'Threshold : ' + NOCOLOR, threshold) print(' - ' + YELLOW + 'Speed flag : ' + NOCOLOR, speedflg) print(' - ' + YELLOW + 'Processed out: ' + NOCOLOR, outpath) # 画像の種類を判別する # 戻り値: 'jeg''png'... 画像ファイル # 'None' 画像ファイル以外 (動画ファイル) # 'NotFound' ファイルが存在しない import imghdr def is_pict(filename): try: imgtype = imghdr.what(filename) except FileNotFoundError as e: imgtype = 'NotFound' return str(imgtype) # ** main関数 ** def main(): # Argument parsing and parameter setting ARGS = parse_args().parse_args() input_stream = ARGS.image if ARGS.image.lower() == "cam" or ARGS.image.lower() == "camera": input_stream = 0 else: filetype = is_pict(input_stream) if (filetype == 'NotFound' or filetype !='None'): print(RED + "\ninput file Not found." + NOCOLOR) quit() isstream = True detection_threshold = ARGS.threshold speedflg = ARGS.speed model_detector=ARGS.m_detector model_reidentification=ARGS.m_reidentification outpath = ARGS.out device = ARGS.device cpu_extension = None ie_core = IECore() if device == "CPU" and cpu_extension: ie_core.add_extension(cpu_extension, "CPU") # 情報表示 display_info(input_stream, model_detector, model_reidentification, device, detection_threshold, speedflg, outpath) person_detector = PersonDetector(model_detector, device, ie_core, detection_threshold, num_requests=2) personReidentification = PersonReidentification(model_reidentification, device, ie_core, detection_threshold, num_requests=2) tracker = Tracker() # 入力準備 cap = cv2.VideoCapture (input_stream) # 処理結果の記録 step1 if (outpath != 'non'): if (isstream): fps = int(cap.get(cv2.CAP_PROP_FPS)) out_w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) out_h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v') outvideo = cv2.VideoWriter(outpath, fourcc, fps, (out_w, out_h)) colors = [] for i in range(TRACKING_MAX): b = random.randint(0, 255) g = random.randint(0, 255) r = random.randint(0, 255) colors.append((b,g,r)) # 計測値初期化 fpsWithTick = mylib.fpsWithTick() frame_count = 0 fps_total = 0 fpsWithTick.get() # fps計測開始 # メインループ while True: grabbed, frame = cap.read() if not grabbed:# ループ再生 break if(frame is None): break # Personを検知する persons = [] detections = person_detector.infer(frame) if(len(detections) > 0): print("-------------------") for detection in detections: x1 = int(detection[0]) y1 = int(detection[1]) x2 = int(detection[2]) y2 = int(detection[3]) conf = detection[4] print("{:.1f} ({},{})-({},{})".format(conf, x1, y1, x2, y2)) persons.append([x1,y1,x2,y2]) print("====================") # 各Personの画像から識別情報を取得する identifys = np.zeros((len(persons), 255)) for i, person in enumerate(persons): # 各Personのimage取得 img = frame[person[1] : person[3], person[0]: 
person[2]] h, w = img.shape[:2] if(h==0 or w==0): continue # identification取得 identifys[i] = personReidentification.infer(img) # Idの取得 ids = tracker.getIds(identifys, persons) # 枠及びIdを画像に追加 for i, person in enumerate(persons): if(ids[i]!=-1): color = colors[int(ids[i])] frame = cv2.rectangle(frame, (person[0], person[1]), (person[2] ,person[3]), color, int(2)) frame = cv2.putText(frame, str(ids[i]), (person[0], person[1]), cv2.FONT_HERSHEY_PLAIN, int(2), color, int(2), cv2.LINE_AA) # 画像の縮小 h, w = frame.shape[:2] frame = cv2.resize(frame, ((int(w * SCALE), int(h * SCALE)))) # FPSを計算する fps = fpsWithTick.get() st_fps = 'fps: {:>6.2f}'.format(fps) if (speedflg == 'y'): cv2.rectangle(frame, (10, 38), (95, 55), (90, 90, 90), -1) cv2.putText(frame, st_fps, (15, 50), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.4, color=(255, 255, 255), lineType=cv2.LINE_AA) # 画像の表示 window_name = title + ' (hit key to exit)' cv2.namedWindow(window_name, cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) cv2.imshow(window_name, frame) # 処理結果の記録 step2 if (outpath != 'non'): if (isstream): outvideo.write(frame) else: cv2.imwrite(outpath, frame) key = cv2.waitKey(1) if key == 27 or key == 113: # 'esc' or 'q' break if not mylib_gui._is_visible(window_name): # 'Close' button break # 終了処理 if (isstream): cap.release() # 処理結果の記録 step3 if (outpath != 'non'): if (isstream): outvideo.release() cv2.destroyAllWindows() print('\nFPS average: {:>10.2f}'.format(fpsWithTick.get_average())) print('\n Finished.') # main関数エントリーポイント(実行開始) if __name__ == "__main__": sys.exit(main())
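person-tracking2.py reads ../../Videos/video003.mp4 by default; a camera or another movie can be selected with -i, and the processed video saved with -o, for example (tracking_out.mp4 is a hypothetical output file name):

(py37w) PS > python .\person-tracking2.py -i cam -o tracking_out.mp4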
(py37w) PS > python .\face-tracking2.py

--- Face Tracking 2 ---
3.4.2
OpenVINO inference_engine: 2021.4.2-3974-e2a469a3450-releases/2021/4

Face Tracking 2: Starting application...
   - Image File   :  ../../Videos/video005.mp4
   - m_detect     :  ../../model/intel/FP32/face-detection-0200.xml
   - m_redient.   :  ../../model/intel/FP32/face-reidentification-retail-0095.xml
   - Device       :  CPU
   - Threshold    :  0.5
   - Speed flag   :  y
   - Processed out:  non
-------------------
1.0 (447,91)-(466,112)
1.0 (495,104)-(518,129)
0.8 (129,66)-(154,95)
0.5 (553,133)-(578,158)
0.5 (315,103)-(334,128)
====================
input: 5 DB:5
persionId:0 0.9999999999999997 conf:0.9679621458053589
persionId:1 0.999999999999999 conf:0.9509861469268799
  :

FPS average:      19.30

 Finished.
# -*- coding: utf-8 -*- ##------------------------------------------ ## OpenVINO™ toolkit ## Person Re-Identificationで顔を追跡 ## ## model: face-detection-0200 ## face-reidentification-retail-0095 ## ## 2021.03.11 Masahiro Izutsu ##------------------------------------------ ## 2021.03.25 model/device parameter ## 2021.06.23 fps display ## 2021.12.24 linux/windows import sys import argparse import numpy as np import time import random import cv2 from openvino.inference_engine import get_version from openvino.inference_engine import IECore from model import Model import mylib import mylib_gui # Color Escape Code GREEN = '\033[1;32m' RED = '\033[1;31m' NOCOLOR = '\033[0m' YELLOW = '\033[1;33m' # 定数定義 MOVIE = "../../Videos/video005.mp4" THRESHOLD= 0.5 TRACKING_MAX=50 SCALE = 1.0 from os.path import expanduser MODEL_DEF_DETECT = expanduser('../../model/intel/FP32/face-detection-0200.xml') MODEL_DEF_REIDE = expanduser('../../model/intel/FP32/face-reidentification-retail-0095.xml') # タイトル・バージョン情報 title = 'Face Tracking 2' print(GREEN) print('--- {} ---'.format(title)) print(cv2.__version__) print("OpenVINO inference_engine:", get_version()) print(NOCOLOR) # Parses arguments for the application def parse_args(): parser = argparse.ArgumentParser() parser.add_argument('-i', '--image', metavar = 'IMAGE_FILE', type = str, default = MOVIE, help = 'Absolute path to movie file or cam for camera stream.') parser.add_argument('-m_dt', '--m_detector', type = str, default = MODEL_DEF_DETECT, help = 'Detector Path to an .xml file with a trained model.' 'Default value is '+MODEL_DEF_DETECT) parser.add_argument('-m_re', '--m_reidentification', type = str, default = MODEL_DEF_REIDE, help = 'Reidentification Path to an .xml file with a trained model.' 'Default value is '+MODEL_DEF_REIDE) parser.add_argument('-d', '--device', default = 'CPU', type = str, help = 'Optional. Specify a target device to infer on. CPU, GPU, FPGA, HDDL or MYRIAD is ' 'acceptable. The demo will look for a suitable plugin for the device specified. ' 'Default value is CPU') parser.add_argument('--threshold', metavar = 'FLOAT', type = float, default = THRESHOLD, help = 'Threshold for detection.') parser.add_argument('-s', '--speed', metavar = 'SPEED', default = 'y', help = 'Speed display flag.(y/n) Default calue is \'y\'') parser.add_argument('-o', '--out', metavar = 'IMAGE_OUT', default = 'non', help = 'Processed image file path. 
Default value is \'non\'') return parser class FaceDetector(Model): def __init__(self, model_path, device, ie_core, threshold, num_requests): super().__init__(model_path, device, ie_core, num_requests, None) _, _, h, w = self.input_size self.__input_height = h self.__input_width = w self.__threshold = threshold def __prepare_frame(self, frame): # shape: [1x3x256x256] - An input image in the format [BxCxHxW], where: initial_h, initial_w = frame.shape[:2] scale_h, scale_w = initial_h / float(self.__input_height), initial_w / float(self.__input_width) in_frame = cv2.resize(frame, (self.__input_width, self.__input_height)) in_frame = in_frame.transpose((2, 0, 1)) in_frame = in_frame.reshape(self.input_size) return in_frame, scale_h, scale_w def infer(self, frame): in_frame, _, _ = self.__prepare_frame(frame) result = super().infer(in_frame) # The net outputs blob with shape: [1, 1, N, 7], facese = [] height, width = frame.shape[:2] for d in result[0][0]: if(d[2]>self.__threshold): face = [ int(d[3] * width), int(d[4] * height), int(d[5] * width), int(d[6] * height), d[2] ] facese.append(face) return facese class FaceReidentification(Model): def __init__(self, model_path, device, ie_core, threshold, num_requests): super().__init__(model_path, device, ie_core, num_requests, None) _, _, h, w = self.input_size self.__input_height = h self.__input_width = w def __prepare_frame(self, frame): initial_h, initial_w = frame.shape[:2] scale_h, scale_w = initial_h / float(self.__input_height), initial_w / float(self.__input_width) in_frame = cv2.resize(frame, (self.__input_width, self.__input_height)) in_frame = in_frame.transpose((2, 0, 1)) in_frame = in_frame.reshape(self.input_size) return in_frame, scale_h, scale_w def infer(self, frame): in_frame, _, _ = self.__prepare_frame(frame) result = super().infer(in_frame) # (1, 256, 1, 1) => (256) return result[0, :, 0, 0] class Tracker: def __init__(self): # 識別情報のDB self.identifysDb = None # 顔の信頼度のDB self.conf = [] def getIds(self, identifys, persons): if(identifys.size==0): return [] if self.identifysDb is None: self.identifysDb = identifys for person in persons: self.conf.append(person[4]) print("input: {} DB:{}".format(len(identifys), len(self.identifysDb))) similaritys = self.__cos_similarity(identifys, self.identifysDb) similaritys[np.isnan(similaritys)] = 0 ids = np.nanargmax(similaritys, axis=1) for i, similarity in enumerate(similaritys): persionId = ids[i] print("persionId:{} {} conf:{}".format(persionId,similarity[persionId], persons[i][4])) # 0.9以上で、顔検出の信頼度が既存のものより高い場合、識別情報を更新する if(similarity[persionId] > 0.9 and persons[i][4] > self.conf[persionId]): print("? 
refresh id:{} conf:{}".format(persionId, persons[i][4])) self.identifysDb[persionId] = identifys[i] # 0.3以下の場合、追加する elif(similarity[persionId] < 0.3): self.identifysDb = np.vstack((self.identifysDb, identifys[i])) self.conf.append(persons[i][4]) ids[i] = len(self.identifysDb) - 1 print("append id:{} similarity:{}".format(ids[i], similarity[persionId])) print(ids) # 重複がある場合は、信頼度の低い方を無効化する(今回、この可能性は低い) for i, a in enumerate(ids): for e, b in enumerate(ids): if(e == i): continue if(a == b): if(similarity[a] > similarity[b]): ids[i] = -1 else: ids[e] = -1 print(ids) return ids # コサイン類似度 # 参考にさせて頂きました: https://github.com/kodamap/person_reidentification def __cos_similarity(self, X, Y): m = X.shape[0] Y = Y.T return np.dot(X, Y) / ( np.linalg.norm(X.T, axis=0).reshape(m, 1) * np.linalg.norm(Y, axis=0) ) # モデル基本情報の表示 def display_info(image, detector, reidentification, device, threshold, speedflg, outpath): print(YELLOW + title + ': Starting application...' + NOCOLOR) print(' - ' + YELLOW + 'Image File : ' + NOCOLOR, image) print(' - ' + YELLOW + 'm_detect : ' + NOCOLOR, detector) print(' - ' + YELLOW + 'm_redient. : ' + NOCOLOR, reidentification) print(' - ' + YELLOW + 'Device : ' + NOCOLOR, device) print(' - ' + YELLOW + 'Threshold : ' + NOCOLOR, threshold) print(' - ' + YELLOW + 'Speed flag : ' + NOCOLOR, speedflg) print(' - ' + YELLOW + 'Processed out: ' + NOCOLOR, outpath) # 画像の種類を判別する # 戻り値: 'jeg''png'... 画像ファイル # 'None' 画像ファイル以外 (動画ファイル) # 'NotFound' ファイルが存在しない import imghdr def is_pict(filename): try: imgtype = imghdr.what(filename) except FileNotFoundError as e: imgtype = 'NotFound' return str(imgtype) # ** main関数 ** def main(): # Argument parsing and parameter setting ARGS = parse_args().parse_args() input_stream = ARGS.image if ARGS.image.lower() == "cam" or ARGS.image.lower() == "camera": input_stream = 0 else: filetype = is_pict(input_stream) if (filetype == 'NotFound' or filetype !='None'): print(RED + "\ninput file Not found." 
+ NOCOLOR) quit() isstream = True detection_threshold = ARGS.threshold speedflg = ARGS.speed model_detector=ARGS.m_detector model_reidentification=ARGS.m_reidentification outpath = ARGS.out device = ARGS.device cpu_extension = None ie_core = IECore() if device == "CPU" and cpu_extension: ie_core.add_extension(cpu_extension, "CPU") # 情報表示 display_info(input_stream, model_detector, model_reidentification, device, detection_threshold, speedflg, outpath) face_detector = FaceDetector(model_detector, device, ie_core, THRESHOLD, num_requests=2) faceReidentification = FaceReidentification(model_reidentification, device, ie_core, THRESHOLD, num_requests=2) tracker = Tracker() # 入力準備 cap = cv2.VideoCapture (input_stream) # 処理結果の記録 step1 if (outpath != 'non'): if (isstream): fps = int(cap.get(cv2.CAP_PROP_FPS)) out_w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) out_h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v') outvideo = cv2.VideoWriter(outpath, fourcc, fps, (out_w, out_h)) colors = [] for i in range(TRACKING_MAX): b = random.randint(0, 25) * 10 g = random.randint(0, 25) * 10 r = random.randint(0, 25) * 10 colors.append((b,g,r)) # 計測値初期化 fpsWithTick = mylib.fpsWithTick() frame_count = 0 fps_total = 0 fpsWithTick.get() # fps計測開始 while True: grabbed, frame = cap.read() if not grabbed:# ループ再生 break if(frame is None): break # Personを検知する persons = [] faces = face_detector.infer(frame) if(len(faces) > 0): print("-------------------") for face in faces: x1 = int(face[0]) y1 = int(face[1]) x2 = int(face[2]) y2 = int(face[3]) conf = face[4] print("{:.1f} ({},{})-({},{})".format(conf, x1, y1, x2, y2)) persons.append([x1, y1, x2, y2, conf]) print("====================") # 各顔画像から識別情報を取得する identifys = np.zeros((len(persons), 256)) for i, person in enumerate(persons): # 各顔画像の取得 img = frame[person[1] : person[3], person[0]: person[2]] h, w = img.shape[:2] if(h==0 or w==0): continue # 類似度の取得 identifys[i] = faceReidentification.infer(img) #インデックスの取得 ids = tracker.getIds(identifys, persons) #枠及びインデックスを画像に追加 for i, person in enumerate(persons): if(ids[i]!=-1): color = colors[int(ids[i])] frame = cv2.rectangle(frame, (person[0], person[1]), (person[2] ,person[3]), color, int(2)) frame = cv2.putText(frame, str(ids[i]), (person[0], person[1]), cv2.FONT_HERSHEY_PLAIN, int(2), color, int(2), cv2.LINE_AA ) # 画像の縮小 h, w = frame.shape[:2] frame = cv2.resize(frame, ((int(w * SCALE), int(h * SCALE)))) # FPSを計算する fps = fpsWithTick.get() st_fps = 'fps: {:>6.2f}'.format(fps) if (speedflg == 'y'): cv2.rectangle(frame, (10, 38), (95, 55), (90, 90, 90), -1) cv2.putText(frame, st_fps, (15, 50), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.4, color=(255, 255, 255), lineType=cv2.LINE_AA) # 画像の表示 window_name = title + ' (hit key to exit)' cv2.namedWindow(window_name, cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) cv2.imshow(window_name, frame) # 処理結果の記録 step2 if (outpath != 'non'): if (isstream): outvideo.write(frame) else: cv2.imwrite(outpath, frame) key = cv2.waitKey(1) if key == 27 or key == 113: # 'esc' or 'q' break if not mylib_gui._is_visible(window_name): # 'Close' button break # 終了処理 if (isstream): cap.release() # 処理結果の記録 step3 if (outpath != 'non'): if (isstream): outvideo.release() cv2.destroyAllWindows() print('\nFPS average: {:>10.2f}'.format(fpsWithTick.get_average())) print('\n Finished.') # main関数エントリーポイント(実行開始) if __name__ == "__main__": sys.exit(main())
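The Tracker class in the script above assigns IDs by comparing each new 256-dimensional re-identification vector against the stored database with cosine similarity: a best match above 0.9 (when the new detection confidence is higher) refreshes the stored vector, and a best match below 0.3 is registered as a new ID. The following standalone NumPy sketch isolates that matching rule so it can be read apart from the video loop; the thresholds mirror the script, while the sample data and function name are illustrative only.

import numpy as np

def cos_similarity(X, Y):
    # Rows of X: query vectors from the current frame, rows of Y: stored database vectors.
    Y = Y.T
    return np.dot(X, Y) / (
        np.linalg.norm(X, axis=1, keepdims=True) * np.linalg.norm(Y, axis=0))

# Illustrative data: 2 detected faces against a database of 3 known identities.
db = np.random.rand(3, 256)
queries = np.random.rand(2, 256)

sim = cos_similarity(queries, db)      # shape (2, 3)
best = np.argmax(sim, axis=1)          # best database index for each query
for i, j in enumerate(best):
    if sim[i, j] > 0.9:
        print('query {}: refresh stored vector of id {} ({:.3f})'.format(i, j, sim[i, j]))
    elif sim[i, j] < 0.3:
        print('query {}: register as a new id ({:.3f})'.format(i, sim[i, j]))
    else:
        print('query {}: keep id {} ({:.3f})'.format(i, j, sim[i, j]))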
(py37w) PS > python .\sentiment_analysis2.py

--- Real-time sentiment analysis 2 ---
3.4.2
OpenVINO inference_engine: 2021.4.2-3974-e2a469a3450-releases/2021/4

Real-time sentiment analysis 2: Starting application...
 - Image File : 0
 - m_detect : ../../model/intel/FP32/face-detection-retail-0004.xml
 - m_recognition: ../../model/intel/FP32/emotions-recognition-retail-0003.xml
 - Device : CPU
 - Language : jp
 - Program Title: y
 - Speed flag : y
 - Processed out: non

FPS average: 30.30

 Finished.
# -*- coding: utf-8 -*- ##------------------------------------------ ## OpenVINO™ toolkit ## Real-time sentiment analysis ## ## model: face-detection-retail-0004 ## emotions-recognition-retail-0003 ## ## 2021.03.25 Masahiro Izutsu ##------------------------------------------ ## 2021.06.23 fps display ## 2021.12.24 linux/windows # Color Escape Code GREEN = '\033[1;32m' RED = '\033[1;31m' NOCOLOR = '\033[0m' YELLOW = '\033[1;33m' # 定数定義 WINDOW_WIDTH = 640 TEXT_COLOR = (255, 255, 255) # white text from os.path import expanduser MODEL_DEF_DETECT = expanduser('../../model/intel/FP32/face-detection-retail-0004.xml') MODEL_DEF_EMO = expanduser('../../model/intel/FP32/emotions-recognition-retail-0003.xml') # import処理 import sys import cv2 import numpy as np import argparse import myfunction import mylib import mylib_gui import platform from pngoverlay import PNGOverlay # モジュール読み込み from openvino.inference_engine import IECore from openvino.inference_engine import get_version # タイトル・バージョン情報 title = 'Real-time sentiment analysis 2' print(GREEN) print('--- {} ---'.format(title)) print(cv2.__version__) print("OpenVINO inference_engine:", get_version()) print(NOCOLOR) # Parses arguments for the application def parse_args(): parser = argparse.ArgumentParser() parser.add_argument('-i', '--image', metavar = 'IMAGE_FILE', type = str, default = 'cam', help = 'Absolute path to image file or cam for camera stream.') parser.add_argument('-m_dt', '--m_detector', type = str, default = MODEL_DEF_DETECT, help = 'Detector Path to an .xml file with a trained model.' 'Default value is '+MODEL_DEF_DETECT) parser.add_argument('-m_re', '--m_recognition', type = str, default = MODEL_DEF_EMO, help = 'Recognition Path to an .xml file with a trained model.' 'Default value is '+MODEL_DEF_EMO) parser.add_argument('-d', '--device', default = 'CPU', type = str, help = 'Optional. Specify a target device to infer on. CPU, GPU, FPGA, HDDL or MYRIAD is ' 'acceptable. The demo will look for a suitable plugin for the device specified. ' 'Default value is CPU') parser.add_argument('-l', '--language', metavar = 'LANGUAGE', default = 'jp', help = 'Language.(jp/en) Default value is \'jp\'') parser.add_argument('-t', '--title', metavar = 'TITLE', default = 'y', help = 'Program title flag.(y/n) Default value is \'y\'') parser.add_argument('-s', '--speed', metavar = 'SPEED', default = 'y', help = 'Speed display flag.(y/n) Default calue is \'y\'') parser.add_argument('-o', '--out', metavar = 'IMAGE_OUT', default = 'non', help = 'Processed image file path. Default value is \'non\'') return parser # モデル基本情報の表示 def display_info(image, detector, recognition, device, lang, titleflg, speedflg, outpath): print(YELLOW + title + ': Starting application...' + NOCOLOR) print(' - ' + YELLOW + 'Image File : ' + NOCOLOR, image) print(' - ' + YELLOW + 'm_detect : ' + NOCOLOR, detector) print(' - ' + YELLOW + 'm_recognition: ' + NOCOLOR, recognition) print(' - ' + YELLOW + 'Device : ' + NOCOLOR, device) print(' - ' + YELLOW + 'Language : ' + NOCOLOR, lang) print(' - ' + YELLOW + 'Program Title: ' + NOCOLOR, titleflg) print(' - ' + YELLOW + 'Speed flag : ' + NOCOLOR, speedflg) print(' - ' + YELLOW + 'Processed out: ' + NOCOLOR, outpath) # 画像の種類を判別する # 戻り値: 'jeg''png'... 
画像ファイル # 'None' 画像ファイル以外 (動画ファイル) # 'NotFound' ファイルが存在しない import imghdr def is_pict(filename): try: imgtype = imghdr.what(filename) except FileNotFoundError as e: imgtype = 'NotFound' return str(imgtype) # ** main関数 ** def main(): # 日本語フォント指定 if platform.system()=='Windows': fontPIL = 'meiryo.ttc' # ゴシック体 else: fontPIL = 'NotoSansCJK-Bold.ttc' # ゴシック体 # Argument parsing and parameter setting ARGS = parse_args().parse_args() input_stream = ARGS.image lang = ARGS.language titleflg = ARGS.title speedflg = ARGS.speed if ARGS.image.lower() == "cam" or ARGS.image.lower() == "camera": input_stream = 0 isstream = True else: filetype = is_pict(input_stream) isstream = filetype == 'None' if (filetype == 'NotFound'): print(RED + "\ninput file Not found." + NOCOLOR) quit() model_detector=ARGS.m_detector model_recognition=ARGS.m_recognition device = ARGS.device outpath = ARGS.out # 感情ラベル if (lang == 'jp'): list_emotion = ['平静', '嬉しい', '悲しい', '驚き', '怒り'] else: list_emotion = ['neutral', 'happy', 'sad', 'surprise', 'anger'] # 感情色ラベル color_emotion = [(255, 255, 0), ( 0, 255, 0), ( 0, 255, 255), (255, 0, 255), ( 0, 0, 255)] bkcolor_emotion = [(120, 120, 70), ( 70, 120, 70), ( 70, 120, 120), (120, 70, 120), ( 70, 70, 120)] textcolor_emotion = [(255, 255, 255), (255, 255, 255), (255, 255, 255), (255, 255, 255), (255, 255, 255)] # インスタンス生成 icon_neutral = PNGOverlay('../image/icon_neutral.png') icon_happy = PNGOverlay('../image/icon_happy.png') icon_sad = PNGOverlay('../image/icon_sad.png') icon_surprise = PNGOverlay('../image/icon_surprise.png') icon_anger = PNGOverlay('../image/icon_anger.png') # インスタンス変数をリストにまとめる icon_emotion = [icon_neutral, icon_happy, icon_sad, icon_surprise, icon_anger] # モデルの読み込み (顔検出) ie = IECore() net = ie.read_network(model=model_detector, weights=model_detector[:-4] + '.bin') exec_net = ie.load_network(network=net, device_name=device) # モデルの読み込み (感情分類) net_emotion = ie.read_network(model=model_recognition, weights=model_recognition[:-4] + '.bin') exec_net_emotion = ie.load_network(network=net_emotion, device_name=device) # 情報表示 display_info(input_stream, model_detector, model_recognition, device, lang, titleflg, speedflg, outpath) # 入力準備 if (isstream): # カメラ cap = cv2.VideoCapture(input_stream) ret, frame = cap.read() loopflg = cap.isOpened() else: # 画像ファイル読み込み frame = cv2.imread(input_stream) if frame is None: print(RED + "\nUnable to read the input." + NOCOLOR) quit() # アスペクト比を固定してリサイズ img_h, img_w = frame.shape[:2] if (img_w > WINDOW_WIDTH): height = round(img_h * (WINDOW_WIDTH / img_w)) frame = cv2.resize(frame, dsize = (WINDOW_WIDTH, height)) loopflg = True # 1回ループ # 処理結果の記録 step1 if (outpath != 'non'): if (isstream): fps = int(cap.get(cv2.CAP_PROP_FPS)) out_w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) out_h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v') outvideo = cv2.VideoWriter(outpath, fourcc, fps, (out_w, out_h)) # 計測値初期化 fpsWithTick = mylib.fpsWithTick() frame_count = 0 fps_total = 0 fpsWithTick.get() # fps計測開始 # メインループ while (loopflg): if frame is None: print(RED + "\nUnable to read the input." 
+ NOCOLOR) quit() # 入力データフォーマットへ変換 img = cv2.resize(frame, (300, 300)) # サイズ変更 img = img.transpose((2, 0, 1)) # HWC > CHW img = np.expand_dims(img, axis=0) # 次元合せ # 推論実行 out = exec_net.infer(inputs={'data': img}) # 出力から必要なデータのみ取り出し out = out['detection_out'] out = np.squeeze(out) #サイズ1の次元を全て削除 # 検出されたすべての顔領域に対して1つずつ処理 for detection in out: # conf値の取得 confidence = float(detection[2]) # バウンディングボックス座標を入力画像のスケールに変換 xmin = int(detection[3] * frame.shape[1]) ymin = int(detection[4] * frame.shape[0]) xmax = int(detection[5] * frame.shape[1]) ymax = int(detection[6] * frame.shape[0]) # conf値が0.5より大きい場合のみ感情推論とバウンディングボックス表示 if confidence > 0.5: # 顔検出領域はカメラ範囲内に補正する。特にminは補正しないとエラーになる if xmin < 0: xmin = 0 if ymin < 0: ymin = 0 if xmax > frame.shape[1]: xmax = frame.shape[1] if ymax > frame.shape[0]: ymax = frame.shape[0] # 顔領域のみ切り出し frame_face = frame[ ymin:ymax, xmin:xmax ] # 入力データフォーマットへ変換 img = cv2.resize(frame_face, (64, 64)) # サイズ変更 img = img.transpose((2, 0, 1)) # HWC > CHW img = np.expand_dims(img, axis=0) # 次元合せ # 推論実行 out = exec_net_emotion.infer(inputs={'data': img}) # 出力から必要なデータのみ取り出し out = out['prob_emotion'] out = np.squeeze(out) #不要な次元の削減 # 出力値が最大のインデックスを得る emoid = np.argmax(out) emotion = list_emotion[emoid] # 文字列描画 cv2.rectangle(frame, (10, frame.shape[0] - 242), (100, frame.shape[0] - 218), bkcolor_emotion[emoid], -1) # cv2.putText(frame, list_emotion[index_max], (20, 60), cv2.FONT_HERSHEY_SIMPLEX, 2, (255, 255, 255), 4) myfunction.cv2_putText(img = frame, text = emotion, org = (20, frame.shape[0] - 220), fontFace = fontPIL, fontScale = 20, color = textcolor_emotion[emoid], mode = 0) # バウンディングボックス表示 cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), color=(240, 180, 0), thickness=3) # 棒グラフ表示 str_emotion = ['neu', 'hap', 'sad', 'sur', 'ang'] text_x = 10 text_y = frame.shape[0] - 180 rect_x = 80 rect_y = frame.shape[0] - 200 for i in range(5): cv2.putText(frame, str_emotion[i], (text_x, text_y), cv2.FONT_HERSHEY_SIMPLEX, 1, (240, 180, 0), 2) cv2.rectangle(frame, (rect_x, rect_y), (rect_x + int(300 * out[i]), rect_y + 20), color=(240, 180, 0), thickness=-1) text_y = text_y + 40 rect_y = rect_y + 40 # 顔アイコン表示 icon_emotion[emoid].show(frame, frame.shape[1] - 110, frame.shape[0] - 110) # 1つの顔で終了 break # FPSを計算する fps = fpsWithTick.get() st_fps = 'fps: {:>6.2f}'.format(fps) if (speedflg == 'y'): cv2.rectangle(frame, (10, 38), (95, 55), (90, 90, 90), -1) cv2.putText(frame, st_fps, (15, 50), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.4, color=(255, 255, 255), lineType=cv2.LINE_AA) # タイトル描画 if (titleflg == 'y'): cv2.putText(frame, title, (10, 30), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.8, color=(200, 200, 0), lineType=cv2.LINE_AA) # 画像表示 window_name = title + ' (hit key to exit)' cv2.namedWindow(window_name, cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) cv2.imshow(window_name, frame) # 処理結果の記録 step2 if (outpath != 'non'): if (isstream): outvideo.write(frame) else: cv2.imwrite(outpath, frame) # 何らかのキーが押されたら終了 breakflg = False while(True): key = cv2.waitKey(1) if key == 27 or key == 113: # 'esc' or 'q' breakflg = True break if not mylib_gui._is_visible(window_name): # 'Close' button breakflg = True break if (isstream): break if ((breakflg == False) and isstream): # 次のフレームを読み出す ret, frame = cap.read() if ret == False: break loopflg = cap.isOpened() else: loopflg = False # 終了処理 if (isstream): cap.release() # 処理結果の記録 step3 if (outpath != 'non'): if (isstream): outvideo.release() cv2.destroyAllWindows() print('\nFPS average: {:>10.2f}'.format(fpsWithTick.get_average())) print('\n Finished.') # 
main関数エントリーポイント(実行開始) if __name__ == "__main__": sys.exit(main())
(py37w) PS > python .\image_classification.py

--- Image Classification ---
3.4.2
OpenVINO inference_engine: 2021.4.2-3974-e2a469a3450-releases/2021/4

Image Classification: Starting application...
 - Image File : 0
 - Model : ../../model/public/FP32/squeezenet1.1.xml
 - Device : CPU
 - Label : ./synset_words_jp.txt
 - Program Title: y
 - Speed flag : y
 - Processed out: non

FPS average: 25.80

 Finished.
# -*- coding: utf-8 -*- ##------------------------------------------ ## OpenVINO™ toolkit ## Image Classification ## ## model: squeezenet1.1 ## ## 2021.04.12 Masahiro Izutsu ##------------------------------------------ ## image_classification.py ## 2021.06.23 fps display ## 2022.01.04 linux/windows # Color Escape Code GREEN = '\033[1;32m' RED = '\033[1;31m' NOCOLOR = '\033[0m' YELLOW = '\033[1;33m' # 定数定義 WINDOW_WIDTH = 640 TEXT_COLOR = (255, 255, 255) # white text from os.path import expanduser MODEL_DEF = expanduser('../../model/public/FP32/squeezenet1.1.xml') # モジュール読み込み from openvino.inference_engine import IECore from openvino.inference_engine import get_version # import処理 import sys import cv2 import numpy as np import argparse import myfunction import mylib import mylib_gui import platform # タイトル・バージョン情報 title = 'Image Classification' print(GREEN) print('--- {} ---'.format(title)) print(cv2.__version__) print("OpenVINO inference_engine:", get_version()) print(NOCOLOR) # Parses arguments for the application def parse_args(): parser = argparse.ArgumentParser() parser.add_argument('-i', '--image', metavar = 'IMAGE_FILE', type = str, default = 'cam', help = 'Absolute path to image file or cam for camera stream.') parser.add_argument('-m', '--model', type=str, default = MODEL_DEF, help = 'Model Path to an .xml file with a trained model.' 'Default value is '+MODEL_DEF) parser.add_argument('-d', '--device', default='CPU', type=str, help = 'Specify a target device to infer on. CPU, GPU, FPGA, HDDL or MYRIAD is ' 'acceptable. The demo will look for a suitable plugin for the device specified. ' 'Default value is CPU') parser.add_argument('-l', '--label', metavar = 'LABEL', default = './synset_words_jp.txt', help = 'Absolute path to labels file.' 'Default value is c./synset_words_jp.txt') parser.add_argument('-t', '--title', metavar = 'TITLE', default = 'y', help = 'Program title flag.(y/n) Default value is \'y\'') parser.add_argument('-s', '--speed', metavar = 'SPEED', default = 'y', help = 'Speed display flag.(y/n) Default calue is \'y\'') parser.add_argument('-o', '--out', metavar = 'IMAGE_OUT', default = 'non', help = 'Processed image file path. Default value is \'non\'') return parser # モデル基本情報の表示 def display_info(image, model, device, label, titleflg, speedflg, outpath): print(YELLOW + title + ': Starting application...' + NOCOLOR) print(' - ' + YELLOW + 'Image File : ' + NOCOLOR, image) print(' - ' + YELLOW + 'Model : ' + NOCOLOR, model) print(' - ' + YELLOW + 'Device : ' + NOCOLOR, device) print(' - ' + YELLOW + 'Label : ' + NOCOLOR, label) print(' - ' + YELLOW + 'Program Title: ' + NOCOLOR, titleflg) print(' - ' + YELLOW + 'Speed flag : ' + NOCOLOR, speedflg) print(' - ' + YELLOW + 'Processed out: ' + NOCOLOR, outpath) # 画像の種類を判別する # 戻り値: 'jeg''png'... 
画像ファイル # 'None' 画像ファイル以外 (動画ファイル) # 'NotFound' ファイルが存在しない import imghdr def is_pict(filename): try: imgtype = imghdr.what(filename) except FileNotFoundError as e: imgtype = 'NotFound' return str(imgtype) # ** main関数 ** def main(): # 日本語フォント指定 if platform.system()=='Windows': fontPIL = 'meiryo.ttc' # ゴシック体 else: fontPIL = 'NotoSansCJK-Bold.ttc' # ゴシック体 # Argument parsing and parameter setting ARGS = parse_args().parse_args() input_stream = ARGS.image label_path = ARGS.label titleflg = ARGS.title speedflg = ARGS.speed if ARGS.image.lower() == "cam" or ARGS.image.lower() == "camera": input_stream = 0 isstream = True else: filetype = is_pict(input_stream) isstream = filetype == 'None' if (filetype == 'NotFound'): print(RED + "\ninput file Not found." + NOCOLOR) quit() model = ARGS.model device = ARGS.device outpath = ARGS.out # ラベル読み込み labels = np.loadtxt(label_path,encoding='utf-8', dtype='str', delimiter='\n') # モデルの読み込み ie = IECore() net = ie.read_network(model = model, weights = model[:-4] + '.bin') exec_net = ie.load_network(network = net, device_name = device) # 入力データと出力データのキーを取得 input_blob = net.input_info['data'].name out_blob = next(iter(net.outputs)) # 情報表示 display_info(input_stream, model, device, label_path, titleflg, speedflg, outpath) # 入力準備 if (isstream): # カメラ cap = cv2.VideoCapture(input_stream) ret, frame = cap.read() loopflg = cap.isOpened() img_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) img_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) else: # 画像ファイル読み込み frame = cv2.imread(input_stream) if frame is None: print(RED + "\nUnable to read the input." + NOCOLOR) quit() # アスペクト比を固定してリサイズ img_h, img_w = frame.shape[:2] if (img_w > WINDOW_WIDTH): height = round(img_h * (WINDOW_WIDTH / img_w)) frame = cv2.resize(frame, dsize = (WINDOW_WIDTH, height)) loopflg = True # 1回ループ img_width = frame.shape[1] img_height = frame.shape[0] # 処理結果の記録 step1 if (outpath != 'non'): if (isstream): fps = int(cap.get(cv2.CAP_PROP_FPS)) out_w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) out_h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v') outvideo = cv2.VideoWriter(outpath, fourcc, fps, (out_w, out_h)) # 計測値初期化 fpsWithTick = mylib.fpsWithTick() frame_count = 0 fps_total = 0 fpsWithTick.get() # fps計測開始 # メインループ while (loopflg): if frame is None: print(RED + "\nUnable to read the input." 
+ NOCOLOR) quit() # 入力データフォーマットへ変換 img = cv2.resize(frame, (227, 227)) # HeightとWidth変更 img = img.transpose((2, 0, 1)) # HWC > CHW img = np.expand_dims(img, axis=0) # CHW > BCHW # 推論実行 out = exec_net.infer(inputs={input_blob : img}) # 出力から必要なデータのみ取り出し out = out[out_blob] # 不要な次元を削減 out = np.squeeze(out) # 降順でベスト3のインデックスを抽出 index_order = np.argsort(out)[::-1][:3] # テキスト表示位置y座標初期値 text_y = img_height - 80 # ベスト3のインデックスについてラベルと値を表示 for index in index_order: # 左側の文字列10文字は取り除いてラベルを取得 label = labels[index] label = label[10:] # outを百分率にして小数点2桁以下は丸めて、文字列化 value = out[index] * 100 value = round(value, 1) value = str(value) + '% ' # 文字の表示 # cv2.putText(frame, value + label, (10, text_y), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (240, 180, 0), 2) myfunction.cv2_putText(img = frame, text = value + label, org = (10, text_y), fontFace = fontPIL, fontScale = 18, color = (240, 180, 0), mode = 0) # テキスト表示位置y座標増加 text_y = text_y + 30 # FPSを計算する fps = fpsWithTick.get() st_fps = 'fps: {:>6.2f}'.format(fps) if (speedflg == 'y'): cv2.rectangle(frame, (10, 38), (95, 55), (90, 90, 90), -1) cv2.putText(frame, st_fps, (15, 50), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.4, color=(255, 255, 255), lineType=cv2.LINE_AA) # タイトル描画 if (titleflg == 'y'): cv2.putText(frame, title, (10, 30), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.8, color=(200, 200, 0), lineType=cv2.LINE_AA) # 画像表示 window_name = title + ' (hit key to exit)' cv2.namedWindow(window_name, cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) cv2.imshow(window_name, frame) # 処理結果の記録 step2 if (outpath != 'non'): if (isstream): outvideo.write(frame) else: cv2.imwrite(outpath, frame) # 何らかのキーが押されたら終了 breakflg = False while(True): key = cv2.waitKey(1) if key == 27 or key == 113: # 'esc' or 'q' breakflg = True break if not mylib_gui._is_visible(window_name): # 'Close' button breakflg = True break if (isstream): break if ((breakflg == False) and isstream): # 次のフレームを読み出す ret, frame = cap.read() if ret == False: break loopflg = cap.isOpened() else: loopflg = False # 終了処理 if (isstream): cap.release() # 処理結果の記録 step3 if (outpath != 'non'): if (isstream): outvideo.release() cv2.destroyAllWindows() print('\nFPS average: {:>10.2f}'.format(fpsWithTick.get_average())) print('\n Finished.') # main関数エントリーポイント(実行開始) if __name__ == "__main__": sys.exit(main())
(py37w) PS > python .\virtual_fitting.py

--- Virtual Fitting ---
3.4.2
OpenVINO inference_engine: 2021.4.2-3974-e2a469a3450-releases/2021/4

.\virtual_fitting.py:163: DeprecationWarning: 'inputs' property of IENetwork class is deprecated. To access DataPtrs user need to use 'input_data' property of InputInfoPtr objects which can be accessed by 'input_info' property.
  input_blob_face = next(iter(net_face.inputs))

Virtual Fitting: Starting application...
 - Item File : ../../Images/parts/glass01.png
 - Image File : 0
 - m_detect : ../../model/intel/FP32/face-detection-retail-0005.xml
 - m_recognition: ../../model/intel/FP32/landmarks-regression-retail-0009.xml
 - Device : CPU
 - Program Title: y
 - Speed flag : y
 - Processed out: non

FPS average: 29.30

 Finished.
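The DeprecationWarning in this log is raised because virtual_fitting.py still reads the input blob names through the old inputs property of IENetwork. On the OpenVINO 2021.4 Python API the same names are available through input_info, which is the style face_mask.py further down already uses. A minimal way to silence the warning, assuming only the blob-name lookups around line 163 change and the rest of the script stays as-is:

# Old style (triggers the DeprecationWarning):
#   input_blob_face = next(iter(net_face.inputs))
# New style on OpenVINO 2021.x: iterate over input_info instead.
input_blob_face = next(iter(net_face.input_info))
out_blob_face = next(iter(net_face.outputs))

input_blob_landmarks = next(iter(net_landmarks.input_info))
out_blob_landmarks = next(iter(net_landmarks.outputs))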
# -*- coding: utf-8 -*- ##------------------------------------------ ## OpenVINO™ toolkit ## Virtual Fitting Application ## ## model: face-detection-retail-0005 ## landmarks-regression-retail-0009 ## ## 2021.04.18 Masahiro Izutsu ##------------------------------------------ ## 2021.06.23 fps display ## 2021.12.24 linux/windows # Color Escape Code GREEN = '\033[1;32m' RED = '\033[1;31m' NOCOLOR = '\033[0m' YELLOW = '\033[1;33m' # 定数定義 WINDOW_WIDTH = 640 TEXT_COLOR = (255, 255, 255) # white text ITEM_PATH = '../../Images/parts/' # ~/ で指定すると home ディレクトリの展開ができない ITEM_LIST = ['glass01.png','glass02.png','glass03.png','glass04.png','glass05.png','cap01.png','cap02.png','cap03.png','cap04.png','cap05.png'] # EPL_x, EPL_y, EPR_x, EPR_y ITEM_PARAM = [ [163, 262, 367, 262],[172, 244, 362, 244],[174, 262, 362, 262],[167, 266, 365, 266],[163, 254, 379, 254], [225, 455, 339, 455],[237, 455, 356, 455],[243, 404, 365, 404],[236, 450, 346, 450],[226, 427, 329, 427] ] from os.path import expanduser MODEL_DEF_FACE = expanduser('../../model/intel/FP32/face-detection-retail-0005.xml') MODEL_DEF_MARK = expanduser('../../model/intel/FP32/landmarks-regression-retail-0009.xml') # モジュール読み込み from openvino.inference_engine import IECore from openvino.inference_engine import get_version # import処理 import sys import cv2 import numpy as np import argparse import math from pngoverlay import PNGOverlay import mylib import mylib_gui # タイトル・バージョン情報 title = ' Virtual Fitting' print(GREEN) print('--- {} ---'.format(title)) print(cv2.__version__) print("OpenVINO inference_engine:", get_version()) print(NOCOLOR) # Parses arguments for the application def parse_args(): parser = argparse.ArgumentParser() parser.add_argument('-item', '--itemindex', type=int, default = 0, help = 'Item Index number (0-9)') parser.add_argument('-i', '--image', metavar = 'IMAGE_FILE', type = str, default = 'cam', help = 'Absolute path to image file or cam for camera stream.') parser.add_argument('-m_dt', '--m_detector', type=str, default = MODEL_DEF_FACE, help = 'Detector Path to an .xml file with a trained model.' 'Default value is '+MODEL_DEF_FACE) parser.add_argument('-m_lm', '--m_landmarks', type=str, default = MODEL_DEF_MARK, help = 'Landmarks Path to an .xml file with a trained model.' 'Default value is '+MODEL_DEF_MARK) parser.add_argument('-d', '--device', default='CPU', type=str, help = 'Optional. Specify a target device to infer on. CPU, GPU, FPGA, HDDL or MYRIAD is ' 'acceptable. The demo will look for a suitable plugin for the device specified. ' 'Default value is CPU') parser.add_argument('-t', '--title', metavar = 'TITLE', default = 'y', help = 'Program title flag.(y/n) Default value is \'y\'') parser.add_argument('-s', '--speed', metavar = 'SPEED', default = 'y', help = 'Speed display flag.(y/n) Default calue is \'y\'') parser.add_argument('-o', '--out', metavar = 'IMAGE_OUT', default = 'non', help = 'Processed image file path. Default value is \'non\'') return parser # モデル基本情報の表示 def display_info(item, image, detector, landmarks, device, titleflg, speedflg, outpath): print(YELLOW + title + ': Starting application...' 
+ NOCOLOR) print(' - ' + YELLOW + 'Item File : ' + NOCOLOR, item) print(' - ' + YELLOW + 'Image File : ' + NOCOLOR, image) print(' - ' + YELLOW + 'm_detect : ' + NOCOLOR, detector) print(' - ' + YELLOW + 'm_recognition: ' + NOCOLOR, landmarks) print(' - ' + YELLOW + 'Device : ' + NOCOLOR, device) print(' - ' + YELLOW + 'Program Title: ' + NOCOLOR, titleflg) print(' - ' + YELLOW + 'Speed flag : ' + NOCOLOR, speedflg) print(' - ' + YELLOW + 'Processed out: ' + NOCOLOR, outpath) # 画像の種類を判別する # 戻り値: 'jeg''png'... 画像ファイル # 'None' 画像ファイル以外 (動画ファイル) # 'NotFound' ファイルが存在しない import imghdr def is_pict(filename): try: imgtype = imghdr.what(filename) except FileNotFoundError as e: imgtype = 'NotFound' return str(imgtype) # ** main関数 ** def main(): # Argument parsing and parameter setting ARGS = parse_args().parse_args() item_index = ARGS.itemindex input_stream = ARGS.image titleflg = ARGS.title speedflg = ARGS.speed if ARGS.image.lower() == "cam" or ARGS.image.lower() == "camera": input_stream = 0 isstream = True else: filetype = is_pict(input_stream) isstream = filetype == 'None' if (filetype == 'NotFound'): print(RED + "\ninput file Not found." + NOCOLOR) quit() model_detector=ARGS.m_detector model_landmarks=ARGS.m_landmarks device = ARGS.device outpath = ARGS.out #-------------------------------------------- # PNGOverlayインスタンス生成 item_path = ITEM_PATH + ITEM_LIST[item_index] item = PNGOverlay(item_path) # EyePoint情報 EPL_x = ITEM_PARAM[item_index][0] EPL_y = ITEM_PARAM[item_index][1] EPR_x = ITEM_PARAM[item_index][2] EPR_y = ITEM_PARAM[item_index][3] # EyePoint距離 EP_distance = math.sqrt((EPR_x - EPL_x) ** 2 + (EPR_y - EPL_y) ** 2) # EyePointの角度 EP_angle = math.atan2(EPR_y - EPL_y, EPR_x - EPL_x) # アイテム座標(EyePoint_left基準) item_x_EPL = item.width/2 - EPL_x item_y_EPL = item.height/2 - EPL_y #-------------------------------------------- # モデルの読み込み (顔検出) ie = IECore() net_face = ie.read_network(model = model_detector, weights = model_detector[:-4] + '.bin') exec_net_face = ie.load_network(network = net_face, device_name = device) # 入出力設定(顔検出) input_blob_face = next(iter(net_face.inputs)) out_blob_face = next(iter(net_face.outputs)) # モデルの読み込み(landmarks) net_landmarks = ie.read_network(model = model_landmarks, weights = model_landmarks[:-4] + '.bin') exec_net_landmarks = ie.load_network(network = net_landmarks, device_name=device) # 入出力設定(landmarks) input_blob_landmarks = next(iter(net_landmarks.inputs)) out_blob_landmarks = next(iter(net_landmarks.outputs)) # 情報表示 display_info(item_path, input_stream, model_detector, model_landmarks, device, titleflg, speedflg, outpath) # 入力準備 if (isstream): # カメラ cap = cv2.VideoCapture(input_stream) ret, frame = cap.read() loopflg = cap.isOpened() else: # 画像ファイル読み込み frame = cv2.imread(input_stream) if frame is None: print(RED + "\nUnable to read the input." + NOCOLOR) quit() # アスペクト比を固定してリサイズ img_h, img_w = frame.shape[:2] if (img_w > WINDOW_WIDTH): height = round(img_h * (WINDOW_WIDTH / img_w)) frame = cv2.resize(frame, dsize = (WINDOW_WIDTH, height)) loopflg = True # 1回ループ # 処理結果の記録 step1 if (outpath != 'non'): if (isstream): fps = int(cap.get(cv2.CAP_PROP_FPS)) out_w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) out_h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v') outvideo = cv2.VideoWriter(outpath, fourcc, fps, (out_w, out_h)) # 計測値初期化 fpsWithTick = mylib.fpsWithTick() frame_count = 0 fps_total = 0 fpsWithTick.get() # fps計測開始 # メインループ while (loopflg): if frame is None: print(RED + "\nUnable to read the input." 
+ NOCOLOR) quit() # 入力データフォーマットへ変換 img = cv2.resize(frame, (300, 300)) # HeightとWidth変更 img = img.transpose((2, 0, 1)) # HWC > CHW img = np.expand_dims(img, axis=0) # CHW > BCHW # 推論実行 out = exec_net_face.infer(inputs={input_blob_face: img}) # 出力から必要なデータのみ取り出し out = out[out_blob_face] # 不要な次元を削減 out = np.squeeze(out) # 検出されたすべての顔領域に対して1つずつ処理 for detection in out: # conf値の取得 confidence = float(detection[2]) # バウンディングボックス座標を入力画像のスケールに変換 xmin = int(detection[3] * frame.shape[1]) ymin = int(detection[4] * frame.shape[0]) xmax = int(detection[5] * frame.shape[1]) ymax = int(detection[6] * frame.shape[0]) # conf値が0.5より大きい場合のみLandmarks推論とバウンディングボックス表示 if confidence > 0.5: # 顔検出領域はカメラ範囲内に補正する。特にminは補正しないとエラーになる if xmin < 0: xmin = 0 if ymin < 0: ymin = 0 if xmax > frame.shape[1]: xmax = frame.shape[1] if ymax > frame.shape[0]: ymax = frame.shape[0] #-------------------------------------------------- # ディープラーニングLandmarks推定 #-------------------------------------------------- # 顔領域のみ切り出し img_face = frame[ ymin:ymax, xmin:xmax ] # 入力データフォーマットへ変換 img = cv2.resize(img_face, (48, 48)) # HeightとWidth変更 img = img.transpose((2, 0, 1)) # HWC > CHW img = np.expand_dims(img, axis=0) # CHW > BCHW # 推論実行 out = exec_net_landmarks.infer(inputs={input_blob_landmarks: img}) # 出力から必要なデータのみ取り出し out = out[out_blob_landmarks] # 不要な次元を削減 out = np.squeeze(out) # 目の座標を顔画像のスケールに変換し、オフセット考慮 eye_left_x = int(out[0] * img_face.shape[1]) + xmin eye_left_y = int(out[1] * img_face.shape[0]) + ymin eye_right_x = int(out[2] * img_face.shape[1]) + xmin eye_right_y = int(out[3] * img_face.shape[0]) + ymin #-------------------------------------------------- # アイテムのスケール・座標・角度対応 #-------------------------------------------------- # 目の距離 eye_distance = math.sqrt((eye_right_x - eye_left_x) ** 2 + (eye_right_y - eye_left_y) ** 2) # アイテムのスケール item_scale = eye_distance / EP_distance # 目の角度 eye_angle = math.atan2(eye_right_y - eye_left_y, eye_right_x - eye_left_x) # アイテムの回転角度 item_angle = eye_angle - EP_angle # アイテム座標(左目基準) item_x_eyeleft = item_x_EPL * item_scale item_y_eyeleft = item_y_EPL * item_scale # アイテム座標(左目基準)をitem_angle回転させた座標 x2 = item_x_eyeleft * math.cos(item_angle) - item_y_eyeleft * math.sin(item_angle) y2 = item_x_eyeleft * math.sin(item_angle) + item_y_eyeleft * math.cos(item_angle) # アイテム座標 item_x = x2 + eye_left_x item_y = y2 + eye_left_y # アイテム描画 item.resize(item_scale) # スケール item.rotate(-math.degrees(item_angle)) # 角度 item.show(frame, int(item_x), int(item_y)) # 座標 # FPSを計算する fps = fpsWithTick.get() st_fps = 'fps: {:>6.2f}'.format(fps) if (speedflg == 'y'): cv2.rectangle(frame, (10, 38), (95, 55), (90, 90, 90), -1) cv2.putText(frame, st_fps, (15, 50), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.4, color=(255, 255, 255), lineType=cv2.LINE_AA) # タイトル描画 if (titleflg == 'y'): cv2.putText(frame, title, (10, 30), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.8, color=(200, 200, 0), lineType=cv2.LINE_AA) # 画像表示 window_name = title + ' (hit key to exit)' cv2.namedWindow(window_name, cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) cv2.imshow(window_name, frame) # 処理結果の記録 step2 if (outpath != 'non'): if (isstream): outvideo.write(frame) else: cv2.imwrite(outpath, frame) # 何らかのキーが押されたら終了 breakflg = False while(True): key = cv2.waitKey(1) if key == 27 or key == 113: # 'esc' or 'q' breakflg = True break if not mylib_gui._is_visible(window_name): # 'Close' button breakflg = True break if (isstream): break if ((breakflg == False) and isstream): # 次のフレームを読み出す ret, frame = cap.read() if ret == False: break loopflg = cap.isOpened() else: loopflg = False # 
終了処理 if (isstream): cap.release() # 処理結果の記録 step3 if (outpath != 'non'): if (isstream): outvideo.release() cv2.destroyAllWindows() print('\nFPS average: {:>10.2f}'.format(fpsWithTick.get_average())) print('\n Finished.') # main関数エントリーポイント(実行開始) if __name__ == "__main__": sys.exit(main())
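The fitting geometry in the script reduces to two numbers: the item is scaled by the ratio of the detected eye distance to the item's own eye-point distance, and rotated by the difference between the detected eye angle and the item's eye-point angle. A tiny numeric check of those formulas follows; the item eye points are the glass01.png entry of ITEM_PARAM, while the detected eye coordinates are made-up values for illustration.

import math

# Item eye points from ITEM_PARAM[0] (glass01.png): L(163, 262) - R(367, 262)
EP_distance = math.hypot(367 - 163, 262 - 262)      # 204.0 px on the item image
EP_angle = math.atan2(262 - 262, 367 - 163)         # 0.0 rad (horizontal)

# Hypothetical eye positions detected in a camera frame
eye_left_x, eye_left_y, eye_right_x, eye_right_y = 200, 150, 302, 160
eye_distance = math.hypot(eye_right_x - eye_left_x, eye_right_y - eye_left_y)   # about 102.5 px
eye_angle = math.atan2(eye_right_y - eye_left_y, eye_right_x - eye_left_x)      # about 0.098 rad

item_scale = eye_distance / EP_distance             # about 0.50 -> draw the item at half size
item_angle = eye_angle - EP_angle                    # about 5.6 degrees of tilt
print(item_scale, math.degrees(item_angle))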
(py37w) PS > python .\virtual_fitting_eyepoint_tool.py ../../Images/parts/glass01.png
# -*- coding: utf-8 -*- ##------------------------------------------ ## OpenVINO™ toolkit ## Virtual Fitting Application eyepoint tool ## ## 2021.04.19 Masahiro Izutsu ##------------------------------------------ ## python3 virtual_fitting_ipoint_tool.py [-item (0-7)] ## 2021.12.24 linux/windows # 応用プログラミングで補助ツール作成 # https://jellyware.jp/aicorex/contents/out_c09_tool.html #================================================== # 使い方 #================================================== # Toolウィンドウ上部のトラックバーをマウスでドラッグし拡大・縮小 # Toolウィンドウ内のアイテム画像をマウスでドラッグし移動 # Eye Pointウィンドウの表示数値がEyePointの座標となる #================================================== # import #================================================== import cv2 import numpy as np import argparse from pngoverlay import PNGOverlay import mylib_gui #================================================== # 設定 #================================================== # 背景画像 image_background = '../../Images/photo_m.jpg' # アイテム画像 ITEM_PATH = '../../Images/parts/' # ~/ で指定すると home ディレクトリの展開ができない ITEM_LIST = ['glass01.png','glass02.png','glass03.png','glass04.png','glass05.png','cap01.png','cap02.png','cap03.png','cap04.png','cap05.png'] # トラックバー最大値(アイテム画像の拡大縮小の分解能) track_position_max = 1000 # トラックバー最大時のアイテム画像の倍率 scale_rate = 2 # ディープラーニング推論した目の位置の座標(背景画像を変更したら要修正) eye_left_x, eye_left_y, eye_right_x, eye_right_y = 366, 162, 432, 162 # y座標は同じ位置になるように微修正済 # Parses arguments for the application def parse_args(): parser = argparse.ArgumentParser() parser.add_argument('-item', '--itemindex', type=int, default = 0, help = 'Item Index number.') return parser #================================================== # マウス、トラックバーのコールバック #================================================== # マウスのコールバック関数 def callback_mouse(event, x, y, flags, param): global item_x, item_y, flag_mouse_drag, cursor_offset_x, cursor_offset_y if event == cv2.EVENT_LBUTTONDOWN: if x >= item_x - item.width/2 and x <= item_x + item.width/2 and y >= item_y - item.height/2 and y <= item_y + item.height/2: flag_mouse_drag = True cursor_offset_x = item_x - x cursor_offset_y = item_y - y elif event == cv2.EVENT_MOUSEMOVE: if flag_mouse_drag == True: item_x = x + cursor_offset_x item_y = y + cursor_offset_y elif event == cv2.EVENT_LBUTTONUP: flag_mouse_drag = False # トラックバーのコールバック関数 def changeTrack(val): global track_position track_position = cv2.getTrackbarPos('scale', 'Tool') #0はエラーになるので強制的に1にする if track_position <= 0: track_position = 1 # マウスコールバック関数の登録 cv2.namedWindow('Tool', cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) cv2.setMouseCallback('Tool', callback_mouse) # トラックバーの生成とコールバック登録 track_position = int(track_position_max/2) # トラックバー初期位置 cv2.createTrackbar('scale', 'Tool', track_position, track_position_max, changeTrack) #================================================== # 準備 #================================================== # Argument parsing and parameter setting ARGS = parse_args().parse_args() item_index = ARGS.itemindex image_item = ITEM_PATH + ITEM_LIST[item_index] print(image_item) # 透過PNG画像のインスタンス生成 item = PNGOverlay(image_item) # EyePoint確認用の別ウィンドウにも透過PNG画像インスタンス生成 item2 = PNGOverlay(image_item) # マウスカーソルとアイテム画像の中心画像のオフセット cursor_offset_x = 0 cursor_offset_y = 0 # マウスドラッグ中を示すフラグ flag_mouse_drag = False # アイテム画像の中心座標 scale = track_position/track_position_max * scale_rate item.resize(scale) item_x = int(item.width/2) item_y = int(item.height/2) #================================================== # メインループ #================================================== while True: key = cv2.waitKey(1) if key 
== 27 or key == 113: # 'esc' or 'q' break if not mylib_gui._is_visible('Tool'): # 'Close' button break # 背景画像読み込み frame = cv2.imread(image_background) # 透過PNGを描画 scale = track_position/track_position_max * scale_rate item.resize(scale) item.show(frame, item_x , item_y) cv2.imshow('Tool', frame) # EyePointの計算 item_origin_x = item_x - int(item.width/2) #frameに対するアイテム画像の原点座標x item_origin_y = item_y - int(item.height/2) #frameに対するアイテム画像の原点座標y EyePoint_left_x = int((eye_left_x - item_origin_x) / scale) EyePoint_left_y = int((eye_left_y - item_origin_y) / scale) EyePoint_right_x = int((eye_right_x - item_origin_x) / scale) EyePoint_right_y = int((eye_right_y - item_origin_y) / scale) # EyePointの表示文字列 text_left = 'L : ' + str(EyePoint_left_x) + ', ' + str(EyePoint_left_y) text_right = 'R : ' + str(EyePoint_right_x) + ', ' + str(EyePoint_right_y) # EyePoint確認用の別ウィンドウ frame2 = np.zeros((item2.height + 300, item2.width, 3), np.uint8) + 255 # 白画生成 item2.show(frame2, int(item2.width/2) , int(item2.height/2)) cv2.circle(frame2, (EyePoint_left_x, EyePoint_left_y), 10, (0, 0, 255), thickness=-1) cv2.circle(frame2, (EyePoint_right_x, EyePoint_right_y), 10, (0, 0, 255), thickness=-1) cv2.putText(frame2, text_left, (20, frame2.shape[0] - 60), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2) cv2.putText(frame2, text_right, (20, frame2.shape[0] - 20), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2) cv2.namedWindow('Eye Point', cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) cv2.imshow('Eye Point', frame2) #================================================== # 終了処理 #================================================== cv2.destroyAllWindows()
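The "L : x, y" and "R : x, y" values shown in the Eye Point window are exactly the numbers to record for a new item: in virtual_fitting.py they become one [EPL_x, EPL_y, EPR_x, EPR_y] entry of ITEM_PARAM, indexed in the same order as ITEM_LIST. A hypothetical example of registering an additional item (the file name and coordinates below are placeholders, not measured values):

# In virtual_fitting.py -- appending a hypothetical new item.
ITEM_LIST.append('glass06.png')              # new PNG placed in ../../Images/parts/
ITEM_PARAM.append([170, 260, 370, 260])      # [EPL_x, EPL_y, EPR_x, EPR_y] read from the tool
# The new item would then be selected with:  python virtual_fitting.py -item 10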
(py37w) PS > python .\face_mask.py

--- Face Mask Check ---
3.4.2
OpenVINO inference_engine: 2021.4.2-3974-e2a469a3450-releases/2021/4

Face Mask Check: Starting application...
 - Image File : ../../Images/mask-test.jpg
 - m_detect : ../../model/intel/FP32/face-detection-adas-0001.xml
 - m_mask : ./models/face_mask.xml
 - Device : CPU
 - Language : jp
 - Input Shape1 : data
 - Output Shape1: detection_out
 - Input Shape2 : data
 - Output Shape2: fc5
 - Program Title: y
 - Speed flag : y
 - Processed out: non

FPS average: 10.70

 Finished.
# -*- coding: utf-8 -*- ##------------------------------------------ ## OpenVINO™ toolkit ## Face Mask Check ## ## model: face-detection-adas-0001 ## face_mask ## ## 2021.06.21 Masahiro Izutsu ##------------------------------------------ ## face_mask.py ## 2021.12.24 linux/windows # Color Escape Code GREEN = '\033[1;32m' RED = '\033[1;31m' NOCOLOR = '\033[0m' YELLOW = '\033[1;33m' # 定数定義 WINDOW_WIDTH = 640 BOX_COLOR_OK = ( 0,255, 0) BOX_COLOR_ER = ( 0, 0, 255) LABEL_BG_COLOR_OK = ( 0, 180, 0) # greyish green background for text LABEL_BG_COLOR_ER = ( 0, 0, 240) # greyish red background for text TEXT_COLOR = (255, 255, 255) # white text from os.path import expanduser MODEL_DEF_FACE = expanduser('../../model/intel/FP32/face-detection-adas-0001.xml') MODEL_DEF_MASK = expanduser('./models/face_mask.xml') INPUT_DEF = expanduser('../../Images/mask-test.jpg') # モジュール読み込み from openvino.inference_engine import IECore from openvino.inference_engine import get_version # import処理 import sys import cv2 import numpy as np import argparse import myfunction import mylib import mylib_gui import platform # タイトル・バージョン情報 title = 'Face Mask Check' print(GREEN) print('--- {} ---'.format(title)) print(cv2.__version__) print("OpenVINO inference_engine:", get_version()) print(NOCOLOR) # Parses arguments for the application def parse_args(): parser = argparse.ArgumentParser() parser.add_argument('-i', '--image', metavar = 'IMAGE_FILE', type=str, default = INPUT_DEF, help = 'Absolute path to image file or cam for camera stream.') parser.add_argument('-m_dt', '--m_detector', type=str, default = MODEL_DEF_FACE, help = 'Detector Path to an .xml file with a trained model.' 'Default value is '+MODEL_DEF_FACE) parser.add_argument('-m_mk', '--m_mask', type=str, default = MODEL_DEF_MASK, help = 'Face-mask Path to an .xml file with a trained model.' 'Default value is '+MODEL_DEF_MASK) parser.add_argument('-d', '--device', default = 'CPU', type=str, help = 'Optional. Specify a target device to infer on. CPU, GPU, FPGA, HDDL or MYRIAD is ' 'acceptable. The demo will look for a suitable plugin for the device specified. ' 'Default value is CPU') parser.add_argument('-l', '--language', metavar = 'LANGUAGE', default = 'jp', help = 'Language.(jp/en) Default value is \'jp\'') parser.add_argument('-t', '--title', metavar = 'TITLE', default = 'y', help = 'Program title flag.(y/n) Default value is \'y\'') parser.add_argument('-s', '--speed', metavar = 'SPEED', default = 'y', help = 'Speed display flag.(y/n) Default calue is \'y\'') parser.add_argument('-o', '--out', metavar = 'IMAGE_OUT', default = 'non', help = 'Processed image file path. Default value is \'non\'') return parser # モデル基本情報の表示 def display_info(image, detector, mask, device, lang, input_blob, out_blob, input_blob_mask, out_blob_mask, titleflg, speedflg, outpath): print(YELLOW + title + ': Starting application...' 
+ NOCOLOR) print(' - ' + YELLOW + 'Image File : ' + NOCOLOR, image) print(' - ' + YELLOW + 'm_detect : ' + NOCOLOR, detector) print(' - ' + YELLOW + 'm_mask : ' + NOCOLOR, mask) print(' - ' + YELLOW + 'Device : ' + NOCOLOR, device) print(' - ' + YELLOW + 'Language : ' + NOCOLOR, lang) print(' - ' + YELLOW + 'Input Shape1 : ' + NOCOLOR, input_blob) print(' - ' + YELLOW + 'Output Shape1: ' + NOCOLOR, out_blob) print(' - ' + YELLOW + 'Input Shape2 : ' + NOCOLOR, input_blob_mask) print(' - ' + YELLOW + 'Output Shape2: ' + NOCOLOR, out_blob_mask) print(' - ' + YELLOW + 'Program Title: ' + NOCOLOR, titleflg) print(' - ' + YELLOW + 'Speed flag : ' + NOCOLOR, speedflg) print(' - ' + YELLOW + 'Processed out: ' + NOCOLOR, outpath) # 画像の種類を判別する # 戻り値: 'jeg''png'... 画像ファイル # 'None' 画像ファイル以外 (動画ファイル) # 'NotFound' ファイルが存在しない import imghdr def is_pict(filename): try: imgtype = imghdr.what(filename) except FileNotFoundError as e: imgtype = 'NotFound' return str(imgtype) # ** main関数 ** def main(): # 日本語フォント指定 if platform.system()=='Windows': fontPIL = 'meiryo.ttc' # ゴシック体 else: fontPIL = 'NotoSansCJK-Bold.ttc' # ゴシック体 # Argument parsing and parameter setting ARGS = parse_args().parse_args() input_stream = ARGS.image lang = ARGS.language titleflg = ARGS.title speedflg = ARGS.speed if ARGS.image.lower() == "cam" or ARGS.image.lower() == "camera": input_stream = 0 isstream = True else: filetype = is_pict(input_stream) isstream = filetype == 'None' if (filetype == 'NotFound'): print(RED + "\ninput file Not found." + NOCOLOR) quit() model_detector = ARGS.m_detector model_mask = ARGS.m_mask device = ARGS.device outpath = ARGS.out # 判定ラベル if (lang == 'jp'): label = ('マスクをつけて!', 'マスク装着') else: label = ('NOT wearing a Mask !!!', 'earing a Mask') # モデルの読み込み (顔検出)face-detection-adas-0001 ie = IECore() net = ie.read_network(model = model_detector, weights = model_detector[:-4] + '.bin') exec_net = ie.load_network(network = net, device_name = device) # 入出力設定(顔検出) input_key = list(net.input_info.keys())[0] # 入力データ・キー名 input_blob_name = net.input_info[input_key].name output_blob_name = next(iter(net.outputs)) input_blob = net.input_info[input_blob_name].name out_blob = next(iter(net.outputs)) n, c, h, w = net.input_info[input_blob].input_data.shape # モデルの読み込み(マスク装着)face-mask net_mask = ie.read_network(model = model_mask, weights = model_mask[:-4] + '.bin') exec_net_mask = ie.load_network(network = net_mask, device_name=device) # 入出力設定(年齢/性別) input_key_mask = list(net.input_info.keys())[0] # 入力データ・キー名 input_blob_name_mask = net.input_info[input_key_mask].name output_blob_name_mask = next(iter(net_mask.outputs)) input_blob_mask = net.input_info[input_blob_name_mask].name out_blob_mask = next(iter(net_mask.outputs)) n_mask, c_mask, h_mask, w_mask = net_mask.input_info[input_blob_mask].input_data.shape # 情報表示 display_info(input_stream, model_detector, model_mask, device, lang, input_blob, out_blob, input_blob_mask, out_blob_mask, titleflg, speedflg, outpath) # 入力準備 if (isstream): # カメラ cap = cv2.VideoCapture(input_stream) ret, frame = cap.read() loopflg = cap.isOpened() else: # 画像ファイル読み込み frame = cv2.imread(input_stream) if frame is None: print(RED + "\nUnable to read the input." 
+ NOCOLOR) quit() # アスペクト比を固定してリサイズ img_h, img_w = frame.shape[:2] if (img_w > WINDOW_WIDTH): height = round(img_h * (WINDOW_WIDTH / img_w)) frame = cv2.resize(frame, dsize = (WINDOW_WIDTH, height)) loopflg = True # 1回ループ # 処理結果の記録 step1 if (outpath != 'non'): if (isstream): fps = int(cap.get(cv2.CAP_PROP_FPS)) out_w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) out_h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v') outvideo = cv2.VideoWriter(outpath, fourcc, fps, (out_w, out_h)) # 計測値初期化 fpsWithTick = mylib.fpsWithTick() frame_count = 0 fps_total = 0 fpsWithTick.get() # fps計測開始 # メインループ while (loopflg): if frame is None: print(RED + "\nUnable to read the input." + NOCOLOR) quit() # 入力データフォーマットへ変換 img = cv2.resize(frame, (w, h)) # サイズ変更 img = img.transpose((2, 0, 1)) # HWC > CHW img = np.expand_dims(img, axis=0) # 次元合せ # 推論実行 out = exec_net.infer(inputs={input_blob_name: img}) # 出力から必要なデータのみ取り出し out = out[output_blob_name] out = np.squeeze(out) # サイズ1の次元を全て削除 # 検出されたすべての顔領域に対して1つずつ処理 for detection in out: # conf値の取得 confidence = float(detection[2]) # バウンディングボックス座標を入力画像のスケールに変換 xmin = int(detection[3] * frame.shape[1]) ymin = int(detection[4] * frame.shape[0]) xmax = int(detection[5] * frame.shape[1]) ymax = int(detection[6] * frame.shape[0]) # conf値が0.5より大きい場合のみバウンディングボックス表示 if confidence > 0.5: # 顔検出領域はカメラ範囲内に補正する。特にminは補正しないとエラーになる if xmin < 0: xmin = 0 if ymin < 0: ymin = 0 if xmax > frame.shape[1]: xmax = frame.shape[1] if ymax > frame.shape[0]: ymax = frame.shape[0] # 顔領域のみ切り出し frame_face = frame[ ymin:ymax, xmin:xmax ] # 入力データフォーマットへ変換 img = cv2.resize(frame_face, (w_mask, h_mask)) # サイズ変更 img = img.transpose((2, 0, 1)) # HWC > CHW img = np.expand_dims(img, axis=0) # 次元合せ # 推論実行 out = exec_net_mask.infer(inputs={input_blob_name_mask: img}) # 出力から必要なデータのみ取り出し mask_out = out[output_blob_name_mask] mask_out = np.squeeze(mask_out) #不要な次元の削減 mask_flg = False if mask_out < 0.0: box_color = BOX_COLOR_ER label_bgcolor = LABEL_BG_COLOR_ER out_str = label[0] else: mask_flg = True box_color = BOX_COLOR_OK label_bgcolor = LABEL_BG_COLOR_OK out_str = label[1] label_text_color = TEXT_COLOR # バウンディングボックス(顔領域)表示 cv2.rectangle(frame, (xmin, ymin-20), (xmax, ymin), label_bgcolor, -1) # cv2.putText(frame, out_str, (xmin, ymin-4), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.6, color=cor, lineType=cv2.LINE_AA) myfunction.cv2_putText(img = frame, text = out_str, org = (xmin+2, ymin-4), fontFace = fontPIL, fontScale = 12, color = label_text_color, mode = 0) cv2.rectangle(frame, (xmin, ymin-20), (xmax, ymax), box_color, thickness = 1) # FPSを計算する fps = fpsWithTick.get() st_fps = 'fps: {:>6.2f}'.format(fps) if (speedflg == 'y'): cv2.rectangle(frame, (10, 38), (95, 55), (90, 90, 90), -1) cv2.putText(frame, st_fps, (15, 50), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.4, color=(255, 255, 255), lineType=cv2.LINE_AA) # タイトル描画 if (titleflg == 'y'): cv2.putText(frame, title, (10, 30), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.8, color=(200, 200, 0), lineType=cv2.LINE_AA) # 画像表示 window_name = title + ' (hit key to exit)' cv2.namedWindow(window_name, cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) cv2.imshow(window_name, frame) # 処理結果の記録 step2 if (outpath != 'non'): if (isstream): outvideo.write(frame) else: cv2.imwrite(outpath, frame) # 何らかのキーが押されたら終了 breakflg = False while(True): key = cv2.waitKey(1) if key == 27 or key == 113: # 'esc' or 'q' breakflg = True break if not mylib_gui._is_visible(window_name): # 'Close' button breakflg = True break if (isstream): break if ((breakflg == False) and 
isstream): # 次のフレームを読み出す ret, frame = cap.read() if ret == False: break loopflg = cap.isOpened() else: loopflg = False # 終了処理 if (isstream): cap.release() # 処理結果の記録 step3 if (outpath != 'non'): if (isstream): outvideo.release() cv2.destroyAllWindows() print('\nFPS average: {:>10.2f}'.format(fpsWithTick.get_average())) print('\n Finished.') # main関数エントリーポイント(実行開始) if __name__ == "__main__": sys.exit(main())
Source code location (all of the scripts support both the Linux and Windows environments)
(py37w) PS > cd \anaconda_win\workspace_py37\pyocr
(py37w) PS > python .\initialization.py
Will use tool 'Tesseract (sh)'
Available languages: eng, jpn, jpn_vert, osd
Will use lang 'eng'
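initialization.py itself is not listed on this page, but its output matches the standard pyocr start-up check: detect the installed OCR tool, then list the trained language data. A minimal sketch that produces this kind of output, assuming the script follows the usual pyocr pattern, looks like this:

# Minimal pyocr environment check (a sketch -- the actual initialization.py may differ).
import sys
import pyocr

tools = pyocr.get_available_tools()
if len(tools) == 0:
    print('No OCR tool found')
    sys.exit(1)

tool = tools[0]
print("Will use tool '%s'" % (tool.get_name()))          # e.g. 'Tesseract (sh)'

langs = tool.get_available_languages()
print('Available languages: %s' % ', '.join(langs))      # e.g. eng, jpn, jpn_vert, osd

lang = langs[0]
print("Will use lang '%s'" % (lang))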
(py37w) PS \pyocr> python .\ocrtest.py Tesseract (テッセラクト)は、さまざまなオペレーティングシステム上で動作 する光学式文字認識エンジン? 。名称のTesseractとは四次元超立方体の意で ある。Apache Licenseの下でリリースされたフリーソフトウエアである!*? 。文字認識を行うライブラリと、それを用いたコマンドラインインターフェ イスを持つ。 もともとは98o年代にプロプライエタリソフトウェアとしてヒューレット・ パッカードが開発していたが、>oo5年にオープンソースとしてリリースさ れ、開発は>oo6年からGoogleが後援している5 。 >oo6年、Tesseractは当時入手可能な最も正確なオープンソースOCRエンジン の1つと見なされた3? 7 。 歴史 Tesseractエンジンは、+985年から1994年にかけて、英国ブリストルとコロラ ド州グリーリーにあるヒューレット・パッカードラボでプロプライエタリソ フトウェアとして開発されていた。+g96年にさらに変更が加えられて Windowsへ移植され、1998年にCからC ++に移行した。コードの多くはCで 記述されており、部分的にC++で記述されている。そわれ以来、すべてのコード は少なくともC++コンパイラでコンバイルするように変換されている* 。 次の 1o年間はほとんど変更がなかった。その後、oo5年にヒュユーレット・パッカ ードとネバダ大学ラスベガス校 (UNLV) によってオープンソースとしてリリ ースされた。 Tesseractの開発はsoo6年からGoogleが後援している5 。 特徴 Tesseractは、1gg5年の時点で文字認識精度が良い上位3つのOCRエンジンの うちの一つだった? 。 TesseractはLinux、Windows、Mac OS Xで利用できる が、開発リソースの制限により、WindowsとUbuntuの開発者によってのみ厳 格なテストが行われている#? 。 バージョン>ぅまでのTesseractは、単純な+列のテキストのTIFF画像のみの入力 が可能だった。初期のバージョンにはレイアウト分析が含まれていなかったた め、複数列のデキスト、画像、数式を入力すると、文字化けした出力が生成さ れた。バージョン3.oo以降、Tesseractは出力テキストのフォーマット、 hOCR ? 位置情報、ページレイアウト分析に対応した。 また、Leptonicaライ ブラリの使用により、いくつかの新しい画像形式に対応した。 Tesseractで は、テキストが等幅かプロポーショナルかを検出するごとができる? 。
(py37w) PS > python .\ocrtest1.py

--- OCR Test-1 Text recognition ---
 OpenCV version 3.4.2

OCR Test-1 Text recognition: Starting application...
 - Image File : test1.png
 - Language : jpn
 - Layout : 6

---------------------------
最新情報
・画像認識 (Image Recognition) とは
・物体検出アルゴリズム「YOLO V5」
・顔認証 (Pace recognition) 概要
・敵対的生成ネットワーク(GAN)
---------------------------
# -*- coding: utf-8 -*-
##------------------------------------------
## OCR Test-1 Text recognition
##     with tesseract & PyOCR
##
##               2021.11.15 Masahiro Izutsu
##------------------------------------------
## ocrtest1.py

# Color Escape Code
GREEN = '\033[1;32m'
RED = '\033[1;31m'
NOCOLOR = '\033[0m'
YELLOW = '\033[1;33m'

# 定数定義
from os.path import expanduser
DEF_INPUT_FILE = expanduser('test1.png')

# import処理
from PIL import Image
import sys
import pyocr
import pyocr.builders
import cv2
import argparse

# タイトル・バージョン情報
title = 'OCR Test-1 Text recognition'
print(GREEN)
print('--- {} ---'.format(title))
print(' OpenCV version {} '.format(cv2.__version__))
print(NOCOLOR)

# Parses arguments for the application
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', '--image', metavar = 'IMAGE_FILE', type = str, default = DEF_INPUT_FILE,
        help = 'Absolute path to image file. Default value is \'' + DEF_INPUT_FILE + '\'')
    parser.add_argument('-l', '--language', metavar = 'LANGUAGE', default = 'jpn',
        help = 'Language. Default value is \'jpn\'')
    parser.add_argument('--layout', metavar = 'LAYOUT', default = 6,
        help = 'tesseract layout Default value is 6')
    return parser

# モデル基本情報の表示
def display_info(image, lang, layout):
    print(YELLOW + title + ': Starting application...' + NOCOLOR)
    print(' - ' + YELLOW + 'Image File : ' + NOCOLOR, image)
    print(' - ' + YELLOW + 'Language   : ' + NOCOLOR, lang)
    print(' - ' + YELLOW + 'Layout     : ' + NOCOLOR, layout)

# ** main関数 **
def main():
    # Argument parsing and parameter setting
    ARGS = parse_args().parse_args()
    input_stream = ARGS.image
    lang = ARGS.language
    layout = int(ARGS.layout)

    # 情報表示
    display_info(input_stream, lang, layout)

    # OCR
    tools = pyocr.get_available_tools()
    if len(tools) == 0:
        print(RED + "\nOCR tool Not found." + NOCOLOR)
        quit()
    tool = tools[0]

    # txt is a Python string
    txt = tool.image_to_string(Image.open(input_stream), lang=lang,
        builder=pyocr.builders.TextBuilder(tesseract_layout=layout))

    # 取得テキスト
    print('\n---------------------------')
    print(txt)
    print('---------------------------\n')

# main関数エントリーポイント(実行開始)
if __name__ == "__main__":
    sys.exit(main())
# -*- coding: utf-8 -*- ##------------------------------------------ ## OCR Test-2 Text list of box objects ## with tesseract & PyOCR ## ## 2021.11.15 Masahiro Izutsu ##------------------------------------------ ## ocrtest2.py # Color Escape Code GREEN = '\033[1;32m' RED = '\033[1;31m' NOCOLOR = '\033[0m' YELLOW = '\033[1;33m' # 定数定義 from os.path import expanduser DEF_INPUT_FILE = expanduser('test1.png') # import処理 from PIL import Image import sys import pyocr import pyocr.builders import cv2 import argparse # タイトル・バージョン情報 title = 'OCR Test-2 list of box objects' print(GREEN) print('--- {} ---'.format(title)) print(' OpenCV version {} '.format(cv2.__version__)) print(NOCOLOR) # Parses arguments for the application def parse_args(): parser = argparse.ArgumentParser() parser.add_argument('-i', '--image', metavar = 'IMAGE_FILE', type = str, default = DEF_INPUT_FILE, help = 'Absolute path to image file. Default value is \'' + DEF_INPUT_FILE + '\'') parser.add_argument('-l', '--language', metavar = 'LANGUAGE', default = 'jpn', help = 'Language. Default value is \'jpn\'') parser.add_argument('--layout', metavar = 'LAYOUT', default = 6, help = 'tesseract layout Default value is 6') return parser # モデル基本情報の表示 def display_info(image, lang, layout): print(YELLOW + title + ': Starting application...' + NOCOLOR) print(' - ' + YELLOW + 'Image File : ' + NOCOLOR, image) print(' - ' + YELLOW + 'Language : ' + NOCOLOR, lang) print(' - ' + YELLOW + 'Layout : ' + NOCOLOR, layout) # ** main関数 ** def main(): # Argument parsing and parameter setting ARGS = parse_args().parse_args() input_stream = ARGS.image lang = ARGS.language layout = int(ARGS.layout) # 情報表示 display_info(input_stream, lang, layout) # OCR tools = pyocr.get_available_tools() if len(tools) == 0: print(RED + "\nOCR tool Not found." + NOCOLOR) quit() tool = tools[0] # list of box objects. For each box object: # box.content is the word in the box # box.position is its position on the page (in pixels) # # Beware that some OCR tools (Tesseract for instance) # may return empty boxes # (訳) # ボックスオブジェクトのリスト ボックスオブジェクトごとに: # # box.contentはボックス内の単語 # box.positionは、ページ上の位置(ピクセル単位 # 一部のOCRツール(Tesseractなど)に注意 # 空のボックスを返す場合がある word_boxes = tool.image_to_string(Image.open(input_stream), lang=lang, builder=pyocr.builders.WordBoxBuilder(tesseract_layout=layout)) # 取得テキスト print('\n---------------------------') for box in word_boxes: print(box.content) print(box.position) print('---------------------------\n') # main関数エントリーポイント(実行開始) if __name__ == "__main__": sys.exit(main())
# -*- coding: utf-8 -*- ##------------------------------------------ ## OCR Test-3 Text list of line objects ## with tesseract & PyOCR ## ## 2021.11.15 Masahiro Izutsu ##------------------------------------------ ## ocrtest3.py # Color Escape Code GREEN = '\033[1;32m' RED = '\033[1;31m' NOCOLOR = '\033[0m' YELLOW = '\033[1;33m' # 定数定義 from os.path import expanduser DEF_INPUT_FILE = expanduser('test1.png') # import処理 from PIL import Image import sys import pyocr import pyocr.builders import cv2 import argparse # タイトル・バージョン情報 title = 'OCR Test-3 list of line objects' print(GREEN) print('--- {} ---'.format(title)) print(' OpenCV version {} '.format(cv2.__version__)) print(NOCOLOR) # Parses arguments for the application def parse_args(): parser = argparse.ArgumentParser() parser.add_argument('-i', '--image', metavar = 'IMAGE_FILE', type = str, default = DEF_INPUT_FILE, help = 'Absolute path to image file. Default value is \'' + DEF_INPUT_FILE + '\'') parser.add_argument('-l', '--language', metavar = 'LANGUAGE', default = 'jpn', help = 'Language. Default value is \'jpn\'') parser.add_argument('-c', '--confidence', metavar = 'CONFIDENCE', default = 70, help = 'Confidence Level Default value is 70') parser.add_argument('--layout', metavar = 'LAYOUT', default = 6, help = 'tesseract layout Default value is 6') return parser # モデル基本情報の表示 def display_info(image, lang, conf, layout): print(YELLOW + title + ': Starting application...' + NOCOLOR) print(' - ' + YELLOW + 'Image File : ' + NOCOLOR, image) print(' - ' + YELLOW + 'Language : ' + NOCOLOR, lang) print(' - ' + YELLOW + 'Confidence : ' + NOCOLOR, conf) print(' - ' + YELLOW + 'Layout : ' + NOCOLOR, layout) # ** main関数 ** def main(): # Argument parsing and parameter setting ARGS = parse_args().parse_args() input_stream = ARGS.image lang = ARGS.language conf = int(ARGS.confidence) layout = int(ARGS.layout) # 情報表示 display_info(input_stream, lang, conf, layout) # OCR tools = pyocr.get_available_tools() if len(tools) == 0: print(RED + "\nOCR tool Not found." + NOCOLOR) quit() tool = tools[0] # list of line objects. For each line object: # line.word_boxes is a list of word boxes (the individual words in the line) # line.content is the whole text of the line # line.position is the position of the whole line on the page (in pixels) # # Each word box object has an attribute 'confidence' giving the confidence # score provided by the OCR tool. Confidence score depends entirely on # the OCR tool. Only supported with Tesseract and Libtesseract (always 0 # with Cuneiform). # # Beware that some OCR tools (Tesseract for instance) may return boxes # with an empty content. # (訳) # ラインオブジェクトのリスト 各ラインオブジェクトの場合: # line.word_boxesは、単語ボックス(行内の個々の単語)のリスト # line.contentは行の全文 # line.positionは、ページ上の行全体の位置(ピクセル単位) # # 各ワードボックスオブジェクトには、信頼性を与える属性「confidence」がある # OCRツールによって提供されるスコア。 # # 一部のOCRツール(Tesseractなど)が空のボックスを返す場合があることに注意 # OpenCV でイメージを読む frame = cv2.imread(input_stream) if frame is None: print(RED + "\nUnable to read the input." 
+ NOCOLOR) quit() # PILのイメージにする frame1 = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR) img = Image.fromarray(frame1) line_and_word_boxes = tool.image_to_string(img, lang=lang, builder=pyocr.builders.LineBoxBuilder(tesseract_layout=layout)) # 取得データ print('\n---------------------------') for lw_box in line_and_word_boxes: content = lw_box.content position = lw_box.position box = [] txt = [] n = 0 for lw_box in lw_box.word_boxes: txt.append(lw_box.content) box.append(lw_box.position) n = n+1 confidence = lw_box.confidence if confidence > conf: print('contents: ', content) print('position: ', position) print('confidence: ', confidence) for nm in range(n): print(' {: <8}'.format(txt[nm]), ' ', box[nm]) print('\n') print('---------------------------\n') # main関数エントリーポイント(実行開始) if __name__ == "__main__": sys.exit(main())
# -*- coding: utf-8 -*- ##------------------------------------------ ## OCR Test-4 Text list of line objects Display ## with tesseract & PyOCR ## ## 2021.11.15 Masahiro Izutsu ##------------------------------------------ ## ocrtest4.py ## 2022.01.04 linux/windows # Color Escape Code GREEN = '\033[1;32m' RED = '\033[1;31m' NOCOLOR = '\033[0m' YELLOW = '\033[1;33m' # 定数定義 LINE_WORD_BOX_COLOR = (0, 0, 240) WORD_BOX_COLOR = (255, 0, 0) CONTENTS_COLOR = (0, 128, 0) from os.path import expanduser DEF_INPUT_FILE = expanduser('test1.png') # import処理 from PIL import Image import sys import pyocr import pyocr.builders import cv2 import argparse import myfunction import numpy as np import mylib_gui import platform # タイトル・バージョン情報 title = 'OCR Test-4 list of line objects Display' print(GREEN) print('--- {} ---'.format(title)) print(' OpenCV version {} '.format(cv2.__version__)) print(NOCOLOR) # Parses arguments for the application def parse_args(): parser = argparse.ArgumentParser() parser.add_argument('-i', '--image', metavar = 'IMAGE_FILE', type = str, default = DEF_INPUT_FILE, help = 'Absolute path to image file. Default value is \'' + DEF_INPUT_FILE + '\'') parser.add_argument('-l', '--language', metavar = 'LANGUAGE', default = 'jpn', help = 'Language. Default value is \'jpn\'') parser.add_argument('-c', '--confidence', metavar = 'CONFIDENCE', default = 40, help = 'Confidence Level Default value is 40') parser.add_argument('-p', '--prosess', metavar = 'PROCESS', default = 'n', help = 'Preprocessing flag.(y/n) Default value is \'n\'') parser.add_argument('-d', '--linedel', metavar = 'LINEDEL', default = 'n', help = 'Line delete flag.(y/b/n) Default value is \'n\'') parser.add_argument('--layout', metavar = 'LAYOUT', default = 6, help = 'Tesseract layout Default value is 6') parser.add_argument('--maxsize', metavar = 'MAXSIZE', default = 0, help = 'Image max size (free=0). Default value is 0') parser.add_argument('--log', metavar = 'LOG', default = 'n', help = 'Log output flag.(y/n) Default value is \'n\'') parser.add_argument('-t', '--title', metavar = 'TITLE', default = 'y', help = 'Program title flag.(y/n) Default value is \'y\'') parser.add_argument('-o', '--out', metavar = 'IMAGE_OUT', default = 'non', help = 'Processed image file path. Default value is \'non\'') return parser # モデル基本情報の表示 def display_info(image, lang, prosess, linedel, conf, layout, maxsize, log, titleflg, outpath): print(YELLOW + title + ': Starting application...' 
+ NOCOLOR) print(' - ' + YELLOW + 'Image File : ' + NOCOLOR, image) print(' - ' + YELLOW + 'Language : ' + NOCOLOR, lang) print(' - ' + YELLOW + 'Preprocessing: ' + NOCOLOR, prosess) print(' - ' + YELLOW + 'Line delete : ' + NOCOLOR, linedel) print(' - ' + YELLOW + 'Confidence : ' + NOCOLOR, conf) print(' - ' + YELLOW + 'Layout : ' + NOCOLOR, layout) print(' - ' + YELLOW + 'Max size : ' + NOCOLOR, maxsize) print(' - ' + YELLOW + 'Log frag : ' + NOCOLOR, log) print(' - ' + YELLOW + 'Program Title: ' + NOCOLOR, titleflg) print(' - ' + YELLOW + 'Processed out: ' + NOCOLOR, outpath) # 画像の前処理 def img_preproces(img): # グレイスケール演算 im_gray = 0.299 * img[:,:,2] + 0.587 * img[:,:,1] + 0.114 * img[:,:,0] im_gray8 = np.uint8(im_gray) # 大津アルゴリズムでは thresh, maxvalは無視されてしきい値は自動で設定される ret, im_gray8 = cv2.threshold(im_gray8, thresh=0, maxval=255, type=cv2.THRESH_BINARY + cv2.THRESH_OTSU) # すべてのチャンネル img[:,:,0] = im_gray8 img[:,:,1] = im_gray8 img[:,:,2] = im_gray8 return img # 罫線消去 def delete_line(img): # 自動パラメータの計算 h, w = img.shape[:2] thr = 100 lln = int(w/18) if lln < 44: lln = 44 gap = int(w/1000) + 4 print('\nThreshhold={}, MinLineLength={}, MaxLineGap={}, width={}, height={}'.format(thr, lln, gap, w, h)) imgw = img.copy() # グレースケール gray = cv2.cvtColor(imgw, cv2.COLOR_BGR2GRAY) # 2値化 ret, gray = cv2.threshold(gray, thresh=0, maxval=255, type=cv2.THRESH_BINARY + cv2.THRESH_OTSU) ## 反転 ネガポジ変換 gray = cv2.bitwise_not(gray) lines = cv2.HoughLinesP(gray, rho=1, theta=np.pi/360, threshold=thr, minLineLength=lln, maxLineGap=gap) if lines is not None: for line in lines: x1, y1, x2, y2 = line[0] # 線を消す(白で線を引く) imgw = cv2.line(imgw, (x1,y1), (x2,y2), (255,255,255), 3) return imgw # ** main関数 ** def main(): # Argument parsing and parameter setting ARGS = parse_args().parse_args() input_stream = ARGS.image lang = ARGS.language prosess = ARGS.prosess linedel = ARGS.linedel conf = int(ARGS.confidence) layout = int(ARGS.layout) maxsize = int(ARGS.maxsize) logflg = ARGS.log titleflg = ARGS.title outpath = ARGS.out # 情報表示 display_info(input_stream, lang, prosess, linedel, conf, layout, maxsize, logflg, titleflg, outpath) # OCR tools = pyocr.get_available_tools() if len(tools) == 0: print(RED + "\nOCR tool Not found." + NOCOLOR) quit() tool = tools[0] # list of line objects. For each line object: # line.word_boxes is a list of word boxes (the individual words in the line) # line.content is the whole text of the line # line.position is the position of the whole line on the page (in pixels) # # Each word box object has an attribute 'confidence' giving the confidence # score provided by the OCR tool. Confidence score depends entirely on # the OCR tool. Only supported with Tesseract and Libtesseract (always 0 # with Cuneiform). # # Beware that some OCR tools (Tesseract for instance) may return boxes # with an empty content. # (訳) # ラインオブジェクトのリスト 各ラインオブジェクトの場合: # line.word_boxesは、単語ボックス(行内の個々の単語)のリスト # line.contentは行の全文 # line.positionは、ページ上の行全体の位置(ピクセル単位) # # 各ワードボックスオブジェクトには、信頼性を与える属性「confidence」がある # OCRツールによって提供されるスコア。 # # 一部のOCRツール(Tesseractなど)が空のボックスを返す場合があることに注意 # OpenCV でイメージを読む frame = cv2.imread(input_stream) if frame is None: print(RED + "\nUnable to read the input." 
+ NOCOLOR) quit() if maxsize > 300: # アスペクト比を固定してリサイズ img_h, img_w = frame.shape[:2] if (img_w > img_h): if (img_w > maxsize): height = round(img_h * (maxsize / img_w)) frame = cv2.resize(frame, dsize = (maxsize, height)) else: if (img_h > maxsize): width = round(img_w * (maxsize / img_h)) frame = cv2.resize(frame, dsize = (width, maxsize)) # メッセージ作成 h, w = frame.shape[:2] st_pram = 'pros={}, linedel={}, conf={}, layout={} width={}, height={}'.format(prosess, linedel, conf, layout, w, h) # fontスケール(仮設定) img_h, img_w = frame.shape[:2] font_scale = 20 if img_w > 2000: font_scale = 40 # 画像の前処理 if (prosess == 'y'): # モノクロ・2値化 フォアグラウンド処理 frame = img_preproces(frame) if (linedel == 'y'): # 罫線除去 フォアグラウンド処理 frame = delete_line(frame) frame_pl = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR) elif (linedel == 'b'): # 罫線除去 バックグラウンド処理 frame_pl = delete_line(frame) frame_pl = cv2.cvtColor(frame_pl, cv2.COLOR_RGB2BGR) else: # 罫線除去なし frame_pl = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR) # PILのイメージにする img = Image.fromarray(frame_pl) # 文字認識処理 line_and_word_boxes = tool.image_to_string(img, lang=lang, builder=pyocr.builders.LineBoxBuilder(tesseract_layout=layout)) # 取得データ print('\n---------------------------') # 日本語フォント指定 if platform.system()=='Windows': fontPIL = 'meiryo.ttc' # ゴシック体 else: fontPIL = 'NotoSansCJK-Bold.ttc' # ゴシック体 for lw_box in line_and_word_boxes: content = lw_box.content position = lw_box.position box = [] txt = [] n = 0 for lw_box in lw_box.word_boxes: txt.append(lw_box.content) box.append(lw_box.position) n = n+1 confidence = lw_box.confidence if confidence > conf and len(content) > 0: xmin = position[0][0] ymin = position[0][1] xmax = position[1][0] ymax = position[1][1] cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), color=LINE_WORD_BOX_COLOR, thickness=3) for nm in range(n): cv2.rectangle(frame, (box[nm][0][0], box[nm][0][1]), (box[nm][1][0], box[nm][1][1]), color=WORD_BOX_COLOR, thickness=1) st_score = '#Score{:3}: '.format(confidence) + content myfunction.cv2_putText(img = frame, text = st_score, org = (xmin, ymin - 4), fontFace = fontPIL, fontScale = font_scale, color = CONTENTS_COLOR, mode = 0) print('\ncontents: ', content) print('position: ', position) print('confidence: ', confidence) if (logflg == 'y'): for nm in range(n): print(' {: <8}'.format(txt[nm]), ' ', box[nm]) print('---------------------------\n') # タイトル描画 if (titleflg == 'y'): cv2.putText(frame, title, (10, 30), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.8, color=(200, 200, 0), lineType=cv2.LINE_AA) cv2.putText(frame, st_pram, (50, 50), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.5, color=(0, 0, 0), lineType=cv2.LINE_AA) # 画像表示 cv2.namedWindow(title, flags=cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) cv2.imshow(title, frame) # 処理結果の記録(静止画) if (outpath != 'non'): cv2.imwrite(outpath, frame) # 何らかのキーが押されたら終了 while(True): key = cv2.waitKey(1) if key == 27 or key == 113: # 'esc' or 'q' break if not mylib_gui._is_visible(title): # 'Close' button break cv2.destroyAllWindows() # main関数エントリーポイント(実行開始) if __name__ == "__main__": sys.exit(main())
# -*- coding: utf-8 -*- ##------------------------------------------ ## OCR on python Ver0.01 ## with tesseract & PyOCR ## ## 2021.11.14 Masahiro Izutsu ##------------------------------------------ ## tryocr.py # Color Escape Code GREEN = '\033[1;32m' RED = '\033[1;31m' NOCOLOR = '\033[0m' YELLOW = '\033[1;33m' # 定数定義 WINDOW_MAX = 1280 # 最大表示サイズ LINE_WORD_BOX_COLOR = (0, 0, 240) WORD_BOX_COLOR = (255, 0, 0) CONTENTS_COLOR = (0, 128, 0) from os.path import expanduser DEF_INPUT_FILE = expanduser('test1.png') # import処理 from PIL import Image import sys import pyocr import pyocr.builders import cv2 import argparse import mylib import myfunction import numpy as np import mylib_gui import platform # タイトル・バージョン情報 title = 'OCR on python Ver0.01' print(GREEN) print('--- {} ---'.format(title)) print(' OpenCV version {} '.format(cv2.__version__)) print(NOCOLOR) # Parses arguments for the application def parse_args(): parser = argparse.ArgumentParser() parser.add_argument('-i', '--image', metavar = 'IMAGE_FILE', type = str, default = DEF_INPUT_FILE, help = 'Absolute path to image file or cam for camera stream. Default value is \'' + DEF_INPUT_FILE + '\'') parser.add_argument('-l', '--language', metavar = 'LANGUAGE', default = 'jpn', help = 'Language. Default value is \'jpn\'') parser.add_argument('-c', '--confidence', metavar = 'CONFIDENCE', default = 40, help = 'Confidence Level Default value is 40') parser.add_argument('-p', '--prosess', metavar = 'PROCESS', default = 'n', help = 'Preprocessing flag.(y/n) Default value is \'n\'') parser.add_argument('-d', '--linedel', metavar = 'LINEDEL', default = 'n', help = 'Line delete flag.(y/b/n) Default value is \'n\'') parser.add_argument('--layout', metavar = 'LAYOUT', default = 6, help = 'Tesseract layout Default value is 6') parser.add_argument('--maxsize', metavar = 'MAXSIZE', default = 0, help = 'Image max size (free=0). Default value is 0') parser.add_argument('--log', metavar = 'LOG', default = 'n', help = 'Log output flag.(y/s/n) Default value is \'n\'') parser.add_argument('-t', '--title', metavar = 'TITLE', default = 'y', help = 'Program title flag.(y/n) Default value is \'y\'') parser.add_argument('-s', '--speed', metavar = 'SPEED', default = 'y', help = 'Speed display flag.(y/n) Default value is \'y\'') parser.add_argument('-o', '--out', metavar = 'IMAGE_OUT', default = 'non', help = 'Processed image file path. Default value is \'non\'') return parser # モデル基本情報の表示 def display_info(image, lang, prosess, linedel, conf, layout, maxsize, log, titleflg, speedflg, outpath): print(YELLOW + title + ': Starting application...' + NOCOLOR) print(' - ' + YELLOW + 'Image File : ' + NOCOLOR, image) print(' - ' + YELLOW + 'Language : ' + NOCOLOR, lang) print(' - ' + YELLOW + 'Preprocessing: ' + NOCOLOR, prosess) print(' - ' + YELLOW + 'Line delete : ' + NOCOLOR, linedel) print(' - ' + YELLOW + 'Confidence : ' + NOCOLOR, conf) print(' - ' + YELLOW + 'Layout : ' + NOCOLOR, layout) print(' - ' + YELLOW + 'Max size : ' + NOCOLOR, maxsize) print(' - ' + YELLOW + 'Log frag : ' + NOCOLOR, log) print(' - ' + YELLOW + 'Program Title: ' + NOCOLOR, titleflg) print(' - ' + YELLOW + 'Speed flag : ' + NOCOLOR, speedflg) print(' - ' + YELLOW + 'Processed out: ' + NOCOLOR, outpath) # 画像の種類を判別する # 戻り値: 'jeg''png'... 
画像ファイル # 'None' 画像ファイル以外 (動画ファイル) # 'NotFound' ファイルが存在しない import imghdr def is_pict(filename): try: imgtype = imghdr.what(filename) except FileNotFoundError as e: imgtype = 'NotFound' return str(imgtype) # 画像の前処理 def img_preproces(img): # グレイスケール演算 im_gray = 0.299 * img[:,:,2] + 0.587 * img[:,:,1] + 0.114 * img[:,:,0] im_gray8 = np.uint8(im_gray) # 大津アルゴリズムでは thresh, maxvalは無視されてしきい値は自動で設定される ret, im_gray8 = cv2.threshold(im_gray8, thresh=0, maxval=255, type=cv2.THRESH_BINARY + cv2.THRESH_OTSU) # すべてのチャンネル img[:,:,0] = im_gray8 img[:,:,1] = im_gray8 img[:,:,2] = im_gray8 return img # 罫線消去 def delete_line(img): # 自動パラメータの計算 h, w = img.shape[:2] thr = 100 lln = int(w/18) if lln < 44: lln = 44 gap = int(w/1000) + 4 print('\nThreshhold={}, MinLineLength={}, MaxLineGap={}, width={}, height={}'.format(thr, lln, gap, w, h)) imgw = img.copy() # グレースケール gray = cv2.cvtColor(imgw, cv2.COLOR_BGR2GRAY) # 2値化 ret, gray = cv2.threshold(gray, thresh=0, maxval=255, type=cv2.THRESH_BINARY + cv2.THRESH_OTSU) ## 反転 ネガポジ変換 gray = cv2.bitwise_not(gray) lines = cv2.HoughLinesP(gray, rho=1, theta=np.pi/360, threshold=thr, minLineLength=lln, maxLineGap=gap) if lines is not None: for line in lines: x1, y1, x2, y2 = line[0] # 線を消す(白で線を引く) imgw = cv2.line(imgw, (x1,y1), (x2,y2), (255,255,255), 3) return imgw # ** main関数 ** def main(): # 日本語フォント指定 if platform.system()=='Windows': fontPIL = 'meiryo.ttc' # ゴシック体 else: fontPIL = 'NotoSansCJK-Bold.ttc' # ゴシック体 # Argument parsing and parameter setting ARGS = parse_args().parse_args() input_stream = ARGS.image lang = ARGS.language titleflg = ARGS.title speedflg = ARGS.speed linedel = ARGS.linedel conf = int(ARGS.confidence) layout = int(ARGS.layout) maxsize = int(ARGS.maxsize) logflg = ARGS.log prosess = ARGS.prosess if ARGS.image.lower() == "cam" or ARGS.image.lower() == "camera": input_stream = 0 isstream = True else: filetype = is_pict(input_stream) isstream = filetype == 'None' if (filetype == 'NotFound'): print(RED + "\ninput file Not found." + NOCOLOR) quit() outpath = ARGS.out # 情報表示 display_info(input_stream, lang, prosess, linedel, conf, layout, maxsize, logflg, titleflg, speedflg, outpath) # OCR tools = pyocr.get_available_tools() if len(tools) == 0: print(RED + "\nOCR tool Not found." + NOCOLOR) quit() tool = tools[0] # 入力準備 if (isstream): # カメラ cap = cv2.VideoCapture(input_stream) ret, frame = cap.read() if ret == False: print(RED + "\nUnable to video camera." + NOCOLOR) quit() loopflg = cap.isOpened() else: # 画像ファイル読み込み frame = cv2.imread(input_stream) if frame is None: print(RED + "\nUnable to read the input." 
+ NOCOLOR) quit() if maxsize > 300: # アスペクト比を固定してリサイズ img_h, img_w = frame.shape[:2] if (img_w > img_h): if (img_w > maxsize): height = round(img_h * (maxsize / img_w)) frame = cv2.resize(frame, dsize = (maxsize, height)) else: if (img_h > maxsize): width = round(img_w * (maxsize / img_h)) frame = cv2.resize(frame, dsize = (width, maxsize)) loopflg = True # 1回ループ # メッセージ作成 h, w = frame.shape[:2] st_pram = 'pros={}, linedel={}, conf={}, layout={} width={}, height={}'.format(prosess, linedel, conf, layout, w, h) # 処理結果の記録 step1 if (outpath != 'non'): if (isstream): fps = int(cap.get(cv2.CAP_PROP_FPS)) out_w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) out_h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v') outvideo = cv2.VideoWriter(outpath, fourcc, fps, (out_w, out_h)) # 計測値初期化 fpsWithTick = mylib.fpsWithTick() frame_count = 0 fps_total = 0 fpsWithTick.get() # fps計測開始 # メインループ while (loopflg): if frame is None: print(RED + "\nUnable to read the input." + NOCOLOR) quit() # 画像の前処理 if (prosess == 'y'): # モノクロ・2値化 フォアグラウンド処理 frame = img_preproces(frame) if (linedel == 'y'): # 罫線除去 フォアグラウンド処理 frame = delete_line(frame) frame_pl = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR) elif (linedel == 'b'): # 罫線除去 バックグラウンド処理 frame_pl = delete_line(frame) frame_pl = cv2.cvtColor(frame_pl, cv2.COLOR_RGB2BGR) else: # 罫線除去なし frame_pl = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR) # PILのイメージにする img = Image.fromarray(frame_pl) # 文字認識処理 line_and_word_boxes = tool.image_to_string(img, lang=lang, builder=pyocr.builders.LineBoxBuilder(tesseract_layout=layout)) for lw_box in line_and_word_boxes: content = lw_box.content position = lw_box.position box = [] txt = [] n = 0 for lw_box in lw_box.word_boxes: txt.append(lw_box.content) box.append(lw_box.position) n = n+1 confidence = lw_box.confidence if confidence > conf and len(content) > 0: xmin = position[0][0] ymin = position[0][1] xmax = position[1][0] ymax = position[1][1] cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), color=LINE_WORD_BOX_COLOR, thickness=3) for nm in range(n): cv2.rectangle(frame, (box[nm][0][0], box[nm][0][1]), (box[nm][1][0], box[nm][1][1]), color=WORD_BOX_COLOR, thickness=1) st_score = '#Score{:3}: '.format(confidence) + content myfunction.cv2_putText(img = frame, text = st_score, org = (xmin, ymin - 4), fontFace = fontPIL, fontScale = 20, color = CONTENTS_COLOR, mode = 0) if (logflg == 'y') or (logflg == 's'): print('\ncontents: ', content) print('position: ', position) print('confidence: ', confidence) if (logflg == 's'): for nm in range(n): print(' {: <8}'.format(txt[nm]), ' ', box[nm]) # FPSを計算する fps = fpsWithTick.get() st_fps = 'fps: {:>6.2f}'.format(fps) if (speedflg == 'y'): cv2.rectangle(frame, (10, 38), (95, 55), (90, 90, 90), -1) cv2.putText(frame, st_fps, (15, 50), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.4, color=(255, 255, 255), lineType=cv2.LINE_AA) # タイトル描画 if (titleflg == 'y'): cv2.putText(frame, title, (10, 30), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.8, color=(200, 200, 0), lineType=cv2.LINE_AA) cv2.putText(frame, st_pram, (100, 50), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.5, color=(0, 0, 0), lineType=cv2.LINE_AA) # 画像表示 window_name = title + " (hit 'q' or 'esc' key to exit)" cv2.namedWindow(window_name, cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) cv2.imshow(window_name, frame) # 処理結果の記録 step2 if (outpath != 'non'): if (isstream): outvideo.write(frame) else: cv2.imwrite(outpath, frame) # 何らかのキーが押されたら終了 breakflg = False while(True): key = cv2.waitKey(1) if key == 27 or key == 113: # 'esc' or 'q' breakflg = 
True break if (isstream): break if not mylib_gui._is_visible(window_name): # 'Close' button break if ((breakflg == False) and isstream): # 次のフレームを読み出す ret, frame = cap.read() if ret == False: break loopflg = cap.isOpened() else: loopflg = False # 終了処理 if (isstream): cap.release() # 処理結果の記録 step3 if (outpath != 'non'): if (isstream): outvideo.release() cv2.destroyAllWindows() print('\nFPS average: {:>10.2f}'.format(fpsWithTick.get_average())) print('\n Finished.') # main関数エントリーポイント(実行開始) if __name__ == "__main__": sys.exit(main())
ソースコードの場所 (すべて Linux/Windows 環境に対応)
(py37w) PS > cd \anaconda_win\workspace_py37\tryocr
# -*- coding: utf-8 -*- ##------------------------------------------ ## TryOCR Test Programe Step-1 ## with tesseract & PyOCR & cvui ## ## 2021.12.19 Masahiro Izutsu ##------------------------------------------ ## tryocr_step1.py ## 2022.01.04 linux/windows # Color Escape Code GREEN = '\033[1;32m' RED = '\033[1;31m' NOCOLOR = '\033[0m' YELLOW = '\033[1;33m' # 定数定義 LINE_WORD_BOX_COLOR = (0, 0, 240) WORD_BOX_COLOR = (255, 0, 0) CONTENTS_COLOR = (0, 128, 0) from os.path import expanduser DEF_INPUT_FILE = expanduser('images/sample0.png') # import処理 from PIL import Image import sys import pyocr import pyocr.builders import cv2 import cvui import argparse import myfunction import numpy as np import mylib_gui import mylib_pros import platform # タイトル・バージョン情報 title = 'TryOCR Test Program Step-1' print(GREEN) print('--- {} ---'.format(title)) print(' OpenCV version {} '.format(cv2.__version__)) print(NOCOLOR) # Parses arguments for the application def parse_args(): parser = argparse.ArgumentParser() parser.add_argument('-i', '--image', metavar = 'IMAGE_FILE', type = str, default = DEF_INPUT_FILE, help = 'Absolute path to image file. Default value is \'' + DEF_INPUT_FILE + '\'') parser.add_argument('-l', '--language', metavar = 'LANGUAGE', default = 'jpn', help = 'Language. Default value is \'jpn\'') parser.add_argument('--layout', metavar = 'LAYOUT', default = 6, help = 'Tesseract layout Default value is 6') parser.add_argument('--maxsize', metavar = 'MAXSIZE', default = 1000, help = 'Image max size (free=0). Default value is 1000') parser.add_argument('-t', '--title', metavar = 'TITLE', default = 'y', help = 'Program title flag.(y/n) Default value is \'y\'') return parser # モデル基本情報の表示 def display_info(image, lang, layout, maxsize, titleflg): print(YELLOW + title + ': Starting application...' + NOCOLOR) print(' - ' + YELLOW + 'Image File : ' + NOCOLOR, image) print(' - ' + YELLOW + 'Language : ' + NOCOLOR, lang) print(' - ' + YELLOW + 'Layout : ' + NOCOLOR, layout) print(' - ' + YELLOW + 'Max size : ' + NOCOLOR, maxsize) print(' - ' + YELLOW + 'Program Title: ' + NOCOLOR, titleflg) def frame_resize(image, maxsize): if maxsize > 300: # アスペクト比を固定してリサイズ img_h, img_w = image.shape[:2] if (img_w > img_h): if (img_w > maxsize): height = round(img_h * (maxsize / img_w)) image = cv2.resize(image, dsize = (maxsize, height)) else: if (img_h > maxsize): width = round(img_w * (maxsize / img_h)) image = cv2.resize(image, dsize = (width, maxsize)) return image WINDOW_NAME = title ROI_WINDOW = 'Cut-out area' ROI_POPUP = 'OCR detection result Text' # ** main関数 ** def main(): # 日本語フォント指定 if platform.system()=='Windows': fontPIL = 'meiryo.ttc' # ゴシック体 else: fontPIL = 'NotoSansCJK-Bold.ttc' # ゴシック体 # Argument parsing and parameter setting ARGS = parse_args().parse_args() input_stream = ARGS.image lang = ARGS.language layout = int(ARGS.layout) maxsize = int(ARGS.maxsize) titleflg = ARGS.title # 情報表示 display_info(input_stream, lang, layout, maxsize, titleflg) # OCR tools = pyocr.get_available_tools() if len(tools) == 0: print(RED + "\nOCR tool Not found." + NOCOLOR) quit() tool = tools[0] # OpenCV でイメージを読む lena_frame = cv2.imread(input_stream) if lena_frame is None: print(RED + "\nUnable to read the input." 
+ NOCOLOR) quit() lena_frame = mylib_pros.frame_resize(lena_frame, maxsize) frame = np.zeros(lena_frame.shape, np.uint8) popup_frame = np.zeros((120, 500, 3), np.uint8) anchor = cvui.Point() roi = cvui.Rect(0, 0, 0, 0) working = False frame_h, frame_w = frame.shape[:2] outf = False print('\n -----------') # Init cvui and tell it to create a OpenCV window, i.e. cv.namedWindow(WINDOW_NAME). cv2.namedWindow(WINDOW_NAME, flags=cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) cvui.init(WINDOW_NAME) while (True): # Fill the frame with Lena's image frame[:] = lena_frame[:] # Show the coordinates of the mouse pointer on the screen cvui.text(frame, 10, 10, 'Click mouse left-button and drag the pointer around to select a cut area.') # マウス・イベント if cvui.mouse(cvui.LEFT_BUTTON, cvui.DOWN): # マウスポインタにアンカーを配置 anchor.x = cvui.mouse().x anchor.y = cvui.mouse().y # 作業中の通知(作業中はウインドウの更新しない) working = True if cvui.mouse(cvui.LEFT_BUTTON, cvui.IS_DOWN): # 領域を設定 width = cvui.mouse().x - anchor.x height = cvui.mouse().y - anchor.y roi.x = anchor.x + width if width < 0 else anchor.x roi.y = anchor.y + height if height < 0 else anchor.y roi.width = abs(width) roi.height = abs(height) # 座標とサイズを表示 cvui.printf(frame, roi.x + 5, roi.y + 5, 0.3, 0xff0000, '(%d,%d)', roi.x, roi.y) cvui.printf(frame, cvui.mouse().x + 5, cvui.mouse().y + 5, 0.3, 0xff0000, 'w:%d, h:%d', roi.width, roi.height) if cvui.mouse(cvui.UP): # 領域指定作業の終了 working = False outf = True # 領域内を確認 lenaRows, lenaCols, lenaChannels = lena_frame.shape roi.x = 0 if roi.x < 0 else roi.x roi.y = 0 if roi.y < 0 else roi.y roi.width = roi.width + lena_frame.cols - (roi.x + roi.width) if roi.x + roi.width > lenaCols else roi.width roi.height = roi.height + lena_frame.rows - (roi.y + roi.height) if roi.y + roi.height > lenaRows else roi.height # 設定領域をレンダリング cvui.rect(frame, roi.x, roi.y, roi.width, roi.height, 0xff0000) # タイトル描画 if (titleflg == 'y'): cv2.putText(frame, title, (10, 30), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.8, color=(200, 200, 0), lineType=cv2.LINE_AA) # ウインドウの更新 cvui.update() # 画面の表示 cv2.imshow(WINDOW_NAME, frame) cv2.moveWindow(WINDOW_NAME, 80, 0) # 設定領域の表示 if roi.area() > 0 and working == False: lenaRoi = lena_frame[roi.y : roi.y + roi.height, roi.x : roi.x + roi.width] cv2.namedWindow(ROI_WINDOW, flags=cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) cv2.imshow(ROI_WINDOW, lenaRoi) cv2.moveWindow(ROI_WINDOW, frame_w + 100, 0) lenaRoi_h, lenaRoi_w = lenaRoi.shape[:2] # 切り出した領域を OCR # PILのイメージにする lenaRoi1 = cv2.cvtColor(lenaRoi, cv2.COLOR_RGB2BGR) imgRoi = Image.fromarray(lenaRoi1) # txt is a Python string txt = tool.image_to_string(imgRoi, lang=lang, builder=pyocr.builders.TextBuilder(tesseract_layout=layout)) # テキストの描画 popup_frame[:,:,:] = 0 cv2.rectangle(popup_frame, (0, 88), (500, 105), (255,0,0), -1) myfunction.cv2_putText(img = popup_frame, text = txt, org = (15, 104), fontFace = fontPIL, fontScale = 12, color = (255,255,255), mode = 0) cv2.namedWindow(ROI_POPUP, flags=cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) cv2.imshow(ROI_POPUP, popup_frame) cv2.moveWindow(ROI_POPUP, frame_w + 100, lenaRoi_h + 100) if outf: print(' ', txt) outf = False key = cv2.waitKey(1) if key == 27 or key == 113: # 'esc' or 'q' break if not mylib_gui._is_visible(title): # 'Close' button break cv2.destroyAllWindows() print(' -----------\n Finished.') # main関数エントリーポイント(実行開始) if __name__ == "__main__": sys.exit(main())
# -*- coding: utf-8 -*- ##------------------------------------------ ## TryOCR Test Programe Step-2 ## with tesseract & PyOCR & cvui ## ## 2021.12.19 Masahiro Izutsu ##------------------------------------------ ## tryocr_step2.py ## 2022.01.04 linux/windows # Color Escape Code GREEN = '\033[1;32m' RED = '\033[1;31m' NOCOLOR = '\033[0m' YELLOW = '\033[1;33m' # 定数定義 LINE_WORD_BOX_COLOR = (0, 0, 240) WORD_BOX_COLOR = (255, 0, 0) CONTENTS_COLOR = (0, 128, 0) from os.path import expanduser DEF_INPUT_FILE = expanduser('images/sample0.png') # import処理 from PIL import Image import sys import pyocr import pyocr.builders import cv2 import cvui import argparse import myfunction import numpy as np import mylib_gui import mylib_frame import platform # タイトル・バージョン情報 title = 'TryOCR Test Program Step-2' print(GREEN) print('--- {} ---'.format(title)) print(' OpenCV version {} '.format(cv2.__version__)) print(NOCOLOR) # Parses arguments for the application def parse_args(): parser = argparse.ArgumentParser() parser.add_argument('-i', '--image', metavar = 'IMAGE_FILE', type = str, default = DEF_INPUT_FILE, help = 'Absolute path to image file. Default value is \'' + DEF_INPUT_FILE + '\'') parser.add_argument('-l', '--language', metavar = 'LANGUAGE', default = 'jpn', help = 'Language. Default value is \'jpn\'') parser.add_argument('--layout', metavar = 'LAYOUT', default = 6, help = 'Tesseract layout Default value is 6') parser.add_argument('--maxsize', metavar = 'MAXSIZE', default = 1000, help = 'Image max size (free=0). Default value is 1000') parser.add_argument('-t', '--title', metavar = 'TITLE', default = 'y', help = 'Program title flag.(y/n) Default value is \'y\'') return parser # モデル基本情報の表示 def display_info(image, lang, layout, maxsize, titleflg): print(YELLOW + title + ': Starting application...' + NOCOLOR) print(' - ' + YELLOW + 'Image File : ' + NOCOLOR, image) print(' - ' + YELLOW + 'Language : ' + NOCOLOR, lang) print(' - ' + YELLOW + 'Layout : ' + NOCOLOR, layout) print(' - ' + YELLOW + 'Max size : ' + NOCOLOR, maxsize) print(' - ' + YELLOW + 'Program Title: ' + NOCOLOR, titleflg) def frame_resize(image, maxsize): if maxsize > 300: # アスペクト比を固定してリサイズ img_h, img_w = image.shape[:2] if (img_w > img_h): if (img_w > maxsize): height = round(img_h * (maxsize / img_w)) image = cv2.resize(image, dsize = (maxsize, height)) else: if (img_h > maxsize): width = round(img_w * (maxsize / img_h)) image = cv2.resize(image, dsize = (width, maxsize)) return image WINDOW_NAME = title ROI_WINDOW = 'Cut-out area' ROI_POPUP = 'OCR detection result Text' # ** main関数 ** def main(): # 日本語フォント指定 if platform.system()=='Windows': fontPIL = 'meiryo.ttc' # ゴシック体 else: fontPIL = 'NotoSansCJK-Bold.ttc' # ゴシック体 # Argument parsing and parameter setting ARGS = parse_args().parse_args() input_stream = ARGS.image lang = ARGS.language layout = int(ARGS.layout) maxsize = int(ARGS.maxsize) titleflg = ARGS.title # 情報表示 display_info(input_stream, lang, layout, maxsize, titleflg) # OCR tools = pyocr.get_available_tools() if len(tools) == 0: print(RED + "\nOCR tool Not found." + NOCOLOR) quit() tool = tools[0] # OpenCV でイメージを読む lena_frame_org = cv2.imread(input_stream) if lena_frame_org is None: print(RED + "\nUnable to read the input." 
+ NOCOLOR) quit() # mylib_frame ライブラリ imgfr = mylib_frame.ImageFrame(lena_frame_org) # 初期化 imgfr.set_screen_size(1680, 1050) lena_frame = imgfr.frame_resize(maxsize) frame = np.zeros(lena_frame.shape, np.uint8) popup_frame = np.zeros((120, 500, 3), np.uint8) anchor = cvui.Point() roi = cvui.Rect(0, 0, 0, 0) working = False frame_h, frame_w = frame.shape[:2] outf = False org_h, org_w = imgfr.get_original_size() scale_h, scale_w = imgfr.get_scale() print('\n original w x h : {:=5} x {:=5}'.format(org_h, org_w)) print(' display w x h : {:=5} x {:=5}'.format(frame_h, frame_w)) print(' scale w x h : {:.3f} x {:.3f}'.format(scale_h, scale_w)) print(' -----------') # Init cvui and tell it to create a OpenCV window, i.e. cv.namedWindow(WINDOW_NAME). cv2.namedWindow(WINDOW_NAME, flags=cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) cvui.init(WINDOW_NAME) while (True): # Fill the frame with Lena's image frame[:] = lena_frame[:] # Show the coordinates of the mouse pointer on the screen cvui.text(frame, 10, 10, 'Click mouse left-button and drag the pointer around to select a cut area.') # マウス・イベント if cvui.mouse(cvui.LEFT_BUTTON, cvui.DOWN): # マウスポインタにアンカーを配置 anchor.x = cvui.mouse().x anchor.y = cvui.mouse().y # 作業中の通知(作業中はウインドウの更新しない) working = True if cvui.mouse(cvui.LEFT_BUTTON, cvui.IS_DOWN): # 領域を設定 width = cvui.mouse().x - anchor.x height = cvui.mouse().y - anchor.y roi.x = anchor.x + width if width < 0 else anchor.x roi.y = anchor.y + height if height < 0 else anchor.y roi.width = abs(width) roi.height = abs(height) # 座標とサイズを表示 cvui.printf(frame, roi.x + 5, roi.y + 5, 0.3, 0xff0000, '(%d,%d)', roi.x, roi.y) cvui.printf(frame, cvui.mouse().x + 5, cvui.mouse().y + 5, 0.3, 0xff0000, 'w:%d, h:%d', roi.width, roi.height) if cvui.mouse(cvui.UP): # 領域指定作業の終了 working = False outf = True # 領域内を確認 lenaRows, lenaCols, lenaChannels = lena_frame.shape roi.x = 0 if roi.x < 0 else roi.x roi.y = 0 if roi.y < 0 else roi.y roi.width = roi.width + lena_frame.cols - (roi.x + roi.width) if roi.x + roi.width > lenaCols else roi.width roi.height = roi.height + lena_frame.rows - (roi.y + roi.height) if roi.y + roi.height > lenaRows else roi.height # 設定領域をレンダリング cvui.rect(frame, roi.x, roi.y, roi.width, roi.height, 0xff0000) # タイトル描画 if (titleflg == 'y'): cv2.putText(frame, title, (10, 30), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.8, color=(200, 200, 0), lineType=cv2.LINE_AA) # ウインドウの更新 cvui.update() # 画面の表示 cv2.imshow(WINDOW_NAME, frame) cv2.moveWindow(WINDOW_NAME, 80, 0) # 得られた表示座標から元画像の位置を計算して画像を切り出す if roi.area() > 50 and working == False: x0, y0 = imgfr.get_res2org_xy(roi.x, roi.y) x1, y1 = imgfr.get_res2org_xy(roi.x + roi.width, roi.y + roi.height) lenaRoi = lena_frame_org[y0 : y1, x0 : x1] cv2.namedWindow(ROI_WINDOW, flags=cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) cv2.imshow(ROI_WINDOW, lenaRoi) cv2.moveWindow(ROI_WINDOW, frame_w + 100, 0) lenaRoi_h, lenaRoi_w = lenaRoi.shape[:2] # 切り出した領域を OCR # PILのイメージにする lenaRoi1 = cv2.cvtColor(lenaRoi, cv2.COLOR_RGB2BGR) imgRoi = Image.fromarray(lenaRoi1) # txt is a Python string txt = tool.image_to_string(imgRoi, lang=lang, builder=pyocr.builders.TextBuilder(tesseract_layout=layout)) # テキストの描画 if len(txt)>0: popup_frame[:,:,:] = 0 cv2.rectangle(popup_frame, (0, 88), (500, 105), (255,0,0), -1) myfunction.cv2_putText(img = popup_frame, text = txt, org = (15, 104), fontFace = fontPIL, fontScale = 12, color = (255,255,255), mode = 0) cv2.namedWindow(ROI_POPUP, flags=cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) cv2.imshow(ROI_POPUP, popup_frame) cv2.moveWindow(ROI_POPUP, frame_w + 
100, lenaRoi_h + 100) if outf: print(' ', txt) print(' <area> ({}, {}) - ({}, {})'.format(x0, y0, x1, y1)) outf = False key = cv2.waitKey(1) if key == 27 or key == 113: # 'esc' or 'q' break if not mylib_gui._is_visible(title): # 'Close' button break cv2.destroyAllWindows() print(' -----------\n Finished.') # main関数エントリーポイント(実行開始) if __name__ == "__main__": sys.exit(main())
(py37w) > python tryocr_step3.py

--- TryOCR Test Program Step-3 ---
 OpenCV version 3.4.2

TryOCR Test Program Step-3: Starting application...
 - Image File : images/sample0.png
 - Language : jpn
 - Layout : 6
 - Program Title: y
 - Log flag : y

 file: <images/sample0.png>
 Screen size: width x height = 2560 x 1440 (pixels)

 original w x h :  1754 x  1240
 display w x h :  1390 x  983
 scale w x h : 0.792 x 0.793
 -----------
  NSホールディングス株式会社
  preprocess: 0 area: (59, 191) - (493, 241)
 -----------

 Finished.
# -*- coding: utf-8 -*- ##------------------------------------------ ## TryOCR Test Programe Step-3 ## with tesseract & PyOCR & cvui ## ## 2021.12.19 Masahiro Izutsu ##------------------------------------------ ## tryocr_step3.py ## 2022.01.04 linux/windows ## 前処理: '白黒2値', '罫線消去', '印影消去' # Color Escape Code GREEN = '\033[1;32m' RED = '\033[1;31m' NOCOLOR = '\033[0m' YELLOW = '\033[1;33m' # 定数定義 LINE_WORD_BOX_COLOR = (0, 0, 240) WORD_BOX_COLOR = (255, 0, 0) CONTENTS_COLOR = (0, 128, 0) from os.path import expanduser DEF_INPUT_FILE = expanduser('images/sample0.png') # import処理 from PIL import Image import sys import pyocr import pyocr.builders import cv2 import cvui import argparse import myfunction import numpy as np import mylib_gui import mylib_frame import mylib_preprocess import mylib_screen import platform from tkinter import filedialog # タイトル・バージョン情報 title = 'TryOCR Test Program Step-3' print(GREEN) print('--- {} ---'.format(title)) print(' OpenCV version {} '.format(cv2.__version__)) print(NOCOLOR) # Parses arguments for the application def parse_args(): parser = argparse.ArgumentParser() parser.add_argument('-i', '--image', metavar = 'IMAGE_FILE', type = str, default = DEF_INPUT_FILE, help = 'Absolute path to image file. Default value is \'' + DEF_INPUT_FILE + '\'') parser.add_argument('-l', '--language', metavar = 'LANGUAGE', default = 'jpn', help = 'Language. Default value is \'jpn\'') parser.add_argument('--layout', metavar = 'LAYOUT', default = 6, help = 'Tesseract layout Default value is 6') parser.add_argument('-t', '--title', metavar = 'TITLE', default = 'y', help = 'Program title flag.(y/n) Default value is \'y\'') parser.add_argument('--log', metavar = 'LOG', default = 'y', help = 'Log flag.(y/n) Default value is \'y\'') return parser # モデル基本情報の表示 def display_info(image, lang, layout, titleflg, logflg): print(YELLOW + title + ': Starting application...' + NOCOLOR) print(' - ' + YELLOW + 'Image File : ' + NOCOLOR, image) print(' - ' + YELLOW + 'Language : ' + NOCOLOR, lang) print(' - ' + YELLOW + 'Layout : ' + NOCOLOR, layout) print(' - ' + YELLOW + 'Program Title: ' + NOCOLOR, titleflg) print(' - ' + YELLOW + 'Log flag : ' + NOCOLOR, logflg) # 画像注釈 def image_annotation(lena_frame_org, lang='jpn', layout=6, titleflg=False, logflag=False): WINDOW_NAME = title ROI_WINDOW = 'Cut-out area' ROI_POPUP = 'OCR detection result Text' preprocess_mode = 0x0 wlock1 = 0 wlock2 = 0 wlock3 = 0 # 日本語フォント指定 if platform.system()=='Windows': fontPIL = 'meiryo.ttc' # ゴシック体 else: fontPIL = 'NotoSansCJK-Bold.ttc' # ゴシック体 # ディスプレイ解像度を得る monitor_height, monitor_width = mylib_screen.get_display_size(logflag) maxsize = monitor_height - 50 # 画像の前処理 imgpros = mylib_preprocess.ImagePreprocess(False) # 初期化 # OCR tools = pyocr.get_available_tools() if len(tools) == 0: print(RED + "\nOCR tool Not found." 
+ NOCOLOR) quit() tool = tools[0] # mylib_frame ライブラリ imgfr = mylib_frame.ImageFrame(lena_frame_org) # 初期化 imgfr.set_screen_size(monitor_width, monitor_height) lena_frame = imgfr.frame_resize(maxsize) frame = np.zeros(lena_frame.shape, np.uint8) popup_frame = np.zeros((120, 500, 3), np.uint8) anchor = cvui.Point() roi = cvui.Rect(0, 0, 0, 0) working = False frame_h, frame_w = frame.shape[:2] outf = False org_h, org_w = imgfr.get_original_size() scale_h, scale_w = imgfr.get_scale() if logflag: print('\n original w x h : {:=5} x {:=5}'.format(org_h, org_w)) print(' display w x h : {:=5} x {:=5}'.format(frame_h, frame_w)) print(' scale w x h : {:.3f} x {:.3f}'.format(scale_h, scale_w)) print(' -----------') # Init cvui and tell it to create a OpenCV window, i.e. cv.namedWindow(WINDOW_NAME). cv2.namedWindow(WINDOW_NAME, flags=cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) cvui.init(WINDOW_NAME) while (True): # Fill the frame with Lena's image frame[:] = lena_frame[:] # Show the coordinates of the mouse pointer on the screen cvui.text(frame, 10, 10, 'Click mouse left-button and drag the pointer around to select a cut area.') # マウス・イベント if cvui.mouse(cvui.LEFT_BUTTON, cvui.DOWN): # マウスポインタにアンカーを配置 anchor.x = cvui.mouse().x anchor.y = cvui.mouse().y # 作業中の通知(作業中はウインドウの更新しない) working = True if cvui.mouse(cvui.LEFT_BUTTON, cvui.IS_DOWN): # 領域を設定 width = cvui.mouse().x - anchor.x height = cvui.mouse().y - anchor.y roi.x = anchor.x + width if width < 0 else anchor.x roi.y = anchor.y + height if height < 0 else anchor.y roi.width = abs(width) roi.height = abs(height) # 座標とサイズを表示 cvui.printf(frame, roi.x + 5, roi.y + 5, 0.3, 0xff0000, '(%d,%d)', roi.x, roi.y) cvui.printf(frame, cvui.mouse().x + 5, cvui.mouse().y + 5, 0.3, 0xff0000, 'w:%d, h:%d', roi.width, roi.height) if cvui.mouse(cvui.UP): # 領域指定作業の終了 working = False outf = True wlock1 = 0 wlock2 = 0 wlock3 = 0 # 領域内を確認 lenaRows, lenaCols, lenaChannels = lena_frame.shape roi.x = 0 if roi.x < 0 else roi.x roi.y = 0 if roi.y < 0 else roi.y roi.width = roi.width + lena_frame.cols - (roi.x + roi.width) if roi.x + roi.width > lenaCols else roi.width roi.height = roi.height + lena_frame.rows - (roi.y + roi.height) if roi.y + roi.height > lenaRows else roi.height # 設定領域をレンダリング cvui.rect(frame, roi.x, roi.y, roi.width, roi.height, 0xff0000) # タイトル描画 if (titleflg == 'y'): cv2.putText(frame, title, (10, 30), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.8, color=(200, 200, 0), lineType=cv2.LINE_AA) # ウインドウの更新 cvui.update() # 画面の表示 cv2.imshow(WINDOW_NAME, frame) if wlock1 < 10: cv2.moveWindow(WINDOW_NAME, 80, 0) wlock1 = wlock1 + 1 else: wlock1 = 10 # 得られた表示座標から元画像の位置を計算して画像を切り出す if roi.area() > 50 and working == False: x0, y0 = imgfr.get_res2org_xy(roi.x, roi.y) x1, y1 = imgfr.get_res2org_xy(roi.x + roi.width, roi.y + roi.height) lenaRoi = lena_frame_org[y0 : y1, x0 : x1] # 前処理 prs_color = [(0,0,0), (0,0,0), (0,0,0)] if preprocess_mode & 0x4 != 0: lenaRoi = imgpros.image_processing_execution(lenaRoi, 4) prs_color[2] = (0,0,255) if preprocess_mode & 0x2 != 0: lenaRoi = imgpros.image_processing_execution(lenaRoi, 2) prs_color[1] = (0,0,255) if preprocess_mode & 0x1 != 0: lenaRoi = imgpros.image_processing_execution(lenaRoi, 1) prs_color[0] = (0,0,255) # OCR入力画像表示 lenaRoi_h, lenaRoi_w = lenaRoi.shape[:2] img_Roi = np.zeros((lenaRoi_h + 30, lenaRoi_w, 3), np.uint8) img_Roi[:,:,:] = 200 img_Roi[30:lenaRoi_h + 30,:] = lenaRoi ## 前処理モード表示 (サイズ2段階,エリアがないときは表示しない) if lenaRoi_w > 320: fs = 16 xs = 80 else: fs = 10 xs = 50 if xs*3 < lenaRoi_w: myfunction.cv2_putText(img_Roi, 
'前処理:', (10,4), fontPIL, fs, (100,100,100), 1) myfunction.cv2_putText(img_Roi, '白黒2値', (10+xs,4), fontPIL, fs, prs_color[0], 1) myfunction.cv2_putText(img_Roi, '罫線消去', (10+xs*2,4), fontPIL, fs, prs_color[1], 1) myfunction.cv2_putText(img_Roi, '印影消去', (10+xs*3,4), fontPIL, fs, prs_color[2], 1) ## cv2.namedWindow(ROI_WINDOW, flags=cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) cv2.imshow(ROI_WINDOW, img_Roi) if wlock2 < 10: cv2.moveWindow(ROI_WINDOW, frame_w + 100, 0) wlock2 = wlock2 + 1 else: wlock2 = 10 # 切り出した領域を OCR # PILのイメージにする lenaRoi1 = cv2.cvtColor(lenaRoi, cv2.COLOR_RGB2BGR) imgRoi = Image.fromarray(lenaRoi1) # txt is a Python string txt = tool.image_to_string(imgRoi, lang=lang, builder=pyocr.builders.TextBuilder(tesseract_layout=layout)) # テキストの描画 if len(txt)>0: popup_frame[:,:,:] = 0 cv2.rectangle(popup_frame, (0, 88), (500, 105), (255,0,0), -1) myfunction.cv2_putText(img = popup_frame, text = txt, org = (15, 104), fontFace = fontPIL, fontScale = 12, color = (255,255,255), mode = 0) cv2.namedWindow(ROI_POPUP, flags=cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL) cv2.imshow(ROI_POPUP, popup_frame) if wlock3 < 10: cv2.moveWindow(ROI_POPUP, frame_w + 100, lenaRoi_h + 100) wlock3 = wlock3 + 1 else: wlock3 = 10 if outf and logflag: print(' ', txt) print(' preprocess: {} area: ({}, {}) - ({}, {})'.format(preprocess_mode, x0, y0, x1, y1)) outf = False key = cv2.waitKey(1) if key == 27 or key == 113: # 'esc' or 'q' break elif key >= ord('0') and key <= ord('7'): # 前処理モード変更 preprocess_mode = key - ord('0') outf = True if not mylib_gui._is_visible(title): # 'Close' button break cv2.destroyAllWindows() if logflag: print(' -----------\n') return key # ** main関数 ** def main(): loop_flg = True # Argument parsing and parameter setting ARGS = parse_args().parse_args() filename = ARGS.image lang = ARGS.language layout = int(ARGS.layout) titleflg = ARGS.title logflg = ARGS.log logflag = True if logflg == 'y' else False # 情報表示 display_info(filename, lang, layout, titleflg, logflg) while(loop_flg): if logflag: print('\n file: <{}>'.format(filename)) # OpenCV でイメージを読む frame = cv2.imread(filename) if frame is None: print(RED + "\nUnable to read the input." + NOCOLOR) quit() # 画像注釈 ret = image_annotation(frame, lang, layout, titleflg, logflag) # if ret == 27 or ret == 113 or ret == -1: # loop_flg = False # 画像ファイルの選択 filename = filedialog.askopenfilename( title = "画像ファイルを開く", filetypes = [("Image file", ".bmp .png .jpg .tif"), ("Bitmap", ".bmp"), ("PNG", ".png"), ("JPEG", ".jpg")], # ファイルフィルタ initialdir = "./" # 自分自身のディレクトリ ) if len(filename) == 0: break if logflag: print('\n Finished.') # main関数エントリーポイント(実行開始) if __name__ == "__main__": sys.exit(main())
(py37w) > cd ~/workspace_py37/tryocr
(py37w) > python mylib_yaml.py
 :
(py37w) > cd ~/workspace_py37/tryocr
(py37w) > python tryocr_step4.py
 :
(py37w) > cd ~/workspace_py37/tryocr
(py37w) > python tryocr_step5.py
 :
(py37w) > cd ~/workspace_py37/tryocr
(py37w) > python tryocr_step6.py
 :
# 日本語フォント指定
if platform.system() == 'Windows':
    fontPIL = 'meiryo.ttc'              # ゴシック体
else:
    fontPIL = 'NotoSansCJK-Bold.ttc'    # ゴシック体
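The helper myfunction.cv2_putText used by these scripts to draw Japanese text on OpenCV frames is not listed in this section. The following is a minimal, hypothetical sketch of the usual approach (convert the BGR ndarray to a Pillow image, draw with ImageFont.truetype, convert back), assuming Pillow is installed and the font file selected above can be found:

# Hypothetical sketch -- not the actual myfunction.cv2_putText implementation.
import platform
import numpy as np
import cv2
from PIL import Image, ImageDraw, ImageFont

def put_text_ja(img_bgr, text, org, font_path, font_size, color_bgr):
    """Draw (Japanese) text on a BGR ndarray via Pillow and return a new ndarray."""
    img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)     # OpenCV is BGR, Pillow is RGB
    pil_img = Image.fromarray(img_rgb)
    draw = ImageDraw.Draw(pil_img)
    font = ImageFont.truetype(font_path, font_size)        # load the platform-selected .ttc
    b, g, r = color_bgr
    draw.text(org, text, font=font, fill=(r, g, b))
    return cv2.cvtColor(np.array(pil_img), cv2.COLOR_RGB2BGR)

if __name__ == '__main__':
    fontPIL = 'meiryo.ttc' if platform.system() == 'Windows' else 'NotoSansCJK-Bold.ttc'
    canvas = np.full((120, 500, 3), 255, np.uint8)          # white test canvas
    canvas = put_text_ja(canvas, '日本語テキストの描画テスト', (10, 40), fontPIL, 24, (0, 128, 0))
    cv2.imwrite('font_test.png', canvas)

Going through Pillow like this avoids cv2.putText, which cannot render CJK glyphs.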
Traceback (most recent call last):
  File ".\image_classification.py", line 292, in <module>
    sys.exit(main())
  File ".\image_classification.py", line 128, in main
    labels = np.loadtxt(label_path, dtype='str', delimiter='\n')
  File "C:\Users\izuts\anaconda3\envs\py37w\lib\site-packages\numpy\lib\npyio.py", line 1098, in loadtxt
    first_line = next(fh)
UnicodeDecodeError: 'cp932' codec can't decode byte 0x86 in position 12: illegal multibyte sequence
with open(labels, encoding='utf-8') as labels_file:
    label_list = labels_file.read().splitlines()
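An alternative fix, assuming NumPy 1.14 or later, is to keep the np.loadtxt call from the traceback and state the file encoding explicitly so that the Windows default code page (cp932) is never used:

labels = np.loadtxt(label_path, dtype='str', delimiter='\n', encoding='utf-8')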
Traceback (most recent call last):
  File ".\ocrtest2.py", line 102, in <module>
    sys.exit(main())
  File ".\ocrtest2.py", line 90, in main
    builder=pyocr.builders.WordBoxBuilder(tesseract_layout=layout))
  File "C:\Users\mizutu\anaconda3\envs\py37w\lib\site-packages\pyocr\tesseract.py", line 387, in image_to_string
    -1, "Unable to find output file (tested {})".format(tested_files)
pyocr.error.TesseractError: (-1, "Unable to find output file (tested ['C:\\\\__temp\\\\tmp8lcbfvf6\\\\output.html', 'C:\\\\__temp\\\\tmp8lcbfvf6\\\\output.hocr'])")
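This error usually means the tesseract subprocess never produced its output file, for example because the requested language data is missing. A quick check of what pyocr actually detects (using pyocr's get_available_tools / get_name / get_available_languages helpers) can narrow this down:

# Minimal diagnostic sketch: confirm which OCR tool and languages pyocr can see.
import pyocr

tools = pyocr.get_available_tools()
if not tools:
    print('No OCR tool found (is tesseract on PATH?)')
else:
    tool = tools[0]
    print('Tool :', tool.get_name())
    print('Langs:', tool.get_available_languages())   # 'jpn' must appear for Japanese OCR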
QWindowsWindow::setGeometry: Unable to set geometry 98x69+81+31 (frame: 114x108+73+0) on QWidgetWindow/"TryOCR Test Program Step-1Window" on "\\.\DISPLAY1". Resulting geometry: 120x69+81+31 (frame: 136x108+73+0) margins: 8, 31, 8, 8 minimum size: 98x69 maximum size: 98x69 MINMAXINFO maxSize=0,0 maxpos=0,0 mintrack=114,108 maxtrack=114,108)