PyTorch9

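PyTorch ではじめる AI開発 (AI Development with PyTorch) 9

　This page covers a development approach in which a publicly available AI model is modified to improve its performance.
　The idea is to give up some performance under general conditions in exchange for better performance under one specific condition.

PyTorch ではじめる AI開発 9
CHAPTER 09 Completing the OCR
References

※ Last updated: 2021/10/18

CHAPTER 09 Completing the OCR

　The OCR program published by the author of "PyTorch ではじめる AI開発" has a problem: because Japanese text can be written either vertically or horizontally, its recognition accuracy suffers.
　By making the OCR program horizontal-only, we improve its recognition accuracy on horizontally written strings.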

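SECTION-025 Building a horizontal-only OCR

　In "Chapter 08" we fine-tuned the model that performs character classification. Here we build a new neural network that performs sentence detection and character-region detection by training it from scratch.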

Checking horizontally written strings

Create a working directory.

(py37) $ cd ~/workspace_py37/
(py37) $ mkdir chapter09
(py37) $ cd chapter09

Download the OCR program published by the author.

(py37) $ git clone -b version2 https://github.com/tanreinama/OCR_Japanease/
Cloning into 'OCR_Japanease'...
remote: Enumerating objects: 128, done.
remote: Counting objects: 100% (128/128), done.
remote: Compressing objects: 100% (97/97), done.
remote: Total 128 (delta 57), reused 80 (delta 24), pack-reused 0
Receiving objects: 100% (128/128), 3.46 MiB | 21.32 MiB/s, done.
Resolving deltas: 100% (57/57), done.
(py37) $ ls
OCR_Japanease

Download the pretrained models.

(py37) $ cd OCR_Japanease/
(py37) $ wget https://nama.ne.jp/models/ocr_jp-v2.zip
--2021-10-11 16:34:14--  https://nama.ne.jp/models/ocr_jp-v2.zip
nama.ne.jp (nama.ne.jp) をDNSに問いあわせています... 112.78.112.176
nama.ne.jp (nama.ne.jp)|112.78.112.176|:443 に接続しています... 接続しました。
HTTP による接続要求を送信しました、応答を待っています... 200 OK
長さ: 180256769 (172M) [application/zip]
`ocr_jp-v2.zip' に保存中

ocr_jp-v2.zip       100%[===================>] 171.91M  11.6MB/s    in 14s     

2021-10-11 16:34:28 (12.2 MB/s) - `ocr_jp-v2.zip' へ保存完了 [180256769/180256769]

(py37) $ unzip ocr_jp-v2.zip
Archive:  ocr_jp-v2.zip
  inflating: models/detectionnet.model  
  inflating: models/classifiernet.model  
(py37) $ cp -r models OCR_Japanease/

Copy the required sample data.

(py37) $ cd ~/workspace_py37/chapter09/
(py37) $ cp -r OCR_Japanease/misc ./
(py37) $ cp -r OCR_Japanease/nets ./
(py37) $ cp -r OCR_Japanease/models ./
(py37) $ cp -r ~/workspace_py37/sample/chapt09/chapt09.ipynb ./
(py37) $ cp -r ~/workspace_py37/sample/chapt09/yokogaki.png ./
(py37) $ cp -r ~/workspace_py37/sample/chapt09/chapt09_1.py ./chapt09_1a.py

Files needed to check operation

~/workspace_py37/chapter09/
.
├── chapt09.ipynb
├── chapt09_1a.py
├── misc
│   ├── detection.py
│   ├── nihongo.py
│   ├── nms.py
│   └── structure.py
├── models
│   ├── classifiernet.model
│   └── detectionnet.model
├── nets
│   ├── block.py
│   ├── classifiernet.py
│   └── detectionnet.py
└── yokogaki.png

Run the OCR program in a CPU environment.

(py37) $ cd ~/workspace_py37/chapter09/OCR_Japanease/
(py37) $ python3 ocr_japanease.py --cpu ../yokogaki.png 
file "../yokogaki.png" detected in 150 dpi.
[Block #0]
横書き日本語のテスト
文字文字
[Block #1]
AIAこは
ひ
カタ
[Block #2]
らがな
[Block #3]
認識認識
[Block #4]
カナ
[Block #5]
I
[Block #6]
こ横書き

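※ In the test image, the individual characters are recognized, but the "sentences" formed by connecting characters are not recognized correctly.

Checking how DetectionNet behaves

Visualize the behavior of "DetectionNet" to check it.

Install "Jupyter Notebook".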

(py37) $ cd ~/workspace_py37/chapter09/
(py37) $ pip install jupyter
Collecting jupyter
  Downloading jupyter-1.0.0-py2.py3-none-any.whl (2.7 kB)
    :

▼　Full installation log

(py37) $ pip install jupyter
Collecting jupyter
  Downloading jupyter-1.0.0-py2.py3-none-any.whl (2.7 kB)
Collecting notebook
  Downloading notebook-6.4.4-py3-none-any.whl (9.9 MB)
     |████████████████████████████████| 9.9 MB 4.0 MB/s 
Collecting ipykernel
  Downloading ipykernel-6.4.1-py3-none-any.whl (124 kB)
     |████████████████████████████████| 124 kB 9.0 MB/s 
Collecting jupyter-console
  Downloading jupyter_console-6.4.0-py3-none-any.whl (22 kB)
Collecting ipywidgets
  Downloading ipywidgets-7.6.5-py2.py3-none-any.whl (121 kB)
     |████████████████████████████████| 121 kB 3.6 MB/s 
Collecting nbconvert
  Downloading nbconvert-6.2.0-py3-none-any.whl (553 kB)
     |████████████████████████████████| 553 kB 52.5 MB/s 
Collecting qtconsole
  Downloading qtconsole-5.1.1-py3-none-any.whl (119 kB)
     |████████████████████████████████| 119 kB 27.7 MB/s 
Collecting matplotlib-inline<0.2.0,>=0.1.0
  Downloading matplotlib_inline-0.1.3-py3-none-any.whl (8.2 kB)
Requirement already satisfied: importlib-metadata<5 in /home/mizutu/anaconda3/envs/py37/lib/python3.7/site-packages (from ipykernel->jupyter) (4.8.1)
Collecting ipython-genutils
  Downloading ipython_genutils-0.2.0-py2.py3-none-any.whl (26 kB)
Requirement already satisfied: tornado<7.0,>=4.2 in /home/mizutu/anaconda3/envs/py37/lib/python3.7/site-packages (from ipykernel->jupyter) (6.1)
Collecting traitlets<6.0,>=4.1.0
  Downloading traitlets-5.1.0-py3-none-any.whl (101 kB)
     |████████████████████████████████| 101 kB 13.7 MB/s 
Collecting argcomplete>=1.12.3
  Downloading argcomplete-1.12.3-py2.py3-none-any.whl (38 kB)
Collecting debugpy<2.0,>=1.0.0
  Downloading debugpy-1.5.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.9 MB)
     |████████████████████████████████| 1.9 MB 20.7 MB/s 
Collecting ipython<8.0,>=7.23.1
  Downloading ipython-7.28.0-py3-none-any.whl (788 kB)
     |████████████████████████████████| 788 kB 27.9 MB/s 
Collecting jupyter-client<8.0
  Downloading jupyter_client-7.0.6-py3-none-any.whl (125 kB)
     |████████████████████████████████| 125 kB 52.8 MB/s 
Requirement already satisfied: typing-extensions>=3.6.4 in /home/mizutu/anaconda3/envs/py37/lib/python3.7/site-packages (from importlib-metadata<5->ipykernel->jupyter) (3.10.0.2)
Requirement already satisfied: zipp>=0.5 in /home/mizutu/anaconda3/envs/py37/lib/python3.7/site-packages (from importlib-metadata<5->ipykernel->jupyter) (3.5.0)
Collecting backcall
  Downloading backcall-0.2.0-py2.py3-none-any.whl (11 kB)
Collecting jedi>=0.16
  Downloading jedi-0.18.0-py2.py3-none-any.whl (1.4 MB)
     |████████████████████████████████| 1.4 MB 38.5 MB/s 
Collecting prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0
  Downloading prompt_toolkit-3.0.20-py3-none-any.whl (370 kB)
     |████████████████████████████████| 370 kB 51.3 MB/s 
Collecting pickleshare
  Downloading pickleshare-0.7.5-py2.py3-none-any.whl (6.9 kB)
Collecting pygments
  Downloading Pygments-2.10.0-py3-none-any.whl (1.0 MB)
     |████████████████████████████████| 1.0 MB 39.7 MB/s 
Collecting decorator
  Downloading decorator-5.1.0-py3-none-any.whl (9.1 kB)
Collecting pexpect>4.3
  Downloading pexpect-4.8.0-py2.py3-none-any.whl (59 kB)
     |████████████████████████████████| 59 kB 7.7 MB/s 
Requirement already satisfied: setuptools>=18.5 in /home/mizutu/anaconda3/envs/py37/lib/python3.7/site-packages (from ipython<8.0,>=7.23.1->ipykernel->jupyter) (52.0.0.post20210125)
Collecting parso<0.9.0,>=0.8.0
  Downloading parso-0.8.2-py2.py3-none-any.whl (94 kB)
     |████████████████████████████████| 94 kB 3.6 MB/s 
Collecting entrypoints
  Downloading entrypoints-0.3-py2.py3-none-any.whl (11 kB)
Collecting pyzmq>=13
  Downloading pyzmq-22.3.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (1.1 MB)
     |████████████████████████████████| 1.1 MB 43.7 MB/s 
Requirement already satisfied: python-dateutil>=2.1 in /home/mizutu/anaconda3/envs/py37/lib/python3.7/site-packages (from jupyter-client<8.0->ipykernel->jupyter) (2.8.2)
Collecting nest-asyncio>=1.5
  Downloading nest_asyncio-1.5.1-py3-none-any.whl (5.0 kB)
Collecting jupyter-core>=4.6.0
  Downloading jupyter_core-4.8.1-py3-none-any.whl (86 kB)
     |████████████████████████████████| 86 kB 6.1 MB/s 
Collecting ptyprocess>=0.5
  Downloading ptyprocess-0.7.0-py2.py3-none-any.whl (13 kB)
Collecting wcwidth
  Downloading wcwidth-0.2.5-py2.py3-none-any.whl (30 kB)
Requirement already satisfied: six>=1.5 in /home/mizutu/anaconda3/envs/py37/lib/python3.7/site-packages (from python-dateutil>=2.1->jupyter-client<8.0->ipykernel->jupyter) (1.16.0)
Collecting jupyterlab-widgets>=1.0.0
  Downloading jupyterlab_widgets-1.0.2-py3-none-any.whl (243 kB)
     |████████████████████████████████| 243 kB 10.5 MB/s 
Collecting nbformat>=4.2.0
  Downloading nbformat-5.1.3-py3-none-any.whl (178 kB)
     |████████████████████████████████| 178 kB 35.5 MB/s 
Collecting widgetsnbextension~=3.5.0
  Downloading widgetsnbextension-3.5.1-py2.py3-none-any.whl (2.2 MB)
     |████████████████████████████████| 2.2 MB 43.1 MB/s 
Collecting jsonschema!=2.5.0,>=2.4
  Downloading jsonschema-4.1.0-py3-none-any.whl (69 kB)
     |████████████████████████████████| 69 kB 2.4 MB/s 
Collecting pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0
  Downloading pyrsistent-0.18.0-cp37-cp37m-manylinux1_x86_64.whl (119 kB)
     |████████████████████████████████| 119 kB 34.6 MB/s 
Collecting attrs>=17.4.0
  Downloading attrs-21.2.0-py2.py3-none-any.whl (53 kB)
     |████████████████████████████████| 53 kB 3.0 MB/s 
Collecting terminado>=0.8.3
  Downloading terminado-0.12.1-py3-none-any.whl (15 kB)
Collecting prometheus-client
  Downloading prometheus_client-0.11.0-py2.py3-none-any.whl (56 kB)
     |████████████████████████████████| 56 kB 5.3 MB/s 
Collecting argon2-cffi
  Downloading argon2_cffi-21.1.0-cp35-abi3-manylinux_2_5_x86_64.manylinux1_x86_64.whl (96 kB)
     |████████████████████████████████| 96 kB 4.5 MB/s 
Collecting jinja2
  Downloading Jinja2-3.0.2-py3-none-any.whl (133 kB)
     |████████████████████████████████| 133 kB 38.0 MB/s 
Collecting Send2Trash>=1.5.0
  Downloading Send2Trash-1.8.0-py3-none-any.whl (18 kB)
Collecting cffi>=1.0.0
  Downloading cffi-1.15.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (427 kB)
     |████████████████████████████████| 427 kB 35.3 MB/s 
Collecting pycparser
  Downloading pycparser-2.20-py2.py3-none-any.whl (112 kB)
     |████████████████████████████████| 112 kB 24.6 MB/s 
Collecting MarkupSafe>=2.0
  Downloading MarkupSafe-2.0.1-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (31 kB)
Collecting jupyterlab-pygments
  Downloading jupyterlab_pygments-0.1.2-py2.py3-none-any.whl (4.6 kB)
Collecting bleach
  Downloading bleach-4.1.0-py2.py3-none-any.whl (157 kB)
     |████████████████████████████████| 157 kB 31.6 MB/s 
Collecting defusedxml
  Downloading defusedxml-0.7.1-py2.py3-none-any.whl (25 kB)
Collecting nbclient<0.6.0,>=0.5.0
  Downloading nbclient-0.5.4-py3-none-any.whl (66 kB)
     |████████████████████████████████| 66 kB 5.6 MB/s 
Collecting pandocfilters>=1.4.1
  Downloading pandocfilters-1.5.0-py2.py3-none-any.whl (8.7 kB)
Collecting mistune<2,>=0.8.1
  Downloading mistune-0.8.4-py2.py3-none-any.whl (16 kB)
Collecting testpath
  Downloading testpath-0.5.0-py3-none-any.whl (84 kB)
     |████████████████████████████████| 84 kB 4.0 MB/s 
Collecting webencodings
  Downloading webencodings-0.5.1-py2.py3-none-any.whl (11 kB)
Collecting packaging
  Downloading packaging-21.0-py3-none-any.whl (40 kB)
     |████████████████████████████████| 40 kB 6.2 MB/s 
Requirement already satisfied: pyparsing>=2.0.2 in /home/mizutu/anaconda3/envs/py37/lib/python3.7/site-packages (from packaging->bleach->nbconvert->jupyter) (2.4.7)
Collecting qtpy
  Downloading QtPy-1.11.2-py2.py3-none-any.whl (58 kB)
     |████████████████████████████████| 58 kB 7.8 MB/s 
Installing collected packages: traitlets, pyrsistent, attrs, wcwidth, pyzmq, ptyprocess, parso, nest-asyncio, jupyter-core, jsonschema, ipython-genutils, entrypoints, webencodings, pygments, pycparser, prompt-toolkit, pickleshare, pexpect, packaging, nbformat, matplotlib-inline, MarkupSafe, jupyter-client, jedi, decorator, backcall, testpath, pandocfilters, nbclient, mistune, jupyterlab-pygments, jinja2, ipython, defusedxml, debugpy, cffi, bleach, argcomplete, terminado, Send2Trash, prometheus-client, nbconvert, ipykernel, argon2-cffi, notebook, widgetsnbextension, qtpy, jupyterlab-widgets, qtconsole, jupyter-console, ipywidgets, jupyter
Successfully installed MarkupSafe-2.0.1 Send2Trash-1.8.0 argcomplete-1.12.3 argon2-cffi-21.1.0 attrs-21.2.0 backcall-0.2.0 bleach-4.1.0 cffi-1.15.0 debugpy-1.5.0 decorator-5.1.0 defusedxml-0.7.1 entrypoints-0.3 ipykernel-6.4.1 ipython-7.28.0 ipython-genutils-0.2.0 ipywidgets-7.6.5 jedi-0.18.0 jinja2-3.0.2 jsonschema-4.1.0 jupyter-1.0.0 jupyter-client-7.0.6 jupyter-console-6.4.0 jupyter-core-4.8.1 jupyterlab-pygments-0.1.2 jupyterlab-widgets-1.0.2 matplotlib-inline-0.1.3 mistune-0.8.4 nbclient-0.5.4 nbconvert-6.2.0 nbformat-5.1.3 nest-asyncio-1.5.1 notebook-6.4.4 packaging-21.0 pandocfilters-1.5.0 parso-0.8.2 pexpect-4.8.0 pickleshare-0.7.5 prometheus-client-0.11.0 prompt-toolkit-3.0.20 ptyprocess-0.7.0 pycparser-2.20 pygments-2.10.0 pyrsistent-0.18.0 pyzmq-22.3.0 qtconsole-5.1.1 qtpy-1.11.2 terminado-0.12.1 testpath-0.5.0 traitlets-5.1.0 wcwidth-0.2.5 webencodings-0.5.1 widgetsnbextension-3.5.1

Start "Jupyter Notebook".

(py37) $ cd ~/workspace_py37/chapter09/
(py37) $ jupyter notebook
[I 14:23:06.039 NotebookApp] ローカルディレクトリからノートブックをサーブ: /home/mizutu/workspace_py37/chapter09
[I 14:23:06.039 NotebookApp] Jupyter Notebook 6.4.4 is running at:
[I 14:23:06.039 NotebookApp] http://localhost:8888/?token=478e25ec2281b76d3b65067aade61b647252f7789574b678
[I 14:23:06.039 NotebookApp]  or http://127.0.0.1:8888/?token=478e25ec2281b76d3b65067aade61b647252f7789574b678
[I 14:23:06.039 NotebookApp] サーバを停止し全てのカーネルをシャットダウンするには Control-C を使って下さい(確認をスキップするには2回)。
[C 14:23:06.093 NotebookApp] 
    
    To access the notebook, open this file in a browser:
        file:///home/mizutu/.local/share/jupyter/runtime/nbserver-5174-open.html
    Or copy and paste one of these URLs:
        http://localhost:8888/?token=478e25ec2281b76d3b65067aade61b647252f7789574b678
     or http://127.0.0.1:8888/?token=478e25ec2281b76d3b65067aade61b647252f7789574b678
    :

The browser starts automatically and shows the notebook top page at "http://localhost:8888".

Select the notebook "chapt09.ipynb", which contains the book's sample code.

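・In cell [2], change the file name "chapt09-model1.pth" to "models/detectionnet.model".
・Run the cells in order, starting from cell [1].
　[ 1] Import the required packages.
　[ 2] Create the model and load the weights from the pretrained model file. (Here it is set to always run on 'cpu'.)
　[ 3] Set the image file to OCR and its resolution.
　[ 4] Store the loaded image in the variable "gray_img" as a grayscale image.
　[ 5] Create the "DetectorNet" class. "word_threshold" is the cutoff threshold on the likelihood of being a character (default 0.01).
　[ 6] Determine the resolution at which DetectorNet runs. It operates on a 512-pixel square; larger images are handled as a 1024-pixel or 2048-pixel square split into 4 or 16 tiles.
　[ 7] Resize the image with a function of the "DetectorNet" class.
　[ 8] Call the "_detectNx" function that matches the resolution to obtain the DetectionNet output.
　　　As noise reduction, locations in the resulting heatmap where the character size is recognized as 0.01 or less are replaced with 0.

　[ 9] Visualize the DetectionNet output with matplotlib's "imshow" function. First, display the input image.

　[10] The map of character likelihood output by the neural network is stored as 2-D data in "hm_wd", so display it.
　[11] Display "hm_sent" to visualize the map of sentence probability.
　　※ "hm_sent" shows that characters lined up vertically are faintly recognized as vertical writing.
　　　When the program runs in region-extraction mode, which handles complex layouts, and vertical and horizontal writing are confused, the default public model misdetects the sentence layout.

　[12] Check how the OCR program recognizes the sentence layout. Use the "_preprocess" function of the "Detector" class to create a map image for each sentence block.
　　　The values passed as arguments to "_preprocess" are experimentally determined thresholds of the algorithm. Changing these thresholds adjusts the behavior of the OCR program.
　　　You can change the values in the notebook and tune the thresholds until the subsequent layout is recognized correctly.
　[13] Depending on whether the OCR ran in simple-layout mode or region-extraction mode, one of "all_map" and "hm_sent_preprocessed" is None and the other is a map image.
　　　In region-extraction mode, a further step separates the sentence regions from the map image.

　[14] "all_map" holds the mask images of the recognized sentence blocks, so display them side by side.

　　※ With the default public model, the area misrecognized as vertical writing becomes one large block spanning several lines.
　　　The other sentence blocks are also misrecognized, swallowed up by that block.
　　※ To tune the OCR program's thresholds, run the notebook repeatedly and look for values that produce appropriate map images.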
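
The tiling described for cell [6] can be sketched as follows. This is only an illustration: the function name `choose_grid` and the rule "pick the smallest canvas that contains the image" are assumptions, not the OCR program's actual code.

```python
TILE = 512  # side length at which the network runs, per the description of cell [6]

def choose_grid(width, height):
    """Return (canvas_side, number_of_tiles) for an image of the given size.

    Hypothetical helper: the selection rule is an assumption for illustration.
    """
    for side in (512, 1024, 2048):
        if max(width, height) <= side:
            return side, (side // TILE) ** 2
    # Anything larger is assumed to be resized down to the largest canvas
    return 2048, (2048 // TILE) ** 2

for size in [(400, 300), (900, 700), (1500, 2000)]:
    print(size, choose_grid(*size))
```

A 1024-pixel canvas holds 2 x 2 = 4 tiles and a 2048-pixel canvas holds 4 x 4 = 16 tiles, which matches the "4 or 16" in the text.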

How to close the notebook
・Press the "Quit" button on the notebook home page.
・Close the browser.
　※ The server stops automatically. If it does not, press "CTRL+C".
```
    :
[I 15:01:35.584 NotebookApp] Shutting down 0 terminals
(py37) mizutu@ubuntu-vbox:~/workspace_py37/chapter09$
```

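SECTION-026 Implementing DetectionNet

About U-Net

We create a neural network that behaves like the published model and train it exclusively on horizontally written text.
・DetectionNet is based on a neural network called "U-Net".
・U-Net is a convolutional neural network of the type that outputs an image.
・In this type of network, layers that output an image are attached to the structure of an image-recognition network.
・The image-recognition network used for this purpose is called the backbone.
・The "DeepLabV3" model used in CHAPTER 05 uses the image-recognition network "ResNet101", but here we build our own backbone.

Implement the U-Net block.
・Create a "nets" directory and create "block.py" in it.

Source file

▼　nets/block.py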

import torch
import torch.nn as nn
import torch.nn.functional as F

class Block(nn.Module):
    def __init__(self,in_channel,out_channel,stride=1):
        super(Block, self).__init__()

        if in_channel != out_channel or stride != 1:
            self.skip = nn.Conv2d(in_channel,out_channel,1,stride=stride, bias=False)
            self.skipbn = nn.BatchNorm2d(out_channel)
        else:
            self.skip=None

        self.act = nn.ReLU()
        self.conv1 = nn.Conv2d(in_channel,out_channel,3,stride=stride,padding=1,bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.conv2 = nn.Conv2d(out_channel,out_channel,3,stride=1,padding=1,bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)

    def forward(self,inp):
        x = self.act(inp)
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.act(x)
        x = self.conv2(x)
        x = self.bn2(x)

        if self.skip is not None:
            skip = self.skip(inp)
            skip = self.skipbn(skip)
        else:
            skip = inp

        x += skip
        return x
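
The final `x += skip` in Block only works if both branches produce the same spatial size. The sketch below checks this with the standard convolution output-size formula (dilation 1). The helper name `conv_out_size` is introduced here for illustration and is not part of block.py.

```python
def conv_out_size(n, kernel, stride, padding):
    """Output length of a convolution along one axis (dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1

# Main branch: 3x3 convolution with padding 1 (conv1 in Block)
same = conv_out_size(256, kernel=3, stride=1, padding=1)
half = conv_out_size(256, kernel=3, stride=2, padding=1)

# Skip branch: 1x1 convolution with no padding and the same stride
skip = conv_out_size(256, kernel=1, stride=2, padding=0)

print(same, half, skip)
```

With stride 1 the size is preserved, and with stride 2 both branches halve an even-sized input to the same value, so the residual addition is well defined.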

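Implementing U-Net

Create "detectionnet.py" in the nets/ directory and write the "UNet" class.
・The constructor arguments of the UNet class are "n_blocks", the number of basic blocks in each block of the backbone, and "n_channels", the channel counts.
・The channel counts are the input and output channels of the blocks, so that array is one element longer than the number of blocks.

・This U-Net is a component used later to build DetectionNet, so its input and output both have 24 channels rather than the channels of an ordinary color or monochrome image.
・In DetectionNet the input resolution is large and the number of output channels is small, so the U-Net implemented here reduces the resolution more times and uses fewer channels than a typical U-Net.
・The data at the lowest resolution has 384 channels.
・The number of basic blocks is a trade-off between execution speed (≈ GPU memory consumption) and model size, and is set so that it peaks at a fairly low resolution (the 192-channel stage).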

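Implementing the backbone
・Using PyTorch's "nn.ModuleList" class, create three variables: "self.backbone" for the backbone, "self.upstep" for the convolution layers on the output side, and "self.downstep", which connects the intermediate layers of the backbone to the output side.
・This makes it possible to take the blocks out of each "nn.ModuleList" one at a time and run them.
・Each block is built by stacking the number of basic blocks given by "n_blocks", adding one more block that halves the resolution, and passing them to the "nn.Sequential" class.
・The list of blocks built with "nn.Sequential" is wrapped in "nn.ModuleList" and stored in "self.backbone".
Implementing the output side
・The blocks made of basic blocks are stored in "nn.ModuleList" instances.
・The output side consists of two kinds of block, one that adjusts the input channels and one that reduces the number of output channels; compute the input and output channel counts of each and build the "nn.ModuleList" instances.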
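
The channel bookkeeping described above can be written out without PyTorch. The sketch below lists the (input, output) channel pairs of the three module lists, using the default arguments of the UNet listing. The tuples are only an illustration of the arithmetic, not the modules themselves.

```python
n_blocks = [1, 2, 6, 10, 4]
n_channels = [24, 32, 48, 96, 192, 384]

# Backbone stage i maps n_channels[i] -> n_channels[i+1]
backbone = [(n_channels[i], n_channels[i + 1]) for i in range(len(n_blocks))]

# upstep[0] restores 24 channels; the others take a concatenation (2x channels)
upstep = [(n_channels[1], n_channels[0])]
upstep += [(n_channels[i + 1] * 2, n_channels[i + 1])
           for i in range(len(n_blocks) - 1)]

# downstep[i] reduces the channels of stage i to those of the stage before it
downstep = [(n_channels[i + 1], n_channels[i]) for i in range(len(n_blocks))]

for name, pairs in (("backbone", backbone), ("upstep", upstep), ("downstep", downstep)):
    print(name, pairs)
```

Because each block needs both an input and an output channel count, `n_channels` has one more element than `n_blocks`.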

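Running the U-Net
・The first loop runs every block in the backbone and keeps the intermediate data.
・The next loop processes that data one at a time, starting from the lowest resolution, to produce the final output.
・Inside the loop, PyTorch's "F.interpolate" function performs upsampling, which raises the resolution of the image data.
・Adding one more upsampling step before the final output layer would make the output resolution equal to the input resolution. Here, however, an image at half the input resolution is used, so this U-Net omits the final upsampling and outputs at the same resolution as the output of the first backbone block.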
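
The two loops can be traced in terms of (channels, resolution) alone. The sketch below assumes a 512-pixel input and the default channel list. It follows the indexing of the `forward` listing but performs no tensor computation.

```python
n_channels = [24, 32, 48, 96, 192, 384]
size = 512  # input side length; assumed to be a multiple of 32

# First loop: every backbone stage halves the resolution
back_out = []
res = size
for ch in n_channels[1:]:
    res //= 2
    back_out.append((ch, res))

# Second loop: reduce channels, upsample x2, concatenate with the skip, merge
ch, res = back_out[-1]
for i in range(len(back_out) - 1):
    skip_ch, skip_res = back_out[-i - 2]
    low_ch = n_channels[len(n_channels) - 2 - i]  # output channels of downstep
    res *= 2                                      # F.interpolate(scale_factor=2)
    assert res == skip_res and low_ch == skip_ch  # required by torch.cat
    ch = skip_ch                                  # upstep: 2*skip_ch -> skip_ch
    print(f"step {i}: concat {low_ch + skip_ch} ch -> {ch} ch at {res}x{res}")

# upstep[0] maps 32 -> 24 channels; there is no final upsampling
out = (n_channels[0], res)
print(out)
```

The trace ends at 24 channels and half the input resolution, which is why the target maps in the training code are created at `IMAGE_SIZE//2`.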

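Implementing DetectionNet
・Use the U-Net created above to build DetectionNet, which detects character and sentence regions in an image.
・Because the U-Net takes and returns 24 channels, a convolution layer serving as the input layer produces 24-channel image data, which is fed to the U-Net.
・On the output side, DetectionNet uses several output layers and returns the output of each layer in a dictionary.
・The required outputs are "a map image of character regions", "a map image of sentence regions" and "a 2-channel image of character size", 4 channels in total.

・The "forward" function returns the outputs under the keys "hm_wd", "hm_sent" and "of_size".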
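
As a quick reference, the shapes of the three outputs can be sketched as below. The helper `expected_output_shapes` is hypothetical. The multiple-of-32 check is an assumption derived from the five halving stages of the backbone, since an odd intermediate size would no longer match after upsampling.

```python
def expected_output_shapes(batch, height, width):
    """Shapes of the DetectionNet outputs for a (batch, 1, height, width) input."""
    if height % 32 or width % 32:
        raise ValueError("height and width are assumed to be multiples of 32")
    h, w = height // 2, width // 2  # the U-Net omits the last upsampling
    return {
        'hm_wd': (batch, 1, h, w),    # character heatmap
        'hm_sent': (batch, 1, h, w),  # sentence heatmap
        'of_size': (batch, 2, h, w),  # character width and height
    }

shapes = expected_output_shapes(2, 512, 512)
total_channels = sum(shape[1] for shape in shapes.values())
print(shapes, total_channels)
```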

The complete DetectionNet
・Create a function at the top level of "detectionnet.py" that returns the DetectionNet model.

Source file

▼　nets/detectionnet.py

import torch
import torch.nn as nn
import torch.nn.functional as F
from .block import Block

class UNet(nn.Module):
    def __init__(self, n_blocks=[1,2,6,10,4], n_channels=[24,32,48,96,192,384]):
        super(UNet, self).__init__()

        backbone = []
        in_channel = n_channels[0]
        for i in range(len(n_blocks)):
            channel = n_channels[i+1]
            layers = [Block(in_channel,channel)]
            for _ in range(n_blocks[i]-1):
                layers.append(Block(channel,channel))
            layers.append(Block(channel,channel,2))
            in_channel = channel
            backbone.append(nn.Sequential(*layers))
        self.backbone = nn.ModuleList(backbone)

        upstep = [Block(n_channels[1], n_channels[0])]
        for i in range(len(n_blocks)-1):
            channel = n_channels[i+1]
            upstep.append(Block(channel*2, channel))
        self.upstep = nn.ModuleList(upstep)

        downstep = []
        out_channel = n_channels[0]
        for i in range(len(n_blocks)):
            channel = n_channels[i+1]
            downstep.append(Block(channel, out_channel))
            out_channel = channel
        self.downstep = nn.ModuleList(downstep)

    def forward(self, x):
        back_out = []
        for i in range(len(self.backbone)):
            x = self.backbone[i](x)
            back_out.append(x)

        out = back_out[len(back_out)-1]
        for i in range(len(back_out)-1):
            low = self.downstep[len(self.downstep)-i-1](out)
            up1 = F.interpolate(low, scale_factor=2)
            up2 = back_out[len(back_out)-i-2]
            up = torch.cat([up1,up2], dim=1)
            out = self.upstep[len(self.upstep)-i-1](up)
        return self.upstep[0](out)


class DetectionNet(nn.Module):
    def __init__(self):
        super(DetectionNet, self).__init__()

        self.conv1 = nn.Conv2d(1, 24, 3, 1, 1, bias=True)
        self.block1 = UNet()

        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()

        self.out1 = nn.Conv2d(24, 1, 1, bias=True)
        self.out2 = nn.Conv2d(24, 1, 1, bias=True)
        self.out3 = nn.Conv2d(24, 2, 1, bias=True)

    def forward(self, input):
        x = self.conv1(input)
        x = self.block1(x)
        out = self.relu(x)
        result = {'hm_wd':self.sigmoid(self.out1(out)),
                'hm_sent':self.sigmoid(self.out2(out)),
                'of_size':self.sigmoid(self.out3(out))}
        return result

def get_detectionnet():
    model = DetectionNet()
    return model

SECTION-027 Training the model

Implementing the training code

Prepare the required files.
・Copy them into the working directory "~/workspace_py37/chapter09/".
```
(py37) $ cd ~/workspace_py37/chapter09
(py37) $ cp ~/workspace_py37/sample/chapt09/chapt09_1.py chapt09_1a.py
    :
```

Files needed for training

~/workspace_py37/chapter09/
.
├── nets
│   ├── block.py
│   └── detectionnet.py
├── misc
│   └── nihongo.py
├── fonts
│   ├── ipaexg.ttf
│   └── ipaexm.ttf
└── chapt09_1a.py

Edit the file

▼　chapt09_1a.py

# -*- coding: utf-8 -*-
##------------------------------------------
## "PyTorch ではじめる AI開発"
##   Chapter 09 / Section 027
##   Completing the OCR / training a horizontal-only model
##
##               2021.10.17 Masahiro Izutsu
##------------------------------------------
## chapt09_1a.py  (original: chapt09_1.py)

import os
import numpy as np
from tqdm import tqdm
from PIL import Image, ImageFont, ImageDraw
import itertools
import cv2
import torch
import torch.nn as nn
from torchvision import transforms

from nets.detectionnet import get_detectionnet
from misc.nihongo import nihongo, hiragana, katakana, jyouyou_kanji

# Use the GPU if one is available
USE_DEVICE = 'cuda:0' if torch.cuda.is_available() else 'cpu'
IMAGE_SIZE = 512 # image size
BATCH_SIZE = 2 # batch size during training
NUM_WORKERS = 4 # number of data-loading workers
NUM_ITERATIONS = 200000 # number of training samples (dataset length)

# Make PyTorch's internals deterministic
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

# Seed the random number generators
np.random.seed(0)
torch.manual_seed(0)

# Load the fonts
font_path = os.listdir('fonts/') # list of font files
font_size = [9,14,20,26,32,48] # font sizes to use
ttf_fonts = [(ImageFont.truetype('fonts/'+fp, fs),fs)
                for fp,fs in itertools.product(font_path, font_size)]

# Character classes that make up a sentence
char_classes = (nihongo, hiragana, katakana, jyouyou_kanji)

# Draw one line of text
def draw_line(img, draw, heatmap, offset_w, offset_h, sentence, y_pos):
    # Pick a random font and foreground color
    font, cur_size = ttf_fonts[np.random.randint(len(ttf_fonts))]
    fgcolor = np.random.randint(128) # foreground color
    sentsize = np.random.randint(5) + 5 # number of characters to draw
    # Position at which the line is drawn
    x_pos = max(1,IMAGE_SIZE-sentsize*cur_size*2)
    x_pos = np.random.randint(x_pos) # random start position
    mx1,mx2,my1,my2 = -1,-1,-1,-1 # rectangle spanned by the drawn characters
    sentheight = 0 # height of the drawn line
    img_org = np.array(img) # image before a character is drawn
    char_class = char_classes[np.random.randint(len(char_classes))] # character class
    # Draw one line
    for i in range(sentsize): # once per character
        # Draw a random character
        char = char_class[np.random.randint(len(char_class))]
        draw.text((x_pos,y_pos), char, font=font, fill=fgcolor)
        aimg = np.array(img)
        where = np.where((img_org - aimg) != 0) # pixels that changed
        img_org = aimg # previous image
        if len(where[0]) > 0 and len(where[1]) > 0:
            # Get the bounds of the changed area (in half-resolution coordinates)
            y1,y2,x1,x2 = (min(where[0])//2, max(where[0])//2,
                            min(where[1])//2, max(where[1])//2)
            width = x2 - x1 # width of the drawn character
            height = y2 - y1 # height of the drawn character
            x_c = (x1 + x2) / 2 # horizontal center of the drawn character
            y_c = (y1 + y2) / 2 # vertical center of the drawn character
            if width > 0 and height > 0:
                # Heatmap of the likelihood of being a character
                heatmap += ((np.exp(-(((np.arange(IMAGE_SIZE//2)-x_c) /
                                    (width/10))**2)/2)).reshape(1,-1) *
                            (np.exp(-(((np.arange(IMAGE_SIZE//2)-y_c) /
                                    (height/10))**2)/2)).reshape(-1,1))
                # Images that hold the character size
                offset_w[y1:y2,x1:x2] = width / IMAGE_SIZE
                offset_h[y1:y2,x1:x2] = height / IMAGE_SIZE
            if i == 0: # first character
                mx1,my1 = int(np.round(x_c)),int(np.round(y_c))
            mx2,my2 = int(np.round(x_c)),int(np.round(y_c))
            sentheight = max(sentheight, height) # line height
            x_pos += width + cur_size
    if sentheight > 1 and min(mx1,mx2,my1,my2) >= 0:
        # Heatmap that marks the string as a sentence
        for w in 1.0 - np.exp(-(((np.arange(sentheight)/2.4))**2)/2)[::-1]:
            lw = int(np.round(w*sentheight/4))+1
            cv2.line(sentence,(mx1,my1),(mx2,my2),
                        int(np.round(255-255*w)),lw,cv2.LINE_AA)
    return y_pos + cur_size * 2

# Create one training sample
def make_image():
    min_fontsize = font_size[0]
    max_fontsize = font_size[-1]
    # Draw text on the image
    bgcolor = np.random.randint(128) + 127 # background color
    # The image to draw on is a Pillow Image
    img = Image.new(size=(IMAGE_SIZE,IMAGE_SIZE), mode='L', color=bgcolor)
    draw = ImageDraw.Draw(img)
    # The target images are NumPy arrays
    heatmap = np.zeros((IMAGE_SIZE//2,IMAGE_SIZE//2))
    offset_w = np.zeros((IMAGE_SIZE//2,IMAGE_SIZE//2))
    offset_h = np.zeros((IMAGE_SIZE//2,IMAGE_SIZE//2))
    sentence = np.zeros((IMAGE_SIZE//2,IMAGE_SIZE//2))
    # Draw horizontal lines of text while increasing the Y coordinate
    y_pos = min_fontsize
    while y_pos < IMAGE_SIZE - max_fontsize:
        y_pos = draw_line(img, draw, heatmap, offset_w, offset_h,
                            sentence, y_pos)
    sentence = sentence/255 # scale to the range 0 to 1
    img = np.array(img, dtype=np.uint8)
    return img, heatmap, sentence, offset_w, offset_h

# List of sine-wave samples
def make_wave(start_phase, end_phase, num):
    return [np.sin(p) for p in np.linspace(start_phase, end_phase, num)]
# Build a 2-D distortion coordinate map
def make_rnd_matwave(x_shape, y_shape, size, n_wave):
    dst = np.ones((y_shape, x_shape), dtype=np.int32)
    # Make three sine waves with random phases
    w1 = make_wave(n_wave*np.pi*np.random.random(),
                    n_wave*np.pi*np.random.random(),y_shape)
    w2 = make_wave(n_wave*np.pi*np.random.random(),
                    n_wave*np.pi*np.random.random(),y_shape)
    w3 = make_wave(n_wave*np.pi*np.random.random(),
                    n_wave*np.pi*np.random.random(),y_shape)
    # Combine the sine waves into distortion coordinates
    for y, (sp, ep, s) in enumerate(zip(w1,w2,w3)):
        # Sine wave whose start and end phases vary from row to row
        d = np.array(make_wave(n_wave*np.pi*sp,n_wave*np.pi*ep,x_shape))
        # Sine wave whose amplitude varies from row to row
        d *= (s * y_shape * size)
        # Convert to an output coordinate for each position
        d -= np.linspace(d[0], d[-1], x_shape)
        d = np.round(d)
        d += np.arange(x_shape)
        d = d.astype(np.int32)
        d = np.clip(d, 0, x_shape-1)
        dst[y,:] = d # map from input coordinates to distorted coordinates
    return dst
# Precompute 2-D distortion maps from combinations of sine waves
tmp_wave = [make_rnd_matwave(512, 512,
                size=np.random.random()*0.12, n_wave=1.2)
            for _ in range(10)]

# Add random noise to an image
def _rnd_noize_img(img, scale=15):
    scale = np.random.random() * scale
    # Generate noise at three spatial scales
    a = np.random.normal(loc=0.0, scale=scale,
                        size=(img.shape[0]//4,img.shape[1]//4))
    b = np.random.normal(loc=0.0, scale=scale,
                        size=(img.shape[0]//2,img.shape[1]//2))
    c = np.random.normal(loc=0.0, scale=scale,
                        size=(img.shape[0],img.shape[1]))
    a = cv2.resize(a, (img.shape[0],img.shape[1]),
                        interpolation=cv2.INTER_CUBIC)
    b = cv2.resize(b, (img.shape[0],img.shape[1]),
                        interpolation=cv2.INTER_CUBIC)
    return img + a + b + c # add everything together

# Resize an image and place it at the top-left of a blank canvas
def _clip_img(img, scale=0.9):
    dst = np.zeros(img.shape)
    img = cv2.resize(img, (int(img.shape[1]*scale),
                    int(img.shape[0]*scale)))
    dst[:img.shape[0],:img.shape[1]] = img
    return dst

# Dataset class that generates and returns text images
class MyDataset(object):
    def __init__(self):
        pass

    def __getitem__(self, idx):
        # Create an image
        _img, _heat, _sent, _offw, _offh = make_image()
        # Apply distortion
        y_wave = tmp_wave[np.random.randint(len(tmp_wave))]
        x_wave = tmp_wave[np.random.randint(len(tmp_wave))]
        img = _img.copy()
        heat = _heat.copy()
        sent = _sent.copy()
        offw = _offw.copy()
        offh = _offh.copy()
        for x in range(IMAGE_SIZE):
            img[:,x] = _img[y_wave[x,:],x_wave[:,x]]
        y_wave = cv2.resize(y_wave, (IMAGE_SIZE//2, IMAGE_SIZE//2),
                            interpolation=cv2.INTER_NEAREST)
        x_wave = cv2.resize(x_wave, (IMAGE_SIZE//2, IMAGE_SIZE//2),
                            interpolation=cv2.INTER_NEAREST)
        for x in range(IMAGE_SIZE//2):
            heat[:,x] = _heat[y_wave[x,:]//2,x_wave[:,x]//2]
            sent[:,x] = _sent[y_wave[x,:]//2,x_wave[:,x]//2]
            offw[:,x] = _offw[y_wave[x,:]//2,x_wave[:,x]//2]
            offh[:,x] = _offh[y_wave[x,:]//2,x_wave[:,x]//2]
        # Add random noise
        img = _rnd_noize_img(img)
        # Randomly resize and clip
        scale = 0.9 + np.random.random() * 0.1
        img = _clip_img(img, scale)
        heat = _clip_img(heat, scale)
        sent = _clip_img(sent, scale)
        offw = _clip_img(offw, scale)
        offh = _clip_img(offh, scale)
        # Reshape so that the channel axis comes first
        img = img.reshape((1,img.shape[0],img.shape[1]))
        heat = heat.reshape((1,heat.shape[0],heat.shape[1]))
        sent = sent.reshape((1,sent.shape[0],sent.shape[1]))
        # The size has 2 channels, so stack them
        of_size = np.stack([offw,offh]).astype(np.float32)
        # Bring the values into the range 0 to 1
        img = np.clip(img / 255, 0.0, 1.0)
        heat = np.clip(heat, 0.0, 1.0)
        sent = np.clip(sent, 0.0, 1.0)
        of_size = np.clip(of_size, 0.01, 0.1)  # 5~50px
        # Convert to PyTorch tensors
        img = torch.tensor(img, dtype=torch.float32)
        heat = torch.tensor(heat, dtype=torch.float32)
        sent = torch.tensor(sent, dtype=torch.float32)
        of_size = torch.tensor(of_size, dtype=torch.float32)
        # Return the tensors
        return img, heat, sent, of_size

    def __len__(self):
        return NUM_ITERATIONS # number of training samples

def main():
    # Create the model that matches the OCR program
    model = get_detectionnet()
    model.to(USE_DEVICE) # move to GPU memory when using the GPU
    model.train() # put the model in training mode

    # Prepare for training
    optimizer = torch.optim.Adam(model.parameters())
    # Loss function
    mseloss = nn.MSELoss()
    def loss(v, hm_wd, hm_sent, of_size):
        r = mseloss(v['hm_wd'], hm_wd)
        r += mseloss(v['hm_sent'], hm_sent)
        r += mseloss(v['of_size'], of_size)
        return r

    # Create the dataset
    dataset = MyDataset()
    # Load the data in separate worker processes
    data_loader = torch.utils.data.DataLoader(
        dataset, batch_size=BATCH_SIZE,
        shuffle=True, num_workers=NUM_WORKERS)

    # Training loop
    for X, y_wd, y_sent, of_size in tqdm(data_loader):
        X = X.to(USE_DEVICE) # move to GPU memory when using the GPU
        y_wd = y_wd.to(USE_DEVICE) # move to GPU memory when using the GPU
        y_sent = y_sent.to(USE_DEVICE) # move to GPU memory when using the GPU
        of_size = of_size.to(USE_DEVICE) # move to GPU memory when using the GPU

        # Run the neural network
        res = model(X)
        # Compute the total loss
        losses = loss(res, y_wd, y_sent, of_size)

        optimizer.zero_grad() # clear the previous gradients
        losses.backward() # backpropagate the loss
        optimizer.step() # update the parameters from the new gradients

    # Save the final model
    torch.save(model.state_dict(), 'chapt09-model1.pth')

if __name__ == '__main__':
    main()
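
The character heatmap target built in `draw_line` is an outer product of two 1-D Gaussians whose standard deviations are one tenth of the character's width and height. The following is a minimal pure-Python sketch of that construction. The helper names `gaussian_1d` and `char_heatmap` are introduced here for illustration, and the listing itself uses NumPy broadcasting instead.

```python
import math

def gaussian_1d(length, center, sigma):
    """Unnormalized Gaussian sampled at the integer positions 0..length-1."""
    return [math.exp(-(((i - center) / sigma) ** 2) / 2) for i in range(length)]

def char_heatmap(size, x_c, y_c, width, height):
    """Outer product of two 1-D Gaussians, returned as a list of rows."""
    gx = gaussian_1d(size, x_c, width / 10)   # sigma_x = width / 10
    gy = gaussian_1d(size, y_c, height / 10)  # sigma_y = height / 10
    return [[vy * vx for vx in gx] for vy in gy]

hm = char_heatmap(size=32, x_c=16, y_c=10, width=20, height=12)
peak = hm[10][16]          # value at the character center
one_sigma = hm[10][18]     # two pixels right of center, i.e. one sigma_x away
print(peak, one_sigma)
```

The value is 1 at the character center and falls off quickly, so each character contributes a compact peak rather than a filled box.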

Training "sentence detection / character-region detection" (DetectionNet)

Running the training program and its results
・GPU (GeForce GTX 1050 Ti)

(py37) > cd ~\workspace_py37\chapter09
(py37) > python3 chapt09_1a.py 
100%|███████████████████████████████| 100000/100000 [28:34:11<00:00,  1.03s/it]

・CPU (Intel® Core™ i7-1185G7)

(py37) $ cd ~/workspace_py37/chapter09
(py37) $ python3 chapt09_1a.py 
  0%|                                   | 11/100000 [01:19<199:44:48,  7.19s/it]
    :
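
The step count and the time estimates in the progress bars follow from the constants in the script. The sketch below redoes that arithmetic, taking the per-iteration times from the two logs above. The helper `eta` is hypothetical.

```python
NUM_ITERATIONS = 200000  # dataset length in chapt09_1a.py
BATCH_SIZE = 2

steps = NUM_ITERATIONS // BATCH_SIZE  # batches shown by tqdm

def eta(steps, sec_per_it):
    """Format steps * sec_per_it as h:mm:ss."""
    total = round(steps * sec_per_it)
    h, rem = divmod(total, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"

gpu = eta(steps, 1.03)  # GPU log: 1.03 s/it
cpu = eta(steps, 7.19)  # CPU log: 7.19 s/it
print(steps, gpu, cpu)
```

At the logged rates this gives roughly 28.6 hours on the GPU and about 200 hours on the CPU, in line with the tqdm output.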

Training time

Machine	Start	End	Elapsed (h:m)
GeForce GTX 1050 Ti Intel® Core™ i7-6700	10/17 09:30	10/18 13:05	27:35
DELL Latitude 7520 Intel® Core™ i7-1185G7 CPU	--/-- --:--	--/-- --:--	--:--

SECTION-028 Checking how the model behaves

Checking in Jupyter Notebook

● The trained model has been saved as "chapt09-model1.pth".
● Start "Jupyter Notebook", access it from the browser and open "chapt09.ipynb".

(py37) $ cd ~/workspace_py37/chapter09
(py37) $ jupyter notebook

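● In cell [2], which loads the model file, change the file name.
● Re-run the whole notebook.

Heatmap images of character probability and sentence probability

Sentence blocks recognized by the OCR program

※ Every block is now recognized as a separate block.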

Running the horizontal-only OCR

Run with only DetectionNet replaced

(py37) $ cd ~/workspace_py37/chapter09/OCR_Japanease
(py37) $ cp ../chapt09-model1.pth ./models/detectionnet.model 
(py37) $ python3 ocr_japanease.py --cpu ../yokogaki.png
file "../yokogaki.png" detected in 150 dpi.
[Block #0]
のテスト
文
AIAIご
ひらがな
カタカナ
[Block #1]
認識認識
[Block #2]
こば横書き
[Block #3]
日本語
[Block #4]
字文字
[Block #5]
横書き

Run with ClassifierNet replaced as well

(py37) $ cd ~/workspace_py37/chapter09/OCR_Japanease
(py37) $ cp ../../chapter08/chapt08-model1.pth ./models/classifiernet.model 
(py37) $ python3 ocr_japanease.py --cpu ../yokogaki.png
[Block #0]
のテスト
文
AIAIご
ひらがな
カタカナ
[Block #1]
認識認識
[Block #2]
こば横書き
[Block #3]
日本語
[Block #4]
字文字
[Block #5]
横書き

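※ Unlike the results in the book, the recognition rate is not quite satisfactory. The cause will be investigated later.

Timestamps of the models used for verification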

(py37) $ ls -l ~/workspace_py37/chapter09/OCR_Japanease/models
合計 188768
-rw-r--r-- 1 mizutu mizutu 96085770 10月 18 15:06 classifiernet.model
-rw-r--r-- 1 mizutu mizutu 97209519 10月 18 15:01 detectionnet.model

(py37) $ ls -l ~/workspace_py37/chapter09
合計 95048
drwxrwxr-x 8 mizutu mizutu     4096 10月 18 14:48 OCR_Japanease
-rw-rw-r-- 1 mizutu mizutu 97209519 10月 18 13:05 chapt09-model1.pth
-rw-rw-r-- 1 mizutu mizutu    66617 10月 18 13:37 chapt09.ipynb
-rw-rw-r-- 1 mizutu mizutu    12235 10月 17 08:15 chapt09_1a.py
drwxrwxr-x 2 mizutu mizutu     4096 10月 11 19:55 fonts
drwxrwxr-x 3 mizutu mizutu     4096 10月 16 14:16 misc
drwxrwxr-x 2 mizutu mizutu     4096  2月 27  2021 models
drwxrwxr-x 3 mizutu mizutu     4096 10月 16 14:16 nets
-rw-rw-r-- 1 mizutu mizutu    15372  3月 26  2021 yokogaki.png

(py37) $ ls -l ~/workspace_py37/chapter08
合計 93860
drwxrwxr-x 8 mizutu mizutu     4096 10月 18 14:49 OCR_Japanease
-rw-rw-r-- 1 mizutu mizutu 96085770 10月 15 14:13 chapt08-model1.pth
-rw-rw-r-- 1 mizutu mizutu     5033 10月 11 20:05 chapt08_1a.py
drwxrwxr-x 2 mizutu mizutu     4096 10月 11 19:55 fonts
drwxrwxr-x 3 mizutu mizutu     4096 10月 12 05:51 misc
drwxrwxr-x 3 mizutu mizutu     4096 10月 12 05:51 nets

Update history

2021/10/16 First edition
