Tesseract

Character recognition engine "Tesseract"

As a step toward practical AI development, this page tries out the character recognition engine "Tesseract".

* Last updated: 2021/12/23

Environment setup

The Anaconda platform used in "Getting Started with AI Development in PyTorch" is reused here.
Create the project folder "pyocr" in advance.

$ conda activate py37
(py37) $ cd ~/workspace_py37/
(py37) $ mkdir pyocr

Install the required packages

Install "Tesseract"

(py37) $ conda install -c conda-forge tesseract

▼ "tesseract" installation log details

(py37) $ conda install -c conda-forge tesseract
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/mizutu/anaconda3/envs/py37

  added / updated specs:
    - tesseract


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2021.10.8  |       ha878542_0         139 KB  conda-forge
    certifi-2021.10.8          |   py37h89c1867_1         145 KB  conda-forge
    giflib-5.2.1               |       h36c2ea0_2          77 KB  conda-forge
    leptonica-1.80.0           |       h950d820_0         2.6 MB  conda-forge
    libarchive-3.5.2           |       hccf745f_1         1.6 MB  conda-forge
    libwebp-1.2.0              |       h89dd481_0         493 KB
    lzo-2.10                   |    h516909a_1000         314 KB  conda-forge
    openjpeg-2.4.0             |       hb52868f_1         444 KB  conda-forge
    tesseract-4.1.1            |       h84e3e21_5       309.7 MB  conda-forge
    ------------------------------------------------------------
                                           Total:       315.5 MB

The following NEW packages will be INSTALLED:

  giflib             conda-forge/linux-64::giflib-5.2.1-h36c2ea0_2
  leptonica          conda-forge/linux-64::leptonica-1.80.0-h950d820_0
  libarchive         conda-forge/linux-64::libarchive-3.5.2-hccf745f_1
  libwebp            pkgs/main/linux-64::libwebp-1.2.0-h89dd481_0
  lzo                conda-forge/linux-64::lzo-2.10-h516909a_1000
  openjpeg           conda-forge/linux-64::openjpeg-2.4.0-hb52868f_1
  tesseract          conda-forge/linux-64::tesseract-4.1.1-h84e3e21_5

The following packages will be UPDATED:

  ca-certificates                      2021.5.30-ha878542_0 --> 2021.10.8-ha878542_0
  certifi                          2021.5.30-py37h89c1867_0 --> 2021.10.8-py37h89c1867_1


Proceed ([y]/n)? y


Downloading and Extracting Packages
lzo-2.10             | 314 KB    | ##################################### | 100% 
certifi-2021.10.8    | 145 KB    | ##################################### | 100% 
tesseract-4.1.1      | 309.7 MB  | ##################################### | 100% 
openjpeg-2.4.0       | 444 KB    | ##################################### | 100% 
ca-certificates-2021 | 139 KB    | ##################################### | 100% 
libarchive-3.5.2     | 1.6 MB    | ##################################### | 100% 
libwebp-1.2.0        | 493 KB    | ##################################### | 100% 
giflib-5.2.1         | 77 KB     | ##################################### | 100% 
leptonica-1.80.0     | 2.6 MB    | ##################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done

Install "PyOCR"

$ conda activate py37
(py37) $ pip install pyocr

▼ "PyOCR" installation log details

(py37) $ pip install pyocr
Collecting pyocr
  Downloading pyocr-0.8.tar.gz (65 kB)
     |████████████████████████████████| 65 kB 624 kB/s 
Requirement already satisfied: Pillow in /home/mizutu/anaconda3/envs/py37/lib/python3.7/site-packages (from pyocr) (8.3.1)
Building wheels for collected packages: pyocr
  Building wheel for pyocr (setup.py) ... done
  Created wheel for pyocr: filename=pyocr-0.8-py3-none-any.whl size=36928 sha256=41abedb2760bb571cf0094ad3fab6644a77ccf84483905f92e19502fc74b7107
  Stored in directory: /home/mizutu/.cache/pip/wheels/ad/ca/be/7bf9a562ca9fd00f1097ad0a952c4f0b2584f1e046588ff192
Successfully built pyocr
Installing collected packages: pyocr
Successfully installed pyocr-0.8
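
Before moving on, it can be useful to confirm that the `tesseract` executable is reachable from the activated environment, since PyOCR's "Tesseract (sh)" tool should rely on it. The following is a minimal sketch using only the standard library; the helper name `tesseract_version` is hypothetical and is not part of Tesseract or PyOCR.

```python
import shutil
import subprocess


def tesseract_version():
    """Return the first line of `tesseract --version`, or None if not on PATH."""
    exe = shutil.which("tesseract")  # look the executable up on PATH
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some builds print the version to stderr
        universal_newlines=True,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


# Prints None when tesseract cannot be found in the current environment.
print(tesseract_version())
```

If this prints `None`, the conda environment is probably not activated or the installation did not complete.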

Checking the environment and supported languages

Create the test script

(py37) $ cd ~/workspace_py37/pyocr
(py37) $ vi initialization.py

▼ "initialization.py" source code

from PIL import Image
import sys

import pyocr
import pyocr.builders

tools = pyocr.get_available_tools()
if len(tools) == 0:
    print("No OCR tool found")
    sys.exit(1)
# The tools are returned in the recommended order of usage
tool = tools[0]
print("Will use tool '%s'" % (tool.get_name()))
# Ex: Will use tool 'libtesseract'

langs = tool.get_available_languages()
print("Available languages: %s" % ", ".join(langs))
lang = langs[0]
print("Will use lang '%s'" % (lang))
# Ex: Will use lang 'fra'
# Note that languages are NOT sorted in any way. Please refer
# to the system locale settings for the default language
# to use.

Execution result

(py37) $ cd ~/workspace_py37/pyocr/
(py37) $ python3 initialization.py
Will use tool 'Tesseract (sh)'
Available languages: afr, amh, ara, asm, aze, aze_cyrl, bel, ben, bod, bos, bre, bul, cat, ceb, ces, chi_sim, chi_sim_vert, chi_tra, chi_tra_vert, chr, cos, cym, dan, deu, div, dzo, ell, eng, enm, epo, est, eus, fao, fas, fil, fin, fra, frk, frm, fry, gla, gle, glg, grc, guj, hat, heb, hin, hrv, hun, hye, iku, ind, isl, ita, ita_old, jav, jpn, jpn_vert, kan, kat, kat_old, kaz, khm, kir, kmr, kor, kor_vert, lao, lat, lav, lit, ltz, mal, mar, mkd, mlt, mon, mri, msa, mya, nep, nld, nor, oci, ori, osd, pan, pol, por, pus, que, ron, rus, san, script/Arabic, script/Armenian, script/Bengali, script/Canadian_Aboriginal, script/Cherokee, script/Cyrillic, script/Devanagari, script/Ethiopic, script/Fraktur, script/Georgian, script/Greek, script/Gujarati, script/Gurmukhi, script/HanS, script/HanS_vert, script/HanT, script/HanT_vert, script/Hangul, script/Hangul_vert, script/Hebrew, script/Japanese, script/Japanese_vert, script/Kannada, script/Khmer, script/Lao, script/Latin, script/Malayalam, script/Myanmar, script/Oriya, script/Sinhala, script/Syriac, script/Tamil, script/Telugu, script/Thaana, script/Thai, script/Tibetan, script/Vietnamese, sin, slk, slv, snd, spa, spa_old, sqi, srp, srp_latn, sun, swa, swe, syr, tam, tat, tel, tgk, tha, tir, ton, tur, uig, ukr, urd, uzb, uzb_cyrl, vie, yid, yor
Will use lang 'afr'

* "jpn" appears in the list, which confirms that Japanese is supported.
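
Note that the script simply takes `langs[0]`, which is 'afr' here, so the language has to be chosen explicitly for Japanese text. A minimal sketch of such a choice follows; the helper `pick_language` and its preference order are assumptions for illustration, not part of PyOCR.

```python
def pick_language(available, preferred=("jpn", "eng")):
    """Return the first preferred language that is installed, else None."""
    installed = set(available)
    for lang in preferred:
        if lang in installed:
            return lang
    return None


# A shortened version of the language list printed above.
langs = ["afr", "amh", "ara", "eng", "fra", "jpn", "jpn_vert"]
print(pick_language(langs))           # jpn
print(pick_language(["afr", "fra"]))  # None
```

In the real script, `available` would be the list returned by `tool.get_available_languages()`.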

A simple OCR test

Create "ocrtest.py" (quoted from an external source)

from PIL import Image
import sys

import pyocr
import pyocr.builders

tools = pyocr.get_available_tools()
tool = tools[0]
txt = tool.image_to_string(
    Image.open('test.jpg'),
    lang="jpn",
    builder=pyocr.builders.TextBuilder(tesseract_layout=6)
)

print(txt)

Execution result (reading the test image file)

(py37) $ python3 ocrtest.py 
Tesseract (テッセラクト)は、さまざまなオペレーティングシステム上で動作
する光学式文字認識エンジン? 。名称のTesseractとは四次元超立方体の意で
ある。Apache Licenseの下でリリースされたフリーソフトウェエアである!*ぅ

。文字認識を行うライブラリと、それを用いたコマンドラインインターフェ
イスを持つ。

もともとは198o年代にプロプライエタリソフトウェアとしてヒューレット・
パッカードが開発していたが、>oo5年にオープンソースとしてリリースさ

れ、開発は>oo6年からGoogleが後援している5 。

っoo6年、Tesseractは当時入手可能な最も正確なオープンソースOCRエンジン
の1つと見なされた3? 7 。

歴史

Tesseractエンジンは、+985年から1994年にかけて、英国ブリストルとコロラ
ド州グリーリーにあるヒューレット・パッカードラボでプロプライエタリソ
フトウェアとして開発されていた。+g96年にさらに変更が加えられて
Windowsへ移植され、1998年にCからC ++に移行した。コードの多くはCで
記述されており、部分的にC++で記述されている。それ以来、すべてのコード
は少なくともC++コンパイラでコンバイルするように変換されている* 。 次の
1o年間はほとんど変更がなかった。その後、oo5年にヒュユーレット・パッカ
ードとネバダ大学ラスベガス校 (UNLV) によってオープンソースとしてリリ
ースされた。 Tesseractの開発はsoo6年からGoogleが後援している5 。
特徴

Tesseractは、1gg5年の時点で文字認識精度が良い上位3つのOCRエンジンの
うちの一つだった9 。 TesseractはLinux、Windows、Mac OS Xで利用できる
が、開発リソースの制限にこより、WindowsとUbuntuの開発者によってのみ厳
格なテストが行われている3 3 。

バージョン>ぅまでのTesseractは、単純な+列のテキストのTIFF画像のみの入力
が可能だった。初期のバージョンにはレイアウト分析が含まれていなかったた
め、複数列のデキスト、画像、数式を入力すると、文字化けした出力が生成さ
れた。バージョン3.oo以降、Tesseractは出力テキストのフォーマット、

hOCR ? 位置情報、ページレイアウト分析に対応した。 また、Leptonicaライ
ブラリの使用により、いくつかの新しい画像形式に対応した。 Tesseractで
は、テキストが等幅かプロポーショナルかを検出するごとができる3? 。

A simple test with webcam input

Create "ocrtest_cam.py" (quoted from an external source)

from PIL import Image
import cv2
import sys
import pyocr
import pyocr.builders

tools = pyocr.get_available_tools()
if len(tools) == 0:
    print("No OCR tool found")
    sys.exit(1)
tool = tools[0]

capture = cv2.VideoCapture(0)
last_txt = ""
while True:
    ret, frame = capture.read()
    if not ret:
        # No frame could be read (camera missing or disconnected)
        print("Unable to read from the camera")
        break
    orgHeight, orgWidth = frame.shape[:2]
    # Shrink the frame to 1/4 size to speed up recognition
    size = (int(orgWidth/4), int(orgHeight/4))
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    image = cv2.resize(gray, size)
    txt = tool.image_to_string(
        Image.fromarray(image),
        lang="jpn",
        builder=pyocr.builders.TextBuilder(tesseract_layout=6)
    )
    # Print only non-empty results that differ from the previous one
    if len(txt) != 0 and txt != last_txt:
        last_txt = txt
        print(txt)

    cv2.imshow("Capture", image)

    # Exit when any key is pressed
    if cv2.waitKey(33) >= 0:
        break

capture.release()
cv2.destroyAllWindows()

Execution result (continuous output)

(py37) $ python3 ocrtest_cam.py 
    :
〒明  Interfa(
ョーーュり
表記
ー机  Interface
時
es
    :
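
The loop above prints a result only when it is non-empty and differs from the previous one. That filtering can be isolated into a small helper, sketched below. The class name `ChangeFilter` is hypothetical, and stripping surrounding whitespace before comparing is an added assumption that the original loop does not make.

```python
class ChangeFilter:
    """Pass a recognized text through only when it is non-empty and new."""

    def __init__(self):
        self.last = ""

    def accept(self, txt):
        cleaned = txt.strip()  # assumption: surrounding whitespace is noise
        if not cleaned or cleaned == self.last:
            return False
        self.last = cleaned
        return True


f = ChangeFilter()
results = [f.accept(t) for t in ["", "Interface", "Interface\n", "表記"]]
print(results)  # [False, True, False, True]
```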

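OCR test programs

The test programs are based on the documentation of PyOCR, an optical character recognition (OCR) tool wrapper for Python.
The default test image is "test1.png".
Parameters can be set from the command line wherever possible.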

Languages available in "tesseract"
afr, amh, ara, asm, aze, aze_cyrl, bel, ben, bod, bos, bre, bul, cat, ceb, ces, chi_sim, chi_sim_vert, chi_tra, chi_tra_vert, chr, cos, cym, dan, deu, div, dzo, ell, eng, enm, epo, est, eus, fao, fas, fil, fin, fra, frk, frm, fry, gla, gle, glg, grc, guj, hat, heb, hin, hrv, hun, hye, iku, ind, isl, ita, ita_old, jav, jpn, jpn_vert, kan, kat, kat_old, kaz, khm, kir, kmr, kor, kor_vert, lao, lat, lav, lit, ltz, mal, mar, mkd, mlt, mon, mri, msa, mya, nep, nld, nor, oci, ori, osd, pan, pol, por, pus, que, ron, rus, san, script/Arabic, script/Armenian, script/Bengali, script/Canadian_Aboriginal, script/Cherokee, script/Cyrillic, script/Devanagari, script/Ethiopic, script/Fraktur, script/Georgian, script/Greek, script/Gujarati, script/Gurmukhi, script/HanS, script/HanS_vert, script/HanT, script/HanT_vert, script/Hangul, script/Hangul_vert, script/Hebrew, script/Japanese, script/Japanese_vert, script/Kannada, script/Khmer, script/Lao, script/Latin, script/Malayalam, script/Myanmar, script/Oriya, script/Sinhala, script/Syriac, script/Tamil, script/Telugu, script/Thaana, script/Thai, script/Tibetan, script/Vietnamese, sin, slk, slv, snd, spa, spa_old, sqi, srp, srp_latn, sun, swa, swe, syr, tam, tat, tel, tgk, tha, tir, ton, tur, uig, ukr, urd, uzb, uzb_cyrl, vie, yid, yor

Layout settings for "tesseract" (Page segmentation modes)

Mode	Meaning
0	Orientation and script detection (OSD) only.
1	Automatic page segmentation with OSD.
2	Automatic page segmentation, but no OSD, or OCR. (not implemented)
3	Fully automatic page segmentation, but no OSD. (Default)
4	Assume a single column of text of variable sizes.
5	Assume a single uniform block of vertically aligned text.
6	Assume a single uniform block of text.
7	Treat the image as a single text line.
8	Treat the image as a single word.
9	Treat the image as a single word in a circle.
10	Treat the image as a single character.
11	Sparse text. Find as much text as possible in no particular order.
12	Sparse text with OSD.
13	Raw line. Treat the image as a single text line, bypassing hacks that are Tesseract-specific.
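
The table above can also be kept as a lookup in the scripts, so that the chosen mode is shown in plain words and out-of-range values are caught early. This is a minimal sketch; `PSM_MODES` and `describe_psm` are hypothetical names, not part of PyOCR or Tesseract.

```python
# Page segmentation modes, copied from the table above.
PSM_MODES = {
    0: "Orientation and script detection (OSD) only",
    1: "Automatic page segmentation with OSD",
    2: "Automatic page segmentation, but no OSD, or OCR (not implemented)",
    3: "Fully automatic page segmentation, but no OSD (Default)",
    4: "Assume a single column of text of variable sizes",
    5: "Assume a single uniform block of vertically aligned text",
    6: "Assume a single uniform block of text",
    7: "Treat the image as a single text line",
    8: "Treat the image as a single word",
    9: "Treat the image as a single word in a circle",
    10: "Treat the image as a single character",
    11: "Sparse text. Find as much text as possible in no particular order",
    12: "Sparse text with OSD",
    13: "Raw line. Treat the image as a single text line",
}


def describe_psm(mode):
    """Return the description of a mode, or raise ValueError if it is unknown."""
    mode = int(mode)
    if mode not in PSM_MODES:
        raise ValueError("layout must be between 0 and 13, got {}".format(mode))
    return PSM_MODES[mode]


print(describe_psm(6))  # Assume a single uniform block of text
```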

Step 1: Recognizing text from an image, "ocrtest1.py"

Use the API that returns a string from an input image
・The return value is a Python string containing the text found in the image
```
 txt = tool.image_to_string(Image.open('test1.png'), lang='jpn', builder=pyocr.builders.TextBuilder())
```

Source code

▼ "ocrtest1.py"

# -*- coding: utf-8 -*-
##------------------------------------------
## OCR Test-1 Text recognition
##   with tesseract & PyOCR
##
##               2021.11.15 Masahiro Izutsu
##------------------------------------------
## ocrtest1.py

# Color Escape Code
GREEN = '\033[1;32m'
RED = '\033[1;31m'
NOCOLOR = '\033[0m'
YELLOW = '\033[1;33m'

# Constant definitions
from os.path import expanduser
DEF_INPUT_FILE = expanduser('test1.png')

# Imports
from PIL import Image
import sys

import pyocr
import pyocr.builders
import cv2
import argparse

# Title and version information
title = 'OCR Test-1 Text recognition'
print(GREEN)
print('--- {} ---'.format(title))
print(' OpenCV version {} '.format(cv2.__version__))
print(NOCOLOR)

# Parses arguments for the application
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', '--image', metavar = 'IMAGE_FILE', type = str, default = DEF_INPUT_FILE,
            help = 'Absolute path to image file. Default value is \'' + DEF_INPUT_FILE + '\'')
    parser.add_argument('-l', '--language', metavar = 'LANGUAGE',
            default = 'jpn',
            help = 'Language. Default value is \'jpn\'')
    parser.add_argument('--layout', metavar = 'LAYOUT',
            default = 6,
            help = 'tesseract layout Default value is 6')
    return parser

# Display the run settings
def display_info(image, lang, layout):
    print(YELLOW + title + ': Starting application...' + NOCOLOR)
    print('   - ' + YELLOW + 'Image File   : ' + NOCOLOR, image)
    print('   - ' + YELLOW + 'Language     : ' + NOCOLOR, lang)
    print('   - ' + YELLOW + 'Layout       : ' + NOCOLOR, layout)

# ** main function **
def main():
    # Argument parsing and parameter setting
    ARGS = parse_args().parse_args()
    input_stream = ARGS.image
    lang = ARGS.language
    layout = int(ARGS.layout)

    # Show the settings
    display_info(input_stream, lang, layout)

    # OCR
    tools = pyocr.get_available_tools()
    if len(tools) == 0:
        print(RED + "\nOCR tool Not found." + NOCOLOR)
        sys.exit(1)
    tool = tools[0]

    # txt is a Python string
    txt = tool.image_to_string(Image.open(input_stream), 
                    lang=lang,
                    builder=pyocr.builders.TextBuilder(tesseract_layout=layout))
    # Recognized text
    print('\n---------------------------')
    print(txt)
    print('---------------------------\n')

# Entry point (execution starts here)
if __name__ == "__main__":
    sys.exit(main())

Command options available at run time

Command option	Default	Meaning
-h, --help		Show help
-i, --image	test1.png	Input image file
-l, --language	jpn	Language
--layout	6	tesseract layout (0-13)
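
The script converts `--layout` with `int()` only after parsing, so a value outside 0-13 is passed on to Tesseract unchecked. As a hedged alternative, argparse itself can do the conversion and range check. The sketch below is a variation on the script's parser, not the original code; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser with the same options as ocrtest1.py, plus range checking."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-i", "--image", metavar="IMAGE_FILE", default="test1.png",
                        help="Path to image file")
    parser.add_argument("-l", "--language", metavar="LANGUAGE", default="jpn",
                        help="Language")
    # type=int converts the value; choices rejects anything outside 0-13.
    parser.add_argument("--layout", metavar="LAYOUT", type=int,
                        choices=range(0, 14), default=6,
                        help="tesseract layout (0-13)")
    return parser


args = build_parser().parse_args(["-i", "test-m.png", "--layout", "3"])
print(args.image, args.language, args.layout)  # test-m.png jpn 3
```

With this variant, `--layout 14` makes argparse print an error and exit instead of reaching the OCR call.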

▼(py37) $ python3 ocrtest1.py -h

(py37) $ python3 ocrtest1.py -h

--- OCR Test-1 Text recognition ---
 OpenCV version 4.5.2 

usage: ocrtest1.py [-h] [-i IMAGE_FILE] [-l LANGUAGE] [--layout LAYOUT]

optional arguments:
  -h, --help            show this help message and exit
  -i IMAGE_FILE, --image IMAGE_FILE
                        Absolute path to image file. Default value is
                        'test1.png'
  -l LANGUAGE, --language LANGUAGE
                        Language. Default value is 'jpn'
  --layout LAYOUT       tesseract layout Default value is 6

Execution result

(py37) $ cd ~/workspace_py37/pyocr/
(py37) $ python3 ocrtest1.py

--- OCR Test-1 Text recognition ---
 OpenCV version 4.5.2 

OCR Test-1 Text recognition: Starting application...
   - Image File   :  test1.png
   - Language     :  jpn
   - Layout       :  6

---------------------------
最新情報

・画像認識 (Image Recognition) とは
・物体検出アルゴリズム「YOLO V5」
・顔認証 (Pace recognition) 概要

・敵対的生成ネットワーク(GAN)
---------------------------

・Mincho typeface sample

(py37) $ python3 ocrtest1.py -i test-m.png

--- OCR Test-1 Text recognition ---
 OpenCV version 4.5.2 

OCR Test-1 Text recognition: Starting application...
   - Image File   :  test-m.png
   - Language     :  jpn
   - Layout       :  6

---------------------------
・ ディープラーニングは、』l (人工知能) 実現方法のーつである機械学習分類される方式である。

・ ディープラーニングは、入が一つひとつルールを実装するのではなく、生物の脳の一部機能を模擬したニューラルネット
ワークに基本的な学習アルゴリズムを実装し、適切なデータを与えることで、コンピュータが-データの特徴を自動で学
習・見つけ出し、徒来手法以上に高性能な識別中処理などができるようになるものである。

・ 学習フェーズ

現在、ディープラーニングを活用して、画像処理・音声処理・言語処理・予測処理などが従来手法よりも大きな成果を
上げているが、現時点では学習に必要なアルゴリズムやプラットホームは発展途上にあり、用途による向き不向きや、使
い勝手の良しあし、精度と処理速度などが大きく異なる。ディープラーニングで学習させるためには大きなマシンパワー
が必要であり、実装は容易ではない。

・ 推論フェーズ

一部の学習結果について、オープンソース・モデルとして、事前学習済みモデルと呼ばれるディープラーニング学習済
みの結果が分開されている。この学習済みモデルを利用すると、カメラ画像やビデオデータ、写真データなどに対して比
較的容易に画像認識やや音声説識などを行うことが可能になる。
---------------------------

・Gothic typeface sample

(py37) $ python3 ocrtest1.py -i test-g.png

--- OCR Test-1 Text recognition ---
 OpenCV version 4.5.2 

OCR Test-1 Text recognition: Starting application...
   - Image File   :  test-g.png
   - Language     :  jpn
   - Layout       :  6

---------------------------
・ ディーブプラーニングは、Al (人工知能) 実現方法のーつである機械学習分類される方式である。

・ ディーブプラーニングは、入が一つづひとつルールを実装するのではなく、生物の脳の一部機能を模擬したニューラルネット
ワークに基本的な学習アルゴリズムを実装し、適切なデータ を与えることで、コンピュータが-データの特徴を自動で学
習・見つけ出し、徒来手法以上に高性能な識別や処理などができるようになるものである。

・ 学習フェーズ

現在、ディーブラーニングを活用して、画像処理・音声処理・言語処理・予測処理などが従来手法よりも大きな成果を
上げているが、現時点では学習に必要おアルゴリズムやプラットホームは発展途上にあり、用才による向き不向きや、使
い勝手の良しあし、精度と処理速度おどが大きく異なる。ディープラーニングで学習させるためには大きなマシンパロワー
が必要であり、実装は容易ではない。

・ 推論フェーズ

一部の学習結果について、オーブンソース・モデルとして、事前学習済みモデルと呼ばれるディープラーニング学百済
みの結果が公開されている。この学習済みモデルを利用すると、カメラ画像ビデオデータ、写真データなどに対して比
較的容易に画像認識や音声向識などを行うことが可能になる。
---------------------------
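
The two results above differ in several places, and spotting the differences by eye is tedious. One way to list them is sketched below with the standard `difflib` module, using the beginning of the first line of each result. The helper name `differing_spans` is hypothetical.

```python
import difflib

# Beginning of the first recognized line for each typeface (copied from above).
mincho = "ディープラーニングは、』l (人工知能) 実現方法のーつである"
gothic = "ディーブプラーニングは、Al (人工知能) 実現方法のーつである"


def differing_spans(a, b):
    """Return (text_in_a, text_in_b) pairs for every span where a and b differ."""
    matcher = difflib.SequenceMatcher(None, a, b)
    return [
        (a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


for left, right in differing_spans(mincho, gothic):
    print(repr(left), "->", repr(right))
```

Comparing against a hand-typed reference text instead of another OCR result would give an error list for a single typeface.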

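Step 2: Recognizing characters and their positions from an image, "ocrtest2.py"

Use the API that returns words and their positions from an input image
・The return value is a list of objects, each holding the text in a region (box.content) and the region itself (box.position)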
```
 word_boxes = tool.image_to_string(Image.open('test1.png'), lang="jpn", builder=pyocr.builders.WordBoxBuilder())
```

Source code

▼ "ocrtest2.py"

# -*- coding: utf-8 -*-
##------------------------------------------
## OCR Test-2 Text list of box objects
##   with tesseract & PyOCR
##
##               2021.11.15 Masahiro Izutsu
##------------------------------------------
## ocrtest2.py

# Color Escape Code
GREEN = '\033[1;32m'
RED = '\033[1;31m'
NOCOLOR = '\033[0m'
YELLOW = '\033[1;33m'

# Constant definitions
from os.path import expanduser
DEF_INPUT_FILE = expanduser('test1.png')

# Imports
from PIL import Image
import sys

import pyocr
import pyocr.builders
import cv2
import argparse

# Title and version information
title = 'OCR Test-2 list of box objects'
print(GREEN)
print('--- {} ---'.format(title))
print(' OpenCV version {} '.format(cv2.__version__))
print(NOCOLOR)

# Parses arguments for the application
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', '--image', metavar = 'IMAGE_FILE', type = str, default = DEF_INPUT_FILE,
            help = 'Absolute path to image file. Default value is \'' + DEF_INPUT_FILE + '\'')
    parser.add_argument('-l', '--language', metavar = 'LANGUAGE',
            default = 'jpn',
            help = 'Language. Default value is \'jpn\'')
    parser.add_argument('--layout', metavar = 'LAYOUT',
            default = 6,
            help = 'tesseract layout Default value is 6')
    return parser

# Display the run settings
def display_info(image, lang, layout):
    print(YELLOW + title + ': Starting application...' + NOCOLOR)
    print('   - ' + YELLOW + 'Image File   : ' + NOCOLOR, image)
    print('   - ' + YELLOW + 'Language     : ' + NOCOLOR, lang)
    print('   - ' + YELLOW + 'Layout       : ' + NOCOLOR, layout)

# ** main function **
def main():
    # Argument parsing and parameter setting
    ARGS = parse_args().parse_args()
    input_stream = ARGS.image
    lang = ARGS.language
    layout = int(ARGS.layout)

    # Show the settings
    display_info(input_stream, lang, layout)

    # OCR
    tools = pyocr.get_available_tools()
    if len(tools) == 0:
        print(RED + "\nOCR tool Not found." + NOCOLOR)
        sys.exit(1)
    tool = tools[0]

    # list of box objects. For each box object:
    #   box.content is the word in the box
    #   box.position is its position on the page (in pixels)
    #
    # Beware that some OCR tools (Tesseract for instance)
    # may return empty boxes

    word_boxes = tool.image_to_string(Image.open(input_stream),
                        lang=lang,
                        builder=pyocr.builders.WordBoxBuilder(tesseract_layout=layout))
    
    # Recognized text
    print('\n---------------------------')
    for box in word_boxes:
        print(box.content)
        print(box.position)

    print('---------------------------\n')

# Entry point (execution starts here)
if __name__ == "__main__":
    sys.exit(main())

Command options available at run time

Command option	Default	Meaning
-h, --help		Show help
-i, --image	test1.png	Input image file
-l, --language	jpn	Language
--layout	6	tesseract layout (0-13)

▼(py37) $ python3 ocrtest2.py -h

(py37) $ python3 ocrtest2.py -h

--- OCR Test-2 list of box objects ---
 OpenCV version 4.5.2 

usage: ocrtest2.py [-h] [-i IMAGE_FILE] [-l LANGUAGE] [--layout LAYOUT]

optional arguments:
  -h, --help            show this help message and exit
  -i IMAGE_FILE, --image IMAGE_FILE
                        Absolute path to image file. Default value is
                        'test1.png'
  -l LANGUAGE, --language LANGUAGE
                        Language. Default value is 'jpn'
  --layout LAYOUT       tesseract layout Default value is 6

Execution result

(py37) $ cd ~/workspace_py37/pyocr/
(py37) $ python3 ocrtest2.py

--- OCR Test-2 list of box objects ---
 OpenCV version 4.5.2 

OCR Test-2 list of box objects: Starting application...
   - Image File   :  test1.png
   - Language     :  jpn
   - Layout       :  6

---------------------------
最新
((67, 96), (128, 125))
情報
((131, 96), (191, 125))
・
((78, 156), (84, 161))
画
((99, 145), (158, 173))
像
((163, 146), (190, 173))
認識
((194, 138), (235, 180))
(Image
((242, 144), (341, 176))
Recognition)
((352, 144), (535, 176))
と
((561, 145), (577, 171))
は
((589, 147), (615, 171))
・
((78, 204), (84, 209))
物
((99, 192), (157, 220))
体
((163, 193), (191, 221))
検出
((197, 193), (262, 220))
アル
((262, 196), (317, 215))
ゴリ
((316, 194), (343, 218))
ズム
((360, 193), (413, 216))
「YOLO
((439, 192), (541, 218))
V5」
((550, 195), (601, 221))
・
((78, 252), (84, 257))
顔
((98, 241), (156, 269))
認証
((163, 236), (208, 277))
(Pace
((210, 240), (286, 268))
recognition)
((297, 240), (471, 272))
概要
((490, 240), (551, 268))
・
((78, 289), (114, 316))
敵
((115, 288), (155, 316))
対
((156, 288), (191, 316))
的
((196, 289), (221, 315))
生成
((227, 289), (285, 316))
ネッ
((297, 299), (337, 315))
トワ
((340, 292), (380, 313))
ー
((388, 301), (414, 303))
ク
((423, 291), (440, 313))
(GAN)
((421, 288), (543, 316))
---------------------------
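
Each position printed above is a pair of points, read here as ((left, top), (right, bottom)) in pixels. The sketch below shows how such boxes could be post-processed: computing the size of each box and skipping boxes with empty content, which the script's comments warn about. `Box` is a hypothetical stand-in for the objects returned by `WordBoxBuilder`, and the helper names are assumptions.

```python
from collections import namedtuple

# Hypothetical stand-in for a PyOCR box; only the two attributes
# used on this page (content and position) are modelled.
Box = namedtuple("Box", ["content", "position"])


def box_size(box):
    """Return (width, height) of a box in pixels."""
    (x1, y1), (x2, y2) = box.position
    return x2 - x1, y2 - y1


def non_empty(boxes):
    """Drop boxes whose content is empty or whitespace only."""
    return [b for b in boxes if b.content.strip()]


boxes = [
    Box("最新", ((67, 96), (128, 125))),
    Box("", ((0, 0), (0, 0))),  # made-up empty box for illustration
    Box("情報", ((131, 96), (191, 125))),
]
print([(b.content, box_size(b)) for b in non_empty(boxes)])
# [('最新', (61, 29)), ('情報', (60, 29))]
```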

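Step 3: Getting lines, their positions, and confidence scores from an image, "ocrtest3.py"

Use the API that returns lines, their positions, and confidence scores from an input image
・The return value is a list of objects, each holding the list of individual words in the line (line.word_boxes), the full text of the line (line.content), and the region of the whole line (line.position)
・Each object in line.word_boxes carries a confidence score (confidence).
・Regions are expressed in pixels.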
```
 line_and_word_boxes = tool.image_to_string(Image.open('test1.png'), lang="jpn", builder=pyocr.builders.LineBoxBuilder())
```
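
Because the confidence score is attached to the word boxes rather than to the line, a single score per line has to be derived from them. The sketch below shows three possible choices: the minimum, the mean, and the last word. `Word` is a hypothetical stand-in for a PyOCR word box, `line_score` is not a PyOCR function, and the confidence values are made up for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a word box; only the attributes used here are modelled.
Word = namedtuple("Word", ["content", "confidence"])


def line_score(words, how="min"):
    """Combine word confidences into one score for the line.

    Returns None when the line has no word boxes.
    """
    scores = [w.confidence for w in words]
    if not scores:
        return None
    if how == "min":
        return min(scores)
    if how == "mean":
        return sum(scores) / len(scores)
    if how == "last":
        return scores[-1]
    raise ValueError("unknown method: {}".format(how))


# Made-up confidences for illustration only.
words = [Word("顔", 90), Word("認証", 80), Word("概要", 20)]
print(line_score(words, "min"), line_score(words, "mean"), line_score(words, "last"))
```

Using the minimum is the strictest choice: one poorly recognized word is enough to reject the whole line.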

Source code

▼ "ocrtest3.py"

# -*- coding: utf-8 -*-
##------------------------------------------
## OCR Test-3 Text list of line objects
##   with tesseract & PyOCR
##
##               2021.11.15 Masahiro Izutsu
##------------------------------------------
## ocrtest3.py

# Color Escape Code
GREEN = '\033[1;32m'
RED = '\033[1;31m'
NOCOLOR = '\033[0m'
YELLOW = '\033[1;33m'

# Constant definitions
from os.path import expanduser
DEF_INPUT_FILE = expanduser('test1.png')

# Imports
from PIL import Image
import sys

import pyocr
import pyocr.builders
import cv2
import argparse

# Title and version information
title = 'OCR Test-3 list of line objects'
print(GREEN)
print('--- {} ---'.format(title))
print(' OpenCV version {} '.format(cv2.__version__))
print(NOCOLOR)

# Parses arguments for the application
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', '--image', metavar = 'IMAGE_FILE', type = str, default = DEF_INPUT_FILE,
            help = 'Absolute path to image file. Default value is \'' + DEF_INPUT_FILE + '\'')
    parser.add_argument('-l', '--language', metavar = 'LANGUAGE',
            default = 'jpn',
            help = 'Language. Default value is \'jpn\'')
    parser.add_argument('-c', '--confidence', metavar = 'CONFIDENCE',
            default = 70,
            help = 'Confidence Level Default value is 70')
    parser.add_argument('--layout', metavar = 'LAYOUT',
            default = 6,
            help = 'tesseract layout Default value is 6')
    return parser

# Display the run settings
def display_info(image, lang, conf, layout):
    print(YELLOW + title + ': Starting application...' + NOCOLOR)
    print('   - ' + YELLOW + 'Image File   : ' + NOCOLOR, image)
    print('   - ' + YELLOW + 'Language     : ' + NOCOLOR, lang)
    print('   - ' + YELLOW + 'Confidence   : ' + NOCOLOR, conf)
    print('   - ' + YELLOW + 'Layout       : ' + NOCOLOR, layout)

# ** main function **
def main():
    # Argument parsing and parameter setting
    ARGS = parse_args().parse_args()
    input_stream = ARGS.image
    lang = ARGS.language
    conf = int(ARGS.confidence)
    layout = int(ARGS.layout)

    # Show the settings
    display_info(input_stream, lang, conf, layout)

    # OCR
    tools = pyocr.get_available_tools()
    if len(tools) == 0:
        print(RED + "\nOCR tool Not found." + NOCOLOR)
        sys.exit(1)
    tool = tools[0]

    # list of line objects. For each line object:
    #   line.word_boxes is a list of word boxes (the individual words in the line)
    #   line.content is the whole text of the line
    #   line.position is the position of the whole line on the page (in pixels)
    #
    # Each word box object has an attribute 'confidence' giving the confidence
    # score provided by the OCR tool. Confidence score depends entirely on
    # the OCR tool. Only supported with Tesseract and Libtesseract (always 0
    # with Cuneiform).
    #
    # Beware that some OCR tools (Tesseract for instance) may return boxes
    # with an empty content.

    # Read the image with OpenCV
    frame = cv2.imread(input_stream)
    if frame is None:
        print(RED + "\nUnable to read the input." + NOCOLOR)
        sys.exit(1)

    # Convert to a PIL image (OpenCV loads images as BGR, PIL expects RGB)
    frame1 = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    img = Image.fromarray(frame1)

    line_and_word_boxes = tool.image_to_string(img, lang=lang,
                            builder=pyocr.builders.LineBoxBuilder(tesseract_layout=layout))

    # Retrieved data
    print('\n---------------------------')
    for line_box in line_and_word_boxes:
        content = line_box.content
        position = line_box.position
        words = line_box.word_boxes
        if not words:
            # No word boxes means no confidence score is available
            continue
        # The confidence of the last word box is used as the score of the line
        confidence = words[-1].confidence

        if confidence > conf:
            print('contents: ', content)
            print('position: ', position)
            print('confidence: ', confidence)

            for word in words:
                print('　 {:　<8}'.format(word.content), '　', word.position)
            print('\n')

    print('---------------------------\n')

# Entry point (execution starts here)
if __name__ == "__main__":
    sys.exit(main())

Command options available at run time

Command option	Default	Meaning
-h, --help		Show help
-i, --image	test1.png	Input image file
-l, --language	jpn	Language
-c, --confidence	70	Minimum confidence score to accept
--layout	6	tesseract layout (0-13)

▼(py37) $ python3 ocrtest3.py -h

(py37) $ python3 ocrtest3.py -h

--- OCR Test-3 list of line objects ---
 OpenCV version 4.5.2 

usage: ocrtest3.py [-h] [-i IMAGE_FILE] [-l LANGUAGE] [-c CONFIDENCE]
                   [--layout LAYOUT]

optional arguments:
  -h, --help            show this help message and exit
  -i IMAGE_FILE, --image IMAGE_FILE
                        Absolute path to image file. Default value is
                        'test1.png'
  -l LANGUAGE, --language LANGUAGE
                        Language. Default value is 'jpn'
  -c CONFIDENCE, --confidence CONFIDENCE
                        Confidence Level Default value is 70
  --layout LAYOUT       tesseract layout Default value is 6

Execution result

(py37) $ cd ~/workspace_py37/pyocr/
(py37) $ python3 ocrtest3.py

--- OCR Test-3 list of line objects ---
 OpenCV version 4.5.2 

OCR Test-3 list of line objects: Starting application...
   - Image File   :  test1.png
   - Language     :  jpn
   - Confidence   :  70
   - Layout       :  6

---------------------------
contents:  最新 情報
position:  ((67, 96), (191, 125))
confidence:  96
　 最新　　　　　　 　 ((67, 96), (128, 125))
　 情報　　　　　　 　 ((131, 96), (191, 125))


contents:  ・ 画 像 認識 (Image Recognition) と は
position:  ((78, 144), (615, 176))
confidence:  96
　 ・　　　　　　　 　 ((78, 156), (84, 161))
　 画　　　　　　　 　 ((99, 145), (158, 173))
　 像　　　　　　　 　 ((163, 146), (190, 173))
　 認識　　　　　　 　 ((194, 138), (235, 180))
　 (Image　　 　 ((242, 144), (341, 176))
　 Recognition) 　 ((352, 144), (535, 176))
　 と　　　　　　　 　 ((561, 145), (577, 171))
　 は　　　　　　　 　 ((589, 147), (615, 171))


contents:  ・ 物 体 検出 アル ゴリ ズム 「YOLO V5」
position:  ((78, 192), (601, 221))
confidence:  74
　 ・　　　　　　　 　 ((78, 204), (84, 209))
　 物　　　　　　　 　 ((99, 192), (157, 220))
　 体　　　　　　　 　 ((163, 193), (191, 221))
　 検出　　　　　　 　 ((197, 193), (262, 220))
　 アル　　　　　　 　 ((262, 196), (317, 215))
　 ゴリ　　　　　　 　 ((316, 194), (343, 218))
　 ズム　　　　　　 　 ((360, 193), (413, 216))
　 「YOLO　　　 　 ((439, 192), (541, 218))
　 V5」　　　　　 　 ((550, 195), (601, 221))


contents:  ・ 敵 対 的 生成 ネッ トワ ー ク (GAN)
position:  ((78, 288), (543, 316))
confidence:  92
　 ・　　　　　　　 　 ((78, 289), (114, 316))
　 敵　　　　　　　 　 ((115, 288), (155, 316))
　 対　　　　　　　 　 ((156, 288), (191, 316))
　 的　　　　　　　 　 ((196, 289), (221, 315))
　 生成　　　　　　 　 ((227, 289), (285, 316))
　 ネッ　　　　　　 　 ((297, 299), (337, 315))
　 トワ　　　　　　 　 ((340, 292), (380, 313))
　 ー　　　　　　　 　 ((388, 301), (414, 303))
　 ク　　　　　　　 　 ((423, 291), (440, 313))
　 (GAN)　　　 　 ((421, 288), (543, 316))


---------------------------

・The 4th line was filtered out, so run again with a lower confidence threshold

(py37) $ python3 ocrtest3.py -c 10

--- OCR Test-3 list of line objects ---
 OpenCV version 4.5.2 

OCR Test-3 list of line objects: Starting application...
   - Image File   :  test1.png
   - Language     :  jpn
   - Confidence   :  10
   - Layout       :  6

---------------------------
contents:  最新 情報
position:  ((67, 96), (191, 125))
confidence:  96
　 最新　　　　　　 　 ((67, 96), (128, 125))
　 情報　　　　　　 　 ((131, 96), (191, 125))


contents:  ・ 画 像 認識 (Image Recognition) と は
position:  ((78, 144), (615, 176))
confidence:  96
　 ・　　　　　　　 　 ((78, 156), (84, 161))
　 画　　　　　　　 　 ((99, 145), (158, 173))
　 像　　　　　　　 　 ((163, 146), (190, 173))
　 認識　　　　　　 　 ((194, 138), (235, 180))
　 (Image　　 　 ((242, 144), (341, 176))
　 Recognition) 　 ((352, 144), (535, 176))
　 と　　　　　　　 　 ((561, 145), (577, 171))
　 は　　　　　　　 　 ((589, 147), (615, 171))


contents:  ・ 物 体 検出 アル ゴリ ズム 「YOLO V5」
position:  ((78, 192), (601, 221))
confidence:  74
　 ・　　　　　　　 　 ((78, 204), (84, 209))
　 物　　　　　　　 　 ((99, 192), (157, 220))
　 体　　　　　　　 　 ((163, 193), (191, 221))
　 検出　　　　　　 　 ((197, 193), (262, 220))
　 アル　　　　　　 　 ((262, 196), (317, 215))
　 ゴリ　　　　　　 　 ((316, 194), (343, 218))
　 ズム　　　　　　 　 ((360, 193), (413, 216))
　 「YOLO　　　 　 ((439, 192), (541, 218))
　 V5」　　　　　 　 ((550, 195), (601, 221))


contents:  ・ 顔 認証 (Pace recognition) 概要
position:  ((78, 240), (551, 272))
confidence:  20
　 ・　　　　　　　 　 ((78, 252), (84, 257))
　 顔　　　　　　　 　 ((98, 241), (156, 269))
　 認証　　　　　　 　 ((163, 236), (208, 277))
　 (Pace　　　 　 ((210, 240), (286, 268))
　 recognition) 　 ((297, 240), (471, 272))
　 概要　　　　　　 　 ((490, 240), (551, 268))


contents:  ・ 敵 対 的 生成 ネッ トワ ー ク (GAN)
position:  ((78, 288), (543, 316))
confidence:  92
　 ・　　　　　　　 　 ((78, 289), (114, 316))
　 敵　　　　　　　 　 ((115, 288), (155, 316))
　 対　　　　　　　 　 ((156, 288), (191, 316))
　 的　　　　　　　 　 ((196, 289), (221, 315))
　 生成　　　　　　 　 ((227, 289), (285, 316))
　 ネッ　　　　　　 　 ((297, 299), (337, 315))
　 トワ　　　　　　 　 ((340, 292), (380, 313))
　 ー　　　　　　　 　 ((388, 301), (414, 303))
　 ク　　　　　　　 　 ((423, 291), (440, 313))
　 (GAN)　　　 　 ((421, 288), (543, 316))


---------------------------

↑

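Step 4: Display text lines with their positions and confidence scores, "ocrtest4.py"

This step visualizes each recognized line of text, its position, and its confidence score in a GUI window.
・It uses the same API as Step 3 to obtain the lines, their positions, and the confidence scores from the input image.

[Main features added]
・A confidence threshold excludes low-confidence results.
・Two preprocessing steps for the input image can be switched on or off: grayscale conversion with binarization, and ruled-line removal.
・The Tesseract layout (0-13), one of Tesseract's operating parameters, can be specified.
　This value strongly affects recognition accuracy depending on the content of the input image. The default was chosen through repeated tests on typical document images (including forms).
・If the image size limit (--maxsize) is not '0', the image is automatically resized, keeping its aspect ratio, so that both width and height fit within that size. Note that the listing below only resizes when the value is greater than 300.
・The recognized lines are always printed to the console; a flag (--log) additionally prints the text and position of each word box.
・When the title display is enabled, the settings used for the run are drawn on the image.
・The GUI output can be saved as an image file.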
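
The --maxsize behavior described in the feature list can be sketched as a small pure function. This is an illustrative sketch and not part of ocrtest4.py: the name `fit_within` is hypothetical, and it resizes for any non-zero limit, as the feature list describes, rather than only above 300.

```python
def fit_within(width, height, maxsize):
    """Return (width, height) scaled so the longer side is at most maxsize.

    The aspect ratio is preserved. A maxsize of 0 disables resizing,
    and images that already fit are returned unchanged.
    """
    longer = max(width, height)
    if maxsize <= 0 or longer <= maxsize:
        return width, height
    scale = maxsize / longer
    # Round to whole pixels, but never collapse a side to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(2000, 1000, 800))   # landscape image, limited by its width
print(fit_within(600, 400, 0))       # 0 means "do not resize"
```

The returned pair could be passed as `dsize` to `cv2.resize`, which is how the listing applies its own computed size.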

Source code

▼ "ocrtest4.py"

# -*- coding: utf-8 -*-
##------------------------------------------
## OCR Test-4 Text list of line objects Display
##   with tesseract & PyOCR
##
##               2021.11.15 Masahiro Izutsu
##------------------------------------------
## ocrtest4.py

# Color Escape Code
GREEN = '\033[1;32m'
RED = '\033[1;31m'
NOCOLOR = '\033[0m'
YELLOW = '\033[1;33m'

# Constants
LINE_WORD_BOX_COLOR = (0, 0, 240)
WORD_BOX_COLOR = (255, 0, 0)
CONTENTS_COLOR = (0, 128, 0)
DEF_INPUT_FILE = 'test1.png'

# Imports
from PIL import Image
import sys

import pyocr
import pyocr.builders
import cv2
import argparse
import myfunction
import numpy as np
import mylib_gui

# Title and version information
title = 'OCR Test-4 list of line objects Display'
print(GREEN)
print('--- {} ---'.format(title))
print(' OpenCV version {} '.format(cv2.__version__))
print(NOCOLOR)

# Parses arguments for the application
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', '--image', metavar = 'IMAGE_FILE', type = str, default = DEF_INPUT_FILE,
            help = 'Absolute path to image file. Default value is \'' + DEF_INPUT_FILE + '\'')
    parser.add_argument('-l', '--language', metavar = 'LANGUAGE',
            default = 'jpn',
            help = 'Language. Default value is \'jpn\'')
    parser.add_argument('-c', '--confidence', metavar = 'CONFIDENCE',
            default = 40,
            help = 'Confidence Level Default value is 40')
    parser.add_argument('-p', '--prosess', metavar = 'PROCESS',
            default = 'n',
            help = 'Preprocessing flag.(y/n) Default value is \'n\'')
    parser.add_argument('-d', '--linedel', metavar = 'LINEDEL',
            default = 'n',
            help = 'Line delete flag.(y/b/n) Default value is \'n\'')
    parser.add_argument('--layout', metavar = 'LAYOUT',
            default = 6,
            help = 'Tesseract layout Default value is 6')
    parser.add_argument('--maxsize', metavar = 'MAXSIZE',
            default = 0,
            help = 'Image max size (free=0). Default value is 0')
    parser.add_argument('--log', metavar = 'LOG',
            default = 'n',
            help = 'Log output flag.(y/n) Default value is \'n\'')
    parser.add_argument('-t', '--title', metavar = 'TITLE',
            default = 'y',
            help = 'Program title flag.(y/n) Default value is \'y\'')
    parser.add_argument('-o', '--out', metavar = 'IMAGE_OUT',
            default = 'non',
            help = 'Processed image file path. Default value is \'non\'')
    return parser

# Display the run settings
def display_info(image, lang, prosess, linedel, conf, layout, maxsize, log, titleflg, outpath):
    print(YELLOW + title + ': Starting application...' + NOCOLOR)
    print('   - ' + YELLOW + 'Image File   : ' + NOCOLOR, image)
    print('   - ' + YELLOW + 'Language     : ' + NOCOLOR, lang)
    print('   - ' + YELLOW + 'Preprocessing: ' + NOCOLOR, prosess)
    print('   - ' + YELLOW + 'Line delete  : ' + NOCOLOR, linedel)
    print('   - ' + YELLOW + 'Confidence   : ' + NOCOLOR, conf)
    print('   - ' + YELLOW + 'Layout       : ' + NOCOLOR, layout)
    print('   - ' + YELLOW + 'Max size     : ' + NOCOLOR, maxsize)
    print('   - ' + YELLOW + 'Log frag     : ' + NOCOLOR, log)
    print('   - ' + YELLOW + 'Program Title: ' + NOCOLOR, titleflg)
    print('   - ' + YELLOW + 'Processed out: ' + NOCOLOR, outpath)

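# Image preprocessing (modifies img in place)
def img_preproces(img):
    # Grayscale conversion (OpenCV images are in BGR channel order)
    im_gray = 0.299 * img[:,:,2] + 0.587 * img[:,:,1] + 0.114 * img[:,:,0]
    im_gray8 = np.uint8(im_gray)
    # With Otsu's method the thresh argument is ignored and the threshold is chosen automatically;
    # maxval is still the value assigned to pixels above that threshold
    ret, im_gray8 = cv2.threshold(im_gray8, thresh=0, maxval=255, type=cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Write the binarized result to all three channels
    img[:,:,0] = im_gray8
    img[:,:,1] = im_gray8
    img[:,:,2] = im_gray8
    return img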

# Ruled-line removal
def delete_line(img):
    # Derive the line-detection parameters from the image width
    h, w = img.shape[:2]
    thr = 100
    lln = int(w/18)
    if lln < 44:
        lln = 44
    gap = int(w/1000) + 4
    print('\nThreshhold={}, MinLineLength={}, MaxLineGap={},  width={}, height={}'.format(thr, lln, gap, w, h))

    imgw = img.copy()
    # Grayscale
    gray = cv2.cvtColor(imgw, cv2.COLOR_BGR2GRAY)

    # Binarization
    ret, gray = cv2.threshold(gray, thresh=0, maxval=255, type=cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    ## Invert (negative image) so that dark lines become white
    gray = cv2.bitwise_not(gray)
    lines = cv2.HoughLinesP(gray, rho=1, theta=np.pi/360, threshold=thr, minLineLength=lln, maxLineGap=gap)

    if lines is not None:
        for line in lines:
            x1, y1, x2, y2 = line[0]

            # Erase the line by drawing over it in white
            imgw = cv2.line(imgw, (x1,y1), (x2,y2), (255,255,255), 3)
    return imgw

# ** main function **
def main():
    # Argument parsing and parameter setting
    ARGS = parse_args().parse_args()
    input_stream = ARGS.image
    lang = ARGS.language
    prosess = ARGS.prosess
    linedel = ARGS.linedel
    conf = int(ARGS.confidence)
    layout = int(ARGS.layout)
    maxsize = int(ARGS.maxsize)
    logflg = ARGS.log
    titleflg = ARGS.title
    outpath = ARGS.out

    # Show the run settings
    display_info(input_stream, lang, prosess, linedel, conf, layout, maxsize, logflg, titleflg, outpath)

    # OCR
    tools = pyocr.get_available_tools()
    if len(tools) == 0:
        print(RED + "\nOCR tool Not found." + NOCOLOR)
        sys.exit(1)
    tool = tools[0]

    # list of line objects. For each line object:
    #   line.word_boxes is a list of word boxes (the individual words in the line)
    #   line.content is the whole text of the line
    #   line.position is the position of the whole line on the page (in pixels)
    #
    # Each word box object has an attribute 'confidence' giving the confidence
    # score provided by the OCR tool. Confidence score depends entirely on
    # the OCR tool. Only supported with Tesseract and Libtesseract (always 0
    # with Cuneiform).
    #
    # Beware that some OCR tools (Tesseract for instance) may return boxes
    # with an empty content.

    # Read the image with OpenCV
    frame = cv2.imread(input_stream)
    if frame is None:
        print(RED + "\nUnable to read the input." + NOCOLOR)
        sys.exit(1)

    if maxsize > 300:       # values of 300 or less (including the default 0) leave the image unchanged
        # Resize while keeping the aspect ratio
        img_h, img_w = frame.shape[:2]
        if (img_w > img_h):
            if (img_w > maxsize):
                height = round(img_h * (maxsize / img_w))
                frame = cv2.resize(frame, dsize = (maxsize, height))
        else:
            if (img_h > maxsize):
                width =  round(img_w * (maxsize / img_h))
                frame = cv2.resize(frame, dsize = (width, maxsize))

    # Build the settings message
    h, w = frame.shape[:2]
    st_pram = 'pros={}, linedel={}, conf={}, layout={}  width={}, height={}'.format(prosess, linedel, conf, layout, w, h)

    # Font scale (provisional)
    img_h, img_w = frame.shape[:2]
    font_scale = 20
    if img_w > 2000:
        font_scale = 40

    # Image preprocessing
    if (prosess == 'y'):        # grayscale and binarization, also applied to the displayed image
        frame = img_preproces(frame)

    if (linedel == 'y'):        # ruled-line removal, also applied to the displayed image
        frame = delete_line(frame)
        frame_pl = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    elif (linedel == 'b'):      # ruled-line removal in the background (OCR input only)
        frame_pl = delete_line(frame)
        frame_pl = cv2.cvtColor(frame_pl, cv2.COLOR_BGR2RGB)
    else:                       # no ruled-line removal
        frame_pl = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # Convert to a PIL image
    img = Image.fromarray(frame_pl)
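    # Run OCR
    line_and_word_boxes = tool.image_to_string(img, lang=lang,
                            builder=pyocr.builders.LineBoxBuilder(tesseract_layout=layout))

    # Print the results
    print('\n---------------------------')

    # Font used for drawing text in the window
    fontPIL = 'NotoSansCJK-Bold.ttc'

    for line_box in line_and_word_boxes:
        content = line_box.content
        position = line_box.position
        word_boxes = line_box.word_boxes
        if not word_boxes:
            # Nothing to score or draw for a line without word boxes
            continue
        txt = [word_box.content for word_box in word_boxes]
        box = [word_box.position for word_box in word_boxes]
        n = len(word_boxes)
        # The confidence of the last word box in the line is used as the line score
        confidence = word_boxes[-1].confidence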

        if confidence > conf and len(content) > 0:
            xmin = position[0][0]
            ymin = position[0][1]
            xmax = position[1][0]
            ymax = position[1][1]
            cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), color=LINE_WORD_BOX_COLOR, thickness=3)
            for nm in range(n):
                cv2.rectangle(frame, (box[nm][0][0], box[nm][0][1]), (box[nm][1][0], box[nm][1][1]), color=WORD_BOX_COLOR, thickness=1)
            st_score = '#Score{:3}:  '.format(confidence) + content
            myfunction.cv2_putText(img = frame,
                               text = st_score,
                               org = (xmin, ymin - 4),
                               fontFace = fontPIL,
                               fontScale = font_scale,
                               color = CONTENTS_COLOR,
                               mode = 0)

            print('\ncontents: ', content)
            print('position: ', position)
            print('confidence: ', confidence)

            if (logflg == 'y'):
                for nm in range(n):
                    print('　 {:　<8}'.format(txt[nm]), '　', box[nm])

    print('---------------------------\n')

    # Draw the title and the settings
    if (titleflg == 'y'):
        cv2.putText(frame, title, (10, 30), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.8, color=(200, 200, 0), lineType=cv2.LINE_AA)
        cv2.putText(frame, st_pram, (50, 50), cv2.FONT_HERSHEY_DUPLEX, fontScale=0.5, color=(0, 0, 0), lineType=cv2.LINE_AA)

    # Show the image
    cv2.namedWindow(title, flags=cv2.WINDOW_AUTOSIZE | cv2.WINDOW_GUI_NORMAL)
    cv2.imshow(title, frame)

    # Save the result (still image)
    if (outpath != 'non'):
        cv2.imwrite(outpath, frame)

    # Exit when 'esc' or 'q' is pressed, or when the window is closed
    while(True):
        key = cv2.waitKey(1)
        if key == 27 or key == 113:                    # 'esc' or 'q'
            break
        if not mylib_gui._is_visible(title):           # 'Close' button
            break

    cv2.destroyAllWindows()

# Entry point of the program
if __name__ == "__main__":
    sys.exit(main())

→「mylib_gui.py」
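
In the listing above, the score shown for a line is the confidence of its last word box. As a hedged alternative, the sketch below derives a line score from all word confidences. The `WordBox` tuple and the `line_confidence` helper are hypothetical stand-ins for illustration (they are not PyOCR classes), and the sample confidence values are made up.

```python
from collections import namedtuple

# Hypothetical stand-in for an OCR word box: text plus confidence (0-100).
WordBox = namedtuple('WordBox', ['content', 'confidence'])


def line_confidence(word_boxes, mode='min'):
    """Combine word confidences into one line score.

    mode='min'  : the weakest word decides (conservative)
    mode='mean' : integer average over all words
    mode='last' : confidence of the last word only
    An empty line scores 0.
    """
    scores = [wb.confidence for wb in word_boxes]
    if not scores:
        return 0
    if mode == 'min':
        return min(scores)
    if mode == 'mean':
        return sum(scores) // len(scores)
    if mode == 'last':
        return scores[-1]
    raise ValueError('unknown mode: {}'.format(mode))


# Made-up sample values, for illustration only.
words = [WordBox('顔', 90), WordBox('認証', 80), WordBox('概要', 20)]
for mode in ('min', 'mean', 'last'):
    print(mode, line_confidence(words, mode))
```

Which aggregate suits a given document is an open choice: the minimum is strict about single misread words, while the mean tolerates one weak word in an otherwise clean line.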

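Command-line options available at run time

Option	Default	Meaning
-h, --help		Show help
-i, --image	test1.png	Input image file
-l, --language	jpn	Language
-c, --confidence	40	Confidence score threshold for accepting a line
-p, --prosess	n	Preprocessing flag (grayscale conversion and binarization) (y/n); the long option is spelled "--prosess" in the code
-d, --linedel	n	Preprocessing flag (ruled-line removal) (y/b/n); b = background, applied to the OCR input only
--layout	6	Tesseract layout (0-13)
--maxsize	0	Maximum size of the processed image in pixels (0 = no resize)
--log	n	Detailed log output flag (y/n)
-t, --title	y	Title display (y/n)
-o, --out	non	File path for saving the result image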
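
As a side note on the options above, numeric options and y/n flags can be validated by argparse itself. The sketch below is an assumption about how that could look, not the parser used in ocrtest4.py; it covers only three of the options, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a reduced parser with type and choice checking (illustrative)."""
    parser = argparse.ArgumentParser()
    # type=int makes argparse reject non-numeric values up front.
    parser.add_argument('-c', '--confidence', type=int, default=40,
                        help='confidence threshold (default: 40)')
    # choices restricts the layout to the values 0-13.
    parser.add_argument('--layout', type=int, default=6, choices=range(14),
                        help='Tesseract layout 0-13 (default: 6)')
    # choices also documents the accepted flag values.
    parser.add_argument('--log', default='n', choices=('y', 'n'),
                        help='detailed log output (default: n)')
    return parser


args = build_parser().parse_args(['-c', '10', '--log', 'y'])
print(args.confidence, args.layout, args.log)
```

With this setup an out-of-range value such as `--layout 99` ends the program with a usage error instead of reaching the OCR call.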

(py37) $ python3 ocrtest4.py -h

--- OCR Test-4 list of line objects Display ---
 OpenCV version 4.5.2 

usage: ocrtest4.py [-h] [-i IMAGE_FILE] [-l LANGUAGE] [-c CONFIDENCE]
                   [-p PROCESS] [-d LINEDEL] [--layout LAYOUT]
                   [--maxsize MAXSIZE] [--log LOG] [-t TITLE] [-o IMAGE_OUT]

optional arguments:
  -h, --help            show this help message and exit
  -i IMAGE_FILE, --image IMAGE_FILE
                        Absolute path to image file. Default value is
                        'test1.png'
  -l LANGUAGE, --language LANGUAGE
                        Language. Default value is 'jpn'
  -c CONFIDENCE, --confidence CONFIDENCE
                        Confidence Level Default value is 40
  -p PROCESS, --prosess PROCESS
                        Preprocessing flag.(y/n) Default value is 'n'
  -d LINEDEL, --linedel LINEDEL
                        Line delete flag.(y/b/n) Default value is 'n'
  --layout LAYOUT       Tesseract layout Default value is 6
  --maxsize MAXSIZE     Image max size (free=0). Default value is 0
  --log LOG             Log output flag.(y/n) Default value is 'n'
  -t TITLE, --title TITLE
                        Program title flag.(y/n) Default value is 'y'
  -o IMAGE_OUT, --out IMAGE_OUT
                        Processed image file path. Default value is 'non'

Execution results

(py37) $ cd ~/workspace_py37/pyocr/
(py37) $ python3 ocrtest4.py

--- OCR Test-4 list of line objects Display ---
 OpenCV version 4.5.2 

OCR Test-4 list of line objects Display: Starting application...
   - Image File   :  test1.png
   - Language     :  jpn
   - Preprocessing:  n
   - Line delete  :  n
   - Confidence   :  40
   - Layout       :  6
   - Max size     :  0
   - Log frag     :  n
   - Program Title:  y
   - Processed out:  non

---------------------------

contents:  最新 情報
position:  ((67, 96), (191, 125))
confidence:  96

contents:  ・ 画 像 認識 (Image Recognition) と は
position:  ((78, 144), (615, 176))
confidence:  96

contents:  ・ 物 体 検出 アル ゴリ ズム 「YOLO V5」
position:  ((78, 192), (601, 221))
confidence:  74

contents:  ・ 敵 対 的 生成 ネッ トワ ー ク (GAN)
position:  ((78, 288), (543, 316))
confidence:  92
---------------------------

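・The fourth line is not detected (its score is below the default threshold of 40), so rerun with a lower confidence threshold.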

(py37) $ python3 ocrtest4.py -c 10

--- OCR Test-4 list of line objects Display ---
 OpenCV version 4.5.2 

OCR Test-4 list of line objects Display: Starting application...
   - Image File   :  test1.png
   - Language     :  jpn
   - Preprocessing:  n
   - Line delete  :  n
   - Confidence   :  10
   - Layout       :  6
   - Max size     :  0
   - Log frag     :  n
   - Program Title:  y
   - Processed out:  non

---------------------------

contents:  最新 情報
position:  ((67, 96), (191, 125))
confidence:  96

contents:  ・ 画 像 認識 (Image Recognition) と は
position:  ((78, 144), (615, 176))
confidence:  96

contents:  ・ 物 体 検出 アル ゴリ ズム 「YOLO V5」
position:  ((78, 192), (601, 221))
confidence:  74

contents:  ・ 顔 認証 (Pace recognition) 概要
position:  ((78, 240), (551, 272))
confidence:  20

contents:  ・ 敵 対 的 生成 ネッ トワ ー ク (GAN)
position:  ((78, 288), (543, 316))
confidence:  92
---------------------------
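
The effect of the threshold in the two runs above can be reproduced with the scores from the log. This sketch assumes the same comparison as the listing (a line is kept when its score is strictly greater than the threshold); the `keep_lines` helper and the line numbering are hypothetical.

```python
# (line number, confidence) pairs taken from the console output above.
LINES = [(1, 96), (2, 96), (3, 74), (4, 20), (5, 92)]


def keep_lines(lines, threshold):
    """Return the line numbers whose confidence is strictly above threshold."""
    return [number for number, confidence in lines if confidence > threshold]


print(keep_lines(LINES, 40))   # default threshold: line 4 (score 20) is dropped
print(keep_lines(LINES, 10))   # lowered threshold: all five lines are kept
```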


Details of the OCR engine "tesseract"

　Further tuning will probably be needed to improve recognition accuracy.

Command-line parameters of "tesseract"

(py37) $ tesseract --help-extra
Usage:
  tesseract --help | --help-extra | --help-psm | --help-oem | --version
  tesseract --list-langs [--tessdata-dir PATH]
  tesseract --print-parameters [options...] [configfile...]
  tesseract imagename|imagelist|stdin outputbase|stdout [options...] [configfile...]

OCR options:
  --tessdata-dir PATH   Specify the location of tessdata path.
  --user-words PATH     Specify the location of user words file.
  --user-patterns PATH  Specify the location of user patterns file.
  --dpi VALUE           Specify DPI for input image.
  -l LANG[+LANG]        Specify language(s) used for OCR.
  -c VAR=VALUE          Set value for config variables.
                        Multiple -c arguments are allowed.
  --psm NUM             Specify page segmentation mode.
  --oem NUM             Specify OCR Engine mode.
NOTE: These options must occur before any configfile.

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR. (not implemented)
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
       bypassing hacks that are Tesseract-specific.

OCR Engine modes:
  0    Legacy engine only.
  1    Neural nets LSTM engine only.
  2    Legacy + LSTM engines.
  3    Default, based on what is available.

Single options:
  -h, --help            Show minimal help message.
  --help-extra          Show extra help for advanced users.
  --help-psm            Show page segmentation modes.
  --help-oem            Show OCR Engine modes.
  -v, --version         Show version information.
  --list-langs          List available languages for tesseract engine.
  --print-parameters    Print tesseract parameters.


Revision history

2021/11/15 First edition
