In practical, broadly-defined AI development, what matters is often not the details of the neural network architecture but the applied side: how the trained model is actually used.
Here we try a technique that extends an AI's functionality by swapping out only the neural network model, while leaving the program that surrounds it untouched.
Concretely, we build a new OCR by replacing only the neural network model inside the OCR program published by the author of 「PyTorch ではじめる AI開発」.
→ Previous write-up: verification of the Japanese OCR
A detailed explanation is available on the author's site 「日本語OCR解説」.
For reference, this page summarizes the relevant content of the book. (The explanatory figures are quoted from the author's site.)
(py37) $ cd ~/workspace_py37/
(py37) $ mkdir chapter08
(py37) $ cd chapter08
(py37) $ git clone -b version2 https://github.com/tanreinama/OCR_Japanease/
Cloning into 'OCR_Japanease'...
remote: Enumerating objects: 128, done.
remote: Counting objects: 100% (128/128), done.
remote: Compressing objects: 100% (97/97), done.
remote: Total 128 (delta 57), reused 80 (delta 24), pack-reused 0
Receiving objects: 100% (128/128), 3.46 MiB | 21.32 MiB/s, done.
Resolving deltas: 100% (57/57), done.
(py37) $ ls
OCR_Japanease
(py37) $ cd OCR_Japanease/
(py37) $ wget https://nama.ne.jp/models/ocr_jp-v2.zip
--2021-10-11 16:34:14--  https://nama.ne.jp/models/ocr_jp-v2.zip
Resolving nama.ne.jp (nama.ne.jp)... 112.78.112.176
Connecting to nama.ne.jp (nama.ne.jp)|112.78.112.176|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 180256769 (172M) [application/zip]
Saving to: 'ocr_jp-v2.zip'

ocr_jp-v2.zip       100%[===================>] 171.91M  11.6MB/s    in 14s

2021-10-11 16:34:28 (12.2 MB/s) - 'ocr_jp-v2.zip' saved [180256769/180256769]

(py37) $ unzip ocr_jp-v2.zip
Archive:  ocr_jp-v2.zip
  inflating: models/detectionnet.model
  inflating: models/classifiernet.model
(py37) $ cp -r models OCR_Japanease/
(py37) $ python3 ocr_japanease.py --cpu testshot1.png
file "testshot1.png" detected in 72 dpi.
[Block #0]
コロナウイルスにまけるな
[Block #1]
がんばろう
[Block #2]
日本
Recognized region | Meaning of the region | Definition in the program | Data structure |
Character | A region containing a single Japanese character | BoundingBox class | A rectangular region in the input image |
Sentence | A one-line string region formed by a run of consecutive characters | CenterLine class | Start and end points of a center line |
Text block | A contiguous region that may contain several sentences | SentenceBox class | A list of BoundingBox objects |
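A minimal sketch of how the three structures nest (using the classes from misc/structure.py, listed further below): one BoundingBox per character, collected into a SentenceBox for the surrounding text block.

from misc.structure import BoundingBox, SentenceBox

sbox = SentenceBox(word_threshold=0.01)                # one text block
sbox.boundingboxs.append(BoundingBox(10, 20, 40, 50))  # first character region
sbox.boundingboxs.append(BoundingBox(42, 20, 72, 50))  # next character region
print(len(sbox.boundingboxs))                          # -> 2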
Output | Channels | Variable name in the program |
Probability map that a pixel belongs to a character | 1 | hm_wd |
Probability map that a pixel belongs to a sentence | 1 | hm_sent |
Size (width and height) of the character at that position | 2 | hm_pos |
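A toy illustration of how these three outputs fit together (made-up values; the real maps come from the detection network at 256x256 resolution): peaks in hm_wd mark character centers, and hm_pos stores the character's width and height at that pixel as fractions of the map size, which is exactly how SentenceBox._conv3_filter() below decodes them.

import numpy as np

hm_wd = np.zeros((256, 256), dtype=np.float32)      # character probability map
hm_pos = np.zeros((2, 256, 256), dtype=np.float32)  # width/height map (2 channels)
hm_wd[100, 120] = 0.9                               # one character centered at (120, 100)
hm_pos[:, 100, 120] = 0.05                          # size is about 5% of the map size

ys, xs = np.where(hm_wd > 0.5)
for y, x in zip(ys, xs):
    w = 1 + hm_pos[0, y, x] * hm_wd.shape[1]
    h = 1 + hm_pos[1, y, x] * hm_wd.shape[0]
    print('character near (%d, %d), box about %.1f x %.1f px' % (x, y, w, h))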
import numpy as np
import cv2
import torch
import math
from sklearn.cluster import OPTICS
from .structure import BoundingBox, SentenceBox, CenterLine
from .nihongo import nihongo_class

class BoundingBoxDataset(object):
    def __init__(self, org_img, scale_wh, boundbox, output_size=(56,56)):
        self.org_img = org_img
        self.scale_wh = scale_wh
        self.boundbox = boundbox
        self.output_size = output_size
    def __getitem__(self, idx):
        bb = self.boundbox[idx]
        x1, y1, x2, y2 = bb.x1 * 2, bb.y1 * 2, bb.x2 * 2, bb.y2 * 2
        x1 = int(np.round(x1 * self.scale_wh[0]))
        y1 = int(np.round(y1 * self.scale_wh[1]))
        x2 = int(np.round(x2 * self.scale_wh[0]))
        y2 = int(np.round(y2 * self.scale_wh[1]))
        x1 = min(max(0, x1), self.org_img.shape[1])
        y1 = min(max(0, y1), self.org_img.shape[0])
        x2 = min(max(0, x2), self.org_img.shape[1])
        y2 = min(max(0, y2), self.org_img.shape[0])
        im = self.org_img[y1:y2,x1:x2]
        if im.shape[0]==0 or im.shape[1]==0:
            im = np.zeros((1,1))
        im = cv2.resize(im, self.output_size)
        im = im.reshape((1,self.output_size[1],self.output_size[0]))
        return im.astype(np.float32) / 255.
    def __len__(self):
        return len(self.boundbox)

class Detector(object):
    def __init__(self, use_cuda=True, word_threshold=0.01, class_threshold=0.25, low_gpu_memory=False):
        self.use_cuda = use_cuda
        self.word_threshold = word_threshold
        self.class_threshold = class_threshold
        self.low_gpu_memory = low_gpu_memory
    def _preprocess(self, hm_wd, hm_sent, hm_pos, simple_mode_lines=5, hm_dup=2.5, high_threshold=0.5, low_threshold=0.15):
        ln = np.clip((hm_sent*255),0,255).astype(np.uint8)
        lines = cv2.HoughLinesP(ln, rho=2, theta=np.pi/360, threshold=80, minLineLength=30, maxLineGap=15)
        center_ln = []
        # Center line detection
        for line in lines:
            x1, y1, x2, y2 = line[0]
            cl = CenterLine(hm_sent, hm_wd, hm_pos, [x1, y1], [x2, y2])
            center_ln.append(cl)
        lines = sorted(center_ln, key=lambda x: x.score)[::-1]
        drops = []
        # NMS for line (drop duplicate line)
        for i in range(len(lines)):
            if i not in drops:
                cur = lines[i]
                for j in range(i+1, len(lines), 1):
                    otr = lines[j]
                    if cur.contain(otr) and j not in drops:
                        cur.score += otr.score
                        drops.append(j)
        lines = [l for i,l in enumerate(lines) if i not in drops]
        drops = []
        # NMS for line (drop cross line)
        for i in range(len(lines)):
            if i not in drops:
                cur = lines[i]
                for j in range(i+1, len(lines), 1):
                    otr = lines[j]
                    if cur.cross(otr) and j not in drops:
                        drops.append(j)
        lines = [l for i,l in enumerate(lines) if i not in drops]
        if len(lines) <= simple_mode_lines:
            # simple image
            all_map = []
            for line in lines:
                out = np.zeros(hm_sent.shape, dtype=np.uint8)
                out = cv2.line(out, tuple(line.p1), tuple(line.p2), (255,255,255), line.maxw*2)
                all_map.append(out)
            return all_map, None
        else:
            wd_size_avg = (int(np.round(hm_pos[0][hm_pos[0]!=0].mean()*hm_pos[0].shape[1])),
                           int(np.round(hm_pos[1][hm_pos[1]!=0].mean()*hm_pos[1].shape[0])))
            kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, wd_size_avg)
            out = np.zeros(hm_sent.shape, dtype=np.uint8)
            for line in lines:
                out = cv2.line(out, tuple(line.p1), tuple(line.p2), (255,255,255), line.maxw*2)
            flt = hm_sent.copy() * hm_dup
            flt[hm_sent>high_threshold] = 1
            hm_sent_preprocessed = np.clip(flt+out/255,0,1)
            hm_sent_preprocessed[hm_sent<=low_threshold] = 0
            hm_sent_preprocessed = cv2.dilate(hm_sent_preprocessed, kernel)
            return None, hm_sent_preprocessed
    def _get_class(self, im, size=128):
        minmax = (im.min(), im.max())
        if minmax[1]-minmax[0] == 0:
            return np.array([])
        im = (im-minmax[0]) / (minmax[1]-minmax[0])
        sc = cv2.resize(im, (size,size), interpolation=cv2.INTER_NEAREST)
        clf = OPTICS(max_eps=5, metric='euclidean', min_cluster_size=75)
        a = []
        for x in range(sc.shape[0]):
            for y in range(sc.shape[1]):
                if sc[x][y] > 0.01:
                    a.append([x,y])
        b = clf.fit_predict(a)
        p = {v:k for k,v in enumerate(set(b))}
        b = [p[j] for j in b]
        c = np.zeros(sc.shape, dtype=np.int32)
        for i in range(len(b)):
            c[a[i][0],a[i][1]] = b[i]+1
        c = cv2.resize(c, (im.shape[1], im.shape[0]), interpolation=cv2.INTER_NEAREST)
        return c
    def _get_map(self, clz_map):
        all_map = []
        for i in range(1,int(np.max(clz_map)+1)):
            clz_wd = np.zeros(clz_map.shape, dtype=np.uint8)
            where = np.where(clz_map == i)
            clz_wd[where] = 255
            cnts = cv2.findContours(clz_wd, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
            cnts = cnts[0] if len(cnts) == 2 else cnts[1]
            for cnt in cnts:
                rect = cv2.minAreaRect(cnt)
                box = cv2.boxPoints(rect).reshape((-1,1,2)).astype(np.int32)
                cv2.drawContours(clz_wd,[box],0,255,2)
                cv2.drawContours(clz_wd,[box],0,255,-1)
            all_map.append(clz_wd)
        return all_map
    def _filt_map(self, all_map):
        maps = []
        dindx = []
        for i1, m1 in enumerate(all_map):
            for i2, m2 in enumerate(all_map):
                if i1 != i2:
                    if np.sum(m2[m1 != 0]) == np.sum(m2):
                        dindx.append(i2)
        for i1, m1 in enumerate(all_map):
            if not i1 in dindx:
                for i2, m2 in enumerate(all_map[i1+1:]):
                    an = ((m1 == 0) + (m2 == 0)) == 0
                    if np.sum(an) != 0:
                        if np.sum(m1) > np.sum(m2):
                            m2[an] = 0
                        else:
                            m1[an] = 0
                maps.append(m1)
        return np.array(sorted(maps, key=lambda x:-np.sum(x)))
    def _scale_image(self, img, long_size):
        if img.shape[0] < img.shape[1]:
            scale = img.shape[1] / long_size
            size = (long_size, math.ceil(img.shape[0] / scale))
        else:
            scale = img.shape[0] / long_size
            size = (math.ceil(img.shape[1] / scale), long_size)
        return cv2.resize(img, size, interpolation=cv2.INTER_CUBIC)
    def _detect1x(self, detector_model, gray_img_scaled):
        im = np.zeros((1,1,512,512))
        im[0,0,0:gray_img_scaled.shape[0],0:gray_img_scaled.shape[1]] = gray_img_scaled
        x = np.clip(im / 255, 0.0, 1.0).astype(np.float32)
        x = torch.tensor(x)
        dp = detector_model
        if self.use_cuda:
            dp = torch.nn.DataParallel(detector_model)
            x = x.cuda()
            dp = dp.cuda()
        dp.eval()
        y = dp(x)
        hm_wd = y['hm_wd'].detach().cpu().numpy().reshape(256,256)
        hm_sent = y['hm_sent'].detach().cpu().numpy().reshape(256,256)
        hm_pos = y['of_size'].detach().cpu().numpy().reshape(2,256,256)
        del x, y
        if self.use_cuda:
            torch.cuda.empty_cache()
        return hm_wd, hm_sent, hm_pos
    def _detect4x(self, detector_model, gray_img_scaled):
        tmp = np.zeros((1024,1024))
        tmp[0:gray_img_scaled.shape[0],0:gray_img_scaled.shape[1]] = gray_img_scaled
        im = np.zeros((4,1,512,512))
        im[0,0] = tmp[0:512,0:512]
        im[1,0] = tmp[512:1024,0:512]
        im[2,0] = tmp[0:512,512:1024]
        im[3,0] = tmp[512:1024,512:1024]
        x = np.clip(im / 255, 0.0, 1.0).astype(np.float32)
        if (not self.low_gpu_memory) or (not self.use_cuda):
            x = torch.tensor(x)
            dp = detector_model
            if self.use_cuda:
                dp = torch.nn.DataParallel(detector_model)
                x = x.cuda()
                dp = dp.cuda()
            dp.eval()
            y = dp(x)
            org_hm_wd = [y['hm_wd'][i].detach().cpu().numpy().reshape(256,256) for i in range(4)]
            org_hm_sent = [y['hm_sent'][i].detach().cpu().numpy().reshape(256,256) for i in range(4)]
            org_of_size = [y['of_size'][i].detach().cpu().numpy().reshape(2,256,256) / 2 for i in range(4)]
            del x, y
            if self.use_cuda:
                torch.cuda.empty_cache()
        else:
            org_hm_wd, org_hm_sent, org_of_size = [], [], []
            for i in range(4):
                _x = torch.tensor([x[i]])
                dp = detector_model
                if self.use_cuda:
                    dp = torch.nn.DataParallel(detector_model)
                    _x = _x.cuda()
                    dp = dp.cuda()
                dp.eval()
                y = dp(_x)
                org_hm_wd.append(y['hm_wd'][0].detach().cpu().numpy().reshape(256,256))
                org_hm_sent.append(y['hm_sent'][0].detach().cpu().numpy().reshape(256,256))
                org_of_size.append(y['of_size'][0].detach().cpu().numpy().reshape(2,256,256) / 2)
                del _x, y
                if self.use_cuda:
                    torch.cuda.empty_cache()
            del x
        hm_wd = np.zeros((512,512))
        hm_sent = np.zeros((512,512))
        hm_pos = np.zeros((2,512,512))
        hm_wd[0:256,0:256] = org_hm_wd[0]
        hm_wd[256:512,0:256] = org_hm_wd[1]
        hm_wd[0:256,256:512] = org_hm_wd[2]
        hm_wd[256:512,256:512] = org_hm_wd[3]
        hm_sent[0:256,0:256] = org_hm_sent[0]
        hm_sent[256:512,0:256] = org_hm_sent[1]
        hm_sent[0:256,256:512] = org_hm_sent[2]
        hm_sent[256:512,256:512] = org_hm_sent[3]
        hm_pos[:,0:256,0:256] = org_of_size[0]
        hm_pos[:,256:512,0:256] = org_of_size[1]
        hm_pos[:,0:256,256:512] = org_of_size[2]
        hm_pos[:,256:512,256:512] = org_of_size[3]
        return hm_wd, hm_sent, hm_pos
    def _detect16x(self, detector_model, gray_img_scaled):
        tmp = np.zeros((2048,2048))
        tmp[0:gray_img_scaled.shape[0],0:gray_img_scaled.shape[1]] = gray_img_scaled
        hm_wd = np.zeros((1024,1024))
        hm_sent = np.zeros((1024,1024))
        hm_pos = np.zeros((2,1024,1024))
        dp = detector_model
        if self.use_cuda:
            dp = torch.nn.DataParallel(detector_model)
            dp = dp.cuda()
        dp.eval()
        for ygrid_i in range(4):
            im = np.zeros((4,1,512,512))
            im[0,0] = tmp[512*ygrid_i:512*ygrid_i+512,0:512]
            im[1,0] = tmp[512*ygrid_i:512*ygrid_i+512,512:1024]
            im[2,0] = tmp[512*ygrid_i:512*ygrid_i+512,1024:1536]
            im[3,0] = tmp[512*ygrid_i:512*ygrid_i+512,1536:2048]
            x = np.clip(im / 255, 0.0, 1.0).astype(np.float32)
            if (not self.low_gpu_memory) or (not self.use_cuda):
                x = torch.tensor(x)
                if self.use_cuda:
                    x = x.cuda()
                y = dp(x)
                org_hm_wd = [y['hm_wd'][i].detach().cpu().numpy().reshape(256,256) for i in range(4)]
                org_hm_sent = [y['hm_sent'][i].detach().cpu().numpy().reshape(256,256) for i in range(4)]
                org_of_size = [y['of_size'][i].detach().cpu().numpy().reshape(2,256,256) / 4 for i in range(4)]
                del x, y
                if self.use_cuda:
                    torch.cuda.empty_cache()
            else:
                org_hm_wd, org_hm_sent, org_of_size = [], [], []
                for i in range(4):
                    _x = torch.tensor([x[i]])
                    dp = torch.nn.DataParallel(detector_model)
                    if self.use_cuda:
                        _x = _x.cuda()
                        dp = dp.cuda()
                    dp.eval()
                    y = dp(_x)
                    org_hm_wd.append(y['hm_wd'][0].detach().cpu().numpy().reshape(256,256))
                    org_hm_sent.append(y['hm_sent'][0].detach().cpu().numpy().reshape(256,256))
                    org_of_size.append(y['of_size'][0].detach().cpu().numpy().reshape(2,256,256) / 4)
                    del _x, y
                    if self.use_cuda:
                        torch.cuda.empty_cache()
                del x
            hm_wd[256*ygrid_i:256*ygrid_i+256,0:256] = org_hm_wd[0]
            hm_wd[256*ygrid_i:256*ygrid_i+256,256:512] = org_hm_wd[1]
            hm_wd[256*ygrid_i:256*ygrid_i+256,512:768] = org_hm_wd[2]
            hm_wd[256*ygrid_i:256*ygrid_i+256,768:1024] = org_hm_wd[3]
            hm_sent[256*ygrid_i:256*ygrid_i+256,0:256] = org_hm_sent[0]
            hm_sent[256*ygrid_i:256*ygrid_i+256,256:512] = org_hm_sent[1]
            hm_sent[256*ygrid_i:256*ygrid_i+256,512:768] = org_hm_sent[2]
            hm_sent[256*ygrid_i:256*ygrid_i+256,768:1024] = org_hm_sent[3]
            hm_pos[:,256*ygrid_i:256*ygrid_i+256,0:256] = org_of_size[0]
            hm_pos[:,256*ygrid_i:256*ygrid_i+256,256:512] = org_of_size[1]
            hm_pos[:,256*ygrid_i:256*ygrid_i+256,512:768] = org_of_size[2]
            hm_pos[:,256*ygrid_i:256*ygrid_i+256,768:1024] = org_of_size[3]
        return hm_wd, hm_sent, hm_pos
    def _get_maps(self, detector_model, gray_img, dpi, min_word_size_cm):
        if dpi == 0:
            img_size = max(gray_img.shape)
            if img_size <= 512:
                detect_size = 512
            elif img_size <= 1024:
                detect_size = 1024
            else:
                detect_size = 2048
            div_size = min(img_size, 2048)
        else:
            long_size = max(gray_img.shape)
            inch_size = long_size / dpi
            pix_size = int(np.round(52 * (inch_size * 2.54)))
            if pix_size <= 512:
                detect_size = 512
            elif pix_size <= 1024:
                detect_size = 1024
            else:
                detect_size = 2048
            div_size = min(pix_size,detect_size)
        pix_image = self._scale_image(gray_img, div_size)
        gray_img_scaled = np.zeros((detect_size, detect_size), dtype=np.uint8)
        gray_img_scaled[0:pix_image.shape[0],0:pix_image.shape[1]] = pix_image
        scale_image = (gray_img.shape[1]/pix_image.shape[1], gray_img.shape[0]/pix_image.shape[0])
        with torch.no_grad():
            if detect_size == 512:
                hm_wd, hm_sent, hm_pos = self._detect1x(detector_model, gray_img_scaled)
            elif detect_size == 1024:
                hm_wd, hm_sent, hm_pos = self._detect4x(detector_model, gray_img_scaled)
            elif detect_size == 2048:
                hm_wd, hm_sent, hm_pos = self._detect16x(detector_model, gray_img_scaled)
        hm_wd[np.mean(hm_pos, axis=0) < 0.01] = 0
        hm_sent[np.mean(hm_pos, axis=0) < 0.01] = 0
        return pix_image, scale_image, hm_wd, hm_sent, hm_pos
    def _find_best_dpi(self, detector_model, gray_img, dpi, min_word_size_cm):
        tests = []
        for testdpi in (72,100,150,200,300):
            res = self._get_maps(detector_model, gray_img, testdpi, min_word_size_cm)
            tests.append((testdpi, np.sum(res[3] > 0.01) / (res[3].shape[0]*res[3].shape[1]), res))
        result = sorted(tests, key=lambda x:x[1])[-1]
        return result[0], result[2]
    def detect_image(self, detector_model, gray_img, dpi=72, min_word_size_cm=0.5):
        min_bound = int(np.round(min_word_size_cm * dpi / 2.54))
        if dpi >= 0:
            pix_image, scale_image, hm_wd, hm_sent, hm_pos = self._get_maps(detector_model, gray_img, dpi, min_word_size_cm)
        else:
            dpi, (pix_image, scale_image, hm_wd, hm_sent, hm_pos) = self._find_best_dpi(detector_model, gray_img, dpi, min_word_size_cm)
        all_map, hm_sent_preprocessed = self._preprocess(hm_wd, hm_sent, hm_pos)
        if hm_sent_preprocessed is not None:
            class_map = self._get_class(hm_sent_preprocessed)
            all_map = self._get_map(class_map)
            all_map = self._filt_map(all_map)
        sent_box = []
        for i, now_map in enumerate(all_map):
            clz_wd = hm_wd.copy()
            clz_wd[now_map == 0] = 0
            clz_wd = (clz_wd - np.min(clz_wd)) / (np.max(clz_wd) - np.min(clz_wd))
            sbox = SentenceBox(self.word_threshold)
            sbox.make_boundingbox(clz_wd, hm_pos, min_bound)
            if len(sbox.boundingboxs) > 0:
                sent_box.append(sbox)
        return dpi, sent_box, gray_img, scale_image, hm_wd
    def _bounding_box(self, classifier_model, gray_img, scale_image, boundings, batch_size_classifier, num_workers):
        dataset = BoundingBoxDataset(gray_img, scale_image, boundings)
        loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size_classifier, shuffle=False, num_workers=num_workers)
        with torch.no_grad():
            dp = torch.nn.DataParallel(classifier_model)
            if self.use_cuda:
                dp = dp.cuda()
            dp.eval()
            num_pred = 0
            for x in loader:
                if self.use_cuda:
                    x = x.cuda()
                val = dp(x)
                val = torch.nn.functional.softmax(val, dim=1)
                val = val.detach().cpu().numpy()
                for v in val:
                    boundings[num_pred].set_prediction(v.copy())
                    num_pred += 1
                del x
            del dp
            if self.use_cuda:
                torch.cuda.empty_cache()
    def bounding_box(self, classifier_model, detection, batch_size_classifier=64, num_workers=2, repeat_box=1):
        if self.low_gpu_memory:
            batch_size_classifier = batch_size_classifier//8
        dpi, sent_box, gray_img, scale_image, hm_wd = detection
        for i, sbox in enumerate(sent_box):
            sbox.make_sentenceid(i)
            sbox.make_detectionscore(hm_wd)
        all_bounding = sum([sbox.boundingboxs for sbox in sent_box], [])
        ignore_idx = []
        self._bounding_box(classifier_model, gray_img, scale_image, all_bounding, batch_size_classifier, num_workers)
        for _ in range(repeat_box):
            extra_bounding = []
            for i, bbox in enumerate(all_bounding):
                if (not bbox.iswordbox()) and (i not in ignore_idx):
                    ignore_idx.append(i)
                    classindex = np.argmax(bbox.prediction)
                    if classindex == len(nihongo_class):
                        # two characters side by side (horizontal)
                        w, h = bbox.x2 - bbox.x1, bbox.y2 - bbox.y1
                        cx = (bbox.x1 + bbox.x2) // 2
                        if cx-w > 0:
                            b1 = BoundingBox(cx-w, bbox.y1, cx, bbox.y2)
                            b1.sentenceindex = bbox.sentenceindex
                            extra_bounding.append(b1)
                        if cx+w < hm_wd.shape[1]:
                            b2 = BoundingBox(cx, bbox.y1, cx+w, bbox.y2)
                            b2.sentenceindex = bbox.sentenceindex
                            extra_bounding.append(b2)
                    elif classindex == len(nihongo_class)+1:
                        # two characters stacked vertically
                        w, h = bbox.x2 - bbox.x1, bbox.y2 - bbox.y1
                        cy = (bbox.y1 + bbox.y2) // 2
                        if cy-h > 0:
                            b1 = BoundingBox(bbox.x1, cy-h, bbox.x2, cy)
                            b1.sentenceindex = bbox.sentenceindex
                            extra_bounding.append(b1)
                        if cy+h < hm_wd.shape[0]:
                            b2 = BoundingBox(bbox.x1, cy, bbox.x2, cy+h)
                            b2.sentenceindex = bbox.sentenceindex
                            extra_bounding.append(b2)
            if len(extra_bounding) == 0:
                break
            sbox = SentenceBox(self.word_threshold)
            sbox.boundingboxs = extra_bounding
            sbox.make_detectionscore(hm_wd)
            self._bounding_box(classifier_model, gray_img, scale_image, sbox.boundingboxs, batch_size_classifier, num_workers)
            all_bounding = all_bounding + sbox.boundingboxs
        all_bounding = [b for b in all_bounding if b.classifiercore() > self.class_threshold]
        return all_bounding
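A condensed usage sketch of Detector, mirroring what get_ocr() in ocr_japanease.py further below does (run from inside the OCR_Japanease checkout, with the model files unpacked into models/ as above):

import cv2
import torch
from nets.detectionnet import get_detectionnet
from nets.classifiernet import get_classifiernet
from misc.detection import Detector
from misc.nms import non_max_suppression
from misc.nihongo import nihongo_class

gray = cv2.imread('testshot1.png', cv2.IMREAD_GRAYSCALE)
dt = Detector(use_cuda=False)                     # CPU mode

det = get_detectionnet()                          # stage 1: find characters
det.load_state_dict(torch.load('models/detectionnet.model', map_location='cpu'))
detection = dt.detect_image(det, gray, dpi=72)

cls = get_classifiernet(len(nihongo_class) + 2)   # stage 2: classify each box
cls.load_state_dict(torch.load('models/classifiernet.model', map_location='cpu'))
boxes = non_max_suppression(dt.bounding_box(cls, detection))
print('%d character boxes kept' % len(boxes))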
import numpy as np
import cv2
import torch
import math
from sklearn.cluster import OPTICS
from .nihongo import nihongo_class

class BoundingBox(object):
    def __init__(self, x1, y1, x2, y2):
        self.x1 = x1
        self.y1 = y1
        self.x2 = x2
        self.y2 = y2
        self.detectionscore = 0.0
        self.sentenceindex = 0
        self.prediction = None
    def set_prediction(self, pred):
        self.prediction = pred
    def iswordbox(self):
        return self.prediction is not None and np.argmax(self.prediction) < len(nihongo_class)
    def isword(self):
        return self.iswordbox() and np.argmax(self.prediction) > 0
    def word(self):
        idx = np.argmax(self.prediction[1:len(nihongo_class)]) + 1
        return idx, self.prediction[idx]
    def classifiercore(self):
        return self.word()[1]
    def score(self):
        return self.detectionscore * self.classifiercore()

class SentenceBox(object):
    def __init__(self, word_threshold):
        self.boundingboxs = []
        self.word_threshold = word_threshold
    def _conv3_filter(self, img, pos):
        points = []
        rects = []
        sizes = []
        for y in range(1,img.shape[0]-1):
            for x in range(1,img.shape[1]-1):
                if img[y,x]>self.word_threshold and img[y,x]>img[y-1,x] and img[y,x]>img[y-1,x-1] and img[y,x]>img[y-1,x+1] and img[y,x]>img[y,x-1] and img[y,x]>img[y,x+1] and img[y,x]>img[y+1,x] and img[y,x]>img[y+1,x-1] and img[y,x]>img[y+1,x+1]:
                    points.append((x,y))
                    w, h = pos[0][y,x], pos[1][y,x]
                    sizes.append(max(w,h))
                    w, h = 1+w*img.shape[1], 1+h*img.shape[0]
                    offw, offh = int(np.round(w)), int(np.round(h))
                    rects.append((x-offw,y-offh,x+offw,y+offh))
        return points, rects, sizes
    def make_boundingbox(self, hm_wd, hm_pos, min_bound=12, resize_val=1.1, aspect_val=1.25, dup_threathold=0.033):
        pos, rcts, sizes = self._conv3_filter(hm_wd, hm_pos)
        min_bound = min_bound // 2
        for p, r, s in zip(pos, rcts, sizes):
            x1, y1, x2, y2 = r
            x1 = min(max(0,x1), hm_wd.shape[1])
            y1 = min(max(0,y1), hm_wd.shape[0])
            x2 = min(max(0,x2), hm_wd.shape[1])
            y2 = min(max(0,y2), hm_wd.shape[0])
            w, h = x2-x1, y2-y1
            if min(w, h) >= min_bound:
                self.boundingboxs.append(BoundingBox(x1, y1, x2, y2))
                w2 = int(np.round((x2-x1) * resize_val))
                h2 = int(np.round((y2-y1) * resize_val))
                if w2 != w and h2 != h and min(w2, h2) >= min_bound:
                    xx1 = min(max(0, p[0] - w2//2), hm_wd.shape[1])
                    yy1 = min(max(0, p[1] - h2//2), hm_wd.shape[0])
                    xx2 = min(max(0, p[0] - w2//2 + w2), hm_wd.shape[1])
                    yy2 = min(max(0, p[1] - h2//2 + h2), hm_wd.shape[0])
                    self.boundingboxs.append(BoundingBox(xx1, yy1, xx2, yy2))
                w2 = int(np.round((x2-x1) * aspect_val))
                h2 = int(np.round((y2-y1) / aspect_val))
                if w2 != w and h2 != h and min(w2, h2) >= min_bound:
                    xx1 = min(max(0, p[0] - w2//2), hm_wd.shape[1])
                    yy1 = min(max(0, p[1] - h2//2), hm_wd.shape[0])
                    xx2 = min(max(0, p[0] - w2//2 + w2), hm_wd.shape[1])
                    yy2 = min(max(0, p[1] - h2//2 + h2), hm_wd.shape[0])
                    self.boundingboxs.append(BoundingBox(xx1, yy1, xx2, yy2))
                w2 = int(np.round((x2-x1) / aspect_val))
                h2 = int(np.round((y2-y1) * aspect_val))
                if w2 != w and h2 != h and min(w2, h2) >= min_bound:
                    xx1 = min(max(0, p[0] - w2//2), hm_wd.shape[1])
                    yy1 = min(max(0, p[1] - h2//2), hm_wd.shape[0])
                    xx2 = min(max(0, p[0] - w2//2 + w2), hm_wd.shape[1])
                    yy2 = min(max(0, p[1] - h2//2 + h2), hm_wd.shape[0])
                    self.boundingboxs.append(BoundingBox(xx1, yy1, xx2, yy2))
                if s > dup_threathold:
                    w2 = int(np.round((x2-x1) * aspect_val))
                    h2 = int(np.round((y2-y1) * aspect_val))
                    if w2 != w and h2 != h and min(w2, h2) >= min_bound:
                        xx1 = min(max(0, p[0] - w2//2), hm_wd.shape[1])
                        yy1 = min(max(0, p[1] - h2//2), hm_wd.shape[0])
                        xx2 = min(max(0, p[0] - w2//2 + w2), hm_wd.shape[1])
                        yy2 = min(max(0, p[1] - h2//2 + h2), hm_wd.shape[0])
                        self.boundingboxs.append(BoundingBox(xx1, yy1, xx2, yy2))
    def make_detectionscore(self, hm_wd_all):
        for i in range(len(self.boundingboxs)):
            x1, y1, x2, y2 = self.boundingboxs[i].x1, self.boundingboxs[i].y1, self.boundingboxs[i].x2, self.boundingboxs[i].y2
            y_pred = hm_wd_all[y1:y2,x1:x2]
            w, h = x2-x1, y2-y1
            y_true = ((np.exp(-(((np.arange(w)-(w/2))/(w/10))**2)/2)).reshape(1,-1)
                      *(np.exp(-(((np.arange(h)-(h/2))/(h/10))**2)/2)).reshape(-1,1))
            self.boundingboxs[i].detectionscore = 1.0 - np.mean((y_pred-y_true)**2)
    def make_sentenceid(self, id):
        for i in range(len(self.boundingboxs)):
            self.boundingboxs[i].sentenceindex = id

class CenterLine:
    def __init__(self, hm_sent, hm_word, hm_pos, p1, p2):
        assert hm_sent.shape == hm_word.shape and hm_sent.shape[0] == hm_pos.shape[1] and hm_sent.shape[1] == hm_pos.shape[2], 'Invalid heatmap'
        assert p1 != p2, 'Invalid point'
        self.hm_sent = hm_sent
        self.hm_word = hm_word
        self.hm_pos = hm_pos
        self.p1 = p1
        self.p2 = p2
        self.score = 0
        w = []
        dx = 1 if self.p2[0]>=self.p1[0] else -1
        dy = 1 if self.p2[1]>=self.p1[1] else -1
        for x in range(self.p1[0], self.p2[0]+dx, dx):
            for y in range(self.p1[1], self.p2[1]+dy, dy):
                m = max(self.hm_pos[0][y][x]*self.hm_pos.shape[0], self.hm_pos[1][y][x]*self.hm_pos.shape[1])
                w.append(int(np.round(m)))
        self.maxw = np.max(w)
        self.stdw = np.std(w)
        self.score = self._score()
    def _intersect(self, p1, p2, p3, p4):
        tc1 = float(p1[0] - p2[0]) * (p3[1] - p1[1]) + (p1[1] - p2[1]) * (p1[0] - p3[0])
        tc2 = float(p1[0] - p2[0]) * (p4[1] - p1[1]) + (p1[1] - p2[1]) * (p1[0] - p4[0])
        td1 = float(p3[0] - p4[0]) * (p1[1] - p3[1]) + (p3[1] - p4[1]) * (p3[0] - p1[0])
        td2 = float(p3[0] - p4[0]) * (p2[1] - p3[1]) + (p3[1] - p4[1]) * (p3[0] - p2[0])
        return tc1*tc2<0 and td1*td2<0
    def _distance(self, p1, p2, other):
        ab = [p2[0] - p1[0], p2[1] - p1[1]]
        ap = [other[0] - p1[0], other[1] - p1[1]]
        bp = [other[0] - p2[0], other[1] - p2[1]]
        if (ab[0]==0 and ab[1] == 0) or (bp[0]==0 and bp[1] == 0) or (ap[0]==0 and ap[1] == 0):
            return 0
        d = np.cross(ab, ap)
        l = np.sqrt(ab[0]**2+ab[1]**2)
        e = d / l  # distance between the point and the line
        v = [ab[0]/l, ab[1]/l]
        m = np.dot(v, ap)
        if m >= 0 and m <= l:
            return abs(e)
        return min(np.sqrt(ap[0]**2+ap[1]**2), np.sqrt(bp[0]**2+bp[1]**2))
    def _score(self):
        filt = np.zeros(self.hm_sent.shape, dtype=np.uint8)
        filt = cv2.line(filt, tuple(self.p1), tuple(self.p2), (255,255,255), self.maxw)
        return np.sum((self.hm_word > 0.015)[filt!=0])
    def dist(self, other):
        if self._intersect(self.p1, self.p2, other.p1, other.p2):
            return 0
        d1 = min(self._distance(self.p1, self.p2, other.p1), self._distance(self.p1, self.p2, other.p2))
        d2 = min(self._distance(other.p1, other.p2, self.p1), self._distance(other.p1, other.p2, self.p2))
        return min(d1, d2)
    def cross(self, other):
        return self.dist(other) < (self.maxw + other.maxw)/2
    def contain(self, other):
        if self._intersect(self.p1, self.p2, other.p1, other.p2):
            return True
        d = max(self._distance(self.p1, self.p2, other.p1), self._distance(self.p1, self.p2, other.p2))
        return d < self.maxw
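make_detectionscore() above rates a candidate box by comparing the heatmap patch it covers against an ideal response: a 2-D Gaussian bump centered in the box, with a standard deviation of one tenth of the box size, scored as 1 - MSE. A standalone illustration with toy numbers:

import numpy as np

w, h = 20, 20                                  # candidate box size in heatmap pixels
y_true = (np.exp(-(((np.arange(w) - w/2) / (w/10))**2) / 2).reshape(1, -1)
          * np.exp(-(((np.arange(h) - h/2) / (h/10))**2) / 2).reshape(-1, 1))
score_centered = 1.0 - np.mean((y_true - y_true)**2)          # box exactly on a character
score_empty = 1.0 - np.mean((np.zeros((h, w)) - y_true)**2)   # box over empty background
print(score_centered, score_empty)                            # -> 1.0 and roughly 0.97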
import numpy as np

def non_max_suppression(boxes, overlapThresh=0.2):
    if len(boxes) == 0:
        return []
    sorted_box = sorted(boxes, key=lambda x:x.score())[::-1]
    ignore_flg = [False] * len(sorted_box)
    for i in range(len(sorted_box)):
        if not ignore_flg[i]:
            for j in range(i+1,len(sorted_box),1):
                r1 = sorted_box[i]
                r2 = sorted_box[j]
                if r1.x1 <= r2.x2 and r2.x1 <= r1.x2 and r1.y1 <= r2.y2 and r2.y1 <= r1.y2:
                    w = max(0, min(r1.x2,r2.x2) - max(r1.x1,r2.x1))
                    h = max(0, min(r1.y2,r2.y2) - max(r1.y1,r2.y1))
                    if w * h > (r2.x2-r2.x1)*(r2.y2-r2.y1)*overlapThresh:
                        ignore_flg[j] = True
    return [sorted_box[i] for i in range(len(sorted_box)) if not ignore_flg[i]]

def column_wordlines(bbox, overlapThresh=0.1, overlapThresh_line=0.6):
    def _1dim_non_suppression(ranges, overlapThresh):
        if len(ranges) == 0:
            return []
        ignore_flg = [False] * len(ranges)
        for i in range(len(ranges)):
            if not ignore_flg[i]:
                for j in range(i+1,len(ranges),1):
                    r1 = ranges[i]
                    r2 = ranges[j]
                    w = max(0, min(r1[1],r2[1]) - max(r1[0],r2[0]))
                    if w > (r2[1]-r2[0])*overlapThresh:
                        ignore_flg[j] = True
        return [ranges[i] for i in range(len(ranges)) if not ignore_flg[i]]
    box_range_x = [(b.x1,b.x2) for b in bbox]
    box_range_y = [(b.y1,b.y2) for b in bbox]
    cols = _1dim_non_suppression(box_range_x, overlapThresh)
    rows = _1dim_non_suppression(box_range_y, overlapThresh)
    stocked_flg = [False] * len(bbox)
    lines = []
    if len(cols) < len(rows):
        # vertical writing
        for c in cols:
            stocks = []
            for i in range(len(bbox)):
                if not stocked_flg[i]:
                    if c[0] < bbox[i].x2 and c[1] > bbox[i].x1:
                        w = max(0, min(c[1],bbox[i].x2) - max(c[0],bbox[i].x1))
                        if w > (bbox[i].x2-bbox[i].x1)*overlapThresh_line:
                            stocks.append(bbox[i])
                            stocked_flg[i] = True
            lines.append(sorted(stocks, key=lambda x:x.y1))
        lines = sorted(lines, key=lambda x: np.mean([y.x1 for y in x]))
    else:
        # horizontal writing
        for r in rows:
            stocks = []
            for i in range(len(bbox)):
                if not stocked_flg[i]:
                    if r[0] < bbox[i].y2 and r[1] > bbox[i].y1:
                        h = max(0, min(r[1],bbox[i].y2) - max(r[0],bbox[i].y1))
                        if h >= (bbox[i].y2-bbox[i].y1)*overlapThresh_line:
                            stocks.append(bbox[i])
                            stocked_flg[i] = True
            lines.append(sorted(stocks, key=lambda x:x.x1))
        lines = sorted(lines, key=lambda x: np.mean([y.y1 for y in x]))
    return lines
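non_max_suppression() only needs objects exposing x1/y1/x2/y2 attributes and a score() method, so it can be exercised with a tiny stand-in class (Box below is a hypothetical stand-in, just for illustration):

from misc.nms import non_max_suppression

class Box:
    # hypothetical stand-in exposing the interface nms.py relies on
    def __init__(self, x1, y1, x2, y2, s):
        self.x1, self.y1, self.x2, self.y2 = x1, y1, x2, y2
        self._s = s
    def score(self):
        return self._s

boxes = [Box(0, 0, 10, 10, 0.9),     # kept: highest score
         Box(1, 1, 11, 11, 0.5),     # dropped: overlaps the first box well over 20%
         Box(50, 50, 60, 60, 0.4)]   # kept: no overlap at all
print([b.score() for b in non_max_suppression(boxes)])   # -> [0.9, 0.4]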
hiragana = \
['あ','い','う','え','お','か','き','く','け','こ','さ','し','す','せ','そ','た','ち','つ','て',
 'と','な','に','ぬ','ね','の','は','ひ','ふ','へ','ほ','ま','み','む','め','も','ら','り','る',
 'れ','ろ','が','ぎ','ぐ','げ','ご','ざ','じ','ず','ぜ','ぞ','だ','ぢ','づ','で','ど','ば','び',
 'ぶ','べ','ぼ','ぱ','ぴ','ぷ','ぺ','ぽ','や','ゆ','よ','わ','を','ん']
katakana = \
['ア','イ','ウ','エ','オ','カ','キ','ク','ケ','コ','サ','シ','ス','セ','ソ','タ','チ','ツ','テ',
 'ト','ナ','ニ','ヌ','ネ','ノ','ハ','ヒ','フ','ヘ','ホ','マ','ミ','ム','メ','モ','ラ','リ','ル',
 'レ','ロ','ガ','ギ','グ','ゲ','ゴ','ザ','ジ','ズ','ゼ','ゾ','ダ','ヂ','ヅ','デ','ド','バ','ビ',
 'ブ','ベ','ボ','パ','ピ','プ','ペ','ポ','ヤ','ユ','ヨ','ワ','ヲ','ン','ー',]
alphabet_upper = \
['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V',
 'W','X','Y','Z']
alphabet_lower = \
['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v',
 'w','x','y','z']
numetric = ['0','1','2','3','4','5','6','7','8','9']
alphabet_num = alphabet_upper + alphabet_lower + numetric
kigou = \
['(',')','[',']','「','」','『','』','<','>','¥','/','÷','*','+','×','?','=','〜','|',':',
 ';','。','、','.',',']
jyouyou_kanji = \
['亜','哀','挨','愛','曖','悪','握','圧','扱','宛','嵐','安','案','暗','以','衣','位','囲','医',
 '依','委','威','為','畏','胃','尉','異','移','萎','偉','椅','彙','意','違','維','慰','遺','緯',
 '域','育','一','壱','逸','茨','芋','引','印','因','咽','姻','員','院','淫','陰','飲','隠','韻',
 '右','宇','羽','雨','唄','鬱','畝','浦','運','雲','永','泳','英','映','栄','営','詠','影','鋭',
 '衛','易','疫','益','液','駅','悦','越','謁','閲','円','延','沿','炎','怨','宴','媛','援','園',
 '煙','猿','遠','鉛','塩','演','縁','艶','汚','王','凹','央','応','往','押','旺','欧','殴','桜',
 '翁','奥','横','岡','屋','億','憶','臆','虞','乙','俺','卸','音','恩','温','穏','下','化','火',
 '加','可','仮','何','花','佳','価','果','河','苛','科','架','夏','家','荷','華','菓','貨','渦',
 '過','嫁','暇','禍','靴','寡','歌','箇','稼','課','蚊','牙','瓦','我','画','芽','賀','雅','餓',
 '介','回','灰','会','快','戒','改','怪','拐','悔','海','界','皆','械','絵','開','階','塊','楷',
 '解','潰','壊','懐','諧','貝','外','劾','害','崖','涯','街','慨','蓋','該','概','骸','垣','柿',
 '各','角','拡','革','格','核','殻','郭','覚','較','隔','閣','確','獲','嚇','穫','学','岳','楽',
 '額','顎','掛','潟','括','活','喝','渇','割','葛','滑','褐','轄','且','株','釜','鎌','刈','干',
 '刊','甘','汗','缶','完','肝','官','冠','巻','看','陥','乾','勘','患','貫','寒','喚','堪','換',
 '敢','棺','款','間','閑','勧','寛','幹','感','漢','慣','管','関','歓','監','緩','憾','還','館',
 '環','簡','観','韓','艦','鑑','丸','含','岸','岩','玩','眼','頑','顔','願','企','伎','危','机',
 '気','岐','希','忌','汽','奇','祈','季','紀','軌','既','无','記','起','飢','鬼','帰','基','寄',
 '規','亀','喜','幾','揮','期','棋','貴','棄','毀','旗','器','畿','輝','機','騎','技','宜','偽',
 '欺','義','疑','儀','戯','擬','犠','議','菊','吉','喫','詰','却','客','脚','逆','虐','九','久',
 '及','弓','丘','旧','休','吸','朽','臼','求','究','泣','急','級','糾','宮','救','球','給','嗅',
 '窮','牛','去','巨','居','拒','拠','挙','虚','許','距','魚','御','漁','凶','共','叫','狂','京',
 '享','供','協','況','峡','挟','狭','恐','恭','胸','脅','強','教','郷','境','橋','矯','鏡','競',
 '響','驚','仰','暁','業','凝','曲','局','極','玉','巾','斤','均','近','金','菌','勤','琴','筋',
 '僅','禁','緊','錦','謹','襟','吟','銀','区','句','苦','駆','具','惧','愚','空','偶','遇','隅',
 '串','屈','掘','窟','熊','繰','君','訓','勲','薫','軍','郡','群','兄','刑','形','系','径','茎',
 '係','型','契','計','恵','啓','掲','渓','経','蛍','敬','景','軽','傾','携','継','詣','慶','憬',
 '稽','憩','警','鶏','芸','艸','迎','鯨','隙','劇','撃','激','桁','欠','穴','血','決','結','傑',
 '潔','月','犬','件','見','券','肩','建','研','県','倹','兼','剣','拳','軒','健','険','圏','堅',
 '検','嫌','献','絹','遣','権','憲','賢','謙','鍵','繭','顕','験','懸','元','幻','玄','言','弦',
 '限','原','現','舷','減','源','厳','己','戸','古','呼','固','股','虎','孤','弧','故','枯','個',
 '庫','湖','雇','誇','鼓','錮','顧','五','互','午','呉','後','娯','悟','碁','語','誤','護','口',
 '工','公','勾','孔','功','巧','広','甲','交','光','向','后','好','江','考','行','坑','孝','抗',
 '攻','更','効','幸','拘','肯','侯','厚','恒','洪','皇','紅','荒','郊','香','候','校','耕','航',
 '貢','降','高','康','控','梗','黄','喉','慌','港','硬','絞','項','溝','鉱','構','綱','酵','稿',
 '興','衡','鋼','講','購','乞','号','合','拷','剛','傲','豪','克','告','谷','刻','国','黒','穀',
 '酷','獄','骨','駒','込','頃','今','困','昆','恨','根','婚','混','痕','紺','魂','墾','懇','左',
 '佐','沙','査','砂','唆','差','詐','鎖','座','挫','才','再','災','妻','采','砕','宰','栽','彩',
 '採','済','祭','斎','細','菜','最','裁','債','催','塞','歳','載','際','埼','在','材','剤','財',
 '罪','崎','作','削','昨','柵','索','策','酢','搾','錯','咲','冊','札','刷','刹','拶','殺','察',
 '撮','擦','雑','皿','三','山','参','桟','蚕','惨','産','傘','散','算','酸','賛','残','斬','暫',
 '士','子','支','止','氏','仕','史','司','四','市','矢','旨','死','糸','糸','至','伺','志','私',
 '使','刺','始','姉','枝','祉','肢','姿','思','指','施','師','恣','紙','脂','視','紫','詞','歯',
 '嗣','試','詩','資','飼','誌','雌','摯','賜','諮','示','字','寺','次','耳','自','似','児','事',
 '侍','治','持','時','滋','慈','辞','磁','餌','璽','鹿','式','識','軸','七','𠮟','失','室','疾',
 '執','湿','嫉','漆','質','実','芝','写','社','車','舎','者','射','捨','赦','斜','煮','遮','謝',
 '邪','蛇','勺','尺','借','酌','釈','爵','若','弱','寂','手','主','守','朱','取','狩','首','殊',
 '珠','酒','腫','種','趣','寿','受','呪','授','需','儒','樹','収','囚','州','舟','秀','周','宗',
 '拾','秋','臭','修','袖','終','羞','習','週','就','衆','集','愁','酬','醜','蹴','襲','十','汁',
 '充','住','柔','重','従','渋','銃','獣','縦','叔','祝','宿','淑','粛','縮','塾','熟','出','述',
 '術','俊','春','瞬','旬','巡','盾','准','殉','純','循','順','準','潤','遵','処','初','所','書',
 '庶','暑','署','緒','諸','女','如','助','序','叙','徐','除','小','升','少','召','匠','床','抄',
 '肖','尚','招','承','昇','松','沼','昭','宵','将','消','症','祥','称','笑','唱','商','渉','章',
 '紹','訟','勝','掌','晶','焼','焦','硝','粧','詔','証','言','象','傷','奨','照','詳','彰','障',
 '憧','衝','賞','償','礁','鐘','上','丈','冗','条','状','乗','城','浄','剰','常','情','場','畳',
 '蒸','縄','壌','嬢','錠','譲','醸','色','拭','食','植','殖','飾','触','嘱','織','職','辱','尻',
 '心','申','伸','臣','芯','身','辛','侵','信','津','神','唇','娠','振','浸','真','針','深','紳',
 '進','森','診','寝','慎','新','審','震','薪','親','人','刃','仁','尽','迅','甚','陣','尋','腎',
 '須','図','水','吹','垂','炊','帥','粋','衰','推','酔','遂','睡','穂','錘','随','髄','枢','崇',
 '数','据','杉','裾','寸','瀬','是','井','世','正','生','成','西','声','制','姓','征','性','青',
 '斉','政','星','牲','省','凄','逝','清','盛','婿','晴','勢','聖','誠','精','製','誓','静','請',
 '整','醒','税','夕','斥','石','赤','昔','析','席','脊','隻','惜','戚','責','跡','積','績','籍',
 '切','折','拙','窃','接','設','雪','摂','節','説','舌','絶','千','川','仙','占','先','宣','専',
 '泉','浅','洗','染','扇','栓','旋','船','戦','煎','羨','腺','詮','践','箋','銭','銑','潜','線',
 '遷','選','薦','繊','鮮','全','前','善','然','禅','漸','膳','繕','狙','阻','祖','租','素','措',
 '粗','組','疎','訴','塑','遡','礎','双','壮','早','争','走','奏','相','荘','草','送','倉','捜',
 '挿','桑','巣','掃','曹','曽','爽','窓','創','喪','痩','葬','装','僧','想','層','総','遭','槽',
 '踪','操','燥','霜','騒','藻','造','像','増','憎','蔵','贈','臓','即','束','足','促','則','息',
 '捉','速','側','測','俗','族','属','賊','続','卒','率','存','村','孫','尊','損','遜','他','多',
 '汰','打','妥','唾','堕','惰','駄','太','対','体','人','耐','待','怠','胎','退','帯','泰','堆',
 '袋','逮','替','貸','隊','滞','態','戴','大','代','台','口','第','題','滝','宅','択','沢','卓',
 '拓','託','濯','諾','濁','但','達','脱','奪','棚','誰','丹','旦','担','単','炭','胆','探','淡',
 '短','嘆','端','綻','誕','鍛','団','男','段','断','弾','暖','談','壇','地','池','知','値','恥',
 '致','遅','痴','稚','置','緻','竹','畜','逐','蓄','築','秩','窒','茶','着','嫡','中','仲','虫',
 '虫','沖','宙','忠','抽','注','昼','柱','衷','酎','鋳','駐','著','貯','丁','弔','庁','兆','町',
 '長','挑','帳','張','彫','眺','釣','頂','鳥','朝','脹','貼','超','腸','跳','徴','嘲','潮','澄',
 '調','聴','懲','直','勅','捗','沈','珍','朕','陳','賃','鎮','追','椎','墜','通','痛','塚','漬',
 '坪','爪','鶴','低','呈','廷','弟','定','底','抵','邸','亭','貞','帝','訂','庭','逓','停','偵',
 '堤','提','程','艇','締','諦','泥','的','笛','摘','滴','適','敵','溺','迭','哲','鉄','徹','撤',
 '天','典','店','点','展','添','転','塡','田','伝','殿','電','斗','吐','妬','徒','途','都','渡',
 '塗','賭','土','奴','努','度','怒','刀','冬','灯','火','当','投','豆','東','到','逃','倒','凍',
 '唐','島','桃','討','透','党','悼','盗','陶','塔','搭','棟','湯','痘','登','答','等','筒','統',
 '稲','踏','糖','頭','謄','藤','闘','鬥','騰','同','洞','胴','動','堂','童','道','働','銅','導',
 '瞳','峠','匿','特','得','督','徳','篤','毒','独','読','栃','凸','突','届','屯','豚','頓','貪',
 '鈍','曇','丼','那','奈','内','梨','謎','鍋','南','軟','難','二','尼','弐','匂','肉','虹','日',
 '入','乳','尿','任','妊','忍','認','寧','熱','年','念','捻','粘','燃','悩','納','能','脳','農',
 '濃','把','波','派','破','覇','馬','婆','罵','拝','杯','背','肺','俳','配','排','敗','廃','輩',
 '売','倍','梅','培','陪','媒','買','賠','白','伯','拍','泊','迫','剝','舶','博','薄','麦','漠',
 '縛','爆','箱','箸','畑','肌','八','鉢','発','髪','伐','抜','罰','閥','反','半','氾','犯','帆',
 '汎','伴','判','坂','阪','板','版','班','畔','般','販','斑','飯','搬','煩','頒','範','繁','藩',
 '晩','番','蛮','盤','比','皮','妃','否','批','彼','披','肥','非','卑','飛','疲','秘','被','悲',
 '扉','費','碑','罷','避','尾','眉','美','備','微','鼻','膝','肘','匹','必','泌','筆','姫','百',
 '氷','表','俵','票','評','漂','標','苗','秒','病','描','猫','品','浜','水','貧','賓','頻','敏',
 '瓶','不','夫','父','付','布','扶','府','怖','阜','附','訃','負','赴','浮','婦','符','富','普',
 '腐','敷','膚','賦','譜','侮','武','部','舞','封','風','伏','服','副','幅','復','福','腹','複',
 '覆','払','沸','仏','物','粉','紛','雰','噴','墳','憤','奮','分','文','聞','丙','平','兵','併',
 '並','柄','陛','閉','塀','幣','弊','蔽','餅','米','壁','璧','癖','別','蔑','片','辺','返','変',
 '偏','遍','編','弁','廾','便','勉','歩','保','哺','捕','補','舗','母','募','墓','慕','暮','簿',
 '方','包','芳','邦','奉','宝','抱','放','法','泡','胞','俸','倣','峰','砲','崩','訪','報','蜂',
 '豊','飽','褒','縫','亡','乏','忙','坊','妨','忘','防','房','肪','某','冒','剖','紡','望','傍',
 '帽','棒','貿','貌','暴','膨','謀','頰','北','木','朴','牧','睦','僕','墨','撲','没','勃','堀',
 '本','奔','翻','凡','盆','麻','摩','磨','魔','毎','妹','枚','昧','埋','幕','膜','枕','又','末',
 '抹','万','満','慢','漫','未','味','魅','岬','密','蜜','脈','妙','民','眠','矛','務','無','夢',
 '霧','娘','名','命','明','迷','冥','盟','銘','鳴','滅','免','面','綿','麺','茂','模','毛','妄',
 '盲','耗','猛','網','目','黙','門','紋','問','匁','冶','夜','野','弥','厄','役','約','訳','薬',
 '躍','闇','由','油','喩','愉','諭','輸','癒','唯','友','有','勇','幽','悠','郵','湧','猶','裕',
 '遊','雄','誘','憂','融','優','与','予','余','人','誉','預','幼','用','羊','妖','洋','要','容',
 '庸','揚','揺','葉','陽','溶','腰','様','瘍','踊','窯','養','擁','謡','曜','抑','沃','浴','欲',
 '翌','翼','拉','裸','羅','来','雷','頼','絡','落','酪','辣','乱','卵','覧','濫','藍','欄','吏',
 '利','里','理','痢','裏','履','璃','離','陸','立','律','慄','略','柳','流','留','竜','粒','隆',
 '硫','侶','旅','虜','慮','了','両','良','料','涼','猟','陵','量','僚','領','寮','療','瞭','糧',
 '力','緑','林','厘','倫','輪','隣','臨','瑠','涙','累','塁','類','令','礼','示','冷','励','戻',
 '例','鈴','零','霊','隷','齢','麗','暦','歴','列','劣','烈','裂','恋','連','廉','練','錬','呂',
 '炉','賂','路','露','老','労','弄','郎','朗','浪','廊','楼','漏','籠','六','録','麓','論','和',
 '話','賄','脇','惑','枠','湾','腕']
nihongo = hiragana+katakana+alphabet_num+kigou+jyouyou_kanji
nihongo_class = ['']+nihongo
filter_word = \
[ ('り', 'リ', katakana, katakana), ('リ', 'り', hiragana, hiragana),
  ('へ', 'ヘ', katakana, katakana), ('ヘ', 'へ', hiragana, hiragana),
  ('べ', 'ベ', katakana, katakana), ('ベ', 'べ', hiragana, hiragana),
  ('ぺ', 'ペ', katakana, katakana), ('ペ', 'ぺ', hiragana, hiragana),
  ('か', 'ガ', katakana, katakana), ('ガ', 'か', hiragana, hiragana),
  ('口', 'ロ', katakana, katakana), ('ロ', '口', jyouyou_kanji, jyouyou_kanji),
  ('工', 'エ', katakana, katakana), ('エ', '工', jyouyou_kanji, jyouyou_kanji),
  ('二', 'ニ', katakana, katakana), ('こ', 'ニ', katakana, katakana),
  ('ニ', '二', jyouyou_kanji, jyouyou_kanji), ('こ', '二', jyouyou_kanji, jyouyou_kanji),
  ('ニ', 'こ', hiragana, hiragana), ('二', 'こ', hiragana, hiragana),
  ('一', 'ー', katakana, katakana), ('ー', '一', jyouyou_kanji, jyouyou_kanji),
  ('七', 'セ', katakana, katakana), ('セ', '七', jyouyou_kanji, jyouyou_kanji),
  ('八', 'ハ', katakana, katakana), ('ハ', '八', jyouyou_kanji, jyouyou_kanji),
  ('力', 'カ', katakana, katakana), ('刀', 'カ', katakana, katakana),
  ('カ', '力', jyouyou_kanji, jyouyou_kanji), ('カ', '刀', jyouyou_kanji, jyouyou_kanji),
  ('干', 'チ', katakana, katakana), ('千', 'チ', katakana, katakana),
  ('チ', '干', jyouyou_kanji, jyouyou_kanji), ('チ', '千', jyouyou_kanji, jyouyou_kanji),
  ('手', 'キ', katakana, katakana), ('キ', '手', jyouyou_kanji, jyouyou_kanji),
  ('J', 'ノ', katakana, katakana), ('ノ', 'J', alphabet_upper, alphabet_upper),
  ('ノ', 'J', None, alphabet_lower),
  ('j', 'ノ', katakana, katakana), ('ノ', 'j', alphabet_lower, alphabet_lower),
  ('T', '丁', jyouyou_kanji, jyouyou_kanji), ('丁', 'T', alphabet_upper, alphabet_upper),
  ('丁', 'T', None, alphabet_lower),
  ('0', 'o', alphabet_lower+alphabet_upper, alphabet_lower), ('0', 'O', alphabet_upper, alphabet_upper),
  ('o', '0', numetric, numetric), ('O', '0', numetric, numetric),
  ('1', 'l', alphabet_lower+alphabet_upper, alphabet_lower), ('|', 'l', alphabet_lower+alphabet_upper, alphabet_lower),
  ('1', 'I', alphabet_upper, alphabet_upper), ('|', 'I', alphabet_upper, alphabet_upper),
  ('1', 'I', None, alphabet_lower), ('|', 'I', None, alphabet_lower),
  ('l', '1', numetric, numetric), ('I', '1', numetric, numetric), ('|', '1', numetric, numetric),
  ('フ', '7', numetric, numetric), ('7', 'フ', katakana, katakana),
  ('5', 's', alphabet_lower+alphabet_upper, alphabet_lower), ('5', 'S', alphabet_upper, alphabet_upper),
  ('s', '5', numetric, numetric), ('S', '5', numetric, numetric),
  ('K', 'k', alphabet_lower, alphabet_lower), ('k', 'K', alphabet_upper, alphabet_upper),
  ('O', 'o', alphabet_lower, alphabet_lower), ('o', 'O', alphabet_upper, alphabet_upper),
  ('P', 'p', alphabet_lower, alphabet_lower), ('p', 'P', alphabet_upper, alphabet_upper),
  ('S', 's', alphabet_lower, alphabet_lower), ('s', 'S', alphabet_upper, alphabet_upper),
  ('U', 'u', alphabet_lower, alphabet_lower), ('u', 'U', alphabet_upper, alphabet_upper),
  ('V', 'v', alphabet_lower, alphabet_lower), ('v', 'V', alphabet_upper, alphabet_upper),
  ('V', 'v', alphabet_lower, alphabet_lower), ('v', 'V', alphabet_upper, alphabet_upper),
  ('W', 'w', alphabet_lower, alphabet_lower), ('w', 'W', alphabet_upper, alphabet_upper),
  ('X', 'x', alphabet_lower, alphabet_lower), ('x', 'X', alphabet_upper, alphabet_upper),
  ('Y', 'y', alphabet_lower, alphabet_lower), ('y', 'Y', alphabet_upper, alphabet_upper),
  ('Z', 'z', alphabet_lower, alphabet_lower), ('z', 'Z', alphabet_upper, alphabet_upper),
  ('十', '+', kigou, kigou), ('t', '+', kigou, kigou), ('メ', '+', kigou, kigou),
  ('+', '十', jyouyou_kanji, jyouyou_kanji), ('t', '十', jyouyou_kanji, jyouyou_kanji),
  ('メ', '十', jyouyou_kanji, jyouyou_kanji),
  ('+', 't', alphabet_lower+alphabet_upper, alphabet_lower),
  ('十', 't', alphabet_lower+alphabet_upper, alphabet_lower),
  ('メ', 'y', alphabet_lower+alphabet_upper, alphabet_lower),
  ('二', '=', kigou, kigou), ('ニ', '=', kigou, kigou), ('こ', '=', kigou, kigou),
  ('=', '二', jyouyou_kanji, jyouyou_kanji), ('=', 'ニ', katakana, katakana), ('=', 'こ', hiragana, hiragana),
  ('。', 'o', alphabet_lower+alphabet_upper, alphabet_lower), ('。', 'O', alphabet_upper, alphabet_upper),
  ('。', '0', numetric, numetric),
  ('o', '。', hiragana + katakana + jyouyou_kanji, None),
  ('O', '。', hiragana + katakana + jyouyou_kanji, None),
  ('0', '。', hiragana + katakana + jyouyou_kanji, None),
  ('2', 'っ', hiragana + jyouyou_kanji, ['た','ち','つ','て','と']),
  ('?', 'っ', hiragana + jyouyou_kanji, ['た','ち','つ','て','と']),
  ('つ', 'っ', hiragana + jyouyou_kanji, ['た','ち','つ','て','と']),
  ('ツ', 'ッ', katakana + jyouyou_kanji, ['タ','チ','ツ','テ','ト']),
  ('や', 'ゃ', ['き','し','ち','に','み','り','ぎ','ぢ','じ'], None),
  ('ゆ', 'ゅ', ['き','し','ち','に','み','り','ぎ','ぢ','じ'], None),
  ('よ', 'ょ', ['き','し','ち','に','み','り','ぎ','ぢ','じ'], None),
  ('ヤ', 'ャ', ['キ','シ','チ','ニ','ミ','リ','ギ','ヂ','ジ'], None),
  ('ユ', 'ュ', ['キ','シ','チ','ニ','ミ','リ','ギ','ヂ','ジ'], None),
  ('ヨ', 'ョ', ['キ','シ','チ','ニ','ミ','リ','ギ','ヂ','ジ'], None) ]
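The filter_word table drives a context-based cleanup of easily confused glyphs. Each entry reads (misrecognized character, replacement, set of allowed preceding characters, set of allowed following characters), with None meaning no constraint: for example ('口', 'ロ', katakana, katakana) says that a 口 (kanji "mouth") sandwiched between katakana is almost certainly the katakana ロ. A minimal sketch of how such a rule fires, as a simplified re-statement of filter_block() from ocr_japanease.py below (the real function also honors "" as a line-edge marker):

from misc.nihongo import filter_word

sent = list('ト口ロ')                      # OCR misread: kanji 口 between katakana
for wrong, right, before, after in filter_word:
    for i, ch in enumerate(sent):
        if ch != wrong or before == "" or after == "":
            continue
        ok_before = before is None or (i > 0 and sent[i-1] in before)
        ok_after = after is None or (i < len(sent) - 1 and sent[i+1] in after)
        if ok_before and ok_after:
            sent[i] = right
print(''.join(sent))                       # -> トロロ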
import numpy as np
import cv2
import gc
import os
import json
import argparse
import torch
from nets.detectionnet import get_detectionnet
from nets.classifiernet import get_classifiernet
from misc.nihongo import nihongo_class, filter_word
from misc.detection import Detector
from misc.nms import non_max_suppression, column_wordlines

parser = argparse.ArgumentParser()
parser.add_argument('images', metavar='file', type=str, nargs='+', help='input image files')
parser.add_argument("--dpi", type=int, default=-1, help="image dpi")
parser.add_argument('--cpu', action='store_true', help="CPU mode (no GPU)")
parser.add_argument('--output_format', type=str, default="row", help="output format", choices=['row', 'json'])
parser.add_argument('--output_detect_img', action='store_true', help="output detected bounding box")
parser.add_argument('--low_gpu_memory', action='store_true', help="reduce gpu memory usage")
args = parser.parse_args()

def main():
    d = []
    for f in args.images:
        if os.path.isfile(f):
            d.append(f)
        elif os.path.isdir(f):
            d.extend([f+'/'+a for a in os.listdir(f)])
        else:
            print('Input file "%s" is not a file or directory.' % f)
            return
    ocr_result = get_ocr(d, dpi=args.dpi, use_cuda=(not args.cpu), output_detect_img=args.output_detect_img, low_gpu_memory=args.low_gpu_memory)
    if args.output_format == 'json':
        print(json.dumps(ocr_result, ensure_ascii=False))
    else:
        for r in ocr_result:
            print('file "%s" detected in %d dpi.'%(r['filename'],r['detected_dpi']))
            for b in r['blocks']:
                print('[Block #%d]'%b['id'])
                for s in b['sentences']:
                    print(s['sent'])

def filter_block(sent):
    for i in range(len(sent)):
        for j in range(len(filter_word)):
            if filter_word[j][0] == sent[i]:
                if filter_word[j][2] == "":
                    bef = (i==0)
                else:
                    bef = filter_word[j][2] is None or (i>0 and sent[i-1] in filter_word[j][2])
                if filter_word[j][3] == "":
                    aft = (i==len(sent)-1)
                else:
                    aft = filter_word[j][3] is None or (i<len(sent)-1 and sent[i+1] in filter_word[j][3])
                if bef and aft:
                    sent[i] = filter_word[j][1]

def get_ocr(filelist,dpi,use_cuda=True,output_detect_img=False,low_gpu_memory=False):
    det_model = 'models/detectionnet.model'
    cls_model = 'models/classifiernet.model'
    if not (os.path.isfile(det_model) and os.path.isfile(cls_model)):
        print('Model file not found.')
        return
    for file in filelist:
        if not os.path.isfile(file):
            print('Input file "%s" not found.'%file)
            return
    model = get_detectionnet()
    if use_cuda:
        model.load_state_dict(torch.load(det_model))
    else:
        model.load_state_dict(torch.load(det_model, map_location=torch.device('cpu')))
    dt = Detector(use_cuda=use_cuda, low_gpu_memory=low_gpu_memory)
    detections = []
    for file in filelist:
        im = cv2.imread(file)
        if im is None:
            print('Cannot read input file "%s".'%file)
            return
        if len(im.shape) == 3:
            im = cv2.cvtColor(im, cv2.COLOR_RGB2GRAY)
        elif len(im.shape) != 2:
            print('Cannot read input file "%s".'%file)
            return
        d = dt.detect_image(model, im, dpi)
        detections.append(d)
    del model
    torch.cuda.empty_cache()
    model = get_classifiernet(len(nihongo_class) + 2)
    if use_cuda:
        model.load_state_dict(torch.load(cls_model))
    else:
        model.load_state_dict(torch.load(cls_model, map_location=torch.device('cpu')))
    boundings = []
    for d in detections:
        b = dt.bounding_box(model, d)
        b = non_max_suppression(b)
        boundings.append(b)
    del model
    torch.cuda.empty_cache()
    results = []
    for file, dtct, bbox in zip(filelist, detections, boundings):
        detected_dpi, _, gray_img, scale_image, _ = dtct
        detect_file = {'filename':file,'detected_dpi':detected_dpi,'blocks':[]}
        if output_detect_img:
            detect_img = cv2.imread(file, cv2.IMREAD_GRAYSCALE)
            detect_img = cv2.cvtColor(detect_img, cv2.COLOR_GRAY2RGB)
        if len(bbox) > 0:
            for i in range(max([b.sentenceindex for b in bbox])+1):
                bbox_sent = [b for b in bbox if b.sentenceindex == i]
                cbox = column_wordlines(bbox_sent)
                block_one = {'id':i,'sentences':[]}
                for c in cbox:
                    blk = []
                    box = []
                    for b in c:
                        n, s = b.word()
                        if s > 0.3:
                            x1, y1, x2, y2 = b.x1 * 2, b.y1 * 2, b.x2 * 2, b.y2 * 2
                            x1 = int(np.round(x1 * scale_image[0]))
                            y1 = int(np.round(y1 * scale_image[1]))
                            x2 = int(np.round(x2 * scale_image[0]))
                            y2 = int(np.round(y2 * scale_image[1]))
                            x1 = min(max(0, x1), gray_img.shape[1])
                            y1 = min(max(0, y1), gray_img.shape[0])
                            x2 = min(max(0, x2), gray_img.shape[1])
                            y2 = min(max(0, y2), gray_img.shape[0])
                            blk.append(nihongo_class[n])
                            box.append((x1, y1, x2, y2, float(s)))
                            if output_detect_img:
                                cv2.rectangle(detect_img, (x1,y1), (x2,y2), (255,0,0), 2)
                    filter_block(blk)
                    sent = ''.join(blk)
                    sent_one = {'sent':sent,'bbox':[]}
                    for w,b in zip(blk,box):
                        box_one = {'word':w,'box':[b[0],b[1],b[2],b[3]],'score':b[4]}
                        sent_one['bbox'].append(box_one)
                    block_one['sentences'].append(sent_one)
                detect_file['blocks'].append(block_one)
                if output_detect_img:
                    bb = [b for s in block_one['sentences'] for b in s['bbox']]
                    if len(bb) > 0:
                        x1 = min([b['box'][0] for b in bb])
                        y1 = min([b['box'][1] for b in bb])
                        x2 = max([b['box'][2] for b in bb])
                        y2 = max([b['box'][3] for b in bb])
                        cv2.rectangle(detect_img, (x1,y1), (x2,y2), (0,255,0), 2)
                        cv2.putText(detect_img, '#%d'%block_one['id'], (x1,y1), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2, cv2.LINE_AA)
        if output_detect_img:
            cv2.imwrite(file+'-detections.png', detect_img)
        results.append(detect_file)
    return results

if __name__ == '__main__':
    main()
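ocr_japanease.py parses its command line at import time, so the simplest way to drive it from other Python code is as a subprocess, reading the JSON output format. A minimal sketch, assuming the models and testshot1.png are in place as above:

import json
import subprocess

out = subprocess.run(
    ['python3', 'ocr_japanease.py', '--cpu', '--output_format', 'json', 'testshot1.png'],
    capture_output=True, text=True, check=True)
for block in json.loads(out.stdout)[0]['blocks']:
    for s in block['sentences']:
        print(s['sent'])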
(py37) $ cd ~/workspace_py37/chapter08
(py37) $ cp -r ~/workspace_py37/sample/chapt08/nets/ ./
import torch
import torch.nn as nn
import torch.nn.functional as F

class CharactorNet(nn.Module):
    def __init__(self, n_class, n_convs, n_channels):
        super(CharactorNet, self).__init__()
        channel = 42  # number of channels after the first layer
        layers = [nn.Conv2d(1, channel, 3, 1, 1),  # the first convolution layer
                  nn.ReLU(),
                  # below is the first depthwise convolution layer
                  nn.Conv2d(channel, channel, 3, 1, 1, groups=channel),
                  nn.ReLU(),
                  nn.BatchNorm2d(channel)]
        for i in range(len(n_convs)):  # loop over the blocks
            in_channel = channel       # input channels of the block
            channel = n_channels[i]    # channels inside the block
            layers.extend([
                # the first depthwise convolution layer of the block
                nn.Conv2d(in_channel, channel, 1, 1, bias=False),
                nn.ReLU(),
                nn.Conv2d(channel, channel, 3, 1, 1, groups=channel),
                nn.ReLU(),
                nn.BatchNorm2d(channel)
            ])
            # add the remaining depthwise convolution layers of the block
            for _ in range(n_convs[i]-1):
                layers.extend([
                    nn.Conv2d(channel, channel, 1, 1, bias=False),
                    nn.ReLU(),
                    nn.Conv2d(channel, channel, 3, 1, 1, groups=channel),
                    nn.ReLU(),
                    nn.BatchNorm2d(channel)
                ])
            # each block (except the last) ends with a max-pooling layer
            if i < len(n_convs)-1:
                layers.extend([
                    nn.MaxPool2d(3, 2, 1),
                ])
        self.layer = nn.Sequential(*layers)    # convolutional layers
        self.fc = nn.Linear(channel, n_class)  # fully connected layer
    def forward(self, input):
        x = self.layer(input)                 # run the convolutional layers
        x = F.adaptive_avg_pool2d(x, (1, 1))  # average over the spatial axes
        x = x.view(x.size(0), -1)             # flatten to batch x features
        result = self.fc(x)                   # classify
        return result

def get_classifiernet(num_class):
    model = CharactorNet(num_class,
                         n_convs=[1,4,9,1],               # convolution layers per block
                         n_channels=[128,384,1152,3456])  # channels per block
    return model
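A quick shape check of the network (a sketch; it assumes nets/ and misc/nihongo.py are in place as set up in the following steps). The OCR feeds the classifier 56x56 single-channel character crops, and the two extra classes are the layout classes (two characters side by side, or stacked) used by Detector.bounding_box():

import torch
from nets.classifiernet import get_classifiernet
from misc.nihongo import nihongo_class

model = get_classifiernet(len(nihongo_class) + 2)
model.eval()
with torch.no_grad():
    x = torch.zeros((1, 1, 56, 56))  # one dummy character crop
    print(model(x).shape)            # -> torch.Size([1, len(nihongo_class) + 2])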
(py37) $ cd ~/workspace_py37/chapter08
(py37) $ mkdir misc
(py37) $ cp OCR_Japanease/misc/nihongo.py misc/nihongo.py
(py37) $ cd ~/workspace_py37/chapter08
(py37) $ cp OCR_Japanease/models/classifiernet.model ./classifiernet_orig.model
(py37) $ cd ~/workspace_py37/chapter08
(py37) $ mv ~/ダウンロード/IPAexfont00401.zip ./
(py37) $ unzip IPAexfont00401.zip
Archive:  IPAexfont00401.zip
   creating: IPAexfont00401/
  inflating: IPAexfont00401/ipaexg.ttf
  inflating: IPAexfont00401/ipaexm.ttf
  inflating: IPAexfont00401/IPA_Font_License_Agreement_v1.0.txt
  inflating: IPAexfont00401/Readme_IPAexfont00401.txt
(py37) $ mkdir fonts
(py37) $ mv IPAexfont00401/ipaexg.ttf ./fonts
(py37) $ mv IPAexfont00401/ipaexm.ttf ./fonts
(py37) $ cd ~/workspace_py37/chapter08
(py37) $ cp ~/workspace_py37/sample/chapt08/chapt08_1.py chapt08_1a.py
~/workspace_py37/chapter08
├── OCR_Japanease
├── chapt08_1a.py
├── classifiernet_orig.model
├── fonts
│   ├── ipaexg.ttf
│   └── ipaexm.ttf
├── misc
│   └── nihongo.py
└── nets
    └── classifiernet.py
# -*- coding: utf-8 -*-
##------------------------------------------
## 「PyTorch で始める AI開発」
## Chapter 08 / Section 024
## Character recognition for OCR / training the model
##
## 2021.10.11 Masahiro Izutsu
##------------------------------------------
## chapt08_1a.py (original: chapt08_1.py)

import os
import numpy as np
from tqdm import tqdm
from PIL import Image, ImageFont, ImageDraw
import torch
import torch.nn as nn
from torchvision import transforms
from nets.classifiernet import get_classifiernet
from misc.nihongo import nihongo_class

# whether to use the GPU
USE_DEVICE = 'cuda:0' if torch.cuda.is_available() else 'cpu'

IMAGE_SIZE = 56    # size of a character image
BATCH_SIZE = 16    # batch size during training
NUM_WORKERS = 4    # number of loader threads
NUM_EPOCHS = 20    # number of training epochs

# make PyTorch internals deterministic
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

# initialize the random number generators
np.random.seed(0)
torch.manual_seed(0)

# load the fonts
font_path = os.listdir('fonts/')  # list of font files
ttf_fonts = [ImageFont.truetype('fonts/'+fp, 56) for fp in font_path]

# transform that crops out only the character region
class CharactorCrop(nn.Module):
    def __init__(self):
        super().__init__()
    def forward(self, img):
        # convert to a NumPy array
        aimg = np.array(img)
        where = np.where(aimg < 128)  # pixels where the character is drawn
        # crop the part where the character is drawn
        if len(where[0]) > 0 and len(where[1]) > 1:
            _x1,_x2 = min(where[1]),max(where[1])
            _y1,_y2 = min(where[0]),max(where[0])
            x1 = max(0,_x1-1)
            y1 = max(0,_y1-1)
            x2 = min(_x2+1,aimg.shape[1])
            y2 = min(_y2+1,aimg.shape[0])
            aimg = aimg[y1:y2,x1:x2]
        # scale the values into the range 0 to 1
        aimg = np.clip(aimg, 0, 255).astype(np.float32) / 255.
        # make a one-channel image tensor
        aimg = aimg.reshape((1,aimg.shape[0],aimg.shape[1]))
        return torch.tensor(aimg, dtype=torch.float32)

# class that generates and returns character images
class MyDataset(object):
    def __init__(self):
        # transforms for data augmentation
        self.trans = transforms.Compose([
            transforms.RandomRotation(degrees=15, fill=255),
            transforms.RandomPerspective(p=1.0, fill=255),
            CharactorCrop(),  # crop only where the character is
            transforms.Resize((IMAGE_SIZE,IMAGE_SIZE)),
        ])
    def __getitem__(self, idx):
        # generate and return one character image
        word_class = idx // len(ttf_fonts)
        ttf_font = ttf_fonts[idx % len(ttf_fonts)]
        if word_class == 0:
            # not a word
            message = '■'  # character used for the "not a character" class
        else:
            message = nihongo_class[word_class]  # the character itself
        # draw the character onto an image
        bgcolor = np.random.randint(128) + 127  # background color
        fgcolor = np.random.randint(128)        # character color
        img = Image.new(size=(128,128), mode='L', color=bgcolor)
        draw = ImageDraw.Draw(img)
        draw.text((30,30), message, font=ttf_font, fill=fgcolor)
        # return the image and its class
        return self.trans(img), torch.tensor(word_class, dtype=torch.int64)
    def __len__(self):
        # number of characters x number of font files
        return len(nihongo_class) * len(ttf_fonts)

# build the model with the number of classes the OCR program expects
model = get_classifiernet(len(nihongo_class) + 2)

# fine-tune the pretrained model
model.load_state_dict(torch.load('classifiernet_orig.model', map_location=torch.device(USE_DEVICE)))
model.to(USE_DEVICE)  # move to GPU memory when a GPU is used
model.train()         # put the model into training mode

# prepare for training
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)
loss = nn.CrossEntropyLoss()

# create the dataset
dataset = MyDataset()
# load the data on separate threads
data_loader = torch.utils.data.DataLoader(
    dataset, batch_size=BATCH_SIZE,
    shuffle=True, num_workers=NUM_WORKERS)

# training loop
for epoch in tqdm(range(NUM_EPOCHS)):
    for X, y in data_loader:  # read the images as tensors
        X = X.to(USE_DEVICE)  # move to GPU memory when a GPU is used
        y = y.to(USE_DEVICE)  # move to GPU memory when a GPU is used
        # run the neural network
        res = model(X)
        # compute the total loss
        losses = loss(res, y)
        # train on the new batch
        optimizer.zero_grad()  # clear the previous gradients
        losses.backward()      # backpropagate the loss
        optimizer.step()       # update the parameters from the new gradients

# save the final model
torch.save(model.state_dict(), 'chapt08-model1.pth')
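Once a training run (below) has written chapt08-model1.pth, the weights can be sanity-checked in isolation before being swapped into the OCR. A minimal sketch (class index 0 is the "not a character" class):

import torch
from nets.classifiernet import get_classifiernet
from misc.nihongo import nihongo_class

model = get_classifiernet(len(nihongo_class) + 2)
model.load_state_dict(torch.load('chapt08-model1.pth', map_location='cpu'))
model.eval()
with torch.no_grad():
    x = torch.rand((1, 1, 56, 56))   # stand-in for a real 56x56 character crop
    pred = model(x).argmax(dim=1).item()
if 0 < pred < len(nihongo_class):
    print('predicted character:', nihongo_class[pred])
else:
    print('predicted a non-character class:', pred)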
(py37) > cd \workspace_py37\chapter08
(py37) > python chapt08_1w.py
100%|█████████████████████████████████████████| 20/20 [33:30<00:00, 100.54s/it]

CPU only (Intel® Core™ i7-1185G7):

(py37) $ cd ~/workspace_py37/chapter08
(py37) $ python3 chapt08_1a.py
100%|██████████████████████████████████████| 20/20 [49:23:56<00:00, 8891.84s/it]
Machine | Start | End | Elapsed (h:m) |
GeForce GTX 1050 Ti / Intel® Core™ i7-6700 | 10/13 04:31 | 10/13 05:05 | 00:34 |
DELL Latitude 7520 / Intel® Core™ i7-1185G7 (CPU) | 10/13 12:50 | 10/15 15:14 | 49:24 |
(py37) $ cd ~/workspace_py37/chapter08/OCR_Japanease/
(py37) $ cp ../chapt08-model1.pth models/classifiernet.model
(py37) $ python3 ocr_japanease.py --cpu testshot1.png
file "testshot1.png" detected in 72 dpi.
[Block #0]
コロナウイルスにまけるな
[Block #1]
がんばろう
[Block #2]
日本

※ This confirms that the OCR works correctly with the fine-tuned model.