# -*- coding: utf-8 -*-
##------------------------------------------
## StyleGAN-e4e GUI program Ver 0.01
##
## 2024.08.05 Masahiro Izutsu
##------------------------------------------
## e4e_gui.py
import warnings
warnings.simplefilter('ignore')
# Color Escape Code ---------------------------
GREEN = '\033[1;32m'
RED = '\033[1;31m'
NOCOLOR = '\033[0m'
YELLOW = '\033[1;33m'
CYAN = '\033[1;36m'
BLUE = '\033[1;34m'
# インポート&初期設定
import os
import argparse
import numpy as np
import cv2
from PIL import Image, ImageTk
import PySimpleGUI as sg
import my_logging
import my_dialog
from torch.cuda import is_available
gpu_d = is_available() # GPU availability check — False puts the app in view mode
if gpu_d:
    import e4e_demo  # GPU-only module (model loading / generation); skipped without CUDA
# Constant definitions
IMAGE_DIR = './images'                                  # source image directory
RESULT_PATH = './results'                               # default output directory
MODEL_PATH = './pretrained_models/e4e_ffhq_encode.pt'   # e4e encoder weights (display only here)
ALIGN_DIR = './align'                                   # aligned-face directory (display only here)
VEC_PIC_DIR = './vec_pic'                               # images reconstructed from latent vectors
VEC_DIR = './vec'                                       # latent-vector (.pt) directory
PIC_DIR = './pic'                                       # still-image directory (display only here)
# PySimpleGUI element keys
KEY_ORGIMAGE = '-Org-'      # canvas: original image
KEY_PRSIMAGE = '-Prs-'      # image element: processed movie frame
KEY_MIN = '-Min-'           # slider: range minimum
KEY_MAX = '-Max-'           # slider: range maximum
KEY_AGE = '-Age-'           # radio: 'age' direction
KEY_POSE = '-Pose-'         # radio: 'pose' direction
KEY_SMILE = '-Smile-'       # radio: 'smile' direction
KEY_AGEPOSE = '-AgePose-'   # radio: 'age+pose' direction
KEY_IMAGESEL = '-Image-'    # button: select source image
KEY_IMAGEPTOS = '-New-'     # button: (re)generate movie
KEY_EXIT = '-Exit-'         # button: quit
KEY_TXTORG = '-Source-'     # text: source image path
KEY_TXTPRS = '-Process-'    # text: output movie path
IMG_CANCAS_SIZE = 512       # display canvas size in pixels (name keeps original spelling)
# Title
title = 'StyleGAN-e4e GUI program Ver 0.01'
sub_title = '' if gpu_d else ' <view mode>'  # suffix shown when no GPU is available
# Build the command-line parser for this application.
def parse_args():
    """Create and return the ArgumentParser (not yet parsed)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--direction",
        default='pose',
        type=str,
        choices=['age', 'pose', 'smile', 'age+pose'],
        help="select 'age'/'pose'/'smile'/'age+pose'",
    )
    parser.add_argument("--source_image", default='', help="path to source image")
    parser.add_argument("--result_image", default=RESULT_PATH, help="path to output")
    parser.add_argument(
        '--log',
        metavar='LOG',
        default='3',
        help="Log level(-1/0/1/2/3/4/5) Default value is '3'",
    )
    return parser
# Print the startup banner and effective settings.
def display_info(opt, title):
    """Show the title, command-line options, and fixed directory constants."""
    print(f'\n{GREEN}{title}: Starting application...{NOCOLOR}')
    print(f'\n - {YELLOW}source_image : {NOCOLOR}', opt.source_image)
    print(f' - {YELLOW}result_image : {NOCOLOR}', opt.result_image)
    print(f' - {YELLOW}log : {NOCOLOR}', opt.log)
    print('\n - GPU status : ', gpu_d)
    print(' - model path : ', MODEL_PATH)
    print(' - align directory : ', ALIGN_DIR)
    print(' - vector pict directory : ', VEC_PIC_DIR)
    print(' - latent variable direct. : ', VEC_DIR)
    print(' - still image directory : ', PIC_DIR)
    print(' ')
# Check that every image has a corresponding latent-vector file.
def check_vector_image(img_path, vec_path):
    """Return True when each image file in *img_path* has a matching
    '<name>.pt' latent-vector file in *vec_path*.

    Args:
        img_path: directory holding the source image files.
        vec_path: directory expected to hold the latent vectors.

    Returns:
        bool: False if *vec_path* is not a directory or any vector is missing.
    """
    files_image = [
        f for f in os.listdir(img_path) if os.path.isfile(os.path.join(img_path, f))
    ]
    logger.debug(f'image = {files_image}')
    if not os.path.isdir(vec_path):
        return False
    files_vec = [
        f for f in os.listdir(vec_path) if os.path.isfile(os.path.join(vec_path, f))
    ]
    logger.debug(f'vector = {files_vec}')
    flag = True
    # BUG FIX: the original loop variable was named `img_path`, shadowing the
    # parameter; use a distinct name. Also stop at the first missing vector —
    # one miss already decides the result.
    for image_name in files_image:
        name, _ext = os.path.splitext(image_name)
        if not os.path.isfile(vec_path + '/' + name + '.pt'):
            flag = False
            break
    logger.debug(f'check_vector_image = {flag}')
    return flag
# Select a file from a directory directly under the current working directory.
def get_image_file(tgt_dir, msg=''):
    """Prompt until the user picks a file one directory level below the
    current working directory, or cancels.

    Returns:
        str: the selected path, or '' when the dialog was cancelled.
    """
    # Only selections directly under the current directory are accepted.
    import re
    cwd = re.sub(r'\\', '/', os.getcwd())
    while True:
        image_file = my_dialog.select_image_file(msg, tgt_dir)
        if not image_file:
            return ''  # cancelled
        # Strip the file name, then the containing directory; what remains
        # must be the current working directory itself.
        parent, _fname = os.path.split(image_file)
        grandparent, _sub = os.path.split(parent)
        if grandparent == cwd:
            return image_file
def main_process(opt, net):
    """Run the GUI event loop.

    Shows the source image on the left, plays the generated morphing movie
    on the right, and regenerates the movie when the direction/range changes
    or the Process button is pressed (GPU builds only).

    Args:
        opt: parsed options (direction, source_image, result_image).
        net: loaded e4e network, or None in view mode (no GPU).
    """
    def _draw_original(canvas, path):
        # Draw *path* scaled to the canvas; the caller MUST keep the returned
        # PhotoImage referenced, or Tk garbage-collects the displayed image.
        pil_img = Image.open(path).resize((IMG_CANCAS_SIZE, IMG_CANCAS_SIZE))
        photo = ImageTk.PhotoImage(image=pil_img)
        canvas.create_image(IMG_CANCAS_SIZE / 2, IMG_CANCAS_SIZE / 2, image=photo)
        return photo

    direction = opt.direction
    base_dir_pair = os.path.split(opt.source_image)
    img_name, ext = os.path.splitext(base_dir_pair[1])
    img_dir = base_dir_pair[0]              # source image directory
    latent = img_name + '.pt'               # latent-vector file name
    result_image = opt.result_image
    image_path = VEC_PIC_DIR + '/' + base_dir_pair[1]
    save_path = result_image + '/' + direction + '_' + img_name + '.mp4'

    # Radio-button defaults for the initial direction.
    # BUG FIX: the original assigned the 'age+pose' test to radio2 a second
    # time, clobbering the 'smile' default and sharing one flag between the
    # 'smile' and 'age+pose' buttons; radio3 is now its own flag.
    radio0 = direction == 'age'
    radio1 = direction == 'pose'
    radio2 = direction == 'smile'
    radio3 = direction == 'age+pose'
    # Range defaults ('smile' uses 0..30, others -50..50); renamed from
    # min/max to avoid shadowing the builtins.
    vmin = 0 if direction == 'smile' else -50
    vmax = 30 if direction == 'smile' else 50

    # Window theme and layout
    sg.theme('BlueMono')
    col_image0 = [
        [sg.Canvas(size=(IMG_CANCAS_SIZE, IMG_CANCAS_SIZE), key=KEY_ORGIMAGE)],
        [sg.Text(image_path, background_color='LightSteelBlue1', size=(63, 1), key=KEY_TXTORG)]
    ]
    col_image1 = [
        [sg.Image(size=(IMG_CANCAS_SIZE, IMG_CANCAS_SIZE), key=KEY_PRSIMAGE)],
        [sg.Text(save_path, background_color='LightSteelBlue1', size=(63, 1), key=KEY_TXTPRS)]
    ]
    col_color = [
        [sg.Text("min", size=(5, 1)), sg.Slider((-50, 50), vmin, 10, orientation='h', size=(35, 10), disabled=not gpu_d, key=KEY_MIN, enable_events=True)],
        [sg.Text("max", size=(5, 1)), sg.Slider((-50, 50), vmax, 10, orientation='h', size=(35, 10), disabled=not gpu_d, key=KEY_MAX, enable_events=True)]
    ]
    col_radio = [
        [sg.Text("", size=(15, 1)), sg.Text("direction", size=(10, 1)),
         sg.Radio('age', group_id='direction', enable_events=True, default=radio0, key=KEY_AGE),
         sg.Radio('pose', group_id='direction', enable_events=True, default=radio1, key=KEY_POSE),
         sg.Radio('smile', group_id='direction', enable_events=True, default=radio2, key=KEY_SMILE),
         # BUG FIX: was default=radio2; 'age+pose' now uses its own flag.
         sg.Radio('age+pose', group_id='direction', enable_events=True, default=radio3, key=KEY_AGEPOSE)],
        [sg.Text("", size=(40, 1)), sg.Button('Image', size=(10, 1), key=KEY_IMAGESEL),
         sg.Button('Process', size=(10, 1), disabled=not gpu_d, key=KEY_IMAGEPTOS),
         sg.Text("", size=(1, 1)), sg.Button('Exit', size=(10, 1), key=KEY_EXIT)],
    ]
    layout = [
        [sg.Text(title + sub_title, size=(48, 1), justification='right', font='Helvetica 20')],
        [sg.Column(col_image0), sg.Column(col_image1)],
        [sg.Column(col_color), sg.Column(col_radio)],
        [sg.Text("", size=(15, 1))]
    ]

    # Create the window and show the source image.
    window = sg.Window(title, layout, finalize=True, return_keyboard_events=True)
    im0_canvas = window[KEY_ORGIMAGE]
    tcv_img0 = im0_canvas.TKCanvas
    tcv_img0.create_rectangle(0, 0, IMG_CANCAS_SIZE, IMG_CANCAS_SIZE, fill='#cccccc')
    pht0_image = _draw_original(tcv_img0, image_path)  # keep the reference alive

    # Processed movie (may not exist yet).
    cap = cv2.VideoCapture(save_path)
    cap_open = cap.isOpened()
    new_make_f = False  # request to (re)generate the movie on the next tick

    # Event loop
    while True:
        event, values = window.read(timeout=30)

        if new_make_f:
            # Generate still frames from the latent vector, build the movie,
            # then reopen it for playback.
            pic_dir = e4e_demo.still_image_generation(net, logger, vec_dir=VEC_DIR, latent=latent, direction=direction, min=vmin, max=vmax)
            e4e_demo.make_movie(pic_dir, latent=latent, direction=direction, out_dir=result_image, disp_f=False)
            cap = cv2.VideoCapture(save_path)
            cap_open = cap.isOpened()
            new_make_f = False

        if event == KEY_EXIT or event == sg.WIN_CLOSED or event == 'Escape:27':
            break

        if event == KEY_MIN or event == KEY_MAX:
            vmin = int(values[KEY_MIN])
            vmax = int(values[KEY_MAX])
            logger.debug(f'min = {vmin}, max = {vmax}')

        if event in (KEY_AGE, KEY_POSE, KEY_SMILE, KEY_AGEPOSE):
            logger.debug(f'{KEY_AGE}, {KEY_POSE}, {KEY_SMILE}, {KEY_AGEPOSE}')
            if values[KEY_AGE]:
                direction = 'age'
                vmin, vmax = -50, 50
            elif values[KEY_POSE]:
                direction = 'pose'
                vmin, vmax = -50, 50
            elif values[KEY_SMILE]:
                direction = 'smile'
                vmin, vmax = 0, 30
            elif values[KEY_AGEPOSE]:
                direction = 'age+pose'
                vmin, vmax = -50, 50
            window[KEY_MIN].update(float(vmin))
            window[KEY_MAX].update(float(vmax))
            if cap_open:
                cap.release()
            save_path = result_image + '/' + direction + '_' + img_name + '.mp4'
            # BUG FIX: the original updated this text BEFORE recomputing
            # save_path, leaving a stale path on screen (the image-select
            # branch below already did it in the right order).
            window[KEY_TXTPRS].update(save_path)
            cap = cv2.VideoCapture(save_path)
            cap_open = cap.isOpened()
            if gpu_d and not cap_open:
                new_make_f = True  # no cached movie for this direction yet

        if event == KEY_IMAGESEL:
            fpath = get_image_file(VEC_PIC_DIR, msg=VEC_PIC_DIR + ' ')
            if len(fpath) > 0:
                # Switch to the newly selected source image.
                image_path = fpath
                window[KEY_TXTORG].update(image_path)
                pht0_image = _draw_original(tcv_img0, image_path)
                base_dir_pair = os.path.split(image_path)
                img_name, ext = os.path.splitext(base_dir_pair[1])
                latent = img_name + '.pt'
                if cap_open:
                    cap.release()
                # Blank the processed pane with the same light gray (#cccccc)
                # as the empty canvas until a movie is available.
                frame = np.zeros((IMG_CANCAS_SIZE, IMG_CANCAS_SIZE, 3))
                frame += 204
                img = cv2.imencode('.png', frame)[1].tobytes()
                window[KEY_PRSIMAGE].update(img)
                save_path = result_image + '/' + direction + '_' + img_name + '.mp4'
                cap = cv2.VideoCapture(save_path)
                cap_open = cap.isOpened()
                if gpu_d and not cap_open:
                    new_make_f = True
                window[KEY_TXTPRS].update(save_path)

        if event == KEY_IMAGEPTOS:
            if gpu_d:
                if cap_open:
                    cap.release()
                    cap_open = False
                new_make_f = True
                logger.debug(f'New image → latent = {latent}, direction = {direction}, min = {vmin}, max = {vmax}{NOCOLOR}')

        if cap_open:
            ret, frame = cap.read()
            if frame is None:
                # End of movie: rewind to the first frame and loop.
                cap.set(cv2.CAP_PROP_POS_FRAMES, 0)
                ret, frame = cap.read()
            img = cv2.imencode('.png', frame)[1].tobytes()
            window[KEY_PRSIMAGE].update(img)

    # Window teardown
    if cap_open:
        cap.release()
    window.close()
# Script entry point
if __name__ == "__main__":
    import datetime

    opt = parse_args().parse_args()

    # Application logging: logger name is derived from this file's base name.
    module_name = os.path.splitext(os.path.basename(__file__))[0]
    logger = my_logging.get_module_logger_sel(module_name, int(opt.log))

    # Prompt for a source image when none was given on the command line;
    # a cancelled dialog ends the program.
    if len(opt.source_image) == 0:
        opt.source_image = get_image_file(IMAGE_DIR, msg=IMAGE_DIR + ' ')
        if len(opt.source_image) == 0:
            exit(0)

    base_dir_pair = os.path.split(opt.source_image)
    img_dir = base_dir_pair[0]  # source image directory
    display_info(opt, title)

    if gpu_d:
        start_time = datetime.datetime.now()  # start timing
        # Load the pretrained parameters into the model.
        net = e4e_demo.load_model(model_path=e4e_demo.MODEL_PATH)
    else:
        net = None  # view mode: no generation possible

    if gpu_d and not check_vector_image(img_dir, e4e_demo.VEC_DIR):
        # Generate latent vectors and reconstruction images.
        e4e_demo.make_latent(net, img_dir, out_path=opt.result_image, disp_f=True)
        # Elapsed time for the generation step.
        end_time = datetime.datetime.now()
        print(start_time.strftime('\nprocessing start >>\t %Y/%m/%d %H:%M:%S'))
        print(end_time.strftime('processing end >>\t %Y/%m/%d %H:%M:%S'))
        print('processing time >>\t', end_time - start_time)

    main_process(opt, net)
    logger.info('\nFinished.\n')