Creating video from a still image: First Order Motion Model (Part 2)

Using the "First Order Motion Model" technique, a still image is animated so that it moves like a video of the same category

※ Last updated: 2024/11/02

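First Order Motion Model

Overview

"first-order-model" is a model built on the 2019 paper "First Order Motion Model for Image Animation". Given a still image and a video of the same category, it animates the still image so that it moves like the video
From a still image (source) and a video (driving frames), it generates a video in which the source image follows the motion of the input video
During training, the still image and the driving frames are frames picked from the same video
At inference time, any still image and any video can be used, provided they belong to the same category as the training data

Model overview figure (from the paper below)
Paper: "First Order Motion Model for Image Animation"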
<Official site>
・https://aliaksandrsiarohin.github.io/first-order-model-website/
<paper>
・https://papers.nips.cc/paper_files/paper/2019/file/31c0b36aef265d9221af80872ceb62f9-Paper.pdf
・(new) https://arxiv.org/pdf/2104.11280
<framework>
・https://github.com/AliaksandrSiarohin/first-order-model
・(new) https://github.com/snap-research/articulated-animation
・(Motion Co-Segmentation.) https://github.com/AliaksandrSiarohin/motion-cosegmentation

Demo run on "Google Colab"
・Fake video made from a still image: First Order Motion Model

Setting up the execution environment

Download the project from the GitHub site
・Create the "workspace_2/" folder if it does not exist

cd /anaconda_win/workspace_2    ← On Windows
cd ~/workspace_2                ← On Linux

git clone https://github.com/AliaksandrSiarohin/first-order-model

Download the project package project_first-order-model.zip (3.52GB) <first-order-model>
・Folders created by extracting it

project_first-order-model
├─workspace_2
│  ├─first-order-model        ← Copy this over the project cloned from GitHub
│  │  ├─result
│  │  ├─result_save
│  │  └─sample
│  │      ├─images
│  │      └─videos
│  └─mylib2                   ← General-purpose library for running in a local environment
│      ├─mylib_test
│      └─result
└─workspace_py37
    └─mylib                        ← Personal general-purpose library

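・Copy the contents of the extracted "project_first-order-model/" folder into the following folder, overwriting existing files
　On Windows → "anaconda_win/"　On Linux → "~/"

Create a new virtual environment "py38_learn"
Create the virtual environment by following the steps in "Virtual environment (py38_learn)"

Preparation

Prepare the library needed to run "First Order Motion Model" in a local environment (included in the project package above)
Verification → Python personal general-purpose library 2

On Windows

Check the PYTHONPATH environment variable to confirm that the "mylib2/" folder is on the path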
```
echo $env:PYTHONPATH
```
・If it is not on the path → set the environment variable

Set the current directory

cd /anaconda_win/workspace_2/first-order-model

On Linux

Check the PYTHONPATH environment variable to confirm that the "mylib2/" folder is on the path
```
printenv PYTHONPATH
```
・If it is not on the path → set the environment variable

Set the current directory
```
cd ~/workspace_2/first-order-model
```
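
The same PYTHONPATH check can also be done from inside Python, which behaves identically on Windows and Linux. The following is only a minimal sketch: the helper name `is_on_pythonpath` is hypothetical (it is not part of the project), and it inspects only the `PYTHONPATH` environment variable, not paths added to `sys.path` by other means.

```python
import os

def is_on_pythonpath(folder_name, env=None):
    """Return True if any PYTHONPATH entry ends with the given folder name."""
    env = os.environ if env is None else env
    raw = env.get("PYTHONPATH", "")
    # Entries are separated by os.pathsep (';' on Windows, ':' on Linux)
    entries = [p for p in raw.split(os.pathsep) if p]
    return any(os.path.basename(os.path.normpath(p)) == folder_name for p in entries)

if __name__ == "__main__":
    print("mylib2 on PYTHONPATH:", is_on_pythonpath("mylib2"))
```

If this prints `False`, set the environment variable as described above before running the scripts.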

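About the input still image and video

Image size of the still image (source_image) / video (driving_video)
・Any size is accepted, because the inputs are resized to 256x256 pixels before processing
・Because this resize does not preserve the aspect ratio, square or nearly square input is preferable

Any input can be used as long as it belongs to the same category as (has content similar to) the pretrained model
If the driving video contains audio, the audio track can be carried over to the resulting video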
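
Since the scripts resize straight to 256x256, a non-square input is stretched. One way to avoid that is to crop a centered square first. The sketch below only computes the crop box; `center_square_box` is a hypothetical helper, not something the project provides.

```python
def center_square_box(width, height):
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side

if __name__ == "__main__":
    # A 640x360 frame keeps its middle 360x360 region
    print(center_square_box(640, 360))
```

The returned box can be used for array slicing such as `img[top:bottom, left:right]` before the image is handed to the scripts.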

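Trying the provided demo "demo.py"

Run the demo program with the pretrained models (already included in the project package)
・The provided "demo.py" has a few minor problems, so a fixed version is used as "demo2.py"
・If a CUDA error occurs (no GPU, insufficient memory, etc.), add the "--cpu" option
・The result is written to the file given by the "--result_video <filepath>" option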
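
Whether "--cpu" is needed can be checked before a run. The following is a rough sketch under the assumption that PyTorch is installed in the active environment; `should_use_cpu` is a hypothetical helper, and it only detects a missing GPU, not out-of-memory errors that appear during processing.

```python
def should_use_cpu(force_cpu=False):
    """Return True when inference should run on the CPU."""
    if force_cpu:
        # Same effect as passing --cpu explicitly
        return True
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to run on the GPU
        return True
    # torch.cuda.is_available() reports whether a usable CUDA device exists
    return not torch.cuda.is_available()

if __name__ == "__main__":
    print("run with --cpu:", should_use_cpu())
```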

Run with the Mona Lisa as the still image and the sample video of President Trump as the driving video → "Mona Lisa speaking like Trump"

(py38_learn) python demo2.py  --config config/vox-256.yaml --driving_video './sample/videos/2.mp4' --source_image './sample/images/05.png' --checkpoint './sample/vox-cpk.pth.tar' --relative --adapt_scale --result_video result.gif --cpu

FOMM Demo (demo2.py) Ver. 0.01: Starting application...

   - config          :  config/vox-256.yaml
   - checkpoint      :  ./sample/vox-cpk.pth.tar
   - source_image    :  ./sample/images/05.png
   - driving_video   :  ./sample/videos/2.mp4
   - result_video    :  result.gif
   - relative        :  True
   - adapt_scale     :  True
   - find_best_frame :  False
   - best_frame      :  None
   - cpu             :  True
   - audio           :  False

100%|████████████████████████████████████████████████████████████████████████████████| 211/211 [02:20<00:00,  1.50it/s]

・When running on the GPU (without the "--cpu" option)

100%|████████████████████████████████████████████████████████████████████████████████| 211/211 [00:02<00:00, 74.32it/s]

Run in the mode that fits the face to the driving video (without "--relative") → "Mona Lisa resembling Trump"

(py38_learn) python demo2.py  --config config/vox-256.yaml --driving_video './sample/videos/2.mp4' --source_image './sample/images/05.png' --checkpoint './sample/vox-cpk.pth.tar' --adapt_scale --result_video result1.gif --cpu

FOMM Demo (demo2.py) Ver. 0.01: Starting application...

   - config          :  config/vox-256.yaml
   - checkpoint      :  ./sample/vox-cpk.pth.tar
   - source_image    :  ./sample/images/05.png
   - driving_video   :  ./sample/videos/2.mp4
   - result_video    :  result1.gif
   - relative        :  False
   - adapt_scale     :  True
   - find_best_frame :  False
   - best_frame      :  None
   - cpu             :  True
   - audio           :  False

100%|████████████████████████████████████████████████████████████████████████████████| 211/211 [02:19<00:00,  1.51it/s]

・When running on the GPU (without the "--cpu" option)

100%|████████████████████████████████████████████████████████████████████████████████| 211/211 [00:02<00:00, 81.72it/s]
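
The two runs above differ only in "--relative". As a rough illustration of what that switch changes, the sketch below works on plain (x, y) tuples: in relative mode the displacement of each driving keypoint since the first driving frame is added to the source keypoint, while in absolute mode the driving keypoints are used directly. This is a simplified reading of `normalize_kp` in the repository's `animate.py`, not the actual code: the name `transfer_keypoints` is hypothetical, the Jacobian handling is omitted, and `scale` stands in for the factor that "--adapt_scale" derives from the keypoint convex hulls.

```python
def transfer_keypoints(kp_source, kp_driving, kp_driving_initial, relative=True, scale=1.0):
    """Simplified keypoint transfer on lists of (x, y) tuples (Jacobians ignored)."""
    if not relative:
        # Absolute mode: the driving keypoints are used as they are
        return list(kp_driving)
    moved = []
    for (sx, sy), (dx, dy), (ix, iy) in zip(kp_source, kp_driving, kp_driving_initial):
        # Relative mode: add the driving displacement since frame 0 to the source keypoint
        moved.append((sx + (dx - ix) * scale, sy + (dy - iy) * scale))
    return moved

if __name__ == "__main__":
    source = [(0.0, 0.0), (0.5, 0.5)]
    first = [(0.25, 0.25), (0.75, 0.75)]
    current = [(0.5, 0.25), (0.75, 0.5)]
    print(transfer_keypoints(source, current, first, relative=True))
    print(transfer_keypoints(source, current, first, relative=False))
```

In relative mode the source keeps its own geometry and only borrows the motion, which is why the first run still looks like the Mona Lisa; in absolute mode the output takes on the driving face's keypoint layout.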

Module source code

▼ "demo2.py"

# -*- coding: utf-8 -*-
##------------------------------------------
##   First Order Motion Model Demo Ver 0.01
##
##               2024.06.14 Masahiro Izutsu
##------------------------------------------
## demo2.py         (original: demo.py)
##  Ver 0.02    2024/06/16  suppress warning messages

## *************
## URL: https://qiita.com/jun40vn/items/722bd4675246eb7eac46
## Commands for checking the original demo "demo.py" → "demo2.py"
##
## Face         Mona Lisa → Trump
## python demo2.py  --config config/vox-256.yaml --driving_video './sample/videos/2.mp4' --source_image './sample/images/05.png' --checkpoint './sample/vox-cpk.pth.tar' --relative --adapt_scale
## Face         Mona Lisa resembling Trump
## python demo2.py  --config config/vox-256.yaml --driving_video './sample/videos/2.mp4' --source_image './sample/images/05.png' --checkpoint './sample/vox-cpk.pth.tar' --adapt_scale
## Face         Nanako Matsushima → Professor Hinton
## python demo2.py  --config config/vox-256.yaml --driving_video './sample/videos/hinton.mp4' --source_image './sample/images/pic6.png' --checkpoint './sample/vox-cpk.pth.tar' --relative --adapt_scale
##
## Fashion      Haru → model
## python demo2.py  --config config/fashion-256.yaml --driving_video './sample/videos/fashion01x.mp4' --source_image './sample/images/fashion003x.png' --checkpoint './sample/fashion.pth.tar' --relative --adapt_scale
##
## Tai chi      Satomi Ishihara → tai chi
## python demo2.py  --config config/taichi-256.yaml --driving_video './sample/videos/taichi2.mp4' --source_image './sample/images/taichi001x.jpg' --checkpoint './sample/taichi-cpk.pth.tar' --relative --adapt_scale
##
## Animation
## python demo2.py  --config config/mgif-256.yaml --driving_video './sample/videos/anim_00055x.mp4' --source_image './sample/images/anim02.png' --checkpoint './sample/mgif-cpk.pth.tar' --relative --adapt_scale
## *************

import warnings
warnings.simplefilter('ignore')

import sys
import yaml
from argparse import ArgumentParser
from tqdm.auto import tqdm
import imageio.v2 as imageio                            # 2024/06/14    avoids the imageio warning

import numpy as np
from skimage.transform import resize
from skimage import img_as_ubyte
import torch
from sync_batchnorm import DataParallelWithCallback

from modules.generator import OcclusionAwareGenerator
from modules.keypoint_detector import KPDetector
from animate import normalize_kp

import ffmpeg
from os.path import splitext
from shutil import copyfileobj
import tempfile

if sys.version_info[0] < 3:
    raise Exception("You must use Python 3 or higher. Recommended version is Python 3.7")

# Color Escape Code ---------------------------
GREEN = '\033[1;32m'
RED = '\033[1;31m'
NOCOLOR = '\033[0m'
YELLOW = '\033[1;33m'

import os 

# Title
title = 'FOMM Demo (demo2.py) Ver. 0.01'

# Display basic information
def display_info(opt):
    print('\n' + GREEN + title + ': Starting application...' + NOCOLOR)
    print('\n   - ' + YELLOW + 'config          : ' + NOCOLOR, opt.config)
    print('   - ' + YELLOW + 'checkpoint      : ' + NOCOLOR, opt.checkpoint)
    print('   - ' + YELLOW + 'source_image    : ' + NOCOLOR, opt.source_image)
    print('   - ' + YELLOW + 'driving_video   : ' + NOCOLOR, opt.driving_video)
    print('   - ' + YELLOW + 'result_video    : ' + NOCOLOR, opt.result_video)

    print('   - ' + YELLOW + 'relative        : ' + NOCOLOR, opt.relative)
    print('   - ' + YELLOW + 'adapt_scale     : ' + NOCOLOR, opt.adapt_scale)
    print('   - ' + YELLOW + 'find_best_frame : ' + NOCOLOR, opt.find_best_frame)
    print('   - ' + YELLOW + 'best_frame      : ' + NOCOLOR, opt.best_frame)
    print('   - ' + YELLOW + 'cpu             : ' + NOCOLOR, opt.cpu)
    print('   - ' + YELLOW + 'audio           : ' + NOCOLOR, opt.audio)
    print(' ')

#-----------------------------------------------

def load_checkpoints(config_path, checkpoint_path, cpu=False):

    with open(config_path) as f:
        config = yaml.full_load(f)

    generator = OcclusionAwareGenerator(**config['model_params']['generator_params'],
                                        **config['model_params']['common_params'])
    if not cpu:
        generator.cuda()

    kp_detector = KPDetector(**config['model_params']['kp_detector_params'],
                             **config['model_params']['common_params'])
    if not cpu:
        kp_detector.cuda()

    if cpu:
        checkpoint = torch.load(checkpoint_path, map_location=torch.device('cpu'))
    else:
        checkpoint = torch.load(checkpoint_path)

    generator.load_state_dict(checkpoint['generator'])
    kp_detector.load_state_dict(checkpoint['kp_detector'])

    if not cpu:
        generator = DataParallelWithCallback(generator)
        kp_detector = DataParallelWithCallback(kp_detector)

    generator.eval()
    kp_detector.eval()

    return generator, kp_detector


def make_animation(source_image, driving_video, generator, kp_detector, relative=True, adapt_movement_scale=True, cpu=False):
    with torch.no_grad():
        predictions = []
        source = torch.tensor(source_image[np.newaxis].astype(np.float32)).permute(0, 3, 1, 2)
        if not cpu:
            source = source.cuda()
        driving = torch.tensor(np.array(driving_video)[np.newaxis].astype(np.float32)).permute(0, 4, 1, 2, 3)
        kp_source = kp_detector(source)
        kp_driving_initial = kp_detector(driving[:, :, 0])

        for frame_idx in tqdm(range(driving.shape[2])):
            driving_frame = driving[:, :, frame_idx]
            if not cpu:
                driving_frame = driving_frame.cuda()
            kp_driving = kp_detector(driving_frame)
            kp_norm = normalize_kp(kp_source=kp_source, kp_driving=kp_driving,
                                   kp_driving_initial=kp_driving_initial, use_relative_movement=relative,
                                   use_relative_jacobian=relative, adapt_movement_scale=adapt_movement_scale)
            out = generator(source, kp_source=kp_source, kp_driving=kp_norm)

            predictions.append(np.transpose(out['prediction'].data.cpu().numpy(), [0, 2, 3, 1])[0])
    return predictions

def find_best_frame(source, driving, cpu=False):
    import face_alignment  # type: ignore (local file)
    from scipy.spatial import ConvexHull

    def normalize_kp(kp):
        kp = kp - kp.mean(axis=0, keepdims=True)
        area = ConvexHull(kp[:, :2]).volume
        area = np.sqrt(area)
        kp[:, :2] = kp[:, :2] / area
        return kp

    fa = face_alignment.FaceAlignment(face_alignment.LandmarksType._2D, flip_input=True,
                                      device='cpu' if cpu else 'cuda')
    kp_source = fa.get_landmarks(255 * source)[0]
    kp_source = normalize_kp(kp_source)
    norm  = float('inf')
    frame_num = 0
    for i, image in tqdm(enumerate(driving)):
        kp_driving = fa.get_landmarks(255 * image)[0]
        kp_driving = normalize_kp(kp_driving)
        new_norm = (np.abs(kp_source - kp_driving) ** 2).sum()
        if new_norm < norm:
            norm = new_norm
            frame_num = i
    return frame_num

if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument("--config", required=True, help="path to config")
    parser.add_argument("--checkpoint", default='vox-cpk.pth.tar', help="path to checkpoint to restore")

    parser.add_argument("--source_image", default='sup-mat/source.png', help="path to source image")
    parser.add_argument("--driving_video", default='driving.mp4', help="path to driving video")
    parser.add_argument("--result_video", default='result.mp4', help="path to output")

    parser.add_argument("--relative", dest="relative", action="store_true", help="use relative or absolute keypoint coordinates")
    parser.add_argument("--adapt_scale", dest="adapt_scale", action="store_true", help="adapt movement scale based on convex hull of keypoints")

    parser.add_argument("--find_best_frame", dest="find_best_frame", action="store_true",
                        help="Generate from the frame that is the most alligned with source. (Only for faces, requires face_aligment lib)")

    parser.add_argument("--best_frame", dest="best_frame", type=int, default=None, help="Set frame to start from.")

    parser.add_argument("--cpu", dest="cpu", action="store_true", help="cpu mode.")

    parser.add_argument("--audio", dest="audio", action="store_true", help="copy audio to output from the driving video" )

    parser.set_defaults(relative=False)
    parser.set_defaults(adapt_scale=False)
    parser.set_defaults(audio=False)

    opt = parser.parse_args()

#-----------------------------------------------
    display_info(opt)                                           # 2024/06/17 display basic information

    # Check that the input files exist
    if not os.path.isfile(opt.source_image):
        print(RED + f"File not found !! '{opt.source_image}' " + NOCOLOR)
        sys.exit(1)
    if not os.path.isfile(opt.driving_video):
        print(RED + f"File not found !! '{opt.driving_video}' " + NOCOLOR)
        sys.exit(1)

#-----------------------------------------------

    source_image = imageio.imread(opt.source_image)
    reader = imageio.get_reader(opt.driving_video)
    fps = reader.get_meta_data()['fps']
    driving_video = []
    try:
        for im in reader:
            driving_video.append(im)
    except RuntimeError:
        pass
    reader.close()

    source_image = resize(source_image, (256, 256))[..., :3]
    driving_video = [resize(frame, (256, 256))[..., :3] for frame in driving_video]
    generator, kp_detector = load_checkpoints(config_path=opt.config, checkpoint_path=opt.checkpoint, cpu=opt.cpu)

    if opt.find_best_frame or opt.best_frame is not None:
        i = opt.best_frame if opt.best_frame is not None else find_best_frame(source_image, driving_video, cpu=opt.cpu)
        print("Best frame: " + str(i))
        driving_forward = driving_video[i:]
        driving_backward = driving_video[:(i+1)][::-1]
        predictions_forward = make_animation(source_image, driving_forward, generator, kp_detector, relative=opt.relative, adapt_movement_scale=opt.adapt_scale, cpu=opt.cpu)
        predictions_backward = make_animation(source_image, driving_backward, generator, kp_detector, relative=opt.relative, adapt_movement_scale=opt.adapt_scale, cpu=opt.cpu)
        predictions = predictions_backward[::-1] + predictions_forward[1:]
    else:
        predictions = make_animation(source_image, driving_video, generator, kp_detector, relative=opt.relative, adapt_movement_scale=opt.adapt_scale, cpu=opt.cpu)

    # Loop forever when the output is a GIF file
    name, ext = splitext(opt.result_video)
    if ext == '.gif':
        imageio.mimsave(opt.result_video, [img_as_ubyte(frame) for frame in predictions], fps = fps, loop = 0)
    else:
        imageio.mimsave(opt.result_video, [img_as_ubyte(frame) for frame in predictions], fps = fps)

    if opt.audio:
        try:
            # Work around a temporary-file error  2024/06/18
            with tempfile.TemporaryDirectory() as str_temp_dir:
                tmpfile = f"{str_temp_dir}/temp.mp4"
                ffmpeg.output(ffmpeg.input(opt.result_video).video, ffmpeg.input(opt.driving_video).audio, tmpfile, c='copy').run(quiet=True)
                with open(opt.result_video, 'wb') as result:
                    with open(tmpfile, 'rb') as output:
                        copyfileobj(output, result)
        except ffmpeg.Error:
            print(RED + f"Failed to copy audio: the driving video may have no audio track or the audio format is invalid." + NOCOLOR)
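
The "--find_best_frame" / "--best_frame" branch near the end of demo2.py animates forward and backward from the chosen frame and then stitches the two halves together. The indexing is easy to get wrong, so here is a minimal sketch of just that step. `stitch_from_best_frame` and the `animate` callback are hypothetical stand-ins for the two `make_animation` calls.

```python
def stitch_from_best_frame(frames, best, animate):
    """Animate outward from frames[best] and return results in original order."""
    forward = animate(frames[best:])              # best, best+1, ..., last
    backward = animate(frames[:best + 1][::-1])   # best, best-1, ..., 0
    # Both halves start with the best frame; drop the duplicate from the forward half
    return backward[::-1] + forward[1:]

if __name__ == "__main__":
    # Integers stand in for frames; the fake animate step multiplies by 10
    result = stitch_from_best_frame([0, 1, 2, 3, 4], 2, lambda fs: [f * 10 for f in fs])
    print(result)
```

Because each `make_animation` call treats its first frame as the reference for relative motion, starting both halves at the best-aligned frame keeps that reference close to the source pose.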

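Creating "fomm.py", a program that makes the category easy to specify  ※ Revised 2024/11/02

Main features
・Specifying a category automatically selects the matching pretrained model settings
・For testing, the still image and the driving video may also be omitted
・The original option parameters can still be used as they are
・If a CUDA error occurs (no GPU, insufficient memory, etc.), add the "--cpu" option
・Unlike "demo.py", the "--relative", "--adapt_scale" and "--audio" options default to True; passing the flag switches the setting off
・After processing, the generated video and a side-by-side video (still image / driving video / result) are created and displayed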
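
The reversed defaults come from argparse's `store_false` action, which is what the fomm.py listing uses for these flags. A minimal sketch with only three of the options (`build_parser` is a hypothetical name):

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser()
    # store_false: the value is True unless the flag is present
    parser.add_argument("--audio", dest="audio", action="store_false")
    parser.add_argument("--relative", dest="relative", action="store_false")
    # store_true: the value is False unless the flag is present
    parser.add_argument("--cpu", dest="cpu", action="store_true")
    return parser

if __name__ == "__main__":
    print(build_parser().parse_args([]))
    print(build_parser().parse_args(["--audio", "--cpu"]))
```

So with fomm.py, adding "--audio" turns audio copying off, the opposite of its meaning in demo2.py.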

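Output location and file names (when --result_video './result/face.mp4' is specified)
・Files are saved in the "./result" folder (the "./result" folder must already exist)
・Video generated from the still image → 'face_<still image name>_<driving video name>.mp4'
・Side-by-side of still image / driving video / generated video → 'face_<still image name>_<driving video name>_a.mp4'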
Per-category option settings for the pretrained models

Category	--config	--checkpoint	--source_image	--driving_video	Example output
Face	config/vox-256.yaml	./sample/vox-cpk.pth.tar	./sample/images/05.png	./sample/videos/2.mp4	Mona Lisa → Trump
Face	config/vox-256.yaml	./sample/vox-cpk.pth.tar	./sample/images/pic6.png	./sample/videos/hinton.mp4	Nanako Matsushima → Professor Hinton
Fashion	config/fashion-256.yaml	./sample/fashion.pth.tar	./sample/images/fashion003x.png	./sample/videos/fashion01x.mp4	Haru → model
Tai chi	config/taichi-256.yaml	./sample/taichi-cpk.pth.tar	./sample/images/taichi001x.jpg	./sample/videos/taichi2.mp4	Satomi Ishihara → tai chi
Animation	config/mgif-256.yaml	./sample/mgif-cpk.pth.tar	./sample/images/anim02.png	./sample/videos/anim_00055x.mp4	Horse animation

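The "-c, --category <option>" argument is added to switch the settings (all other options remain valid)

Category	Option	--config	--checkpoint	--source_image	--driving_video	Description
Face	-c 0	default	default	user-specified	user-specified	Convert using the specified sources
Face	-c 00	default	default	default	default	Mona Lisa → Trump
Fashion	-c 1	default	default	user-specified	user-specified	Convert using the specified sources
Fashion	-c 10	default	default	default	default	Haru → model
Tai chi	-c 2	default	default	user-specified	user-specified	Convert using the specified sources
Tai chi	-c 20	default	default	default	default	Satomi Ishihara → tai chi
Animation	-c 3	default	default	user-specified	user-specified	Convert using the specified sources
Animation	-c 30	default	default	default	default	Horse animation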
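
The category code can be read as "first character = category, any further character = use the bundled sample inputs". A minimal sketch of that decoding follows; `parse_category` and `CATEGORY_NAMES` are hypothetical names, and the `ValueError` for unknown codes is an addition for illustration that fomm.py itself does not perform.

```python
CATEGORY_NAMES = {"0": "face", "1": "fashion", "2": "tai chi", "3": "animation"}

def parse_category(code):
    """Split a category code such as '2' or '20' into (name, use_sample_inputs)."""
    if not code or code[0] not in CATEGORY_NAMES:
        raise ValueError(f"unknown category code: {code!r}")
    # A second character (e.g. '20') selects the bundled sample image and video
    return CATEGORY_NAMES[code[0]], len(code) > 1

if __name__ == "__main__":
    for code in ("0", "00", "2", "30"):
        print(code, parse_category(code))
```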

Command example

(py38_learn) python fomm.py -c 00

First Order Motion Model Ver. 0.01: Starting application...

   - Category        : 00: ** Face **
   - source_image    : ./sample/images/05.png
   - driving_video   : ./sample/videos/2.mp4
   - result_video    : ./result/face.mp4
   - audio           : True
   - cpu             : True
   - log             : 3

100%|████████████████████████████████████████████████████████████████████████████████| 211/211 [02:24<00:00,  1.46it/s]
 Saving... → './result/face_05_2.mp4'
 Saving... → './result/face_05_2_a.mp4'

 Finished.

・When running on the GPU

100%|████████████████████████████████████████████████████████████████████████████████| 211/211 [00:02<00:00, 81.20it/s]

Module source code

▼ "fomm.py"

# -*- coding: utf-8 -*-
##------------------------------------------
##   First Order Motion Model  Ver 0.01
##
##               2024.06.16 Masahiro Izutsu
##------------------------------------------
## fomm.py

import warnings
warnings.simplefilter('ignore')

# Color Escape Code
GREEN = '\033[1;32m'
RED = '\033[1;31m'
NOCOLOR = '\033[0m'
YELLOW = '\033[1;33m'

# Constants
CONFIG_VOX = './config/vox-256.yaml'                                # 0: face
DEF_CHECKPOINT = './sample/vox-cpk.pth.tar'
DEF_SOURCE_IMAGE = './sample/images/05.png'
DEF_DRIVING_IMAGE = './sample/videos/2.mp4'
DEF_RESULT_VIDEO = './result/face.mp4'

CONFIG_FAS = './config/fashion-256.yaml'                            # 1: fashion
FAS_CHECKPOINT = './sample/fashion.pth.tar'
FAS_SOURCE_IMAGE = './sample/images/fashion003x.png'
FAS_DRIVING_IMAGE = './sample/videos/fashion01x.mp4'
FAS_RESULT_VIDEO = './result/fashion.mp4'

CONFIG_TAI = './config/taichi-256.yaml'                             # 2: tai chi
TAI_CHECKPOINT = './sample/taichi-cpk.pth.tar'
TAI_SOURCE_IMAGE = './sample/images/taichi001x.jpg'
TAI_DRIVING_IMAGE = './sample/videos/taichi2.mp4'
TAI_RESULT_VIDEO = './result/taich.mp4'

CONFIG_MGF = './config/mgif-256.yaml'                               # 3: animation
MGF_CHECKPOINT = './sample/mgif-cpk.pth.tar'
MGF_SOURCE_IMAGE = './sample/images/anim02.png'
MGF_DRIVING_IMAGE = './sample/videos/anim_00055x.mp4'
MGF_RESULT_VIDEO = './result/mgif.gif'

# Imports
from demo2 import load_checkpoints
from demo2 import make_animation
from skimage import img_as_ubyte
from skimage.transform import resize
import imageio.v2 as imageio

import os
import argparse
import my_videotool
import my_imagetool
import my_movieplay

# Title
title = 'First Order Motion Model Ver. 0.01'

# Parses arguments for the application
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', '--category', default = '0', type = str, help = 'Category (0:face, 1:fashion, 2:tai chi, 3:motion GIF) Default is 0')
    parser.add_argument("--config", default=CONFIG_VOX, help="path to config")
    parser.add_argument("--checkpoint", default=DEF_CHECKPOINT, help="path to checkpoint to restore")
    parser.add_argument("--source_image", default=DEF_SOURCE_IMAGE, help="path to source image")
    parser.add_argument("--driving_video", default=DEF_DRIVING_IMAGE, help="path to driving video")
    parser.add_argument("--result_video", default=DEF_RESULT_VIDEO, help="path to output")
    parser.add_argument("--relative", dest="relative", action="store_false", help="use relative or absolute keypoint coordinates")
    parser.add_argument("--adapt_scale", dest="adapt_scale", action="store_false", help="adapt movement scale based on convex hull of keypoints")
    parser.add_argument("--find_best_frame", dest="find_best_frame", action="store_true",
                        help="Generate from the frame that is the most alligned with source. (Only for faces, requires face_aligment lib)")
    parser.add_argument("--best_frame", dest="best_frame", type=int, default=None, help="Set frame to start from.")
    parser.add_argument("--audio", dest="audio", action="store_false", help="copy audio to output from the driving video" )

    parser.add_argument("--cpu", dest="cpu", action="store_true", help="cpu mode.")
    parser.add_argument('--log', metavar = 'LOG', default = '3', help = 'Log level(-1/0/1/2/3/4/5) Default value is \'3\'')

    return parser

# Display basic information
def display_info(opt, title):
    if opt.category[0] == '0':
        cat = f'{opt.category}: ** Face **'
    elif opt.category[0] == '1':
        cat = f'{opt.category}: ** Fashion **'
    elif opt.category[0] == '2':
        cat = f'{opt.category}: ** Tai chi **'
    elif opt.category[0] == '3':
        cat = f'{opt.category}: ** Moving GIF **'
    else:
        cat = f'{opt.category}: ** setup **'

    if title != '':
        print(f'\n{GREEN}{title}: Starting application...{NOCOLOR}')

    print(f'\n   - {YELLOW}Category        : {NOCOLOR}{cat}')
    print(f'   - {YELLOW}source_image    : {NOCOLOR}{opt.source_image}')
    print(f'   - {YELLOW}driving_video   : {NOCOLOR}{opt.driving_video}')
    print(f'   - {YELLOW}result_video    : {NOCOLOR}{opt.result_video}')
    print(f'   - {YELLOW}audio           : {NOCOLOR}{opt.audio}')
    print(f'   - {YELLOW}cpu             : {NOCOLOR}{opt.cpu}')
    print(f'   - {YELLOW}log             : {NOCOLOR}{opt.log}')

    if int(opt.log) < 3:
        print(f'   - config          : {opt.config}')
        print(f'   - checkpoint      : {opt.checkpoint}')
        print(f'   - relative        : {opt.relative}')
        print(f'   - adapt_scale     : {opt.adapt_scale}')
        print(f'   - find_best_frame : {opt.find_best_frame}')
        print(f'   - best_frame      : {opt.best_frame}')

    print(f' ')

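# Build the output file names
#   in:     source_image    path of the still image file
#           driving_video   path of the driving video file
#           result_video    path of the result video file
#   out:    out_path1       path of the result video file
#           out_path2       path of the still image / driving video / result side-by-side file
#           ext             file extension
#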
def get_results_path(source_image, driving_video, result_video):
    base_dir_pair = os.path.split(source_image)
    s_name, ext = os.path.splitext(base_dir_pair[1])
    base_dir_pair = os.path.split(driving_video)
    d_name, ext = os.path.splitext(base_dir_pair[1])
    
    name, ext = os.path.splitext(result_video)
    out_path1 = name + '_' + s_name + '_' + d_name + ext
    out_path2 = name + '_' + s_name + '_' + d_name + '_a' + ext
    return out_path1, out_path2, ext


# Apply the per-category option settings
#
def set_category_opt(opt):
    if opt.category[0] == '0':                                      # 0: face
        opt.config = CONFIG_VOX
        opt.checkpoint = DEF_CHECKPOINT
        opt.result_video = DEF_RESULT_VIDEO

    elif opt.category[0] == '1':                                    # 1: fashion
        opt.config = CONFIG_FAS
        opt.checkpoint = FAS_CHECKPOINT
        opt.result_video = FAS_RESULT_VIDEO

    elif opt.category[0] == '2':                                    # 2: tai chi
        opt.config = CONFIG_TAI
        opt.checkpoint = TAI_CHECKPOINT
        opt.result_video = TAI_RESULT_VIDEO

    elif opt.category[0] == '3':                                    # 3: animation
        opt.config = CONFIG_MGF
        opt.checkpoint = MGF_CHECKPOINT
        opt.result_video = MGF_RESULT_VIDEO

    if int(opt.log) < 3:
        display_info(opt, '')

    return opt


# First Order Motion Model processing
#
def fomm_process(opt, disp_f = True):

    # Check that the input files exist
    if not os.path.isfile(opt.source_image):
        print(RED + f"File not found !! '{opt.source_image}' " + NOCOLOR)
        return
    if not os.path.isfile(opt.driving_video):
        print(RED + f"File not found !! '{opt.driving_video}' " + NOCOLOR)
        return

    # Read the still image and the driving video
    source_image = imageio.imread(opt.source_image)
    driving_video, fps = my_videotool.read_video(opt.driving_video)

    # Resize to 256x256
    source_image = resize(source_image, (256, 256))[..., :3]
    driving_video = [resize(frame, (256, 256))[..., :3] for frame in driving_video]

    # Generate the animation from the still image and the driving video
    # (adapt_movement_scale is passed explicitly so that --adapt_scale takes effect)
    generator, kp_detector = load_checkpoints(config_path = opt.config, checkpoint_path = opt.checkpoint, cpu = opt.cpu)
    predictions = make_animation(source_image, driving_video, generator, kp_detector, relative = opt.relative, adapt_movement_scale = opt.adapt_scale, cpu = opt.cpu)

    # Output file names
    out_path1 = ''                                                  # result video
    out_path2 = ''                                                  # still image / driving video / result video
    ext = ''
    if len(opt.result_video) > 0:
        out_path1, out_path2, ext = get_results_path(opt.source_image, opt.driving_video, opt.result_video)

    # Save result 1
    if out_path1 != '':
        if ext == '.gif':
            imageio.mimsave(out_path1, [img_as_ubyte(frame) for frame in predictions], fps = fps, loop = 0)
        else:
            imageio.mimsave(out_path1, [img_as_ubyte(frame) for frame in predictions], fps = fps)

        print(f" Saving... → '{out_path1}'")

        # Add the audio track
        if opt.audio:
            my_videotool.add_audio(opt.driving_video, out_path1, log_f = False)

        # Play the generated video 1
        if disp_f:
            my_movieplay.movie_play(out_path1, title = 'Processed result image 1')

    # Build the still image / driving video / result side-by-side video
    ani = my_videotool.img_movie3x1(source_image, driving_video, predictions, interval = fps)

    # Save result 2
    if out_path2 != '':
        my_videotool.save_video(ani, out_path2)
        print(f" Saving... → '{out_path2}'")

        # Add the audio track
        if opt.audio:
            my_videotool.add_audio(opt.driving_video, out_path2, log_f = False)

        # Play the input image / driving video / generated video 2
        if disp_f:
            my_movieplay.movie_play(out_path2, title = 'Processed result image 2')

    return out_path1, out_path2

# Entry point of the main routine (execution starts here)
if __name__ == "__main__":
    parser = parse_args()
    opt = parser.parse_args()

    # Per-category preprocessing
    if opt.category[0] == '1':                                      # 1: fashion
        opt.config = CONFIG_FAS
        opt.checkpoint = FAS_CHECKPOINT
        if len(opt.category) > 1:                                   # '10' selects the sample defaults
            opt.source_image = FAS_SOURCE_IMAGE
            opt.driving_video = FAS_DRIVING_IMAGE
            opt.result_video = FAS_RESULT_VIDEO

    elif opt.category[0] == '2':                                    # 2: tai chi
        opt.config = CONFIG_TAI
        opt.checkpoint = TAI_CHECKPOINT
        if len(opt.category) > 1:                                   # '20' selects the sample defaults
            opt.source_image = TAI_SOURCE_IMAGE
            opt.driving_video = TAI_DRIVING_IMAGE
            opt.result_video = TAI_RESULT_VIDEO

    elif opt.category[0] == '3':                                    # 3: animation
        opt.config = CONFIG_MGF
        opt.checkpoint = MGF_CHECKPOINT
        if len(opt.category) > 1:                                   # '30' selects the sample defaults
            opt.source_image = MGF_SOURCE_IMAGE
            opt.driving_video = MGF_DRIVING_IMAGE
            opt.result_video = MGF_RESULT_VIDEO

    display_info(opt, title)
    fomm_process(opt)

    print('\nFinished.\n')

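Creating "fomm_test.py", a program operated through a GUI  ※ Revised 2024/11/02

Main features
・Specifying a category automatically selects the matching pretrained model settings
・The still image and the driving video needed as input are chosen through file dialogs
・The original option parameters can still be used as they are
・If a CUDA error occurs (no GPU, insufficient memory, etc.), add the "--cpu" option
・Unlike "demo.py", the "--relative", "--adapt_scale" and "--audio" options default to True; passing the flag switches the setting off
・After processing, the generated video and a side-by-side video (still image / driving video / result) are created and displayed
・It can be run with no options other than the category

Output location and file names (when --result_video './result/face.mp4' is specified)
・Files are saved in the "./result" folder (the "./result" folder must already exist)
・Video generated from the still image → 'face_<still image name>_<driving video name>.mp4'
・Side-by-side of still image / driving video / generated video → 'face_<still image name>_<driving video name>_a.mp4'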

Command options

Option	Type	Default	Meaning
-c, --category	str	'0'	Category selection ('0' is used when omitted)
--config	str	internal default if omitted ※	Configuration file of the pretrained model (.yaml)
--checkpoint	str	internal default if omitted ※	Pretrained model file
--source_image	str	internal default if omitted ※	Path of the still image file
--driving_video	str	internal default if omitted ※	Path of the driving video file
--result_video	str	internal default if omitted ※	Path of the output file
--relative	bool	True	use relative or absolute keypoint coordinates
--adapt_scale	bool	True	adapt movement scale based on convex hull of keypoints
--find_best_frame	bool	False	Generate from the frame that is the most aligned with source. (Only for faces, requires face_alignment lib)
--best_frame	int	None	Set frame to start from.
--cpu	bool	False	cpu mode.
--audio	bool	True	copy audio to output from the driving video
--log	int	3	Log level(-1/0/1/2/3/4/5)

※ Internal defaults used when an option is not specified

Option	-c 0 (face)	-c 1 (fashion)	-c 2 (tai chi)	-c 3 (animation)
--config	./config/vox-256.yaml	./config/fashion-256.yaml	./config/taichi-256.yaml	./config/mgif-256.yaml
--checkpoint	./sample/vox-cpk.pth.tar	./sample/fashion.pth.tar	./sample/taichi-cpk.pth.tar	./sample/mgif-cpk.pth.tar
--source_image	Still image chosen through a dialog (all categories)
--driving_video	Video chosen through a dialog (all categories)
--result_video	./result/face.mp4	./result/fashion.mp4	./result/taich.mp4	./result/mgif.gif

← Files are selected from a dialog while the program is running
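
The part of fomm_test.py that applies these internal defaults is not shown in this excerpt. As one possible way to do it, the sketch below fills in any option that was left empty from a per-category table. `CATEGORY_DEFAULTS` and `fill_defaults` are hypothetical names; only the paths themselves are taken from the table above.

```python
CATEGORY_DEFAULTS = {
    "0": {"config": "./config/vox-256.yaml",
          "checkpoint": "./sample/vox-cpk.pth.tar",
          "result_video": "./result/face.mp4"},
    "1": {"config": "./config/fashion-256.yaml",
          "checkpoint": "./sample/fashion.pth.tar",
          "result_video": "./result/fashion.mp4"},
    "2": {"config": "./config/taichi-256.yaml",
          "checkpoint": "./sample/taichi-cpk.pth.tar",
          "result_video": "./result/taich.mp4"},
    "3": {"config": "./config/mgif-256.yaml",
          "checkpoint": "./sample/mgif-cpk.pth.tar",
          "result_video": "./result/mgif.gif"},
}

def fill_defaults(options, category):
    """Return a copy of options with empty entries replaced by category defaults."""
    filled = dict(options)
    for key, value in CATEGORY_DEFAULTS[category[0]].items():
        # An empty string (the argparse default in fomm_test.py) means "not specified"
        if not filled.get(key):
            filled[key] = value
    return filled

if __name__ == "__main__":
    print(fill_defaults({"config": "", "checkpoint": "", "result_video": ""}, "2"))
```

Explicitly given values are left untouched, so the original options keep working alongside the category shortcut.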

Command example

(py38_learn) python fomm_test.py

First Order Motion Model Test Ver. 0.02: Starting application...

   - Category        : 0: ** Face **
   - source_image    : ./sample/images/04.jpg
   - driving_video   : ./sample/videos/2.mp4
   - result_video    : ./result/face.mp4
   - audio           : True
   - cpu             : True
   - log             : 3

100%|████████████████████████████████████████████████████████████████████████████████| 211/211 [02:17<00:00,  1.54it/s]
 Saving... → './result/face_04_2.mp4'
 Saving... → './result/face_04_2_a.mp4'

 Finished.

・When running on the GPU

100%|████████████████████████████████████████████████████████████████████████████████| 211/211 [00:03<00:00, 60.16it/s]

Before exiting, a list of the generated files (./result folder) is displayed

Module source code

▼ "fomm_test.py"

# -*- coding: utf-8 -*-
##------------------------------------------
##   First Order Motion Model Test  Ver 0.02
##
##               2024.06.28 Masahiro Izutsu
##------------------------------------------
## fomm_test.py
##      Ver 0.02    2024/10/27  added the list of results

import warnings
warnings.simplefilter('ignore')

from torch.cuda import is_available
gpu_d = is_available()                                          # check whether a GPU is available

# Imports
import os
import argparse

import my_logging
import my_thumbnail
import fomm

# Constants
DEF_THEME = 'BlueMono'
DEF_IMAGE_DIR = './sample/images'
DEF_VIDEO_DIR = './sample/videos'
RESULT_PATH = './result'

# Title
title = 'First Order Motion Model Test Ver. 0.02'

# Parses arguments for the application
def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', '--category', default = '0', type = str, help = 'Category (0:face, 1:fashion, 2:tai chi, 3:motion GIF) Default is 0')
    parser.add_argument("--source_image", default = '', help="path to source image")
    parser.add_argument("--driving_video", default = '', help="path to driving video")
    parser.add_argument("--result_video", default = '', help="path to output")
    parser.add_argument("--audio", dest="audio", action="store_false", help="copy audio to output from the driving video" )
    parser.add_argument("--cpu", dest="cpu", action="store_true", help="cpu mode.")
    parser.add_argument('--log', metavar = 'LOG', default = '3', help = 'Log level(-1/0/1/2/3/4/5) Default value is \'3\'')

    parser.add_argument("--config", default = '', help="path to config")
    parser.add_argument("--checkpoint", default= '', help="path to checkpoint to restore")
    parser.add_argument("--relative", dest="relative", action="store_false", help="use relative or absolute keypoint coordinates")
    parser.add_argument("--adapt_scale", dest="adapt_scale", action="store_false", help="adapt movement scale based on convex hull of keypoints")
    parser.add_argument("--find_best_frame", dest="find_best_frame", action="store_true",
                        help="Generate from the frame that is the most aligned with source. (Only for faces, requires face_alignment lib)")
    parser.add_argument("--best_frame", dest="best_frame", type=int, default=None, help="Set frame to start from.")

    return parser

# Main entry point (execution starts here)
if __name__ == "__main__":
    parser = parse_args()
    opt = parser.parse_args()

    # Application logging setup
    module = os.path.basename(__file__)
    module_name = os.path.splitext(module)[0]
    logger = my_logging.get_module_logger_sel(module_name, int(opt.log))

    if opt.cpu:
        gpu_d = False

    if len(opt.source_image) == 0:
        msg = '元画像の選択:　' + os.getcwd() + DEF_IMAGE_DIR[1:]
        opt.source_image = my_thumbnail.image_dialog(file_path=DEF_IMAGE_DIR, title=msg, theme=DEF_THEME, xn=10, yn=4, thumb_size=128, gap=4, logger=logger)
        if len(opt.source_image) == 0:
            exit(0)

    if len(opt.driving_video) == 0:
        msg = '参照動画の選択:　' + os.getcwd() + DEF_VIDEO_DIR[1:]
        opt.driving_video = my_thumbnail.movie_dialog(file_path=DEF_VIDEO_DIR, title=msg, theme=DEF_THEME, xn=10, yn=4, thumb_size=128, gap=4, audio_f=True, logger=logger)
        if len(opt.driving_video) == 0:
            exit(0)

    # Category-specific preprocessing
    opt = fomm.set_category_opt(opt)

    fomm.display_info(opt, title)
    fomm.fomm_process(opt, True)

    msg = '処理結果一覧:　' + os.getcwd() + RESULT_PATH[1:]
    my_thumbnail.file_dialog(file_path=RESULT_PATH, title=msg, theme=DEF_THEME, xn=10, yn=4, thumb_size=128, gap=4, ret='Exit', audio_f=True, logger=logger)

    logger.info('\nFinished.\n')
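
One detail of `parse_args()` is easy to misread: `--audio`, `--relative` and `--adapt_scale` are declared with `action="store_false"`, so argparse gives them a default of True and passing the flag switches the value to False. The self-contained sketch below copies only the `--audio` and `--cpu` declarations to show the effect; the name `demo_parser` is illustrative.

```python
import argparse

demo_parser = argparse.ArgumentParser()
# store_false: default is True, the flag turns it off
demo_parser.add_argument("--audio", dest="audio", action="store_false")
# store_true: default is False, the flag turns it on
demo_parser.add_argument("--cpu", dest="cpu", action="store_true")

no_flags = demo_parser.parse_args([])
both_flags = demo_parser.parse_args(["--audio", "--cpu"])
print(no_flags.audio, no_flags.cpu)      # True False
print(both_flags.audio, both_flags.cpu)  # False True
```

With these declarations, audio is copied from the driving video by default, and `--audio` on the command line disables the copy.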

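Face category †

Sample images †

Still image samples

Video samples

Examples of generated output †

A video generated from a news video and a still image (with audio)

Other categories †

Sample images †

Still image samples

Video samples

Full-body category †

Command execution example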

(py38_learn) python fomm_test.py -c 1

First Order Motion Model Test Ver. 0.02: Starting application...

   - Category        : 1: ** Fashion **
   - source_image    : ./sample/images/fashion003x.png
   - driving_video   : ./sample/videos/fashion01x.mp4
   - result_video    : ./result/fashion.mp4
   - audio           : True
   - cpu             : True
   - log             : 3

100%|████████████████████████████████████████████████████████████████████████████████| 395/395 [04:21<00:00,  1.51it/s]
 Saving... → './result/fashion_fashion003x_fashion01x.mp4'
 Saving... → './result/fashion_fashion003x_fashion01x_a.mp4'

 Finished.

・When running on the GPU

100%|████████████████████████████████████████████████████████████████████████████████| 395/395 [00:04<00:00, 79.97it/s]

Tai chi (Taichi) category †

Command execution example

(py38_learn) python fomm_test.py -c 2

First Order Motion Model Test Ver. 0.02: Starting application...

   - Category        : 2: ** Tai chi **
   - source_image    : ./sample/images/fashion004x.png
   - driving_video   : ./sample/videos/taichi2.mp4
   - result_video    : ./result/taich.mp4
   - audio           : True
   - cpu             : True
   - log             : 3

100%|████████████████████████████████████████████████████████████████████████████████| 365/365 [04:04<00:00,  1.49it/s]
 Saving... → './result/taich_taichi001x_taichi2.mp4'
 Saving... → './result/taich_taichi001x_taichi2_a.mp4'

 Finished.

・When running on the GPU

100%|████████████████████████████████████████████████████████████████████████████████| 365/365 [00:04<00:00, 76.69it/s]

Animation (Moving GIF) category †

Command execution example (the result is output in GIF format only)

(py38_learn) python fomm_test.py -c 3

First Order Motion Model Test Ver. 0.02: Starting application...

   - Category        : 3: ** Moving GIF **
   - source_image    : ./sample/images/anim02.png
   - driving_video   : ./sample/videos/anim_00055x.mp4
   - result_video    : ./result/mgif.gif
   - audio           : True
   - cpu             : True
   - log             : 3

100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:20<00:00,  1.49it/s]
 Saving... → './result/mgif_anim02_anim_00055x.gif'
 Saving... → './result/mgif_anim02_anim_00055x_a.gif'

 Finished.

・When running on the GPU

100%|██████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 66.94it/s]

Summary so far †

Differences in execution speed by hardware

Function	Program	GPU: RTX 4070Ti	GPU: RTX 4060	GPU: GTX 1050	CPU: i9-13900	CPU: i7-14700	CPU: i7-1260P
Mona Lisa talking like Trump	demo2.py	2 s	4 s	18 s	2 min 20 s	3 min 4 s	4 min 39 s
Trump-like Mona Lisa	demo2.py	2 s	5 s	18 s	2 min 19 s	3 min 5 s	4 min 35 s
Mona Lisa talking like Trump	fomm.py	2 s	4 s	17 s	2 min 24 s	3 min 5 s	4 min 40 s
Keiko Kitagawa talking like Trump	fomm_test.py	3 s	4 s	18 s	2 min 17 s	2 min 59 s	4 min 14 s
"Fashion" category	fomm_test.py	4 s	8 s	35 s	4 min 21 s	5 min 36 s	7 min 58 s
"Animation" category	fomm_test.py	0 s	0 s	2 s	20 s	25 s	36 s
"Tai chi" category	fomm_test.py	4 s	8 s	32 s	4 min 4 s	5 min 10 s	7 min 22 s
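
To put the table in perspective, the GPU-to-CPU ratio can be worked out directly from the `fomm_test.py` rows. The sketch below uses the RTX 4070Ti and i9-13900 columns; since the table only gives whole seconds, the ratios are coarse estimates rather than precise measurements, and the names `timings` and `speedup` are illustrative.

```python
# (GPU seconds on RTX 4070Ti, CPU seconds on i9-13900) from the fomm_test.py rows
timings = {
    "face": (3, 2 * 60 + 17),
    "fashion": (4, 4 * 60 + 21),
    "tai chi": (4, 4 * 60 + 4),
}

def speedup(gpu_s, cpu_s):
    """How many times faster the GPU run finished compared with the CPU run."""
    return cpu_s / gpu_s

ratios = {name: speedup(gpu_s, cpu_s) for name, (gpu_s, cpu_s) in timings.items()}
for name, ratio in ratios.items():
    print(f"{name}: about {ratio:.0f}x faster on the GPU")
```

For these three rows the GPU comes out roughly 45 to 65 times faster than the CPU.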

First Order Motion Model (GUI) †

Issues addressed and error details †

Changes from "demo.py" to "demo2.py" †

Suppress the warnings related to the "imageio" package → from line 6

import imageio.v2 as imageio                            # 2024/06/14    workaround for the imageio warning

import warnings
warnings.simplefilter('ignore', UserWarning)

Display the command options before execution / add an existence check for the input source files → from line 134

    display_info(opt)                                           # 2024/06/17 display basic information

    # Check that the input files exist
    if not os.path.isfile(opt.source_image):
        print(RED + f"File not found !! '{opt.source_image}' " + NOCOLOR)
        quit()
    if not os.path.isfile(opt.driving_video):
        print(RED + f"File not found !! '{opt.driving_video}' " + NOCOLOR)
        quit()

Changes to output file handling → from line 163

    # When the output is a GIF file, make it loop
    name, ext = splitext(opt.result_video)
    if ext == '.gif':
        imageio.mimsave(opt.result_video, [img_as_ubyte(frame) for frame in predictions], fps = fps, loop = 0)
    else:
        imageio.mimsave(opt.result_video, [img_as_ubyte(frame) for frame in predictions], fps = fps)

    if opt.audio:
        try:
            # Workaround for the temporary-file error  2024/06/18
            with tempfile.TemporaryDirectory() as str_temp_dir:
                tmpfile = f"{str_temp_dir}/temp.mp4"
                ffmpeg.output(ffmpeg.input(opt.result_video).video, ffmpeg.input(opt.driving_video).audio, tmpfile, c='copy').run(quiet=True)
                with open(opt.result_video, 'wb') as result:
                    with open(tmpfile, 'rb') as output:
                        copyfileobj(output, result)
        except ffmpeg.Error:
            print(RED + f"Failed to copy audio: the driving video may have no audio track or the audio format is invalid." + NOCOLOR)

RuntimeError: CUDA error: an illegal memory access was encountered †

Error details

Traceback (most recent call last):
  File "fomm.py", line 206, in <module>
    fomm_process(opt)
  File "fomm.py", line 132, in fomm_process
    predictions = make_animation(source_image, driving_video, generator, kp_detector, relative = opt.relative, cpu = opt.cpu)
  File "H:\anaconda_win\workspace_2\first-order-model\demo2.py", line 141, in make_animation
    predictions.append(np.transpose(out['prediction'].data.cpu().numpy(), [0, 2, 3, 1])[0])
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Workaround
The GPU does not have enough memory, so run on the CPU instead (add the --cpu option)
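
The CPU fallback amounts to the same decision `fomm_test.py` makes with `gpu_d = is_available()` followed by `if opt.cpu: gpu_d = False`. A minimal sketch of that decision as a single function is shown below; `choose_device` is a hypothetical helper, and the `ImportError` branch is an addition so the sketch also runs where PyTorch is not installed.

```python
def choose_device(force_cpu=False):
    """Return 'cuda' only when CUDA is usable and CPU mode was not requested."""
    if force_cpu:
        return "cpu"
    try:
        import torch
    except ImportError:
        # Assumption for this sketch: treat a missing PyTorch as CPU-only
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"

print(choose_device(force_cpu=True))  # cpu
```

As the error message itself notes, CUDA errors may be reported asynchronously, so setting `CUDA_LAUNCH_BLOCKING=1` before the run can help locate the failing call when investigating further.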

Cannot open shared library libasound_module_conf_pulse.so †

Error details
・Occurs only on Ubuntu 20.04/22.04 (the errors are printed, but audio still plays)

ALSA lib conf.c:4028:(snd_config_hooks_call) Cannot open shared library libasound_module_conf_pulse.so (/home/mizutu/anaconda3/envs/py38_learn/lib/alsa-lib/libasound_module_conf_pulse.so: cannot open shared object file: No such file or directory)
        :
ALSA lib dlmisc.c:339:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_rate_samplerate.so (/home/mizutu/anaconda3/envs/py38_learn/lib/alsa-lib/libasound_module_rate_samplerate.so: cannot open shared object file: No such file or directory)
        :
ALSA lib conf.c:4028:(snd_config_hooks_call) Cannot open shared library libasound_module_conf_pulse.so (/home/mizutu/anaconda3/envs/py38_learn/lib/alsa-lib/libasound_module_conf_pulse.so: 共有オブジェクトファイルを開けません: そのようなファイルやディレクトリはありません)
        :
ALSA lib dlmisc.c:339:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_rate_samplerate.so (/home/mizutu/anaconda3/envs/py38_learn/lib/alsa-lib/libasound_module_rate_samplerate.so: 共有オブジェクトファイルを開けません: そのようなファイルやディレクトリはありません)
        :

Workaround
・If there is no sound (20.04), restart pulseaudio with the following commands
```
pulseaudio --kill
pulseaudio --start
```
・Creating the following symbolic link suppresses the "file not found" errors
　/home/mizutu/anaconda3/envs/py38_learn/lib/alsa-lib → /usr/lib/x86_64-linux-gnu/alsa-lib
```
cd /home/mizutu/anaconda3/envs/py38_learn/lib
ln -s /usr/lib/x86_64-linux-gnu/alsa-lib alsa-lib
```
・The remaining error output does not affect operation, so it is left as-is for now (under investigation)
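
The symbolic-link step above can also be scripted so that it is safe to run more than once. The following is a sketch under the assumption that the system ALSA plugins live in `/usr/lib/x86_64-linux-gnu/alsa-lib`, as in the commands above; `link_alsa_lib` is a hypothetical helper, and the demonstration uses a temporary directory rather than the real conda environment.

```python
import os
import tempfile

def link_alsa_lib(env_lib_dir, system_alsa_dir="/usr/lib/x86_64-linux-gnu/alsa-lib"):
    """Create env_lib_dir/alsa-lib -> system_alsa_dir unless the name is already taken."""
    link_path = os.path.join(env_lib_dir, "alsa-lib")
    if os.path.lexists(link_path):
        return False          # leave any existing file, directory or link alone
    os.symlink(system_alsa_dir, link_path)
    return True

# Demonstration against a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    first = link_alsa_lib(tmp)
    second = link_alsa_lib(tmp)
    target = os.readlink(os.path.join(tmp, "alsa-lib"))
print(first, second, target)
```

For the environment on this page, the call would be `link_alsa_lib("/home/mizutu/anaconda3/envs/py38_learn/lib")`.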

First Order Motion Model: summary so far †

From "First Order Motion Model for Image Animation"

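installation (environment setup) †

The working environment is built on a local machine rather than on Google Colab, which was used in the previous test
It did not seem to work in a Python 3.11 environment, so a new "Python 3.8" environment was built
→ "Virtual environment (py38_learn)"
Pre-trained checkpoints are used
→ Downloaded from Google Drive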

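Findings †

Running the original "Animation Demo" requires some source code changes ("demo.py" → "demo2.py")
The code assumes a GPU environment but also runs on a CPU (although considerably more slowly)
・Run it with the "--cpu" option
　Example execution time: GPU 2 s → CPU 2 min 20 s
Training is put on hold
・The dataset and training-time issues have not been resolved
The procedure for "Face-Swap" is not clearly documented, but it is examined in the next section
→ Swapping parts of a video: Motion Supervised co-part Segmentation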

Update history †

2024/06/18 First edition
2024/11/02 Revised: GUI support

