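Creating video from a still image: First Order Motion Model (Part 2) == Work in progress ==

Animate a still image with the "First Order Motion Model" technique, which uses a still image and a video of the same category to make the still image move like the video.

※ Last updated: 2024/06/22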

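First Order Motion Model

Overview

"first-order-model" is a model based on the 2019 paper "First Order Motion Model for Image Animation". Given a still image and a video of the same category, it animates the still image so that it moves like the video.
From a still image (source) and a video (driving frames), it generates a video in which the input image moves along with the motion in the input video.
During training, the still image and the driving frames are both taken from the same video, as arbitrarily selected frames.
At inference time, any still image and any video can be used, as long as they belong to the same category as the training data.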
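
The training-time pairing described above can be sketched as follows. This is a minimal illustration, not the repository's data loader: the function name `sample_training_pair` is hypothetical, and the choice to force two distinct frames is an assumption that the text does not state.

```python
import random


def sample_training_pair(num_frames, rng=None):
    """Pick two frame indices from one video: (source, driving).

    Assumption: the two frames are distinct. The page only says that
    both frames are chosen arbitrarily from the same video.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames to form a pair")
    rng = rng or random.Random()
    source_idx, driving_idx = rng.sample(range(num_frames), 2)
    return source_idx, driving_idx


# Example: a 211-frame video, seeded so the run is repeatable.
source_idx, driving_idx = sample_training_pair(211, random.Random(0))
print(source_idx, driving_idx)
```

At inference time no such pairing is needed: the source image and the driving video come from separate files.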

Model overview diagram (from the paper below)
Paper: "First Order Motion Model for Image Animation"
<Official site>
・https://aliaksandrsiarohin.github.io/first-order-model-website/
<paper>
・https://papers.nips.cc/paper_files/paper/2019/file/31c0b36aef265d9221af80872ceb62f9-Paper.pdf
・(new) https://arxiv.org/pdf/2104.11280
<framework>
・https://github.com/AliaksandrSiarohin/first-order-model
・(new) https://github.com/snap-research/articulated-animation

Demo running on "Google Colab"
・Fake video made from a still image: First Order Motion Model

Setting up the execution environment

Download the project from the GitHub site

cd /anaconda_win/workspace_2        ← on Windows
cd ~/workspace_2                    ← on Linux

git clone https://github.com/AliaksandrSiarohin/first-order-model

Download the project package update_2024XXXX.zip (XXXMB) <update file>
・Folder layout after extraction

update
├─workspace_2
│  ├─first-order-model        ← overwrites the project cloned from GitHub
│  │  ├─result
│  │  ├─result_save
│  │  └─sample
│  │      ├─images
│  │      └─videos
│  └─mylib2                   ← general-purpose library for running in a local environment
│      ├─mylib_test
│      └─result
└─workspace_py37
    └─mylib                   ← private general-purpose library

・Copy everything under the extracted "update/" folder into the following folder, overwriting existing files
　On Windows → "anaconda_win/"　On Linux → "~/"
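
The overwrite copy can also be scripted. The following is a minimal sketch using only the standard library; the helper name `overlay_copy` is hypothetical, and `dirs_exist_ok` requires Python 3.8 or later.

```python
import shutil
from pathlib import Path


def overlay_copy(update_dir, target_dir):
    """Copy everything under update_dir into target_dir.

    Existing files with the same name are overwritten; files that exist
    only in target_dir are left untouched.
    """
    update_dir, target_dir = Path(update_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    for child in update_dir.iterdir():
        dest = target_dir / child.name
        if child.is_dir():
            # dirs_exist_ok=True merges into an existing folder (Python 3.8+)
            shutil.copytree(child, dest, dirs_exist_ok=True)
        else:
            shutil.copy2(child, dest)


# Usage (paths follow the text above; adjust them to your environment):
# overlay_copy("update", "/anaconda_win")          # Windows
# overlay_copy("update", Path.home())              # Linux
```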

Create a new virtual environment "py38_learn"
Create it by following the procedure in "Virtual environment (py38_learn)"

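Preparation

Create the libraries needed to run "First Order Motion Model" in a local environment (included in the project package above)
→ Python private general-purpose library 2

Check the environment variable (PYTHONPATH) to confirm that the "mylib2/" folder is on the path

echo $env:PYTHONPATH        ← on Windows
printenv PYTHONPATH         ← on Linux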
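
The same check can be done from Python, which behaves identically on Windows and Linux. This is a small sketch; the helper name `is_on_pythonpath` is hypothetical, and it only compares the last path component of each entry.

```python
import os


def is_on_pythonpath(folder_name, pythonpath=None):
    """Return True if any PYTHONPATH entry ends with folder_name."""
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    # os.pathsep is ';' on Windows and ':' on Linux
    entries = [entry for entry in pythonpath.split(os.pathsep) if entry]
    return any(
        os.path.basename(os.path.normpath(entry)) == folder_name
        for entry in entries
    )


print(is_on_pythonpath("mylib2"))
```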

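Trying the provided demo "demo.py"

Run the demo program with the pre-trained models (already bundled in the project package)
The provided "demo.py" has a few minor problems, so the fixed version is named "demo2.py"
If a CUDA error occurs, for example because of limited GPU memory, add the "--cpu" option
The result is written to the file specified with the "--result_video <filepath>" option

Run with the Mona Lisa as the still image and a sample of President Trump as the video → "Mona Lisa talking like Trump"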

(py38_learn) python demo2.py  --config config/vox-256.yaml --driving_video './sample/videos/2.mp4' --source_image './sample/images/05.png' --checkpoint './sample/vox-cpk.pth.tar' --relative --adapt_scale --result_video result.gif --cpu

FOMM Demo (demo2.py) Ver. 0.01: Starting application...

   - config          :  config/vox-256.yaml
   - checkpoint      :  ./sample/vox-cpk.pth.tar
   - source_image    :  ./sample/images/05.png
   - driving_video   :  ./sample/videos/2.mp4
   - result_video    :  result.gif
   - relative        :  True
   - adapt_scale     :  True
   - find_best_frame :  False
   - best_frame      :  None
   - cpu             :  True
   - audio           :  False

100%|████████████████████████████████████████████████████████████████████████████████| 211/211 [02:20<00:00,  1.50it/s]

Run in the mode that fits the face to the one in the video (without "--relative") → "Mona Lisa that looks like Trump"

(py38_learn) python demo2.py  --config config/vox-256.yaml --driving_video './sample/videos/2.mp4' --source_image './sample/images/05.png' --checkpoint './sample/vox-cpk.pth.tar' --adapt_scale --result_video result1.gif --cpu

FOMM Demo (demo2.py) Ver. 0.01: Starting application...

   - config          :  config/vox-256.yaml
   - checkpoint      :  ./sample/vox-cpk.pth.tar
   - source_image    :  ./sample/images/05.png
   - driving_video   :  ./sample/videos/2.mp4
   - result_video    :  result1.gif
   - relative        :  False
   - adapt_scale     :  True
   - find_best_frame :  False
   - best_frame      :  None
   - cpu             :  True
   - audio           :  False

100%|████████████████████████████████████████████████████████████████████████████████| 211/211 [02:19<00:00,  1.51it/s]
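
The two runs above differ only in "--relative". The sketch below illustrates the general idea of relative versus absolute keypoint transfer on plain 2D points. It is an illustration under assumptions, not the code of "demo.py": the function name `transfer_keypoints` is hypothetical, and the scale factor is passed in directly rather than derived from the keypoints' convex hulls.

```python
def transfer_keypoints(kp_source, kp_driving, kp_driving_initial,
                       relative=True, scale=1.0):
    """Return the keypoints used to animate the source image.

    relative=True : apply the driving motion (offset from the first
                    driving frame) on top of the source keypoints.
    relative=False: use the driving keypoints as they are, so the source
                    takes on the driving subject's geometry.
    """
    if not relative:
        return [tuple(point) for point in kp_driving]
    moved = []
    for (sx, sy), (dx, dy), (ix, iy) in zip(kp_source, kp_driving, kp_driving_initial):
        moved.append((sx + scale * (dx - ix), sy + scale * (dy - iy)))
    return moved


source = [(0.0, 0.0)]
initial = [(0.25, 0.25)]   # keypoint in the first driving frame
current = [(0.5, 0.25)]    # keypoint in the current driving frame

print(transfer_keypoints(source, current, initial, relative=True))
print(transfer_keypoints(source, current, initial, relative=False))
```

In relative mode the source keeps its own proportions and only the motion is borrowed, which matches the "talking like Trump" result. In absolute mode the face shape follows the driving video, which matches the "looks like Trump" result.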

Creating "fomm.py", a program that makes it easy to specify the category

Option settings per category for the pre-trained models

Category	--config	--checkpoint	--source_image	--driving_video	Example output
Face	config/vox-256.yaml	./sample/vox-cpk.pth.tar	./sample/images/05.png	./sample/videos/2.mp4	Mona Lisa → Trump
Face	config/vox-256.yaml	./sample/vox-cpk.pth.tar	./sample/images/pic6.png	./sample/videos/hinton.mp4	Nanako Matsushima → Professor Hinton
Fashion	config/fashion-256.yaml	./sample/fashion.pth.tar	./sample/images/fashion003x.png	./sample/videos/fashion01x.mp4	Haru → model
Animation	config/mgif-256.yaml	./sample/mgif-cpk.pth.tar	./sample/images/anim02.png	./sample/videos/anim_00055x.mp4	Horse animation
Tai chi	config/taichi-256.yaml	./sample/taichi-cpk.pth.tar	./sample/images/taichi001x.jpg	./sample/videos/taichi2.mp4	Satomi Ishihara → tai chi

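Add the "-c, --category <option>" option to switch the settings (all other options remain valid as they are)

Category	Option	--config	--checkpoint	--source_image	--driving_video	Description
Face	-c 0	default	default	user-specified	user-specified	Convert using the specified sources
Face	-c 00	default	default	default	default	Mona Lisa → Trump
Fashion	-c 1	default	default	user-specified	user-specified	Convert using the specified sources
Fashion	-c 10	default	default	default	default	Haru → model
Animation	-c 2	default	default	user-specified	user-specified	Convert using the specified sources
Animation	-c 20	default	default	default	default	Horse animation
Tai chi	-c 3	default	default	user-specified	user-specified	Convert using the specified sources
Tai chi	-c 30	default	default	default	default	Tai chi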
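
One way such a category switch could be implemented is a preset table keyed by the first digit of the code. This is a hypothetical sketch, not the source of "fomm.py": the names `PRESETS` and `resolve_category` are assumptions, and the paths are copied from the first table of this section.

```python
# Hypothetical preset table; paths are taken from the per-category table above.
PRESETS = {
    "0": {"config": "config/vox-256.yaml", "checkpoint": "./sample/vox-cpk.pth.tar",
          "source_image": "./sample/images/05.png", "driving_video": "./sample/videos/2.mp4"},
    "1": {"config": "config/fashion-256.yaml", "checkpoint": "./sample/fashion.pth.tar",
          "source_image": "./sample/images/fashion003x.png", "driving_video": "./sample/videos/fashion01x.mp4"},
    "2": {"config": "config/mgif-256.yaml", "checkpoint": "./sample/mgif-cpk.pth.tar",
          "source_image": "./sample/images/anim02.png", "driving_video": "./sample/videos/anim_00055x.mp4"},
    "3": {"config": "config/taichi-256.yaml", "checkpoint": "./sample/taichi-cpk.pth.tar",
          "source_image": "./sample/images/taichi001x.jpg", "driving_video": "./sample/videos/taichi2.mp4"},
}


def resolve_category(code, source_image=None, driving_video=None):
    """Map a category code such as '0' or '00' to concrete file paths.

    One digit : model defaults only; the caller supplies the sources.
    Two digits: the bundled sample sources are used as well.
    """
    valid = bool(code) and code[0] in PRESETS and (len(code) == 1 or code[1:] == "0")
    if not valid:
        raise ValueError(f"unknown category code: {code!r}")
    preset = PRESETS[code[0]]
    use_samples = len(code) == 2
    return {
        "config": preset["config"],
        "checkpoint": preset["checkpoint"],
        "source_image": preset["source_image"] if use_samples else source_image,
        "driving_video": preset["driving_video"] if use_samples else driving_video,
    }


print(resolve_category("00"))
print(resolve_category("1", source_image="my.png", driving_video="my.mp4"))
```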

Example command

(py38_learn) python fomm.py -c 00 --cpu

First Order Motion Model Ver. 0.01: Starting application...

   - Category        :  00: ** Face **
   - config          :  ./config/vox-256.yaml
   - checkpoint      :  ./sample/vox-cpk.pth.tar
   - source_image    :  ./sample/images/05.png
   - driving_video   :  ./sample/videos/2.mp4
   - result_video    :  ./result/face.mp4
   - relative        :  True
   - adapt_scale     :  True
   - find_best_frame :  False
   - best_frame      :  None
   - cpu             :  True
   - audio           :  True

100%|████████████████████████████████████████████████████████████████████████████████| 211/211 [02:24<00:00,  1.46it/s]
 Saving... → './result/face_05_2.mp4'
 Saving... → './result/face_05_2_a.mp4'

 Finished.
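
The log above suggests that the saved file names combine the "--result_video" name with the names of the source image and the driving video. The sketch below reproduces that pattern. It is inferred from this one log only: the function name `output_names` is hypothetical, and reading the "_a" suffix as "the copy with audio" is an assumption.

```python
import posixpath


def output_names(result_video, source_image, driving_video):
    """Build '<result>_<source>_<driving><ext>' and the '_a' variant."""
    folder, filename = posixpath.split(result_video)
    stem, ext = posixpath.splitext(filename)
    source_stem = posixpath.splitext(posixpath.basename(source_image))[0]
    driving_stem = posixpath.splitext(posixpath.basename(driving_video))[0]
    base = f"{stem}_{source_stem}_{driving_stem}"
    plain = posixpath.join(folder, base + ext)
    # Assumption: '_a' marks the variant that carries the audio track.
    with_audio = posixpath.join(folder, base + "_a" + ext)
    return plain, with_audio


print(output_names("./result/face.mp4", "./sample/images/05.png", "./sample/videos/2.mp4"))
```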

Creating "fomm_test.py", a program that can be operated through a GUI

Command options

Option	Argument	Default	Meaning
-c, --category	str	'0'	Category (required)
--config	str	internal setting if omitted	Configuration file of the pre-trained model (.yaml)
--checkpoint	str		Pre-trained model file
--source_image	str		Path to the still image file
--driving_video	str		Path to the video file
--result_video	str		Path of the output file
--relative	bool	True	use relative or absolute keypoint coordinates
--adapt_scale	bool	True	adapt movement scale based on convex hull of keypoints
--find_best_frame	bool	False	Generate from the frame that is the most aligned with source. (Only for faces, requires face_alignment lib)
--best_frame	int	None	Set frame to start from.
--cpu	bool	False	cpu mode.
--audio	bool	True	copy audio to output from the driving video
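
An option table like the one above could be expressed with `argparse` roughly as follows. This is a hypothetical sketch, not the parser of "fomm_test.py": the helper `str2bool` and the way boolean options accept an optional true/false value are assumptions made so that options with a default of True can still be switched off.

```python
import argparse


def str2bool(text):
    """Parse 'true'/'false' style strings for boolean options."""
    value = text.lower()
    if value in ("true", "1", "yes"):
        return True
    if value in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


def build_parser():
    parser = argparse.ArgumentParser(description="option sketch (hypothetical)")
    parser.add_argument("-c", "--category", default="0", help="category code")
    # Path options: None means "use the internal setting for the category".
    for name in ("config", "checkpoint", "source_image", "driving_video", "result_video"):
        parser.add_argument(f"--{name}", default=None)
    # Boolean options: a bare flag means True, or pass an explicit true/false.
    for name, default in (("relative", True), ("adapt_scale", True),
                          ("find_best_frame", False), ("audio", True)):
        parser.add_argument(f"--{name}", type=str2bool, nargs="?",
                            const=True, default=default)
    parser.add_argument("--best_frame", type=int, default=None)
    parser.add_argument("--cpu", action="store_true", help="cpu mode")
    return parser


opt = build_parser().parse_args(["-c", "00", "--cpu"])
print(opt.category, opt.cpu, opt.relative, opt.audio)
```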

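Face category

Sample images

Still image samples

Video samples

Examples of generated images

Generating a video (with audio) from a news video and a still image

Other categories

Sample images

Still image samples

Video samples

Full-body category

Animation (Moving GIF) category

Tai chi (Taichi) category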

Problems addressed and error details

Changes from "demo.py" to "demo2.py"

Suppress the warning messages related to the "imageio" package → from line 6

import imageio.v2 as imageio                            # 2024/06/14    fix for the warning

import warnings
warnings.simplefilter('ignore', UserWarning)
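
The filter above silences every `UserWarning` for the whole process. If that is broader than wanted, the suppression can be limited to a block of code. The following is a sketch of that alternative using only the standard library; the names `quiet_user_warnings` and `noisy` are hypothetical.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def quiet_user_warnings():
    """Ignore UserWarning only inside the with-block."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", UserWarning)
        yield


def noisy():
    """Stand-in for a library call that emits a UserWarning."""
    warnings.warn("legacy API in use", UserWarning)
    return 42


with quiet_user_warnings():
    value = noisy()   # the warning is suppressed here
print(value)
```

Outside the with-block the previous warning filters are restored, so unrelated warnings remain visible.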

Show the command options before running / add an existence check for the input source files → from line 134

    display_info(opt)                                           # 2024/06/17 show basic information

    # Check that the input files exist
    if not os.path.isfile(opt.source_image):
        print(RED + f"File not found !! '{opt.source_image}' " + NOCOLOR)
        quit()
    if not os.path.isfile(opt.driving_video):
        print(RED + f"File not found !! '{opt.driving_video}' " + NOCOLOR)
        quit()
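
The two checks above repeat the same logic, so they could be folded into one helper. This is a sketch of that refactoring, not part of "demo2.py": the function names are hypothetical, the color codes are left out, and `sys.exit(1)` is used so that the shell sees a non-zero exit status.

```python
import os
import sys


def missing_inputs(paths):
    """Return the paths that do not point to an existing file."""
    return [path for path in paths if not os.path.isfile(path)]


def check_inputs_or_exit(paths):
    """Report every missing input file, then stop with exit status 1."""
    missing = missing_inputs(paths)
    for path in missing:
        print(f"File not found !! '{path}'", file=sys.stderr)
    if missing:
        sys.exit(1)


# Usage inside the script (opt is the parsed command-line namespace):
# check_inputs_or_exit([opt.source_image, opt.driving_video])
```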

Changes to the handling of the output file → from line 163

    # Loop the animation when the output is a GIF file
    name, ext = splitext(opt.result_video)
    if ext == '.gif':
        imageio.mimsave(opt.result_video, [img_as_ubyte(frame) for frame in predictions], fps = fps, loop = 0)
    else:
        imageio.mimsave(opt.result_video, [img_as_ubyte(frame) for frame in predictions], fps = fps)

    if opt.audio:
        try:
            # Fix for the temporary-file error  2024/06/18
            with tempfile.TemporaryDirectory() as str_temp_dir:
                tmpfile = f"{str_temp_dir}/temp.mp4"
                ffmpeg.output(ffmpeg.input(opt.result_video).video, ffmpeg.input(opt.driving_video).audio, tmpfile, c='copy').run(quiet=True)
                with open(opt.result_video, 'wb') as result:
                    with open(tmpfile, 'rb') as output:
                        copyfileobj(output, result)
        except ffmpeg.Error:
            print(RED + f"Failed to copy audio: the driving video may have no audio track or the audio format is invalid." + NOCOLOR)
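
The temporary-directory pattern used in the audio step can be isolated as below. The sketch keeps the same idea (write to a file inside a `TemporaryDirectory`, then copy it back over the result) but replaces the ffmpeg call with an arbitrary callback so that it runs without ffmpeg. The names `rewrite_via_tempdir` and `append_marker` are hypothetical.

```python
import os
import tempfile
from shutil import copyfileobj


def rewrite_via_tempdir(target_path, transform):
    """Run transform(target_path, tmp_path), then copy tmp_path over target_path.

    The temporary file lives in its own directory and is opened by name
    only, which avoids reopening a NamedTemporaryFile that is still held
    open (a known source of PermissionError on Windows).
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        extension = os.path.splitext(target_path)[1]
        tmp_path = os.path.join(tmp_dir, "temp" + extension)
        transform(target_path, tmp_path)
        with open(tmp_path, "rb") as src, open(target_path, "wb") as dst:
            copyfileobj(src, dst)
    # The directory and the temporary file are removed on leaving the with-block.


def append_marker(src_path, dst_path):
    """Stand-in for the ffmpeg step: copy the input and append some bytes."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.write(src.read() + b"+audio")
```

Calling `rewrite_via_tempdir("result.mp4", append_marker)` would replace the contents of "result.mp4" with the transformed data.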

Update history

2024/06/18 First version

References

First Order Motion Model

Error
- Windows: reopening a file created with `NamedTemporaryFile` raises `PermissionError`
- Creating a temporary directory in Python and cleaning it up safely

Others
- cedro-blog: AI (artificial intelligence)