Private AI Study Group (私的AI研究会) > TalkFace2
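Run "One Shot Talking Face", a technique that uses audio and a single face image to create a video in which the face appears to be talking, on a local machine.
Because building the generation environment on a Windows machine is difficult, this page introduces a demo program that uses results generated on Linux.
In the Linux generation environment, this program can also run generation for images that have not been generated yet.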
project_talking-face
└─workspace_2
  └─one-shot-talking-face
    ├─results
    │ ├─phone
    │ └─text
    ├─select
    │ ├─audios
    │ └─images
    └─train
・Copy the contents of the extracted "project_talking-face/" folder under the following folder, overwriting existing files
Command option | Argument | Default | Meaning |
--audio_file | str | './select/audios/obama2.wav' | Audio file path |
--source_dir | str | './select/images/d5.jpg' | Still image file path |
--result_path | str | './results' | Output directory |
--log | int | 3 | Log level (-1/0/1/2/3/4/5) |
(py38_learn) python talk_face.py
One Shot Talking Face (GUI) Ver. 0.01: Starting application...
 - audio_file : ./select/audios/obama2.wav
 - source_dir : ./select/images/d5.jpg
 - result_path : ./results
 - cpu : False
 - log : 3
Finished.
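As a rough illustration of how the options in the table above might be declared, here is a minimal `argparse` sketch. It is not the actual source of `talk_face.py`: the function name `build_parser` and the help strings are assumptions, and the `--result_path` default follows the `./results` value printed in the sample run.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options of talk_face.py."""
    parser = argparse.ArgumentParser(
        description="One Shot Talking Face demo (illustrative sketch)")
    parser.add_argument("--audio_file", type=str,
                        default="./select/audios/obama2.wav",
                        help="audio file path")
    parser.add_argument("--source_dir", type=str,
                        default="./select/images/d5.jpg",
                        help="still image file path")
    # Assumption: the sample run prints ./results, so that value is used here.
    parser.add_argument("--result_path", type=str, default="./results",
                        help="output directory")
    parser.add_argument("--log", type=int, default=3,
                        choices=[-1, 0, 1, 2, 3, 4, 5],
                        help="log level")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--log", "1"])
print(args.audio_file, args.log)
```

Any option left off the command line falls back to the default in the table, which is why running the script with no arguments prints the values shown in the sample run.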
(py38_learn) pip install pocketsphinx
Command option | Argument | Default | Meaning |
--audio_file | str | './select/audios/obama2.wav' | Audio file path |
--result_path | str | './results' | Output directory |
--log | int | 3 | Log level (-1/0/1/2/3/4/5) |
(py38_learn) python talk_text.py
Talk to text (GUI) Ver. 0.01: Starting application...
 - audio_file : ./select/audios/obama2.wav
 - result_path : ./results
 - log : 3
Finished.
(py38_learn) python speak2text.py ./select/audios/obama2.wav
hi everybody but i am thank you for that you've won too much like could not be prouder of everything you got your time with the obama foundation and of course i couldn't be prouder of all of you in the graduating class of twenty twenty four teachers and coaches most of all parents and family who guided you won't work op graduating is a big achievement on or any circumstances some of you had overcome serious obstacles long way were there was no lose work or losing a job living in a neighborhood where people to watch
(py38_learn) winget search jq
Name            Id            Version  Source
-----------------------------------------------
JQuery参考手册  9NBLGGH4P48H  Unknown  msstore
jq              jqlang.jq     1.7.1    winget
(py38_learn) winget install jqlang.jq
Found jq [jqlang.jq] Version 1.7.1
This application is licensed to you by its owner.
Microsoft is not responsible for, nor does it grant any licenses to, third-party packages.
Downloading https://github.com/jqlang/jq/releases/download/jq-1.7.1/jq-windows-amd64.exe
  ██████████████████████████████  962 KB / 962 KB
Successfully verified installer hash
Starting package install...
Command line alias added: "jq"
Path environment variable modified; restart your shell to use the new value.
Successfully installed
(py38_learn) jq --help
jq - commandline JSON processor [version 1.7.1]
 :
(Reference) Uninstall
(py38_learn) winget uninstall jqlang.jq
Found jq [jqlang.jq]
Starting package uninstall...
Successfully uninstalled
2. Download the Windows build from the official site and use it
(py38_learn) jq-win -V
jq-1.7.1
(py38_learn) sudo apt install jq
[sudo] password for XXXX:
Reading package lists... Done
 :
(py38_learn) jq -V
jq-1.6
(py38_learn) cat ./train/audio.txt | jq-win '[.w[]|{word: (.t | ascii_upcase | sub(\"<S>\"; \"sil\") | sub(\"<SIL>\"; \"sil\") | sub(\"\\(2\\)\"; \"\") | sub(\"\\(3\\)\"; \"\") | sub(\"\\(4\\)\"; \"\") | sub(\"\\[SPEECH\\]\"; \"SIL\") | sub(\"\\[NOISE\\]\"; \"SIL\")), phones: [.w[]|{ph: .t | sub(\"\\+SPN\\+\"; \"SIL\") | sub(\"\\+NSN\\+\"; \"SIL\"), bg: (.b*100)|floor, ed: (.b*100+.d*100)|floor}]}]'
[
  {
    "word": "sil",
    "phones": [
      {
        "ph": "SIL",
        "bg": 0,
        "ed": 13
      }
 :
(py38_learn_test2) cat ./train/audio.txt | jq-win '[.w[]|{word: (.t | ascii_upcase | sub(\"<S>\"; \"sil\") | sub(\"<SIL>\"; \"sil\") | sub(\"\\(2\\)\"; \"\") | sub(\"\\(3\\)\"; \"\") | sub(\"\\(4\\)\"; \"\") | sub(\"\\[SPEECH\\]\"; \"SIL\") | sub(\"\\[NOISE\\]\"; \"SIL\")), phones: [.w[]|{ph: .t | sub(\"\\+SPN\\+\"; \"SIL\") | sub(\"\\+NSN\\+\"; \"SIL\"), bg: (.b*100)|floor, ed: (.b*100+.d*100)|floor}]}]' > ./train/audio.json
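For readers who find the quoted jq filter hard to follow, the same transformation can be sketched in Python. This is an illustrative reimplementation, not part of the original tool chain: the function names are made up here, and the `[NOISE]` entry in the sample is invented test data, while the first two sample entries are taken from the pocketsphinx output on this page. jq's `sub` replaces only the first match, hence `count=1`.

```python
import json
import math
import re

# (pattern, replacement) pairs copied from the jq filter, in the same order.
WORD_SUBS = [(r"<S>", "sil"), (r"<SIL>", "sil"),
             (r"\(2\)", ""), (r"\(3\)", ""), (r"\(4\)", ""),
             (r"\[SPEECH\]", "SIL"), (r"\[NOISE\]", "SIL")]
PHONE_SUBS = [(r"\+SPN\+", "SIL"), (r"\+NSN\+", "SIL")]


def ascii_upcase(text):
    """Upper-case only ASCII letters, like jq's ascii_upcase."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in text)


def apply_subs(text, subs):
    """Apply each substitution once, mirroring jq's sub() without flags."""
    for pattern, replacement in subs:
        text = re.sub(pattern, replacement, text, count=1)
    return text


def to_phoneme_list(decoded):
    """Convert pocketsphinx JSON (words in 'w') into the word/phones layout."""
    result = []
    for word in decoded["w"]:
        phones = [{"ph": apply_subs(p["t"], PHONE_SUBS),
                   "bg": math.floor(p["b"] * 100),
                   "ed": math.floor(p["b"] * 100 + p["d"] * 100)}
                  for p in word["w"]]
        result.append({"word": apply_subs(ascii_upcase(word["t"]), WORD_SUBS),
                       "phones": phones})
    return result


sample = json.loads(
    '{"w":[{"b":0.000,"d":0.130,"t":"<s>","w":[{"b":0.000,"d":0.130,"t":"SIL"}]},'
    '{"b":0.410,"d":0.170,"t":"hi","w":[{"b":0.410,"d":0.070,"t":"HH"},'
    '{"b":0.480,"d":0.100,"t":"AY"}]},'
    '{"b":1.000,"d":0.100,"t":"[NOISE]","w":[{"b":1.000,"d":0.100,"t":"+NSN+"}]}]}')
print(json.dumps(to_phoneme_list(sample)[0]))
```

Writing the result with `json.dump` to `./train/audio.json` would play the role of the `>` redirect in the command above.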
Port "Animating a face image with audio using One Shot Talking Face" (『One Shot Talking Face を使って音声で顔画像を動かす』) to the local machine
(py38_learn) sudo apt install git-lfs
(py38_learn) git clone https://github.com/cmusphinx/pocketsphinx.git
(py38_learn) cd pocketsphinx/
(py38_learn) cmake -S . -B build
(py38_learn) cmake --build build
(py38_learn) sudo cmake --build build --target install
(py38_learn) pocketsphinx
Usage: pocketsphinx [PARAMS] [soxflags | config | help | help-config | live | single | align] INPUTS...
Examples:
    sox input.mp3 $(pocketsphinx soxflags) | pocketsphinx single -
    sox -qd $(pocketsphinx soxflags) | pocketsphinx live -
    pocketsphinx single INPUT
    pocketsphinx align INPUT WORDS...
For detailed PARAMS values, run pocketsphinx help-config
(py38_learn) git clone https://huggingface.co/camenduru/pocketsphinx-20.04-t4 pocketsphinx
project_talking-face
└─workspace_2
  └─one-shot-talking-face
    ├─results
    │ ├─phone
    │ └─text
    ├─select
    │ ├─audios
    │ └─images
    └─train
・Copy the contents of the extracted "project_talking-face/" folder under the "~/" folder, overwriting existing files
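The overwrite copy described above can also be scripted. The following is a minimal sketch, assuming the archive was extracted to `./project_talking-face` (that location and the helper name `overlay_copy` are assumptions, not from the original page). It relies on `shutil.copytree(..., dirs_exist_ok=True)`, which is available from Python 3.8.

```python
import shutil
from pathlib import Path


def overlay_copy(extracted_root, target_root):
    """Copy everything inside extracted_root into target_root, overwriting files."""
    source = Path(extracted_root)
    target = Path(target_root).expanduser()
    target.mkdir(parents=True, exist_ok=True)
    for entry in source.iterdir():
        destination = target / entry.name
        if entry.is_dir():
            # Merge into an existing directory instead of failing on it.
            shutil.copytree(entry, destination, dirs_exist_ok=True)
        else:
            shutil.copy2(entry, destination)
    return target
```

With the layout shown above, `overlay_copy("./project_talking-face", "~/")` would merge `workspace_2/one-shot-talking-face/...` into the home directory while leaving unrelated files there untouched.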
(py38_learn) sudo apt install jq
(py38_learn) pip install python_speech_features
(py38_learn) pip install pyworld
(py38_learn) pocketsphinx -phone_align yes single ./train/audio.wav
{"b":0.000,"d":40.000,"p":0.000,"t":"hi everybody and i thank you for that you've won too much like could not be prouder of everything you got your time with the obama foundation then of course i couldn't be prouder of all of you in the graduating class of twenty twenty those walls the teachers and coaches the most of all parents and family who guided you won't why not graduating is a big achievement under any circumstances someone to get over com serious obstacles long wet weather was no loose worker whose good job now we're living in a neighborhood where people to walk","w":[{"b":0.000,"d":0.130,"p":0.981,"t":"<s>","w":[{"b":0.000,"d":0.130,"p":0.981,"t":"SIL"}]},{"b":0.130,"d":0.280,"p":0.985,"t":"<sil>","w":[{"b":0.130,"d":0.280,"p":0.985,"t":"SIL"}]},{"b":0.410,"d":0.170,"p":0.954,"t":"hi","w":[{"b":0.410,"d":0.070,"p":0.981,"t":"HH"},{"b":0.480,"d":0.100,"p":0.972,"t":"AY"}]},{"b":0.580,"d":0.470,"p":0.876,"t":"everybody","w":[{"b":0.580,"d":0.050,"p":0.989,"t":"EH"},{"b":0.630,"d":0.080,"p":0.971,"t":"V"},{"b":0.710,"d":0.050,"p":0.990,"t":"R"},{"b":0.760,"d":0.030,"p":0.990,"t":"IY"},{"b":0.790,"d":0.060,"p":0.995,"t":"B"},{"b":0.850,"d":0.060,"p":0.991,"t":"AA"},{"b":0.910,"d":
 :
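The JSON printed above nests phones inside words, and the fields `b` and `d` appear to be a start time and a duration in seconds. As a hedged sketch of how this output could be consumed from Python, the following wraps the same command line with `subprocess` and extracts word timings. The helper names are hypothetical, and it assumes `pocketsphinx` is on `PATH` and writes the JSON to standard output (as the pipe into jq elsewhere on this page suggests).

```python
import json
import subprocess


def build_command(wav_path):
    """Return the pocketsphinx command line used in this section."""
    return ["pocketsphinx", "-phone_align", "yes", "single", wav_path]


def parse_alignment(raw_json):
    """Turn the decoder JSON into (word, start_sec, end_sec) tuples."""
    decoded = json.loads(raw_json)
    return [(w["t"], w["b"], round(w["b"] + w["d"], 3)) for w in decoded["w"]]


def align(wav_path):
    """Run pocketsphinx and parse what it prints on standard output."""
    done = subprocess.run(build_command(wav_path), check=True,
                          capture_output=True, text=True)
    return parse_alignment(done.stdout)


# Two word entries copied from the output above, used as a dry run.
excerpt = ('{"w":[{"b":0.000,"d":0.130,"p":0.981,"t":"<s>","w":[]},'
           '{"b":0.410,"d":0.170,"p":0.954,"t":"hi","w":[]}]}')
for word, start, end in parse_alignment(excerpt):
    print(word, start, end)
```

Calling `align("./train/audio.wav")` would run the real decoder; the dry run only exercises the parsing step.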
(py38_learn) mizutu@ubuntu-lat:~/workspace_2/one-shot-talking-face$ pocketsphinx -phone_align yes single ./train/audio.wav $text | jq '[.w[]|{word: (.t | ascii_upcase | sub("<S>"; "sil") | sub("<SIL>"; "sil") | sub("\\(2\\)"; "") | sub("\\(3\\)"; "") | sub("\\(4\\)"; "") | sub("\\[SPEECH\\]"; "SIL") | sub("\\[NOISE\\]"; "SIL")), phones: [.w[]|{ph: .t | sub("\\+SPN\\+"; "SIL") | sub("\\+NSN\\+"; "SIL"), bg: (.b*100)|floor, ed: (.b*100+.d*100)|floor}]}]' > test.json
(py38_learn) python -B test_script.py --img_path ./train/image.png --audio_path ./train/audio.wav --phoneme_path ./test.json --save_dir ./train
(py38_learn) python -B test_script.py --img_path ./train/image.png --audio_path ./train/audio.wav --phoneme_path ./test.json --save_dir ./train
Traceback (most recent call last):
  File "test_script.py", line 12, in <module>
    from tools.interface import read_img,get_img_pose,get_pose_from_audio,get_audio_feature_from_audio,\
  File "/home/mizutu/workspace_2/one-shot-talking-face/tools/interface.py", line 12, in <module>
    import pyworld
  File "/home/mizutu/anaconda3/envs/py38_learn/lib/python3.8/site-packages/pyworld/__init__.py", line 17, in <module>
    from .pyworld import *
  File "pyworld/pyworld.pyx", line 1, in init pyworld.pyworld
ValueError: numpy.ndarray size changed, may indicate binary incompatibility. Expected 96 from C header, got 80 from PyObject
(py38_learn_test) mizutu@ubuntu-HP-ENVY:~/workspace_2$ pip install numpy==1.23.0
Collecting numpy==1.23.0
  Using cached numpy-1.23.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (2.2 kB)
Using cached numpy-1.23.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.1 MB)
Installing collected packages: numpy
  Attempting uninstall: numpy
    Found existing installation: numpy 1.19.5
    Uninstalling numpy-1.19.5:
      Successfully uninstalled numpy-1.19.5
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
openvino 2022.1.0 requires numpy<1.20,>=1.16.6, but you have numpy 1.23.0 which is incompatible.
Successfully installed numpy-1.23.0
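The `numpy.ndarray size changed` error typically means a compiled extension (here `pyworld`) was built against a newer NumPy than the one installed, which is consistent with the upgrade from 1.19.5 to 1.23.0 shown above. A small pre-flight check along these lines could catch the mismatch before `test_script.py` is run. It is only a sketch: the helper names are hypothetical, and 1.23.0 is simply the version installed in the log, not a verified minimum.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Convert '1.23.0' to (1, 23, 0); non-numeric parts are skipped."""
    parts = []
    for piece in text.split("."):
        if piece.isdigit():
            parts.append(int(piece))
    return tuple(parts)


def numpy_is_at_least(minimum="1.23.0", installed=None):
    """Return True if the installed NumPy is at least `minimum`."""
    if installed is None:
        try:
            installed = version("numpy")
        except PackageNotFoundError:
            return False  # NumPy is not installed in this environment.
    return parse_version(installed) >= parse_version(minimum)


# Dry run with the two versions that appear in the pip log.
print(numpy_is_at_least("1.23.0", installed="1.19.5"))
print(numpy_is_at_least("1.23.0", installed="1.23.0"))
```

Note that the pip log also reports that `openvino 2022.1.0` requires `numpy<1.20`, so upgrading NumPy in an environment shared with OpenVINO trades one incompatibility for another.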
(py38_learn_test) python -B test_script.py --img_path ./train/image.png --audio_path ./train/audio.wav --phoneme_path ./train/test.json --save_dir ./train
Traceback (most recent call last):
  File "test_script.py", line 180, in <module>
    test_with_input_audio_and_image(args.img_path,args.audio_path,phoneme,config.GENERATOR_CKPT,config.AUDIO2POSE_CKPT,args.save_dir)
 :
  File "/home/mizutu/anaconda3/envs/py38_learn_test/lib/python3.8/subprocess.py", line 1720, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
OSError: [Errno 8] Exec format error: './OpenFace/FeatureExtraction'
(py38_learn_test) sudo apt update
 :
(py38_learn_test) sudo apt-get install git-lfs
 :
The following NEW packages will be installed:
  git-lfs
 :
(py38_learn_test) git lfs install
Updated git hooks.
Git LFS initialized.
(py38_learn_test) git clone https://huggingface.co/camenduru/one-shot-talking-face-20.04-t4 one-shot-talking-face
 :
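The `Exec format error` for `./OpenFace/FeatureExtraction` is what one would expect if the repository had been cloned without Git LFS, leaving a small text pointer file in place of the real binary. That cause is an inference from the git-lfs reinstall above, not something the page states. Under that assumption, a quick check like the following (the helper name is hypothetical) can tell a pointer file from real content, since Git LFS pointer files begin with a fixed `version` line.

```python
# Every Git LFS pointer file starts with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file looks like a Git LFS pointer, not real content."""
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX
```

After re-cloning with Git LFS initialized, `is_lfs_pointer("./OpenFace/FeatureExtraction")` should return `False` if the real executable was downloaded.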