GanFOMM

Making fake videos from still images: First Order Motion Model

Turn a still image into a video using the "First Order Motion Model" technique.


※ Last updated: 2024/01/21


Verifying the site "Making the Mona Lisa move like President Trump with First Order Motion Model"


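Overview

From a still image (source) and a video (driving frames), the model generates a video in which the input image moves following the motion of the input video.

Model overview diagram (from the paper below); see the site above for details
I tried to verify "First Order Motion Model" following the steps on the site above (cedro-blog), but because of issues such as the TensorFlow 1.x series and the CUDA version, it does not run on the current Google Colaboratory

The "Colab Demo" on the official "First Order Motion Model" site, which the site above appears to be based on, did run.
  → First Order Motion Model for Image Animation
The verification proceeds in this environment, using the site in the title as a reference

Paper: "First Order Motion Model for Image Animation"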
・https://papers.nips.cc/paper_files/paper/2019/file/31c0b36aef265d9221af80872ceb62f9-Paper.pdf
・https://aliaksandrsiarohin.github.io/first-order-model-website/


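Creating the execution environment on Google Colaboratory

Open the Colab Demo on the official site
The Google Colab notebook "first-order-model-demo" opens; select "Save a copy in Drive" from the "File" menu
Perform all subsequent operations in the Google Colab page that opens with the title "Copy of first-order-model-demo"
Download the data file, unzip it, and overwrite "/anaconda_win/work" with the extracted "/work" (the "update/work/FOMM/" folder is used)
  update_20231117.zip (18.3MB) <update data>
Download the data file, unzip it, and overwrite "/anaconda_win/work" with the extracted "/work"
  update_20240117.zip (9.93MB) <update data>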


Starting the demo

Run the following cell ① (run time: 12 seconds)

%%capture
%pip install ffmpeg-python imageio-ffmpeg
!git init .
!git remote add origin https://github.com/AliaksandrSiarohin/first-order-model
!git pull origin master
!git clone https://github.com/graphemecluster/first-order-model-demo demo

・This places the project on Google Colab

Run the following cell ② (run time: 17 seconds)

import IPython.display
import PIL.Image
import cv2
import ffmpeg
import imageio
    :                   (middle part omitted; see the full cell code under ▼ below)
    :
loading.layout.display = 'none'
complete.layout.display = 'none'
select_image('00')
select_video('0')

▼ Full code of the cell

import IPython.display
import PIL.Image
import cv2
import ffmpeg
import imageio
import io
import ipywidgets
import numpy
import os.path
import requests
import skimage.transform
import warnings
from base64 import b64encode
from demo import load_checkpoints, make_animation  # type: ignore (local file)
from google.colab import files, output
from IPython.display import HTML, Javascript
from shutil import copyfileobj
from skimage import img_as_ubyte
from tempfile import NamedTemporaryFile
from tqdm.auto import tqdm
warnings.filterwarnings("ignore")
os.makedirs("user", exist_ok=True)

display(HTML("""
<style>
.widget-box > * {
    flex-shrink: 0;
}
.widget-tab {
    min-width: 0;
    flex: 1 1 auto;
}
.widget-tab .p-TabBar-tabLabel {
    font-size: 15px;
}
.widget-upload {
    background-color: tan;
}
.widget-button {
    font-size: 18px;
    width: 160px;
    height: 34px;
    line-height: 34px;
}
.widget-dropdown {
    width: 250px;
}
.widget-checkbox {
    width: 650px;
}
.widget-checkbox + .widget-checkbox {
    margin-top: -6px;
}
.input-widget .output_html {
    text-align: center;
    width: 266px;
    height: 266px;
    line-height: 266px;
    color: lightgray;
    font-size: 72px;
}
.title {
    font-size: 20px;
    font-weight: bold;
    margin: 12px 0 6px 0;
}
.warning {
    display: none;
    color: red;
    margin-left: 10px;
}
.warn {
    display: initial;
}
.resource {
    cursor: pointer;
    border: 1px solid gray;
    margin: 5px;
    width: 160px;
    height: 160px;
    min-width: 160px;
    min-height: 160px;
    max-width: 160px;
    max-height: 160px;
    -webkit-box-sizing: initial;
    box-sizing: initial;
}
.resource:hover {
    border: 6px solid crimson;
    margin: 0;
}
.selected {
    border: 6px solid seagreen;
    margin: 0;
}
.input-widget {
    width: 266px;
    height: 266px;
    border: 1px solid gray;
}
.input-button {
    width: 268px;
    font-size: 15px;
    margin: 2px 0 0;
}
.output-widget {
    width: 256px;
    height: 256px;
    border: 1px solid gray;
}
.output-button {
    width: 258px;
    font-size: 15px;
    margin: 2px 0 0;
}
.uploaded {
    width: 256px;
    height: 256px;
    border: 6px solid seagreen;
    margin: 0;
}
.label-or {
    align-self: center;
    font-size: 20px;
    margin: 16px;
}
.loading {
    align-items: center;
    width: fit-content;
}
.loader {
    margin: 32px 0 16px 0;
    width: 48px;
    height: 48px;
    min-width: 48px;
    min-height: 48px;
    max-width: 48px;
    max-height: 48px;
    border: 4px solid whitesmoke;
    border-top-color: gray;
    border-radius: 50%;
    animation: spin 1.8s linear infinite;
}
.loading-label {
    color: gray;
}
.video {
    margin: 0;
}
.comparison-widget {
    width: 256px;
    height: 256px;
    border: 1px solid gray;
    margin-left: 2px;
}
.comparison-label {
    color: gray;
    font-size: 14px;
    text-align: center;
    position: relative;
    bottom: 3px;
}
@keyframes spin {
    from { transform: rotate(0deg); }
    to { transform: rotate(360deg); }
}
</style>
"""))

def thumbnail(file):
    return imageio.get_reader(file, mode='I', format='FFMPEG').get_next_data()

def create_image(i, j):
    image_widget = ipywidgets.Image.from_file('demo/images/%d%d.png' % (i, j))
    image_widget.add_class('resource')
    image_widget.add_class('resource-image')
    image_widget.add_class('resource-image%d%d' % (i, j))
    return image_widget

def create_video(i):
    video_widget = ipywidgets.Image(
        # tobytes() replaces the deprecated ndarray.tostring() (same bytes)
        value=cv2.imencode('.png', cv2.cvtColor(thumbnail('demo/videos/%d.mp4' % i), cv2.COLOR_RGB2BGR))[1].tobytes(),
        format='png'
    )
    video_widget.add_class('resource')
    video_widget.add_class('resource-video')
    video_widget.add_class('resource-video%d' % i)
    return video_widget

def create_title(title):
    title_widget = ipywidgets.Label(title)
    title_widget.add_class('title')
    return title_widget

def download_output(button):
    complete.layout.display = 'none'
    loading.layout.display = ''
    files.download('output.mp4')
    loading.layout.display = 'none'
    complete.layout.display = ''

def convert_output(button):
    complete.layout.display = 'none'
    loading.layout.display = ''
    ffmpeg.input('output.mp4').output('scaled.mp4', vf='scale=1080x1080:flags=lanczos,pad=1920:1080:420:0').overwrite_output().run()
    files.download('scaled.mp4')
    loading.layout.display = 'none'
    complete.layout.display = ''

def back_to_main(button):
    complete.layout.display = 'none'
    main.layout.display = ''

label_or = ipywidgets.Label('or')
label_or.add_class('label-or')

image_titles = ['Peoples', 'Cartoons', 'Dolls', 'Game of Thrones', 'Statues']
image_lengths = [8, 4, 8, 9, 4]

image_tab = ipywidgets.Tab()
image_tab.children = [ipywidgets.HBox([create_image(i, j) for j in range(length)]) for i, length in enumerate(image_lengths)]
for i, title in enumerate(image_titles):
    image_tab.set_title(i, title)

input_image_widget = ipywidgets.Output()
input_image_widget.add_class('input-widget')
upload_input_image_button = ipywidgets.FileUpload(accept='image/*', button_style='primary')
upload_input_image_button.add_class('input-button')
image_part = ipywidgets.HBox([
    ipywidgets.VBox([input_image_widget, upload_input_image_button]),
    label_or,
    image_tab
])

video_tab = ipywidgets.Tab()
video_tab.children = [ipywidgets.HBox([create_video(i) for i in range(5)])]
video_tab.set_title(0, 'All Videos')

input_video_widget = ipywidgets.Output()
input_video_widget.add_class('input-widget')
upload_input_video_button = ipywidgets.FileUpload(accept='video/*', button_style='primary')
upload_input_video_button.add_class('input-button')
video_part = ipywidgets.HBox([
    ipywidgets.VBox([input_video_widget, upload_input_video_button]),
    label_or,
    video_tab
])

model = ipywidgets.Dropdown(
    description="Model:",
    options=[
        'vox',
        'vox-adv',
        'taichi',
        'taichi-adv',
        'nemo',
        'mgif',
        'fashion',
        'bair'
    ]
)
warning = ipywidgets.HTML('<b>Warning:</b> Upload your own images and videos (see README)')
warning.add_class('warning')
model_part = ipywidgets.HBox([model, warning])

relative = ipywidgets.Checkbox(description="Relative keypoint displacement (Inherit object proportions from the video)", value=True)
adapt_movement_scale = ipywidgets.Checkbox(description="Adapt movement scale (Don’t touch unless you know what you are doing)", value=True)
generate_button = ipywidgets.Button(description="Generate", button_style='primary')
main = ipywidgets.VBox([
    create_title('Choose Image'),
    image_part,
    create_title('Choose Video'),
    video_part,
    create_title('Settings'),
    model_part,
    relative,
    adapt_movement_scale,
    generate_button
])

loader = ipywidgets.Label()
loader.add_class("loader")
loading_label = ipywidgets.Label("This may take several minutes to process…")
loading_label.add_class("loading-label")
progress_bar = ipywidgets.Output()
loading = ipywidgets.VBox([loader, loading_label, progress_bar])
loading.add_class('loading')

output_widget = ipywidgets.Output()
output_widget.add_class('output-widget')
download = ipywidgets.Button(description='Download', button_style='primary')
download.add_class('output-button')
download.on_click(download_output)
convert = ipywidgets.Button(description='Convert to 1920×1080', button_style='primary')
convert.add_class('output-button')
convert.on_click(convert_output)
back = ipywidgets.Button(description='Back', button_style='primary')
back.add_class('output-button')
back.on_click(back_to_main)

comparison_widget = ipywidgets.Output()
comparison_widget.add_class('comparison-widget')
comparison_label = ipywidgets.Label('Comparison')
comparison_label.add_class('comparison-label')
complete = ipywidgets.HBox([
    ipywidgets.VBox([output_widget, download, convert, back]),
    ipywidgets.VBox([comparison_widget, comparison_label])
])

display(ipywidgets.VBox([main, loading, complete]))
display(Javascript("""
var images, videos;
function deselectImages() {
    images.forEach(function(item) {
        item.classList.remove("selected");
    });
}
function deselectVideos() {
    videos.forEach(function(item) {
        item.classList.remove("selected");
    });
}
function invokePython(func) {
    google.colab.kernel.invokeFunction("notebook." + func, [].slice.call(arguments, 1), {});
}
setTimeout(function() {
    (images = [].slice.call(document.getElementsByClassName("resource-image"))).forEach(function(item) {
        item.addEventListener("click", function() {
            deselectImages();
            item.classList.add("selected");
            invokePython("select_image", item.className.match(/resource-image(\d\d)/)[1]);
        });
    });
    images[0].classList.add("selected");
    (videos = [].slice.call(document.getElementsByClassName("resource-video"))).forEach(function(item) {
        item.addEventListener("click", function() {
            deselectVideos();
            item.classList.add("selected");
            invokePython("select_video", item.className.match(/resource-video(\d)/)[1]);
        });
    });
    videos[0].classList.add("selected");
}, 1000);
"""))

selected_image = None
def select_image(filename):
    global selected_image
    selected_image = resize(PIL.Image.open('demo/images/%s.png' % filename).convert("RGB"))
    input_image_widget.clear_output(wait=True)
    with input_image_widget:
        display(HTML('Image'))
    input_image_widget.remove_class('uploaded')
output.register_callback("notebook.select_image", select_image)

selected_video = None
def select_video(filename):
    global selected_video
    selected_video = 'demo/videos/%s.mp4' % filename
    input_video_widget.clear_output(wait=True)
    with input_video_widget:
        display(HTML('Video'))
    input_video_widget.remove_class('uploaded')
output.register_callback("notebook.select_video", select_video)

def resize(image, size=(256, 256)):
    w, h = image.size
    d = min(w, h)
    r = ((w - d) // 2, (h - d) // 2, (w + d) // 2, (h + d) // 2)
    return image.resize(size, resample=PIL.Image.LANCZOS, box=r)

def upload_image(change):
    global selected_image
    for name, file_info in upload_input_image_button.value.items():
        content = file_info['content']
    if content is not None:
        selected_image = resize(PIL.Image.open(io.BytesIO(content)).convert("RGB"))
        input_image_widget.clear_output(wait=True)
        with input_image_widget:
            display(selected_image)
        input_image_widget.add_class('uploaded')
        display(Javascript('deselectImages()'))
upload_input_image_button.observe(upload_image, names='value')

def upload_video(change):
    global selected_video
    for name, file_info in upload_input_video_button.value.items():
        content = file_info['content']
    if content is not None:
        selected_video = 'user/' + name
        with open(selected_video, 'wb') as video:
            video.write(content)
        preview = resize(PIL.Image.fromarray(thumbnail(selected_video)).convert("RGB"))
        input_video_widget.clear_output(wait=True)
        with input_video_widget:
            display(preview)
        input_video_widget.add_class('uploaded')
        display(Javascript('deselectVideos()'))
upload_input_video_button.observe(upload_video, names='value')

def change_model(change):
    if model.value.startswith('vox'):
        warning.remove_class('warn')
    else:
        warning.add_class('warn')
model.observe(change_model, names='value')

def generate(button):
    main.layout.display = 'none'
    loading.layout.display = ''
    filename = model.value + ('' if model.value == 'fashion' else '-cpk') + '.pth.tar'
    if not os.path.isfile(filename):
        response = requests.get('https://github.com/graphemecluster/first-order-model-demo/releases/download/checkpoints/' + filename, stream=True)
        with progress_bar:
            with tqdm.wrapattr(response.raw, 'read', total=int(response.headers.get('Content-Length', 0)), unit='B', unit_scale=True, unit_divisor=1024) as raw:
                with open(filename, 'wb') as file:
                    copyfileobj(raw, file)
        progress_bar.clear_output()
    reader = imageio.get_reader(selected_video, mode='I', format='FFMPEG')
    fps = reader.get_meta_data()['fps']
    driving_video = []
    for frame in reader:
        driving_video.append(frame)
    generator, kp_detector = load_checkpoints(config_path='config/%s-256.yaml' % model.value, checkpoint_path=filename)
    with progress_bar:
        predictions = make_animation(
            skimage.transform.resize(numpy.asarray(selected_image), (256, 256)),
            [skimage.transform.resize(frame, (256, 256)) for frame in driving_video],
            generator,
            kp_detector,
            relative=relative.value,
            adapt_movement_scale=adapt_movement_scale.value
        )
    progress_bar.clear_output()
    imageio.mimsave('output.mp4', [img_as_ubyte(frame) for frame in predictions], fps=fps)
    try:
        with NamedTemporaryFile(suffix='.mp4') as output:
            ffmpeg.output(ffmpeg.input('output.mp4').video, ffmpeg.input(selected_video).audio, output.name, c='copy').run()
            with open('output.mp4', 'wb') as result:
                copyfileobj(output, result)
    except ffmpeg.Error:
        pass
    output_widget.clear_output(True)
    with output_widget:
        video_widget = ipywidgets.Video.from_file('output.mp4', autoplay=False, loop=False)
        video_widget.add_class('video')
        video_widget.add_class('video-left')
        display(video_widget)
    comparison_widget.clear_output(True)
    with comparison_widget:
        video_widget = ipywidgets.Video.from_file(selected_video, autoplay=False, loop=False, controls=False)
        video_widget.add_class('video')
        video_widget.add_class('video-right')
        display(video_widget)
    display(Javascript("""
    setTimeout(function() {
        (function(left, right) {
            left.addEventListener("play", function() {
                right.play();
            });
            left.addEventListener("pause", function() {
                right.pause();
            });
            left.addEventListener("seeking", function() {
                right.currentTime = left.currentTime;
            });
            right.muted = true;
        })(document.getElementsByClassName("video-left")[0], document.getElementsByClassName("video-right")[0]);
    }, 1000);
    """))
    loading.layout.display = 'none'
    complete.layout.display = ''

generate_button.on_click(generate)

loading.layout.display = 'none'
complete.layout.display = 'none'
select_image('00')
select_video('0')

・The cell downloads the sample data, pretrained models, and so on; when it finishes, the following GUI panel is displayed below the cell
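The `resize` helper in the cell above center-crops each image to the largest square before scaling it to 256x256. The box arithmetic it passes to `PIL.Image.resize` can be checked on its own; the function name `center_crop_box` below is a hypothetical stand-in used only for this sketch.

```python
def center_crop_box(width, height):
    """Return (left, upper, right, lower) of the largest centered square.

    Mirrors the tuple `r` computed inside the demo's resize() helper.
    """
    side = min(width, height)
    return (
        (width - side) // 2,
        (height - side) // 2,
        (width + side) // 2,
        (height + side) // 2,
    )


print(center_crop_box(640, 480))  # (80, 0, 560, 480)
print(center_crop_box(300, 500))  # (0, 100, 300, 400)
```

A landscape image loses equal strips on the left and right, a portrait image loses them at the top and bottom, so a face should be roughly centered before uploading.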


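How to run the demo

The GUI panel is displayed below the cell

Specify the input image (Choose Image)
・The bundled images can be selected from the "Peoples/Cartoons/Dolls/Game of Thrones/Statues" tabs
・To use your own image, press the "Upload" button and choose an image file from the local PC

・Bundled sample images (in the "demo/images/" folder)
Specify the input video (Choose Video)
・You can choose from the bundled videos
・To use your own video, press the "Upload" button and choose a video file from the local PC

・Bundled sample videos (in the "demo/videos/" folder)
Press Generate
・After a while, the generated video (left) and the original video (right) are displayed

・Pressing "Download" downloads the generated video at 256x256
・Pressing "Convert to 1920x1080" downloads the generated video converted to 1920x1080
・Pressing "Back" returns to step 1, the input image selection screen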
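The "Convert to 1920x1080" button scales the square result up to the frame height and pads it left and right (pillarboxing). As a rough sketch, the filter string used by `convert_output` can be derived from the target size; `pillarbox_filter` is a hypothetical helper name, not part of the demo.

```python
def pillarbox_filter(out_w=1920, out_h=1080):
    """Build an ffmpeg -vf string that centers a square video in out_w x out_h."""
    side = min(out_w, out_h)      # the square is scaled to the shorter edge
    x = (out_w - side) // 2       # horizontal padding offset
    y = (out_h - side) // 2       # vertical padding offset
    return f"scale={side}x{side}:flags=lanczos,pad={out_w}:{out_h}:{x}:{y}"


print(pillarbox_filter())
# scale=1080x1080:flags=lanczos,pad=1920:1080:420:0
```

The offset 420 in the demo code is simply (1920 - 1080) / 2.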

The Mona Lisa and Kasumi Arimura move like President Trump


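Editing the generated video

Try the video editing of the generated images described on the site (cedro-blog)
Download and unzip the data file (the extracted "update/work/FOMM/" folder is used)
  update_20231117.zip (18.3MB) <update data>

Connect to Google Drive
・Add the following cell after the last cell and run it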
```
from google.colab import drive
drive.mount('/content/drive')
```
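・If a Google account authentication dialog appears, log in

・After a while, once Google Drive is connected, a "drive" folder appears in the file viewer on the left

・Files are exchanged through the "data" folder inside the "drive" folder (create it if it does not exist)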
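Creating the "data" folder can also be done from a cell instead of the file viewer. This is a minimal sketch: `ensure_data_dir` is a hypothetical helper, and the default path assumes the Drive mount point `/content/drive` used above. The demonstration uses a temporary directory in place of the mounted Drive so it runs anywhere.

```python
import tempfile
from pathlib import Path


def ensure_data_dir(drive_root="/content/drive/MyDrive", name="data"):
    """Create <drive_root>/<name> if it is missing and return its path."""
    data_dir = Path(drive_root) / name
    data_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return data_dir


# Demonstration with a temporary directory standing in for the mounted Drive.
with tempfile.TemporaryDirectory() as fake_drive:
    created = ensure_data_dir(fake_drive)
    print(created.name, created.is_dir())  # data True
```

On Colab, calling `ensure_data_dir()` with no arguments after `drive.mount('/content/drive')` would target the real folder.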
Resize the video file (make the input video 256x256)
・Run the following cell
```
!ffmpeg -i drive/MyDrive/data/h1_video.mp4 -s 256:256 -q 2 h1_video_256.mp4
```

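Display the still image, driving video frames, and generated video frames
・Run the following cell (upload the video files to the drive/MyDrive/data/ folder beforehand)

# Display still image / driving video / generated video
# Note: this cell redefines display() and resize(), shadowing the names used by the GUI cell above
import imageio
import numpy as np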
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from skimage.transform import resize
from IPython.display import HTML
import warnings
warnings.filterwarnings("ignore")
 
def display(source, driving, generated=None):
    fig = plt.figure(figsize=(8 + 4 * (generated is not None), 4), dpi=64)
    fig.subplots_adjust(left=0, right=1, bottom=0, top=1)

    ims = []
    for i in range(len(driving)):
        cols = [source]
        cols.append(driving[i])
        if generated is not None:
            cols.append(generated[i])
        im = plt.imshow(np.concatenate(cols, axis=1), animated=True)
        plt.axis('off')
        ims.append([im])
 
    ani = animation.ArtistAnimation(fig, ims, interval=33, repeat_delay=1000)
    plt.close()
    return ani

source_image = imageio.imread('demo/images/41.png')
driving_video = imageio.mimread('demo/videos/1.mp4')
generated_video = imageio.mimread('drive/MyDrive/data/result/41_1_output.mp4')

# Resize everything to 256x256
source_image = resize(source_image, (256, 256))[..., :3]
driving_video = [resize(frame, (256, 256))[..., :3] for frame in driving_video]
generated_video = [resize(frame, (256, 256))[..., :3] for frame in generated_video]
ani = display(source_image, driving_video, generated_video)
ani.save('output.gif')
HTML(ani.to_html5_video())

・In the example above, the output is saved as "output.gif" (download it as needed)
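The layout used by the cell above is a horizontal concatenation of equally sized frames. The sketch below reproduces only that step on dummy arrays, to show how the panel width and the `figsize`/`dpi` values relate; `side_by_side` is a hypothetical name used for illustration.

```python
import numpy as np


def side_by_side(source, driving_frame, generated_frame=None):
    """Concatenate frames of identical height along the width axis."""
    cols = [source, driving_frame]
    if generated_frame is not None:
        cols.append(generated_frame)
    return np.concatenate(cols, axis=1)


frame = np.zeros((256, 256, 3))
panel = side_by_side(frame, frame, frame)
print(panel.shape)        # (256, 768, 3)

# figsize=(8 + 4, 4) inches at dpi=64 gives a canvas of the same pixel size.
print((12 * 64, 4 * 64))  # (768, 256)
```

Because the figure canvas matches the panel in pixels, the frames are drawn without rescaling.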

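Combine six generated videos (256x256) into one video
・Run the following cell (upload the video files to the drive/MyDrive/data/ folder beforehand)

## Concatenate the generated video frames of six still images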
import imageio
import numpy as np
import matplotlib.animation as animation
import matplotlib.pyplot as plt
from IPython.display import HTML
from tqdm import trange

result = []
result.append(imageio.mimread('/content/drive/MyDrive/data/result/2.mp4'))
result.append(imageio.mimread('/content/drive/MyDrive/data/result/mon_output.mp4'))
result.append(imageio.mimread('/content/drive/MyDrive/data/result/statue-02_output.mp4'))
result.append(imageio.mimread('/content/drive/MyDrive/data/result/okegawa_m1_output.mp4'))
result.append(imageio.mimread('/content/drive/MyDrive/data/result/nitta_m1_output.mp4'))
result.append(imageio.mimread('/content/drive/MyDrive/data/result/yaoi_m1_output.mp4'))

fig = plt.figure(figsize=(12, 8), dpi=64)
fig.subplots_adjust(left=0, right=1, bottom=0, top=1)
ims =[]
for i in trange(len(result[0])):
    x = np.concatenate([result[0][i], result[1][i], result[2][i]],1)
    y = np.concatenate([result[3][i], result[4][i], result[5][i]],1)
    z = np.concatenate([x, y])
    im = plt.imshow(z, animated=True)
    plt.axis('off')
    ims.append([im])

print('making animation...')
ani = animation.ArtistAnimation(fig, ims, interval=33, repeat_delay=1000)
plt.close()
ani.save('outvideo6.mp4')
HTML(ani.to_html5_video())
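The cell above indexes every clip up to `len(result[0])`, so a clip shorter than the first one would raise an `IndexError`. A hedged variant that truncates to the shortest clip is sketched here on dummy data; `grid_frames` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np


def grid_frames(videos, columns=3):
    """Tile equally sized clips into a grid, frame by frame.

    The result is truncated to the shortest clip so that indexing never
    runs past the end of any input.
    """
    if len(videos) % columns != 0:
        raise ValueError("number of clips must be a multiple of columns")
    n_frames = min(len(v) for v in videos)
    frames = []
    for i in range(n_frames):
        rows = [
            np.concatenate([v[i] for v in videos[r:r + columns]], axis=1)
            for r in range(0, len(videos), columns)
        ]
        frames.append(np.concatenate(rows, axis=0))
    return frames


# Six dummy clips of different lengths (5, 4, 6, 5, 5, 5 frames).
clips = [[np.zeros((256, 256, 3), dtype=np.uint8)] * n for n in (5, 4, 6, 5, 5, 5)]
tiled = grid_frames(clips)
print(len(tiled), tiled[0].shape)  # 4 (512, 768, 3)
```

The 768x512 frame size agrees with `figsize=(12, 8)` at `dpi=64` in the cell above.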

Custom input images used


Examples of generated images

The video below was generated from the video above (the audio was added manually)


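Ending the session and running again after reconnecting

To finish editing, select "Runtime" → "Disconnect and delete runtime" in Colab
・To keep GPU occupancy short, it is best to disconnect once all work has finished
・Even after disconnecting and deleting the runtime, the execution results in the notebook remain
After reconnecting, repeat the same steps starting from "Starting the demo" above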


Video editing tips


Cutting out part of a video file

Use the "ffmpeg" command (example: cut a 256x256 region whose top-left corner is at (40,100) out of the input video)
・Run the command below in Google Colab
```
!ffmpeg -i drive/MyDrive/data/h1_video.mp4 -ss 00:00:00.00 -t 00:00:02 -filter:v "crop=256:256:40:100" -async 1 h1_video_256.mp4
```
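・Arguments
- -i: path of the input video file
- -ss: start time of the cut (hours:minutes:seconds.hundredths of a second)
- -t: length of the cut video (hours:minutes:seconds)
- -filter:v "crop=XX:XX:XX:XX": (crop width:crop height:x position of the crop's top-left corner:y position of the crop's top-left corner)
  * The x and y positions of the crop's top-left corner are measured with the top-left corner of the video as (0,0)
- -async 1: audio sync option (only the start of the audio stream is corrected); the final argument (h1_video_256.mp4) is the path of the output file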
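The argument list above can be assembled programmatically, which makes it easy to check that the crop rectangle fits inside the frame before running ffmpeg. This is a sketch under assumptions: `crop_filter` and `crop_command` are hypothetical helper names, and the 1024x1024 frame size is taken from the log of `h1_video.mp4` further down this page.

```python
def crop_filter(width, height, x, y, frame_w=None, frame_h=None):
    """Return an ffmpeg crop filter string, optionally validated against the frame size."""
    if width <= 0 or height <= 0 or x < 0 or y < 0:
        raise ValueError("crop size must be positive and offsets non-negative")
    if frame_w is not None and x + width > frame_w:
        raise ValueError("crop exceeds frame width")
    if frame_h is not None and y + height > frame_h:
        raise ValueError("crop exceeds frame height")
    return f"crop={width}:{height}:{x}:{y}"


def crop_command(src, dst, start, duration, width, height, x, y):
    """Assemble the same ffmpeg call as the command above, as an argument list."""
    return [
        "ffmpeg", "-i", src,
        "-ss", start, "-t", duration,
        "-filter:v", crop_filter(width, height, x, y),
        "-async", "1",
        dst,  # the output path is the final positional argument
    ]


print(crop_filter(256, 256, 40, 100, frame_w=1024, frame_h=1024))
# crop=256:256:40:100
cmd = crop_command("drive/MyDrive/data/h1_video.mp4", "h1_video_256.mp4",
                   "00:00:00.00", "00:00:02", 256, 256, 40, 100)
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` in an environment where ffmpeg is installed.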


Adding audio to a video

Use the "ffmpeg" command (example: video file "drive/MyDrive/data/result/outvideo6_17.mp4", audio file "drive/MyDrive/data/result/hi_audio.mp3")
・Run the command below in Google Colab

!ffmpeg -i drive/MyDrive/data/result/outvideo6_17.mp4 -i drive/MyDrive/data/result/hi_audio.mp3 -c:v copy -c:a mp3 -map 0:v:0 -map 1:a:0 outvideo6_17_oudio.mp4

▼ - log - Google Colab Tesla T4

ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 70.100 / 56. 70.100
  libavcodec     58.134.100 / 58.134.100
  libavformat    58. 76.100 / 58. 76.100
  libavdevice    58. 13.100 / 58. 13.100
  libavfilter     7.110.100 /  7.110.100
  libswscale      5.  9.100 /  5.  9.100
  libswresample   3.  9.100 /  3.  9.100
  libpostproc    55.  9.100 / 55.  9.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'drive/MyDrive/data/result/outvideo6_17.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.76.100
  Duration: 00:00:02.90, start: 0.000000, bitrate: 999 kb/s
  Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 768x512, 994 kb/s, 30.30 fps, 30.30 tbr, 16k tbn, 60.61 tbc (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
Input #1, mp3, from 'drive/MyDrive/data/result/hi_audio.mp3':
  Metadata:
    encoder         : Lavf58.76.100
  Duration: 00:00:02.95, start: 0.025057, bitrate: 128 kb/s
  Stream #1:0: Audio: mp3, 44100 Hz, stereo, fltp, 128 kb/s
    Metadata:
      encoder         : Lavc58.13
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #1:0 -> #0:1 (mp3 (mp3float) -> mp3 (libmp3lame))
Press [q] to stop, [?] for help
Output #0, mp4, to 'outvideo6_17_oudio.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.76.100
  Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 768x512, q=2-31, 994 kb/s, 30.30 fps, 30.30 tbr, 16k tbn, 16k tbc (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
  Stream #0:1: Audio: mp3 (mp4a / 0x6134706D), 44100 Hz, stereo, fltp
    Metadata:
      encoder         : Lavc58.134.100 libmp3lame
[libmp3lame @ 0x5583828ba480] Trying to remove 1152 samples, but the queue is empty
frame=   88 fps=0.0 q=-1.0 Lsize=     403kB time=00:00:02.89 bitrate=1137.8kbits/s speed=42.1x    
video:353kB audio:46kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 1.012550%

・An mp4 file muxed this way cannot be played when served from a web server (possibly an unsupported codec?)
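One thing that may be worth trying, as an untested assumption rather than a confirmed fix, is to encode the audio as AAC instead of MP3 and optionally move the MP4 index to the front of the file with `-movflags +faststart`. The sketch below only builds the argument list; `mux_command` is a hypothetical helper name.

```python
def mux_command(video, audio, dst, audio_codec="aac", faststart=False):
    """Build an ffmpeg call that copies the video stream and re-encodes the audio."""
    cmd = [
        "ffmpeg", "-i", video, "-i", audio,
        "-c:v", "copy",          # keep the H.264 video stream untouched
        "-c:a", audio_codec,     # assumption: AAC instead of the original mp3
        "-map", "0:v:0", "-map", "1:a:0",
    ]
    if faststart:
        cmd += ["-movflags", "+faststart"]  # place the index at the start of the file
    return cmd + [dst]


cmd = mux_command("drive/MyDrive/data/result/outvideo6_17.mp4",
                  "drive/MyDrive/data/result/hi_audio.mp3",
                  "outvideo6_17_aac.mp4", faststart=True)
print(" ".join(cmd))
```

Whether this resolves the playback problem depends on the browser and server involved, so the result should be checked in the target environment.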

Using the web service below is easy!
→ Add music to video - add audio to video for free


Changing the video size

Use the "ffmpeg" command
・Run the command below in Google Colab (example: convert the input video to 256x256)

!ffmpeg -i drive/MyDrive/data/h1_video.mp4 -s 256:256 -q 2 h1_video_256.mp4

▼ - log - Google Colab Tesla T4

ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 70.100 / 56. 70.100
  libavcodec     58.134.100 / 58.134.100
  libavformat    58. 76.100 / 58. 76.100
  libavdevice    58. 13.100 / 58. 13.100
  libavfilter     7.110.100 /  7.110.100
  libswscale      5.  9.100 /  5.  9.100
  libswresample   3.  9.100 /  3.  9.100
  libpostproc    55.  9.100 / 55.  9.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'drive/MyDrive/data/h1_video.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.29.100
  Duration: 00:00:02.94, start: 0.000000, bitrate: 1707 kb/s
  Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1024x1024, 1703 kb/s, 29.97 fps, 29.97 tbr, 11988 tbn, 59.94 tbc (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))
Press [q] to stop, [?] for help
[libx264 @ 0x585e4d3a4fc0] -qscale is ignored, -crf is recommended.
[libx264 @ 0x585e4d3a4fc0] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2 AVX512
[libx264 @ 0x585e4d3a4fc0] profile High, level 1.3, 4:2:0, 8-bit
[libx264 @ 0x585e4d3a4fc0] 264 - core 163 r3060 5db6aa6 - H.264/MPEG-4 AVC codec - Copyleft 2003-2021 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=3 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'h1_video_256.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.76.100
  Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p(tv, progressive), 256x256, q=2-31, 29.97 fps, 11988 tbn (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
      encoder         : Lavc58.134.100 libx264
    Side data:
      cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
frame=   88 fps= 60 q=-1.0 Lsize=      61kB time=00:00:02.83 bitrate= 176.2kbits/s speed=1.94x    
video:59kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 3.105508%
[libx264 @ 0x585e4d3a4fc0] frame I:1     Avg QP:23.12  size:  4147
[libx264 @ 0x585e4d3a4fc0] frame P:27    Avg QP:24.33  size:  1444
[libx264 @ 0x585e4d3a4fc0] frame B:60    Avg QP:27.80  size:   280
[libx264 @ 0x585e4d3a4fc0] consecutive B-frames:  4.5%  9.1% 13.6% 72.7%
[libx264 @ 0x585e4d3a4fc0] mb I  I16..4: 14.5% 69.9% 15.6%
[libx264 @ 0x585e4d3a4fc0] mb P  I16..4:  2.1%  2.2%  0.3%  P16..4: 51.5% 24.5% 10.0%  0.0%  0.0%    skip: 9.4%
[libx264 @ 0x585e4d3a4fc0] mb B  I16..4:  0.0%  0.0%  0.0%  B16..8: 45.6%  3.7%  0.4%  direct: 0.5%  skip:49.8%  L0:39.0% L1:55.6% BI: 5.4%
[libx264 @ 0x585e4d3a4fc0] 8x8 transform intra:57.2% inter:80.3%
[libx264 @ 0x585e4d3a4fc0] coded y,uvDC,uvAC intra: 50.6% 83.2% 29.4% inter: 13.2% 15.5% 0.2%
[libx264 @ 0x585e4d3a4fc0] i16 v,h,dc,p: 15% 24%  9% 52%
[libx264 @ 0x585e4d3a4fc0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 11% 14% 21%  9%  7%  9%  6% 13%  9%
[libx264 @ 0x585e4d3a4fc0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 22% 12% 11%  6% 13% 16%  7%  7%  6%
[libx264 @ 0x585e4d3a4fc0] i8c dc,h,v,p: 52% 16% 20% 12%
[libx264 @ 0x585e4d3a4fc0] Weighted P-Frames: Y:22.2% UV:7.4%
[libx264 @ 0x585e4d3a4fc0] ref P L0: 59.5% 16.0% 17.5%  5.8%  1.2%
[libx264 @ 0x585e4d3a4fc0] ref B L0: 93.7%  5.4%  0.9%
[libx264 @ 0x585e4d3a4fc0] ref B L1: 96.4%  3.6%
[libx264 @ 0x585e4d3a4fc0] kb/s:163.24


Downloading YouTube videos

Install the package

(py38_gan) PS > pip install yt-dlp

▼ - log - "pip install yt-dlp"

(py38_gan) PS > pip install yt-dlp
Collecting yt-dlp
  Downloading yt_dlp-2023.12.30-py2.py3-none-any.whl.metadata (160 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 160.7/160.7 kB 2.4 MB/s eta 0:00:00
Collecting mutagen (from yt-dlp)
  Downloading mutagen-1.47.0-py3-none-any.whl.metadata (1.7 kB)
Collecting pycryptodomex (from yt-dlp)
  Downloading pycryptodomex-3.20.0-cp35-abi3-win_amd64.whl.metadata (3.4 kB)
Requirement already satisfied: certifi in c:\users\izuts\anaconda3\envs\py38_gan\lib\site-packages (from yt-dlp) (2023.11.17)
Requirement already satisfied: requests<3,>=2.31.0 in c:\users\izuts\anaconda3\envs\py38_gan\lib\site-packages (from yt-dlp) (2.31.0)
Requirement already satisfied: urllib3<3,>=1.26.17 in c:\users\izuts\anaconda3\envs\py38_gan\lib\site-packages (from yt-dlp) (2.1.0)
Collecting websockets>=12.0 (from yt-dlp)
  Downloading websockets-12.0-cp38-cp38-win_amd64.whl.metadata (6.8 kB)
Collecting brotli (from yt-dlp)
  Downloading Brotli-1.1.0-cp38-cp38-win_amd64.whl.metadata (5.6 kB)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\users\izuts\anaconda3\envs\py38_gan\lib\site-packages (from requests<3,>=2.31.0->yt-dlp) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in c:\users\izuts\anaconda3\envs\py38_gan\lib\site-packages (from requests<3,>=2.31.0->yt-dlp) (2.10)
Downloading yt_dlp-2023.12.30-py2.py3-none-any.whl (3.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.0/3.0 MB 15.8 MB/s eta 0:00:00
Downloading websockets-12.0-cp38-cp38-win_amd64.whl (124 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 125.0/125.0 kB 7.2 MB/s eta 0:00:00
Downloading Brotli-1.1.0-cp38-cp38-win_amd64.whl (357 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 357.3/357.3 kB 23.1 MB/s eta 0:00:00
Downloading mutagen-1.47.0-py3-none-any.whl (194 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 194.4/194.4 kB ? eta 0:00:00
Downloading pycryptodomex-3.20.0-cp35-abi3-win_amd64.whl (1.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 22.2 MB/s eta 0:00:00
Installing collected packages: brotli, websockets, pycryptodomex, mutagen, yt-dlp
Successfully installed brotli-1.1.0 mutagen-1.47.0 pycryptodomex-3.20.0 websockets-12.0 yt-dlp-2023.12.30

Source code "ytb_down.py"

# YouTube download
#   2023.11.18
# ytb_down.py
# ffmpeg yt-dlp
#    test:    opt = ['https://www.youtube.com/watch?v=7G0ovtPqHnI']
#
# 【簡単便利】pythonからYouTubeを高画質でdownloadするぞ！by Windows
# https://resanaplaza.com/2023/04/02/%E3%80%90%E7%B0%A1%E5%8D%98%E4%BE%BF%E5%88%A9%E3%80%91python%E3%81%8B%E3%82%89youtube%E3%82%92%E9%AB%98%E7%94%BB%E8%B3%AA%E3%81%A7download%E3%81%99%E3%82%8B%E3%81%9E%EF%BC%81by-windows/

from yt_dlp import YoutubeDL

def ytb_download(option):
    ydl = YoutubeDL()
    result = ydl.download(option)
    return result

if __name__ == "__main__":
    import sys

    opt = []
    args = sys.argv
    for n in range(len(args)):
        print(n, args[n])
        if n >= 1:
            opt.append(args[n])

    if len(opt) == 0:
        print('Please input option parameter !!')
        sys.exit(1)  # nothing to download without a URL

    print(opt)

    result = ytb_download(opt)

Right-click the YouTube video to download and copy its URL

Paste it on the command line and run (local machine, py38_gan virtual environment)
・Example run

(py38_gan) PS > python ytb_down.py 'https://youtu.be/Fs7ImkqiW7E'
0 .\ytb_down.py
1 https://youtu.be/Fs7ImkqiW7E
['https://youtu.be/Fs7ImkqiW7E']
[youtube] Extracting URL: https://youtu.be/Fs7ImkqiW7E
[youtube] Fs7ImkqiW7E: Downloading webpage
[youtube] Fs7ImkqiW7E: Downloading ios player API JSON
[youtube] Fs7ImkqiW7E: Downloading android player API JSON
[youtube] Fs7ImkqiW7E: Downloading m3u8 information
[info] Fs7ImkqiW7E: Downloading 1 format(s): 22
[download] Destination: 【オープンハウス】TVCM「夢」篇　30秒 [Fs7ImkqiW7E].mp4
[download] 100% of    4.68MiB in 00:00:01 at 4.52MiB/s
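`ytb_down.py` calls `YoutubeDL()` with default settings. If the format or output location needs to be controlled, options can be passed as a dictionary. The sketch below is one possible arrangement under assumptions: the helper names are hypothetical, the format selector is only an example, and merging separate video and audio streams requires ffmpeg to be installed.

```python
import sys


def parse_urls(argv):
    """Return the URL arguments, i.e. everything after the script name that looks like a URL."""
    return [a for a in argv[1:] if a.startswith(("http://", "https://"))]


def build_options(out_dir="."):
    """Options dictionary for yt_dlp.YoutubeDL (example values)."""
    return {
        # Prefer mp4 video plus m4a audio, fall back to the best single file.
        "format": "bestvideo[ext=mp4]+bestaudio[ext=m4a]/best[ext=mp4]/best",
        # Same naming pattern as the log above: "<title> [<id>].<ext>"
        "outtmpl": f"{out_dir}/%(title)s [%(id)s].%(ext)s",
    }


def download(urls, out_dir="."):
    """Download all URLs; yt_dlp is imported lazily so the helpers work without it."""
    from yt_dlp import YoutubeDL
    with YoutubeDL(build_options(out_dir)) as ydl:
        return ydl.download(urls)


urls = parse_urls(["ytb_down.py", "https://youtu.be/Fs7ImkqiW7E", "--verbose"])
print(urls)  # ['https://youtu.be/Fs7ImkqiW7E']
```

Calling `download(parse_urls(sys.argv))` from the `__main__` block would keep the command-line usage shown above unchanged.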
