BlendGAN

Blending various styles into face images: BlendGAN

This page verifies "BlendGAN", which makes it possible to synthesize a variety of styles onto face images with a single model.


※ Last updated: 2023/12/09

Verification of the site article "Blending various styles into face images with BlendGAN"

Overview

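Until now, style synthesis for face images has typically been done by layer swapping with StyleGAN, but that approach requires a separate model for every style you want to synthesize. "BlendGAN" is a technique that makes it possible to synthesize a variety of styles onto face images with a single model.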
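
To make the "one model per style" cost of layer swapping concrete, here is a toy sketch. It is not StyleGAN code: the per-resolution "weights" are placeholder strings, and the names `swap_layers`, `face`, `anime`, and `sketch` are hypothetical. Which side's layers get swapped in depends on the effect you want; the sketch just picks one direction.

```python
# Toy illustration only: real checkpoints hold tensors, not strings.
def swap_layers(base_model, styled_model, swap_from):
    """Blend two models layer by layer.

    Layers whose resolution is >= swap_from come from styled_model,
    all other layers come from base_model.
    """
    blended = {}
    for resolution in base_model:
        source = styled_model if resolution >= swap_from else base_model
        blended[resolution] = source[resolution]
    return blended

# Hypothetical per-resolution "weights" for a face model and two style models.
face = {4: "face-4", 16: "face-16", 64: "face-64", 256: "face-256"}
anime = {4: "anime-4", 16: "anime-16", 64: "anime-64", 256: "anime-256"}
sketch = {4: "sketch-4", 16: "sketch-16", 64: "sketch-64", 256: "sketch-256"}

# Every style needs its own fine-tuned model and its own blended model.
styles = {"anime": anime, "sketch": sketch}
blended_models = {
    name: swap_layers(face, model, swap_from=64) for name, model in styles.items()
}
print(blended_models["anime"])
```

With layer swapping, the number of models grows with the number of styles; the single-model approach described above avoids that.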

BlendGAN conceptual diagram

Official site → BlendGAN: Implicitly GAN Blending for Arbitrary Stylized Face Generation
Paper → BlendGAN: Implicitly GAN Blending for Arbitrary Stylized Face Generation

Creating the execution environment on Google Colaboratory

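Open the demo site by the author of the site above and click the "Open in Colab" ① button
The "BlendGAN" Google Colab notebook opens; select "Save a copy in Drive" from the "File" menu
Carry out the remaining steps in the Google Colab page that opens with the title "Copy of BlendGAN"
Download the data file and extract it (use the extracted "update/work/BlendGAN/")
  update_20231209.zip (115MB) <update data>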
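
The archive can also be extracted from Python instead of a desktop tool. This is a minimal sketch using only the standard library. It assumes the zip has already been downloaded into the current directory, and `extract_update` is a hypothetical helper name.

```python
import zipfile
from pathlib import Path

def extract_update(zip_path, dest_dir="."):
    """Extract the archive into dest_dir and return the sorted member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return sorted(archive.namelist())

# Assumption: the archive sits in the current working directory.
zip_path = Path("update_20231209.zip")
if zip_path.exists():
    members = extract_update(zip_path)
    print(f"extracted {len(members)} entries")
else:
    print(f"{zip_path} not found; download it first")
```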

Environment setup

Run the following cell ① (run time: 1 min 59 s)

#@title Setup

# Get the code from GitHub
! git clone https://github.com/cedro3/BlendGAN.git
%cd BlendGAN

# Install ninja
!wget https://github.com/ninja-build/ninja/releases/download/v1.8.2/ninja-linux.zip
!sudo unzip ninja-linux.zip -d /usr/local/bin/
!sudo update-alternatives --install /usr/bin/ninja ninja /usr/local/bin/ninja 1 --force

# Download the pretrained parameters
! pip install --upgrade gdown
import gdown
gdown.download('https://drive.google.com/uc?id=1D27HPNOSx9kWIhc13VevRy0pUv_xYiJb', './pretrained_models/blendgan.pt', quiet=False)
gdown.download('https://drive.google.com/uc?id=1pWWSm_c75ieMExJPWJuYA1wby-hm4f1J', './pretrained_models/psp_encoder.pt', quiet=False)
gdown.download('https://drive.google.com/uc?id=1qshfqj8SdmgQv_kfLpiohbI3QPQF-OE5', './pretrained_models/style_encoder.pt', quiet=False)

# Download the landmark data
! wget http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2
! bzip2 -dk shape_predictor_68_face_landmarks.dat.bz2

# Image display function
import matplotlib.pyplot as plt
from PIL import Image
import os
import numpy as np

def display_pic(folder):
    # Show the images in the folder on a 10x10 grid (at most 100 images)
    fig = plt.figure(figsize=(30, 40))
    files = sorted(os.listdir(folder))
    for i, file in enumerate(files):
        path = folder+'/'+file
        img = np.asarray(Image.open(path))
        ax = fig.add_subplot(10, 10, i+1, xticks=[], yticks=[])
        ax.imshow(img)
        ax.set_xlabel(path, fontsize=15)
    plt.show()
    plt.close()

▼ Log (Google Colab, Tesla T4)

Cloning into 'BlendGAN'...
remote: Enumerating objects: 137, done.
remote: Total 137 (delta 0), reused 0 (delta 0), pack-reused 137
Receiving objects: 100% (137/137), 54.75 MiB | 33.75 MiB/s, done.
Resolving deltas: 100% (24/24), done.
/content/BlendGAN
--2023-11-21 02:27:14--  https://github.com/ninja-build/ninja/releases/download/v1.8.2/ninja-linux.zip
Resolving github.com (github.com)... 140.82.113.3
Connecting to github.com (github.com)|140.82.113.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://objects.githubusercontent.com/github-production-release-asset-2e65be/1335132/d2f252e2-9801-11e7-9fbf-bc7b4e4b5c83?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20231121%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20231121T022714Z&X-Amz-Expires=300&X-Amz-Signature=0b87e043bde36d5fe2165c7bda79bb0b8a01c727160b9210e600b7864549d1a8&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=1335132&response-content-disposition=attachment%3B%20filename%3Dninja-linux.zip&response-content-type=application%2Foctet-stream [following]
--2023-11-21 02:27:14--  https://objects.githubusercontent.com/github-production-release-asset-2e65be/1335132/d2f252e2-9801-11e7-9fbf-bc7b4e4b5c83?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAIWNJYAX4CSVEH53A%2F20231121%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20231121T022714Z&X-Amz-Expires=300&X-Amz-Signature=0b87e043bde36d5fe2165c7bda79bb0b8a01c727160b9210e600b7864549d1a8&X-Amz-SignedHeaders=host&actor_id=0&key_id=0&repo_id=1335132&response-content-disposition=attachment%3B%20filename%3Dninja-linux.zip&response-content-type=application%2Foctet-stream
Resolving objects.githubusercontent.com (objects.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.111.133, ...
Connecting to objects.githubusercontent.com (objects.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 77854 (76K) [application/octet-stream]
Saving to: ‘ninja-linux.zip’

ninja-linux.zip     100%[===================>]  76.03K  --.-KB/s    in 0.01s   

2023-11-21 02:27:14 (5.40 MB/s) - ‘ninja-linux.zip’ saved [77854/77854]

Archive:  ninja-linux.zip
  inflating: /usr/local/bin/ninja    
update-alternatives: using /usr/local/bin/ninja to provide /usr/bin/ninja (ninja) in auto mode
Requirement already satisfied: gdown in /usr/local/lib/python3.10/dist-packages (4.6.6)
Collecting gdown
  Downloading gdown-4.7.1-py3-none-any.whl (15 kB)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from gdown) (3.13.1)
Requirement already satisfied: requests[socks] in /usr/local/lib/python3.10/dist-packages (from gdown) (2.31.0)
Requirement already satisfied: six in /usr/local/lib/python3.10/dist-packages (from gdown) (1.16.0)
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from gdown) (4.66.1)
Requirement already satisfied: beautifulsoup4 in /usr/local/lib/python3.10/dist-packages (from gdown) (4.11.2)
Requirement already satisfied: soupsieve>1.2 in /usr/local/lib/python3.10/dist-packages (from beautifulsoup4->gdown) (2.5)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests[socks]->gdown) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests[socks]->gdown) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests[socks]->gdown) (2.0.7)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests[socks]->gdown) (2023.7.22)
Requirement already satisfied: PySocks!=1.5.7,>=1.5.6 in /usr/local/lib/python3.10/dist-packages (from requests[socks]->gdown) (1.7.1)
Installing collected packages: gdown
  Attempting uninstall: gdown
    Found existing installation: gdown 4.6.6
    Uninstalling gdown-4.6.6:
      Successfully uninstalled gdown-4.6.6
Successfully installed gdown-4.7.1
Downloading...
From (uriginal): https://drive.google.com/uc?id=1D27HPNOSx9kWIhc13VevRy0pUv_xYiJb
From (redirected): https://drive.google.com/uc?id=1D27HPNOSx9kWIhc13VevRy0pUv_xYiJb&confirm=t&uuid=8203d271-cdfe-498e-91fd-685fad673511
To: /content/BlendGAN/pretrained_models/blendgan.pt
100%|██████████| 3.28G/3.28G [00:43<00:00, 74.9MB/s]
Downloading...
From (uriginal): https://drive.google.com/uc?id=1pWWSm_c75ieMExJPWJuYA1wby-hm4f1J
From (redirected): https://drive.google.com/uc?id=1pWWSm_c75ieMExJPWJuYA1wby-hm4f1J&confirm=t&uuid=89f2bbd2-58fa-4b03-bb25-41289bb47d13
To: /content/BlendGAN/pretrained_models/psp_encoder.pt
100%|██████████| 1.07G/1.07G [00:17<00:00, 62.3MB/s]
Downloading...
From (uriginal): https://drive.google.com/uc?id=1qshfqj8SdmgQv_kfLpiohbI3QPQF-OE5
From (redirected): https://drive.google.com/uc?id=1qshfqj8SdmgQv_kfLpiohbI3QPQF-OE5&confirm=t&uuid=b2e8688c-8e59-43ae-a180-f894dd147b9e
To: /content/BlendGAN/pretrained_models/style_encoder.pt
100%|██████████| 1.33G/1.33G [00:22<00:00, 60.5MB/s]
--2023-11-21 02:28:51--  http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2
Resolving dlib.net (dlib.net)... 107.180.26.78
Connecting to dlib.net (dlib.net)|107.180.26.78|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 64040097 (61M)
Saving to: ‘shape_predictor_68_face_landmarks.dat.bz2’

shape_predictor_68_ 100%[===================>]  61.07M  25.9MB/s    in 2.4s    

2023-11-21 02:28:54 (25.9 MB/s) - ‘shape_predictor_68_face_landmarks.dat.bz2’ saved [64040097/64040097]

After the cell finishes running ②, click the "Files" button in the left sidebar
Confirm that a "pic" ④ folder exists under "BlendGAN" ③
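
The same check can be done in code. This is a minimal sketch that looks for the files and folder the setup cell is expected to produce. The path list is taken from the setup cell above and assumes the current directory is /content/BlendGAN; `find_missing` is a hypothetical helper name.

```python
import os

# Paths produced by the setup cell, relative to /content/BlendGAN.
EXPECTED_PATHS = [
    "pretrained_models/blendgan.pt",
    "pretrained_models/psp_encoder.pt",
    "pretrained_models/style_encoder.pt",
    "shape_predictor_68_face_landmarks.dat",
    "pic",
]

def find_missing(paths, root="."):
    """Return the entries of paths that do not exist under root."""
    return [p for p in paths if not os.path.exists(os.path.join(root, p))]

missing = find_missing(EXPECTED_PATHS)
print("all setup files present" if not missing else f"missing: {missing}")
```

If anything is reported missing, rerunning the setup cell is usually the simplest fix.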

Cropping face images

Cropping face images
・To use your own images, upload them to the "BlendGAN/pic" folder;
  to use your own style images, upload them to the "BlendGAN/test_imgs/style_imgs" folder

・Run the following cell (run time: 2 min)

#@title Crop face images
import os
import shutil
from tqdm import tqdm

# Reset the align folder
if os.path.isdir('align'):
    shutil.rmtree('align')
os.makedirs('align', exist_ok=True)

def run_alignment(image_path):
  # Detect the facial landmarks with dlib and return the aligned face image
  import dlib
  from alignment import align_face
  predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
  aligned_image = align_face(filepath=image_path, predictor=predictor)
  return aligned_image

path = './pic'
files = sorted(os.listdir(path))
for i, file in enumerate(tqdm(files)):
  if file=='.ipynb_checkpoints':
    continue
  input_image = run_alignment(path+'/'+file)
  # resize() returns a new image, so the result must be assigned back
  input_image = input_image.resize((1024,1024))
  input_image.save('./align/'+file)

display_pic('align')
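
The cell above skips only ".ipynb_checkpoints", so any other non-image entry in the "pic" folder would be passed to the alignment step. One way to guard against that is to filter by extension first. This is a sketch only: the extension set and the helper name `list_image_files` are assumptions, not part of the original notebook.

```python
import os

# Assumed set of extensions to treat as images; adjust as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def list_image_files(folder):
    """Return the sorted names of regular files in folder that look like images."""
    names = []
    for name in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, name)
        extension = os.path.splitext(name)[1].lower()
        if os.path.isfile(full_path) and extension in IMAGE_EXTENSIONS:
            names.append(name)
    return names

if os.path.isdir("./pic"):
    print(list_image_files("./pic"))
```

The returned list could replace `sorted(os.listdir(path))` in the loop, which also makes the explicit ".ipynb_checkpoints" check unnecessary.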

Generating the video

Specifying the image file

・Run the following cell (run time: 2 min)

#@title Specify the image file
import os
import shutil

# Named input_name so that it does not shadow the built-in input()
input_name = "66.jpg"#@param {type:"string"}
file = './align/'+input_name

# Reset the original_imgs folder
if os.path.isdir('test_imgs/original_imgs'):
    shutil.rmtree('test_imgs/original_imgs')
os.makedirs('test_imgs/original_imgs', exist_ok=True)

# Copy the file into the original_imgs folder
shutil.copy(file, 'test_imgs/original_imgs/'+input_name)
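
If the file name entered in the form does not exist in "align", the copy fails only after the destination folder has been cleared. Checking first gives a clearer error. This is a sketch under the same folder layout as the cell above; `stage_input` is a hypothetical helper name.

```python
import os
import shutil

def stage_input(name, src_dir="align", dst_dir="test_imgs/original_imgs"):
    """Copy one aligned image into an emptied destination folder.

    Raises FileNotFoundError (listing what is available) before touching
    the destination if the requested file is missing.
    """
    src = os.path.join(src_dir, name)
    if not os.path.isfile(src):
        available = sorted(os.listdir(src_dir)) if os.path.isdir(src_dir) else []
        raise FileNotFoundError(f"{src} not found; available: {available}")
    # Start from an empty destination so only this image is processed.
    shutil.rmtree(dst_dir, ignore_errors=True)
    os.makedirs(dst_dir)
    return shutil.copy(src, os.path.join(dst_dir, name))

# Example call (commented out so the sketch has no side effects):
# stage_input("66.jpg")
```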

Running style transfer and creating the video
・Run the following cell (run time: 2 min)

#@title Run style transfer and create the video
import glob
import os
import shutil

import cv2

# Reset the style_transfer folder
if os.path.isdir('results/style_transfer'):
    shutil.rmtree('results/style_transfer')

! python style_transfer_folder.py --size 1024 --ckpt ./pretrained_models/blendgan.pt --psp_encoder_ckpt ./pretrained_models/psp_encoder.pt --style_img_path ./test_imgs/style_imgs/ --input_img_path ./test_imgs/original_imgs/ --outdir results/style_transfer/

# Reset the images folder
if os.path.isdir('results/images'):
    shutil.rmtree('results/images')
os.makedirs('results/images', exist_ok=True)

# Remove any previous output.mp4
if os.path.exists('./output.mp4'):
    os.remove('./output.mp4')

# Resize the images and save them under sequential, zero-padded names
files = sorted(glob.glob('results/style_transfer/*.jpg'))
for i, file in enumerate(files):
    img = cv2.imread(file)
    img_resize = cv2.resize(img, dsize=(1536, 512))
    cv2.imwrite('results/images/'+str(i).zfill(3)+'.jpg', img_resize)

# Convert the images to a video
!ffmpeg -r 0.6 -i results/images/%3d.jpg -vcodec libx264 -pix_fmt yuv420p output.mp4

▼ Log (Google Colab, Tesla T4)

ckpt:  ./pretrained_models/blendgan.pt
/usr/local/lib/python3.10/dist-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG19_Weights.IMAGENET1K_V1`. You can also use `weights=VGG19_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Loading psp encoders weights from irse50!
0
Done!
ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 70.100 / 56. 70.100
  libavcodec     58.134.100 / 58.134.100
  libavformat    58. 76.100 / 58. 76.100
  libavdevice    58. 13.100 / 58. 13.100
  libavfilter     7.110.100 /  7.110.100
  libswscale      5.  9.100 /  5.  9.100
  libswresample   3.  9.100 /  3.  9.100
  libpostproc    55.  9.100 / 55.  9.100
Input #0, image2, from 'results/images/%3d.jpg':
  Duration: 00:00:01.20, start: 0.000000, bitrate: N/A
  Stream #0:0: Video: mjpeg (Baseline), yuvj420p(pc, bt470bg/unknown/unknown), 1536x512 [SAR 1:1 DAR 3:1], 25 fps, 25 tbr, 25 tbn, 25 tbc
Stream mapping:
  Stream #0:0 -> #0:0 (mjpeg (native) -> h264 (libx264))
Press [q] to stop, [?] for help
[swscaler @ 0x59bf09c32280] deprecated pixel format used, make sure you did set range correctly
[libx264 @ 0x59bf09b7c6c0] using SAR=1/1
[libx264 @ 0x59bf09b7c6c0] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2 AVX512
[libx264 @ 0x59bf09b7c6c0] profile High, level 3.1, 4:2:0, 8-bit
[libx264 @ 0x59bf09b7c6c0] 264 - core 163 r3060 5db6aa6 - H.264/MPEG-4 AVC codec - Copyleft 2003-2021 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=3 lookahead_threads=1 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=1 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'output.mp4':
  Metadata:
    encoder         : Lavf58.76.100
  Stream #0:0: Video: h264 (avc1 / 0x31637661), yuv420p(tv, bt470bg/unknown/unknown, progressive), 1536x512 [SAR 1:1 DAR 3:1], q=2-31, 0.60 fps, 12288 tbn
    Metadata:
      encoder         : Lavc58.134.100 libx264
    Side data:
      cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
frame=   30 fps=7.8 q=-1.0 Lsize=    2742kB time=00:00:45.00 bitrate= 499.1kbits/s speed=11.7x    
video:2741kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.038767%
[libx264 @ 0x59bf09b7c6c0] frame I:1     Avg QP:13.54  size:179294
[libx264 @ 0x59bf09b7c6c0] frame P:27    Avg QP:14.48  size: 87391
[libx264 @ 0x59bf09b7c6c0] frame B:2     Avg QP:16.41  size:133490
[libx264 @ 0x59bf09b7c6c0] consecutive B-frames: 86.7% 13.3%  0.0%  0.0%
[libx264 @ 0x59bf09b7c6c0] mb I  I16..4:  3.7% 80.5% 15.8%
[libx264 @ 0x59bf09b7c6c0] mb P  I16..4:  2.0% 56.2%  8.2%  P16..4:  0.9%  0.2%  0.1%  0.0%  0.0%    skip:32.4%
[libx264 @ 0x59bf09b7c6c0] mb B  I16..4:  0.5% 44.4% 17.2%  B16..8:  1.7%  1.5%  0.6%  direct: 1.4%  skip:32.7%  L0:38.5% L1:44.1% BI:17.4%
[libx264 @ 0x59bf09b7c6c0] 8x8 transform intra:83.6% inter:82.7%
[libx264 @ 0x59bf09b7c6c0] coded y,uvDC,uvAC intra: 93.8% 91.7% 82.0% inter: 2.6% 3.2% 2.3%
[libx264 @ 0x59bf09b7c6c0] i16 v,h,dc,p: 35%  6%  7% 52%
[libx264 @ 0x59bf09b7c6c0] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 24% 11% 13%  8%  7% 11%  6% 14%  8%
[libx264 @ 0x59bf09b7c6c0] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 20% 11%  8%  9% 12% 13%  9% 12%  7%
[libx264 @ 0x59bf09b7c6c0] i8c dc,h,v,p: 44% 15% 27% 13%
[libx264 @ 0x59bf09b7c6c0] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 0x59bf09b7c6c0] ref P L0: 52.8%  6.8% 25.3% 15.1%
[libx264 @ 0x59bf09b7c6c0] ref B L0: 84.4% 15.6%
[libx264 @ 0x59bf09b7c6c0] kb/s:448.93
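
The log reports 30 frames and an output stream at 0.60 fps, so each styled image stays on screen for a little under two seconds. The arithmetic, together with the zero-padded naming used for the frame files, can be sketched as follows. Both helper names are hypothetical, and the duration is a simple estimate rather than a value read from ffmpeg.

```python
def frame_filename(index, digits=3):
    """Name a frame the way the resize loop does: zero-padded index plus .jpg."""
    return f"{index:0{digits}d}.jpg"

def slideshow_seconds(n_frames, fps):
    """Estimate the slideshow length: every frame is shown for 1 / fps seconds."""
    return n_frames / fps

print(frame_filename(7))              # 007.jpg
print(slideshow_seconds(1, 0.6))      # seconds per frame at 0.6 fps
print(slideshow_seconds(30, 0.6))     # estimated total for 30 frames
```

Raising the value passed to `-r` shortens the time each style is displayed; lowering it lengthens it.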

Playing the video
・Run the following cell

#@title Play the video
from IPython.display import HTML
from base64 import b64encode

# Read the video and embed it in the page as a base64 data URL
with open('./output.mp4', 'rb') as f:
    mp4 = f.read()
data_url = 'data:video/mp4;base64,' + b64encode(mp4).decode()
HTML(f"""
<video width="100%" height="100%" controls>
      <source src="{data_url}" type="video/mp4">
</video>""")
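
Embedding the video as a base64 data URL makes the notebook output larger than the file itself, because base64 turns every 3 bytes into 4 characters. A small sketch of that size estimate follows; `base64_length` is a hypothetical helper, and the 2742 kB figure is the output size reported in the ffmpeg log above.

```python
import base64

def base64_length(n_bytes):
    """Length of the base64 text for n_bytes of input: 4 characters per 3 bytes, padded."""
    return 4 * ((n_bytes + 2) // 3)

# Check the formula against the standard library on a small payload.
payload = b"x" * 10
print(base64_length(len(payload)), len(base64.b64encode(payload)))

# Estimate for the video size reported by ffmpeg (2742 kB, 1 kB = 1024 bytes).
print(base64_length(2742 * 1024))
```

For this video the embedded text comes to roughly 3.7 MB, which is fine for a short clip; much larger files are better played from a saved file than inlined.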

Ending the session and running again after reconnecting

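When you finish editing, select "Runtime" → "Disconnect and delete runtime" in Colab
・To keep GPU usage time low, it is advisable to disconnect once all of the work has finished
・Even after disconnecting and deleting the runtime, the execution results in the notebook remain as they are
When reconnecting, run the same steps again, starting from the environment setup above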
