Generative AI Programming 2 == Under Edit ==

Based on the results verified so far, we write generative AI programs in Python.

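Last updated: 2025/06/16

Getting Started with Stable Diffusion Using diffusers (Advanced)

Generating an image from an image: img2img

Reference site: Modifying, compositing, converting, retouching, and upscaling images with diffusers (Stable Diffusion)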

Step 30: The simplest image-to-image generation program

img2img: image-to-image generation

Model type   Base image size   Pipeline class
SD1.5        512x512           StableDiffusionImg2ImgPipeline
SDXL         1024x1024         StableDiffusionXLImg2ImgPipeline
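
The table can be read as a simple lookup. The sketch below is only an illustration: the dictionary `IMG2IMG_DEFAULTS` and the helper `img2img_defaults` are hypothetical names and not part of diffusers. Only the sizes and class names come from the table.

```python
# Hypothetical lookup built from the table above; not part of diffusers.
IMG2IMG_DEFAULTS = {
    "SD1.5": {"size": (512, 512), "pipeline": "StableDiffusionImg2ImgPipeline"},
    "SDXL": {"size": (1024, 1024), "pipeline": "StableDiffusionXLImg2ImgPipeline"},
}

def img2img_defaults(model_type):
    """Return the base image size and pipeline class name for a model type."""
    try:
        return IMG2IMG_DEFAULTS[model_type]
    except KeyError:
        raise ValueError(f"unknown model type: {model_type!r}") from None

for name in IMG2IMG_DEFAULTS:
    info = img2img_defaults(name)
    print(name, info["size"], info["pipeline"])
```

Resizing the source image to the base size of the model family before passing it to the pipeline is a reasonable starting point, although other sizes may also work.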

Convert an anime-style image into a realistic-style one
Example model: beautifulRealistic_brav5

「sd_030.py」 Source image: StableDiffusion_247.png

## sd_030.py  Image-to-image generation (img2img)
## model:   beautifulRealistic_brav5.safetensors

import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline,DPMSolverMultistepScheduler, logging
from translate import Translator

logging.set_verbosity_error()

# Paths to the model file and the source image
model_path = "/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/beautifulRealistic_brav5.safetensors"
image_path = "images/StableDiffusion_247.png"

# Use "cuda" to run on the GPU, "cpu" otherwise
device = 'cuda'

# Seed value
seed = 12345678

# Create the pipeline
pipeline = StableDiffusionImg2ImgPipeline.from_single_file(
                    model_path,
                    torch_dtype = torch.float16,
                    ).to(device)

# Set the scheduler
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)

# Prompt (written in Japanese, machine-translated to English)
trans = Translator('en','ja').translate
prompt_jp = '黒髪で短い髪の女性'
prompt = trans(prompt_jp)
src_image = Image.open(image_path)

# Create the Generator object
generator = torch.Generator(device).manual_seed(seed)

print(f'Seed: {seed}, Model: {model_path}')
print(f'prompt : {prompt_jp} → {prompt}')

# Generate the image
image = pipeline(
                    prompt = prompt,
                    image = src_image,
                    num_inference_steps = 30,
                    guidance_scale = 7,
                    strength = 0.6,
                    generator = generator
                    ).images[0]

image.save("results/image_030.png")

Run the program (run time: about 1 second on an RTX 4070 Ti 12GB)

(sd_test) PS > python sd_030.py

Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 10.30it/s]
Seed: 12345678, Model: /StabilityMatrix/Data/Models/StableDiffusion/SD1.5/beautifulRealistic_brav5.safetensors
prompt : 黒髪で短い髪の女性 → a woman with short black hair
100%|██████████████████████████████████████████| 18/18 [00:01<00:00, 15.78it/s]

The image file "image_030.png" is generated

Results vary greatly with parameter tuning, the prompt, and the model used.
The strength value, which only img2img uses, is especially important.

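Step 31: Adjusting the amount of change (strength)

strength is the parameter that sets how strongly the image is changed.
Its range is 0 to 1 (0 = almost the original image, 1 = the original image is ignored completely).
The number of steps actually run is num_inference_steps × strength.
A small strength is fast: less change also means a shorter generation time.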
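
The step-count rule can be checked with a few lines of plain Python. This is a minimal sketch, assuming the product is truncated to an integer and capped at the requested step count. That assumption agrees with the progress bars in the logs on this page (18 steps for strength 0.6 with 30 requested steps). `effective_steps` is a hypothetical helper, not a diffusers function.

```python
def effective_steps(num_inference_steps, strength):
    """Approximate number of denoising steps img2img actually runs."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Truncate toward zero and never exceed the requested step count.
    return min(int(num_inference_steps * strength), num_inference_steps)

for strength in (0.1, 0.5, 0.6, 1.0):
    print(strength, effective_steps(30, strength))
```

With strength close to 0 almost no steps run, which is why such a result stays close to the source image.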

「sd_031.py」

## sd_031.py  Image-to-image generation: the strength parameter
## model:   beautifulRealistic_brav5.safetensors

import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline,DPMSolverMultistepScheduler, logging
from translate import Translator
import matplotlib.pyplot as plt

logging.set_verbosity_error()

# Image generation
def image_generation(strength):
    # Create the pipeline
    pipeline = StableDiffusionImg2ImgPipeline.from_single_file(
                    model_path,
                    torch_dtype = torch.float16,
                    ).to(device)

    # Set the scheduler
    pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)

    # Create the Generator object
    generator = torch.Generator(device).manual_seed(seed)

    # Generate the image
    img = pipeline(
                    prompt = prompt,
                    image = src_image,
                    num_inference_steps = 30,
                    guidance_scale = 7,
                    strength = strength,
                    generator = generator
                    ).images[0]
    return img

# Paths to the model file and the source image
model_path = "/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/beautifulRealistic_brav5.safetensors"
image_path = "images/StableDiffusion_247.png"

# Use "cuda" to run on the GPU, "cpu" otherwise
device = 'cuda'

# Seed value
seed = 12345678

# Prompt (written in Japanese, machine-translated to English)
trans = Translator('en','ja').translate
prompt_jp = '黒髪で短い髪の女性'
#prompt_jp = 'テラスでコーヒーを飲む金髪の女性'
prompt = trans(prompt_jp)
src_image = Image.open(image_path)

print(f'Seed: {seed}, Model: {model_path}')
print(f'prompt : {prompt_jp} → {prompt}')

# Generate one image per strength value (0.1 to 1.0)
plt.figure(figsize = [6, 15.5], dpi = 100)
for i in range(10):
    strength = 0.1 + i * 0.1
    img = image_generation(strength)
    plt.subplot(5, 2, i + 1, title = "strength = %.1f" % strength)
    plt.imshow(img)
    plt.axis('off')

    # Free GPU memory
    if device == 'cuda':
        torch.cuda.empty_cache()
    elif device == 'mps':
        torch.mps.empty_cache()

plt.tight_layout()
plt.savefig('results/image_031.png')
plt.close()

Run the program (run time: about 10 seconds on an RTX 4070 Ti 12GB)

(sd_test) PS > python sd_031.py

Seed: 12345678, Model: /StabilityMatrix/Data/Models/StableDiffusion/SD1.5/beautifulRealistic_brav5.safetensors
prompt : 黒髪で短い髪の女性 → a woman with short black hair
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 15.31it/s]
100%|████████████████████████████████████████████| 3/3 [00:00<00:00, 16.70it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 26.95it/s]
100%|████████████████████████████████████████████| 6/6 [00:00<00:00, 26.25it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 34.83it/s]
100%|████████████████████████████████████████████| 9/9 [00:00<00:00, 25.62it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 34.48it/s]
100%|██████████████████████████████████████████| 12/12 [00:00<00:00, 25.21it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 22.46it/s]
100%|██████████████████████████████████████████| 15/15 [00:00<00:00, 24.61it/s]
Fetching 11 files: 100%|████████████████████| 11/11 [00:00<00:00, 11032.36it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 34.50it/s]
100%|██████████████████████████████████████████| 18/18 [00:00<00:00, 24.45it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 29.71it/s]
100%|██████████████████████████████████████████| 21/21 [00:00<00:00, 24.55it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 34.32it/s]
100%|██████████████████████████████████████████| 24/24 [00:00<00:00, 24.17it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 34.67it/s]
100%|██████████████████████████████████████████| 27/27 [00:01<00:00, 24.19it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 23.52it/s]
100%|██████████████████████████████████████████| 30/30 [00:01<00:00, 24.25it/s]

The image file "image_031.png" is generated

Prompt   Japanese input                      Automatic English translation
①        黒髪で短い髪の女性                  a woman with short black hair
②        テラスでコーヒーを飲む金髪の女性    Blonde drinking coffee on the terrace

Step 32: Prompt weight (guidance_scale)

guidance_scale (CFG scale) is the parameter that sets how much weight the prompt carries.
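
As a rough illustration of what guidance_scale does: at each step the model predicts noise once without the prompt and once with it, and classifier-free guidance extrapolates from the first prediction toward the second. The sketch below applies that formula to plain Python lists. `apply_guidance` and the sample values are hypothetical, and a real pipeline performs the same arithmetic on tensors.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Classifier-free guidance: uncond + scale * (text - uncond), element-wise."""
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]

# Made-up noise predictions for two latent values (illustration only).
noise_uncond = [0.0, 1.0]
noise_text = [1.0, 3.0]

for scale in (1.0, 2.0, 7.0):
    print(scale, apply_guidance(noise_uncond, noise_text, scale))
```

A scale of 1 returns the prompt-conditioned prediction unchanged, and larger values push further in the direction of the prompt. To my understanding, diffusers applies this formula only when guidance_scale is greater than 1 and otherwise runs the prompt-conditioned prediction alone, which would explain why the guidance_scale = 0 run in the sd_032.py log is roughly twice as fast per step as the others.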

Test cases:
① Change a seafood rice bowl (kaisendon) into ramen

② Change a seafood rice bowl (kaisendon) into an eel rice bowl

「sd_032.py」

## sd_032.py  Image-to-image generation: prompt weight (guidance_scale)
## model:   beautifulRealistic_brav5.safetensors

import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline,DPMSolverMultistepScheduler, logging
from translate import Translator
import matplotlib.pyplot as plt

logging.set_verbosity_error()

# Image generation
def image_generation(g_scale):
    # Create the pipeline
    pipeline = StableDiffusionImg2ImgPipeline.from_single_file(
                    model_path,
                    torch_dtype = torch.float16,
                    ).to(device)

    # Set the scheduler
    pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)

    # Create the Generator object
    generator = torch.Generator(device).manual_seed(seed)

    # Generate the image
    img = pipeline(
                    prompt = prompt,
                    image = src_image,
                    num_inference_steps = 30,
                    guidance_scale = g_scale,
                    strength = 0.5,
                    generator = generator
                    ).images[0]
    return img

# Paths to the model file and the source image
model_path = "/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/beautifulRealistic_brav5.safetensors"
image_path = "images/kaisendon.jpg"

# Use "cuda" to run on the GPU, "cpu" otherwise
device = 'cuda'

# Seed value
seed = 12345678

# Prompt (written in Japanese, machine-translated to English)
trans = Translator('en','ja').translate
prompt_jp = 'ラーメン'
#prompt_jp = '鰻丼'
prompt = trans(prompt_jp)
src_image = Image.open(image_path)

print(f'Seed: {seed}, Model: {model_path}')
print(f'prompt : {prompt_jp} → {prompt}')

# Generate one image per guidance_scale value (0, 2, ..., 10)
plt.figure(figsize = [6, 9.5], dpi = 100)
for i in range(6):
    img = image_generation(i * 2)
    plt.subplot(3, 2, i + 1, title = 'guidance_scale = %d' % (i * 2))
    plt.imshow(img)
    plt.axis('off')

    # Free GPU memory
    if device == 'cuda':
        torch.cuda.empty_cache()
    elif device == 'mps':
        torch.mps.empty_cache()

plt.tight_layout()
plt.savefig('results/image_032.png')
plt.close()

Run the program (run time: about 18 seconds on an RTX 4070 Ti 12GB)

(sd_test) PS > python sd_032.py

Seed: 12345678, Model: /StabilityMatrix/Data/Models/StableDiffusion/SD1.5/beautifulRealistic_brav5.safetensors
prompt : ラーメン → Ramen
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 14.86it/s]
100%|██████████████████████████████████████████| 15/15 [00:02<00:00,  7.26it/s]
Fetching 11 files: 100%|█████████████████████| 11/11 [00:00<00:00, 8801.48it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 33.02it/s]
100%|██████████████████████████████████████████| 15/15 [00:03<00:00,  3.87it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 33.53it/s]
100%|██████████████████████████████████████████| 15/15 [00:03<00:00,  3.87it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 33.18it/s]
100%|██████████████████████████████████████████| 15/15 [00:03<00:00,  3.86it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 22.71it/s]
100%|██████████████████████████████████████████| 15/15 [00:03<00:00,  3.86it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 33.00it/s]
100%|██████████████████████████████████████████| 15/15 [00:03<00:00,  3.86it/s]

The image file "image_032.png" is generated

Prompt   Japanese input   Automatic English translation
①        ラーメン         Ramen
②        鰻丼             Eel Rice Bowl

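Step 33: [SDXL] Combining models (refiner)

About the refiner
・A feature released together with the SDXL model. It was intended to improve image quality when used alongside the SDXL model, but many reviewers find the difference subtle.
・It can also be used to combine two SDXL models: generation starts with the "base model" and is handed over partway through to a second "refiner model", which continues it.
・SD-XL 1.0-refiner was the recommended refiner model, but any SDXL model can be used as the refiner.

Test case:
・Base model → animexlXuebimix_v60LCM (anime-style model)
・Refiner model → fudukiMix_v20 (realistic-style model)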

「sd_033.py」

## sd_033.py  [SDXL] Combining models (refiner)
## model:   animexlXuebimix_v60LCM.safetensors
##          fudukiMix_v20.safetensors

import torch
from PIL import Image
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline, EulerAncestralDiscreteScheduler, logging
from translate import Translator
import matplotlib.pyplot as plt

logging.set_verbosity_error()

# Paths to the base and refiner model files
model_base_path = "/StabilityMatrix/Data/Models/StableDiffusion/animexlXuebimix_v60LCM.safetensors"
model_ref_path = "/StabilityMatrix/Data/Models/StableDiffusion/fudukiMix_v20.safetensors"

# Use "cuda" to run on the GPU, "cpu" otherwise
device = 'cuda'

# Seed value
seed = 12345678

# Pipeline for the base model
pipe_base = StableDiffusionXLPipeline.from_single_file(
                    model_base_path,
                    torch_dtype = torch.float16
                    ).to(device)

# Set the scheduler
pipe_base.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe_base.scheduler.config)

# Create the Generator object
generator = torch.Generator(device).manual_seed(seed)

# Pipeline for the refiner model
pipe_ref = StableDiffusionXLImg2ImgPipeline.from_single_file(
                    model_ref_path,
                    torch_dtype = torch.float16,
                    scheduler = pipe_base.scheduler             # share the base pipeline's scheduler
                    ).to(device)

# Prompt (written in Japanese, machine-translated to English)
trans = Translator('en','ja').translate
prompt_jp = '猫を抱いている短い髪のの女性'
prompt = trans(prompt_jp)

print(f'Seed: {seed}')
print(f'Model1: {model_base_path}')
print(f'Model2: {model_ref_path}')
print(f'prompt : {prompt_jp} → {prompt}')

# Generate with the base model
img0 = pipe_base(
                    prompt,
                    num_inference_steps = 20,
                    generator = generator,
                    denoising_end = 0.4,                        # stop generation partway through
                    output_type = 'latent'                      # return the result in latent space
                    ).images

# Continue generation with the refiner model
img = pipe_ref(
                    prompt,
                    image = img0,
                    num_inference_steps=20,
                    generator = generator,
                    denoising_start=0.4,                        # resume generation from the same point
                    ).images[0]

img.save('results/image_033.png')

Run the program (run time: about 6 minutes on an RTX 4070 Ti 12GB)

(sd_test) PS > python sd_033.py

Fetching 17 files: 100%|███████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  4.16it/s]
Fetching 17 files: 100%|████████████████████| 17/17 [00:00<00:00, 17009.34it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  6.38it/s]
Seed: 12345678
Model1: /StabilityMatrix/Data/Models/StableDiffusion/animexlXuebimix_v60LCM.safetensors
Model2: /StabilityMatrix/Data/Models/StableDiffusion/fudukiMix_v20.safetensors
prompt : 猫を抱いている短い髪のの女性 → a short-haired woman holding a cat
100%|████████████████████████████████████████████| 8/8 [02:04<00:00, 15.57s/it]
100%|██████████████████████████████████████████| 12/12 [03:47<00:00, 18.99s/it]

The image file "image_033.png" is generated

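Step 34: [SDXL] Combining models (refiner): comparing parameters

About the combination parameters
・Set the base pipeline's output to latent space with output_type = 'latent'.
・Set the switch-over point by passing the same value (0 to 1) as denoising_end to the base pipeline and as denoising_start to the refiner pipeline.
　For example, 0.4 means the base model generates the first 40% and the refiner model continues from there.
　With 20 steps (num_inference_steps), the run is split into 8 and 12 steps.
・A small value gives the refiner model more influence; as the value approaches 1, the result becomes the same as using the base model alone.
・denoising_end and denoising_start have no effect on earlier, non-SDXL models; this technique is specific to SDXL models.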
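
The way the switch-over point divides the steps can be sketched in plain Python. This is an approximation under the assumption that the split is simply the rounded product; the exact counts in diffusers depend on the scheduler's timesteps. `split_steps` is a hypothetical helper. For 20 steps its output agrees with the progress bars in the sd_033.py and sd_034.py logs.

```python
def split_steps(num_inference_steps, switch_point):
    """Approximate (base_steps, refiner_steps) for a given switch-over point."""
    if not 0.0 <= switch_point <= 1.0:
        raise ValueError("switch_point must be between 0 and 1")
    base_steps = round(num_inference_steps * switch_point)
    return base_steps, num_inference_steps - base_steps

for i in range(8):
    sep = 0.1 + 0.1 * i  # same values as the loop in sd_034.py
    print(f"{sep:.1f}", split_steps(20, sep))
```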

「sd_034.py」

## sd_034.py  [SDXL] Combining models (refiner) 2: parameter comparison
## model:   animexlXuebimix_v60LCM.safetensors
##          fudukiMix_v20.safetensors

import torch
from PIL import Image
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline, EulerAncestralDiscreteScheduler, logging
from translate import Translator
import matplotlib.pyplot as plt

logging.set_verbosity_error()

# Image generation
def image_generation(sep):
    # Pipeline for the base model
    pipe_base = StableDiffusionXLPipeline.from_single_file(
                    model_base_path,
                    torch_dtype = torch.float16
                    ).to(device)

    # Set the scheduler
    pipe_base.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe_base.scheduler.config)

    # Pipeline for the refiner model
    pipe_ref = StableDiffusionXLImg2ImgPipeline.from_single_file(
                    model_ref_path,
                    torch_dtype = torch.float16,
                    scheduler = pipe_base.scheduler             # share the base pipeline's scheduler
                    ).to(device)

    # Create the Generator object
    generator = torch.Generator(device).manual_seed(seed)

    # Generate with the base model
    img0 = pipe_base(
                    prompt,
                    num_inference_steps = 20,
                    generator = generator,
                    denoising_end = sep,                        # stop generation partway through
                    output_type = 'latent'                      # return the result in latent space
                    ).images

    # Continue generation with the refiner model
    img = pipe_ref(
                    prompt,
                    image = img0,
                    num_inference_steps = 20,
                    generator = generator,
                    denoising_start = sep,                      # resume generation from the same point
                    ).images[0]

    return img

# Paths to the base and refiner model files
model_base_path = "/StabilityMatrix/Data/Models/StableDiffusion/animexlXuebimix_v60LCM.safetensors"
model_ref_path = "/StabilityMatrix/Data/Models/StableDiffusion/fudukiMix_v20.safetensors"

# Use "cuda" to run on the GPU, "cpu" otherwise
device = 'cuda'

# Seed value
seed = 12345678

# Prompt (written in Japanese, machine-translated to English)
trans = Translator('en','ja').translate
prompt_jp = '庭で兎と遊んでいる女性'
prompt = trans(prompt_jp)

print(f'Seed: {seed}')
print(f'Model1: {model_base_path}')
print(f'Model2: {model_ref_path}')
print(f'prompt : {prompt_jp} → {prompt}')

# Generate one image per switch-over point (0.1 to 0.8)
plt.figure(figsize = [6, 12.5], dpi = 100)
for i in range(8):
    sep = 0.1 + 0.1 * i
    img = image_generation(sep)
    plt.subplot(4, 2, i + 1, title = '%.1f' % sep)
    plt.imshow(img)
    plt.axis('off')

    # Free GPU memory
    if device == 'cuda':
        torch.cuda.empty_cache()
    elif device == 'mps':
        torch.mps.empty_cache()

plt.tight_layout(pad = 0.5)
plt.savefig('results/image_034.png')
plt.close()

Run the program (run time: about 50 minutes on an RTX 4070 Ti 12GB)

(sd_test) PS > python sd_034.py

Seed: 12345678
Model1: /StabilityMatrix/Data/Models/StableDiffusion/animexlXuebimix_v60LCM.safetensors
Model2: /StabilityMatrix/Data/Models/StableDiffusion/fudukiMix_v20.safetensors
prompt : 庭で兎と遊んでいる女性 → Woman playing with a rabbit in the garden
Fetching 17 files: 100%|███████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:00<00:00,  8.33it/s]
Fetching 17 files: 100%|████████████████████| 17/17 [00:00<00:00, 16989.08it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:00<00:00, 14.65it/s]
100%|████████████████████████████████████████████| 2/2 [00:22<00:00, 11.01s/it]
100%|██████████████████████████████████████████| 18/18 [05:08<00:00, 17.14s/it]
Fetching 17 files: 100%|███████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:00<00:00, 14.60it/s]
Fetching 17 files: 100%|████████████████████| 17/17 [00:00<00:00, 17021.52it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:00<00:00, 11.85it/s]
100%|████████████████████████████████████████████| 4/4 [00:52<00:00, 13.06s/it]
100%|██████████████████████████████████████████| 16/16 [04:44<00:00, 17.77s/it]
Fetching 17 files: 100%|███████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  6.44it/s]
Fetching 17 files: 100%|████████████████████| 17/17 [00:00<00:00, 15972.93it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  5.16it/s]
100%|████████████████████████████████████████████| 6/6 [01:31<00:00, 15.20s/it]
100%|██████████████████████████████████████████| 14/14 [04:14<00:00, 18.19s/it]
Fetching 17 files: 100%|█████████████████████| 17/17 [00:00<00:00, 5665.73it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  6.50it/s]
Fetching 17 files: 100%|████████████████████| 17/17 [00:00<00:00, 16876.49it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  6.68it/s]
100%|████████████████████████████████████████████| 8/8 [01:52<00:00, 14.07s/it]
100%|██████████████████████████████████████████| 12/12 [03:31<00:00, 17.66s/it]
Fetching 17 files: 100%|███████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  5.73it/s]
Fetching 17 files: 100%|█████████████████████| 17/17 [00:00<00:00, 8408.39it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  6.37it/s]
100%|██████████████████████████████████████████| 10/10 [02:25<00:00, 14.54s/it]
100%|██████████████████████████████████████████| 10/10 [03:42<00:00, 22.20s/it]
Fetching 17 files: 100%|███████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:00<00:00,  7.43it/s]
Fetching 17 files: 100%|███████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  5.48it/s]
100%|██████████████████████████████████████████| 12/12 [02:55<00:00, 14.66s/it]
100%|████████████████████████████████████████████| 8/8 [02:14<00:00, 16.78s/it]
Fetching 17 files: 100%|████████████████████| 17/17 [00:00<00:00, 17058.17it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:00<00:00,  7.51it/s]
Fetching 17 files: 100%|█████████████████████| 17/17 [00:00<00:00, 7249.20it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  6.62it/s]
100%|██████████████████████████████████████████| 14/14 [03:20<00:00, 14.31s/it]
100%|████████████████████████████████████████████| 6/6 [01:38<00:00, 16.38s/it]
Fetching 17 files: 100%|█████████████████████| 17/17 [00:00<00:00, 5674.74it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  6.62it/s]
Fetching 17 files: 100%|███████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  6.62it/s]
100%|██████████████████████████████████████████| 16/16 [03:50<00:00, 14.40s/it]
100%|████████████████████████████████████████████| 4/4 [01:02<00:00, 15.63s/it]

The image file "image_034.png" is generated

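Step 35: Converting from latent space (latent)

About latent space
・In diffusers, the image is kept in latent space while it is being generated and is converted to pixel space once generation finishes.
・A latent image is much smaller than its pixel-space counterpart, so it takes less computation. Generating in latent space and converting afterwards therefore speeds up generation.
・With output_type = 'latent', the result is returned in latent space without the final conversion to pixel space.
　When a refiner is used, the base pipeline's output is still mid-generation, so it needs no conversion and is passed on as latents.

Image size in latent space
・The latent is a 4×64×64 image (4 channels of 64×64). The output image is 512×512, so each side of the latent is 1/8 of the output size.
・For this reason, the width and height given for the generated image must be divisible by 8.
・On the right is a latent image forcibly displayed as a picture.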
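
The size relationship can be written out as a small helper. This is a minimal sketch, assuming 4 latent channels and a downscale factor of 8 as stated above. `latent_shape` is a hypothetical function name, not part of diffusers.

```python
def latent_shape(width, height, channels=4, scale=8):
    """Latent shape (channels, height/8, width/8) for an output image size."""
    if width % scale or height % scale:
        raise ValueError(f"width and height must be divisible by {scale}")
    return channels, height // scale, width // scale

print(latent_shape(512, 512))
print(latent_shape(1024, 1024))
```

For 512×512 this gives the same 4×64×64 shape that sd_035.py prints, without the leading batch dimension.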

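VAE decoder
・Converts a latent-space image into a pixel-space image.
・A VAE is embedded in each Stable Diffusion model. When generating with diffusers, the pipeline uses this VAE to convert the result to a pixel image.
・The example below outputs the latents as they are (left image) and then calls the VAE to convert them (right image).

Encoder / decoder
・The VAE decoder converts from latent space to pixel space.
・The VAE encoder converts from pixel space to latent space.
・The img2img pipeline automatically converts an input image to latent space with the VAE encoder, so the input may be given either as latents or as a pixel image.
・Because img2img converts its input to latent space on its own, there is no need to convert it yourself; if you want to, do it as follows: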
```
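# The VAE expects pixel values in [-1, 1] (decoding applies the inverse, x * 0.5 + 0.5), so scale from [0, 255]
imgl2 = torch.from_numpy(np.array(img1)).permute(2, 0, 1)[None].to(device, torch.float16) / 127.5 - 1.0
imgl2 = pipe.vae.encode(imgl2).latent_dist.sample() * pipe.vae.config.scaling_factor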
```

「sd_035.py」

## sd_035.py  Converting from latent space (latent)
## model:   animePastelDream_softBakedVae.safetensors

import torch
from PIL import Image
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler, logging
from translate import Translator
import numpy as np
import matplotlib.pyplot as plt

logging.set_verbosity_error()

# Path to the model file
model_path = "/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/animePastelDream_softBakedVae.safetensors"

# Use "cuda" to run on the GPU, "cpu" otherwise
device = 'cuda'

# Seed value
seed = 12345678

# Create the pipeline
pipeline  = StableDiffusionPipeline.from_single_file(
                    model_path,
                    torch_dtype = torch.float16
                    ).to(device)

# Set the scheduler
pipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(pipeline.scheduler.config)

# Create the Generator object
generator = torch.Generator(device).manual_seed(seed)

# Prompt (written in Japanese, machine-translated to English)
trans = Translator('en','ja').translate
prompt_jp = '庭で兎と遊んでいる女性'
prompt = trans(prompt_jp)

print(f'Seed: {seed}  Model: {model_path}')
print(f'prompt : {prompt_jp} → {prompt}')

# Generate the image and keep it in latent space
img_latent  = pipeline(
                    prompt = prompt,
                    num_inference_steps = 20,
                    generator = generator,
                    output_type='latent'
                    ).images

print(f'latent.shape = {img_latent.shape}')                     # torch.Size([1, 4, 64, 64])

# Save the latents themselves as an image (min-max normalized for display)
imgl = np.float32(img_latent[0].cpu()).transpose(1, 2, 0)
plt.figure(figsize=[6, 6],dpi = 100)
plt.imshow((imgl - imgl.min()) / (imgl.max() - imgl.min()))
plt.tight_layout()
plt.savefig("results/image_035a.png")
plt.close()

# Decode to pixel space with the VAE and save
img1 = pipeline.vae.decode(img_latent / pipeline.vae.config.scaling_factor)
img1 = img1.sample[0].detach().cpu().numpy().transpose(1, 2, 0)
img1 = Image.fromarray(np.uint8(np.clip(img1 * 0.5 + 0.5, 0, 1) * 255))
img1.save("results/image_035.png")

Run the program (run time: about 2 seconds on an RTX 4070 Ti 12GB)

(sd_test) PS > python sd_035.py

Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:01<00:00,  3.12it/s]
Seed: 12345678  Model: /StabilityMatrix/Data/Models/StableDiffusion/SD1.5/animePastelDream_softBakedVae.safetensors
prompt : 庭で兎と遊んでいる女性 → Woman playing with a rabbit in the garden
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 15.95it/s]
latent.shape = torch.Size([1, 4, 64, 64])

The image file "image_035.png" is generated
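
The last line of sd_035.py turns the VAE decoder output into 8-bit pixels. The sketch below shows that arithmetic on a single value in plain Python. `to_uint8` is a hypothetical helper, and it assumes, as the script does, that the decoder output lies roughly in [-1, 1].

```python
def to_uint8(value):
    """Map one decoder output value from [-1, 1] to an 8-bit pixel value."""
    scaled = value * 0.5 + 0.5            # [-1, 1] -> [0, 1]
    clipped = min(max(scaled, 0.0), 1.0)  # same role as np.clip(..., 0, 1)
    return int(clipped * 255)             # truncates, like np.uint8 on a float

for v in (-1.0, 0.0, 1.0, 1.5):
    print(v, to_uint8(v))
```

Values outside [-1, 1] are clipped rather than wrapped, which is why the script calls np.clip before converting to uint8.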

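Step 36: Upscaling the source image 4× (x4 upscaler)

Upscaling an image cleanly (upscale)
・Enlarging a small image with ordinary image-editing software degrades it, but converting it with a machine-learning model produces a clean enlarged image.
・x4 upscaler is an upscaler model that enlarges an image to 4 times its size.
・This upscaler consumes a lot of memory.
・It did not work with a pre-downloaded model file, so the model is downloaded automatically with .from_pretrained (the download takes time on the first run only).
・Some sources say that giving a prompt improves accuracy, but there was no significant difference, so the prompt is left empty.

Source image 128x128 pixels → 512x512 pixels

Source image 256x256 pixels → 1024x1024 pixels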
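
The output sizes above follow directly from the factor of 4. A minimal sketch, where `upscaled_size` is a hypothetical helper: it also reports how many times the pixel count grows, which is one plausible reason the upscaler is memory-hungry (actual memory use depends on the model and was not measured here).

```python
def upscaled_size(width, height, factor=4):
    """Output size and pixel-count ratio for an integer upscale factor."""
    out_w, out_h = width * factor, height * factor
    pixel_ratio = (out_w * out_h) // (width * height)
    return (out_w, out_h), pixel_ratio

for size in ((128, 128), (256, 256)):
    print(size, "->", upscaled_size(*size))
```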

「sd_036.py」

## sd_036.py  Upscaling the source image 4× (x4 upscaler)
## model:   stabilityai/stable-diffusion-x4-upscaler

import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline, logging

logging.set_verbosity_error()

# Model ID (downloaded by from_pretrained) and source image path
model_path = "stabilityai/stable-diffusion-x4-upscaler"
#image_path = "images/uptest_128x128.png"
image_path = "images/uptest_256x256.png"

# Use "cuda" to run on the GPU, "cpu" otherwise
device = 'cuda'

# Seed value
seed = 12345678

# Create the pipeline
pipeline = StableDiffusionUpscalePipeline.from_pretrained(
                    model_path,
                    torch_dtype = torch.float16,
                    ).to(device)

# Prompt (left empty)
prompt = ''

# Load the source image
src_image = Image.open(image_path)

# Create the Generator object
generator = torch.Generator(device).manual_seed(seed)

print(f'Seed: {seed}, Model: {model_path}')
print(f'prompt : {prompt}')

# Generate the image
image = pipeline(
                    prompt = prompt,
                    image = src_image,
                    num_inference_steps = 20,
                    generator = generator
                    ).images[0]

image.save("results/image_036.png")

Run the program (run time: about 5 seconds on an RTX 4070 Ti 12GB)

(sd_test) PS > python sd_036.py

Loading pipeline components...: 100%|████████████| 6/6 [00:01<00:00,  3.95it/s]
Seed: 12345678, Model: stabilityai/stable-diffusion-x4-upscaler
prompt :
100%|██████████████████████████████████████████| 20/20 [00:04<00:00,  4.71it/s]

The image file "image_036.png" is generated

Step 37: Upscaling 2× in latent space (x2 latent upscaler)

「sd_037.py」

## sd_037.py  Upscaling 2× in latent space (x2 latent upscaler)

import torch
from diffusers import StableDiffusionPipeline, StableDiffusionLatentUpscalePipeline, logging
from translate import Translator

logging.set_verbosity_error()

# Path to the model file
model_path = "/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/v1-5-pruned-emaonly.safetensors"

# Use "cuda" to run on the GPU, "cpu" otherwise
device = 'cuda'

# Seed value
seed = 12345678

# Create the pipeline
pipeline = StableDiffusionPipeline.from_single_file(model_path).to(device)

# Second pipeline (x2 latent upscaler)
pipeline_x2 = StableDiffusionLatentUpscalePipeline.from_pretrained(
                    'stabilityai/sd-x2-latent-upscaler',
                    torch_dtype=torch.float16,
                    ).to(device)

# Prompt (written in Japanese, machine-translated to English)
trans = Translator('en','ja').translate
prompt_jp = '満開の蘭'
prompt = trans(prompt_jp)

# Create the Generator object
generator = torch.Generator(device).manual_seed(seed)

print(f'Seed: {seed}, Model: {model_path}')
print(f'prompt : {prompt_jp} → {prompt}')

# Generate the image in latent space, then upscale the latents 2×
img0 = pipeline(
                    prompt=prompt,
                    num_inference_steps = 20,
                    generator = generator,
                    output_type = 'latent'
                    ).images

image = pipeline_x2(
                    '',
                    img0,
                    num_inference_steps=20,
                    ).images[0]

image.save("results/sd_037.png")

# Save the intermediate image (before upscaling)
from PIL import Image
import numpy as np

img1 = pipeline.vae.decode(img0 / pipeline.vae.config.scaling_factor)
img1 = img1.sample[0].detach().cpu().numpy().transpose(1, 2, 0)
img1 = np.uint8(np.clip(img1 * 0.5 + 0.5, 0,1) * 255)
Image.fromarray(img1).save('results/sd_037_512.png')

Run the program (run time: about 4 seconds on an RTX 4070 Ti 12GB). The 512x512 image generated along the way is saved as "sd_037_512.png".

(sd_test) PS > python sd_037.py

Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00,  8.84it/s]
Loading pipeline components...: 100%|████████████| 5/5 [00:01<00:00,  4.24it/s]
Seed: 12345678, Model: /StabilityMatrix/Data/Models/StableDiffusion/SD1.5/v1-5-pruned-emaonly.safetensors
prompt : 満開の蘭 → Orchid in full bloom
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  7.19it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 12.08it/s]

The image file "sd_037.png" is generated

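Notes

Change history

2025/06/15 First version

References

Stable Diffusion

Books and other sources
- Nikkei Software (日経ソフトウエア), July 2025 issue: 「ローカル生成AIプログラミング」 (Local Generative AI Programming)
- Interface, March 2025 issue: 「画像による異常検出＆ローカルLLM作り - 仕事のための生成AI」 (Image-Based Anomaly Detection & Building a Local LLM: Generative AI for Work)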