Based on the results verified so far, we now write generative AI programs in Python.

Generating images from images (img2img)

Reference site: "Image modification / compositing / conversion / correction / upscaling with diffusers (Stable Diffusion)"
```
(base) PS > conda activate sd_test
(sd_test) PS > cd workspace_3/sd_test
```
Measured execution times (mm:ss, or hh:mm:ss for longer runs):

| Step | Program | Script | RTX 4070Ti (GPU) | RTX 4060 (GPU) | RTX 4060L (GPU) | RTX 3050 (GPU) | GTX 1050 (GPU) | i7-1260P (CPU) |
|------|---------|--------|------------------|----------------|-----------------|----------------|----------------|----------------|
| 30 | Simplest image-to-image generation | sd_030.py | 00:01 | 00:01 | 00:05 | 00:03 | 00:19 | × |
| 31 | Adjusting the strength of the change (strength) | sd_031.py | 00:10 | 00:12 | 00:15 | 00:28 | 02:49 | × |
| 32 | Prompt weight (guidance_scale) | sd_032.py | 00:18 | 00:53 | 01:06 | 02:45 | 14:01 | × |
| 33 | [SDXL] Combining models (refiner) | sd_033.py | 06:00 | 07:03 | 08:21 | 12:20 | 26:46 | × |
| 34 | [SDXL] Combining models (refiner), parameter comparison | sd_034.py | 50:00 | 01:06:59 | 56:26 | 01:25:33 | 02:54:54 | × |
| 35 | Converting the latent space (latent) | sd_035.py | 00:02 | 00:02 | 00:27 | 00:29 | 00:41 | × |
| 36 | Upscaling the source image 4x (x4 upscaler) | sd_036.py | 00:05 | 02:07 | 03:53 | 02:07 | 02:51 | × |
| 37 | 2x upscaling in latent space (x2 latent upscaler) | sd_037.py | 00:04 | 00:07 | 02:42 | 01:35 | 13:17 | × |
| 38 | Modifying only a specific region (inpaint) | sd_038.py | 00:01 | 00:55 | 03:05 | 02:42 | 02:45 | × |
| Model type | Base image size | Pipeline class |
|------------|-----------------|----------------|
| SD1.5 | 512x512 | StableDiffusionImg2ImgPipeline |
| SDXL | 1024x1024 | StableDiffusionXLImg2ImgPipeline |
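The img2img pipeline class has to match the model family in the table above. A minimal sketch of selecting it by model type (the helper function and its mapping are illustrative, not part of the scripts below):

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline, StableDiffusionXLImg2ImgPipeline

# Illustrative mapping from model family to img2img pipeline class.
IMG2IMG_PIPELINES = {
    'SD1.5': StableDiffusionImg2ImgPipeline,    # base size 512x512
    'SDXL':  StableDiffusionXLImg2ImgPipeline,  # base size 1024x1024
}

def make_img2img_pipeline(model_type, model_path, device):
    cls = IMG2IMG_PIPELINES[model_type]
    return cls.from_single_file(model_path, torch_dtype=torch.float16).to(device)
```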
```python
## sd_030.py  Image-to-image generation (img2img)
## model: beautifulRealistic_brav5.safetensors
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline, DPMSolverMultistepScheduler, logging
from translate import Translator
import sd_tools as sdt

logging.set_verbosity_error()

# Path to the model file
model_path = "/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/beautifulRealistic_brav5.safetensors"
image_path = "images/StableDiffusion_247.png"
# "cuda" to use the GPU, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'
# seed value
seed = 12345678

# Create the pipeline
pipeline = StableDiffusionImg2ImgPipeline.from_single_file(
    model_path,
    torch_dtype = torch.float16,
).to(device)
# Scheduler setup
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)

# Prompt (translated from Japanese to English at run time)
trans = Translator('en', 'ja').translate
prompt_jp = '黒髪で短い髪の女性'
prompt = trans(prompt_jp)

src_image = Image.open(image_path)

# Create the Generator object
generator = torch.Generator(device).manual_seed(seed)

print(f'Seed: {seed}, Model: {model_path}')
print(f'prompt : {prompt_jp} → {prompt}')

# Generate the image
image = pipeline(
    prompt = prompt,
    image = src_image,
    num_inference_steps = 30,
    guidance_scale = 7,
    strength = 0.6,
    generator = generator
).images[0]

#image.save("results/image_030.png")
save_path = 'results/image_030.png'
sdt.image_save2(image, save_path, save_path)
```
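All the scripts in this section rely on the local helper module sd_tools (sdt.image_save2 / sdt.image_disp), whose source is not shown here. If you only need the scripts to run, a minimal stand-in could look like the sketch below; its behavior is an assumption inferred from how the helpers are called, not the real module:

```python
# sd_tools.py -- hypothetical minimal stand-in for the local helper module.
# Assumption: image_save2(img, save_path, disp_path) saves the image and then
# displays it; image_disp(path, disp_path) displays an image file. The real
# module presumably uses disp_path for its display window.
from PIL import Image

def image_save2(img, save_path, disp_path):
    img.save(save_path)          # write the generated image to disk
    image_disp(save_path, disp_path)

def image_disp(path, disp_path):
    Image.open(path).show()      # open in the default viewer
```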
```
(sd_test) PS > python sd_030.py
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 10.30it/s]
Seed: 12345678, Model: /StabilityMatrix/Data/Models/StableDiffusion/SD1.5/beautifulRealistic_brav5.safetensors
prompt : 黒髪で短い髪の女性 → a woman with short black hair
100%|██████████████████████████████████████████| 18/18 [00:01<00:00, 15.78it/s]
```
```python
## sd_031.py  Image-to-image generation: the strength parameter
## model: beautifulRealistic_brav5.safetensors
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline, DPMSolverMultistepScheduler, logging
from translate import Translator
import matplotlib.pyplot as plt
import sd_tools as sdt

logging.set_verbosity_error()

# Image generation
def image_generation(strength):
    # Create the pipeline
    pipeline = StableDiffusionImg2ImgPipeline.from_single_file(
        model_path,
        torch_dtype = torch.float16,
    ).to(device)
    # Scheduler setup
    pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
    # Create the Generator object
    generator = torch.Generator(device).manual_seed(seed)
    # Generate the image
    img = pipeline(
        prompt = prompt,
        image = src_image,
        num_inference_steps = 30,
        guidance_scale = 7,
        strength = strength,
        generator = generator
    ).images[0]
    return img

# Path to the model file
model_path = "/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/beautifulRealistic_brav5.safetensors"
image_path = "images/StableDiffusion_247.png"
# "cuda" to use the GPU, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'
# seed value
seed = 12345678

# Prompt (translated from Japanese to English at run time)
trans = Translator('en', 'ja').translate
prompt_jp = '黒髪で短い髪の女性'
#prompt_jp = 'テラスでコーヒーを飲む金髪の女性'
prompt = trans(prompt_jp)

src_image = Image.open(image_path)

print(f'Seed: {seed}, Model: {model_path}')
print(f'prompt : {prompt_jp} → {prompt}')

# Generate a series of images
plt.figure(figsize = [6, 15.5], dpi = 100)
for i in range(10):
    strength = 0.1 + i * 0.1
    img = image_generation(strength)
    plt.subplot(5, 2, i + 1, title = "strength = %.1f" % strength)
    plt.imshow(img)
    plt.axis('off')
    # Release GPU memory
    if device == 'cuda':
        torch.cuda.empty_cache()
    elif device == 'mps':
        torch.mps.empty_cache()
plt.tight_layout()

save_path = 'results/image_031.png'
plt.savefig(save_path)
plt.close()
sdt.image_disp(save_path, save_path)
```
```
(sd_test) PS > python sd_031.py
Seed: 12345678, Model: /StabilityMatrix/Data/Models/StableDiffusion/SD1.5/beautifulRealistic_brav5.safetensors
prompt : 黒髪で短い髪の女性 → a woman with short black hair
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 15.31it/s]
100%|████████████████████████████████████████████| 3/3 [00:00<00:00, 16.70it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 26.95it/s]
100%|████████████████████████████████████████████| 6/6 [00:00<00:00, 26.25it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 34.83it/s]
100%|████████████████████████████████████████████| 9/9 [00:00<00:00, 25.62it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 34.48it/s]
100%|██████████████████████████████████████████| 12/12 [00:00<00:00, 25.21it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 22.46it/s]
100%|██████████████████████████████████████████| 15/15 [00:00<00:00, 24.61it/s]
Fetching 11 files: 100%|████████████████████| 11/11 [00:00<00:00, 11032.36it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 34.50it/s]
100%|██████████████████████████████████████████| 18/18 [00:00<00:00, 24.45it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 29.71it/s]
100%|██████████████████████████████████████████| 21/21 [00:00<00:00, 24.55it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 34.32it/s]
100%|██████████████████████████████████████████| 24/24 [00:00<00:00, 24.17it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 34.67it/s]
100%|██████████████████████████████████████████| 27/27 [00:01<00:00, 24.19it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 23.52it/s]
100%|██████████████████████████████████████████| 30/30 [00:01<00:00, 24.25it/s]
```
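The progress bars make the meaning of strength visible: img2img adds strength-proportional noise to the source image and then denoises, so it only runs about strength × num_inference_steps of the 30 requested steps (3, 6, ..., 30 above, and 18 in the sd_030.py run with strength = 0.6). A quick check of the relation diffusers uses internally:

```python
# Effective img2img step count (mirrors diffusers' get_timesteps logic).
def effective_steps(num_inference_steps, strength):
    return min(int(num_inference_steps * strength), num_inference_steps)

for i in range(10):
    s = 0.1 + i * 0.1
    print(f'strength = {s:.1f} -> {effective_steps(30, s)} steps')
```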
| Prompt | Japanese input | Automatic English translation |
|--------|----------------|-------------------------------|
| ① | 黒髪で短い髪の女性 | a woman with short black hair |
| ② | テラスでコーヒーを飲む金髪の女性 | Blonde drinking coffee on the terrace |
```python
## sd_032.py  Image-to-image generation: prompt weight (guidance_scale)
## model: beautifulRealistic_brav5.safetensors
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline, DPMSolverMultistepScheduler, logging
from translate import Translator
import matplotlib.pyplot as plt
import sd_tools as sdt

logging.set_verbosity_error()

# Image generation
def image_generation(g_scale):
    # Create the pipeline
    pipeline = StableDiffusionImg2ImgPipeline.from_single_file(
        model_path,
        torch_dtype = torch.float16,
    ).to(device)
    # Scheduler setup
    pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
    # Create the Generator object
    generator = torch.Generator(device).manual_seed(seed)
    # Generate the image
    img = pipeline(
        prompt = prompt,
        image = src_image,
        num_inference_steps = 30,
        guidance_scale = g_scale,
        strength = 0.5,
        generator = generator
    ).images[0]
    return img

# Path to the model file
model_path = "/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/beautifulRealistic_brav5.safetensors"
image_path = "images/kaisendon.jpg"
# "cuda" to use the GPU, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'
# seed value
seed = 12345678

# Prompt (translated from Japanese to English at run time)
trans = Translator('en', 'ja').translate
prompt_jp = 'ラーメン'
#prompt_jp = '鰻丼'
prompt = trans(prompt_jp)

src_image = Image.open(image_path)

print(f'Seed: {seed}, Model: {model_path}')
print(f'prompt : {prompt_jp} → {prompt}')

# Generate a series of images
plt.figure(figsize = [6, 9.5], dpi = 100)
for i in range(6):
    img = image_generation(i * 2)
    plt.subplot(3, 2, i + 1, title = 'guidance_scale = %d' % (i * 2))
    plt.imshow(img)
    plt.axis('off')
    # Release GPU memory
    if device == 'cuda':
        torch.cuda.empty_cache()
    elif device == 'mps':
        torch.mps.empty_cache()
plt.tight_layout()

save_path = 'results/image_032.png'
plt.savefig(save_path)
plt.close()
sdt.image_disp(save_path, save_path)
```
```
(sd_test) PS > python sd_032.py
Seed: 12345678, Model: /StabilityMatrix/Data/Models/StableDiffusion/SD1.5/beautifulRealistic_brav5.safetensors
prompt : ラーメン → Ramen
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 14.86it/s]
100%|██████████████████████████████████████████| 15/15 [00:02<00:00, 7.26it/s]
Fetching 11 files: 100%|█████████████████████| 11/11 [00:00<00:00, 8801.48it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 33.02it/s]
100%|██████████████████████████████████████████| 15/15 [00:03<00:00, 3.87it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 33.53it/s]
100%|██████████████████████████████████████████| 15/15 [00:03<00:00, 3.87it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 33.18it/s]
100%|██████████████████████████████████████████| 15/15 [00:03<00:00, 3.86it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 22.71it/s]
100%|██████████████████████████████████████████| 15/15 [00:03<00:00, 3.86it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 33.00it/s]
100%|██████████████████████████████████████████| 15/15 [00:03<00:00, 3.86it/s]
```
| Prompt | Japanese input | Automatic English translation |
|--------|----------------|-------------------------------|
| ① | ラーメン | Ramen |
| ② | 鰻丼 | Eel Rice Bowl |
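guidance_scale controls classifier-free guidance: at each step the model predicts the noise both with and without the prompt and extrapolates between the two, so larger values push the result harder toward the prompt. Schematically, this is the combination diffusers applies internally (for guidance_scale ≤ 1 the pipelines skip the unconditional pass and use the text-conditional prediction as-is):

```python
# Classifier-free guidance, schematically:
# the final noise estimate is the unconditional prediction pushed
# toward the text-conditional one by a factor of guidance_scale.
def apply_cfg(noise_uncond, noise_text, guidance_scale):
    return noise_uncond + guidance_scale * (noise_text - noise_uncond)
```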
```python
## sd_033.py  [SDXL] Combining models (refiner)
## model: animexlXuebimix_v60LCM.safetensors
##        fudukiMix_v20.safetensors
import torch
from PIL import Image
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline, EulerAncestralDiscreteScheduler, logging
from translate import Translator
import matplotlib.pyplot as plt
import sd_tools as sdt

logging.set_verbosity_error()

# Paths to the model files
model_base_path = "/StabilityMatrix/Data/Models/StableDiffusion/animexlXuebimix_v60LCM.safetensors"
model_ref_path = "/StabilityMatrix/Data/Models/StableDiffusion/fudukiMix_v20.safetensors"
# "cuda" to use the GPU, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'
# seed value
seed = 12345678

# Pipeline for the base model
pipe_base = StableDiffusionXLPipeline.from_single_file(
    model_base_path,
    torch_dtype = torch.float16
).to(device)
# Scheduler setup
pipe_base.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe_base.scheduler.config)

# Pipeline for the refiner model
pipe_ref = StableDiffusionXLImg2ImgPipeline.from_single_file(
    model_ref_path,
    torch_dtype = torch.float16,
    scheduler = pipe_base.scheduler   # share the same scheduler
).to(device)

# Create the Generator object
generator = torch.Generator(device).manual_seed(seed)

# Prompt (translated from Japanese to English at run time)
trans = Translator('en', 'ja').translate
prompt_jp = '猫を抱いている短い髪のの女性'
prompt = trans(prompt_jp)

print(f'Seed: {seed}')
print(f'Model1: {model_base_path}')
print(f'Model2: {model_ref_path}')
print(f'prompt : {prompt_jp} → {prompt}')

# Generate with the base model
img0 = pipe_base(
    prompt,
    num_inference_steps = 20,
    generator = generator,
    denoising_end = 0.4,      # stop the generation partway through
    output_type = 'latent'    # return the latent representation
).images

# Continue the generation with the refiner model
image = pipe_ref(
    prompt,
    image = img0,
    num_inference_steps = 20,
    generator = generator,
    denoising_start = 0.4,    # resume the generation from that point
).images[0]

#image.save('results/image_033.png')
save_path = 'results/image_033.png'
sdt.image_save2(image, save_path, save_path)
```
```
(sd_test) PS > python sd_033.py
Fetching 17 files: 100%|███████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00, 4.16it/s]
Fetching 17 files: 100%|████████████████████| 17/17 [00:00<00:00, 17009.34it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00, 6.38it/s]
Seed: 12345678
Model1: /StabilityMatrix/Data/Models/StableDiffusion/animexlXuebimix_v60LCM.safetensors
Model2: /StabilityMatrix/Data/Models/StableDiffusion/fudukiMix_v20.safetensors
prompt : 猫を抱いている短い髪のの女性 → a short-haired woman holding a cat
100%|████████████████████████████████████████████| 8/8 [02:04<00:00, 15.57s/it]
100%|██████████████████████████████████████████| 12/12 [03:47<00:00, 18.99s/it]
```
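denoising_end / denoising_start split one 20-step schedule between the two models: with the boundary at 0.4 the base model runs the first 8 denoising steps and hands over the latent, and the refiner runs the remaining 12, which is exactly what the two progress bars above show. A sanity check of the split (approximate; the exact boundary handling lives inside the pipelines):

```python
# Step split between base and refiner at denoising_end = denoising_start = 0.4.
num_inference_steps = 20
sep = 0.4
base_steps = round(num_inference_steps * sep)       # 8 steps in pipe_base
refiner_steps = num_inference_steps - base_steps    # 12 steps in pipe_ref
print(base_steps, refiner_steps)                    # 8 12
```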
```python
## sd_034.py  [SDXL] Combining models (refiner) 2: parameter comparison
## model: animexlXuebimix_v60LCM.safetensors
##        fudukiMix_v20.safetensors
import torch
from PIL import Image
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline, EulerAncestralDiscreteScheduler, logging
from translate import Translator
import matplotlib.pyplot as plt
import sd_tools as sdt

logging.set_verbosity_error()

# Image generation
def image_generation(sep):
    # Pipeline for the base model
    pipe_base = StableDiffusionXLPipeline.from_single_file(
        model_base_path,
        torch_dtype = torch.float16
    ).to(device)
    # Scheduler setup
    pipe_base.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe_base.scheduler.config)
    # Pipeline for the refiner model
    pipe_ref = StableDiffusionXLImg2ImgPipeline.from_single_file(
        model_ref_path,
        torch_dtype = torch.float16,
        scheduler = pipe_base.scheduler   # share the same scheduler
    ).to(device)
    # Create the Generator object
    generator = torch.Generator(device).manual_seed(seed)
    # Generate with the base model
    img0 = pipe_base(
        prompt,
        num_inference_steps = 20,
        generator = generator,
        denoising_end = sep,      # stop the generation partway through
        output_type = 'latent'    # return the latent representation
    ).images
    # Continue the generation with the refiner model
    img = pipe_ref(
        prompt,
        image = img0,
        num_inference_steps = 20,
        generator = generator,
        denoising_start = sep,    # resume the generation from that point
    ).images[0]
    return img

# Paths to the model files
model_base_path = "/StabilityMatrix/Data/Models/StableDiffusion/animexlXuebimix_v60LCM.safetensors"
model_ref_path = "/StabilityMatrix/Data/Models/StableDiffusion/fudukiMix_v20.safetensors"
# "cuda" to use the GPU, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'
# seed value
seed = 12345678

# Prompt (translated from Japanese to English at run time)
trans = Translator('en', 'ja').translate
prompt_jp = '庭で兎と遊んでいる女性'
prompt = trans(prompt_jp)

print(f'Seed: {seed}')
print(f'Model1: {model_base_path}')
print(f'Model2: {model_ref_path}')
print(f'prompt : {prompt_jp} → {prompt}')

# Generate a series of images
plt.figure(figsize = [6, 12.5], dpi = 100)
for i in range(8):
    sep = 0.1 + 0.1 * i
    img = image_generation(sep)
    plt.subplot(4, 2, i + 1, title = '%.1f' % sep)
    plt.imshow(img)
    plt.axis('off')
    # Release GPU memory
    if device == 'cuda':
        torch.cuda.empty_cache()
    elif device == 'mps':
        torch.mps.empty_cache()
plt.tight_layout(pad = 0.5)

save_path = 'results/image_034.png'
plt.savefig(save_path)
plt.close()
sdt.image_disp(save_path, save_path)
```
```
(sd_test) PS > python sd_034.py
Seed: 12345678
Model1: /StabilityMatrix/Data/Models/StableDiffusion/animexlXuebimix_v60LCM.safetensors
Model2: /StabilityMatrix/Data/Models/StableDiffusion/fudukiMix_v20.safetensors
prompt : 庭で兎と遊んでいる女性 → Woman playing with a rabbit in the garden
Fetching 17 files: 100%|███████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:00<00:00, 8.33it/s]
Fetching 17 files: 100%|████████████████████| 17/17 [00:00<00:00, 16989.08it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:00<00:00, 14.65it/s]
100%|████████████████████████████████████████████| 2/2 [00:22<00:00, 11.01s/it]
100%|██████████████████████████████████████████| 18/18 [05:08<00:00, 17.14s/it]
Fetching 17 files: 100%|███████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:00<00:00, 14.60it/s]
Fetching 17 files: 100%|████████████████████| 17/17 [00:00<00:00, 17021.52it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:00<00:00, 11.85it/s]
100%|████████████████████████████████████████████| 4/4 [00:52<00:00, 13.06s/it]
100%|██████████████████████████████████████████| 16/16 [04:44<00:00, 17.77s/it]
Fetching 17 files: 100%|███████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00, 6.44it/s]
Fetching 17 files: 100%|████████████████████| 17/17 [00:00<00:00, 15972.93it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00, 5.16it/s]
100%|████████████████████████████████████████████| 6/6 [01:31<00:00, 15.20s/it]
100%|██████████████████████████████████████████| 14/14 [04:14<00:00, 18.19s/it]
Fetching 17 files: 100%|█████████████████████| 17/17 [00:00<00:00, 5665.73it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00, 6.50it/s]
Fetching 17 files: 100%|████████████████████| 17/17 [00:00<00:00, 16876.49it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00, 6.68it/s]
100%|████████████████████████████████████████████| 8/8 [01:52<00:00, 14.07s/it]
100%|██████████████████████████████████████████| 12/12 [03:31<00:00, 17.66s/it]
Fetching 17 files: 100%|███████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00, 5.73it/s]
Fetching 17 files: 100%|█████████████████████| 17/17 [00:00<00:00, 8408.39it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00, 6.37it/s]
100%|██████████████████████████████████████████| 10/10 [02:25<00:00, 14.54s/it]
100%|██████████████████████████████████████████| 10/10 [03:42<00:00, 22.20s/it]
Fetching 17 files: 100%|███████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:00<00:00, 7.43it/s]
Fetching 17 files: 100%|███████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00, 5.48it/s]
100%|██████████████████████████████████████████| 12/12 [02:55<00:00, 14.66s/it]
100%|████████████████████████████████████████████| 8/8 [02:14<00:00, 16.78s/it]
Fetching 17 files: 100%|████████████████████| 17/17 [00:00<00:00, 17058.17it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:00<00:00, 7.51it/s]
Fetching 17 files: 100%|█████████████████████| 17/17 [00:00<00:00, 7249.20it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00, 6.62it/s]
100%|██████████████████████████████████████████| 14/14 [03:20<00:00, 14.31s/it]
100%|████████████████████████████████████████████| 6/6 [01:38<00:00, 16.38s/it]
Fetching 17 files: 100%|█████████████████████| 17/17 [00:00<00:00, 5674.74it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00, 6.62it/s]
Fetching 17 files: 100%|███████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00, 6.62it/s]
100%|██████████████████████████████████████████| 16/16 [03:50<00:00, 14.40s/it]
100%|████████████████████████████████████████████| 4/4 [01:02<00:00, 15.63s/it]
```
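Each pair of progress bars in this run follows the same arithmetic for its boundary value: about sep × 20 base-model steps plus the remaining refiner steps (2+18, 4+16, ..., 16+4). A small loop reproducing the whole series seen in the log:

```python
# Expected base/refiner step split for each boundary value tried by sd_034.py.
for i in range(8):
    sep = 0.1 + 0.1 * i
    base = round(20 * sep)
    print(f'sep = {sep:.1f}: base {base} steps, refiner {20 - base} steps')
```

Note that sd_034.py rebuilds both pipelines on every iteration and calls empty_cache() after each image; this keeps peak VRAM low at the cost of repeated model loading, which is visible in the long total run times in the table at the top.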
For the opposite direction, an image can be encoded back into latent space with the pipeline's VAE (note: diffusers' VAE conventionally expects input scaled to [-1, 1]; this snippet scales to [0, 1]):

```python
# Encode a PIL image (img1) back into a latent with the pipeline's VAE.
imgl2 = torch.HalfTensor(np.array(img1).transpose(2, 0, 1)[None, :] / 255).to(device)
imgl2 = pipe.vae.encode(imgl2).latent_dist.sample() * pipe.vae.config.scaling_factor
```
```python
## sd_035.py  Converting the latent space (latent)
## model: animePastelDream_softBakedVae.safetensors
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler, logging
from translate import Translator
import numpy as np
import matplotlib.pyplot as plt
import sd_tools as sdt

logging.set_verbosity_error()

# Path to the model file
model_path = "/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/animePastelDream_softBakedVae.safetensors"
# "cuda" to use the GPU, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'
# seed value
seed = 12345678

# Create the pipeline
pipeline = StableDiffusionPipeline.from_single_file(
    model_path,
    torch_dtype = torch.float16
).to(device)
# Scheduler setup
pipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(pipeline.scheduler.config)

# Create the Generator object
generator = torch.Generator(device).manual_seed(seed)

# Prompt (translated from Japanese to English at run time)
trans = Translator('en', 'ja').translate
prompt_jp = '庭で兎と遊んでいる女性'
prompt = trans(prompt_jp)

print(f'Seed: {seed} Model: {model_path}')
print(f'prompt : {prompt_jp} → {prompt}')

# Generate the image (as a latent)
img_latent = pipeline(
    prompt = prompt,
    num_inference_steps = 20,
    generator = generator,
    output_type = 'latent'
).images
print(f'latent.shape = {img_latent.shape}')    # torch.Size([1, 4, 64, 64])

# Visualize the latent itself as an image
imgl = np.float32(img_latent[0].cpu()).transpose(1, 2, 0)
plt.figure(figsize = [6, 6], dpi = 100)
plt.imshow((imgl - imgl.min()) / (imgl.max() - imgl.min()))
plt.tight_layout()
save_path = 'results/image_035a.png'
plt.savefig(save_path)
plt.close()
sdt.image_disp(save_path, save_path)

# Decode the latent into pixel space and save
img1 = pipeline.vae.decode(img_latent / pipeline.vae.config.scaling_factor)
img1 = img1.sample[0].detach().cpu().numpy().transpose(1, 2, 0)
img1 = Image.fromarray(np.uint8(np.clip(img1 * 0.5 + 0.5, 0, 1) * 255))
#img1.save("results/image_035.png")
save_path = 'results/image_035.png'
sdt.image_save2(img1, save_path, save_path)
```
```
(sd_test) PS > python sd_035.py
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:01<00:00, 3.12it/s]
Seed: 12345678 Model: /StabilityMatrix/Data/Models/StableDiffusion/SD1.5/animePastelDream_softBakedVae.safetensors
prompt : 庭で兎と遊んでいる女性 → Woman playing with a rabbit in the garden
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 15.95it/s]
latent.shape = torch.Size([1, 4, 64, 64])
```
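The printed torch.Size([1, 4, 64, 64]) follows from the SD1.5 VAE geometry: the encoder downsamples each spatial dimension by a factor of 8 and uses 4 latent channels, so a 512×512 image corresponds to a 4×64×64 latent. As a quick check:

```python
# SD1.5 latent geometry: 8x spatial downsampling, 4 latent channels.
height = width = 512
latent_shape = (1, 4, height // 8, width // 8)
print(latent_shape)   # (1, 4, 64, 64)
```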
```python
## sd_036.py  Upscaling the source image 4x (x4 upscaler)
## model: stabilityai/stable-diffusion-x4-upscaler
import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline, logging
import sd_tools as sdt

logging.set_verbosity_error()

# Model ID (downloaded from the Hugging Face Hub)
model_path = "stabilityai/stable-diffusion-x4-upscaler"
#image_path = "images/uptest_128x128.png"
image_path = "images/uptest_256x256.png"
# "cuda" to use the GPU, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'
# seed value
seed = 12345678

# Create the pipeline
pipeline = StableDiffusionUpscalePipeline.from_pretrained(
    model_path,
    torch_dtype = torch.float16,
).to(device)

# Prompt (empty: upscale without text guidance)
prompt = ''

# Load the source image
src_image = Image.open(image_path)

# Create the Generator object
generator = torch.Generator(device).manual_seed(seed)

print(f'Seed: {seed}, Model: {model_path}')
print(f'prompt : {prompt}')

# Generate the image
image = pipeline(
    prompt = prompt,
    image = src_image,
    num_inference_steps = 20,
    generator = generator
).images[0]

#image.save("results/image_036.png")
save_path = 'results/image_036.png'
sdt.image_save2(image, save_path, save_path)
```
```
(sd_test) PS > python sd_036.py
Loading pipeline components...: 100%|████████████| 6/6 [00:01<00:00, 3.95it/s]
Seed: 12345678, Model: stabilityai/stable-diffusion-x4-upscaler
prompt :
100%|██████████████████████████████████████████| 20/20 [00:04<00:00, 4.71it/s]
```
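The x4 upscaler multiplies each side by 4, so the 256×256 test image comes back as 1024×1024. A quick size check (to be run after sd_036.py has produced its output):

```python
# Verify the 4x size increase after running sd_036.py.
from PIL import Image

src = Image.open('images/uptest_256x256.png')
out = Image.open('results/image_036.png')
print(src.size, '->', out.size)   # expected: (256, 256) -> (1024, 1024)
```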
```python
## sd_037.py  2x upscaling in latent space (x2 latent upscaler)
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionLatentUpscalePipeline, logging
from translate import Translator
import sd_tools as sdt

logging.set_verbosity_error()

# Path to the model file
model_path = "/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/v1-5-pruned-emaonly.safetensors"
# "cuda" to use the GPU, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'
# seed value
seed = 12345678

# Create the pipeline
pipeline = StableDiffusionPipeline.from_single_file(model_path).to(device)

# Second pipeline: the latent upscaler
pipeline_x2 = StableDiffusionLatentUpscalePipeline.from_pretrained(
    'stabilityai/sd-x2-latent-upscaler',
    torch_dtype = torch.float16,
).to(device)

# Prompt (translated from Japanese to English at run time)
trans = Translator('en', 'ja').translate
prompt_jp = '満開の蘭'
prompt = trans(prompt_jp)

# Create the Generator object
generator = torch.Generator(device).manual_seed(seed)

print(f'Seed: {seed}, Model: {model_path}')
print(f'prompt : {prompt_jp} → {prompt}')

# Generate the image (as a latent)
img0 = pipeline(
    prompt = prompt,
    num_inference_steps = 20,
    generator = generator,
    output_type = 'latent'
).images

# Upscale the latent 2x and decode
image = pipeline_x2(
    '',
    img0,
    num_inference_steps = 20,
).images[0]

#image.save("results/image_037.png")
save_path = 'results/image_037.png'
sdt.image_save2(image, save_path, save_path)

# Also save the intermediate (pre-upscale) image
from PIL import Image
import numpy as np

img1 = pipeline.vae.decode(img0 / pipeline.vae.config.scaling_factor)
img1 = img1.sample[0].detach().cpu().numpy().transpose(1, 2, 0)
img1 = np.uint8(np.clip(img1 * 0.5 + 0.5, 0, 1) * 255)
#Image.fromarray(img1).save('results/image_037_512.png')
img = Image.fromarray(img1)
save_path = 'results/image_037_512.png'
sdt.image_save2(img, save_path, save_path)
```
```
(sd_test) PS > python sd_037.py
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 8.84it/s]
Loading pipeline components...: 100%|████████████| 5/5 [00:01<00:00, 4.24it/s]
Seed: 12345678, Model: /StabilityMatrix/Data/Models/StableDiffusion/SD1.5/v1-5-pruned-emaonly.safetensors
prompt : 満開の蘭 → Orchid in full bloom
100%|██████████████████████████████████████████| 20/20 [00:02<00:00, 7.19it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 12.08it/s]
```
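Unlike sd_036.py, which upscales in pixel space, sd-x2-latent-upscaler doubles the latent's spatial resolution before decoding: the 1×4×64×64 latent from the first pipeline becomes 1×4×128×128, so the decoded result is 1024×1024 instead of 512×512. The script also decodes the pre-upscale latent to results/image_037_512.png so the two stages can be compared side by side. Size bookkeeping:

```python
# Size bookkeeping for the two-stage x2 latent upscale.
latent = (1, 4, 64, 64)       # SD1.5 output latent (decodes to 512x512)
upscaled = (1, 4, 128, 128)   # after sd-x2-latent-upscaler
print(latent[3] * 8, '->', upscaled[3] * 8)   # 512 -> 1024 pixels per side
```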
```python
## sd_038.py  Modifying only a specific region (inpaint)
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline, logging
from translate import Translator
import sd_tools as sdt

logging.set_verbosity_error()

# Model ID and image paths
model_path = 'runwayml/stable-diffusion-inpainting'   # model
image_path = 'images/sd_038_test.png'                 # source image
mask_path = 'images/sd_038_test_mask.png'             # mask image
# "cuda" to use the GPU, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'
# seed value
seed = 12345678

# Create the pipeline
pipeline = StableDiffusionInpaintPipeline.from_pretrained(
    model_path,
    torch_dtype = torch.float16,
    variant = 'fp16'
).to(device)

# Prompt (translated from Japanese to English at run time)
trans = Translator('en', 'ja').translate
prompt_jp = 'こっちを見て微笑んでいる女の子'
prompt = trans(prompt_jp)

# Create the Generator object
generator = torch.Generator(device).manual_seed(seed)

img0 = Image.open(image_path)
img_mask = Image.open(mask_path)

print(f'Seed: {seed}')
print(f'prompt : {prompt_jp} → {prompt}')
print(f'Model : {model_path}')
print(f'source : {image_path}')
print(f'mask : {mask_path}')

# Generate the image
image = pipeline(
    prompt = prompt,
    image = img0,
    mask_image = img_mask,
    num_inference_steps = 20,
    generator = generator,
).images[0]

#image.save("results/image_038.png")
save_path = 'results/image_038.png'
sdt.image_save2(image, save_path, save_path)
```
```
(sd_test) PS > python sd_038.py
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 19.11it/s]
Seed: 12345678
prompt : こっちを見て微笑んでいる女の子 → A girl smiling at me
Model : runwayml/stable-diffusion-inpainting
source : images/sd_038_test.png
mask : images/sd_038_test_mask.png
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 15.12it/s]
```
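The mask follows the usual diffusers convention: white pixels mark the region to regenerate and black pixels are kept from the source image. If you want to build a mask in code rather than in an image editor, a minimal sketch (the rectangle coordinates and output file name are placeholders, not the mask actually used above):

```python
# Build a rectangular inpainting mask: white = repaint, black = keep.
from PIL import Image, ImageDraw

mask = Image.new('L', (512, 512), 0)            # all black: keep everything
draw = ImageDraw.Draw(mask)
draw.rectangle([128, 64, 384, 320], fill=255)   # white box: region to repaint
mask.save('images/my_mask.png')                 # hypothetical file name
```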