Generative AI Programming 3 == Under Editing ==

  Based on the results verified so far, we write generative AI programs in Python


※ Last updated: 2025/07/09

Getting Started with Stable Diffusion Using diffusers (Advanced Part 2)

  Generating an image from an image: instruct-pix2pix and controlnet instruct-pix2pix

  Reference site: instruct-pix2pixで画像を指示した通り変更したり

Overview

Programs created in this chapter and approximate execution times

Step	Description	Program	RTX 4070Ti (GPU)	RTX 4060 (GPU)	RTX 4060L (GPU)	RTX 3050 (GPU)	GTX 1050 (GPU)	i7-1260P (CPU)
40	Convert an image with "instruct-pix2pix"	sd_040.py	00:03		00:08		00:50	05:32
40	Convert an image with "instruct-pix2pix"	sd_040a.py	00:08		00:31		18:19	24:11
41	Effect of the image_guidance_scale parameter	sd_041.py	00:12		00:24		04:52	14:23
41	Effect of the image_guidance_scale parameter	sd_041a.py	00:42		02:00		02:40:30	03:38:17
42	Convert an image with "controlnet instruct-pix2pix"	sd_042.py	00:02		00:14		00:54	06:30
43	Effect of parameters	sd_043.py	00:06		00:24		04:56	17:01

  ・Units: (hours:)minutes:seconds
  × Cannot run on CPU
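
To compare the timings in the table numerically, the "(hours:)minutes:seconds" notation first has to be turned into seconds. The following is a minimal sketch of that conversion; the function name to_seconds is a hypothetical helper and not part of the original programs.

```python
def to_seconds(text: str) -> int:
    """Convert '(hh:)mm:ss' into a number of seconds."""
    parts = [int(p) for p in text.split(":")]
    if len(parts) not in (2, 3):
        raise ValueError(f"expected mm:ss or hh:mm:ss, got {text!r}")
    seconds = 0
    for part in parts:
        # Each field is worth 60 times the field to its right
        seconds = seconds * 60 + part
    return seconds


# Timings for sd_041a.py copied from the table above
gpu = to_seconds("00:42")       # RTX 4070Ti
cpu = to_seconds("03:38:17")    # i7-1260P
print(f"sd_041a.py: CPU takes about {cpu / gpu:.0f} times as long as the GPU")
```

The same helper can be applied to any other pair of cells in the table.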

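Differences between instruct-pix2pix and controlnet instruct-pix2pix

Name	Function	What it does	How to write the prompt	Model location
instruct-pix2pix	Creates a new image from the source image	Changes only the parts related to the instruction	Write the change you want ("change it to this")	[SD1.5] instruct-pix2pix
instruct-pix2pix	Creates a new image from the source image	Changes only the parts related to the instruction	Write the change you want ("change it to this")	[SDXL] sdxl-instructpix2pix-768
controlnet instruct-pix2pix	Modifies the source image	Can change the whole source image	Describe what the desired result image looks like	[SD1.5] control_v11e_sd15_ip2p

・"instruct-pix2pix" runs on a dedicated model for each of SD1.5 and SDXL
・"controlnet instruct-pix2pix" needs both a ControlNet model and a base model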

Operating environment

This project runs in the following Anaconda virtual environment and project folder
```
(base) PS > conda activate sd_test
(sd_test) PS > cd workspace_3/sd_test
```
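
The programs in this chapter read source images from images/ and save results to results/, both relative to the project folder. PIL's Image.save() does not create a missing folder, so it can help to prepare the folders before the first run. The following is a minimal sketch; prepare_project_dirs is a hypothetical helper name and is not part of the original programs.

```python
from pathlib import Path


def prepare_project_dirs(root: str = ".") -> dict:
    """Create results/ if needed and report whether images/ exists."""
    base = Path(root)
    results = base / "results"
    results.mkdir(parents=True, exist_ok=True)   # safe to call repeatedly
    return {
        "images_found": (base / "images").is_dir(),
        "results": results,
    }

# Usage (run once inside the project folder):
#   status = prepare_project_dirs()
#   if not status["images_found"]:
#       print("images/ not found: place the source images there first")
```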

Step 40: Convert an image with "instruct-pix2pix"

  SD1.5 version  "雪の中の場面にする" (Make it a scene in the snow)

"sd_040.py"   Source image (right) sd_040_test.png   Generated image (left) image_040.png

## sd_040.py [SD1.5]  Image-to-image generation (instruct-pix2pix) sample source code
##      https://qiita.com/phyblas/items/28c342740c2ed00250b8
##      Model: https://huggingface.co/timbrooks/instruct-pix2pix
##      Ver. 0.00   2025/07/05

import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline, logging
from translate import Translator

logging.set_verbosity_error()

# Paths
model_path = "timbrooks/instruct-pix2pix"                       # model
image_path = "images/sd_040_test.png"                           # source image

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Seed value
seed = 0

# Create the pipeline (default precision on CPU, float16 on GPU)
if device == 'cpu':
    pipeline = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_path).to(device)
else:
    pipeline = StableDiffusionInstructPix2PixPipeline.from_pretrained(
                    model_path,
                    torch_dtype = torch.float16,
                    ).to(device)

# Prompt (translated from Japanese to English before use)
trans = Translator('en','ja').translate
prompt_jp = '雪の中の場面にする'                                # prompt
prompt = trans(prompt_jp)
src_image = Image.open(image_path)

# Create the Generator object
generator = torch.Generator(device).manual_seed(seed)

print(f'Seed: {seed}, Model: {model_path}')
print(f'source_image: {image_path}')
print(f'prompt : {prompt_jp} → {prompt}')

# Generate the image
image = pipeline(
                    prompt = prompt,
                    image = src_image,
                    num_inference_steps = 20,
                    image_guidance_scale = 1.5,
                    generator = generator
                    ).images[0]

image.save("results/image_040.png")                            # generated image

Run the program (execution time: about 3 seconds, RTX 4070 Ti 12GB)

(sd_test) PS > python sd_040.py
Loading pipeline components...: 100%|████████████| 7/7 [00:02<00:00,  2.48it/s]
Seed: 0, Model: timbrooks/instruct-pix2pix
source_image: images/sd_040_test.png
prompt : 雪の中の場面にする → Make it a scene in the snow
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 16.78it/s]

The image file "image_040.png" is generated

  SDXL version  "雪の中の場面にする" (Make it a scene in the snow)

"sd_040a.py"   Source image is the same sd_040_test.png   Generated image (left) image_040a.png

## sd_040a.py [SDXL]  Image-to-image generation (instruct-pix2pix) sample source code
##      https://qiita.com/phyblas/items/28c342740c2ed00250b8
##      Model: https://huggingface.co/diffusers/sdxl-instructpix2pix-768
##      Ver. 0.00   2025/07/07

import torch
from diffusers import StableDiffusionXLInstructPix2PixPipeline, logging
from diffusers.utils import load_image
from translate import Translator

logging.set_verbosity_error()

# Paths
model_path = "diffusers/sdxl-instructpix2pix-768"              # model
image_path = "images/sd_040_test.png"                          # source image

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Seed value
seed = 0

# Image size
resolution = 768

# Create the pipeline
pipeline = StableDiffusionXLInstructPix2PixPipeline.from_pretrained(
                    model_path,
                    torch_dtype = torch.float16,
                    ).to(device)

# Prompt (translated from Japanese to English before use)
trans = Translator('en','ja').translate
prompt_jp = '雪の中の場面にする'                                # prompt
prompt = trans(prompt_jp)

# Load the source image as RGB and resize it to the model resolution
src_image = load_image(image_path).resize((resolution, resolution))

# Create the Generator object
generator = torch.Generator(device).manual_seed(seed)

print(f'Seed: {seed}, Model: {model_path}')
print(f'source_image: {image_path}')
print(f'prompt : {prompt_jp} → {prompt}')

# Generate the image
image = pipeline(
                    prompt = prompt,
                    image = src_image,
                    height = resolution,
                    width = resolution,
                    guidance_scale=3.0,
                    image_guidance_scale = 1.5,
                    num_inference_steps = 20,
                    generator = generator
                    ).images[0]

image.save("results/image_040a.png")                            # generated image

Run the program (execution time: about 8 seconds, RTX 4070 Ti 12GB)

(sd_test) PS > python sd_040a.py
Loading pipeline components...: 100%|████████████| 7/7 [00:05<00:00,  1.38it/s]
Seed: 0, Model: diffusers/sdxl-instructpix2pix-768
source_image: images/sd_040_test.png
prompt : 雪の中の場面にする → Make it a scene in the snow
100%|██████████████████████████████████████████| 20/20 [00:03<00:00,  5.30it/s]

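The image file "image_040a.png" is generated

Comparison of images generated by the SD1.5 / SDXL models

(Image comparison table. Rows: SD1.5, SDXL. Columns, by prompt: Make it a scene in the snow / Make it a spring scene / Make it a summer scene / Make it an autumn scene / Make it a winter scene)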

Difference in the object used to create the pipeline for the SD1.5 / SDXL models

Model type	Base image size	Pipeline class
SD1.5	512x512	StableDiffusionInstructPix2PixPipeline
SDXL	768x768	StableDiffusionXLInstructPix2PixPipeline
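
The table above can be kept in one place in code so that a program switches between SD1.5 and SDXL by a single key. The following is a minimal sketch: the names Pix2PixVariant, VARIANTS and select_variant are hypothetical, while the class names, model ids and sizes are the ones used in sd_040.py and sd_040a.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pix2PixVariant:
    pipeline_class: str   # name of the diffusers pipeline class
    model_id: str         # Hugging Face model id used in this chapter
    resolution: int       # base image size (square)


VARIANTS = {
    "SD1.5": Pix2PixVariant(
        "StableDiffusionInstructPix2PixPipeline",
        "timbrooks/instruct-pix2pix",
        512,
    ),
    "SDXL": Pix2PixVariant(
        "StableDiffusionXLInstructPix2PixPipeline",
        "diffusers/sdxl-instructpix2pix-768",
        768,
    ),
}


def select_variant(name: str) -> Pix2PixVariant:
    """Return the settings for 'SD1.5' or 'SDXL'."""
    try:
        return VARIANTS[name]
    except KeyError:
        raise ValueError(f"unknown model type: {name!r}") from None
```

With diffusers installed, the class itself could then be looked up with getattr(diffusers, variant.pipeline_class).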

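Notes on the SDXL version
・Following the sample code, the source image is created with diffusers.utils.load_image() instead of Image.open(). load_image() also returns a PIL image, but it applies the EXIF orientation and converts the image to RGB
・The output size appears to be fixed at 768x768, so the source image is resized to that size before use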
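
Resizing straight to 768x768 distorts a source image that is not square. One possible workaround, which the original programs do not use, is to crop the centre square first and resize afterwards. The following is a minimal sketch of the crop-box arithmetic; center_square_box is a hypothetical helper name.

```python
def center_square_box(width: int, height: int) -> tuple:
    """Return (left, top, right, bottom) of the largest centred square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)

# Usage with a PIL image (assumption: img was loaded with load_image()):
#   box = center_square_box(*img.size)
#   src_image = img.crop(box).resize((768, 768))
```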

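Step 41: See how the "instruct-pix2pix" image_guidance_scale parameter changes the result

image_guidance_scale
・Parameter that decides how strongly the output is pulled toward the source image: higher values keep the result closer to the source, lower values allow larger changes
・Set a value of 1 or more (default: 1.5)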
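
A minimal sketch of how the two scales interact, assuming the two-scale classifier-free guidance described in the InstructPix2Pix paper: the model produces three noise predictions (no conditioning, image only, image plus text) and combines them. The function and argument names below are illustrative, and plain lists of floats stand in for tensors.

```python
def combine_guidance(uncond, image_only, full, guidance_scale, image_guidance_scale):
    """Combine three noise predictions element by element.

    uncond     : prediction with neither image nor text conditioning
    image_only : prediction conditioned on the source image only
    full       : prediction conditioned on the source image and the text
    """
    return [
        u + guidance_scale * (f - i) + image_guidance_scale * (i - u)
        for u, i, f in zip(uncond, image_only, full)
    ]


# With both scales at 1 the result is just the fully conditioned prediction
print(combine_guidance([0.0], [1.0], [3.0], 1.0, 1.0))
# A larger image_guidance_scale weights the "follow the source image" term more
print(combine_guidance([0.0], [1.0], [3.0], 7.5, 1.5))
```

Under this assumption, image_guidance_scale multiplies the term that moves the prediction toward the source image, which is why raising it keeps the output closer to the original.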

  SD1.5 version

Run the program (execution time: about 12 seconds, RTX 4070 Ti 12GB)

(sd_test) PS > python sd_041.py
Seed: 12345678, Model: timbrooks/instruct-pix2pix
source_image: images/sd_040_test.png
prompt : 雪の中の場面にする → Make it a scene in the snow
** image_guidance_scale 1.0 ～ 1.5 **
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  5.17it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 17.95it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  5.85it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 18.39it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  6.01it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 17.88it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  5.57it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 18.38it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  6.14it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 18.45it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  6.13it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 18.44it/s]

The image file "image_041.png" is generated

Module source code

"sd_041.py"

## sd_041.py [SD1.5]  Image-to-image generation (instruct-pix2pix) sample source code
## === Examine the image guidance scale ===
##      https://qiita.com/phyblas/items/28c342740c2ed00250b8
##      Model: https://huggingface.co/timbrooks/instruct-pix2pix
##      Ver. 0.00   2025/07/05

import torch
from PIL import Image
from diffusers import StableDiffusionInstructPix2PixPipeline, logging
from translate import Translator
import matplotlib.pyplot as plt

logging.set_verbosity_error()

# Image generation
def image_generation(ig_scale):
    # Create the pipeline (it is rebuilt on every call)
    if device == 'cpu':
        pipeline = StableDiffusionInstructPix2PixPipeline.from_pretrained(model_path).to(device)
    else:
        pipeline = StableDiffusionInstructPix2PixPipeline.from_pretrained(
                    model_path,
                    torch_dtype = torch.float16,
                    ).to(device)

    # Create the Generator object
    generator = torch.Generator(device).manual_seed(seed)

    # Generate the image
    img = pipeline(
                    prompt = prompt,
                    image = src_image,
                    num_inference_steps = 20,
                    image_guidance_scale = ig_scale,
                    generator = generator
                    ).images[0]
    return img

# Paths
model_path = "timbrooks/instruct-pix2pix"                      # model
image_path = "images/sd_040_test.png"                          # source image

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Seed value
seed = 12345678

# Prompt (translated from Japanese to English before use)
trans = Translator('en','ja').translate
prompt_jp = '雪の中の場面にする'                               # prompt
prompt = trans(prompt_jp)
src_image = Image.open(image_path)

print(f'Seed: {seed}, Model: {model_path}')
print(f'source_image: {image_path}')
print(f'prompt : {prompt_jp} → {prompt}')
print('** image_guidance_scale 1.0 ～ 1.5 **')


# Generate several images and lay them out in a 3 x 2 grid
plt.figure(figsize=[6, 9.5], dpi = 100)
for i in range(6):
    ig_scale = 1 + 0.1 * i
    img = image_generation(ig_scale)
    plt.subplot(3, 2, i + 1, title = 'image_guidance_scale = %.1f'%ig_scale)
    plt.imshow(img)
    plt.axis('off')

    # Free memory
    if device == 'cuda':
        torch.cuda.empty_cache()
    elif device == 'mps':
        torch.mps.empty_cache()

plt.tight_layout()
plt.savefig('results/image_041.png')
plt.close()
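
The run log for sd_041.py shows "Loading pipeline components" six times because image_generation() builds the pipeline on every call. If the reload is not needed, one option is to cache the loaded pipeline and reuse it. The following is a minimal sketch of that idea only: get_pipeline and load_calls are hypothetical names, and a plain dict stands in for the expensive from_pretrained(...).to(device) call so that the example runs without a model.

```python
from functools import lru_cache

load_calls = []   # records every real (non-cached) load


@lru_cache(maxsize=None)
def get_pipeline(model_path: str, device: str):
    """Stand-in for the expensive from_pretrained(...).to(device) call."""
    load_calls.append(model_path)
    return {"model": model_path, "device": device}   # placeholder object


def image_generation(ig_scale: float):
    # Served from the cache after the first call with the same arguments
    pipeline = get_pipeline("timbrooks/instruct-pix2pix", "cpu")
    return (pipeline["model"], ig_scale)              # placeholder result


results = [image_generation(round(1 + 0.1 * i, 1)) for i in range(6)]
print(f"images: {len(results)}, pipeline loads: {len(load_calls)}")
```

Whether reusing one pipeline gives exactly the same images as reloading it has not been checked here; the generator is still re-seeded on every call in the original code.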

  SDXL version

Run the program (execution time: about 42 seconds, RTX 4070 Ti 12GB)

(sd_test) PS > python sd_041a.py
Seed: 0, Model: diffusers/sdxl-instructpix2pix-768
source_image: images/sd_040_test.png
prompt : 雪の中の場面にする → Make it a scene in the snow
** image_guidance_scale 1.0 ～ 1.5 **
Loading pipeline components...: 100%|████████████| 7/7 [00:02<00:00,  2.95it/s]
100%|██████████████████████████████████████████| 30/30 [00:05<00:00,  5.40it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:02<00:00,  3.10it/s]
100%|██████████████████████████████████████████| 30/30 [00:05<00:00,  5.41it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:02<00:00,  2.97it/s]
100%|██████████████████████████████████████████| 30/30 [00:05<00:00,  5.40it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:02<00:00,  3.10it/s]
100%|██████████████████████████████████████████| 30/30 [00:05<00:00,  5.41it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:02<00:00,  3.00it/s]
100%|██████████████████████████████████████████| 30/30 [00:05<00:00,  5.38it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:02<00:00,  3.09it/s]
100%|██████████████████████████████████████████| 30/30 [00:05<00:00,  5.39it/s]

The image file "image_041a.png" is generated

Module source code

"sd_041a.py"

## sd_041a.py [SDXL]  Image-to-image generation (instruct-pix2pix) sample source code
## === Examine the image guidance scale ===
##      https://qiita.com/phyblas/items/28c342740c2ed00250b8
##      Model: https://huggingface.co/diffusers/sdxl-instructpix2pix-768
##      Ver. 0.00   2025/07/07

import torch
from diffusers import StableDiffusionXLInstructPix2PixPipeline, logging
from translate import Translator
from diffusers.utils import load_image
import matplotlib.pyplot as plt

logging.set_verbosity_error()

# Image generation
def image_generation(ig_scale):
    # Create the pipeline (it is rebuilt on every call)
    pipeline = StableDiffusionXLInstructPix2PixPipeline.from_pretrained(
                    model_path,
                    torch_dtype = torch.float16,
                    ).to(device)

    # Create the Generator object
    generator = torch.Generator(device).manual_seed(seed)

    # Generate the image
    img = pipeline(
                    prompt = prompt,
                    image = src_image,
                    height = resolution,
                    width = resolution,
                    guidance_scale=3.0,
                    image_guidance_scale = ig_scale,
                    num_inference_steps = 30,
                    generator = generator
                    ).images[0]
    return img

# Paths
model_path = "diffusers/sdxl-instructpix2pix-768"               # model
image_path = "images/sd_040_test.png"                           # source image

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Seed value (sd_041.py uses 12345678)
seed = 0

# Image size
resolution = 768

# Prompt (translated from Japanese to English before use)
trans = Translator('en','ja').translate
prompt_jp = '雪の中の場面にする'                                # prompt
prompt = trans(prompt_jp)
src_image = load_image(image_path).resize((resolution, resolution))

print(f'Seed: {seed}, Model: {model_path}')
print(f'source_image: {image_path}')
print(f'prompt : {prompt_jp} → {prompt}')
print('** image_guidance_scale 1.0 ～ 1.5 **')


# Generate several images and lay them out in a 3 x 2 grid
plt.figure(figsize=[6, 9.5], dpi = 100)
for i in range(6):
    ig_scale = 1.0 + 0.1 * i
    img = image_generation(ig_scale)
    plt.subplot(3, 2, i + 1, title = 'image_guidance_scale = %.1f'%ig_scale)
    plt.imshow(img)
    plt.axis('off')

    # Free memory
    if device == 'cuda':
        torch.cuda.empty_cache()
    elif device == 'mps':
        torch.mps.empty_cache()

plt.tight_layout()
plt.savefig('results/image_041a.png')
plt.close()

Step 42: Convert an image with "controlnet instruct-pix2pix"

  SD1.5 version  "浜辺の場面にする" (Set the scene on the beach)

"sd_042.py"   Source image (right) sd_040_test.png   Generated image (left) image_042.png

## sd_042.py [SD1.5]  Image-to-image generation (controlnet instruct-pix2pix) sample source code
##      https://qiita.com/phyblas/items/28c342740c2ed00250b8
##      Ver. 0.00   2025/07/07
##
##      command:    python sd_042.py [prompt]
##
##       prompt         '浜辺の場面にする'  (default)
##                      '雪の中の場面にする'
##                      '炎の中の場面にする'
##                      '森の中の場面にする'
##                      '山中の場面にする'
##                      '砂漠の場面にする'
##                      '着物姿に着替える'
##
##                      'イラスト画像にする'
##                      'アニメ画像にする'
##                      '微笑んだ顔のアニメ画像にする'
##                      '泣き顔のアニメ画像にする'
##                      '嬉しそうな顔のアニメ画像にする'
##
##      model:          control_v11e_sd15_ip2p_fp16.safetensors
##      base model:     beautifulRealistic_brav5.safetensors        (photorealistic)
##                      animePastelDream_softBakedVae.safetensors   (illustration style)

import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, EulerAncestralDiscreteScheduler, logging
from translate import Translator
import sys

logging.set_verbosity_error()

# Paths
model_path = '/StabilityMatrix/Data/Models/ControlNet/control_v11e_sd15_ip2p_fp16.safetensors'              # ControlNet model
model_base_path = '/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/beautifulRealistic_brav5.safetensors' # base model
#model_base_path = '/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/animePastelDream_softBakedVae.safetensors' # base model
image_path = "images/sd_040_test.png"                           # source image

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Seed value
seed = 12345678

# Create the pipeline (the ControlNet must be passed to the pipeline on CPU as well)
if device == 'cpu':
    controlnet = ControlNetModel.from_single_file(model_path).to(device)
    pipeline = StableDiffusionControlNetPipeline.from_single_file(
                    model_base_path,
                    controlnet=controlnet,
                    ).to(device)
else:
    controlnet = ControlNetModel.from_single_file(model_path, torch_dtype=torch.float16).to(device)
    pipeline = StableDiffusionControlNetPipeline.from_single_file(
                    model_base_path,
                    controlnet=controlnet,
                    torch_dtype = torch.float16,
                    ).to(device)

# Scheduler
pipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(pipeline.scheduler.config)

# Prompt (first command-line argument if given; translated to English before use)
trans = Translator('en','ja').translate
args = sys.argv
prompt_jp = '浜辺の場面にする' if len(args) <= 1 else args[1]    # prompt
prompt = trans(prompt_jp)
src_image = Image.open(image_path)

# Create the Generator object
generator = torch.Generator(device).manual_seed(seed)

print(f'Seed: {seed}, Model: {model_path}')
print(f'base Model: {model_base_path}')
print(f'source_image: {image_path}')
print(f'prompt : {prompt_jp} → {prompt}')

# Generate the image
image = pipeline(
                    prompt = prompt,
                    image = src_image,
                    num_inference_steps = 25,
                    generator = generator
                    ).images[0]

image.save("results/image_042.png")                             # generated image

Run the program (execution time: about 2 seconds, RTX 4070 Ti 12GB)

(sd_test) PS > python sd_042.py
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 14.80it/s]
Seed: 12345678, Model: /StabilityMatrix/Data/Models/ControlNet/control_v11e_sd15_ip2p_fp16.safetensors
base Model: /StabilityMatrix/Data/Models/StableDiffusion/SD1.5/beautifulRealistic_brav5.safetensors
source_image: images/sd_040_test.png
prompt : 浜辺の場面にする → Set the scene on the beach
100%|██████████████████████████████████████████| 25/25 [00:02<00:00,  9.98it/s]

The image file "image_042.png" is generated

Generate with a different prompt
・"python sd_042.py ['prompt']"

(sd_test) PS > python sd_042.py '雪の中の場面にする'
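
sd_042.py reads the optional prompt directly from sys.argv. The same behaviour can be sketched with argparse, which also gives a --help message for free. This is an alternative sketch, not the original implementation; parse_prompt and DEFAULT_PROMPT are hypothetical names.

```python
import argparse

DEFAULT_PROMPT = '浜辺の場面にする'   # same default as sd_042.py


def parse_prompt(argv=None) -> str:
    """Return the prompt given on the command line, or the default."""
    parser = argparse.ArgumentParser(
        description="controlnet instruct-pix2pix sample")
    parser.add_argument(
        "prompt", nargs="?", default=DEFAULT_PROMPT,
        help="Japanese prompt (translated to English before generation)")
    # argv=None makes argparse read sys.argv[1:]
    return parser.parse_args(argv).prompt


print(parse_prompt([]))
print(parse_prompt(['雪の中の場面にする']))
```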

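・Base model "beautifulRealistic_brav5.safetensors (photorealistic)"

Set the scene on the beach	Make it a scene in the snow	Make it a scene in flames	Make it a scene in a forest	Make it a scene in the mountains	Make it a desert scene

Change into a kimono	Make it an illustration	Make it an anime image	Anime image with a smiling face	Anime image with a crying face	Anime image with a happy face

・Base model "animePastelDream_softBakedVae.safetensors (illustration style)"

Set the scene on the beach	Make it a scene in the snow	Make it a scene in flames	Make it a scene in a forest	Make it a scene in the mountains	Make it a desert scene

Change into a kimono	Make it an illustration	Make it an anime image	Anime image with a smiling face	Anime image with a crying face	Anime image with a happy face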

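Step 43: See how the "controlnet instruct-pix2pix" controlnet_conditioning_scale parameter changes the result

controlnet_conditioning_scale
・Parameter that sets the weight of the control image's influence
・The default is 1.0 (values below 1 weaken the influence of the input image)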
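
sd_043.py sweeps the parameter from 0.6 to 1.0 in six steps with "0.6 + 0.08 * i". A small helper that derives the step from the start value, end value and count makes the range easier to change. The following is a minimal sketch; scale_values is a hypothetical name.

```python
def scale_values(start: float, stop: float, count: int) -> list:
    """Return `count` evenly spaced values from start to stop (inclusive)."""
    if count < 2:
        raise ValueError("count must be at least 2")
    step = (stop - start) / (count - 1)
    # Rounding to 2 decimals hides floating-point noise in titles and logs
    return [round(start + step * i, 2) for i in range(count)]


print(scale_values(0.6, 1.0, 6))
```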

Run the program (execution time: about 6 seconds, RTX 4070 Ti 12GB)

(sd_test) PS > python sd_043.py
Seed: 12345678, Model: /StabilityMatrix/Data/Models/ControlNet/control_v11e_sd15_ip2p_fp16.safetensors
source_image: images/sd_040_test.png
prompt : 浜辺の場面にする → Set the scene on the beach
** controlnet_conditioning_scale 0.6 ～ 1.0 **
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 18.80it/s]
100%|██████████████████████████████████████████| 25/25 [00:01<00:00, 14.93it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 32.75it/s]
100%|██████████████████████████████████████████| 25/25 [00:01<00:00, 17.17it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 33.71it/s]
100%|██████████████████████████████████████████| 25/25 [00:01<00:00, 17.11it/s]
Fetching 11 files: 100%|████████████████████| 11/11 [00:00<00:00, 11013.93it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 21.55it/s]
100%|██████████████████████████████████████████| 25/25 [00:01<00:00, 17.10it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 33.15it/s]
100%|██████████████████████████████████████████| 25/25 [00:01<00:00, 17.17it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 32.32it/s]
100%|██████████████████████████████████████████| 25/25 [00:01<00:00, 17.17it/s]

The image file "image_043.png" is generated

Module source code

"sd_043.py"

## sd_043.py [SD1.5]  Image-to-image generation (controlnet instruct-pix2pix) sample source code
## === Examine controlnet_conditioning_scale ===
##      https://qiita.com/phyblas/items/28c342740c2ed00250b8
##      Ver. 0.00   2025/07/07

import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, EulerAncestralDiscreteScheduler, logging
from translate import Translator
import matplotlib.pyplot as plt

logging.set_verbosity_error()

# Image generation
def image_generation(cc_scale):
    # Create the pipeline (the ControlNet must be passed to the pipeline on CPU as well)
    if device == 'cpu':
        controlnet = ControlNetModel.from_single_file(model_path).to(device)
        pipeline = StableDiffusionControlNetPipeline.from_single_file(
                    model_base_path,
                    controlnet=controlnet,
                    ).to(device)
    else:
        controlnet = ControlNetModel.from_single_file(model_path, torch_dtype=torch.float16).to(device)
        pipeline = StableDiffusionControlNetPipeline.from_single_file(
                    model_base_path,
                    controlnet=controlnet,
                    torch_dtype = torch.float16,
                    ).to(device)

    # Scheduler
    pipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(pipeline.scheduler.config)

    # Create the Generator object
    generator = torch.Generator(device).manual_seed(seed)

    # Generate the image
    img = pipeline(
                    prompt = prompt,
                    image = src_image,
                    num_inference_steps = 25,
                    controlnet_conditioning_scale = cc_scale,
                    generator = generator
                    ).images[0]

    return img

# Paths
model_path = '/StabilityMatrix/Data/Models/ControlNet/control_v11e_sd15_ip2p_fp16.safetensors'              # ControlNet model
model_base_path = '/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/beautifulRealistic_brav5.safetensors' # base model
image_path = "images/sd_040_test.png"                           # source image

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Seed value
seed = 12345678

# Prompt (translated from Japanese to English before use)
trans = Translator('en','ja').translate
#prompt_jp = '雪の中の場面にする'                               # alternative prompt
prompt_jp = '浜辺の場面にする'                                  # prompt
prompt = trans(prompt_jp)
src_image = Image.open(image_path)

print(f'Seed: {seed}, Model: {model_path}')
print(f'source_image: {image_path}')
print(f'prompt : {prompt_jp} → {prompt}')
print('** controlnet_conditioning_scale 0.6 ～ 1.0 **')


# Generate several images and lay them out in a 3 x 2 grid
plt.figure(figsize=[6, 9.5], dpi = 100)
for i in range(6):
    cc_scale = 0.6 + 0.08 * i
    img = image_generation(cc_scale)
    plt.subplot(3, 2, i + 1, title = 'controlnet_conditioning_scale = %.2f'%cc_scale)
    plt.imshow(img)
    plt.axis('off')

    # Free memory
    if device == 'cuda':
        torch.cuda.empty_cache()
    elif device == 'mps':
        torch.mps.empty_cache()

plt.tight_layout()
plt.savefig('results/image_043.png')
plt.close()
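
The comparison programs place their images with plt.subplot(3, 2, i + 1), which hard-codes a 3 x 2 grid for six images. When the number of images changes (the sd_045.py log in Step 45 shows ten runs), the grid shape can be derived instead. The following is a minimal sketch; grid_shape is a hypothetical helper name.

```python
import math


def grid_shape(n_images: int, cols: int = 2) -> tuple:
    """Return (rows, cols) large enough to hold n_images subplots."""
    if n_images < 1 or cols < 1:
        raise ValueError("n_images and cols must be positive")
    rows = math.ceil(n_images / cols)
    return rows, cols

# Usage with matplotlib (assumption: `images` is a list of generated images):
#   rows, cols = grid_shape(len(images))
#   for i, img in enumerate(images):
#       plt.subplot(rows, cols, i + 1)
#       plt.imshow(img)
```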

Step 44: Convert part of an image with "controlnet inpaint"

"diffusers" provides the following two "inpaint" features for modifying part of an image
① Conventional inpaint → Step 38: Modify only a specific part (inpaint)
② controlnet inpaint → Step 44, 45

・Difference in the pipeline object used

Type	Pipeline class
Conventional inpaint	StableDiffusionInpaintPipeline
controlnet inpaint	StableDiffusionControlNetInpaintPipeline
controlnet (for reference)	StableDiffusionControlNetPipeline

"sd_044.py"   Mask image (left) sd_038_test_mask.png   Source image (right) sd_038_test.png

## sd_044.py [SD1.5]  Image-to-image generation (controlnet inpaint) sample source code
##      https://qiita.com/phyblas/items/7cacb9297650afd63d34
##      https://zako-lab929.hatenablog.com/entry/20240212/1707743575
##      Ver. 0.00   2025/07/08
##
##      command:    python sd_044.py [prompt]
##
##       prompt         '微笑んでいる女性' (default)
##                      '泣いている女性'
##                      '怒っている女性'
##                      '照れている女性'
##                      '見つめている女性'
##                      '笑っている女性'
##                      '目を瞑っている女性'
##                      'ウィンクしている女性'
##                      '苛立っている女性'
##                      '怖がっている女性'
##                      '驚いている女性'
##                      '疲れている女性'
##
##      model:          control_v11p_sd15_inpaint_fp16.safetensors
##      base model:     beautifulRealistic_brav5.safetensors        (photorealistic)
##                      animePastelDream_softBakedVae.safetensors   (illustration style)
##
##      source images:  images/sd_038_test.png
##                      images/sd_044_test1.png
##                      images/sd_044_test2.png
##                      images/sd_044_test3.png
##      mask images:    images/sd_038_test_mask.png
##                      images/sd_044_test1_mask.png
##                      images/sd_044_test2_mask.png
##                      images/sd_044_test3_mask.png

import torch
from diffusers import StableDiffusionControlNetInpaintPipeline, ControlNetModel, EulerAncestralDiscreteScheduler, logging
from diffusers.utils import load_image
from translate import Translator
import numpy as np
import sys

logging.set_verbosity_error()

# Build the control image: masked pixels are marked with -1.0
def make_inpaint_condition(image, image_mask):
    image = np.array(image.convert("RGB")).astype(np.float32) / 255.0
    image_mask = np.array(image_mask.convert("L")).astype(np.float32) / 255.0

    # Compare both height and width
    assert image.shape[0:2] == image_mask.shape[0:2], "image and image_mask must have the same image size"
    image[image_mask > 0.5] = -1.0  # set as masked pixel
    image = np.expand_dims(image, 0).transpose(0, 3, 1, 2)
    image = torch.from_numpy(image)
    return image

# Paths
model_path = '/StabilityMatrix/Data/Models/ControlNet/control_v11p_sd15_inpaint_fp16.safetensors'           # ControlNet model
model_base_path = '/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/beautifulRealistic_brav5.safetensors' # base model

image_path = "images/sd_038_test.png"                           # source image
mask_path = "images/sd_038_test_mask.png"                       # mask

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Seed value
seed = 12345678

# Create the pipeline (the ControlNet must be passed to the pipeline on CPU as well)
if device == 'cpu':
    controlnet = ControlNetModel.from_single_file(model_path).to(device)
    pipeline = StableDiffusionControlNetInpaintPipeline.from_single_file(
                    model_base_path,
                    controlnet=controlnet,
                    ).to(device)
else:
    controlnet = ControlNetModel.from_single_file(model_path, torch_dtype=torch.float16).to(device)
    pipeline = StableDiffusionControlNetInpaintPipeline.from_single_file(
                    model_base_path,
                    controlnet=controlnet,
                    torch_dtype = torch.float16,
                    ).to(device)

# Scheduler
pipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(pipeline.scheduler.config)

# Prompt (first command-line argument if given; translated to English before use)
trans = Translator('en','ja').translate
args = sys.argv
prompt_jp = '微笑んでいる女性' if len(args) <= 1 else args[1]   # prompt
prompt = trans(prompt_jp)

src_image = load_image(image_path).resize((512, 512))           # source image
msk_image = load_image(mask_path).resize((512, 512))            # mask image
img_ctrl = make_inpaint_condition(src_image,msk_image)          # control image

# Create the Generator object
generator = torch.Generator(device).manual_seed(seed)

print(f'Seed: {seed}, Model: {model_path}')
print(f'base Model: {model_base_path}')
print(f'source_image: {image_path}')
print(f'mask_image: {mask_path}')
print(f'prompt : {prompt_jp} → {prompt}')

# Generate the image
image = pipeline(
                    prompt = prompt,
                    image = src_image,
                    mask_image = msk_image,
                    control_image=img_ctrl,
                    num_inference_steps = 20,
                    generator = generator
                    ).images[0]

image.save("results/image_044.png")                             # generated image
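
The make_inpaint_condition() function in sd_044.py marks every pixel whose mask value exceeds 0.5 with -1.0, so that the ControlNet can tell which region is to be repainted. The following is a minimal sketch of that marking step on plain Python lists, without numpy or torch, so that the rule is easy to follow; mask_pixels is a hypothetical name and the tiny 1 x 2 image is made-up illustration data.

```python
def mask_pixels(pixels, mask, threshold=0.5, marker=-1.0):
    """Replace masked pixels with a marker value.

    pixels : rows of (r, g, b) tuples, each channel in 0..1
    mask   : rows of floats in 0..1 (values above threshold mean "repaint")
    """
    same_height = len(pixels) == len(mask)
    same_width = all(len(p) == len(m) for p, m in zip(pixels, mask))
    if not (same_height and same_width):
        raise ValueError("image and mask must have the same height and width")
    return [
        [(marker, marker, marker) if m > threshold else tuple(px)
         for px, m in zip(pixel_row, mask_row)]
        for pixel_row, mask_row in zip(pixels, mask)
    ]


image = [[(0.1, 0.2, 0.3), (0.4, 0.5, 0.6)]]   # 1 x 2 image
mask = [[0.0, 1.0]]                             # repaint only the right pixel
print(mask_pixels(image, mask))
```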

Run the program (execution time: about 1 second, RTX 4070 Ti 12GB)

(sd_test) PS > python sd_044.py
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 12.47it/s]
Seed: 12345678, Model: /StabilityMatrix/Data/Models/ControlNet/control_v11p_sd15_inpaint_fp16.safetensors
base Model: /StabilityMatrix/Data/Models/StableDiffusion/SD1.5/beautifulRealistic_brav5.safetensors
source_image: images/sd_038_test.png
mask_image: images/sd_038_test_mask.png
prompt : 微笑んでいる女性 → Woman smiling
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 10.60it/s]

The image file "image_044.png" is generated

Generate with a different prompt
・"python sd_044.py ['prompt']"

(sd_test) PS > python sd_044.py '見つめている女性'

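・Base model "beautifulRealistic_brav5.safetensors (photorealistic)"

Woman smiling	Woman crying	Angry woman	Shy woman	Woman gazing	Woman laughing

Woman with eyes closed	Woman winking	Irritated woman	Frightened woman	Surprised woman	Tired woman

・Base model "animePastelDream_softBakedVae.safetensors (illustration style)"

Woman smiling	Woman crying	Angry woman	Shy woman	Woman gazing	Woman laughing

Woman with eyes closed	Woman winking	Irritated woman	Frightened woman	Surprised woman	Tired woman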

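Step 45: See how the "controlnet inpaint" strength parameter changes the result

strength
・Value that decides how much the masked part is changed
・The default is 1 (the part is completely replaced with new content)
・A value from 0 to 1 sets how much of the original image's shape is kept (lower values keep more)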
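
The sd_045.py run log in this step shows progress bars of 2, 4, ..., 20 steps for strength 0.1 to 1.0 with num_inference_steps = 20. That pattern is consistent with the number of denoising steps actually run being num_inference_steps multiplied by strength and truncated to an integer. The following is a minimal sketch under that assumption; effective_steps is a hypothetical name and the formula is inferred from the log, not taken from the original programs.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Estimate how many denoising steps run for a given strength."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Assumption: the schedule is shortened in proportion to strength
    return min(int(num_inference_steps * strength), num_inference_steps)


for strength in (0.1, 0.5, 1.0):
    print(strength, effective_steps(20, strength))
```

Fewer steps at low strength also explains why the low-strength runs finish faster in the log.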

Run the program (execution time: about 5 seconds, RTX 4070 Ti 12GB)

(sd_test) PS > python sd_045.py
Seed: 12345678, Model: /StabilityMatrix/Data/Models/ControlNet/control_v11p_sd15_inpaint_fp16.safetensors
base Model: /StabilityMatrix/Data/Models/StableDiffusion/SD1.5/beautifulRealistic_brav5.safetensors
source_image: images/sd_038_test.png
mask_image: images/sd_038_test_mask.png
prompt : 微笑んでいる女性 → Woman smiling
** strength 0.1 ～ 1.0 **
Fetching 11 files: 100%|████████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|█████████████| 6/6 [00:00<00:00, 22.39it/s]
100%|█████████████████████████████████████████████| 2/2 [00:00<00:00, 10.33it/s]
Fetching 11 files: 100%|████████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|█████████████| 6/6 [00:00<00:00, 33.31it/s]
100%|█████████████████████████████████████████████| 4/4 [00:00<00:00, 16.52it/s]
Fetching 11 files: 100%|████████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|█████████████| 6/6 [00:00<00:00, 34.29it/s]
100%|█████████████████████████████████████████████| 6/6 [00:00<00:00, 15.99it/s]
Fetching 11 files: 100%|████████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|█████████████| 6/6 [00:00<00:00, 22.93it/s]
100%|█████████████████████████████████████████████| 8/8 [00:00<00:00, 15.01it/s]
Fetching 11 files: 100%|████████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|█████████████| 6/6 [00:00<00:00, 33.96it/s]
100%|███████████████████████████████████████████| 10/10 [00:00<00:00, 16.53it/s]
Fetching 11 files: 100%|████████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|█████████████| 6/6 [00:00<00:00, 33.38it/s]
100%|███████████████████████████████████████████| 12/12 [00:00<00:00, 16.25it/s]
Fetching 11 files: 100%|████████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|█████████████| 6/6 [00:00<00:00, 32.07it/s]
100%|███████████████████████████████████████████| 14/14 [00:00<00:00, 15.34it/s]
Fetching 11 files: 100%|████████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|█████████████| 6/6 [00:00<00:00, 33.52it/s]
100%|███████████████████████████████████████████| 16/16 [00:01<00:00, 15.45it/s]
Fetching 11 files: 100%|████████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|█████████████| 6/6 [00:00<00:00, 33.84it/s]
100%|███████████████████████████████████████████| 18/18 [00:01<00:00, 16.50it/s]
Fetching 11 files: 100%|████████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|█████████████| 6/6 [00:00<00:00, 33.51it/s]
100%|███████████████████████████████████████████| 20/20 [00:01<00:00, 16.35it/s]

The image file "image_045.png" is generated

Module source code

"sd_045.py"

## sd_045.py [SD1.5]  Image-to-image generation (controlnet inpaint) sample source code
## === Examine the effect of strength ===
##      https://qiita.com/phyblas/items/7cacb9297650afd63d34
##      https://zako-lab929.hatenablog.com/entry/20240212/1707743575
##      Ver. 0.00   2025/07/09
##
##      command:    python sd_045.py [prompt]
##
##       prompt         '微笑んでいる女性' (a smiling woman; default)
##                      '泣いている女性' (a crying woman)
##                      '怒っている女性' (an angry woman)
##                      '照れている女性' (a shy woman)
##                      '見つめている女性' (a staring woman)
##                      '笑っている女性' (a laughing woman)
##                      '目を瞑っている女性' (a woman with her eyes closed)
##                      'ウィンクしている女性' (a winking woman)
##                      '苛立っている女性' (an irritated woman)
##                      '怖がっている女性' (a frightened woman)
##                      '驚いている女性' (a surprised woman)
##                      '疲れている女性' (a tired woman)
##
##      model:          control_v11p_sd15_inpaint_fp16.safetensors
##      base model:     beautifulRealistic_brav5.safetensors        (realistic style)
##                      animePastelDream_softBakedVae.safetensors   (illustration style)
##
##      source image:   images/sd_038_test.png
##                      images/sd_044_test1.png
##                      images/sd_044_test2.png
##                      images/sd_044_test3.png
##      mask image:     images/sd_038_test_mask.png
##                      images/sd_044_test1_mask.png
##                      images/sd_044_test2_mask.png
##                      images/sd_044_test3_mask.png

import torch
from diffusers import StableDiffusionControlNetInpaintPipeline, ControlNetModel, EulerAncestralDiscreteScheduler, logging
from diffusers.utils import load_image
from translate import Translator
import numpy as np
import matplotlib.pyplot as plt
import sys

logging.set_verbosity_error()

# Build the control image for the inpaint ControlNet
def make_inpaint_condition(image, image_mask):
    image = np.array(image.convert("RGB")).astype(np.float32) / 255.0
    image_mask = np.array(image_mask.convert("L")).astype(np.float32) / 255.0

    # Height and width must both match
    assert image.shape[0:2] == image_mask.shape[0:2], "image and image_mask must have the same image size"
    image[image_mask > 0.5] = -1.0  # set as masked pixel
    image = np.expand_dims(image, 0).transpose(0, 3, 1, 2)
    image = torch.from_numpy(image)
    return image

# Image generation
def image_generation(strength):
    # Build the pipeline (default precision on CPU, float16 on GPU)
    if device == 'cpu':
        controlnet = ControlNetModel.from_single_file(model_path).to(device)
        pipeline = StableDiffusionControlNetInpaintPipeline.from_single_file(
                    model_base_path,
                    controlnet=controlnet,
                    ).to(device)
    else:
        controlnet = ControlNetModel.from_single_file(model_path, torch_dtype=torch.float16).to(device)
        pipeline = StableDiffusionControlNetInpaintPipeline.from_single_file(
                    model_base_path,
                    controlnet=controlnet,
                    torch_dtype = torch.float16,
                    ).to(device)

    # Scheduler
    pipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(pipeline.scheduler.config)

    # Create the Generator object (fixed seed for reproducible results)
    generator = torch.Generator(device).manual_seed(seed)

    # Generate the image
    image = pipeline(
                    prompt = prompt,
                    image = src_image,
                    mask_image = msk_image,
                    control_image=img_ctrl,
                    num_inference_steps = 20,
                    strength=strength,
                    generator = generator
                    ).images[0]
    return image


# File paths
model_path = '/StabilityMatrix/Data/Models/ControlNet/control_v11p_sd15_inpaint_fp16.safetensors'                   # ControlNet model
model_base_path = '/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/beautifulRealistic_brav5.safetensors'         # base model

image_path = "images/sd_038_test.png"                           # source image
mask_path = "images/sd_038_test_mask.png"                       # mask image

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Seed value
seed = 12345678

# Prompt (given in Japanese and translated to English)
trans = Translator('en','ja').translate
args = sys.argv
prompt_jp = '微笑んでいる女性' if len(args) <= 1 else args[1]   # Japanese prompt (default: a smiling woman)
prompt = trans(prompt_jp)

src_image = load_image(image_path).resize((512, 512))           # source image
msk_image = load_image(mask_path).resize((512, 512))            # mask image
img_ctrl = make_inpaint_condition(src_image,msk_image)          # control image

print(f'Seed: {seed}, Model: {model_path}')
print(f'base Model: {model_base_path}')
print(f'source_image: {image_path}')
print(f'mask_image: {mask_path}')
print(f'prompt : {prompt_jp} → {prompt}')
print('** strength 0.1 ～ 1.0 **')


# Generate one image per strength value and arrange them in a 5 x 2 grid
plt.figure(figsize=[6, 15.5], dpi = 100)
for i in range(10):
    strength = 0.1 + i * 0.1
    img = image_generation(strength)
    plt.subplot(5, 2, i + 1, title = 'strength = %.1f'%strength)
    plt.imshow(img)
    plt.axis('off')

    # Release cached device memory
    if device == 'cuda':
        torch.cuda.empty_cache()
    elif device == 'mps':
        torch.mps.empty_cache()

plt.tight_layout()
plt.savefig('results/image_045.png')
plt.close()
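
The control image built by make_inpaint_condition marks the area to repaint by setting every channel of the masked pixels to -1.0 and rearranges the array to (batch, channel, height, width). The sketch below illustrates this on a tiny synthetic image. It is a NumPy-only variant written for illustration: the function name make_inpaint_condition_np is hypothetical, it takes uint8 arrays instead of PIL images, and it omits the final torch.from_numpy conversion done in sd_045.py.

```python
import numpy as np


def make_inpaint_condition_np(image_rgb, mask_gray):
    """NumPy-only illustration of the control-image layout (hypothetical helper).

    image_rgb: uint8 array of shape (H, W, 3)
    mask_gray: uint8 array of shape (H, W); bright pixels mark the area to repaint
    """
    image = image_rgb.astype(np.float32) / 255.0
    mask = mask_gray.astype(np.float32) / 255.0

    # Height and width must both match
    assert image.shape[:2] == mask.shape[:2], "image and mask must have the same size"

    image[mask > 0.5] = -1.0                                # flag masked pixels in all channels
    return np.expand_dims(image, 0).transpose(0, 3, 1, 2)   # (H, W, 3) -> (1, 3, H, W)


# 4 x 4 mid-gray image with the top-left 2 x 2 block masked
image_rgb = np.full((4, 4, 3), 128, dtype=np.uint8)
mask_gray = np.zeros((4, 4), dtype=np.uint8)
mask_gray[:2, :2] = 255

control = make_inpaint_condition_np(image_rgb, mask_gray)
print(control.shape)    # layout: (batch, channel, height, width)
print(control[0, 0])    # first channel: masked block is -1.0, the rest keeps its scaled value
```

Pixels outside the mask keep their original values scaled to the 0.0 to 1.0 range, so the ControlNet can tell which region it is expected to fill in.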

↑

Memo †

↑

Change history †

2025/07/05 First edition

↑

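References †

Stable Diffusion

Books and magazines
- Nikkei Software, July 2025 issue: "ローカル生成AIプログラミング" (Local Generative AI Programming)
- Interface, March 2025 issue: "画像による異常検出＆ローカルLLM作り - 仕事のための生成AI" (Image-Based Anomaly Detection & Building a Local LLM - Generative AI for Work)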

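[Comparison table, images omitted: SD1.5 and SDXL results for the prompts "Make it a snowy scene", "Make it a spring scene", "Make it a summer scene", "Make it an autumn scene", "Make it a winter scene"]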