AI_Program3 (backup No. 16)

Generative AI Programming 3 == under construction ==

Based on the results verified so far, this page writes generative AI programs in Python.


※ Last updated: 2025/07/15

Getting started with Stable Diffusion using diffusers (Advanced, Part 2)

Image-to-image generation with instruct-pix2pix and controlnet instruct-pix2pix.

Reference site: "instruct-pix2pixで画像を指示した通り変更したり"

Overview

List of the programs created in this chapter, with approximate run times

Step	Description	Program	GPU: RTX 4070Ti	GPU: RTX 4060	GPU: RTX 4060L	GPU: RTX 3050	GPU: GTX 1050	CPU: i7-1260P
40	Convert an image with "instruct-pix2pix"	sd_040.py	00:03		00:08		00:50	05:32
40	Convert an image with "instruct-pix2pix"	sd_040a.py	00:08		00:31		18:19	24:11
41	Effect of the image_guidance_scale parameter	sd_041.py	00:12		00:24		04:52	14:23
41	Effect of the image_guidance_scale parameter	sd_041a.py	00:42		02:00		02:40:30	03:38:17
42	Convert an image with "controlnet instruct-pix2pix"	sd_042.py	00:02		00:14		00:54	06:30
43	Effect of the controlnet_conditioning_scale parameter	sd_043.py	00:06		00:24		04:56	17:01
44	Convert part of an image with "controlnet inpaint"	sd_044.py	00:01		00:10		00:45	05:17
45	Effect of the strength parameter	sd_045.py	00:05		00:15		03:53	12:12
46	"outpaint": draw in the area outside the image	sd_046.py	00:01		00:12		00:45	05:15
47	"controlnet scribble": generate an image from a hand-drawn line drawing	sd_047.py	00:01		00:12		00:53	05:36
48	"controlnet openpose": generate an image in the same pose from an image	sd_048.py	00:02				01:17

・Unit: (hours:)minutes:seconds
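
To compare the columns numerically, the "(hours:)minutes:seconds" strings can be converted to seconds. The following is a minimal sketch; the helper name `to_seconds` is hypothetical and is not part of the sd_0xx.py programs. The two sample values are the sd_041a.py entries for the RTX 4070Ti and the i7-1260P.

```python
def to_seconds(text: str) -> int:
    """Convert '(hours:)minutes:seconds' (e.g. '00:42' or '03:38:17') to seconds."""
    parts = [int(p) for p in text.split(":")]
    if len(parts) not in (2, 3):
        raise ValueError(f"expected mm:ss or h:mm:ss, got {text!r}")
    seconds = 0
    for value in parts:
        # Each field to the left is worth 60 times the next one.
        seconds = seconds * 60 + value
    return seconds


if __name__ == "__main__":
    gpu = to_seconds("00:42")      # sd_041a.py on RTX 4070Ti
    cpu = to_seconds("03:38:17")   # sd_041a.py on i7-1260P
    print(f"GPU {gpu} s, CPU {cpu} s, ratio {cpu / gpu:.0f}x")
```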

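Differences between instruct-pix2pix and controlnet instruct-pix2pix

Name	Function	What it does	How to write the prompt	Model location
instruct-pix2pix	Creates a new image from the source image	Changes only the parts related to the instruction	Write what to change it into	[SD1.5] instruct-pix2pix
instruct-pix2pix	Creates a new image from the source image	Changes only the parts related to the instruction	Write what to change it into	[SDXL] sdxl-instructpix2pix-768
controlnet instruct-pix2pix	Modifies the source image	Can change the whole source image	Describe what the desired result looks like	[SD1.5] control_v11e_sd15_ip2p

・"instruct-pix2pix" runs on a dedicated model for each of SD1.5 and SDXL
・"controlnet instruct-pix2pix" needs both a ControlNet model and a base model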

Environment

This project runs in the following Anaconda virtual environment and project folder
```
(base) PS > conda activate sd_test
(sd_test) PS > cd workspace_3/sd_test
```

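Step 40: Convert an image with "instruct-pix2pix"

This program works with "SD1.5" and "SDXL" models
・Place the model in the folder given by the --model_dir parameter
・Give the model name with the --model_path parameter

Run the program with the SD1.5 model (run time: about 3 s on RTX 4070 Ti 12GB)

 python sd_040.py

Generated image (left): image_040.png; source image (right): sd_040_test.png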

(sd_test) > python sd_040.py

Stable Diffusion with diffusers(040)  Ver 0.01: Starting application...

 --result_image             :   results/image_040.png
 --cpu                      :   False
 --log                      :   3
 --model_path               :   timbrooks/instruct-pix2pix
 --image_path               :   images/sd_040_test.png
 --max_size                 :   0
 --prompt                   :   雪の中の場面にする
 --seed                     :   0
 --width                    :   512
 --height                   :   512
 --step                     :   20
 --scale                    :   7.0
 --image_scale              :   1.5

prompt: Make it a scene in the snow
width: 512, height: 512
seed: 0
Loading pipeline components...: 100%|████████████| 7/7 [00:02<00:00,  2.97it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 16.75it/s]
result_file: results/image_040.png

Finished.

The image file "image_040.png" is generated

Run the program with the SDXL model (run time: about 8 s on RTX 4070 Ti 12GB)

 python sd_040.py --model_path 'diffusers/sdxl-instructpix2pix-768' --width 768 --height 768 --result_image 'results/image_040a.png'

Generated image: image_040a.png; the source image is the same sd_040_test.png

(sd_test) PS > python sd_040.py --model_path 'diffusers/sdxl-instructpix2pix-768' --width 768 --height 768 --result_image 'results/image_040a.png'

Stable Diffusion with diffusers(040)  Ver 0.01: Starting application...

 --result_image             :   results/image_040a.png
 --cpu                      :   False
 --log                      :   3
 --model_path               :   diffusers/sdxl-instructpix2pix-768
 --image_path               :   images/sd_040_test.png
 --max_size                 :   0
 --prompt                   :   雪の中の場面にする
 --seed                     :   0
 --width                    :   768
 --height                   :   768
 --step                     :   20
 --scale                    :   7.0
 --image_scale              :   1.5

prompt: Make it a scene in the snow
width: 768, height: 768
seed: 0
Loading pipeline components...: 100%|████████████| 7/7 [00:04<00:00,  1.67it/s]
100%|██████████████████████████████████████████| 20/20 [00:03<00:00,  5.27it/s]
result_file: results/image_040a.png

Finished.

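The image file "image_040a.png" is generated

Comparison of images generated by the SD1.5 and SDXL models

Prompt	'雪の中の場面にする' (snow scene)	'春の場面にする' (spring scene)	'夏の場面にする' (summer scene)	'秋の場面にする' (autumn scene)	'冬の場面にする' (winter scene)

[Figure: images generated by SD1.5 (top row) and SDXL (bottom row) for each prompt]

Pipeline classes used for the SD1.5 and SDXL models

Model type	Base image size	Pipeline class
SD1.5	512x512	StableDiffusionInstructPix2PixPipeline
SDXL	768x768	StableDiffusionXLInstructPix2PixPipeline

Notes on the SDXL version
・The source image object seems to differ from a plainly opened PIL image, so it is created with diffusers.utils.load_image() as in the sample code (load_image() returns a PIL image converted to RGB)
・The output size appears to be fixed at 768x768, so the program resizes the source image to that size internally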
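
The pipeline table above can be sketched as a small lookup. This is only an illustration: the helper names `select_pipeline` and `load_pipeline` are hypothetical, and detecting SDXL by looking for "sdxl" in the model path is an assumption made for this sketch, not necessarily how sd_040.py decides (its source is not shown on this page).

```python
# Pipeline class name and base image size per model type (from the table above).
PIPELINES = {
    "SD1.5": ("StableDiffusionInstructPix2PixPipeline", 512),
    "SDXL": ("StableDiffusionXLInstructPix2PixPipeline", 768),
}


def select_pipeline(model_path: str) -> tuple[str, int]:
    """Return (pipeline class name, base size) for a model path.

    Assumption for this sketch: SDXL models have 'sdxl' in their name.
    """
    kind = "SDXL" if "sdxl" in model_path.lower() else "SD1.5"
    return PIPELINES[kind]


def load_pipeline(model_path: str, device: str = "cuda"):
    """Load the matching pipeline; diffusers and torch are imported lazily."""
    import diffusers
    import torch

    class_name, _ = select_pipeline(model_path)
    pipeline_class = getattr(diffusers, class_name)
    dtype = torch.float16 if device == "cuda" else torch.float32
    return pipeline_class.from_pretrained(model_path, torch_dtype=dtype).to(device)


if __name__ == "__main__":
    for path in ("timbrooks/instruct-pix2pix", "diffusers/sdxl-instructpix2pix-768"):
        print(path, "->", select_pipeline(path))
```

Keeping the size next to the class name makes it easy to resize the source image to the base size before calling the pipeline.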

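Module source code

▼「sd_040.py」

▲「sd_040.py」
Note: for display reasons, the half-width '}' in the source code above appears as the full-width '｝'

Step 41: See how the "instruct-pix2pix" image_guidance_scale parameter changes the result

This program works with "SD1.5" and "SDXL" models
image_guidance_scale
・Parameter that sets how strongly the result is pushed toward the source image (higher values keep the result closer to the source)
・Set it to 1 or more (default: 1.5)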
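
A sweep over image_guidance_scale could look like the sketch below. The values that sd_041.py actually sweeps are not shown on this page, so the six values here are an assumption (the logs only show that six images are generated). The function names are hypothetical; the keyword arguments are those of the diffusers StableDiffusionInstructPix2PixPipeline call.

```python
def guidance_values(start: float = 1.0, step: float = 0.2, count: int = 6) -> list[float]:
    """Return `count` image_guidance_scale values, rounded to avoid float noise."""
    return [round(start + i * step, 2) for i in range(count)]


def sweep_image_guidance(pipeline, prompt, image, seed: int = 0):
    """Generate one image per image_guidance_scale value with a fixed seed.

    `pipeline` is assumed to be an already loaded
    StableDiffusionInstructPix2PixPipeline that is loaded once and reused.
    """
    import torch  # imported lazily so guidance_values() works without torch

    results = []
    for value in guidance_values():
        # Re-seed for every image so that only the scale changes between runs.
        generator = torch.Generator(pipeline.device.type).manual_seed(seed)
        result = pipeline(
            prompt,
            image=image,
            num_inference_steps=20,
            guidance_scale=7.0,
            image_guidance_scale=value,
            generator=generator,
        ).images[0]
        results.append((value, result))
    return results


if __name__ == "__main__":
    print(guidance_values())
```

The logs on this page show "Loading pipeline components" before each of the six generations; loading the pipeline once and reusing it, as assumed here, would likely avoid that repeated cost.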

Run the program with the SD1.5 model (run time: about 12 s on RTX 4070 Ti 12GB)

 python sd_041.py

(sd_test) PS > python sd_041.py

Stable Diffusion with diffusers(041)  Ver 0.01: Starting application...

 --result_image             :   results/image_041.png
 --cpu                      :   False
 --log                      :   3
 --model_path               :   timbrooks/instruct-pix2pix
 --image_path               :   images/sd_040_test.png
 --max_size                 :   0
 --prompt                   :   雪の中の場面にする
 --seed                     :   0
 --width                    :   512
 --height                   :   512
 --step                     :   20
 --scale                    :   7.0
 --image_scale              :   1.5

prompt: Make it a scene in the snow
width: 512, height: 512
seed: 0
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  4.87it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 18.16it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  5.65it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 18.43it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  5.70it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 18.44it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  5.35it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 16.70it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  5.65it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 18.42it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  5.66it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 18.43it/s]
result_file: results/image_041.png

Finished.

The image file "image_041.png" is generated

Run the program with the SDXL model (run time: about 42 s on RTX 4070 Ti 12GB)

 python sd_041.py --model_path 'diffusers/sdxl-instructpix2pix-768' --width 768 --height 768 --result_image 'results/image_041a.png'

(sd_test) PS > python sd_041.py --model_path 'diffusers/sdxl-instructpix2pix-768' --width 768 --height 768 --result_image 'results/image_041a.png'

Stable Diffusion with diffusers(041)  Ver 0.01: Starting application...

 --result_image             :   results/image_041a.png
 --cpu                      :   False
 --log                      :   3
 --model_path               :   diffusers/sdxl-instructpix2pix-768
 --image_path               :   images/sd_040_test.png
 --max_size                 :   0
 --prompt                   :   雪の中の場面にする
 --seed                     :   0
 --width                    :   768
 --height                   :   768
 --step                     :   20
 --scale                    :   7.0
 --image_scale              :   1.5

prompt: Make it a scene in the snow
width: 768, height: 768
seed: 0
Loading pipeline components...: 100%|█████████████| 7/7 [00:03<00:00,  1.76it/s]
100%|███████████████████████████████████████████| 20/20 [00:03<00:00,  5.36it/s]
Loading pipeline components...: 100%|█████████████| 7/7 [00:04<00:00,  1.70it/s]
100%|███████████████████████████████████████████| 20/20 [00:03<00:00,  5.45it/s]
Loading pipeline components...: 100%|█████████████| 7/7 [00:04<00:00,  1.73it/s]
100%|███████████████████████████████████████████| 20/20 [00:03<00:00,  5.44it/s]
Loading pipeline components...: 100%|█████████████| 7/7 [00:04<00:00,  1.66it/s]
100%|███████████████████████████████████████████| 20/20 [00:03<00:00,  5.45it/s]
Loading pipeline components...: 100%|█████████████| 7/7 [00:03<00:00,  1.80it/s]
100%|███████████████████████████████████████████| 20/20 [00:03<00:00,  5.44it/s]
Loading pipeline components...: 100%|█████████████| 7/7 [00:03<00:00,  1.78it/s]
100%|███████████████████████████████████████████| 20/20 [00:03<00:00,  5.45it/s]
result_file: results/image_041a.png

Finished.

The image file "image_041a.png" is generated

Module source code

▼「sd_041.py」

▲「sd_041.py」
Note: for display reasons, the half-width '}' in the source code above appears as the full-width '｝'

Step 42: Convert an image with "controlnet instruct-pix2pix"

Run the program (run time: about 2 s on RTX 4070 Ti 12GB)

 python sd_042.py

Generated image (left): image_042.png; source image (right): sd_040_test.png

(sd_test) PS D:\anaconda_win\workspace_3\sd_test> python sd_042.py

Stable Diffusion with diffusers(042)  Ver 0.01: Starting application...

 --result_image             :   results/image_042.png
 --cpu                      :   False
 --log                      :   3
 --model_dir                :   /StabilityMatrix/Data/Models/StableDiffusion
 --model_path               :   SD1.5/beautifulRealistic_brav5.safetensors
 --ctrl_model_dir           :   /StabilityMatrix/Data/Models/ControlNet
 --ctrl_model_path          :   control_v11e_sd15_ip2p_fp16.safetensors
 --image_path               :   images/sd_040_test.png
 --max_size                 :   0
 --prompt                   :   浜辺の場面にする
 --seed                     :   12345678
 --width                    :   512
 --height                   :   512
 --step                     :   20
 --scale                    :   7.0
 --cc_scale                 :   1.0

prompt: Set the scene on the beach
width: 512, height: 512
seed: 12345678
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 17.47it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 10.55it/s]
result_file: results/image_042.png

Finished.

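The image file "image_042.png" is generated

Generate with a different prompt
・"python sd_042.py --prompt '<prompt>'"

 python sd_042.py --prompt '雪の中の場面にする'

・Base model "beautifulRealistic_brav5.safetensors" (photorealistic)

[Figure: images generated for each of the following prompts]
'浜辺の場面にする' (beach scene)	'雪の中の場面にする' (snow scene)	'炎の中の場面にする' (scene in flames)	'森の中の場面にする' (forest scene)	'山中の場面にする' (mountain scene)	'砂漠の場面にする' (desert scene)

'着物姿に着替える' (change into a kimono)	'イラスト画像にする' (make it an illustration)	'アニメ画像にする' (make it an anime image)	'微笑んだ顔のアニメ画像' (anime image with a smiling face)	'泣き顔のアニメ画像にする' (anime image with a crying face)	'嬉しそうな顔のアニメ画像' (anime image with a happy face)

・Base model "animePastelDream_softBakedVae.safetensors" (illustration style)

[Figure: images generated for the same twelve prompts as above]

Module source code

▼「sd_042.py」

▲「sd_042.py」
Note: for display reasons, the half-width '}' in the source code above appears as the full-width '｝'

Step 43: See how the "controlnet instruct-pix2pix" controlnet_conditioning_scale parameter changes the result

controlnet_conditioning_scale
・Parameter that sets the weight of the control image's influence
・The default is 1.0; values below 1 weaken the influence of the input image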
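
According to the diffusers documentation, the ControlNet outputs are multiplied by controlnet_conditioning_scale before they are added to the residuals of the base UNet. The toy sketch below shows that arithmetic with plain numbers instead of tensors; the function name `apply_control` and the sample values are hypothetical and only illustrate why a smaller scale weakens the control image.

```python
def apply_control(unet_residuals: list[float], control_outputs: list[float],
                  conditioning_scale: float = 1.0) -> list[float]:
    """Add ControlNet outputs, scaled by conditioning_scale, to UNet residuals.

    Toy illustration with plain floats; the real pipeline does this on tensors.
    """
    if len(unet_residuals) != len(control_outputs):
        raise ValueError("residual lists must have the same length")
    return [u + conditioning_scale * c for u, c in zip(unet_residuals, control_outputs)]


if __name__ == "__main__":
    unet = [1.0, 2.0, 3.0]
    control = [0.5, -1.0, 2.0]
    for scale in (0.0, 0.5, 1.0):
        print(f"scale={scale}: {apply_control(unet, control, scale)}")
```

With a scale of 0 the control image has no effect at all, and with 1.0 the ControlNet output is added unchanged.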

Run the program (run time: about 6 s on RTX 4070 Ti 12GB)

 python sd_043.py

(sd_test) PS D:\anaconda_win\workspace_3\sd_test> python sd_043.py

Stable Diffusion with diffusers(043)  Ver 0.01: Starting application...

 --result_image             :   results/image_043.png
 --cpu                      :   False
 --log                      :   3
 --model_dir                :   /StabilityMatrix/Data/Models/StableDiffusion
 --model_path               :   SD1.5/beautifulRealistic_brav5.safetensors
 --ctrl_model_dir           :   /StabilityMatrix/Data/Models/ControlNet
 --ctrl_model_path          :   control_v11e_sd15_ip2p_fp16.safetensors
 --image_path               :   images/sd_040_test.png
 --max_size                 :   0
 --prompt                   :   浜辺の場面にする
 --seed                     :   12345678
 --width                    :   512
 --height                   :   512
 --step                     :   20
 --scale                    :   7.0
 --cc_scale                 :   1.0

prompt: Set the scene on the beach
width: 512, height: 512
seed: 12345678
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 30.96it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 14.83it/s]
Fetching 11 files: 100%|█████████████████████| 11/11 [00:00<00:00, 7498.35it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 33.70it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 17.03it/s]
Fetching 11 files: 100%|████████████████████| 11/11 [00:00<00:00, 11016.56it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 33.90it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 16.16it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 21.80it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 16.25it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 33.68it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 16.56it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 34.80it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 16.98it/s]
result_file: results/image_043.png

Finished.

The image file "image_043.png" is generated

Module source code

▼「sd_043.py」

▲「sd_043.py」
Note: for display reasons, the half-width '}' in the source code above appears as the full-width '｝'

Step 44: Convert part of an image with "controlnet inpaint"

diffusers offers the following two "inpaint" functions for modifying part of an image
① Conventional inpaint → Step 38: modify only a specific part (inpaint)
② controlnet inpaint → Steps 44 and 45
A mask image is required. Mask image (left): sd_038_test_mask.png; source image (right): sd_038_test.png

Pipeline classes used

Type	Pipeline class
Conventional inpaint	StableDiffusionInpaintPipeline
controlnet inpaint	StableDiffusionControlNetInpaintPipeline
controlnet (for reference)	StableDiffusionControlNetPipeline

Run the program (run time: about 1 s on RTX 4070 Ti 12GB)

 python sd_044.py


(sd_test) PS > python sd_044.py

Stable Diffusion with diffusers(044)  Ver 0.01: Starting application...

 --result_image             :   results/image_044.png
 --cpu                      :   False
 --log                      :   3
 --model_dir                :   /StabilityMatrix/Data/Models/StableDiffusion
 --model_path               :   SD1.5/beautifulRealistic_brav5.safetensors
 --ctrl_model_dir           :   /StabilityMatrix/Data/Models/ControlNet
 --ctrl_model_path          :   control_v11p_sd15_inpaint_fp16.safetensors
 --image_path               :   images/sd_038_test.png
 --ctrl_image_path          :   images/sd_038_test_mask.png
 --max_size                 :   0
 --prompt                   :   微笑んでいる女性
 --seed                     :   12345678
 --width                    :   512
 --height                   :   512
 --step                     :   20
 --scale                    :   7.0
 --cc_scale                 :   1.0
 --strength                 :   0.6

prompt: Woman smiling
width: 512, height: 512
seed: 12345678
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 16.91it/s]
100%|██████████████████████████████████████████| 12/12 [00:01<00:00, 10.44it/s]
result_file: results/image_044.png

Finished.

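The image file "image_044.png" is generated

Generate with a different prompt
・"python sd_044.py --prompt '<prompt>'"

 python sd_044.py --prompt '見つめている女性'

[Figure: source image]

・Base model "beautifulRealistic_brav5.safetensors" (photorealistic)

[Figure: images generated for each of the following prompts]
'微笑んでいる女性' (smiling woman)	'泣いている女性' (crying woman)	'怒っている女性' (angry woman)	'照れている女性' (shy woman)	'見つめている女性' (staring woman)	'笑っている女性' (laughing woman)

'目を瞑っている女性' (woman with eyes closed)	'ウィンクしている女性' (winking woman)	'苛立っている女性' (irritated woman)	'怖がっている女性' (frightened woman)	'驚いている女性' (surprised woman)	'疲れている女性' (tired woman)

・Base model "animePastelDream_softBakedVae.safetensors" (illustration style)

[Figure: images generated for the same twelve prompts as above]

Module source code

▼「sd_044.py」

▲「sd_044.py」
Note: for display reasons, the half-width '}' in the source code above appears as the full-width '｝'

Step 45: See how the "controlnet inpaint" strength parameter changes the result

strength
・Value that sets how much the masked part is changed
・The default is 1 (the part is replaced with completely new content)
・A value from 0 to 1 sets how much of the original image's shape is kept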
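
The progress bars in the logs hint at how strength maps to the amount of work: with 20 inference steps, the Step 44 run with strength 0.6 executes 12 denoising steps. The sketch below assumes the rule used by the diffusers image-to-image style pipelines, where the number of executed steps is int(num_inference_steps * strength), capped at num_inference_steps. The helper name `effective_steps` is hypothetical.

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    """Number of denoising steps actually run for a given strength.

    Assumes the rule min(int(num_inference_steps * strength), num_inference_steps).
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)


if __name__ == "__main__":
    for tenths in range(1, 11):
        strength = tenths / 10
        print(f"strength={strength:.1f}: {effective_steps(20, strength)} steps")
```

This is consistent with the sd_045.py log, whose progress bars run 2, 4, ..., 20 steps as strength goes from 0.1 to 1.0. It also explains why low strength values finish faster.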

Run the program (run time: about 5 s on RTX 4070 Ti 12GB)

(sd_test) PS > python sd_045.py
Seed: 12345678, Model: /StabilityMatrix/Data/Models/ControlNet/control_v11p_sd15_inpaint_fp16.safetensors
base Model: /StabilityMatrix/Data/Models/StableDiffusion/SD1.5/beautifulRealistic_brav5.safetensors
source_image: images/sd_038_test.png
mask_image: images/sd_038_test_mask.png
prompt : 微笑んでいる女性 → Woman smiling
** strength 0.1 ～ 1.0 **
Fetching 11 files: 100%|████████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|█████████████| 6/6 [00:00<00:00, 22.39it/s]
100%|█████████████████████████████████████████████| 2/2 [00:00<00:00, 10.33it/s]
Fetching 11 files: 100%|████████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|█████████████| 6/6 [00:00<00:00, 33.31it/s]
100%|█████████████████████████████████████████████| 4/4 [00:00<00:00, 16.52it/s]
Fetching 11 files: 100%|████████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|█████████████| 6/6 [00:00<00:00, 34.29it/s]
100%|█████████████████████████████████████████████| 6/6 [00:00<00:00, 15.99it/s]
Fetching 11 files: 100%|████████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|█████████████| 6/6 [00:00<00:00, 22.93it/s]
100%|█████████████████████████████████████████████| 8/8 [00:00<00:00, 15.01it/s]
Fetching 11 files: 100%|████████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|█████████████| 6/6 [00:00<00:00, 33.96it/s]
100%|███████████████████████████████████████████| 10/10 [00:00<00:00, 16.53it/s]
Fetching 11 files: 100%|████████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|█████████████| 6/6 [00:00<00:00, 33.38it/s]
100%|███████████████████████████████████████████| 12/12 [00:00<00:00, 16.25it/s]
Fetching 11 files: 100%|████████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|█████████████| 6/6 [00:00<00:00, 32.07it/s]
100%|███████████████████████████████████████████| 14/14 [00:00<00:00, 15.34it/s]
Fetching 11 files: 100%|████████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|█████████████| 6/6 [00:00<00:00, 33.52it/s]
100%|███████████████████████████████████████████| 16/16 [00:01<00:00, 15.45it/s]
Fetching 11 files: 100%|████████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|█████████████| 6/6 [00:00<00:00, 33.84it/s]
100%|███████████████████████████████████████████| 18/18 [00:01<00:00, 16.50it/s]
Fetching 11 files: 100%|████████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|█████████████| 6/6 [00:00<00:00, 33.51it/s]
100%|███████████████████████████████████████████| 20/20 [00:01<00:00, 16.35it/s]

The image file "image_045.png" is generated

Module source code

▼「sd_045.py」

## sd_045.py [SD1.5]  Image-to-image generation (controlnet inpaint) sample source code
## === examine strength ===
##      https://qiita.com/phyblas/items/7cacb9297650afd63d34
##      https://zako-lab929.hatenablog.com/entry/20240212/1707743575
##      Ver. 0.00   2025/07/09
##
##      command:    python sd_045.py [prompt]
##
##       prompt         '微笑んでいる女性' (default)
##                      '泣いている女性'
##                      '怒っている女性'
##                      '照れている女性'
##                      '見つめている女性'
##                      '笑っている女性'
##                      '目を瞑っている女性'
##                      'ウィンクしている女性'
##                      '苛立っている女性'
##                      '怖がっている女性'
##                      '驚いている女性'
##                      '疲れている女性'
##
##      model:          control_v11p_sd15_inpaint_fp16.safetensors
##      base model:     beautifulRealistic_brav5.safetensors        (photorealistic)
##                      animePastelDream_softBakedVae.safetensors   (illustration style)
##
##      source image:   images/sd_038_test.png
##                      images/sd_044_test1.png
##                      images/sd_044_test2.png
##                      images/sd_044_test3.png
##      mask image:     images/sd_038_test_mask.png
##                      images/sd_044_test1_mask.png
##                      images/sd_044_test2_mask.png
##                      images/sd_044_test3_mask.png
import torch
from diffusers import StableDiffusionControlNetInpaintPipeline, ControlNetModel, EulerAncestralDiscreteScheduler, logging
from diffusers.utils import load_image
from translate import Translator
import numpy as np
import matplotlib.pyplot as plt
import sys

logging.set_verbosity_error()

# Build the control image from the source image and the mask
def make_inpaint_condition(image, image_mask):
    image = np.array(image.convert("RGB")).astype(np.float32) / 255.0
    image_mask = np.array(image_mask.convert("L")).astype(np.float32) / 255.0

    # Compare both height and width (the first two axes)
    assert image.shape[:2] == image_mask.shape[:2], "image and image_mask must have the same image size"
    image[image_mask > 0.5] = -1.0  # set as masked pixel
    image = np.expand_dims(image, 0).transpose(0, 3, 1, 2)
    image = torch.from_numpy(image)
    return image

# Image generation
def image_generation(strength):
    # Create the pipeline
    if device == 'cpu':
        controlnet = ControlNetModel.from_single_file(model_path).to(device)
        pipeline = StableDiffusionControlNetInpaintPipeline.from_single_file(model_base_path, controlnet=controlnet).to(device)
    else:
        controlnet = ControlNetModel.from_single_file(model_path, torch_dtype=torch.float16).to(device)
        pipeline = StableDiffusionControlNetInpaintPipeline.from_single_file(
                    model_base_path,
                    controlnet=controlnet,
                    torch_dtype = torch.float16,
                    ).to(device)

    # Scheduler
    pipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(pipeline.scheduler.config)

    # Create the Generator object
    generator = torch.Generator(device).manual_seed(seed)

    # Generate the image
    image = pipeline(
                    prompt = prompt,
                    image = src_image,
                    mask_image = msk_image,
                    control_image=img_ctrl,
                    num_inference_steps = 20,
                    strength=strength,
                    generator = generator
                    ).images[0]
    return image


# Model paths
model_path = '/StabilityMatrix/Data/Models/ControlNet/control_v11p_sd15_inpaint_fp16.safetensors'                   # ControlNet model
model_base_path = '/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/beautifulRealistic_brav5.safetensors'         # base model

image_path = "images/sd_038_test.png"                           # source image
mask_path = "images/sd_038_test_mask.png"                       # mask

# "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# seed value
seed = 12345678

# Prompt (translated from Japanese to English)
trans = Translator('en','ja').translate
args = sys.argv
prompt_jp = '微笑んでいる女性' if len(args) <= 1 else args[1]   # prompt
prompt = trans(prompt_jp)

src_image = load_image(image_path).resize((512, 512))           # source image
msk_image = load_image(mask_path).resize((512, 512))            # mask image
img_ctrl = make_inpaint_condition(src_image,msk_image)          # control image

print(f'Seed: {seed}, Model: {model_path}')
print(f'base Model: {model_base_path}')
print(f'source_image: {image_path}')
print(f'mask_image: {mask_path}')
print(f'prompt : {prompt_jp} → {prompt}')
print('** strength 0.1 ～ 1.0 **')


# Generate one image per strength value and tile them in a 5x2 grid
plt.figure(figsize=[6, 15.5], dpi = 100)
for i in range(10):
    strength = 0.1 + i * 0.1
    img = image_generation(strength)
    plt.subplot(5, 2, i + 1, title = 'strength = %.1f'%strength)
    plt.imshow(img)
    plt.axis('off')

    # Free cached GPU memory
    if device == 'cuda':
        torch.cuda.empty_cache()
    elif device == 'mps':
        torch.mps.empty_cache()

plt.tight_layout()
plt.savefig('results/image_045.png')
plt.close()

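Step 46: "outpaint", drawing in the area outside the image

Use "controlnet inpaint" to implement an "outpaint" function that fills in the area outside the image
≪Processing outline≫
① Prepare a portrait-oriented source image
② Make the image square by filling the left and right with black
③ Create a mask image in which the image area is black (made slightly narrower on the left and right than the source image) and the rest is white
Steps ② and ③ are done inside the program, which prepares a 512x512 source image and mask image
④ Generate the left and right areas with the "controlnet inpaint" function from Step 44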
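
Steps ② and ③ come down to a little coordinate arithmetic, sketched below without any image library. The function name `square_layout` and the 512x768 sample size are hypothetical. The symmetric 16-pixel inset is an assumption for illustration; the sd_046.py listing insets 16 pixels on the left and 32 pixels on the right.

```python
def square_layout(width: int, height: int, inset_left: int = 16, inset_right: int = 16):
    """Return (size, image_box, keep_box) for padding an image to a square.

    image_box: (x0, y0, x1, y1) where the source image sits in the square canvas.
    keep_box:  the region the mask marks as "keep" (black); everything else
               is white and gets generated. The inset values are assumptions.
    """
    size = max(width, height)
    x0 = (size - width) // 2
    y0 = (size - height) // 2
    image_box = (x0, y0, x0 + width, y0 + height)
    keep_box = (x0 + inset_left, y0, x0 + width - inset_right, y0 + height)
    return size, image_box, keep_box


if __name__ == "__main__":
    # Hypothetical 512x768 portrait image
    size, image_box, keep_box = square_layout(512, 768)
    print(size, image_box, keep_box)
```

Making the kept region slightly narrower than the image lets the generated area overlap the image edges, which presumably helps hide the seam.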

Run the program (run time: about 1 s on RTX 4070 Ti 12GB)

(sd_test) PS > python sd_046.py
Fetching 11 files: 100%|█████████████████████| 11/11 [00:00<00:00, 11069.42it/s]
Loading pipeline components...: 100%|█████████████| 6/6 [00:00<00:00, 14.91it/s]
Seed: 12345678, Model: /StabilityMatrix/Data/Models/ControlNet/control_v11p_sd15_inpaint_fp16.safetensors
base Model: /StabilityMatrix/Data/Models/StableDiffusion/SD1.5/beautifulRealistic_brav5.safetensors
source_image: images/sd_046_test_src.png
mask_image: images/sd_046_test_msk.png
prompt : 庭に立って微笑んでいる女性 → Woman standing in a garden smiling
100%|███████████████████████████████████████████| 20/20 [00:01<00:00, 11.12it/s]

The image file "image_046.png" is generated

Module source code

▼「sd_046.py」

## sd_046.py [SD1.5]  Image-to-image generation (outpaint) sample source code
##      https://qiita.com/phyblas/items/7cacb9297650afd63d34
##      https://zako-lab929.hatenablog.com/entry/20240212/1707743575
##      Ver. 0.00   2025/07/09
##
##      command:    python sd_046.py [prompt]
##
##       prompt         '庭に立って微笑んでいる女性' (default)
##
##      model:          control_v11p_sd15_inpaint_fp16.safetensors
##      base model:     beautifulRealistic_brav5.safetensors        (photorealistic)
##
##      source image:   images/sd_046_test.png

import numpy as np
import cv2
import os
import my_imagetool

size = 512
src_path = 'images/sd_046_test.png'

# Create the mask
def mask_square(image):
    img_h, img_w = image.shape[:2]
    x0 = 0
    y0 = 0
    x1 = 0
    y1 = 0
    if img_h > img_w:
        size = img_h
        x0 = int((size - img_w) / 2)
        x1 = x0 + img_w
        y1 = size
    else:
        size = img_w
        y0 = int((size - img_h) / 2)
        y1 = y0 + img_h
        x1 = size

    # Create a white base image
    dist = np.array([size, size, 1])                                        # height x width, 1 channel
    img = np.full(dist, 255, dtype=np.uint8)
    img[y0:y1, x0 + 16:x1 - 32] = 0                                         # center region black (inset 16 px on the left, 32 px on the right)
    return img

s = os.path.splitext(src_path)
image_path = s[0] + '_src' + s[1]
mask_path = s[0] + '_msk' + s[1]

winname = image_path
img = cv2.imread(src_path)
msk = mask_square(img)
msk = my_imagetool.frame_resize(msk, size)
my_imagetool.image_disp(msk, winname, False, mask_path)                     # save the mask image
img = my_imagetool.frame_square(img, (0, 0, 0))
img = my_imagetool.frame_resize(img, size)
my_imagetool.image_disp(img, winname, False, image_path)                    # save the source image


## The code below is the same as sd_044.py

import torch
from diffusers import StableDiffusionControlNetInpaintPipeline, ControlNetModel, EulerAncestralDiscreteScheduler, logging
from diffusers.utils import load_image
from translate import Translator
import numpy as np
import sys

logging.set_verbosity_error()

# Build the control image from the source image and the mask
def make_inpaint_condition(image, image_mask):
    image = np.array(image.convert("RGB")).astype(np.float32) / 255.0
    image_mask = np.array(image_mask.convert("L")).astype(np.float32) / 255.0

    # Compare both height and width (the first two axes)
    assert image.shape[:2] == image_mask.shape[:2], "image and image_mask must have the same image size"
    image[image_mask > 0.5] = -1.0  # set as masked pixel
    image = np.expand_dims(image, 0).transpose(0, 3, 1, 2)
    image = torch.from_numpy(image)
    return image

# Model paths
model_path = '/StabilityMatrix/Data/Models/ControlNet/control_v11p_sd15_inpaint_fp16.safetensors'                   # ControlNet model
model_base_path = '/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/beautifulRealistic_brav5.safetensors'         # base model

# "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# seed value
seed = 12345678

# Create the pipeline
if device == 'cpu':
    controlnet = ControlNetModel.from_single_file(model_path).to(device)
    pipeline = StableDiffusionControlNetInpaintPipeline.from_single_file(model_base_path, controlnet=controlnet).to(device)
else:
    controlnet = ControlNetModel.from_single_file(model_path, torch_dtype=torch.float16).to(device)
    pipeline = StableDiffusionControlNetInpaintPipeline.from_single_file(
                    model_base_path,
                    controlnet=controlnet,
                    torch_dtype = torch.float16,
                    ).to(device)

# Scheduler
pipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(pipeline.scheduler.config)

# Prompt (translated from Japanese to English)
trans = Translator('en','ja').translate
args = sys.argv
prompt_jp = '庭に立って微笑んでいる女性' if len(args) <= 1 else args[1]     # prompt
prompt = trans(prompt_jp)

src_image = load_image(image_path).resize((512, 512))                       # source image
msk_image = load_image(mask_path).resize((512, 512))                        # mask image
img_ctrl = make_inpaint_condition(src_image,msk_image)                      # control image

# Create the Generator object
generator = torch.Generator(device).manual_seed(seed)

print(f'Seed: {seed}, Model: {model_path}')
print(f'base Model: {model_base_path}')
print(f'source_image: {image_path}')
print(f'mask_image: {mask_path}')
print(f'prompt : {prompt_jp} → {prompt}')

# Generate the image
image = pipeline(
                    prompt = prompt,
                    image = src_image,
                    mask_image = msk_image,
                    control_image=img_ctrl,
                    num_inference_steps = 20,
                    generator = generator
                    ).images[0]

image.save("results/image_046.png")                                         # generated image

Step 47: Generate an image from a hand-drawn line drawing with "controlnet scribble"

Generate an image from a hand-drawn illustration and a text prompt

"sd_047.py"  Line drawing (left): sd_047.png; generated image (right): image_047.png

## sd_047.py [SD1.5]  Generate an image from a hand-drawn line drawing (ControlNet scribble) sample source code
##      https://blog.mindboardapps.com/posts/stable-diffusion-and-control-net-img2img/
##      Ver. 0.00   2025/07/10
##
##      command:    python sd_047.py [prompt]
##
##       prompt         'テーブル上の白いコーヒーカップ' (default)
##                      '木製のテーブルの上に置かれた白いコーヒーカップ'
##                      'ビーチに置かれたオレンジ色のコーヒーカップ'
##
##      model:          control_v11p_sd15_scribble_fp16.safetensors
##      base model:     v1-5-pruned-emaonly.safetensors
##
##      line drawing:   images/sd_047.png
##                      images/sd_047_1.png
##                      images/sd_047_2.png

import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, logging
from diffusers.utils import load_image
from translate import Translator
import numpy as np
import sys

logging.set_verbosity_error()

# Model paths
model_path = '/StabilityMatrix/Data/Models/ControlNet/control_v11p_sd15_scribble_fp16.safetensors'      # ControlNet model
model_base_path = '/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/v1-5-pruned-emaonly.safetensors'  # base model

image_path = 'images/sd_047.png'

# "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# seed value
seed = 12345678

# Create the pipeline
if device == 'cpu':
    controlnet = ControlNetModel.from_single_file(model_path).to(device)
    pipeline = StableDiffusionControlNetPipeline.from_single_file(model_base_path, controlnet=controlnet).to(device)
else:
    controlnet = ControlNetModel.from_single_file(model_path, torch_dtype=torch.float16).to(device)
    pipeline = StableDiffusionControlNetPipeline.from_single_file(
                    model_base_path,
                    controlnet=controlnet,
                    torch_dtype = torch.float16,
                    ).to(device)

# Prompt (translated from Japanese to English)
trans = Translator('en','ja').translate
args = sys.argv
prompt_jp = 'テーブル上の白いコーヒーカップ' if len(args) <= 1 else args[1]     # prompt (default when no argument is given)
prompt = trans(prompt_jp)

src_image = load_image(image_path).resize((512, 512))                           # line-art image

# Create the Generator object
generator = torch.Generator(device).manual_seed(seed)

print(f'Seed: {seed}, Model: {model_path}')
print(f'base Model: {model_base_path}')
print(f'source_image: {image_path}')
print(f'prompt : {prompt_jp} → {prompt}')

# Generate the image
image = pipeline(
                    prompt = prompt,
                    image = src_image,
                    num_inference_steps = 20,
                    generator = generator
                    ).images[0]

image.save("results/image_047.png")                                             # generated image
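
The CPU and GPU branches in the listing above differ only in whether `torch_dtype=torch.float16` is passed to `from_single_file()`. As a minimal sketch (not part of the original script), the branch could be collapsed into one helper that builds the extra keyword arguments. The helper name `build_load_kwargs` is hypothetical, and the demo passes the string `'float16'` as a stand-in so the snippet runs without PyTorch installed; in the script itself you would pass `torch.float16`.

```python
def build_load_kwargs(device, half_dtype):
    """Return extra keyword arguments for from_single_file().

    On CPU nothing is added, so the library's default precision is used;
    on any other device the given half-precision dtype is requested.
    """
    if device == 'cpu':
        return {}
    return {'torch_dtype': half_dtype}


# Stand-in dtype value so this sketch runs without PyTorch (demo assumption).
print(build_load_kwargs('cpu', 'float16'))
print(build_load_kwargs('cuda', 'float16'))

# Intended use inside sd_047.py (kept as comments because it needs the models):
#   kwargs = build_load_kwargs(device, torch.float16)
#   controlnet = ControlNetModel.from_single_file(model_path, **kwargs).to(device)
#   pipeline = StableDiffusionControlNetPipeline.from_single_file(
#       model_base_path, controlnet=controlnet, **kwargs).to(device)
```

With this shape, the ControlNet model and the base pipeline always receive the same precision setting, which the two-branch version has to keep in sync by hand.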

Run the program (run time: about 1 second on an RTX 4070 Ti 12GB)

(sd_test) PS > python sd_047.py
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:01<00:00,  5.68it/s]
Seed: 12345678, Model: /StabilityMatrix/Data/Models/ControlNet/control_v11p_sd15_scribble_fp16.safetensors
base Model: /StabilityMatrix/Data/Models/StableDiffusion/SD1.5/v1-5-pruned-emaonly.safetensors
source_image: images/sd_047.png
prompt : テーブル上の白いコーヒーカップ → White coffee cup on the table
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 10.10it/s]

This generates the image file "image_047.png" in the results folder
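
The script writes to `results/image_047.png` and assumes the `results` folder already exists; saving into a missing folder fails. A small sketch of creating the folder before saving is shown here. The helper name `prepare_output_path` is hypothetical and is not part of the original script.

```python
from pathlib import Path


def prepare_output_path(path):
    """Create the parent folder of `path` if needed and return it as a Path."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    return out


# Intended use in sd_047.py (comment only, since it needs a generated image):
#   image.save(prepare_output_path('results/image_047.png'))
```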

Generate with a different prompt
・"python sd_047.py ['prompt']"

(sd_test) PS > python sd_047.py '木製のテーブルの上に置かれた白いコーヒーカップ'

・Base model: "v1-5-pruned-emaonly.safetensors"

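Line-art illustration	テーブル上の白いコーヒーカップ (white coffee cup on the table)	木製のテーブルの上に置かれた白いコーヒーカップ (white coffee cup placed on a wooden table)	ビーチに置かれたオレンジ色のコーヒーカップ (orange coffee cup placed on a beach)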
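
The script header lists three line-art images, but `image_path` is hard-coded, so only the prompt can be changed from the command line. As a rough sketch of one way to make both selectable, the following uses `argparse` with an optional positional prompt and an `--image` option. The `--image` option and the function name `parse_args` are assumptions for illustration and do not exist in the original script.

```python
import argparse

DEFAULT_PROMPT = 'テーブル上の白いコーヒーカップ'   # default prompt from sd_047.py
DEFAULT_IMAGE = 'images/sd_047.png'                 # default line-art image from sd_047.py


def parse_args(argv=None):
    """Parse an optional prompt and an optional line-art image path."""
    parser = argparse.ArgumentParser(description='ControlNet scribble sample')
    parser.add_argument('prompt', nargs='?', default=DEFAULT_PROMPT,
                        help='Japanese prompt (translated to English before use)')
    parser.add_argument('--image', default=DEFAULT_IMAGE,
                        help='line-art image path (hypothetical option)')
    return parser.parse_args(argv)


# Demo with an explicit argument list instead of the real command line.
args = parse_args(['ビーチに置かれたオレンジ色のコーヒーカップ',
                   '--image', 'images/sd_047_1.png'])
print(args.image)
```

Calling `parse_args()` with no argument reads the real command line, so `python sd_047.py` without arguments would keep the current defaults.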


Step 48: Generate an image in the same pose from an image with "controlnet openpose"

The pose is estimated from the source image, and a new image in the same pose is then generated from a text prompt

"sd_048.py"　　Estimated pose (left): sd_048_test1_pose.png　Source image (right): sd_048_test1.png

## sd_048.py [SD1.5]  Sample source code: generate an image from an image (ControlNet openpose)
##      https://note.com/npaka/n/n06b9ca7994a4
##      https://huggingface.co/lllyasviel/control_v11p_sd15_openpose
##      Ver. 0.00   2025/07/11
##
##      command:    python sd_048.py [seed value (-1 = random)]
##
##      prompt:     'ダンスを踊る女性' (default)
##
##      model:      control_v11p_sd15_openpose_fp16.safetensors
##      base model: beautifulRealistic_brav5.safetensors        (photorealistic)
##                  animePastelDream_softBakedVae.safetensors   (illustration style)
##
##      source:     images/sd_048_test1.png
##                  images/sd_048_test2.png
##                  images/sd_048_test3.png

import warnings
warnings.simplefilter('ignore')

import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, logging
from diffusers.utils import load_image
from translate import Translator
from controlnet_aux import OpenposeDetector

import numpy as np
import sys
import os
import random

logging.set_verbosity_error()

# Get the seed value
def _get_seed_value(n):
    seed = int(n)
    if seed == -1:                                              # pick a random seed value
        seed = random.randint(0, 2**32-1)
    return seed

# Model file paths
model_path = '/StabilityMatrix/Data/Models/ControlNet/control_v11p_sd15_openpose_fp16.safetensors'  # ControlNet model
#model_base_path = '/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/animePastelDream_softBakedVae.safetensors'  # base model
model_base_path = '/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/beautifulRealistic_brav5.safetensors'  # base model

image_path = 'images/sd_048_test1.png'
s = os.path.splitext(image_path)
pose_path = s[0] + '_pose' + s[1]

src_image = load_image(image_path)                              # source image
openpose_detector = OpenposeDetector.from_pretrained('lllyasviel/ControlNet')
openpose_image = openpose_detector(src_image)
openpose_image.save(pose_path)

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Seed value (first command-line argument; -1 = random)
args = sys.argv
n = -1 if len(args) <= 1 else args[1]
seed = _get_seed_value(n)

# Create the pipeline
if device == 'cpu':
    controlnet = ControlNetModel.from_single_file(model_path).to(device)
    pipeline = StableDiffusionControlNetPipeline.from_single_file(model_base_path, controlnet=controlnet).to(device)
else:
    controlnet = ControlNetModel.from_single_file(model_path, torch_dtype=torch.float16).to(device)
    pipeline = StableDiffusionControlNetPipeline.from_single_file(
                    model_base_path,
                    controlnet=controlnet,
                    torch_dtype = torch.float16,
                    ).to(device)

# Prompt (translated from Japanese to English)
trans = Translator('en','ja').translate
prompt_jp = 'ダンスを踊る女性'
prompt = trans(prompt_jp)

# Create the Generator object
generator = torch.Generator(device).manual_seed(seed)

print(f'Seed: {seed}, Model: {model_path}')
print(f'base Model: {model_base_path}')
print(f'source_image: {image_path}')
print(f'pose_image: {pose_path}')
print(f'prompt : {prompt_jp} → {prompt}')

# Generate the image
image = pipeline(
                    prompt = prompt,
                    image = openpose_image,
                    num_inference_steps = 30,
                    generator = generator
                    ).images[0]

save_path = 'results/image_048_' + str(seed) + '.png'
print(f'save_image: {save_path}')
image.save(save_path)                                           # generated image
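
The listing derives the pose image name by splitting `image_path` with `os.path.splitext` and inserting `_pose` before the extension. An equivalent sketch using `pathlib` is shown below; the helper name `pose_path_for` is hypothetical and is not part of the original script.

```python
from pathlib import Path


def pose_path_for(image_path):
    """Return the path of the estimated-pose image stored next to `image_path`."""
    p = Path(image_path)
    # Insert "_pose" between the file stem and its extension.
    return p.with_name(f'{p.stem}_pose{p.suffix}')


print(pose_path_for('images/sd_048_test1.png').as_posix())
# images/sd_048_test1_pose.png
```

The result matches the `pose_image` name the script reports for `images/sd_048_test1.png`.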

Install the additional package
```
(sd_test) PS > pip install controlnet_aux
```

Run the program (run time: about 2 seconds on an RTX 4070 Ti 12GB)

(sd_test) PS > python sd_048.py
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 13.38it/s]
Seed: 3510433536, Model: /StabilityMatrix/Data/Models/ControlNet/control_v11p_sd15_openpose_fp16.safetensors
base Model: /StabilityMatrix/Data/Models/StableDiffusion/SD1.5/beautifulRealistic_brav5.safetensors
source_image: images/sd_048_test1.png
pose_image: images/sd_048_test1_pose.png
prompt : ダンスを踊る女性 → Dancing Woman
100%|██████████████████████████████████████████| 30/30 [00:02<00:00, 10.46it/s]
save_image: results/image_048_3510433536.png

This generates the image file "image_048_3510433536.png" (the number at the end of the file name is the seed value)

Generate with a specified seed value
・"python sd_048.py ['seed value (-1 = random)']"
```
(sd_test) PS > python sd_048.py 1595966935
```
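
The script converts the seed argument with `int()` and treats `-1` as "pick a random seed", so a non-numeric argument stops the script with a bare conversion error. The following is a minimal sketch of the same rule with clearer error messages. The function name `resolve_seed` and the range check are assumptions: the upper bound simply mirrors the `2**32-1` limit the script uses for random seeds, and PyTorch itself may accept a wider range.

```python
import random

MAX_SEED = 2**32 - 1  # same upper bound the script uses for random seeds


def resolve_seed(value=-1):
    """Turn a command-line value into a seed; -1 means pick one at random."""
    try:
        seed = int(value)
    except (TypeError, ValueError):
        raise ValueError(f'seed must be an integer, got {value!r}') from None
    if seed == -1:
        return random.randint(0, MAX_SEED)
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f'seed must be -1 or between 0 and {MAX_SEED}, got {seed}')
    return seed


print(resolve_seed('1595966935'))
```

Because the seed is part of the output file name, passing the same value again makes it easy to regenerate and compare a specific result.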
・Base model: "beautifulRealistic_brav5.safetensors" (photorealistic) / "animePastelDream_softBakedVae.safetensors" (illustration style)

Source image	Estimated pose	Generated image ①	Generated image ②	Generated image ③