Generative AI Programming == Work in Progress ==

Based on the results verified so far, this page writes generative AI programs in Python.


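* Last updated: 2025/06/26

Getting Started with Stable Diffusion Using diffusers (Basics)

Generating images from text (txt2img)

Reference site: 猫耳とdiffusersで始めるStable Diffusion入門 (an introduction to Stable Diffusion with diffusers)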

Approximate Execution Times

List of programs created in this project (times are mm:ss; RTX and GTX columns are GPUs, i7-1260P is the CPU; × marks a combination that could not be run)

Step	Description	Program	RTX 4070Ti	RTX 4060	RTX 4060L	RTX 3050	GTX 1050	i7-1260P
1	Simplest text-to-image generation	sd_001.py	00:05	00:13	00:18			09:49
2	Suppressing unneeded output and specifying the image size	sd_002.py	00:09	00:22	00:32			15:23
3	Half precision for speed and memory savings	sd_003.py	00:04	00:08	00:18			×
4	Specifying the step count for speed	sd_004.py	00:03	00:09	00:14			06:22
5	Multiple generation 1 - several images under the same conditions	sd_005.py	00:12	00:30	00:43			22:19
6	Multiple generation 2 - several prompts	sd_006.py	00:04	00:10	00:16			07:52
7	Multiple generation 3 - releasing memory	sd_007.py	00:12	00:30	00:51			24:14
8	Generating the same image while varying the step count	sd_008.py	00:10	00:23	00:35			09:23
9	Changing the importance of the prompt	sd_009.py	00:16	00:38	00:53			25:58
10	Skipping CLIP layers	sd_010.py	00:25	01:00	01:24			48:24
11	Changing the scheduler	sd_scheduler.py	00:00	00:00	00:00
11	Changing the scheduler	sd_011.py	00:20	00:58	01:05			45:27
12	Entering prompts in Japanese	sd_012.py	00:03	00:09	00:14			06:24
13	Specifying what should not be generated	sd_013.py	00:02	00:05	00:09			04:02
20	[SDXL] Using an SDXL model	sd_020.py	00:05	04:15	00:25			×
21	[SDXL] Setting the VAE / scheduler	sd_021.py	00:05	00:26	00:24			×
22	[SDXL] Avoiding undesirable results	sd_022.py	00:05	03:11	00:20			×
23	[SDXL] Using LoRA	sd_023.py	00:07	03:36	00:59			×
24	[SDXL] Setting the LoRA ratio (fuse_lora)	sd_024.py	00:32	10:47	01:53			×
30	Simplest image-to-image generation	sd_030.py	00:01	00:01	00:10			×
31	Adjusting the strength of change (strength)	sd_031.py	00:10	00:12	00:15			×
32	Prompt weight (guidance_scale)	sd_032.py	00:18	00:53	01:06			×
33	[SDXL] Model combination (refiner)	sd_033.py	06:00	07:03	08:21			×
34	[SDXL] Model combination (refiner): comparing parameters	sd_034.py	50:00	66:59	56:26			×
35	Latent-space conversion (latent)	sd_035.py	00:02	00:02	00:27			×
36	Upscaling the source image 4x (x4 upscaler)	sd_036.py	00:05	02:07	03:53			×
37	Upscaling 2x in latent space (x2 latent upscaler)	sd_037.py	00:04	00:07	02:42			×
38	Fixing only a specific region (inpaint)	sd_038.py	00:01	00:55	03:05			×
40	Text-to-image generation (txt2img)	sd_040.py	00:03	00:08				06:05
41	- Parameter input from the command line -	sd_041.py
42	- GUI program -	sd_042.py
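
The mm:ss values in the table can be compared numerically. The following is a minimal sketch and not part of the project sources: the helper names `to_seconds` and `speedup` are hypothetical, and the sample values are the sd_001.py entries for the i7-1260P and the RTX 4070Ti.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'mm:ss' string from the table into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup(slow: str, fast: str) -> float:
    """Return how many times faster `fast` is than `slow`."""
    return to_seconds(slow) / to_seconds(fast)


if __name__ == "__main__":
    # sd_001.py: CPU (i7-1260P) versus GPU (RTX 4070Ti)
    print(to_seconds("09:49"))                  # 589
    print(round(speedup("09:49", "00:05"), 1))  # 117.8
```

Entries marked × have no time and would need to be skipped before calling these helpers.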

Environment Setup

On a machine with a GPU, install the drivers beforehand → NVIDIA cuda GPU の設定 (NVIDIA CUDA GPU setup)
・"nvidia-smi" displays detailed information about the installed NVIDIA graphics board

PS > nvidia-smi
Thu Jun 12 10:30:34 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 576.52                 Driver Version: 576.52         CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070 Ti   WDDM  |   00000000:01:00.0  On |                  N/A |
| 31%   31C    P8              4W /  285W |    1013MiB /  12282MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

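Frameworks and libraries to prepare first

Library	Overview
PyTorch	Machine learning framework for deep learning
Transformers	Library for training and inference with Transformer models for natural language processing
Diffusers	Library of diffusion models used for image generation and similar tasks
Accelerate	Library that simplifies distributed training and acceleration in PyTorch
SciPy	Library for numerical computing

Set up an environment in which Anaconda runs
→ Anaconda 環境構築 (Anaconda environment setup)
Create the virtual environment "sd_test" with a specified Python version

・Create it with Python 3.11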
```
(base) PS > conda create -n sd_test python=3.11 -y
```

Activate the virtual environment

(base) PS > conda activate sd_test
(sd_test) PS >

Install the PyTorch build that matches your environment
・Open the official site https://pytorch.org/ and obtain the install command
```
(sd_test) PS > pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
```

Install the other packages

(sd_test) PS > pip install transformers diffusers accelerate scipy

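From here on, additional packages are installed as they become necessary

Prerequisites

Source code used in this project
・project_sd_test.zip (57.4MB) <sd_test> * updated 20250625
　The complete source code and the test images can be downloaded
・Copy the contents of the extracted "project_sd_test/" folder over the following folder
　Windows → "anaconda_win/"　Linux → "~/"

Run the project with the folder layout below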

:\ (drive root)
├─anaconda_win/
│  ├─workspace_3/
│  │  ├─sd_test/                          ← project execution folder
   :
├─StabilityMatrix/
│  └─Data/
│      ├─Models/
│      │   ├─StableDiffusion/
│      │   │   ├─SD1.5/                   ← location of SD1.5 models
│      │   │   └─・・・・・・               ← location of SDXL models

・The "StabilityMatrix" folder must exist directly under the drive that contains the "workspace_3/sd_test/" folder
・Models already downloaded to the prescribed location inside "StabilityMatrix" are used
・If your models are placed elsewhere, change the model path in the program sources below
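
If the model location differs, it can help to build the path in one place. A minimal sketch, assuming the folder names shown in the layout above; `MODELS_ROOT` and `model_path` are hypothetical names, and `PurePosixPath` is used only so that the result is written with forward slashes as in the program sources.

```python
from pathlib import PurePosixPath

# Folder names taken from the layout above; the drive root is written as "/"
MODELS_ROOT = PurePosixPath("/StabilityMatrix/Data/Models/StableDiffusion")


def model_path(family: str, filename: str, root: PurePosixPath = MODELS_ROOT) -> str:
    """Build the path of a model file below the StabilityMatrix model folder."""
    return str(root / family / filename)


if __name__ == "__main__":
    print(model_path("SD1.5", "v1-5-pruned-emaonly.safetensors"))
```

Changing `MODELS_ROOT` once would then be enough to point every program at a different model folder.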

Check whether the GPU can be used

(sd_test) PS > python -c 'import torch;print(torch.cuda.is_available())'

Step 1: The simplest image generation program

"sd_001.py"

## sd_001.py "Nature and waterfall photo"

import torch
from diffusers import StableDiffusionPipeline

# Path to the model file
model = "/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/v1-5-pruned-emaonly.safetensors"

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Create the pipeline
pipeline = StableDiffusionPipeline.from_single_file(model).to(device)

# Prompt
prompt = "nature and waterfall photography"

# Generate the image
response = pipeline(prompt=prompt)
image = response.images[0]
image.save("results/image_001.png")

・Create the pipeline by specifying the model path and the device
・The generated images are returned as a list; since there is only one, it is retrieved with .images[0]
・The image is a PIL object and can be saved with its save method
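
The save call writes into a "results" folder, which has to exist beforehand. A minimal sketch of creating it on demand, assuming the folder might be missing; `output_path` is a hypothetical helper that is not part of the project sources.

```python
from pathlib import Path


def output_path(filename: str, folder: str = "results") -> Path:
    """Create the output folder if needed and return the full file path."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return out_dir / filename
```

With this helper the last line of the program could read `image.save(output_path("image_001.png"))`.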

Run the program (execution time: about 5 s, RTX 4070 Ti 12GB)

(sd_test) PS > python sd_001.py

Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00,  8.43it/s]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
100%|██████████████████████████████████████████| 50/50 [00:05<00:00,  8.50it/s]

・Summary of the warning message

Passing `safety_checker=None` disabled the safety checker of StableDiffusionPipeline. The message asks you to abide by the conditions of the Stable Diffusion license and not to expose unfiltered results in public services or applications. The diffusers team and Hugging Face strongly recommend keeping the safety filter enabled in all public-facing situations and disabling it only for use cases such as analyzing network behavior or auditing results. For details, see https://github.com/huggingface/diffusers/pull/254 .

The image file "image_001.png" is generated

Step 2: Suppressing unneeded output and specifying the image size

"sd_002.py"

## sd_002.py "Nature and waterfall photo" (suppress output messages and specify the image size)

import torch
from diffusers import StableDiffusionPipeline, logging
logging.set_verbosity_error()                               ## show errors only (suppress warnings)

# Path to the model file
model = "/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/v1-5-pruned-emaonly.safetensors"

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Create the pipeline
pipeline = StableDiffusionPipeline.from_single_file(model).to(device)

# Prompt
prompt = "nature and waterfall photography"

# Generate the image
response = pipeline(prompt=prompt, width=768, height=512)   ## output size 768x512
image = response.images[0]
image.save("results/image_002.png")

Run the program (execution time: about 9 s, RTX 4070 Ti 12GB)

(sd_test) PS > python sd_002.py

Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00,  8.99it/s]
100%|██████████████████████████████████████████| 50/50 [00:09<00:00,  5.54it/s]

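The image file "image_002.png" is generated

The repeatedly displayed warning messages are suppressed, so that only genuinely serious errors are printed
Each model has a recommended image size that should be used when specifying the output size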
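
Besides the recommended size of a model, the Stable Diffusion pipeline in diffusers rejects a width or height that is not divisible by 8, as far as I know. A minimal sketch of snapping a requested size to a valid value; `snap_to_multiple` is a hypothetical helper.

```python
def snap_to_multiple(value: int, base: int = 8) -> int:
    """Round `value` down to the nearest multiple of `base` (never below `base`)."""
    return max(base, (value // base) * base)


if __name__ == "__main__":
    print(snap_to_multiple(770), snap_to_multiple(512))  # 768 512
```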

Step 3: Half precision for speed and memory savings (GPU only)

"sd_003.py"

## sd_003.py "Nature and waterfall photo" (half precision for speed and memory savings)

import torch
from diffusers import StableDiffusionPipeline, logging
logging.set_verbosity_error()                               ## show errors only (suppress warnings)

# Path to the model file
model = "/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/v1-5-pruned-emaonly.safetensors"

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Create the pipeline (weights loaded in half precision)
pipeline = StableDiffusionPipeline.from_single_file(
                model,
                torch_dtype=torch.float16
                ).to(device)

# Prompt
prompt = "nature and waterfall photography"

# Generate the image
response = pipeline(prompt=prompt, width=768, height=512)   ## output size 768x512
image = response.images[0]
image.save("results/image_003.png")

Run the program (execution time: about 4 s, RTX 4070 Ti 12GB)

(sd_test) PS > python sd_003.py

Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:01<00:00,  4.30it/s]
100%|██████████████████████████████████████████| 50/50 [00:03<00:00, 14.21it/s]

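The image file "image_003.png" is generated

Generation time (sampling loop, as shown by the progress bars): shortened from 9 s to 3 s (example: RTX 4070 Ti)
・Setting torch_dtype to float16 (half precision) halves memory consumption and also speeds up generation
　Normal precision: float32
・Half precision cannot be used when running on the CPU
　Running this program on 'cpu' stalls partway through (close the terminal to end it)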
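
The two points above, half the bytes per value and GPU-only use, can be sketched as follows. This is a minimal illustration written without importing torch; `dtype_name_for` and `weight_bytes` are hypothetical helpers, and the parameter count in the example is a made-up round number rather than the size of any real model.

```python
# Size of one stored value for each precision
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2}


def dtype_name_for(device: str) -> str:
    """Use half precision only on CUDA; keep single precision elsewhere."""
    return "float16" if device == "cuda" else "float32"


def weight_bytes(num_parameters: int, dtype_name: str) -> int:
    """Memory needed to hold the weights alone, in bytes."""
    return num_parameters * BYTES_PER_ELEMENT[dtype_name]


if __name__ == "__main__":
    n = 1_000_000_000  # hypothetical parameter count
    print(weight_bytes(n, "float32") / weight_bytes(n, "float16"))  # 2.0
```

In a real script the name could be turned into a dtype with `getattr(torch, dtype_name_for(device))` and passed as `torch_dtype`, which lets the same program also run on the CPU.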

Step 4: Specifying the step count for speed

"sd_004.py"

## sd_004.py "Nature and waterfall photo" (specify the step count)

import torch
from diffusers import StableDiffusionPipeline, logging
logging.set_verbosity_error()                               ## show errors only (suppress warnings)

# Path to the model file
model = "/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/v1-5-pruned-emaonly.safetensors"

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Create the pipeline
pipeline = StableDiffusionPipeline.from_single_file(model).to(device)

# Prompt
prompt = "nature and waterfall photography"

# Generate the image
response = pipeline(prompt=prompt, num_inference_steps=20, width=768, height=512)   ## 20 steps, output size 768x512
image = response.images[0]
image.save("results/image_004.png")

Run the program (execution time: about 3 s, RTX 4070 Ti 12GB)

(sd_test) PS > python sd_004.py

Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 10.10it/s]
100%|██████████████████████████████████████████| 20/20 [00:03<00:00,  5.28it/s]

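The image file "image_004.png" is generated

The diffusers default step count is 50 (20 to 30 is probably reasonable)
Generation time: with the default single precision, shortened from 9 s to 3 s (example: RTX 4070 Ti)
・Switching to float16 (half precision) makes it faster still (9 s → 1 s)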
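
The effect of the step count can be estimated from the iteration rate printed by the progress bar. A minimal sketch of that arithmetic, using the rates from the sd_002.py and sd_004.py logs above; `estimated_seconds` is a hypothetical helper and ignores model loading time.

```python
def estimated_seconds(num_steps: int, iterations_per_second: float) -> float:
    """Approximate sampling time as steps divided by the it/s rate."""
    return num_steps / iterations_per_second


if __name__ == "__main__":
    print(round(estimated_seconds(50, 5.54), 1))  # 9.0  (50 steps, sd_002.py)
    print(round(estimated_seconds(20, 5.28), 1))  # 3.8  (20 steps, sd_004.py)
```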

Step 5: Multiple generation 1 - several images under the same conditions

"sd_005.py"

## sd_005.py "Nature and waterfall photo" (several images under the same conditions)

import torch
from diffusers import StableDiffusionPipeline, logging
from diffusers.utils import make_image_grid
logging.set_verbosity_error()                               ## show errors only (suppress warnings)

# Path to the model file
model = "/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/v1-5-pruned-emaonly.safetensors"

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Create the pipeline
pipeline = StableDiffusionPipeline.from_single_file(model).to(device)

# Prompt
prompt = "nature and waterfall photography"

# Generate the images (6 images in one call)
response = pipeline(
                prompt=prompt,
                num_inference_steps = 20,
                num_images_per_prompt = 6,
                width = 512,
                height = 512
                ).images
make_image_grid(response,2,3).save('results/image_005.png')

Run the program (execution time: about 12 s, RTX 4070 Ti 12GB)

(sd_test) PS > python sd_005.py

Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 11.17it/s]
100%|██████████████████████████████████████████| 20/20 [00:12<00:00,  1.63it/s]

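The image file "image_005.png" is generated

Passing the num_images_per_prompt parameter when running the pipeline generates multiple images
This method consumes correspondingly more memory
The make_image_grid function provided by diffusers arranges the images into a grid of rows and columns for output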
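
As far as I know, make_image_grid expects rows × columns to equal the number of images exactly. A minimal sketch of choosing such a pair automatically; `grid_shape` is a hypothetical helper and not part of diffusers.

```python
import math


def grid_shape(num_images: int) -> tuple[int, int]:
    """Return (rows, cols) with rows * cols == num_images, as square as possible."""
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    # Search downward from the square root for the largest divisor
    for rows in range(math.isqrt(num_images), 0, -1):
        if num_images % rows == 0:
            return rows, num_images // rows
    return 1, num_images  # not reached: rows == 1 always divides


if __name__ == "__main__":
    print(grid_shape(6))  # (2, 3)
```

The result could be passed on as `make_image_grid(images, *grid_shape(len(images)))`.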

Step 6: Multiple generation 2 - several prompts

"sd_006.py"

## sd_006.py "Nature and waterfall photo / illustration" (generate from several prompts)

import torch
from diffusers import StableDiffusionPipeline, logging
from diffusers.utils import make_image_grid
logging.set_verbosity_error()                               ## show errors only (suppress warnings)

# Path to the model file
model = "/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/v1-5-pruned-emaonly.safetensors"

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Create the pipeline
pipeline = StableDiffusionPipeline.from_single_file(model).to(device)

# Prompts (one image per list entry)
prompt = ["nature and waterfall photography", "nature and waterfall illustration"]

# Generate the images
response = pipeline(
                prompt=prompt,
                num_inference_steps = 20
                ).images
make_image_grid(response,1,2).save('results/image_006.png')

Run the program (execution time: about 4 s, RTX 4070 Ti 12GB)

(sd_test) PS > python sd_006.py

Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00,  9.67it/s]
100%|██████████████████████████████████████████| 20/20 [00:04<00:00,  4.57it/s]

The image file "image_006.png" is generated

When the prompt is given as a list, one image is generated for each entry in the list
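
A prompt list can be combined with num_images_per_prompt from Step 5, in which case the number of images is the product of the two. A minimal sketch of which prompt lies behind each output image; `expand_prompts` is a hypothetical helper, and the prompt-by-prompt ordering is my understanding of how diffusers batches the prompts, so treat it as an assumption.

```python
def expand_prompts(prompts: list[str], num_images_per_prompt: int = 1) -> list[str]:
    """List the prompt behind each output image, in the assumed output order."""
    return [p for p in prompts for _ in range(num_images_per_prompt)]


if __name__ == "__main__":
    prompts = ["nature and waterfall photography", "nature and waterfall illustration"]
    print(len(expand_prompts(prompts, 3)))  # 6
```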

Step 7: Multiple generation 3 - releasing memory (reducing the load)

"sd_007.py"

## sd_007.py "Nature and waterfall photo" (multiple generation, releasing memory to reduce the load)

import torch
from diffusers import StableDiffusionPipeline, logging
from diffusers.utils import make_image_grid
logging.set_verbosity_error()                               ## show errors only (suppress warnings)

# Image generation
def image_generation():
    # Create the pipeline
    pipeline = StableDiffusionPipeline.from_single_file(model).to(device)

    # Generate the image
    img = pipeline(
                    prompt = prompt,
                    num_inference_steps = 20
                    ).images[0]
    return img

# Path to the model file
model = "/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/v1-5-pruned-emaonly.safetensors"

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Prompt
prompt = "nature and waterfall photography"

# Generate several images, one per loop iteration
images = []
for i in range(6):
    images.append(image_generation())

    # Release cached memory
    if device == 'cuda':
        torch.cuda.empty_cache()
    elif device == 'mps':
        torch.mps.empty_cache()

make_image_grid(images,2,3).save('results/image_007.png')

・Memory is released each time one image generation finishes inside the loop
　On CUDA use torch.cuda.empty_cache(); on MPS use torch.mps.empty_cache()
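
In the listing above `device` is only ever 'cuda' or 'cpu', so the 'mps' branch is reached only if the device selection is extended. A minimal sketch of such a selection, written without importing torch so that the logic can be read in isolation; `pick_device` is a hypothetical helper, and in a real script the two flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then Apple's MPS backend, then fall back to the CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(pick_device(False, True))  # mps
```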

Run the program (execution time: about 12 s, RTX 4070 Ti 12GB)

(sd_test) PS > python sd_007.py

Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00,  8.96it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  7.94it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 16.52it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  7.97it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 11.52it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  7.73it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 16.98it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  7.82it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 17.09it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  7.84it/s]
Fetching 11 files: 100%|████████████████████| 11/11 [00:00<00:00, 10998.17it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 16.50it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  7.80it/s]

The image file "image_007.png" is generated

Step 8: Generating the same image while varying the step count (num_inference_steps)

"sd_008.py"

## sd_008.py "Nature and waterfall photo": varying the step count (num_inference_steps)

import torch
from diffusers import StableDiffusionPipeline, logging
from diffusers.utils import make_image_grid
import matplotlib.pyplot as plt

logging.set_verbosity_error()                               ## show errors only (suppress warnings)

# Image generation
def image_generation(n_step):
    # Create the pipeline
    pipeline = StableDiffusionPipeline.from_single_file(model).to(device)

    # Create the Generator object (fixed seed for reproducibility)
    generator = torch.Generator(device).manual_seed(seed)

    # Generate the image
    img = pipeline(
                    prompt = prompt,
                    num_inference_steps = n_step,
                    generator = generator
                    ).images[0]
    return img

# Path to the model file
model = "/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/v1-5-pruned-emaonly.safetensors"

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Prompt
prompt = "nature and waterfall photography"

seed = 1234         # fixed seed

# Generate several images (steps 5, 10, ..., 30) and plot them in a 2x3 grid
plt.figure(figsize=[9.5, 6], dpi = 100)
for i,n_step in enumerate(range(5, 31, 5)):
    img = image_generation(n_step)
    plt.subplot(2, 3, i + 1, title = "num_inference_steps=%d"%n_step)
    plt.imshow(img)
    plt.axis('off')

    # Release cached memory
    if device == 'cuda':
        torch.cuda.empty_cache()
    elif device == 'mps':
        torch.mps.empty_cache()

plt.tight_layout()
plt.savefig('results/image_008.png')
plt.close()

・To make the result reproducible, create a torch.Generator object with a seed and pass it as the generator parameter when running the pipeline
・The seed value is an integer of 0 or greater
・With the same seed and the same other settings, exactly the same image is generated
・The step count (num_inference_steps) is varied to examine its effect on the generated image
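
The role of the seed can be illustrated without a GPU. The sketch below is only an analogy using Python's standard `random` module rather than `torch.Generator`: the same seed reproduces the same starting noise, which is why the same image comes out. `sample_noise` is a hypothetical helper.

```python
import random


def sample_noise(seed: int, count: int = 4) -> list[float]:
    """Draw `count` pseudo-random numbers from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(count)]


if __name__ == "__main__":
    print(sample_noise(1234) == sample_noise(1234))  # True: same seed, same noise
    print(sample_noise(1234) == sample_noise(1235))  # False: different seed
```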

Install the missing package
```
(sd_test) PS > pip install matplotlib
```

Run the program (execution time: about 10 s, RTX 4070 Ti 12GB)

(sd_test) PS > python sd_008.py

Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 14.85it/s]
100%|████████████████████████████████████████████| 5/5 [00:00<00:00,  6.56it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 34.26it/s]
100%|██████████████████████████████████████████| 10/10 [00:01<00:00,  8.18it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 35.28it/s]
100%|██████████████████████████████████████████| 15/15 [00:01<00:00,  8.16it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 34.28it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  8.29it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 35.31it/s]
100%|██████████████████████████████████████████| 25/25 [00:02<00:00,  8.55it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 22.39it/s]
100%|██████████████████████████████████████████| 30/30 [00:03<00:00,  8.59it/s]

The image file "image_008.png" is generated

Step 9: Changing the importance of the prompt (guidance_scale)

"sd_009.py"

## sd_009.py "Nature and waterfall photo": importance of the prompt (guidance_scale)

import torch
from diffusers import StableDiffusionPipeline, logging
from diffusers.utils import make_image_grid
import matplotlib.pyplot as plt

logging.set_verbosity_error()                               ## show errors only (suppress warnings)

# Image generation
def image_generation(g_scale):
    # Create the pipeline
    pipeline = StableDiffusionPipeline.from_single_file(model).to(device)

    # Create the Generator object (fixed seed for reproducibility)
    generator = torch.Generator(device).manual_seed(seed)

    # Generate the image
    img = pipeline(
                    prompt = prompt,
                    num_inference_steps = 20,
                    guidance_scale = g_scale,
                    generator = generator
                    ).images[0]
    return img

# Path to the model file
model = "/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/v1-5-pruned-emaonly.safetensors"

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Prompt
prompt = "nature and waterfall photography"

seed = 1234         # fixed seed

# Generate several images (guidance_scale 1 to 8) and plot them in a 2x4 grid
plt.figure(figsize=[12.5, 6], dpi = 100)
for i in range(1, 9):
    img = image_generation(i)
    plt.subplot(2, 4, i, title = "guidance_scale=%.1f"%i)
    plt.imshow(img)
    plt.axis('off')

    # Release cached memory
    if device == 'cuda':
        torch.cuda.empty_cache()
    elif device == 'mps':
        torch.mps.empty_cache()

plt.tight_layout()
plt.savefig('results/image_009.png')
plt.close()

・The guidance_scale parameter is a number that determines how strongly the prompt is weighted
・The default is 7.5, and values are typically chosen in the range 0 to 10 (the larger the number, the more the prompt is emphasized)
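
What "weighting the prompt" means can be written as a formula. In classifier-free guidance, the technique that guidance_scale controls, the prediction made with the prompt is pushed away from the prediction made without it. A minimal sketch on plain Python lists instead of tensors; `apply_guidance` is a hypothetical helper and the sample numbers are made up.

```python
def apply_guidance(uncond: list[float], cond: list[float], guidance_scale: float) -> list[float]:
    """Combine predictions as uncond + guidance_scale * (cond - uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


if __name__ == "__main__":
    uncond = [0.0, 1.0]   # prediction without the prompt
    cond = [1.0, 3.0]     # prediction with the prompt
    print(apply_guidance(uncond, cond, 1.0))  # [1.0, 3.0]  (same as cond)
    print(apply_guidance(uncond, cond, 7.5))  # [7.5, 16.0]
```

A scale of 1 reproduces the prompted prediction unchanged, and larger values amplify the difference the prompt makes.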

Run the program (execution time: about 16 s, RTX 4070 Ti 12GB)

(sd_test) PS > python sd_009.py

Fetching 11 files: 100%|████████████████████| 11/11 [00:00<00:00, 11003.42it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 15.43it/s]
100%|██████████████████████████████████████████| 20/20 [00:01<00:00, 13.50it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 33.54it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  8.59it/s]
Fetching 11 files: 100%|████████████████████| 11/11 [00:00<00:00, 12323.01it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 33.35it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  8.37it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 33.21it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  8.63it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 34.81it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  8.64it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 19.16it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  8.26it/s]
Fetching 11 files: 100%|████████████████████| 11/11 [00:00<00:00, 11008.67it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 34.43it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  8.44it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 34.80it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  8.34it/s]

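The image file "image_009.png" is generated

Step 10: Skipping CLIP layers (clip_skip)

About CLIP (Contrastive Language-Image Pretraining)
・CLIP is a model trained to relate text and images; the part used here is its text encoder, a natural-language-processing model
・It is built into Stable Diffusion as one of its components
・Its role is to "analyze the input prompt"

About clip_skip, a parameter that changes how the prompt takes effect
・The CLIP text encoder has 12 layers and generation normally uses all 12, but not every layer is required and the last ones can be skipped (CLIP skip)
・clip_skip can be given as a number from 0 to 11; by default nothing is skipped and all layers are used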
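
Which layer output is used for a given clip_skip value can be sketched as a list lookup. This reflects my reading of how diffusers indexes the text encoder's hidden states and should be treated as an assumption; `HIDDEN_STATES` and `selected_output` are hypothetical names, and labels stand in for the real tensors.

```python
# 13 entries: the embedding output followed by the outputs of layers 1..12
HIDDEN_STATES = ["embeddings"] + [f"layer_{i}" for i in range(1, 13)]


def selected_output(clip_skip: int, hidden_states: list[str] = HIDDEN_STATES) -> str:
    """Return which hidden state is used for a given clip_skip value."""
    if not 0 <= clip_skip <= 11:
        raise ValueError("clip_skip must be between 0 and 11")
    # Count backwards from the last layer
    return hidden_states[-(clip_skip + 1)]


if __name__ == "__main__":
    print(selected_output(0))   # layer_12 (nothing skipped)
    print(selected_output(1))   # layer_11
    print(selected_output(11))  # layer_1
```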

"sd_010.py"

## sd_010.py "Nature and waterfall photo": skipping CLIP layers (clip_skip)

import torch
from diffusers import StableDiffusionPipeline, logging
from diffusers.utils import make_image_grid
import matplotlib.pyplot as plt

logging.set_verbosity_error()                               ## show errors only (suppress warnings)

# Image generation
def image_generation(c_skip):
    # Create the pipeline
    pipeline = StableDiffusionPipeline.from_single_file(model).to(device)

    # Create the Generator object (fixed seed for reproducibility)
    generator = torch.Generator(device).manual_seed(seed)

    # Generate the image
    img = pipeline(
                    prompt = prompt,
                    num_inference_steps = 20,
                    guidance_scale = 7.5,
                    clip_skip = c_skip,
                    generator = generator
                    ).images[0]
    return img

# Path to the model file
model = "/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/v1-5-pruned-emaonly.safetensors"

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Prompt
prompt = "nature and waterfall photography"

seed = 1234         # fixed seed

# Generate several images (clip_skip 0 to 11) and plot them in a 4x3 grid
plt.figure(figsize=[7, 10], dpi = 100)
for i in range(12):
    img = image_generation(i)
    plt.subplot(4, 3, i + 1, title="clip_skip=%d"%i)
    plt.imshow(img)
    plt.axis('off')

    # Release cached memory
    if device == 'cuda':
        torch.cuda.empty_cache()
    elif device == 'mps':
        torch.mps.empty_cache()

plt.tight_layout()
plt.savefig('results/image_010.png')
plt.close()

Run the program (execution time: about 25 s, RTX 4070 Ti 12GB)

(sd_test) PS > python sd_010.py

Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 14.60it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  8.27it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 32.96it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  8.55it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 32.60it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  8.43it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 34.40it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  8.40it/s]
Fetching 11 files: 100%|████████████████████| 11/11 [00:00<00:00, 10889.15it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 34.16it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  8.64it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 23.14it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  8.48it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 34.46it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  8.44it/s]
Fetching 11 files: 100%|█████████████████████| 11/11 [00:00<00:00, 9123.46it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 32.64it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  8.48it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 34.62it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  8.35it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 34.45it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  8.63it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 22.02it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  8.41it/s]
Fetching 11 files: 100%|████████████████████| 11/11 [00:00<00:00, 10956.39it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 35.70it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  8.36it/s]

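The image file "image_010.png" is generated

Step 11: Changing the scheduler

About schedulers
・The scheduler (also called the sampler) is part of the algorithm of the image generation process in Stable Diffusion
・It is one of the factors that affect the generated image
・With a different scheduler, other parameters such as the appropriate step count may also change
・There are many kinds of schedulers, but each model can simply be used with its default scheduler without paying attention to it

Finding out which scheduler is used by default
・After creating the pipeline, inspect its scheduler attribute to see which scheduler is used by default

## sd_scheduler.py  Inspect the scheduler

from diffusers import StableDiffusionPipeline, logging

logging.set_verbosity_error()                               ## show errors only (suppress warnings)

model = "/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/v1-5-pruned-emaonly.safetensors"
pipeline = StableDiffusionPipeline.from_single_file(model)

print(pipeline.scheduler)

・Execution result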

(sd_test) PS > python sd_scheduler.py

Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00,  9.80it/s]
PNDMScheduler {
  "_class_name": "PNDMScheduler",
  "_diffusers_version": "0.33.1",
  "beta_end": 0.012,
  "beta_schedule": "scaled_linear",
  "beta_start": 0.00085,
  "clip_sample": false,
  "num_train_timesteps": 1000,
  "prediction_type": "epsilon",
  "set_alpha_to_one": false,
  "skip_prk_steps": true,
  "steps_offset": 1,
  "timestep_spacing": "leading",
  "trained_betas": null
}

Scheduler reference: https://huggingface.co/docs/diffusers/api/schedulers/overview
・Schedulers

A1111/k-diffusion	Diffusers	Usage
DPM++ 2M	DPMSolverMultistepScheduler
DPM++ 2M Karras	DPMSolverMultistepScheduler	init with use_karras_sigmas=True
DPM++ 2M SDE	DPMSolverMultistepScheduler	init with algorithm_type="sde-dpmsolver++"
DPM++ 2M SDE Karras	DPMSolverMultistepScheduler	init with use_karras_sigmas=True and algorithm_type="sde-dpmsolver++"
DPM++ 2S a	N/A	very similar to DPMSolverSinglestepScheduler
DPM++ 2S a Karras	N/A	very similar to DPMSolverSinglestepScheduler(use_karras_sigmas=True, ...)
DPM++ SDE	DPMSolverSinglestepScheduler
DPM++ SDE Karras	DPMSolverSinglestepScheduler	init with use_karras_sigmas=True
DPM2	KDPM2DiscreteScheduler
DPM2 Karras	KDPM2DiscreteScheduler	init with use_karras_sigmas=True
DPM2 a	KDPM2AncestralDiscreteScheduler
DPM2 a Karras	KDPM2AncestralDiscreteScheduler	init with use_karras_sigmas=True
DPM adaptive	N/A
DPM fast	N/A
Euler	EulerDiscreteScheduler
Euler a	EulerAncestralDiscreteScheduler
Heun	HeunDiscreteScheduler
LMS	LMSDiscreteScheduler
LMS Karras	LMSDiscreteScheduler	init with use_karras_sigmas=True
N/A	DEISMultistepScheduler
N/A	UniPCMultistepScheduler

・Noise schedules and schedule types

A1111/k-diffusion	Diffusers
Karras	init with use_karras_sigmas=True
sgm_uniform	init with timestep_spacing="trailing"
simple	init with timestep_spacing="trailing"
exponential	init with timestep_spacing="linspace", use_exponential_sigmas=True
beta	init with timestep_spacing="linspace", use_beta_sigmas=True
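
The correspondence tables above can be turned into a lookup. A minimal sketch covering only a few rows, copied from the tables; `SAMPLER_MAP` and `lookup_sampler` are hypothetical names, and the block deliberately stores class names as strings so that it runs without diffusers installed.

```python
# A1111 name -> (Diffusers class name, extra init arguments), from the table above
SAMPLER_MAP = {
    "DPM++ 2M": ("DPMSolverMultistepScheduler", {}),
    "DPM++ 2M Karras": ("DPMSolverMultistepScheduler", {"use_karras_sigmas": True}),
    "DPM++ 2M SDE": ("DPMSolverMultistepScheduler", {"algorithm_type": "sde-dpmsolver++"}),
    "Euler": ("EulerDiscreteScheduler", {}),
    "Euler a": ("EulerAncestralDiscreteScheduler", {}),
    "Heun": ("HeunDiscreteScheduler", {}),
    "LMS Karras": ("LMSDiscreteScheduler", {"use_karras_sigmas": True}),
}


def lookup_sampler(a1111_name: str) -> tuple[str, dict]:
    """Return the Diffusers class name and init overrides for an A1111 sampler name."""
    try:
        return SAMPLER_MAP[a1111_name]
    except KeyError:
        raise ValueError(f"no Diffusers equivalent listed for {a1111_name!r}") from None


if __name__ == "__main__":
    print(lookup_sampler("DPM++ 2M Karras"))
```

With diffusers imported, the result could be applied as `getattr(diffusers, name).from_config(pipeline.scheduler.config, **kwargs)`, where `name, kwargs = lookup_sampler(...)`; the extra arguments override the values taken from the existing scheduler configuration.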

"sd_011.py"

## sd_011.py "Nature and waterfall photo": changing the scheduler

import torch
import diffusers
from diffusers import StableDiffusionPipeline, logging
from diffusers.utils import make_image_grid
import matplotlib.pyplot as plt

logging.set_verbosity_error()                               ## show errors only (suppress other log output)

# Image generation
def image_generation(scheduler):
    # Create the pipeline
    pipeline = StableDiffusionPipeline.from_single_file(model).to(device)
    pipeline.scheduler = scheduler.from_config(pipeline.scheduler.config)

    # Create the Generator object
    generator = torch.Generator(device).manual_seed(seed)

    # Generate the image
    img = pipeline(
                    prompt = prompt,
                    num_inference_steps = 20,
                    generator = generator
                    ).images[0]
    return img

# Path to the model file
model = "/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/v1-5-pruned-emaonly.safetensors"

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Prompt
prompt = "nature and waterfall photography"

seed = 1234         # fixed seed

lis_schdl = [
    'DDIMScheduler',
    'DDPMScheduler',
    'PNDMScheduler',
    'DPMSolverSinglestepScheduler',
    'DPMSolverMultistepScheduler',
    'LMSDiscreteScheduler',
    'EulerDiscreteScheduler',
    'EulerAncestralDiscreteScheduler',
    'HeunDiscreteScheduler',
    'KDPM2AncestralDiscreteScheduler',
]

# Generate multiple images (one per scheduler, laid out on a 5 x 2 grid)
plt.figure(figsize=[6, 15.5], dpi = 100)
for i, schdl in enumerate(lis_schdl):
    img = image_generation(getattr(diffusers, schdl))
    plt.subplot(5, 2, i + 1, title=schdl)
    plt.imshow(img)
    plt.axis('off')

    # Free memory
    if device == 'cuda':
        torch.cuda.empty_cache()
    elif device == 'mps':
        torch.mps.empty_cache()

plt.tight_layout()
plt.savefig('results/image_011.png')
plt.close()

Run the program (run time: about 20 seconds on an RTX 4070 Ti 12GB)

(sd_test) PS > python sd_011.py

Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 14.83it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  8.68it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 33.87it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  8.96it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 33.43it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  8.36it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 33.74it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  9.15it/s]
Fetching 11 files: 100%|████████████████████| 11/11 [00:00<00:00, 11000.80it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 34.81it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  9.15it/s]
Fetching 11 files: 100%|████████████████████| 11/11 [00:00<00:00, 11008.67it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 23.26it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  9.11it/s]
Fetching 11 files: 100%|█████████████████████| 11/11 [00:00<00:00, 8329.54it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 34.21it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  9.15it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 35.46it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  9.17it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 33.07it/s]
100%|██████████████████████████████████████████| 20/20 [00:04<00:00,  4.69it/s]
Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00, 33.34it/s]
100%|██████████████████████████████████████████| 20/20 [00:04<00:00,  4.69it/s]

The image file "image_011.png" is generated in the results folder
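
sd_011.py rebuilds the whole pipeline for every scheduler, which is why "Loading pipeline components" appears ten times in the log. One alternative is to load the pipeline once and swap only the scheduler, always starting from the original scheduler config. The following is a minimal sketch of that idea: `grid_shape`, `generate_per_scheduler`, and the `make_generator` callback are hypothetical names, and the pipeline and scheduler classes are passed in, so the sketch itself does not import diffusers.

```python
import math


def grid_shape(n_images, cols=2):
    """Rows and columns needed to show n_images in a grid with `cols` columns."""
    return math.ceil(n_images / cols), cols


def generate_per_scheduler(pipeline, scheduler_classes, prompt, make_generator, steps=20):
    """Generate one image per scheduler class, reusing a single loaded pipeline.

    make_generator is a zero-argument callable returning a freshly seeded
    generator, so every scheduler starts from the same initial noise.
    """
    base_config = pipeline.scheduler.config  # keep the original config as the common base
    images = {}
    for cls in scheduler_classes:
        pipeline.scheduler = cls.from_config(base_config)
        result = pipeline(
            prompt=prompt,
            num_inference_steps=steps,
            generator=make_generator(),
        )
        images[cls.__name__] = result.images[0]
    return images
```

With the names from sd_011.py, this could be called as `generate_per_scheduler(pipeline, [getattr(diffusers, s) for s in lis_schdl], prompt, lambda: torch.Generator(device).manual_seed(seed))`, and `grid_shape(len(lis_schdl))` yields the 5 x 2 layout that the script hard-codes.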

Step 12: Entering the prompt in Japanese

Few models accept prompts written in Japanese
Instead, translate the prompt into English with a Python translation package before passing it to the pipeline

"sd_012.py"

## sd_012.py "Nature and waterfall photo": entering the prompt in Japanese

import torch
from diffusers import StableDiffusionPipeline, logging
from translate import Translator

logging.set_verbosity_error()                               ## show errors only (suppress other log output)

# Path to the model file
model = "/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/v1-5-pruned-emaonly.safetensors"

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

seed = 5678

# Create the pipeline
pipeline = StableDiffusionPipeline.from_single_file(model).to(device)

# Prompt (in Japanese)
prompt = "自然と滝の写真"

# Translate into English
trans = Translator(to_lang='en', from_lang='ja').translate
en_prompt = trans(prompt)
print(prompt, '→', en_prompt)

# Create the Generator object
generator = torch.Generator(device).manual_seed(seed)

# Generate the image
image = pipeline(
            prompt = en_prompt,
            num_inference_steps=20,
            width=768,
            height=512,
            generator = generator               # pass the seeded generator so the result is reproducible
            ).images[0]
image.save("results/image_012.png")

Install the missing package
```
(sd_test) PS > pip install translate
```

Run the program (run time: about 3 seconds on an RTX 4070 Ti 12GB)

(sd_test) PS > python sd_012.py

Fetching 11 files: 100%|███████████████████████████████| 11/11 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00,  8.67it/s]
自然と滝の写真 → Photos of nature and waterfalls
100%|██████████████████████████████████████████| 20/20 [00:03<00:00,  5.18it/s]

The image file "image_012.png" is generated in the results folder
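
When prompts may arrive in either language, the translation step can be made conditional so that English prompts pass through untouched. The following is a minimal sketch: `contains_japanese` and `to_english_prompt` are hypothetical helper names, the character ranges are a rough approximation rather than a complete definition of Japanese text, and the translator is passed in as a plain callable.

```python
import re

# Hiragana, katakana, and common CJK ideographs (an approximation, not exhaustive)
_JAPANESE_CHARS = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")


def contains_japanese(text):
    """Return True if the text contains at least one Japanese character."""
    return bool(_JAPANESE_CHARS.search(text))


def to_english_prompt(prompt, translate):
    """Translate the prompt only when it contains Japanese characters.

    `translate` is any callable mapping a Japanese string to English,
    for example the `trans` function created in sd_012.py.
    """
    if contains_japanese(prompt):
        return translate(prompt)
    return prompt
```

In sd_012.py this would replace the unconditional call, as in `en_prompt = to_english_prompt(prompt, trans)`.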

Step 13: Specifying what you do not want generated (negative prompt)

Specify anything you do not want in the image as a negative prompt
・In the Step 9 example a person appears at guidance_scale = 7.0, so a negative prompt is used here to exclude people

"sd_013.py"

## sd_013.py "Nature and waterfall photo": specifying what not to generate (negative prompt)

import torch
from diffusers import StableDiffusionPipeline, logging

logging.set_verbosity_error()                               ## show errors only (suppress other log output)

# Path to the model file
model = "/StabilityMatrix/Data/Models/StableDiffusion/SD1.5/v1-5-pruned-emaonly.safetensors"

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

seed = 1234

# Create the pipeline
pipeline = StableDiffusionPipeline.from_single_file(model).to(device)

# Prompt and negative prompt
prompt = "nature and waterfall photography"
ng_prompt = "person"

# Create the Generator object
generator = torch.Generator(device).manual_seed(seed)

# Generate the image
image = pipeline(
            prompt = prompt,
            negative_prompt = ng_prompt,
            num_inference_steps=20,
            guidance_scale = 7.0,
            generator = generator
            ).images[0]
image.save("results/image_013.png")

Run the program (run time: about 2 seconds on an RTX 4070 Ti 12GB)

(sd_test) PS > python sd_013.py

Fetching 11 files: 100%|████████████████████| 11/11 [00:00<00:00, 11058.81it/s]
Loading pipeline components...: 100%|████████████| 6/6 [00:00<00:00,  9.88it/s]
100%|██████████████████████████████████████████| 20/20 [00:02<00:00,  7.91it/s]

The image file "image_013.png" is generated in the results folder
・Specifying a negative prompt changes the image somewhat
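
To see what the negative prompt actually changes, the same prompt can be generated twice from the same seed, once without and once with it. The following is a minimal sketch: `compare_negative_prompt` and the `make_generator` callback are hypothetical names, and the pipeline is passed in as an already loaded object.

```python
def compare_negative_prompt(pipeline, prompt, negative_prompt, make_generator, **kwargs):
    """Generate the same prompt twice, without and with a negative prompt.

    make_generator must return a freshly seeded generator on every call;
    reusing one generator object would give the second image different
    initial noise and make the comparison meaningless.
    """
    without = pipeline(prompt=prompt, generator=make_generator(), **kwargs).images[0]
    with_neg = pipeline(
        prompt=prompt,
        negative_prompt=negative_prompt,
        generator=make_generator(),
        **kwargs,
    ).images[0]
    return without, with_neg
```

With the names from sd_013.py, a call could look like `compare_negative_prompt(pipeline, prompt, ng_prompt, lambda: torch.Generator(device).manual_seed(seed), num_inference_steps=20, guidance_scale=7.0)`.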

Step 20: [SDXL] Using an SDXL model

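About the SDXL model
・The default image size is larger, at 1024×1024
・It can generate images with complex compositions
・High-quality images are easy to obtain even from short prompts
・The model is large, so running it locally requires a high-spec PC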

Differences in the class used to create the pipeline

Model type	Base image size	Pipeline class
SD1.5	512x512	StableDiffusionPipeline
SDXL	1024x1024	StableDiffusionXLPipeline
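
The table can be turned into a small lookup so that a script picks the matching pipeline class from the model type. The following is a minimal sketch: `PIPELINES`, `pipeline_info`, and `load_pipeline` are hypothetical names, and the model-type strings are simply the labels used in the table.

```python
# Hypothetical lookup built from the table above:
# model type -> (pipeline class name in diffusers, base image size in pixels)
PIPELINES = {
    "SD1.5": ("StableDiffusionPipeline", 512),
    "SDXL": ("StableDiffusionXLPipeline", 1024),
}


def pipeline_info(model_type):
    """Return (pipeline class name, base image size) for a model type."""
    try:
        return PIPELINES[model_type]
    except KeyError:
        raise ValueError(f"unsupported model type: {model_type}") from None


def load_pipeline(model_type, model_path, **kwargs):
    """Load a single-file checkpoint with the pipeline class matching its type."""
    import diffusers  # imported here so the lookup above works without diffusers installed

    class_name, _ = pipeline_info(model_type)
    return getattr(diffusers, class_name).from_single_file(model_path, **kwargs)
```

For example, `load_pipeline("SDXL", model_path, torch_dtype=torch.float16)` would be equivalent to calling `StableDiffusionXLPipeline.from_single_file` directly.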

"sd_020.py"

## sd_020.py [SDXL] "Nature and waterfall photo": using an SDXL model
## model:   sd_xl_base_1.0.safetensors
##          juggernautXL_v8Rundiffusion.safetensors
##          animexlXuebimix_v60LCM.safetensors

import torch
from diffusers import StableDiffusionXLPipeline, logging
logging.set_verbosity_error()

# Path to the model file (any of the models listed above)
model_path = "/StabilityMatrix/Data/Models/StableDiffusion/animexlXuebimix_v60LCM.safetensors"

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Seed value
seed = 12345678

# Create the pipeline
pipeline = StableDiffusionXLPipeline.from_single_file(
                    model_path,
                    torch_dtype = torch.float16,
                    ).to(device)

# Prompt
prompt = "nature and waterfall photography"

# Create the Generator object
generator = torch.Generator(device).manual_seed(seed)

print(f'Seed: {seed}, Model: {model_path}')

# Generate the image
image = pipeline(
                    prompt=prompt,
                    num_inference_steps=20,
                    guidance_scale = 7.5,
                    generator = generator
                    ).images[0]
image.save("results/image_020.png")

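Run the program (run time: about 5 seconds on an RTX 4070 Ti 12GB; the log below was recorded with model_path set to sd_xl_base_1.0.safetensors)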

(sd_test) PS > python sd_020.py

Fetching 17 files: 100%|███████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  4.42it/s]
Seed: 12345678, Model: /StabilityMatrix/Data/Models/StableDiffusion/sd_xl_base_1.0.safetensors
100%|██████████████████████████████████████████| 20/20 [00:04<00:00,  4.30it/s]

The image file "image_020.png" is generated in the results folder

Step 21: [SDXL] Setting the VAE / scheduler

The VAE and the scheduler both affect the generated result, so suitable ones need to be chosen
Some models come with a recommended VAE or scheduler
Example: the AnimeXL-xuebiMIX model
　　VAE → SDXL-VAE-FP16-Fix
　　Scheduler → EulerAncestralDiscreteScheduler

"sd_021.py"

## sd_021.py [SDXL] "Nature and waterfall photo": specifying the VAE and scheduler
## model:   animexlXuebimix_v60LCM.safetensors

import torch
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler, logging
from diffusers.models import AutoencoderKL                  # VAE class
logging.set_verbosity_error()

# Paths to the model / VAE files
model_path = "/StabilityMatrix/Data/Models/StableDiffusion/animexlXuebimix_v60LCM.safetensors"
vae_path = "/StabilityMatrix/Data/Models/VAE/sdxl_vae_fp16_fix.safetensors"

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Seed value
seed = 12345678

# Create the VAE object
vae = AutoencoderKL.from_single_file(
                    vae_path,
                    torch_dtype=torch.float16
                    )

# Create the pipeline
pipeline = StableDiffusionXLPipeline.from_single_file(
                    model_path,
                    vae = vae,
                    torch_dtype = torch.float16,
                    ).to(device)

# Set the scheduler
pipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(pipeline.scheduler.config)

# Prompt
prompt = "nature and waterfall photography"

# Create the Generator object
generator = torch.Generator(device).manual_seed(seed)

print(f'Seed: {seed}, Model: {model_path}')
print(f'VAE : {vae_path}')

# Generate the image
image = pipeline(
                    prompt=prompt,
                    num_inference_steps=20,
                    guidance_scale = 7.5,
                    generator = generator
                    ).images[0]
image.save("results/image_021.png")

Run the program (run time: about 5 seconds on an RTX 4070 Ti 12GB)

(sd_test) PS > python sd_021.py

Fetching 17 files: 100%|███████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  4.37it/s]
Seed: 12345678, Model: /StabilityMatrix/Data/Models/StableDiffusion/animexlXuebimix_v60LCM.safetensors
VAE : /StabilityMatrix/Data/Models/VAE/sdxl_vae_fp16_fix.safetensors
100%|██████████████████████████████████████████| 20/20 [00:04<00:00,  4.23it/s]

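The image file "image_021.png" is generated in the results folder
・The left image was generated without the VAE and shows artifacts such as white noise
・When a model calls for it, a suitable VAE and scheduler need to be set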
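
Because recommendations differ from model to model, it can help to keep them in one place and look them up by checkpoint name. The following is a minimal sketch: `RECOMMENDED` and `recommended_settings` are hypothetical names, the registry holds only the example from this step, and paths are assumed to use forward slashes as in the scripts here.

```python
from pathlib import PurePosixPath

# Hypothetical registry of per-model recommendations; the single entry is the
# example given in this step.
RECOMMENDED = {
    "animexlXuebimix_v60LCM": {
        "vae": "sdxl_vae_fp16_fix.safetensors",
        "scheduler": "EulerAncestralDiscreteScheduler",
    },
}


def recommended_settings(model_path):
    """Look up recommendations by checkpoint file name (without extension).

    Returns an empty dict when nothing is registered for the model.
    """
    stem = PurePosixPath(model_path).stem
    return dict(RECOMMENDED.get(stem, {}))
```

A script could then pick the scheduler class with `getattr(diffusers, settings["scheduler"])` when the returned dict contains a `"scheduler"` key, and fall back to the pipeline defaults otherwise.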

Step 22: [SDXL] Avoiding undesirable results (embedding model)

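Embedding models that steer generation away from undesirable elements are publicly available
Example: EasyNegative

How to use an embedding
・Load the embedding model's .safetensors file with the load_textual_inversion method, and register the negative-prompt keyword it represents with token = 'EasyNegative'
・When running the pipeline, pass that token in the negative_prompt keyword argument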

"sd_022.py"

## sd_022.py [SDXL] "Nature and waterfall photo": avoiding undesirable results
## model:   animexlXuebimix_v60LCM.safetensors

import torch
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler, logging
from diffusers.models import AutoencoderKL                  # VAE class
logging.set_verbosity_error()

# Paths to the model / VAE / embedding files
model_path = "/StabilityMatrix/Data/Models/StableDiffusion/animexlXuebimix_v60LCM.safetensors"
vae_path = "/StabilityMatrix/Data/Models/VAE/sdxl_vae_fp16_fix.safetensors"
emb_path = "/StabilityMatrix/Data/Models/Embeddings/EasyNegativeV2.safetensors"

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Seed value
seed = 12345678

# Create the VAE object
vae = AutoencoderKL.from_single_file(
                    vae_path,
                    torch_dtype=torch.float16
                    )

# Create the pipeline
pipeline = StableDiffusionXLPipeline.from_single_file(
                    model_path,
                    vae = vae,
                    torch_dtype = torch.float16,
                    ).to(device)

# Set the scheduler
pipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(pipeline.scheduler.config)

# Prompt and negative prompt (the embedding token)
prompt = "nature and waterfall photography"
neg_prompt = "EasyNegative"

# Create the Generator object
generator = torch.Generator(device).manual_seed(seed)

# Load the embedding
pipeline.load_textual_inversion(
    pretrained_model_name_or_path = emb_path,
    token='EasyNegative')

print(f'Seed: {seed}, Model: {model_path}')
print(f'Embeddings : {emb_path}')
print(f'VAE : {vae_path}')

# Generate the image
image = pipeline(
                    prompt = prompt,
                    negative_prompt = neg_prompt,
                    num_inference_steps=20,
                    guidance_scale = 7.5,
                    generator = generator
                    ).images[0]
image.save("results/image_022.png")

Run the program (run time: about 5 seconds on an RTX 4070 Ti 12GB)

(sd_test) PS > python sd_022.py

Fetching 17 files: 100%|███████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  4.31it/s]
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
Seed: 12345678, Model: /StabilityMatrix/Data/Models/StableDiffusion/animexlXuebimix_v60LCM.safetensors
Embeddings : /StabilityMatrix/Data/Models/Embeddings/EasyNegativeV2.safetensors
VAE : /StabilityMatrix/Data/Models/VAE/sdxl_vae_fp16_fix.safetensors
100%|██████████████████████████████████████████| 20/20 [00:04<00:00,  4.31it/s]

The image file "image_022.png" is generated in the results folder
・Depending on the model, this does not necessarily improve the result
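
Once registered, the embedding token is used like any other word in the negative prompt, so it can presumably be combined with plain terms such as the "person" term from Step 13. The following is a minimal sketch of assembling such a string; `build_negative_prompt` is a hypothetical helper, and whether the combination improves the image depends on the model.

```python
def build_negative_prompt(*terms):
    """Join negative-prompt terms with commas, skipping blanks and duplicates."""
    seen = []
    for term in terms:
        term = term.strip()
        if term and term not in seen:
            seen.append(term)
    return ", ".join(seen)
```

In sd_022.py this could be used as `neg_prompt = build_negative_prompt("EasyNegative", "person")`.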

Step 23: [SDXL] Using a LoRA

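LoRA (Low-Rank Adaptation) is applied to an existing model to specialize it toward a particular style
・A LoRA is produced by additional training on a small number of images, so anyone can create and use one fairly easily
Example: "国风插画", which generates Chinese-style illustrations

How to use a LoRA
・Load it with the load_lora_weights method
・To activate it, the trigger word distributed with the LoRA must be included in the prompt
・The trigger word for "国风插画" above is "guofeng" (a Chinese word, written 国風 in kanji)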

"sd_023.py"

## sd_023.py [SDXL] "Nature and waterfall photo": using a LoRA
## model:   animexlXuebimix_v60LCM.safetensors

import torch
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler, logging
from diffusers.models import AutoencoderKL                  # VAE class
logging.set_verbosity_error()

# Paths to the model / VAE / embedding / LoRA files
model_path = "/StabilityMatrix/Data/Models/StableDiffusion/animexlXuebimix_v60LCM.safetensors"
vae_path = "/StabilityMatrix/Data/Models/VAE/sdxl_vae_fp16_fix.safetensors"
emb_path = "/StabilityMatrix/Data/Models/Embeddings/EasyNegativeV2.safetensors"
lora_path = "/StabilityMatrix/Data/Models/Lora/国风插画SDXL.safetensors"

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Seed value
seed = 12345678

# Create the VAE object
vae = AutoencoderKL.from_single_file(
                    vae_path,
                    torch_dtype=torch.float16
                    )

# Create the pipeline
pipeline = StableDiffusionXLPipeline.from_single_file(
                    model_path,
                    vae = vae,
                    torch_dtype = torch.float16,
                    ).to(device)

# Set the scheduler
pipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(pipeline.scheduler.config)

# Prompt and negative prompt
prompt = "guofeng, nature and waterfall photography"        # includes the trigger word
neg_prompt = "EasyNegative"

# Create the Generator object
generator = torch.Generator(device).manual_seed(seed)

# Load the embedding
pipeline.load_textual_inversion(
    pretrained_model_name_or_path = emb_path,
    token='EasyNegative')

# Load the LoRA
pipeline.load_lora_weights(".", weight_name = lora_path)

print(f'Seed: {seed}, Model: {model_path}')
print(f'Embeddings : {emb_path}')
print(f'LoRA : {lora_path}')
print(f'VAE : {vae_path}')

# Generate the image
image = pipeline(
                    prompt = prompt,
                    negative_prompt = neg_prompt,
                    num_inference_steps=20,
                    guidance_scale = 7.5,
                    generator = generator
                    ).images[0]
image.save("results/image_023.png")

Install the missing package
```
(sd_test) PS > pip install -U peft
```

Run the program (run time: about 7 seconds on an RTX 4070 Ti 12GB)

(sd_test) PS > python sd_023.py

Fetching 17 files: 100%|███████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:01<00:00,  4.34it/s]
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
Seed: 12345678, Model: /StabilityMatrix/Data/Models/StableDiffusion/animexlXuebimix_v60LCM.safetensors
Embeddings : /StabilityMatrix/Data/Models/Embeddings/EasyNegativeV2.safetensors
LoRA : /StabilityMatrix/Data/Models/Lora/国风插画SDXL.safetensors
VAE : /StabilityMatrix/Data/Models/VAE/sdxl_vae_fp16_fix.safetensors
100%|██████████████████████████████████████████| 20/20 [00:06<00:00,  3.17it/s]

The image file "image_023.png" is generated in the results folder
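
Since a LoRA has no effect without its trigger word, a small guard can make sure the word is present before the prompt reaches the pipeline. The following is a minimal sketch; `with_trigger_word` is a hypothetical helper that assumes prompts are written as comma-separated parts, as in sd_023.py.

```python
def with_trigger_word(prompt, trigger):
    """Prepend the trigger word unless the prompt already contains it
    as one of its comma-separated parts."""
    parts = [part.strip() for part in prompt.split(",")]
    if trigger in parts:
        return prompt
    return f"{trigger}, {prompt}"
```

For example, `with_trigger_word("nature and waterfall photography", "guofeng")` produces the prompt string used in sd_023.py.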

Step 24: [SDXL] Setting the LoRA ratio (fuse_lora)

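The fuse_lora method sets how strongly the LoRA influences the result
A value of 0 is the same as not using the LoRA, and larger values increase its influence. If the value is too large, the image breaks down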

"sd_024.py"

## sd_024.py [SDXL] "Nature and waterfall photo": setting the LoRA ratio
## model:   animexlXuebimix_v60LCM.safetensors

import torch
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler, logging
from diffusers.models import AutoencoderKL                  # VAE class
import matplotlib.pyplot as plt

logging.set_verbosity_error()

# Image generation
def image_generation(lora_s):
    # Create the VAE object
    vae = AutoencoderKL.from_single_file(
                    vae_path,
                    torch_dtype=torch.float16
                    )

    # Create the pipeline
    pipeline = StableDiffusionXLPipeline.from_single_file(
                    model_path,
                    vae = vae,
                    torch_dtype = torch.float16,
                    ).to(device)

    # Set the scheduler
    pipeline.scheduler = EulerAncestralDiscreteScheduler.from_config(pipeline.scheduler.config)

    # Create the Generator object
    generator = torch.Generator(device).manual_seed(seed)

    # Load the embedding
    pipeline.load_textual_inversion(
                    pretrained_model_name_or_path = emb_path,
                    token='EasyNegative')

    # Load the LoRA
    pipeline.load_lora_weights(".", weight_name = lora_path)
    pipeline.fuse_lora(lora_scale = lora_s)                 # LoRA ratio

    # Generate the image
    img = pipeline(
                    prompt = prompt,
                    negative_prompt = neg_prompt,
                    num_inference_steps=20,
                    guidance_scale = 7.5,
                    generator = generator
                    ).images[0]
    return img


# Paths to the model / VAE / embedding / LoRA files
model_path = "/StabilityMatrix/Data/Models/StableDiffusion/animexlXuebimix_v60LCM.safetensors"
vae_path = "/StabilityMatrix/Data/Models/VAE/sdxl_vae_fp16_fix.safetensors"
emb_path = "/StabilityMatrix/Data/Models/Embeddings/EasyNegativeV2.safetensors"
lora_path = "/StabilityMatrix/Data/Models/Lora/国风插画SDXL.safetensors"

# Use "cuda" when a GPU is available, otherwise "cpu"
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Seed value
seed = 12345678

# Prompt and negative prompt
prompt = "guofeng, nature and waterfall photography"        # includes the trigger word
neg_prompt = "EasyNegative"

print(f'Seed: {seed}, Model: {model_path}')
print(f'Embeddings : {emb_path}')
print(f'LoRA : {lora_path}')
print(f'VAE : {vae_path}')

# Generate multiple images (lora_scale = 0.0, 0.2, ..., 1.4 on a 4 x 2 grid)
plt.figure(figsize = [6, 12.5], dpi = 100)
for i in range(8):
    lora_s = i * 0.2
    img = image_generation(lora_s)
    plt.subplot(4, 2, i + 1, title = "lora_scale=%.1f"%lora_s)
    plt.imshow(img)
    plt.axis('off')

    # Free memory
    if device == 'cuda':
        torch.cuda.empty_cache()
    elif device == 'mps':
        torch.mps.empty_cache()

plt.tight_layout()
plt.savefig('results/image_024.png')
plt.close()

Run the program (run time: about 32 seconds on an RTX 4070 Ti 12GB)

(sd_test) PS > python sd_024.py

Seed: 12345678, Model: /StabilityMatrix/Data/Models/StableDiffusion/animexlXuebimix_v60LCM.safetensors
Embeddings : /StabilityMatrix/Data/Models/Embeddings/EasyNegativeV2.safetensors
LoRA : /StabilityMatrix/Data/Models/Lora/国风插画SDXL.safetensors
VAE : /StabilityMatrix/Data/Models/VAE/sdxl_vae_fp16_fix.safetensors
Fetching 17 files: 100%|███████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:00<00:00,  9.53it/s]
The new embeddings will be initialized from a multivariate normal distribution that has old embeddings mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
100%|██████████████████████████████████████████| 20/20 [00:04<00:00,  4.41it/s]
Fetching 17 files: 100%|███████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:00<00:00, 15.99it/s]
100%|██████████████████████████████████████████| 20/20 [00:04<00:00,  4.47it/s]
Fetching 17 files: 100%|███████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:00<00:00, 12.95it/s]
100%|██████████████████████████████████████████| 20/20 [00:04<00:00,  4.41it/s]
Fetching 17 files: 100%|███████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:00<00:00, 15.60it/s]
100%|██████████████████████████████████████████| 20/20 [00:04<00:00,  4.42it/s]
Fetching 17 files: 100%|███████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:00<00:00, 16.15it/s]
100%|██████████████████████████████████████████| 20/20 [00:04<00:00,  4.45it/s]
Fetching 17 files: 100%|████████████████████| 17/17 [00:00<00:00, 17001.23it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:00<00:00, 15.34it/s]
100%|██████████████████████████████████████████| 20/20 [00:04<00:00,  4.40it/s]
Fetching 17 files: 100%|████████████████████| 17/17 [00:00<00:00, 17005.29it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:00<00:00, 13.26it/s]
100%|██████████████████████████████████████████| 20/20 [00:04<00:00,  4.50it/s]
Fetching 17 files: 100%|███████████████████████████████| 17/17 [00:00<?, ?it/s]
Loading pipeline components...: 100%|████████████| 7/7 [00:00<00:00, 15.60it/s]
100%|██████████████████████████████████████████| 20/20 [00:04<00:00,  4.40it/s]

The image file "image_024.png" is generated in the results folder
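
sd_024.py reloads the model, VAE, embedding, and LoRA for each of the eight ratios. One possible alternative is to load everything once and call `fuse_lora` and `unfuse_lora` around each generation. The following is a minimal sketch, not the method used above: `lora_scales` and `generate_per_lora_scale` are hypothetical names, the pipeline is passed in with its LoRA already loaded, and the sketch assumes `unfuse_lora()` restores the original weights in the installed diffusers version, so results may differ slightly from a fresh load.

```python
def lora_scales(count=8, step=0.2):
    """Scales 0.0, 0.2, ... rounded to one decimal place to avoid float noise."""
    return [round(i * step, 1) for i in range(count)]


def generate_per_lora_scale(pipeline, scales, make_generator, **kwargs):
    """Generate one image per LoRA scale on a pipeline whose LoRA is already loaded.

    Assumes the pipeline provides fuse_lora(lora_scale=...) and unfuse_lora(),
    and that unfuse_lora() restores the weights before the next scale is fused.
    """
    images = []
    for scale in scales:
        pipeline.fuse_lora(lora_scale=scale)
        try:
            images.append(pipeline(generator=make_generator(), **kwargs).images[0])
        finally:
            pipeline.unfuse_lora()  # undo the fuse so scales do not accumulate
    return images
```

With the names from sd_024.py, a call could look like `generate_per_lora_scale(pipeline, lora_scales(), lambda: torch.Generator(device).manual_seed(seed), prompt=prompt, negative_prompt=neg_prompt, num_inference_steps=20, guidance_scale=7.5)`.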

Notes

ValueError: PEFT backend is required for this method.

Error: the LoRA cannot be loaded

    :
    raise ValueError("PEFT backend is required for this method.")
ValueError: PEFT backend is required for this method.

Fix
・Update the packages
```
(sd_test) PS > pip install -U peft transformers
```

Reference site: PEFT diffusers -- integration alert #5489
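
Before loading a LoRA, a script can check whether the required packages are importable and print the install hint up front. The following is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper, and it checks only that a package can be found, not that its version is new enough.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, if `missing_packages(["peft", "transformers"])` returns a non-empty list, the script could stop and suggest running the `pip install -U peft transformers` command shown above.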

Update history

2025/06/12 First version

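References

Stable Diffusion

Diffusers
- huggingface/diffusers
- 「diffusersでスケジューラを読み込んで画像生成する方法」 (How to load a scheduler and generate images with diffusers)

Books and magazines
- Nikkei Software, July 2025 issue: 「ローカル生成AIプログラミング」 (Local generative AI programming)
- Interface, March 2025 issue: 「画像による異常検出＆ローカルLLM作り - 仕事のための生成AI」 (Image-based anomaly detection and building a local LLM: generative AI for work)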