
Image Generation AI "ComfyUI" 9 (Video Part 4)　== Work in progress ==

　Evaluating local AI image generation with ComfyUI

▲ Contents

Image Generation AI "ComfyUI" 9 (Video Part 4)　== Work in progress ==
- Video generation with audio using LTX-2.3
- Update history
References

※ Last updated: 2026/04/16


Video Generation with Audio Using LTX-2.3

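　An audio-capable video generation model announced in March 2026.
　It is reported to perform substantially better than LTX-2 (announced in January), and since ComfyUI supports it natively, we test it here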


Overview

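What is LTX-2.3
- A high-performance open-source video generation AI model developed by the Israeli company Lightricks and released in March 2026
- Video quality and prompt comprehension are substantially improved over the previous model (LTX-2)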

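Key features
- Fast, high-quality video generation: generates video and audio together, and is designed to run fast even in a local environment
- High resolution and long clips: supports 4K output and long video generation
- Audio integration: feeding an image and audio together enables lip sync (matching mouth movement) and motion that follows a song
- Better prompt adherence: compared with the previous-generation LTX-2, it produces video that follows the prompt more faithfully
- Built for local use: natively supported in ComfyUI, so it can run on a personal PC (GPU)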

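Comparison and assessment
- Versus WAN 2.2: WAN 2.2, its rival among local generative models, is strong in cinematic motion and image quality, whereas LTX-2.3's strengths are generation speed and output stability
- Use cases: well suited to creating storyboards and to trying out prompts quickly (iteration)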

How to use
- It is typically run on a local PC with ComfyUI
- Download the model and any additional weights (such as ID-LoRA), then set up a workflow that generates video and audio

Prerequisites (from the official documentation)
- ComfyUI installed
- CUDA-compatible GPU with 32GB+ VRAM
- 100GB+ free disk space for models and cache
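The disk and VRAM requirements above can be checked before downloading any models. A minimal sketch, assuming Python 3.8+ and, optionally, PyTorch with CUDA as found in a typical ComfyUI environment; the helper names are hypothetical, and "GB" is taken as 10^9 bytes because the official requirement does not say whether GB or GiB is meant.

```python
import shutil
from typing import Optional

# Thresholds from the official prerequisites listed above.
REQUIRED_FREE_GB = 100
REQUIRED_VRAM_GB = 32

def free_disk_gb(path: str) -> float:
    """Free space, in GB (10**9 bytes), on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free / 10**9

def meets_disk_requirement(path: str, required_gb: float = REQUIRED_FREE_GB) -> bool:
    """True if the filesystem holding `path` has at least `required_gb` GB free."""
    return free_disk_gb(path) >= required_gb

def total_vram_gb() -> Optional[float]:
    """Total memory of CUDA device 0 in GB, or None if PyTorch or CUDA is unavailable."""
    try:
        import torch  # optional dependency; present in a normal ComfyUI install
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(0).total_memory / 10**9
```

For example, call `meets_disk_requirement("D:/StabilityMatrix")` (path hypothetical) and compare `total_vram_gb()` with `REQUIRED_VRAM_GB`. All GPUs in the timing table on this page have well under 32 GB, which is why the GGUF and distilled variants are tested later.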

Official site

(Reference) Distilled version for low-VRAM environments
- LTX-2.3 also includes a distilled version that runs in 8 steps
- It runs at a Classifier-Free Guidance (CFG) value of 1, which makes it much faster than the full model
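Why CFG = 1 helps: in standard classifier-free guidance the sampler combines an unconditional and a conditional prediction as uncond + scale × (cond − uncond). At scale 1 this reduces to the conditional prediction alone, so a sampler can skip the unconditional pass and halve the model evaluations per step; with only 8 steps the saving compounds. A minimal sketch of that arithmetic, assuming a standard CFG sampler (plain lists stand in for latent tensors, and the 30-step, CFG 4 comparison run is a hypothetical setting, not taken from the LTX-2.3 templates).

```python
from typing import List

def cfg_combine(uncond: List[float], cond: List[float], scale: float) -> List[float]:
    """Classifier-free guidance, elementwise: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]

def model_calls(steps: int, cfg_scale: float) -> int:
    """Denoiser evaluations per generation: 1 per step at CFG 1 (no uncond pass), else 2."""
    return steps * (1 if cfg_scale == 1.0 else 2)

# At scale 1 the guided output equals the conditional prediction.
print(cfg_combine([0.0, 1.0], [2.0, 3.0], 1.0))  # [2.0, 3.0]
# 8-step distilled run at CFG 1 vs. a hypothetical 30-step run at CFG 4.
print(model_calls(8, 1.0), model_calls(30, 4.0))  # 8 60
```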


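Workflows created in this project

The workflows and related data created in this project are uploaded below (if they have been updated, download them again)

Download ComfyUI_ex_proj.zip (updated from time to time) ※Updated 2026/04/12
・Folders created by extracting the archive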

📂ComfyUI
  ├─📂input　　　　　　　　　　　　　　← input images used by the workflows
  └─📂user
        └─📂default
              └─📂workflows　　　　　　　　← where workflows are saved
                    :
                    ├─📂_video
                    ├─📂_video2
                    ├─📂LTX 　　　　　　　 ← workflows created in this chapter
                    :

・Copy the extracted "ComfyUI/" folder over "StabilityMatrix/Data/Packages/ComfyUI", overwriting existing files
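The overwrite copy can also be scripted. A minimal sketch using Python's `shutil.copytree` with `dirs_exist_ok=True` (Python 3.8+), which merges into the existing folder and overwrites files that have the same relative path; the function name is hypothetical, and both paths in the usage note depend on where you extracted the archive and installed StabilityMatrix.

```python
import shutil
from pathlib import Path

def merge_into_comfyui(extracted: Path, comfyui_pkg: Path) -> None:
    """Copy the extracted ComfyUI/ tree over the StabilityMatrix ComfyUI package.

    Files with the same relative path are overwritten; other existing files are kept.
    """
    if not extracted.is_dir():
        raise FileNotFoundError(f"extracted folder not found: {extracted}")
    # dirs_exist_ok=True lets copytree merge into a destination that already exists.
    shutil.copytree(extracted, comfyui_pkg, dirs_exist_ok=True)
```

Usage (paths hypothetical): `merge_into_comfyui(Path("C:/Users/me/Downloads/ComfyUI"), Path("C:/StabilityMatrix/Data/Packages/ComfyUI"))`. Back up any workflow you edited locally under the same name first, since it will be replaced.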

Generation time by workflow and environment (min:sec)　　Recommended lightweight workflow　　　Recommended lightweight GGUF workflow

Workflow	Function	Model	RTX 4070	RTX 4060	RTX 4060L	RTX 3050	GTX 1050	i7-1260P (CPU)	i7-1185G7 (CPU)
5300_LTX-2.3_t2v_dev	Text to Video basic workflow	fp8 dev	06:07.31		24:22.67		not supported
5301_LTX-2.3_i2v_dev	Image to Video basic workflow	fp8 dev	04:11.95		22:28.35
5302_LTX-2.3_t2v_dev_simple	Text to Video basic (simple)	fp8 dev	04:15.03		20:50.82
5303_LTX-2.3_i2v_dev_simple	Image to Video basic (simple)	fp8 dev	04:32.22		20:50.82
5304_LTX-2.3_T2V_I2V_1st_dev	Text/Image to Video (dev)	fp8 dev	30:18.01		208:45.76
5340_LTX-2.3_t2v_dev_GGUF	Text to Video (GGUF)	GGUF dev	04:32.22		09:56.69
5341_LTX-2.3_i2v_dev_GGUF	Image to Video (GGUF)	GGUF dev	02:49.95		08:17.38
5342_LTX-2.3_T2V_dev_GGUF	Text to Video (GGUF, expanded)	GGUF dev	02:56.61	07:25.82	08:12.25	17:44.45
5343_LTX-2.3_I2V_dev_GGUF	Image to Video (GGUF, expanded)	GGUF dev	03:23.88	05:10.34	08:32.48	11:47.73
5344_LTX-2.3_T2V_I2V_1st_dev_GF	Text/Image to Video (GGUF dev)	GGUF dev	36:53.21		108:54.75
5400_LTX-2.3_t2v_distilled	Text to Video basic workflow	fp8 distilled	01:25.56		06:43.14
5401_LTX-2.3_i2v_distilled	Image to Video basic workflow	fp8 distilled	01:58.72		06:56.02
5402_LTX-2.3_t2v_distil_simple	Text to Video basic (simple)	fp8 distilled	01:43.82		06:36.58
5403_LTX-2.3_i2v_distil_simple	Image to Video basic (simple)	fp8 distilled	01:32.83		06:45.56
5404_LTX-2.3_T2V_I2V_1st_distilled	Text/Image to Video 1-stage (distilled)	fp8 distilled	05:12.67		32:02.62
5405_LTX-2.3_T2V_I2V_2st_distilled	Text/Image to Video 2-stage (distilled)	fp8 distilled	09:54.06		62:40.42
5440_LTX-2.3_t2v_distilled_GGUF	Text to Video (GGUF)	GGUF distilled	02:51.28		09:24.07
5441_LTX-2.3_i2v_distilled_GGUF	Image to Video (GGUF)	GGUF distilled	03:28.56		07:39.56
5442_LTX-2.3_T2V_distilled_GGUF	Text to Video (GGUF, expanded)	GGUF distilled	03:43.24		07:59.01
5443_LTX-2.3_I2V_distilled_GGUF	Image to Video (GGUF, expanded)	GGUF distilled	04:42.55		08:28.67
5444_LTX-2.3_T2V_I2V_1st_distil_GF	Text/Image to Video 1-stage (GGUF distilled)	GGUF distilled	08:02.39		16:39.30
5445_LTX-2.3_T2V_I2V_2st_distil_GF	Text/Image to Video 2-stage (GGUF distilled)	GGUF distilled	12:50.99		40:06.29
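The times are minutes:seconds, and the minutes field can exceed 59 (for example 208:45.76), so an entry should not be read as hours:minutes. A small sketch for comparing two entries; the helper names are hypothetical, and the two pairs are taken from the table (5300 fp8 dev vs 5400 fp8 distilled, RTX 4070 and RTX 4060L columns).

```python
def to_seconds(mmss: str) -> float:
    """Convert a table entry 'MM:SS.ss' to seconds; MM may be larger than 59."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + float(seconds)

def speedup(slow: str, fast: str) -> float:
    """How many times faster the `fast` entry is than the `slow` one."""
    return to_seconds(slow) / to_seconds(fast)

# 5300 (fp8 dev) vs 5400 (fp8 distilled), Text to Video basic workflow
print(f"RTX 4070:  {speedup('06:07.31', '01:25.56'):.2f}x")  # RTX 4070:  4.29x
print(f"RTX 4060L: {speedup('24:22.67', '06:43.14'):.2f}x")  # RTX 4060L: 3.63x
```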


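Setting up the environment for video generation

Downloading and placing the required models

Note that ComfyUI running under Stability Matrix uses different model folder locations → Model folder layout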

Model type	File name (.safetensors)	Destination		Download URL
checkpoints	ltx-2.3-22b-dev-fp8	/StabilityMatrix/Data/Models/	StableDiffusion/	ltx-2.3-22b-dev-fp8.safetensors
	ltx-2.3-22b-distilled-fp8		StableDiffusion/	ltx-2.3-22b-distilled-fp8.safetensors
	ltx-2.3-22b-dev-Q4_K_M.gguf		diffusion_models/	ltx-2.3-22b-dev-Q4_K_M.gguf
	ltx-2.3-22b-distilled-Q4_K_M.gguf		diffusion_models/	ltx-2.3-22b-distilled-Q4_K_M.gguf
LoRA	ltx-2.3-22b-distilled-lora-384		Lora/	ltx-2.3-22b-distilled-lora-384.safetensors
LoRA	ltx-2.3-22b-distilled-lora-dynamic_fro09_avg_rank_105_bf16		Lora/	ltx-2.3-22b-distilled-lora-dynamic_fro09_avg_rank_105_bf16.safetensors
text_encoders	gemma_3_12B_it_fp4_mixed		text_encoders/	gemma_3_12B_it_fp4_mixed.safetensors
	gemma_3_12B_it_fp8_scaled			gemma_3_12B_it_fp8_scaled.safetensors
	ltx-2.3_text_projection_bf16			ltx-2.3_text_projection_bf16
VAE	LTX23_audio_vae_bf16		VAE/	LTX23_audio_vae_bf16.safetensors
VAE	LTX23_video_vae_bf16		VAE/	LTX23_video_vae_bf16.safetensors
Upscale	ltx-2.3-spatial-upscaler-x2-1.1	/StabilityMatrix/Data/Packages/ComfyUI/models/	latent_upscale_models/	ltx-2.3-spatial-upscaler-x2-1.1.safetensors
Upscale	~~ltx-2.3-spatial-upscaler-x2-1.0~~ ※	/StabilityMatrix/Data/Packages/ComfyUI/models/	latent_upscale_models/	ltx-2.3-spatial-upscaler-x2-1.0
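Because the Stability Matrix layout differs from a stock ComfyUI install, a quick check of the placement above can save a failed run. A minimal sketch; the function name is hypothetical, the root you pass in is assumed to be your StabilityMatrix `Data` folder, the folder and file names are copied from the table, and `ltx-2.3_text_projection_bf16` is left out because the table does not give its file extension.

```python
from pathlib import Path
from typing import Dict, List

# Folders relative to the StabilityMatrix "Data" directory; names from the table above.
EXPECTED: Dict[str, List[str]] = {
    "Models/StableDiffusion": ["ltx-2.3-22b-dev-fp8.safetensors",
                               "ltx-2.3-22b-distilled-fp8.safetensors"],
    "Models/diffusion_models": ["ltx-2.3-22b-dev-Q4_K_M.gguf",
                                "ltx-2.3-22b-distilled-Q4_K_M.gguf"],
    "Models/Lora": ["ltx-2.3-22b-distilled-lora-384.safetensors"],
    "Models/text_encoders": ["gemma_3_12B_it_fp4_mixed.safetensors"],
    "Models/VAE": ["LTX23_audio_vae_bf16.safetensors", "LTX23_video_vae_bf16.safetensors"],
    "Packages/ComfyUI/models/latent_upscale_models": ["ltx-2.3-spatial-upscaler-x2-1.1.safetensors"],
}

def missing_models(data_root: Path, expected: Dict[str, List[str]] = EXPECTED) -> List[str]:
    """Return 'folder/file' entries (relative to data_root) that are not present as files."""
    missing = []
    for folder, names in expected.items():
        for name in names:
            path = data_root / folder / name
            if not path.is_file():
                missing.append(path.relative_to(data_root).as_posix())
    return missing
```

Usage (path hypothetical): `print(missing_models(Path("C:/StabilityMatrix/Data")))`; an empty list means every listed file is where the table says it should be.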

　・ LTX-2.3 FP8 Model Card

Name	Notes
ltx-2.3-22b-dev-fp8	The full model, flexible and trainable, in fp8
ltx-2.3-22b-distilled-fp8	The distilled version of the full model, 8 steps, CFG=1, in fp8

　・ For the GGUF version　※ use ltx-2.3-spatial-upscaler-x2-1.1.safetensors

To use GGUF models, install the custom node "ComfyUI-GGUF" (following the common procedure)
・GitHub: ComfyUI-GGUF


Step 1: Creating workflows from the official standard templates

　Uses the standard (dev) fp8 model "ltx-2.3-22b-dev-fp8.safetensors"

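Selecting a workflow

① Select "Template" from the leftmost menu
② Click "Video"
③ Enter "ltx2.3" in the search box

・Choose a workflow from the list that appears
④ "LTX-2.3 Text to Video": generates video from text
⑤ "LTX-2.3 Image to Video": generates video from a still image

・If a workflow reports errors, check the model placement described in the previous section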

・Download the image data used in the workflows from
　　 GitHub: Comfy-Org workflow_templates

Verify that each workflow runs, then save it
　④ "LTX-2.3 Text to Video" → "video_ltx2_3_t2v_org.json"
　⑤ "LTX-2.3 Image to Video" → "video_ltx2_3_i2v_org.json"

・Original workflows

video_ltx2_3_t2v_org.json
Input image: dummy image	*Prompt:* Dynamic cinematic close-up of high-tech modular machinery self-assembling in midair, precision robotic parts, magnetic connectors, and glowing circuits clicking together, subtle smoke and light flares, extremely detailed titanium textures. The final product displays a clean, clear surface with large glowing engraved text “LTX-2.3” centered and unobstructed, dramatic lighting, photorealism, 8K, sharp focus.
↑ video_ltx2_3_t2v_org.json 　　　　　　　　SubGraph expanded →
video_ltx2_3_i2v_org.json
Input image: egyptian_queen.png	*Prompt:* Egyptian royal in blue-and-gold headdress and high collar, white dress with golden embroidery and armbands, desert, robot soldiers in formation left and right. She walks steadily forward, head held level and gaze fixed ahead—no dipping or lowering of the head. The camera performs a single, smooth push-in only: starting in a wider shot of her, the robots, and the desert, it moves steadily forward until she is in a medium or medium-close frame, then holds. She stops, posture and head still upright, and says: “The old gods are silent. I am not.” Robot soldiers shift or march in place; sand and fabric move with the wind. No pull-back; the only camera move is the continuous push-in.
↑ video_ltx2_3_i2v_org.json 　　　　　　　　SubGraph expanded →

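・Notes on the original workflow "video_ltx2_3_t2v_org.json"

　1. The workflow has a "switch to Text to Video?" setting (true/false) that defaults to true.
　　True = Text to Video, False = Image to Video, so this setting switches what the workflow does
　2. When this workflow runs, it first expands the input prompt into a more detailed prompt and then generates from that detailed prompt
　3. The wording of the generated detailed prompt differs slightly on every run

< Example of an internally generated prompt >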
Style: realistic with cinematic lighting. In a close-up, high-tech modular machinery self-assembling dynamically in midair—precision robotic parts clicking together, magnetic connectors connecting, and glowing circuits connecting subtly. Subtle smoke and light flares drift through the air. The final product displays a clean, clear surface with large, glowing engraved text “LTX-2.3” centered and unobstructed. Dramatic lighting highlights the titanium textures. Extremely detailed titanium textures are visible everywhere, catching the light. Sharp focus creates a sense of precision. Ambient sounds include faint clicks and whirs as the machinery assembles itself. Behind the machinery, other patrons move subtly in and out of focus.


Organizing the workflows

5300 Text to Video basic workflow	5300 SubGraph

LTX/5300_LTX-2.3_t2v_dev.json
5301 Image to Video basic workflow	5301 SubGraph

LTX/5301_LTX-2.3_i2v_dev.json
5302 Text to Video basic workflow (simple)	5303 Image to Video basic workflow (simple)

LTX/5302_LTX-2.3_T2V_dev_simple.json	LTX/5303_LTX-2.3_I2V_dev_simple.json

Generated videos (with audio)

5302_LTX-2.3_T2V_simple.json 5303_LTX-2.3_I2V_simple.json


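Step 2: Creating GGUF (dev) workflows

　The standard (dev) fp8 model "ltx-2.3-22b-dev-fp8.safetensors" appears to run out of memory on GPUs with 8 GB of VRAM or less, so we try a GGUF quantized model instead

About GGUF quantized models
・In general, more bits give higher accuracy but also consume more VRAM
・GGUF saves VRAM, not time; technically it is slower, because the compressed weights have to be dequantized during inference
・In environments where the whole model does not fit in VRAM, however, GGUF can end up faster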

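LTX-2.3-dev GGUF models
Type	Bits	Model size	Description
Q2_K	2	8.28 GB	2-bit quantization. Super-blocks of 16 blocks, each block holding 16 weights; 2.5625 bits per weight
Q3_K_M	3	18.8 GB	3-bit quantization. Super-blocks of 16 blocks, each block holding 16 weights; 3.4375 bits per weight
Q3_K_S	3	9.95 GB	3-bit quantization. Super-blocks of 16 blocks, each block holding 16 weights; 3.4375 bits per weight
Q4_K_M	4	14.3 GB	4-bit quantization. Super-blocks of 8 blocks, each block holding 32 weights; 4.5 bits per weight
Q4_K_S	4	13.1 GB	4-bit quantization. Super-blocks of 8 blocks, each block holding 32 weights; 4.5 bits per weight
Q5_K_M	5	16.1 GB	5-bit quantization. Super-blocks of 8 blocks, each block holding 32 weights; 5.5 bits per weight
Q5_K_S	5	16.2 GB	5-bit quantization. Super-blocks of 8 blocks, each block holding 32 weights; 5.5 bits per weight
Q6_K	6	17.8 GB	6-bit quantization. Super-blocks of 16 blocks, each block holding 16 weights; 6.5625 bits per weight
Q8_0	8	22.8 GB	Quantized to 8-bit approximations. Each block holds 32 weights
F16	16	42.0 GB	16-bit standard IEEE 754 half-precision floating point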

　※ https://huggingface.co/unsloth/LTX-2.3-GGUF/tree/main
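The bits-per-weight figures follow from the k-quant block layouts in llama.cpp: each super-block holds 256 weights, plus per-block scale bits (scale and min for the Q4_K/Q5_K formats) and one or two fp16 super-block scales. A worked sketch for Q3_K to Q8_0 (Q2_K is left out here), followed by a naive size estimate for a 22B-parameter model; the parameter count is read from the file names, and the helper names are hypothetical. The real Q4_K_M file (14.3 GB) is larger than the estimate, presumably because the _M variants keep some tensors at higher-precision types and not every tensor is quantized, so treat the estimate as a lower bound.

```python
SUPER_BLOCK = 256  # weights per k-quant super-block

def k_quant_bpw(weight_bits: int, n_blocks: int, scale_bits_per_block: int, fp16_scales: int) -> float:
    """(quantized weights + per-block scale/min bits + fp16 super-block scales) / 256."""
    total_bits = SUPER_BLOCK * weight_bits + n_blocks * scale_bits_per_block + fp16_scales * 16
    return total_bits / SUPER_BLOCK

BPW = {
    "Q3_K": k_quant_bpw(3, 16, 6, 1),   # 6-bit scale per block, one fp16 scale
    "Q4_K": k_quant_bpw(4, 8, 12, 2),   # 6-bit scale + 6-bit min per block, fp16 scale and min
    "Q5_K": k_quant_bpw(5, 8, 12, 2),   # same scale layout as Q4_K
    "Q6_K": k_quant_bpw(6, 16, 8, 1),   # 8-bit scale per block, one fp16 scale
    "Q8_0": (32 * 8 + 16) / 32,         # blocks of 32 int8 weights plus one fp16 scale
}

def naive_size_gb(params: float, bits_per_weight: float) -> float:
    """File size if every weight used the same format (10**9 bytes per GB)."""
    return params * bits_per_weight / 8 / 1e9

print(BPW)  # {'Q3_K': 3.4375, 'Q4_K': 4.5, 'Q5_K': 5.5, 'Q6_K': 6.5625, 'Q8_0': 8.5}
print(f"{naive_size_gb(22e9, BPW['Q4_K']):.1f} GB")  # 12.4 GB
```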

Download workflows that support GGUF quantized models
・LTX-2.3 22B GGUF WORKFLOWS 12GB VRAM
・Rebels LTX-2.3 Dev (GGUF)

Organized GGUF workflows

5340 Text to Video basic workflow (GGUF)	5340 SubGraph

LTX/5340_LTX-2.3_t2v_dev_GGUF.json
5341 Image to Video basic workflow (GGUF)	5341 SubGraph

LTX/5341_LTX-2.3_i2v_dev_GGUF.json
5342 Text to Video basic workflow (GGUF, expanded)	5343 Image to Video basic workflow (GGUF, expanded)

LTX/5342_LTX-2.3_T2V_GGUF.json	LTX/5343_LTX-2.3_I2V_dev_GGUF.json

Generated videos (with audio)

5340_LTX-2.3_t2v_dev_GGUF.json 5341_LTX-2.3_i2v_dev_GGUF.json


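Step 3: Converting the standard template workflows to the distilled version

　Basically, the standard template (dev) workflows work with the distilled model once you bypass the LoRA (ltx-2.3-22b-distilled-lora-384) node and switch the model
　The "Text to Video basic workflow" needs a few further changes (described below)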
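The conversion can be sketched as an edit of the saved workflow JSON: set the distillation LoRA node to bypass and point every reference to the dev checkpoint at the distilled one. A minimal sketch under stated assumptions: it works on ComfyUI's UI-format workflow file (nodes carrying `type` and `widgets_values`, with subgraph nodes nested elsewhere in the same file), it assumes node `mode` 4 means "bypass", and it matches nodes by file name rather than by node type; the function names are hypothetical. It does not handle the prompt-enhancement change for Text to Video, so check the result in the ComfyUI editor.

```python
from typing import Any, Dict, Iterator

DEV_CKPT = "ltx-2.3-22b-dev-fp8.safetensors"
DISTILLED_CKPT = "ltx-2.3-22b-distilled-fp8.safetensors"
DISTILL_LORA = "ltx-2.3-22b-distilled-lora-384.safetensors"
BYPASS_MODE = 4  # assumption: UI-format workflows store "Bypass" as node mode 4

def iter_nodes(obj: Any) -> Iterator[Dict[str, Any]]:
    """Yield every node dict (has 'type' and 'widgets_values'), including subgraph nodes."""
    if isinstance(obj, dict):
        if "type" in obj and "widgets_values" in obj:
            yield obj
        for value in obj.values():
            yield from iter_nodes(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from iter_nodes(item)

def to_distilled(workflow: Dict[str, Any]) -> Dict[str, Any]:
    """Bypass the distillation LoRA node(s) and swap dev for distilled checkpoints, in place."""
    for node in iter_nodes(workflow):
        values = node["widgets_values"]
        if not isinstance(values, list):
            continue  # some nodes store widgets as a dict; leave them alone
        if DISTILL_LORA in values:
            node["mode"] = BYPASS_MODE
        node["widgets_values"] = [DISTILLED_CKPT if v == DEV_CKPT else v for v in values]
    return workflow
```

Usage: load `5300_LTX-2.3_t2v_dev.json` with `json.load`, call `to_distilled(wf)`, and write the result to a new file with `json.dump`.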

Organizing the workflows

5400 Text to Video basic workflow (distilled)	5400 SubGraph (distilled)

LTX/5400_LTX-2.3_t2v_distilled.json
5401 Image to Video basic workflow (distilled)	5401 SubGraph (distilled)

LTX/5401_LTX-2.3_i2v_distilled.json
5402 Text to Video basic workflow (distilled/simple)	5403 Image to Video basic workflow (distilled/simple)

LTX/5402_LTX-2.3_T2V_distilled_simple.json	LTX/5403_LTX-2.3_I2V_distilled_simple.json

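About the Text to Video basic workflow
・When this workflow runs, it first expands the input prompt into a more detailed prompt and generates from that detailed prompt
・To shorten generation time, bypass this node group so that generation uses the input prompt as is
・As the input prompt, use a prompt that was generated when the original workflow was run

・Generated videos (with audio)

Original workflow	Prompt generation bypassed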

*Prompt:* Dynamic cinematic close-up of high-tech modular machinery self-assembling in midair, precision robotic parts, magnetic connectors, and glowing circuits clicking together, subtle smoke and light flares, extremely detailed titanium textures. The final product displays a clean, clear surface with large glowing engraved text “LTX-2.3” centered and unobstructed, dramatic lighting, photorealism, 8K, sharp focus.	*Prompt:* realistic with cinematic lighting. In a close-up, high-tech modular machinery self-assembling in midair, precision robotic parts and magnetic connectors click together with glowing circuits. Subtle smoke and light flares create dramatic effects as the titanium textures display extreme detail. The final product displays a clean, clear surface with large glowing engraved text “LTX-2.3” centered and unobstructed. The scene’s sharp focus highlights 8K photorealism.


Step 4: Creating GGUF (distilled) workflows

About the distilled GGUF quantized model
・Uses "ltx-2.3-22b-distilled-Q4_K_M.gguf"
・The model size is almost the same for dev and distilled → GGUF (dev)

Workflows for the GGUF quantized model
・Change the model in the dev (GGUF) workflows created in Step 2

Organized GGUF workflows

5440 Text to Video basic workflow, distilled (GGUF)	5440 SubGraph

LTX/5440_LTX-2.3_t2v_distilled_GGUF.json
5441 Image to Video basic workflow, distilled (GGUF)	5441 SubGraph

LTX/5441_LTX-2.3_i2v_distilled_GGUF.json
5442 Text to Video basic workflow, distilled (GGUF, expanded)	5443 Image to Video basic workflow, distilled (GGUF, expanded)

LTX/5442_LTX-2.3_T2V_distilled_GGUF.json	LTX/5443_LTX-2.3_I2V_distilled_GGUF.json


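Step 5: Workflows from the official Lightricks site

　Apart from the ComfyUI site, Lightricks, the developer of LTX-2.3, also provides sample workflows on its official site, so we test those as well
　→ PSA: Use the official LTX 2.3 workflows, not the ones bundled with ComfyUI. They are considerably better.

Preparation
1. Install the custom nodes "ComfyMath" and "RES4LYF" (following the common procedure)
　・https://github.com/evanspearman/ComfyMath
　・https://github.com/ClownsharkBatwing/RES4LYF

2. Update the custom nodes (if workflow errors persist)

① Click the "Manager" button
② Select "Update All"
③ When the "Restart" button appears, click it
④ Wait for a new screen to appear, then close that new screen
⑤ Close the previous screen to exit the web page
⑥ Exit "StabilityMatrix" and start it again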

Single Stage version
1. Download the workflow "LTX-2.3_T2V_I2V_Single_Stage_Distilled_Full.json"

2. Change the models

Old	New	Places changed
ltx-2.3-22b-dev.safetensors	ltx-2.3-22b-dev-fp8	5
ltx-2.3-22b-distilled-lora-384.safetensors	ltx-2.3-22b-distilled-lora-384	2
confy_gemma_3.12B_it.safetensors	gemma_3_12B_it_fp4_mixed	1
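The substitutions in the table can be applied to the downloaded JSON in one pass, with the per-name counts compared against the "Places changed" column as a sanity check. A minimal sketch: it replaces only strings that exactly equal an old file name, so prompts that mention a file name are left alone; the `.safetensors` suffix on the new names is an assumption (the table lists them without it), and the function name is hypothetical. Counts may differ from the table if the file stores the same widget value more than once.

```python
from typing import Any, Dict

# Old -> new names from the table above (".safetensors" on the new names is assumed).
RENAMES: Dict[str, str] = {
    "ltx-2.3-22b-dev.safetensors": "ltx-2.3-22b-dev-fp8.safetensors",
    # Same file name on both sides; listed so its occurrences are counted too.
    "ltx-2.3-22b-distilled-lora-384.safetensors": "ltx-2.3-22b-distilled-lora-384.safetensors",
    "confy_gemma_3.12B_it.safetensors": "gemma_3_12B_it_fp4_mixed.safetensors",
}

def rename_models(obj: Any, counts: Dict[str, int]) -> Any:
    """Return a copy of obj with exact old-name strings replaced; tally hits in counts."""
    if isinstance(obj, dict):
        return {key: rename_models(value, counts) for key, value in obj.items()}
    if isinstance(obj, list):
        return [rename_models(item, counts) for item in obj]
    if isinstance(obj, str) and obj in RENAMES:
        counts[obj] = counts.get(obj, 0) + 1
        return RENAMES[obj]
    return obj
```

Usage: load the downloaded workflow with `json.load`, call `new_wf = rename_models(wf, counts)` with `counts = {}`, print `counts`, and save `new_wf` under a new name with `json.dump`. The same mapping applies to the Two Stage workflow below.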

Standard (dev) / distilled: Text to Video, Image to Video

~beta/LTX-2.3_T2V_I2V_Single_Stage_Distilled_Full_org.json

3. Organize the workflows

Standard (dev) Text to Video / Image to Video	Distilled Text to Video / Image to Video

LTX/5304_LTX-2.3_T2V_I2V_1st_dev.json	LTX/5404_LTX-2.3_T2V_I2V_1st_distilled.json

*Prompt:* A traditional Japanese tea ceremony takes place in a tatami room as a host carefully prepares matcha. Soft traditional koto music plays in the background, adding to the serene atmosphere. The bamboo whisk taps rhythmically against the ceramic bowl while water simmers in an iron kettle. Guests kneel in formal seiza position, watching in respectful silence. The host bows and presents the tea bowl, turning it precisely before offering it to the first guest with soft-spoken words.

Two Stage version
1. Download the workflow "LTX-2.3_T2V_I2V_Two_Stage_Distilled.json"

2. Change the models

Old	New	Places changed
ltx-2.3-22b-dev.safetensors	ltx-2.3-22b-dev-fp8	5
ltx-2.3-22b-distilled-lora-384.safetensors	ltx-2.3-22b-distilled-lora-384	1
confy_gemma_3.12B_it.safetensors	gemma_3_12B_it_fp4_mixed	1

Distilled: Text to Video, Image to Video

~beta/LTX-2.3_T2V_I2V_Two_Stage_Distilled_org_org.json

3. Organize the workflow

	Distilled Text to Video / Image to Video

	LTX/5405_LTX-2.3_T2V_I2V_2st_distilled.json

GGUF version

Standard (dev) Text to Video / Image to Video, 1-stage	Distilled Text to Video / Image to Video, 1-stage

LTX/5344_LTX-2.3_T2V_I2V_1st_dev_GF.json	LTX/5444_LTX-2.3_T2V_I2V_1st_distil_GF.json
	Distilled Text to Video, Image to Video, 2-stage

	LTX/5445_LTX-2.3_T2V_I2V_2st_distil_GF.json


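About GGUF models

What is GGUF (GPT-Generated Unified Format) → What is a quantized model
・GGUF is a file format for running large language models (LLMs) quickly and efficiently on consumer PCs (CPU or GPU)
・An enhanced successor to the older GGML format, it is widely used to distribute quantized (lightweight) models; a single file contains both the model weights and the metadata

Main features and advantages of GGUF
1. Optimized for local environments: fast inference even on CPUs and Apple Silicon (M1/M2/M3).
　Even when GPU memory (VRAM) is insufficient, models can run using main memory (RAM)
2. Single file: all required data, including model parameters and tokenizer settings, is bundled into one .gguf file, which makes it easy to manage
3. Quantization support (K-quants): mixed-precision quantization such as "Q4_K_M" greatly reduces model size (to roughly 1/2 to 1/4) while keeping accuracy high
4. Wide compatibility: natively supported by many local LLM tools such as llama.cpp, Ollama, and LM Studio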
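The single-file layout is visible in the header of any .gguf file: a 4-byte magic "GGUF", a format version, then the number of tensors and of metadata key/value pairs, all little-endian. A minimal sketch following the GGUF format documentation in the ggml project, assuming version 2 or later (64-bit counts; version 1 used 32-bit counts and is rejected here); the function name is hypothetical, and parsing the metadata entries themselves is left out.

```python
import struct
from typing import BinaryIO, Dict

def read_gguf_header(f: BinaryIO) -> Dict[str, int]:
    """Read the fixed GGUF header: magic, version, tensor count, metadata key/value count."""
    magic = f.read(4)
    if magic != b"GGUF":
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    (version,) = struct.unpack("<I", f.read(4))           # uint32, little-endian
    if version < 2:
        raise ValueError(f"GGUF v{version} is not handled by this sketch")
    tensor_count, kv_count = struct.unpack("<QQ", f.read(16))  # two uint64 counts
    return {"version": version, "tensor_count": tensor_count, "metadata_kv_count": kv_count}
```

Usage: `with open("ltx-2.3-22b-distilled-Q4_K_M.gguf", "rb") as f: print(read_gguf_header(f))`.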

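GGUF speed
・GGUF saves VRAM, not time; technically it is slower, because the compressed weights have to be dequantized during inference
・In environments where the whole model does not fit in VRAM, however, GGUF can end up faster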
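The trade-off above can be turned into a rough rule of thumb: pick the highest-precision GGUF file that still fits in VRAM with some headroom. A sketch using the LTX-2.3-dev file sizes from the Step 2 table; the 2 GB headroom, the precision ordering, and the function name are assumptions, and the text encoder and VAE also need memory. A file that does not fit can still run, as the RTX 4060L results for the Q4_K_M workflows show, presumably with part of the weights offloaded to system RAM, which tends to be slower.

```python
from typing import Dict, Optional

# LTX-2.3-dev GGUF file sizes in GB, copied from the Step 2 table.
GGUF_SIZES_GB: Dict[str, float] = {
    "Q2_K": 8.28, "Q3_K_S": 9.95, "Q3_K_M": 18.8, "Q4_K_S": 13.1, "Q4_K_M": 14.3,
    "Q5_K_S": 16.2, "Q5_K_M": 16.1, "Q6_K": 17.8, "Q8_0": 22.8, "F16": 42.0,
}
# Assumed order from lowest to highest precision (_M above _S at the same bit width).
PRECISION_ORDER = ["Q2_K", "Q3_K_S", "Q3_K_M", "Q4_K_S", "Q4_K_M",
                   "Q5_K_S", "Q5_K_M", "Q6_K", "Q8_0", "F16"]

def best_fitting(vram_gb: float, headroom_gb: float = 2.0) -> Optional[str]:
    """Highest-precision type whose file fits in vram_gb minus headroom_gb, or None."""
    budget = vram_gb - headroom_gb
    best = None
    for name in PRECISION_ORDER:  # ascending precision; keep the last one that fits
        if GGUF_SIZES_GB[name] <= budget:
            best = name
    return best

for vram in (8, 12, 16, 24):
    print(vram, best_fitting(vram))  # 8 None / 12 Q3_K_S / 16 Q4_K_S / 24 Q6_K
```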

Workflow changes for GGUF models


　※ Reference URL → LTX-2.3 GGUF Image-to-Video & Text-to-Video in ComfyUI

Node	Standard model	GGUF
Checkpoint	Load Checkpoint	Unet Loader (GGUF) / VAE Loader KJ
VAE	Load Audio VAE	VAE Encoder KJ
Text Encoder	LTXV Audio Text Encoder Loader	Dual CLIP Loader


Example generated videos: Image to Video

Creating the input images

No.	Generated image	Prompt
①		Natural outdoor portrait of a beautiful Japanese woman in a white cotton T-shirt, smiling while holding a soda can. The scene is a golden hour wheat field with a warm, sun-drenched atmosphere. Wide shot, medium perspective. Realistic, cinematic, 8k.
②		A sleek black sports car driving through a city street at night, low-angle dynamic shot, motion blur on wheels and background. Wet asphalt reflecting neon lights in red, blue, and magenta tones, glossy car surface with sharp reflections. Urban environment with blurred buildings and streetlights, cinematic composition. Headlights on, soft glow illuminating the road. Shallow depth of field, sharp focus on the front and side of the car, high contrast lighting, realistic reflections and surface details.
③		A shot of a Japanese woman from the waist up, running in stylish sportswear. In the background is a wide bridge at sunset, with the city skyline glowing golden. The background has a dynamic motion blur, and the woman is in focus. High-resolution photograph.
④		A Japanese woman, wearing a white cotton T-shirt and with her hair casually down, stands on a rooftop against the backdrop of a hazy cityscape at dusk. She faces the camera, wearing round sunglasses, which she is slightly pulling down with her hand. The warm twilight light envelops everything in a soft glow.
⑤		A Japanese woman in a white cotton t-shirt with messy loose hair. Seated at a cafe table against a background with windows on the left, other vacant tables and chairs in the foreground and center-right, and a counter and an espresso machine on the far right. She sits facing forward, looking at the camera, holding a cup and saucer with both hands on the table. Natural light and warm tones, cinematic photo.

　※ Workflow → 2101_z_image_turbo_simple

Generated videos

①

*Prompt:* The woman gently tilts the can and takes a refreshing sip, her eyes closing slightly with pleasure. A light breeze makes her hair and t-shirt flutter The camera slowly pans and tilts upward as the sunlight flares more intensely behind her, creating a dreamy golden shimmer. Light lens flares, soft wind movement in the wheat field, subtle camera shake for realism. Warm and radiant motion, smooth transitions, soft glowing ambiance, cinematic light bloom, 16:9 aspect ratio.
②

*Prompt:* The camera tracks the sleek black sports car as it races down a wet, neon-lit city street at night. Reflections of magenta, cyan, and red lights shimmer on the car’s glossy surface and the wet asphalt. The car accelerates slightly as the lights streak past in the background, with a subtle motion blur and tire spray. Its headlights flare and cast sharp beams forward, illuminating the wet road ahead. The camera rotates around the front-left side of the car, highlighting its curves and aggressive stance. Soft raindrops hit the windshield in slow motion. Soundless, but with cinematic tension. High contrast lighting, futuristic tone, slow motion elements, hyper-realistic motion, 16:9 aspect ratio
③

*Prompt:* The woman runs steadily forward, her steps rhythmic and powerful. Her ponytail bounces with each stride as warm morning light ripples across her body and the bridge. Subtle camera shake adds realism as the scene follows her from a side angle. A light breeze moves her clothing naturally. The sun rises behind her, casting golden flares through the bridge cables. Drops of sweat glisten and roll down her skin in slow motion. The video closes with her stopping to catch her breath, turning toward the camera with a confident smile. Realistic motion, slow-to-normal pacing blend, dynamic light transitions, motivational mood, cinematic tone, 16:9 format
④

*Prompt:* The woman slowly lifts and puts on her sunglasses as the golden sun sets behind her. Her hair moves gently in the wind, and the reflection in the lenses captures the glowing city skyline. As the glasses settle on her face, the light subtly shifts, casting a cinematic flare across the lens. The camera slowly pushes in toward her face, enhancing the cool, composed mood. Lens flares, soft camera movement, golden hour light, confident tone, 16:9 aspect ratio
⑤

*Prompt:* A young woman sits in a cozy modern café, facing the camera at eye level. She smiles gently and speaks directly to the viewer in a calm, friendly tone. Her lips sync naturally as she says: “It’s kind of amazing… with the release of LTX-2, I can finally talk to you like this. It feels more real, more alive. If you want to see what I create next, follow me and stay with me.” Her facial expressions are subtle and natural, with soft eye contact, slight head movements, and small hand gestures near a coffee cup on the table. The motion is smooth and coherent, with stable facial structure and consistent identity throughout the clip. The café background remains steady and realistic, with minimal camera movement, no exaggerated motion, and no stylization. Natural daylight illuminates her face evenly, maintaining photorealistic skin texture, realistic lip movement, and believable human timing. The overall mood is warm, intimate, and conversational, as if she is casually talking to the viewer in real life.