Loading config.json from local directory
Loading weights from local directory
Removing weight norm...
Parsing tokenizer identifier. Schema: None, Identifier: ViT-H-14-378-quickgelu
Attempting to load config from built-in: ViT-H-14-378-quickgelu
Using default SimpleTokenizer.
Loaded MMAudio model weights from D:\StabilityMatrix\Data\Packages\ComfyUI\models\mmaudio\mmaudio_large_44k_v2_fp16.safetensors
clip_frames torch.Size([77, 3, 384, 384]) sync_frames torch.Size([240, 3, 224, 224]) duration 9.633333333333333
!!! Exception during processing !!! Allocation on device
Traceback (most recent call last):
File "D:\StabilityMatrix\Data\Packages\ComfyUI\execution.py", line 518, in execute
output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
File "D:\StabilityMatrix\Data\Packages\ComfyUI\execution.py", line 329, in get_output_data
return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
File "D:\StabilityMatrix\Data\Packages\ComfyUI\execution.py", line 303, in _async_map_node_over_list
await process_inputs(input_dict, i)
File "D:\StabilityMatrix\Data\Packages\ComfyUI\execution.py", line 291, in process_inputs
result = f(**inputs)
File "D:\StabilityMatrix\Data\Packages\ComfyUI\custom_nodes\comfyui-mmaudio\nodes.py", line 347, in sample
audios = generate(clip_frames,
File "D:\StabilityMatrix\Data\Packages\ComfyUI\custom_nodes\comfyui-mmaudio\mmaudio\eval_utils.py", line 53, in generate
sync_features = feature_utils.encode_video_with_sync(sync_video, batch_size=bs)
File "D:\StabilityMatrix\Data\Packages\ComfyUI\venv\lib\site-packages\torch\utils\_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
File "D:\StabilityMatrix\Data\Packages\ComfyUI\custom_nodes\comfyui-mmaudio\mmaudio\model\utils\features_utils.py", line 118, in encode_video_with_sync
outputs.append(self.synchformer(x[i:i + batch_size]))
File "D:\StabilityMatrix\Data\Packages\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\StabilityMatrix\Data\Packages\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "D:\StabilityMatrix\Data\Packages\ComfyUI\custom_nodes\comfyui-mmaudio\mmaudio\ext\synchformer\synchformer.py", line 34, in forward
vis = self.vfeat_extractor(vis)
File "D:\StabilityMatrix\Data\Packages\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\StabilityMatrix\Data\Packages\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "D:\StabilityMatrix\Data\Packages\ComfyUI\custom_nodes\comfyui-mmaudio\mmaudio\ext\synchformer\motionformer.py", line 211, in forward
x = self.forward_segments(x, orig_shape=orig_shape)
File "D:\StabilityMatrix\Data\Packages\ComfyUI\custom_nodes\comfyui-mmaudio\mmaudio\ext\synchformer\motionformer.py", line 220, in forward_segments
x, x_mask = self.forward_features(x)
File "D:\StabilityMatrix\Data\Packages\ComfyUI\custom_nodes\comfyui-mmaudio\mmaudio\ext\synchformer\video_model_builder.py", line 245, in forward_features
x = blk(x,
File "D:\StabilityMatrix\Data\Packages\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\StabilityMatrix\Data\Packages\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "D:\StabilityMatrix\Data\Packages\ComfyUI\custom_nodes\comfyui-mmaudio\mmaudio\ext\synchformer\vit_helper.py", line 177, in forward
space_output = self.attn(self.norm1(time_residual),
File "D:\StabilityMatrix\Data\Packages\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\StabilityMatrix\Data\Packages\ComfyUI\venv\lib\site-packages\torch\nn\modules\module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
File "D:\StabilityMatrix\Data\Packages\ComfyUI\custom_nodes\comfyui-mmaudio\mmaudio\ext\synchformer\vit_helper.py", line 89, in forward
k_ = torch.cat((cls_k, k_), dim=1)
torch.OutOfMemoryError: Allocation on device
Memory summary:
|===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |   7148 MiB |   7180 MiB |      0 B   |      0 B   |
|       from large pool |      0 MiB |      0 MiB |      0 B   |      0 B   |
|       from small pool |      0 MiB |      0 MiB |      0 B   |      0 B   |
|---------------------------------------------------------------------------|
| Active memory         |   7148 MiB |   7180 MiB |      0 B   |      0 B   |
|       from large pool |      0 MiB |      0 MiB |      0 B   |      0 B   |
|       from small pool |      0 MiB |      0 MiB |      0 B   |      0 B   |
|---------------------------------------------------------------------------|
| Requested memory      |      0 B   |      0 B   |      0 B   |      0 B   |
|       from large pool |      0 B   |      0 B   |      0 B   |      0 B   |
|       from small pool |      0 B   |      0 B   |      0 B   |      0 B   |
|---------------------------------------------------------------------------|
| GPU reserved memory   |   7488 MiB |   7488 MiB |      0 B   |      0 B   |
|       from large pool |      0 MiB |      0 MiB |      0 B   |      0 B   |
|       from small pool |      0 MiB |      0 MiB |      0 B   |      0 B   |
|---------------------------------------------------------------------------|
| Non-releasable memory |      0 B   |      0 B   |      0 B   |      0 B   |
|       from large pool |      0 B   |      0 B   |      0 B   |      0 B   |
|       from small pool |      0 B   |      0 B   |      0 B   |      0 B   |
|---------------------------------------------------------------------------|
| Allocations           |      0     |      0     |      0     |      0     |
|       from large pool |      0     |      0     |      0     |      0     |
|       from small pool |      0     |      0     |      0     |      0     |
|---------------------------------------------------------------------------|
| Active allocs         |      0     |      0     |      0     |      0     |
|       from large pool |      0     |      0     |      0     |      0     |
|       from small pool |      0     |      0     |      0     |      0     |
|---------------------------------------------------------------------------|
| GPU reserved segments |      0     |      0     |      0     |      0     |
|       from large pool |      0     |      0     |      0     |      0     |
|       from small pool |      0     |      0     |      0     |      0     |
|---------------------------------------------------------------------------|
| Non-releasable allocs |      0     |      0     |      0     |      0     |
|       from large pool |      0     |      0     |      0     |      0     |
|       from small pool |      0     |      0     |      0     |      0     |
|---------------------------------------------------------------------------|
| Oversize allocations  |      0     |      0     |      0     |      0     |
|---------------------------------------------------------------------------|
| Oversize GPU segments |      0     |      0     |      0     |      0     |
|===========================================================================|
Got an OOM, unloading all loaded models.
Prompt executed in 146.94 seconds
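
For context on the failure point: the traceback shows eval_utils.generate() calling feature_utils.encode_video_with_sync(sync_video, batch_size=bs), which feeds slices x[i:i + batch_size] through the Synchformer; the OOM is raised inside that model's attention (torch.cat in vit_helper.py). The sketch below is not the MMAudio implementation, only a minimal illustration of the same chunked-encoding pattern using a stand-in encoder and a hypothetical encode_in_chunks helper; it shows why a smaller batch_size (and, on tight VRAM, emptying the CUDA cache between slices) lowers the peak allocation in a loop like this.

import torch

@torch.no_grad()
def encode_in_chunks(encoder: torch.nn.Module,
                     x: torch.Tensor,
                     batch_size: int = 8) -> torch.Tensor:
    # Hypothetical helper mirroring the slicing pattern in features_utils.py:
    # run the encoder over x in slices of `batch_size` along dim 0, so the
    # peak VRAM is bounded by one slice plus the accumulated outputs.
    outputs = []
    for i in range(0, x.shape[0], batch_size):
        outputs.append(encoder(x[i:i + batch_size]))
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # release unused cached blocks between slices
    return torch.cat(outputs, dim=0)

In this log the run also recovered without code changes: after "Got an OOM, unloading all loaded models." freed the other models, the retry of the same 9.63 s clip below completed in 29.74 seconds.
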
got prompt
clip_frames torch.Size([77, 3, 384, 384]) sync_frames torch.Size([240, 3, 224, 224]) duration 9.633333333333333
Flow Matching: 100%|██████████| 25/25 [00:08<00:00, 2.90it/s]
Prompt executed in 29.74 seconds
got prompt
clip_frames torch.Size([38, 3, 384, 384]) sync_frames torch.Size([120, 3, 224, 224]) duration 4.833333333333333
Flow Matching: 100%|██████████| 25/25 [00:08<00:00, 3.02it/s]
Prompt executed in 23.24 seconds
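
For reference, the logged tensor shapes scale linearly with clip length: the 9.63 s clip yields 77 CLIP frames (77 / 9.63 ≈ 8 fps at 384×384) and 240 sync frames (240 / 9.63 ≈ 25 fps at 224×224), while the 4.83 s clip yields 38 and 120 respectively, so the Synchformer input that triggered the earlier OOM grows with video duration.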