Run HunyuanVideo on a 12GB or 10GB VRAM GPU! Tested on my machine

Detailed guide on how to run HunyuanVideo on a low VRAM GPU (RTX3080, 3060, etc)

Dec 19, 2024

HunyuanVideo is an amazing open-source text-to-video model from Tencent.

I recently shared a Hunyuan-generated video on Reddit and it attracted a lot of interest (see below for the post’s statistics). People are excited about the model’s ability to generate high-quality videos.

Initially, the model required a GPU with at least 48GB of VRAM. Thanks to kijai’s ComfyUI-HunyuanVideoWrapper, it can be run easily in ComfyUI. However, that still requires a lot of VRAM (≥24GB).

But here’s the good news: if you have a GPU with 12GB or even 10GB of VRAM, you can run it now!

The minimum requirements:

GPU: 12GB or 10GB of VRAM.
For convenience, here is a list of recent GPUs with at least 10GB of VRAM:
Desktop GPUs: RTX 3060 (12GB version), 3080, 3080 Ti, 3090, 3090 Ti, and most of the RTX 4000 series
Mobile GPUs: RTX 3080 and 3080 Ti (both are available with 16GB of VRAM)

RAM: 30GB of memory is needed to load the model files.

Disk (SSD): at least 100GB is needed for model storage.
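
If you want to check a machine against the minimums above before downloading anything, here is a minimal sketch. The thresholds come straight from the list above; the function names are my own for illustration. It assumes an NVIDIA GPU with nvidia-smi on the PATH and a Linux-like OS for the RAM check, and it compares in GiB, so a machine sold as "32GB" will report slightly less.

```python
import os
import shutil
import subprocess

# Minimums taken from the requirements list in this post.
MIN_VRAM_GB = 10
MIN_RAM_GB = 30
MIN_DISK_GB = 100


def check_requirements(vram_gb, ram_gb, free_disk_gb):
    """Return a list of problems; an empty list means all minimums are met."""
    problems = []
    if vram_gb < MIN_VRAM_GB:
        problems.append(f"VRAM: {vram_gb:.1f} GB < {MIN_VRAM_GB} GB")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"RAM: {ram_gb:.1f} GB < {MIN_RAM_GB} GB")
    if free_disk_gb < MIN_DISK_GB:
        problems.append(f"Free disk: {free_disk_gb:.1f} GB < {MIN_DISK_GB} GB")
    return problems


def detect_vram_gb():
    """Total memory of the first NVIDIA GPU in GiB, or 0.0 if it cannot be read."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        return int(out.splitlines()[0].strip()) / 1024  # nvidia-smi reports MiB
    except (OSError, subprocess.CalledProcessError, ValueError, IndexError):
        return 0.0


def detect_ram_gb():
    """Total physical RAM in GiB (relies on os.sysconf, so Linux/Unix only)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1024**3


def detect_free_disk_gb(path="."):
    """Free space in GiB on the disk that holds `path`."""
    return shutil.disk_usage(path).free / 1024**3


if __name__ == "__main__":
    issues = check_requirements(
        detect_vram_gb(), detect_ram_gb(), detect_free_disk_gb(".")
    )
    print("Looks OK" if not issues else "\n".join(issues))
```

Run it from the folder where you plan to install ComfyUI so the disk check looks at the right drive.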

That is still a high RAM requirement. If you cannot run it locally, I have deployed an API based on it so people can try it easily. The frontend is hosted at https://agireact.com/t2v

Steps to run:

Ensure you have the latest ComfyUI (either a fresh install or an upgrade of an old install). I made a video guide on installing it.

Check the example page for the model download instructions. Here is a summary:
Download the hunyuan_video_t2v_720p_bf16.safetensors file and put it in your ComfyUI/models/diffusion_models folder.
Download the clip_l.safetensors and llava_llama3_fp8_scaled.safetensors files from here and put them in your ComfyUI/models/text_encoders directory.
Download the hunyuan_video_vae_bf16.safetensors file and put it in your ComfyUI/models/vae folder.
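
To confirm that every file ended up in the right folder, a small check like the one below may help. It is only a sketch: the file and folder names are the ones listed above, while the function name and the ComfyUI root path you pass in are placeholders for illustration.

```python
from pathlib import Path

# Folder (under ComfyUI/models) -> files the steps above put there.
EXPECTED_FILES = {
    "diffusion_models": ["hunyuan_video_t2v_720p_bf16.safetensors"],
    "text_encoders": [
        "clip_l.safetensors",
        "llava_llama3_fp8_scaled.safetensors",
    ],
    "vae": ["hunyuan_video_vae_bf16.safetensors"],
}


def missing_model_files(comfyui_root):
    """Return the expected model files (relative paths) that are not present."""
    root = Path(comfyui_root)
    missing = []
    for folder, names in EXPECTED_FILES.items():
        for name in names:
            relative = Path("models") / folder / name
            if not (root / relative).is_file():
                missing.append(relative.as_posix())
    return missing


if __name__ == "__main__":
    # "ComfyUI" is a placeholder; point it at your own install folder.
    for path in missing_model_files("ComfyUI"):
        print("Missing:", path)
```

An empty result means all four files are where ComfyUI expects them.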

Then download the workflow and load it into ComfyUI. The workflow needs the ffmpeg tool for video processing. Please install it if you get an error.
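
If you are not sure whether ffmpeg is installed, you can check before loading the workflow. This is a rough sketch that only looks for an ffmpeg executable on your PATH; the function name is mine, and how the workflow itself locates ffmpeg may differ.

```python
import shutil
import subprocess


def ffmpeg_version_line():
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not found."""
    path = shutil.which("ffmpeg")  # searches the PATH for an executable
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True)
    return result.stdout.splitlines()[0] if result.stdout else None


if __name__ == "__main__":
    line = ffmpeg_version_line()
    print(line if line else "ffmpeg not found on PATH; please install it first")
```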

My run results:

I tested on my 3080 Ti GPU (12GB VRAM) in a machine with 32GB of RAM (Ubuntu). Below is the VRAM usage info and performance (848x480, 73 frames, 20 steps):

It uses most of the available VRAM. About 6 minutes are needed for 20 steps. The quality is quite nice.

It also works with 10GB of VRAM: “RTX 3080 10gb, 512x416, 61 length, 30 steps took around 4 minutes”.
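
To put the two runs above side by side, here is some quick arithmetic on the reported numbers. Treat it as a rough comparison only: the times are approximate, the runs were on different cards, and "pixels times frames" is just my simple proxy for the amount of work, not an official metric.

```python
def seconds_per_step(total_minutes, steps):
    """Average time per sampling step, in seconds."""
    return total_minutes * 60 / steps


def pixel_frames(width, height, frames):
    """Width x height x frame count, a crude proxy for the work per step."""
    return width * height * frames


# Numbers as reported in this post.
run_12gb = {"size": (848, 480, 73), "minutes": 6, "steps": 20}  # 3080 Ti
run_10gb = {"size": (512, 416, 61), "minutes": 4, "steps": 30}  # 3080 10GB

for name, run in (("12GB run", run_12gb), ("10GB run", run_10gb)):
    per_step = seconds_per_step(run["minutes"], run["steps"])
    work = pixel_frames(*run["size"])
    print(f"{name}: {per_step:.1f} s/step, {work:,} pixel-frames")
# 12GB run: 18.0 s/step, 29,713,920 pixel-frames
# 10GB run: 8.0 s/step, 12,992,512 pixel-frames
```

The 12GB run handles roughly 2.3 times as many pixel-frames and takes about 2.25 times as long per step, so the smaller settings on the 10GB card are faster mostly because there is less to generate.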

I highly recommend giving it a try!

If your GPU has more VRAM, you may consider trying a custom node called ComfyUI-HunyuanVideoWrapper. I made a video tutorial for it.

Thanks for reading!

Please subscribe for future content!

TechPractice’s Substack
