FIFO-Diffusion: Generating Infinite Videos from Text without Training
vladbogo.substack.com
Today's paper proposes FIFO-Diffusion, a new inference technique that generates infinitely long videos from text without any additional training, using only a diffusion model pretrained on short video clips.

Method Overview

The key idea is diagonal denoising, which processes a sequence of consecutive video frames held in a first-in-first-out queue, with noise levels increasing from head to tail. At each iteration, the method dequeues the fully denoised frame at the head and enqueues a new random-noise frame at the tail. However, diagonal denoising introduces a training-inference gap: the model was trained on frames that all share the same noise level, but at inference it must denoise frames at different noise levels.
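To make the queue mechanics concrete, here is a minimal toy sketch of the dequeue/enqueue loop. It is not the paper's implementation: `toy_denoise`, `fifo_generate`, the scalar "frames", and the queue length of 4 are all hypothetical stand-ins chosen for illustration. In particular, a real video diffusion model would process the queued frames jointly, whereas this sketch updates each frame independently, and the queue is simply assumed to start in the diagonal state.

```python
from collections import deque
import random


def toy_denoise(value, level):
    # Hypothetical placeholder for one denoising step of a pretrained model.
    # It scales the value so that a frame at noise level 1 becomes exactly 0.0,
    # which plays the role of "fully denoised" in this toy setting.
    return value * (level - 1) / level


def fifo_generate(num_frames, num_steps=4, seed=0):
    rng = random.Random(seed)

    # Assumed starting state: the queue already holds one frame per noise
    # level, lowest noise at the head (left) and highest at the tail (right).
    queue = deque()
    for level in range(1, num_steps + 1):
        queue.append([rng.gauss(0.0, 1.0), level])

    finished = []
    while len(finished) < num_frames:
        # One diagonal step: every queued frame is denoised by one level,
        # each at its own current noise level.
        for entry in queue:
            entry[0] = toy_denoise(entry[0], entry[1])
            entry[1] -= 1

        # The head has reached noise level 0, so it is dequeued as output.
        value, level = queue.popleft()
        assert level == 0
        finished.append(value)

        # A fresh pure-noise frame is enqueued at the tail.
        queue.append([rng.gauss(0.0, 1.0), num_steps])

    # Return the generated frames and the noise levels left in the queue.
    return finished, [lvl for _, lvl in queue]


frames, levels = fifo_generate(10)
print(len(frames), levels)
```

Because one frame leaves and one enters per iteration, the queue keeps a fixed size and its noise levels return to the same increasing pattern after every step, so the loop can run for as many frames as desired.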