Flexible Diffusion Modeling of Long Videos

Comparisons of FDM with baselines on each dataset


Or see the videos below, each over an hour long:

These are sampled on GQN-Mazes and MineRL by iterated application of our Hierarchy-2 sampling scheme, and on CARLA Town 01 with an autoregressive sampling scheme. We condition on the first 36 frames of a test video. Observed frames are shown with a red border, and we mark the end of the video with a checkerboard pattern.
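
As a rough illustration of the autoregressive scheme mentioned above, the sketch below only plans which frame indices would be sampled at each stage and which earlier frames they would be conditioned on. It does not run a diffusion model. The function name `autoregressive_schedule` and the window sizes `n_new` and `n_context` are hypothetical choices for illustration; only the 36 observed frames come from the text above.

```python
def autoregressive_schedule(total_frames, n_observed=36, n_new=10, n_context=10):
    """Plan sampling stages for a simple autoregressive scheme.

    Returns a list of (context_indices, latent_indices) pairs. At each stage
    the frames in latent_indices would be sampled conditioned on the frames
    in context_indices. n_new and n_context are illustrative assumptions,
    not values taken from the page.
    """
    if n_new <= 0 or n_context <= 0:
        raise ValueError("n_new and n_context must be positive")
    if not 0 < n_observed <= total_frames:
        raise ValueError("n_observed must be in (0, total_frames]")

    schedule = []
    next_frame = n_observed  # first frame that is not observed
    while next_frame < total_frames:
        # Frames to sample in this stage (clipped at the end of the video).
        latent = list(range(next_frame, min(next_frame + n_new, total_frames)))
        # Most recent frames, observed or already sampled, used as conditioning.
        context = list(range(max(0, next_frame - n_context), next_frame))
        schedule.append((context, latent))
        next_frame = latent[-1] + 1
    return schedule


if __name__ == "__main__":
    for context, latent in autoregressive_schedule(total_frames=60):
        print(f"condition on {context[0]}..{context[-1]}, "
              f"sample {latent[0]}..{latent[-1]}")
```

Each stage conditions only on a fixed-size window of the most recent frames, so the cost per stage stays constant however long the video becomes. A hierarchical scheme would instead order the stages so that some widely spaced frames are sampled first and the gaps are filled in afterwards.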