Flexible Diffusion Modeling of Long Videos

Comparisons of FDM with baselines on each dataset

GQN-Mazes
MineRL
CARLA Town01

Or see

unconditional samples
a comparison of completions of training set videos with completions of test set videos
the results of running a model trained on CARLA Town01 on out-of-distribution videos from Town02

Or see the videos below, each over an hour long:

These are sampled on GQN-Mazes and MineRL by iterated application of our Hierarchy-2 sampling scheme, and on CARLA Town01 with an autoregressive sampling scheme. In each case we condition on the first 36 frames of a test video. Observed frames are shown with a red border, and the end of each video is marked with a checkerboard pattern.
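
As a rough illustration of what an autoregressive sampling scheme looks like at the level of frame indices, the sketch below builds a schedule of (context frames, frames to sample) pairs, starting from 36 observed frames. It is a minimal sketch and not the project's implementation: the function name `autoregressive_schedule` and the window sizes `new_per_step` and `context_per_step` are hypothetical values chosen for illustration, and the diffusion model call itself is omitted.

```python
from typing import List, Tuple


def autoregressive_schedule(
    total_frames: int,
    num_observed: int = 36,      # frames taken from the test video (from the text)
    new_per_step: int = 10,      # hypothetical: frames sampled per step
    context_per_step: int = 10,  # hypothetical: preceding frames conditioned on
) -> List[Tuple[List[int], List[int]]]:
    """Return a list of (context_indices, latent_indices) pairs.

    Each step conditions on the most recent `context_per_step` frames
    (observed or previously sampled) and samples the next `new_per_step`.
    """
    if num_observed < 1 or new_per_step < 1 or context_per_step < 1:
        raise ValueError("all window sizes must be positive")

    schedule = []
    next_frame = num_observed
    while next_frame < total_frames:
        # Frames to be sampled in this step, clipped at the end of the video.
        latent = list(range(next_frame, min(next_frame + new_per_step, total_frames)))
        # Most recent frames, all of which are already available.
        context = list(range(max(0, next_frame - context_per_step), next_frame))
        schedule.append((context, latent))
        next_frame = latent[-1] + 1
    return schedule


if __name__ == "__main__":
    for context, latent in autoregressive_schedule(total_frames=60):
        print(f"condition on {context[0]}..{context[-1]}, "
              f"sample {latent[0]}..{latent[-1]}")
```

With these illustrative settings, a 60-frame video is completed in three steps, and every frame after the first 36 is sampled exactly once. A hierarchical scheme would differ by sampling sparse, widely spaced frames first and filling in the gaps afterwards.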