The GQN-Mazes and MineRL videos are sampled by iterated application of our Hierarchy-2 sampling scheme, and the CARLA Town 01 videos with an autoregressive sampling scheme.
We condition on the first 36 frames of a test video. Observed frames are shown with a red border, and we mark the end of the video with a checkerboard pattern.
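
As a rough illustration of what an autoregressive sampling scheme with 36 observed frames could look like, the sketch below builds only the frame-index schedule: which frames are conditioned on and which are generated at each step. It is not the exact scheme used for the CARLA Town 01 samples. The function name `autoregressive_schedule` and the parameters `block_size` and `context_size` (with their default of 10) are hypothetical choices made for illustration, and no diffusion model is called.

```python
from typing import List, Tuple


def autoregressive_schedule(
    total_frames: int,
    n_observed: int = 36,    # number of observed frames, as stated in the caption
    block_size: int = 10,    # hypothetical: frames generated per step
    context_size: int = 10,  # hypothetical: most recent frames conditioned on
) -> List[Tuple[List[int], List[int]]]:
    """Return a list of (condition_indices, generate_indices) steps."""
    if not 0 < n_observed <= total_frames:
        raise ValueError("n_observed must be in 1..total_frames")
    if block_size < 1 or context_size < 1:
        raise ValueError("block_size and context_size must be positive")

    steps = []
    next_frame = n_observed  # first frame that is not observed
    while next_frame < total_frames:
        end = min(next_frame + block_size, total_frames)
        generate = list(range(next_frame, end))
        # Condition on the most recent frames that are already known,
        # whether observed or generated in an earlier step.
        condition = list(range(max(0, next_frame - context_size), next_frame))
        steps.append((condition, generate))
        next_frame = end
    return steps


if __name__ == "__main__":
    for condition, generate in autoregressive_schedule(total_frames=60):
        print(f"condition on {condition[0]}..{condition[-1]}, "
              f"generate {generate[0]}..{generate[-1]}")
```

Each step conditions only on frames that are already available, so the schedule can be run front to back; a real sampler would replace the print loop with a call to the model for each step.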