PlenopticDreamer: Coherent Multi‑View Video Synthesis

Authors: Xiao Fu,
Organization: Hugging Face
Published: 2026-01-09 • Added: 2026-01-09

Key Insights

  • Introduces a camera‑guided retrieval module that pulls relevant latent frames from a pre‑built spatio‑temporal memory, ensuring consistent geometry across different viewpoints.
  • Employs progressive training (stage‑wise spatial, then temporal, finetuning) to stabilize optimization and improve temporal coherence without sacrificing spatial detail.
  • Uses synchronized generative hallucination, where the generator receives both the current view’s pose and the retrieved multi‑view context, enabling faithful re‑rendering of dynamic scenes.
  • Demonstrates that jointly optimizing view consistency and motion continuity yields higher visual fidelity than treating each view or frame independently.
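The camera-guided retrieval idea in the first insight can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the paper's implementation: the `MemoryEntry` layout, the `pose_score` formula (viewing-direction cosine minus scaled camera distance), and the `distance_scale` parameter are all hypothetical, since this summary does not say how relevance is actually scored.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class MemoryEntry:
    """One stored clip: its camera plus a latent (hypothetical layout)."""
    name: str
    position: Vec3        # camera center in world coordinates
    forward: Vec3         # unit viewing direction
    latent: List[float]   # stand-in for a latent video tensor


def _dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def pose_score(query_pos: Vec3, query_fwd: Vec3, entry: MemoryEntry,
               distance_scale: float = 1.0) -> float:
    """Assumed relevance score: higher for aligned views and nearby cameras."""
    alignment = _dot(query_fwd, entry.forward)       # cosine for unit vectors
    distance = math.dist(query_pos, entry.position)  # Euclidean distance
    return alignment - distance / distance_scale


def retrieve(memory: List[MemoryEntry], query_pos: Vec3, query_fwd: Vec3,
             k: int = 2) -> List[MemoryEntry]:
    """Return the k entries whose cameras best match the query camera."""
    ranked = sorted(memory,
                    key=lambda e: pose_score(query_pos, query_fwd, e),
                    reverse=True)
    return ranked[:k]


# Toy memory: four previously generated views of the same scene.
memory = [
    MemoryEntry("front", (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), [0.1, 0.2]),
    MemoryEntry("right", (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), [0.3, 0.4]),
    MemoryEntry("back", (0.0, 0.0, 2.0), (0.0, 0.0, -1.0), [0.5, 0.6]),
    MemoryEntry("front_near", (0.3, 0.0, 0.0), (0.0, 0.0, 1.0), [0.7, 0.8]),
]

# Query camera sits close to the two front-facing views.
context = retrieve(memory, query_pos=(0.1, 0.0, 0.0),
                   query_fwd=(0.0, 0.0, 1.0), k=2)
context_names = [entry.name for entry in context]
```

In this sketch the retrieved entries would be passed to the generator together with the query pose, which is the role the third insight assigns to the multi‑view context. A real system would operate on latent tensors and full camera extrinsics rather than these small tuples.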

Abstract

PlenopticDreamer enables consistent multi-view video re-rendering through synchronized generative hallucinations, leveraging camera-guided retrieval and progressive training mechanisms for improved temporal coherence and visual fidelity.

Source: [HuggingFace](https://huggingface.co/papers/2601.05239) | [arXiv](https://arxiv.org/abs/2601.05239)
Topics: computer-vision, multimodal
Difficulty: advanced