Visual Identity Prompted Multi‑View Video Augmentation for Robotics




Authors: Boyang Wang,

Topics: robotics, computer-vision, reinforcement-learning • Difficulty: advanced • Upvotes: 19

Organization: Hugging Face

Published: 2026-01-09 • Added: 2026-01-09

Key Insights

“Visual identity prompting” supplies diffusion models with explicit visual cues about the objects in a scene, enabling them to generate multi‑view videos that keep each object's appearance consistent across frames and views.
The generated videos serve as high‑fidelity data augmentations, enriching the visual diversity of manipulation datasets without manual collection.
Training robot policies on this augmented data yields measurable gains in success rates and robustness, both in simulation and on real‑world robot platforms.
The approach is model‑agnostic: any diffusion‑based video generator can be guided by identity prompts, making it easy to integrate with existing simulation pipelines.
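A minimal Python sketch of how identity prompts might be threaded through a model‑agnostic generator interface. Every name here (`IdentityPrompt`, `MultiViewRequest`, `VideoGenerator`, `generate_multiview`, and the toy `DummyGenerator`) is hypothetical; this is not the paper's implementation and it runs no real diffusion model. It only illustrates two ideas from the list above: every camera view is conditioned on the same identity cues, and any generator exposing a compatible `generate` method could be plugged in.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Sequence

# Placeholder image type: a tiny H x W grid of pixel values (stand-in for real tensors).
Frame = List[List[int]]


@dataclass(frozen=True)
class IdentityPrompt:
    """Hypothetical container for the explicit visual cues of one object."""
    name: str                          # e.g. "red mug"
    reference_images: Sequence[Frame]  # exemplar images showing the object's appearance


@dataclass
class MultiViewRequest:
    text_prompt: str
    identities: List[IdentityPrompt]
    camera_ids: List[str]
    num_frames: int


class VideoGenerator(Protocol):
    """Hypothetical interface that any diffusion-based video generator could implement."""

    def generate(self, text_prompt: str, conditioning: Sequence[Frame],
                 camera_id: str, num_frames: int) -> List[Frame]:
        ...


def generate_multiview(gen: VideoGenerator, req: MultiViewRequest) -> Dict[str, List[Frame]]:
    # Flatten every object's reference images into one shared conditioning set.
    cues = [img for ident in req.identities for img in ident.reference_images]
    # Every camera view receives the *same* identity cues; this shared conditioning
    # is the property intended to keep object appearance consistent across views.
    return {cam: gen.generate(req.text_prompt, cues, cam, req.num_frames)
            for cam in req.camera_ids}


class DummyGenerator:
    """Toy stand-in: fills each 2x2 frame with the number of conditioning images received."""

    def generate(self, text_prompt, conditioning, camera_id, num_frames):
        value = len(conditioning)
        return [[[value] * 2 for _ in range(2)] for _ in range(num_frames)]


mug = IdentityPrompt("red mug", reference_images=[[[255, 0]], [[250, 5]]])
request = MultiViewRequest("place the mug on the shelf", [mug], ["front", "wrist"], num_frames=3)
videos = generate_multiview(DummyGenerator(), request)
print({cam: len(frames) for cam, frames in videos.items()})  # {'front': 3, 'wrist': 3}
```

Using a structural `Protocol` rather than a base class means an existing generator wrapper only needs a matching `generate` method to fit this sketch, which mirrors the model‑agnostic claim without committing to any particular library.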

Abstract

Visual identity prompting enhances manipulation data augmentation for robot policies by providing explicit visual guidance to diffusion models, improving policy performance in both simulation and real-world settings.
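The augmentation step can be sketched as follows, under loudly stated assumptions: the generator changes only visual appearance, produces exactly one frame per recorded action, and keeps the same set of camera views, so each synthetic episode can reuse the original action labels. `Episode`, `augment_dataset`, and `toy_augment` are hypothetical names, and the string "frames" stand in for real images; the paper's actual data pipeline may differ.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List

Views = Dict[str, List[str]]  # camera id -> frames (strings stand in for images)


@dataclass(frozen=True)
class Episode:
    views: Views
    actions: List[List[float]]  # assumption: one action vector per frame
    synthetic: bool = False


# Hypothetical augmenter signature: (original views, random seed) -> regenerated views.
Augmenter = Callable[[Views, int], Views]


def augment_dataset(episodes: List[Episode], augment: Augmenter,
                    copies_per_episode: int, seed: int = 0) -> List[Episode]:
    rng = random.Random(seed)
    dataset = list(episodes)  # keep all real episodes
    for ep in episodes:
        for _ in range(copies_per_episode):
            new_views = augment(ep.views, rng.randrange(2**31))
            # Only appearance is assumed to change, so the original actions are reused;
            # reject outputs whose cameras or length no longer line up with those actions.
            if set(new_views) != set(ep.views):
                raise ValueError("augmenter changed the set of camera views")
            if any(len(frames) != len(ep.actions) for frames in new_views.values()):
                raise ValueError("augmented video length does not match action sequence")
            dataset.append(Episode(new_views, ep.actions, synthetic=True))
    return dataset


def toy_augment(views: Views, seed: int) -> Views:
    # Stand-in for a generator call: tag each frame with a variant id.
    variant = seed % 5
    return {cam: [f"{frame}/variant{variant}" for frame in frames]
            for cam, frames in views.items()}


real = [
    Episode({"front": ["f0", "f1"], "wrist": ["w0", "w1"]}, [[0.1], [0.2]]),
    Episode({"front": ["g0", "g1"], "wrist": ["x0", "x1"]}, [[0.3], [0.4]]),
]
data = augment_dataset(real, toy_augment, copies_per_episode=3)
print(len(data), sum(ep.synthetic for ep in data))  # 8 6
```

Keeping the real episodes alongside the synthetic ones, and flagging the latter with `synthetic=True`, makes it easy to control the real‑to‑generated ratio when training a policy on the combined set.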

Full Analysis

Source: HuggingFace (https://huggingface.co/papers/2601.05241) | arXiv (https://arxiv.org/abs/2601.05241)