Learning Library


Robotics

3 items in this topic

Paper

Pixel‑Perfect Diffusion Transformers for Depth Estimation

  • Introduces **Pixel‑Perfect Depth (PPD)**, a monocular depth model that operates directly in pixel space using diffusion transformers, eliminating flying pixels and preserving fine scene details.
  • **Semantics‑Prompted DiT** injects high‑level semantic embeddings from large vision foundation models into the diffusion process, guiding global structure while still allowing the model to recover sharp local geometry.
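The semantic-prompting idea can be pictured as a cross-attention step in which noisy pixel-space tokens attend to semantic embeddings and receive a residual correction. The sketch below is a toy illustration under assumed shapes, not the paper's actual architecture; all names and weight initializations are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def semantics_prompted_step(pixel_tokens, semantic_tokens, d=32, seed=0):
    """Toy cross-attention update: pixel-space diffusion tokens attend to
    semantic embeddings (e.g. from a vision foundation model) and get a
    residual update. Shapes and random weights are illustrative only."""
    rng = np.random.default_rng(seed)
    Dp, Ds = pixel_tokens.shape[-1], semantic_tokens.shape[-1]
    Wq = rng.standard_normal((Dp, d)) / np.sqrt(Dp)   # query projection
    Wk = rng.standard_normal((Ds, d)) / np.sqrt(Ds)   # key projection
    Wv = rng.standard_normal((Ds, d)) / np.sqrt(Ds)   # value projection
    Wo = rng.standard_normal((d, Dp)) / np.sqrt(d)    # output projection
    # each pixel token forms a distribution over semantic tokens
    attn = softmax((pixel_tokens @ Wq) @ (semantic_tokens @ Wk).T / np.sqrt(d))
    # residual connection keeps the pixel-space signal intact
    return pixel_tokens + attn @ (semantic_tokens @ Wv) @ Wo
```

The residual form lets the semantic cues steer global structure while the pixel tokens remain free to encode sharp local geometry.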
Paper

One‑Shot Functional Dexterous Grasp Learning via Synthetic Transfer

  • A correspondence‑based data engine turns a single human demonstration into thousands of high‑quality, category‑wide synthetic training examples by morphing object meshes, transferring the expert grasp, and locally optimizing it.
  • The generated dataset encodes both semantic (tool function) and geometric cues, enabling a multimodal network to predict grasps that respect the intended usage (e.g., pulling, cutting).
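The transfer step can be sketched as: find each demonstrated contact's nearest vertex on the source mesh, then map it through a dense vertex correspondence onto the morphed target instance. This is a minimal stand-in for the data engine, with a uniform scaling as the "morph" and an identity correspondence; the real pipeline also locally optimizes the transferred grasp.

```python
import numpy as np

def transfer_contacts(contacts, src_vertices, tgt_vertices, correspondence):
    """Transfer expert grasp contact points from a demonstrated object to a
    morphed category instance via a dense vertex correspondence. A toy
    stand-in for the paper's data engine; all names are illustrative."""
    transferred = []
    for c in contacts:
        i = int(np.linalg.norm(src_vertices - c, axis=1).argmin())  # nearest source vertex
        transferred.append(tgt_vertices[correspondence[i]])          # its counterpart on the target
    return np.array(transferred)

# toy "morph": a uniformly scaled copy of the source mesh, identity correspondence
src = np.random.default_rng(0).standard_normal((100, 3))
tgt = 1.5 * src
corr = np.arange(100)
contacts = src[[3, 40, 77]]  # three demonstrated contact points
new_contacts = transfer_contacts(contacts, src, tgt, corr)
```

Repeating this over many morphed instances is what turns one demonstration into a category-wide synthetic dataset.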
Paper

Visual Identity Prompted Multi‑View Video Augmentation for Robotics

  • The proposed “visual identity prompting” supplies diffusion models with explicit object cues, enabling generation of consistent multi‑view videos that preserve object appearance across frames.
  • The generated videos serve as high‑fidelity data augmentations, enriching the visual diversity of manipulation datasets without manual collection.
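One plausible reading of identity prompting is that a single object-identity embedding is tiled across every frame's conditioning, so the generator sees the same appearance cue throughout the clip. The helper below is an illustrative sketch under that assumption, not the paper's interface.

```python
import numpy as np

def prompt_with_identity(identity_emb, per_frame_cond):
    """Tile one object-identity embedding across all frame conditionings so
    the diffusion model receives a consistent appearance cue per frame
    (an illustrative reading of 'visual identity prompting')."""
    T = per_frame_cond.shape[0]
    tiled = np.broadcast_to(identity_emb, (T, identity_emb.shape[-1]))
    # concatenate the shared identity cue with each frame's own conditioning
    return np.concatenate([tiled, per_frame_cond], axis=-1)
```

Because the identity slice is identical in every frame, cross-frame appearance drift has no conditioning signal to latch onto.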