TR2026-096
MIRROR: Multisensory Implicit Rejection-sampled RObotic policy
- Bhaskar, A., Tokekar, P., Di Cairano, S., Schperberg, A., "MIRROR: Multisensory Implicit Rejection-sampled RObotic policy", ICML 2026 Workshop on Structured Probabilistic Inference & Generative Modeling, July 2026.
@inproceedings{Bhaskar2026jul,
  author = {Bhaskar, Amisha and Tokekar, Pratap and {Di Cairano}, Stefano and Schperberg, Alexander},
  title = {{MIRROR: Multisensory Implicit Rejection-sampled RObotic policy}},
  booktitle = {ICML 2026 Workshop on Structured Probabilistic Inference \& Generative Modeling},
  year = 2026,
  month = jul,
  url = {https://www.merl.com/publications/TR2026-096}
}
Research Areas:
Artificial Intelligence, Control, Dynamical Systems, Machine Learning, Robotics
Abstract:
Robotic imitation learning typically requires models that capture multimodal action distributions while operating at real-time control rates and accommodating multiple sensing modalities. Although recent generative approaches such as diffusion models, flow matching, and Implicit Maximum Likelihood Estimation (IMLE) have achieved promising results, they often satisfy only a subset of these requirements. To address this, we introduce MIRROR, a single-pass policy based on a batch-global rejection-sampling variant of IMLE. MIRROR couples a temporal multisensory encoder (integrating RGB, depth, tactile, audio, and proprioception) with a linear-attention generator based on the Performer architecture. We demonstrate the efficacy of MIRROR on a diverse real-world hardware suite, including loco-manipulation using a Unitree GO2 quadruped with a 7-DoF D1 arm and tabletop manipulation with a UR5 manipulator. Across challenging physical tasks such as pre-manipulation parking, high-precision insertion, and multi-object pick-and-place, MIRROR outperforms state-of-the-art diffusion policies by 10–25% in success rate while maintaining high-frequency (30–50 Hz) closed-loop control. We further validate our approach on large-scale simulation benchmarks, including CALVIN, MetaWorld, and Robomimic. On CALVIN (10% data split), MIRROR improves success rates by ∼25% over diffusion and ∼20% over flow-matching baselines, while simultaneously reducing trajectory jerk by 20×–50×. These results position MIRROR as a fast, accurate, multisensory imitation policy that retains multimodal action coverage without the latency of iterative sampling.
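To make the rejection-sampling idea behind IMLE concrete, here is a minimal sketch of the generic per-target rejection-sampling IMLE selection rule, written in NumPy. It is an illustration under stated assumptions, not MIRROR's batch-global variant, whose details the abstract does not give. The function name `rs_imle_loss`, the `epsilon` radius, and the flattened action-chunk layout are all hypothetical choices made for this example.

```python
import numpy as np

def rs_imle_loss(targets, samples, epsilon):
    """Generic rejection-sampling IMLE loss (hypothetical sketch, not MIRROR's exact objective).

    targets: (B, D) demonstration actions, e.g. flattened action chunks.
    samples: (B, M, D) generator samples, M per target, each drawn from the
             policy conditioned on that target's observation.
    epsilon: rejection radius; samples already within epsilon of their target
             are discarded, so the loss focuses on targets the generator
             does not yet cover well.
    Returns (loss, index of selected sample per target, mask of targets kept).
    """
    # Euclidean distance from every sample to its own target: shape (B, M)
    dists = np.linalg.norm(samples - targets[:, None, :], axis=-1)
    # Rejection step: keep only samples farther than epsilon from the target
    valid = dists > epsilon
    # Rejected samples get +inf so argmin never selects them
    masked = np.where(valid, dists, np.inf)
    nearest = np.argmin(masked, axis=1)   # nearest surviving sample per target
    has_valid = valid.any(axis=1)         # targets with at least one survivor
    if not has_valid.any():
        return 0.0, nearest, has_valid
    chosen = masked[np.arange(len(targets)), nearest]
    # IMLE objective: mean squared distance to the selected nearest samples
    loss = float(np.mean(chosen[has_valid] ** 2))
    return loss, nearest, has_valid
```

In a real training loop the selected samples would be indexed inside an autodiff framework so gradients reach the generator; this NumPy version only shows the selection rule. Because a single generator pass produces all `M` candidates, inference needs no iterative denoising, which is the latency advantage the abstract refers to.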
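The Performer architecture makes attention linear in sequence length by replacing the softmax kernel with positive random features (FAVOR+). The sketch below shows that mechanism in NumPy. It assumes plain i.i.d. Gaussian projections without the orthogonalization or feature redrawing used in the original Performer work, and the helper names `positive_random_features` and `performer_attention` are hypothetical. How MIRROR configures its generator is not described in the abstract.

```python
import numpy as np

def positive_random_features(x, proj):
    """FAVOR+ positive features: phi(x) = exp(W x - |x|^2 / 2) / sqrt(m).

    E[phi(q) . phi(k)] = exp(q . k), so dot products of features estimate
    the unnormalized softmax kernel.
    """
    m = proj.shape[0]
    return np.exp(x @ proj.T - 0.5 * np.sum(x**2, axis=-1, keepdims=True)) / np.sqrt(m)

def performer_attention(q, k, v, num_features=256, seed=0):
    """Linear-time approximation of softmax(q k^T / sqrt(d)) v.

    q, k: (L, d); v: (L, d_v). Cost is O(L * m * d) instead of O(L^2 * d).
    """
    d = q.shape[-1]
    rng = np.random.default_rng(seed)
    proj = rng.standard_normal((num_features, d))  # i.i.d. Gaussian projections
    scale = d ** -0.25  # splits the 1/sqrt(d) temperature between q and k
    q_prime = positive_random_features(q * scale, proj)  # (L, m)
    k_prime = positive_random_features(k * scale, proj)  # (L, m)
    kv = k_prime.T @ v                            # (m, d_v), computed once
    normalizer = q_prime @ k_prime.sum(axis=0)    # (L,) row sums of the implicit kernel
    return (q_prime @ kv) / normalizer[:, None]
```

The L-by-L attention matrix is never formed: keys and values are summarized once into `kv`, which is what allows a long temporal context at 30–50 Hz control rates.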
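The abstract reports a 20×–50× reduction in trajectory jerk but does not define the metric. One common choice, assumed here and not confirmed as the paper's, is mean squared jerk: the third time derivative of a sampled trajectory, estimated with third-order finite differences. The function name `mean_squared_jerk` is hypothetical.

```python
import numpy as np

def mean_squared_jerk(positions, dt):
    """Mean squared jerk of a uniformly sampled trajectory (assumed metric).

    positions: (T, D) array, e.g. joint angles or end-effector positions,
               sampled every dt seconds.
    Jerk is approximated by the third-order finite difference divided by dt^3.
    """
    positions = np.asarray(positions, dtype=float)
    if positions.shape[0] < 4:
        raise ValueError("need at least 4 samples for a third difference")
    jerk = np.diff(positions, n=3, axis=0) / dt**3  # (T - 3, D)
    # Squared jerk magnitude per time step, averaged over the trajectory
    return float(np.mean(np.sum(jerk**2, axis=-1)))
```

Under this assumed metric, a "20×" reduction would mean the baseline's `mean_squared_jerk` is twenty times MIRROR's on the same task and control period `dt` (for example, `dt = 0.02` s at 50 Hz).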

