Beyond the Patch:
Exploring Vulnerabilities of Visuomotor Policies via Viewpoint-Consistent 3D Adversarial Object

SGVR Lab at KAIST
ICRA 2026

Visuomotor policy deception via a viewpoint-consistent 3D adversarial object in the real world. Left: The policy successfully guides the robot to its target $O_\text{goal}$. Right: Our adversarial object $O_\text{adv}$ manipulates the visual input, compelling the policy to misguide the robot toward itself instead of the true target $O_\text{goal}$.

Overview

Visuomotor policies, while powerful, exhibit critical vulnerabilities when faced with physical-world adversarial perturbations. These vulnerabilities are particularly pronounced in wrist-mounted camera setups, where continuous robot movements and varying poses induce significant viewpoint shifts that neutralize conventional 2D patches through perspective distortion. We address the limitations of such 2D patches by introducing a viewpoint-consistent 3D adversarial object designed to systematically mislead robot manipulation policies in real-world environments.


Main Contributions

  • Viewpoint-Consistent 3D Attack: A novel method for optimizing textures over 3D meshes so that the attack remains effective despite continuous camera movement.
  • Coarse-to-Fine (C2F) & Saliency-Guided Optimization: Dual strategies that ensure robustness across viewing distances and effectively redirect the policy's attention away from the true target.
  • First Systematic Analysis: The first comprehensive study on how 3D physical adversarial objects affect end-to-end visuomotor manipulation policies.
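The texture optimization over sampled viewpoints can be illustrated with a minimal Expectation-over-Transformation (EOT) loop. This is a toy sketch, not the paper's implementation: `render` and `policy_loss` are stand-ins for the actual differentiable renderer and policy rollout objective, and finite differences replace backpropagation purely for self-containment.

```python
import numpy as np

rng = np.random.default_rng(0)

def render(texture, pose):
    # Toy stand-in for a differentiable renderer: the pose simply
    # scales the texture's appearance in the image.
    return texture * pose

def policy_loss(image, target=1.0):
    # Toy stand-in for the adversarial objective on a policy rollout:
    # drive the rendered appearance toward an adversarial target.
    return np.mean((image - target) ** 2)

def eot_gradient(texture, n_views=32, eps=1e-4):
    """EOT: average the loss gradient over randomly sampled camera
    poses so the optimized texture works across viewpoints."""
    poses = rng.uniform(0.5, 1.5, size=n_views)
    grad = np.zeros_like(texture)
    for p in poses:
        for i in range(texture.size):
            d = np.zeros_like(texture)
            d.flat[i] = eps
            # Central finite difference (stand-in for autograd).
            grad.flat[i] += (policy_loss(render(texture + d, p))
                             - policy_loss(render(texture - d, p))) / (2 * eps)
    return grad / n_views

# Gradient descent on the texture, clipped to valid color range.
texture = rng.uniform(0.0, 1.0, size=4)
for _ in range(200):
    texture -= 0.1 * eot_gradient(texture)
    texture = np.clip(texture, 0.0, 1.0)
```

Because the loss is averaged over the pose distribution rather than a single view, the optimized texture settles at a compromise that keeps the loss low across all sampled viewpoints, which is the core property a viewpoint-consistent attack needs.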

Methodology

Method Pipeline
Overview of the proposed method. (a) C2F Pose Scheduling: Viewpoint sampling via Beta distribution to shift focus from Coarse to Fine stages. (b) Optimization Pipeline: Texture optimization using EOT and policy rollouts to generate the 3D adversarial object.
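The C2F pose scheduling in (a) can be sketched as follows. This is an illustrative sketch under assumptions: the exact Beta parameterization, distance range, and the mapping of "coarse" to distant viewpoints are hypothetical choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_view_distance(progress, d_min=0.2, d_max=1.0, k=4.0):
    """Coarse-to-fine viewpoint sampling via a Beta distribution.

    `progress` in [0, 1] is the optimization progress. Early on
    (progress ~ 0) the Beta's mass sits near 1, favoring far/coarse
    views; late (progress ~ 1) it shifts toward 0, favoring
    near/fine views. `k` controls how sharply the schedule shifts.
    All parameter values here are illustrative assumptions.
    """
    a = 1.0 + k * (1.0 - progress)   # large early -> mass near 1 (far)
    b = 1.0 + k * progress           # large late  -> mass near 0 (near)
    u = rng.beta(a, b)
    return d_min + u * (d_max - d_min)
```

A single scalar schedule like this lets the same optimization loop first shape the texture's large-scale appearance from distant views, then refine high-frequency detail as sampled cameras move closer.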

Experimental Results

Effect of Our 3D Object

Our 3D adversarial object provides viewpoint-consistent perturbations that significantly outperform 2D patches, especially at oblique angles where 2D efficacy collapses due to perspective distortion.




Real-World Experiments

We demonstrate the robust sim-to-real transferability of our attack using a Fetch robot and a wrist-mounted RealSense camera. The attack maintains high success rates under various challenging scenarios: Dynamic Objects, Occluded Scenes, and Dynamic Lighting.

Supplementary Video

This video provides a comprehensive overview of our method and experimental results in both simulation and real-world environments.

BibTeX


@misc{lee2026patchexploringvulnerabilitiesvisuomotor,
  title={Beyond the Patch: Exploring Vulnerabilities of Visuomotor Policies via Viewpoint-Consistent {3D} Adversarial Object},
  author={Chanmi Lee and Minsung Yoon and Woojae Kim and Sebin Lee and Sung-eui Yoon},
  year={2026},
  eprint={2603.04913},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2603.04913},
}