TR2026-091

Partial Ring Scan: Revisiting Scan Order in Vision State Space Models


    •  Hsieh, Y.-K., Peng, K.-C., Li, X., Chang, M.-C., Tseng, Y.-C., Hsieh, J.-W., "Partial Ring Scan: Revisiting Scan Order in Vision State Space Models", International Conference on Machine Learning (ICML), July 2026.
      @inproceedings{Hsieh2026jul,
        author = {Hsieh, Yi-Kuan and Peng, Kuan-Chuan and Li, Xin and Chang, Ming-Ching and Tseng, Yu-Chee and Hsieh, Jun-Wei},
        title = {{Partial Ring Scan: Revisiting Scan Order in Vision State Space Models}},
        booktitle = {International Conference on Machine Learning (ICML)},
        year = 2026,
        month = jul,
        url = {https://www.merl.com/publications/TR2026-091}
      }
  • Research Areas:

    Artificial Intelligence, Computer Vision, Machine Learning

Abstract:

State Space Models (SSMs) provide linear-time alternatives to attention for vision, but require serializing 2D images into 1D sequences using a predefined scan order. We identify scan order as a previously underexplored inductive bias that fundamentally shapes spatial dependency modeling in Vision SSMs. Fixed scan paths distort local adjacency, fragment object structure, and induce anisotropic representations that are brittle under geometric transformations such as rotation. We propose Partial RIng Scan Mamba (PRIS-Mamba), a rotation-robust traversal that decomposes images into concentric rings, performs permutation-invariant aggregation within each ring, and models cross-ring dependencies via short radial SSMs. This design induces a structured factorization of spatial dependencies that preserves isotropy while maintaining linear complexity. To improve efficiency without sacrificing expressivity, we introduce partial channel filtering, selectively applying recurrent modeling to informative channels while routing others through a residual pathway. Empirically, PRIS-Mamba improves accuracy, efficiency, and rotation robustness over prior Vision SSMs on ImageNet-1K. Our results position scan-order design as a core representational choice in Vision SSMs, with implications for robustness and generalization beyond architectural scaling. The code will be released upon paper acceptance.
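
The ring-based traversal described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, which has not been released; every name and design detail below is an assumption made for illustration. In particular, rings are defined here by each pixel's distance to the nearest image border, the permutation-invariant aggregator is a plain mean, and the "radial SSM" is reduced to a scalar linear recurrence h_k = a*h_{k-1} + b*x_k with fixed, hypothetical coefficients a and b.

```python
# Illustrative sketch only: ring definition, mean pooling, and the scalar
# recurrence are assumptions, not the method from the paper.

def ring_index(h, w):
    """Ring id of each pixel: 0 is the outermost border ring (assumed definition)."""
    return [[min(r, c, h - 1 - r, w - 1 - c) for c in range(w)] for r in range(h)]


def ring_means(img):
    """Mean of pixel values within each ring; the mean ignores pixel order."""
    h, w = len(img), len(img[0])
    ids = ring_index(h, w)
    n_rings = (min(h, w) + 1) // 2
    sums = [0.0] * n_rings
    counts = [0] * n_rings
    for r in range(h):
        for c in range(w):
            sums[ids[r][c]] += img[r][c]
            counts[ids[r][c]] += 1
    return [s / n for s, n in zip(sums, counts)]


def radial_scan(tokens, a=0.5, b=1.0):
    """Toy linear recurrence over ring tokens, scanned from outer to inner ring."""
    state = 0.0
    out = []
    for x in tokens:
        state = a * state + b * x
        out.append(state)
    return out


def rot90(img):
    """Rotate a 2D list of lists by 90 degrees counter-clockwise."""
    h, w = len(img), len(img[0])
    return [[img[r][w - 1 - c] for r in range(h)] for c in range(w)]


if __name__ == "__main__":
    # Small synthetic 6x6 "image" with integer intensities.
    img = [[(r * 7 + c * 3) % 11 for c in range(6)] for r in range(6)]
    tokens = ring_means(img)
    rotated_tokens = ring_means(rot90(img))
    # A 90-degree rotation maps each ring onto itself, so the ring tokens,
    # and therefore the radial scan outputs, are unchanged.
    print(tokens == rotated_tokens)
    print(radial_scan(tokens) == radial_scan(rotated_tokens))
```

Because a 90-degree rotation maps each ring onto itself, the sequence seen by the radial recurrence is identical for the original and rotated inputs in this sketch, and its length grows with the number of rings rather than the number of pixels. The sketch only demonstrates exact invariance for multiples of 90 degrees on a square grid; how the paper handles arbitrary angles, learned aggregation, and partial channel filtering is not specified in the abstract.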