SLAM-Former

Putting SLAM into One Transformer

IIIS, Tsinghua University

Abstract

We present SLAM-Former, a novel neural approach that integrates full SLAM capabilities into a single transformer. Similar to traditional SLAM systems, SLAM-Former comprises both a frontend and a backend that operate in tandem. The frontend processes sequential monocular images in real time for incremental mapping and tracking, while the backend performs global refinement to ensure a geometrically consistent result. This alternating execution allows the frontend and backend to mutually promote one another, enhancing overall system performance. Comprehensive experimental results demonstrate that SLAM-Former achieves superior or highly competitive performance compared to state-of-the-art dense SLAM methods.

SLAM-Former

SLAM-Former consists of a frontend and a backend within the same Transformer architecture, working in cooperation.
The training strategy is designed to enable a single transformer to handle both frontend and backend SLAM functionalities.
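To make the frontend/backend cooperation concrete, here is a minimal control-flow sketch of one network being called in two roles. It is not the SLAM-Former implementation: the `run_slam` function, the `(frames, mode)` model signature, and the fixed `backend_interval` trigger are all hypothetical assumptions chosen for illustration. The frontend call handles each new frame incrementally, and every `backend_interval` frames the same model is called again in backend mode over all frames seen so far.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: a single network invoked with a list of frames and a mode string.
Model = Callable[[Sequence[int], str], None]


def run_slam(frames: Sequence[int], model: Model, backend_interval: int) -> None:
    """Alternate frontend and backend calls on one shared model (illustrative sketch only)."""
    if backend_interval <= 0:
        raise ValueError("backend_interval must be positive")
    seen: List[int] = []
    for frame in frames:
        seen.append(frame)
        # Frontend: incremental tracking and mapping on the newest frame.
        # (A real system would also carry cached context from earlier frames.)
        model([frame], "frontend")
        # Backend: periodic global refinement over every frame seen so far
        # (the fixed-interval trigger is an assumption, not the paper's policy).
        if len(seen) % backend_interval == 0:
            model(list(seen), "backend")


# Usage: record which mode was called and how many frames it received.
calls = []
run_slam(range(7), lambda fr, mode: calls.append((mode, len(fr))), backend_interval=3)
```

With seven frames and an interval of 3, the frontend runs once per frame and the backend runs twice, over 3 and then 6 frames. Sharing one `model` callable for both roles mirrors the idea of a single transformer serving both functions; how the real system schedules and conditions these calls is described in the paper, not here.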


BibTeX

@article{slam-former,
      title={SLAM-Former: Putting SLAM into One Transformer},
      author={Yijun Yuan and Zhuoguang Chen and Kenan Li and Weibang Wang and Hang Zhao},
      journal={arXiv preprint arXiv:2509.xxxxx},
      year={2025}
}