FUTR3D: A Unified Sensor Fusion Framework for 3D Detection
Xuanyao Chen1*, Tianyuan Zhang2*, Yue Wang3, Yilun Wang4, Hang Zhao5
1Fudan University, 2CMU, 3MIT,
4Li Auto, 5Tsinghua University

FUTR3D is a unified sensor fusion framework that works with arbitrary sensor combinations and performs competitively with various customized state-of-the-art models. It supports camera-LiDAR, camera-radar, and camera-LiDAR-radar fusion.


Sensor fusion is an essential topic in many perception systems, such as autonomous driving and robotics. Existing multi-modal 3D detection models usually involve customized designs that depend on the sensor combination or setup. In this work, we propose FUTR3D, the first unified end-to-end sensor fusion framework for 3D detection, which can be used in (almost) any sensor configuration. FUTR3D employs a query-based Modality-Agnostic Feature Sampler (MAFS) together with a transformer decoder trained with a set-to-set loss for 3D detection, thus avoiding late-fusion heuristics and post-processing tricks. We validate the effectiveness of our framework on various combinations of cameras, low-resolution LiDARs, high-resolution LiDARs, and radars. On the nuScenes dataset, FUTR3D outperforms specifically designed methods across different sensor combinations. Moreover, FUTR3D is flexible across sensor configurations and enables low-cost autonomous driving. For example, using only a 4-beam LiDAR with cameras, FUTR3D (56.8 mAP) performs on par with the state-of-the-art 3D detection model CenterPoint (56.6 mAP), which uses a 32-beam LiDAR.
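A set-to-set loss is what lets a query-based detector skip post-processing such as NMS: each ground-truth object is assigned to exactly one prediction, and all other queries are trained to predict "no object". The sketch below illustrates that idea in the DETR style under loudly stated assumptions: boxes are reduced to (x, y, z) centres, the matching cost is a hypothetical mix of confidence and L1 distance, and exhaustive search stands in for the Hungarian algorithm (so it only suits toy sizes). It is not FUTR3D's actual loss or cost weighting.

```python
import math
from itertools import permutations
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float]  # hypothetical simplification: box centre (x, y, z) only


def l1(a: Box, b: Box) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def pair_cost(score: float, pred: Box, gt: Box, w_cls: float = 1.0, w_box: float = 1.0) -> float:
    # Confident predictions close to a ground-truth box are cheap to assign.
    return -w_cls * score + w_box * l1(pred, gt)


def match(preds: Sequence[Box], scores: Sequence[float], gts: Sequence[Box]) -> List[Tuple[int, int]]:
    """One-to-one (pred_idx, gt_idx) assignment with minimum total cost.

    Exhaustive search stands in for the Hungarian algorithm; toy sizes only.
    """
    assert len(preds) >= len(gts), "need at least one query per object"
    best: List[Tuple[int, int]] = []
    best_cost = math.inf
    for perm in permutations(range(len(preds)), len(gts)):  # perm[g] = prediction for gt g
        cost = sum(pair_cost(scores[p], preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best, best_cost = [(p, g) for g, p in enumerate(perm)], cost
    return sorted(best)


def set_loss(preds: Sequence[Box], scores: Sequence[float], gts: Sequence[Box]) -> float:
    pairs = match(preds, scores, gts)
    matched = {p for p, _ in pairs}
    box = sum(l1(preds[p], gts[g]) for p, g in pairs)
    # Matched queries should say "object"; every other query should say "no object".
    cls = sum(-math.log(s) if i in matched else -math.log(1.0 - s) for i, s in enumerate(scores))
    return cls + box


preds = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (5.0, 5.0, 0.0)]
scores = [0.9, 0.8, 0.1]
gts = [(10.2, 0.0, 0.0), (0.1, 0.0, 0.0)]
print(match(preds, scores, gts))  # [(0, 1), (1, 0)]
```

The third query matches nothing, so it is only penalised for its confidence; because duplicates of an already-matched object are pushed toward "no object" during training, no duplicate-removal step is needed at inference.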


Unified sensor fusion framework. FUTR3D first encodes features for each modality individually and then employs a query-based Modality-Agnostic Feature Sampler (MAFS) that works in a unified domain and extracts features from the different modalities. Finally, a transformer decoder operates on a set of 3D queries and performs set prediction of objects. The contributions of our work are the following:
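To make "modality-agnostic" concrete, the sketch below shows one plausible reading of the sampling step: every query carries a 3D reference point in a shared frame, each modality supplies its own sampler that maps that point into its feature map, and the per-modality results are concatenated into one vector the decoder can consume regardless of which sensors are configured. All names here (`bev_sampler`, `camera_sampler`, `modality_agnostic_sample`) are hypothetical, nearest-neighbour lookup replaces learned or bilinear sampling, and zero-filling out-of-view samples plus plain concatenation are illustrative assumptions, not the authors' implementation.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vec = List[float]
Grid = List[List[Vec]]  # H x W grid of C-dim feature vectors
Point = Tuple[float, float, float]
Sampler = Callable[[Point], Optional[Vec]]


def sample_grid(grid: Grid, col: float, row: float) -> Optional[Vec]:
    """Nearest-neighbour lookup (assumption; a real model would interpolate)."""
    r, c = int(round(row)), int(round(col))
    if 0 <= r < len(grid) and 0 <= c < len(grid[0]):
        return grid[r][c]
    return None


def bev_sampler(bev: Grid, cell: float, origin: Tuple[float, float]) -> Sampler:
    """Hypothetical LiDAR-style sampler: drop z and index a bird's-eye-view grid."""
    def sample(p: Point) -> Optional[Vec]:
        return sample_grid(bev, (p[0] - origin[0]) / cell, (p[1] - origin[1]) / cell)
    return sample


def camera_sampler(images: Sequence[Grid], projections: Sequence[Sequence[Sequence[float]]]) -> Sampler:
    """Hypothetical camera sampler: project with 3x4 matrices, average the hits."""
    def sample(p: Point) -> Optional[Vec]:
        hits = []
        for img, P in zip(images, projections):
            u, v, w = (r[0] * p[0] + r[1] * p[1] + r[2] * p[2] + r[3] for r in P)
            if w <= 1e-6:  # point lies behind this camera
                continue
            f = sample_grid(img, u / w, v / w)
            if f is not None:
                hits.append(f)
        if not hits:
            return None
        return [sum(ch) / len(hits) for ch in zip(*hits)]
    return sample


def modality_agnostic_sample(p: Point, samplers: Dict[str, Sampler], dims: Dict[str, int]) -> Vec:
    """Gather one feature per configured modality and concatenate in a fixed order."""
    fused: Vec = []
    for name in sorted(samplers):
        f = samplers[name](p)
        fused.extend(f if f is not None else [0.0] * dims[name])  # out of view -> zeros
    return fused


bev = [[[float(r * 10 + c)] for c in range(4)] for r in range(4)]    # 4x4 BEV map, 1 channel
img = [[[float(r), float(c)] for c in range(3)] for r in range(3)]   # 3x3 image, 2 channels
P = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]  # toy pinhole matrix
samplers = {"lidar": bev_sampler(bev, 1.0, (0.0, 0.0)), "camera": camera_sampler([img], [P])}
dims = {"lidar": 1, "camera": 2}
print(modality_agnostic_sample((2.0, 1.0, 1.0), samplers, dims))  # [1.0, 2.0, 12.0]
```

Swapping sensors only changes the `samplers` dictionary; the code that consumes the fused vector stays the same, which is the property the framework description emphasises. In a real model, a learned projection would typically fuse the concatenated features before they update the query.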

Related Projects on VCAD (Vision-Centric Autonomous Driving)
BEV Mapping

BEV Vectorized Mapping

BEV Detection

BEV Tracking


If you find our work useful in your research, please cite our paper:

@article{chen2022futr3d,
  title={FUTR3D: A Unified Sensor Fusion Framework for 3D Detection},
  author={Chen, Xuanyao and Zhang, Tianyuan and Wang, Yue and Wang, Yilun and Zhao, Hang},
  journal={arXiv preprint arXiv:2203.10642},
  year={2022}
}