TR2025-078

Multimodal 3D Object Detection on Unseen Domains


    •  Hegde, D., Lohit, S., Peng, K.-C., Jones, M.J., Patel, V.M., "Multimodal 3D Object Detection on Unseen Domains", IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshop, June 2025.
      @inproceedings{Hegde2025jun,
        author = {Hegde, Deepti and Lohit, Suhas and Peng, Kuan-Chuan and Jones, Michael J. and Patel, Vishal M.},
        title = {{Multimodal 3D Object Detection on Unseen Domains}},
        booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshop},
        year = 2025,
        month = jun,
        url = {https://www.merl.com/publications/TR2025-078}
      }
  • Research Areas: Artificial Intelligence, Computer Vision, Machine Learning

Abstract:

LiDAR datasets for autonomous driving exhibit biases in properties such as point cloud density, range, and object dimensions. As a result, object detection networks trained and evaluated in different environments often experience performance degradation. Domain adaptation approaches assume access to unannotated samples from the test distribution to address this problem. However, in the real world, the exact conditions of deployment and access to samples representative of the test dataset may be unavailable while training. We argue that the more realistic and challenging formulation is to require robustness in performance to unseen target domains. We propose to address this problem in a two-pronged manner. First, we leverage paired LiDAR-image data present in most autonomous driving datasets to perform multimodal object detection. We suggest that working with multimodal features by leveraging both images and LiDAR point clouds for scene understanding tasks results in object detectors more robust to unseen domain shifts. Second, we train a 3D object detector to learn multimodal object features across different distributions and promote feature invariance across these source domains to improve generalizability to unseen target domains. To this end, we propose CLIX3D, a multimodal fusion and supervised contrastive learning framework for 3D object detection that performs alignment of object features from same-class samples of different domains while pushing the features from different classes apart. We show that CLIX3D yields state-of-the-art domain generalization performance under multiple dataset shifts.
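
To make the contrastive idea in the abstract concrete, the following is a minimal sketch of a generic supervised contrastive loss over object features, written in NumPy. It is not the CLIX3D implementation: the function name, the `temperature` value, and the optional `domains` argument (which restricts positives to same-class pairs from different source domains) are all assumptions chosen for illustration, and the abstract does not specify how features are extracted or fused.

```python
import numpy as np


def supervised_contrastive_loss(features, labels, domains=None, temperature=0.1):
    """Generic supervised contrastive loss (illustrative, not the paper's code).

    features: (N, D) object feature vectors pooled from the source domains.
    labels:   (N,) integer class ids.
    domains:  optional (N,) integer domain ids; if given, only same-class
              pairs from *different* domains count as positives (assumption).
    """
    features = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    n = features.shape[0]

    # L2-normalize so that dot products are cosine similarities.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    z = features / np.maximum(norms, 1e-12)
    logits = z @ z.T / temperature

    # An anchor is never compared with itself.
    not_self = ~np.eye(n, dtype=bool)
    positives = (labels[:, None] == labels[None, :]) & not_self
    if domains is not None:
        domains = np.asarray(domains)
        positives &= domains[:, None] != domains[None, :]

    pos_count = positives.sum(axis=1)
    has_pos = pos_count > 0
    if not has_pos.any():
        return 0.0  # no valid positive pair in this batch

    # Log-softmax of each anchor's similarities over all other samples.
    logits = np.where(not_self, logits, -np.inf)
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Average log-probability of the positives, per anchor that has any.
    sum_pos = np.where(positives, log_prob, 0.0).sum(axis=1)
    per_anchor = -sum_pos[has_pos] / pos_count[has_pos]
    return float(per_anchor.mean())


# Toy example: two tight clusters of 2-D features.
toy_features = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]])
aligned = supervised_contrastive_loss(toy_features, [0, 0, 1, 1])
mixed = supervised_contrastive_loss(toy_features, [0, 1, 0, 1])
```

In the toy example, `aligned` is computed with labels that match the clusters and `mixed` with labels that cut across them, so the loss is lower when same-class features already sit close together. Minimizing such a loss pulls same-class features together and pushes different-class features apart, which is the behavior the abstract describes.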

 

  • Related Publication

  •  Hegde, D., Lohit, S., Peng, K.-C., Jones, M.J., Patel, V.M., "Multimodal 3D Object Detection on Unseen Domains", arXiv, April 2024.
    @article{Hegde2024apr,
      author = {Hegde, Deepti and Lohit, Suhas and Peng, Kuan-Chuan and Jones, Michael J. and Patel, Vishal M.},
      title = {{Multimodal 3D Object Detection on Unseen Domains}},
      journal = {arXiv},
      year = 2024,
      month = apr,
      url = {https://arxiv.org/abs/2404.11764}
    }