Artificial Intelligence

Making machines smarter for improved safety, efficiency and comfort.

Our AI research encompasses advances in computer vision, speech and audio processing, and data analytics. Key research themes include improved perception based on machine learning techniques, learning control policies through model-based reinforcement learning, and cognition and reasoning based on learned semantic representations. We apply our work to a broad range of automotive and robotics applications, as well as to building and home systems.

  • Researchers

  • Awards

    •  AWARD   Best Paper Award at the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
      Date: December 18, 2019
      Awarded to: Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe
      MERL Contact: Jonathan Le Roux
      Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
      Brief
      • MERL researcher Jonathan Le Roux and co-authors Xuankai Chang, Shinji Watanabe (Johns Hopkins University), Wangyou Zhang, and Yanmin Qian (Shanghai Jiao Tong University) won the Best Paper Award at the 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019), for the paper "MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition". MIMO-Speech is a fully neural end-to-end framework that can transcribe the text of multiple speakers speaking simultaneously from multi-channel input. The system is composed of a monaural masking network, a multi-source neural beamformer, and a multi-output speech recognition model, which are jointly optimized only via an automatic speech recognition (ASR) criterion. The award was received by lead author Xuankai Chang during the conference, which was held in Sentosa, Singapore from December 14-18, 2019.
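The three-stage structure described above can be sketched as a data-flow skeleton. The sketch below is a toy illustration with stand-in components (fixed projections instead of a learned masking network, a mask-weighted channel average instead of a neural MVDR beamformer, an argmax instead of an ASR decoder); it shows only the shape of the mask → beamform → recognize pipeline, not MERL's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def masking_network(mixture_stft):
    """Toy monaural masking: softmax over two fixed projections per T-F bin.
    mixture_stft: (channels, frames, freq) complex array."""
    mag = np.abs(mixture_stft[0])                      # reference-channel magnitude
    logits = np.stack([mag, -mag])                     # (speakers, frames, freq)
    e = np.exp(logits - logits.max(axis=0))
    return e / e.sum(axis=0)                           # masks sum to 1 per bin

def beamformer(mixture_stft, masks):
    """Mask-weighted channel average per speaker (stand-in for a neural
    multi-source beamformer)."""
    avg = mixture_stft.mean(axis=0)                    # (frames, freq)
    return masks * avg                                 # (speakers, frames, freq)

def recognizer(speaker_stft):
    """Stand-in ASR head: one 'token' id per frame."""
    return np.argmax(np.abs(speaker_stft), axis=-1)    # (frames,)

C, T, F = 4, 10, 16                                    # channels, frames, freq bins
mixture = rng.standard_normal((C, T, F)) + 1j * rng.standard_normal((C, T, F))
masks = masking_network(mixture)
separated = beamformer(mixture, masks)
tokens = [recognizer(s) for s in separated]            # one sequence per speaker
```

In the real system all three stages are neural networks trained jointly, with the gradient of the ASR loss flowing back through the beamformer into the masking network.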
    •  AWARD   MERL Researchers win Best Paper Award at ICCV 2019 Workshop on Statistical Deep Learning in Computer Vision
      Date: October 27, 2019
      Awarded to: Abhinav Kumar, Tim K. Marks, Wenxuan Mou, Chen Feng, Xiaoming Liu
      MERL Contact: Tim Marks
      Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
      Brief
      • MERL researcher Tim Marks, former MERL interns Abhinav Kumar and Wenxuan Mou, and MERL consultants Professor Chen Feng (NYU) and Professor Xiaoming Liu (MSU) received the Best Oral Paper Award at the IEEE/CVF International Conference on Computer Vision (ICCV) 2019 Workshop on Statistical Deep Learning in Computer Vision (SDL-CV) held in Seoul, Korea. Their paper, entitled "UGLLI Face Alignment: Estimating Uncertainty with Gaussian Log-Likelihood Loss," describes a method which, given an image of a face, estimates not only the locations of facial landmarks but also the uncertainty of each landmark location estimate.
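The Gaussian log-likelihood loss at the heart of such uncertainty-aware landmark estimation can be written down directly. The sketch below (hypothetical helper name, not the authors' code) scores a predicted landmark mean and covariance against a ground-truth location with the 2D Gaussian negative log-likelihood; a confident (small-covariance) prediction at the right place scores lower, i.e. better, than an uncertain one.

```python
import numpy as np

def gaussian_nll(pred_mean, pred_cov, target):
    """Negative log-likelihood of the target under a 2D Gaussian prediction.
    pred_mean: (2,) predicted landmark location; pred_cov: (2, 2) SPD
    predicted covariance; target: (2,) ground-truth location."""
    d = target - pred_mean
    cov_inv = np.linalg.inv(pred_cov)
    _, logdet = np.linalg.slogdet(pred_cov)            # stable log-determinant
    return 0.5 * (d @ cov_inv @ d + logdet + 2.0 * np.log(2.0 * np.pi))

mu = np.array([10.0, 12.0])
cov = np.array([[4.0, 0.0], [0.0, 1.0]])
nll_confident = gaussian_nll(mu, cov, mu)              # small covariance, on target
nll_uncertain = gaussian_nll(mu, 4.0 * cov, mu)        # larger covariance, on target
```

Minimizing this loss over a training set pushes the network to place the mean on the landmark and to widen the covariance exactly where its localization is genuinely unreliable.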
    •  AWARD   MERL Researcher Devesh Jha Wins the Rudolf Kalman Best Paper Award 2019
      Date: October 10, 2019
      Awarded to: Devesh Jha, Nurali Virani, Zhenyuan Yuan, Ishana Shekhawat, and Asok Ray
      MERL Contact: Devesh Jha
      Research Areas: Artificial Intelligence, Control, Data Analytics, Machine Learning, Robotics
      Brief
      • MERL researcher Devesh Jha has won the Rudolf Kalman Best Paper Award 2019 for the paper entitled "Imitation of Demonstrations Using Bayesian Filtering With Nonparametric Data-Driven Models". The paper, published in March 2018 in a Special Commemorative Issue for Rudolf E. Kalman of the ASME Journal of Dynamic Systems, Measurement, and Control (JDSMC), uses Bayesian filtering for imitation learning in hidden-mode hybrid systems. The award is given annually by the Dynamic Systems and Control Division of ASME to the authors of the best paper published in the journal during the preceding year.
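The mode-inference machinery behind such an approach is the standard discrete Bayes filter. The sketch below shows the generic predict-update recursion over hidden modes with illustrative numbers (the transition matrix and observation likelihoods are made up for this example, not taken from the paper).

```python
import numpy as np

def bayes_filter_step(belief, transition, likelihood):
    """One predict-update step of a discrete Bayes filter.
    belief: (M,) prior over modes; transition: (M, M) row-stochastic
    mode-transition matrix; likelihood: (M,) p(observation | mode)."""
    predicted = belief @ transition          # predict: propagate through dynamics
    posterior = predicted * likelihood       # update: weight by observation model
    return posterior / posterior.sum()       # normalize to a distribution

T_mat = np.array([[0.9, 0.1],                # modes tend to persist
                  [0.2, 0.8]])
belief = np.array([0.5, 0.5])                # uninformative prior over 2 modes
for lik in ([0.9, 0.1], [0.8, 0.2]):         # observations that favor mode 0
    belief = bayes_filter_step(belief, T_mat, np.array(lik))
```

After two observations that favor mode 0, the belief concentrates on that mode; in the imitation-learning setting, the inferred mode then selects which data-driven model drives the imitated behavior.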

    See All Awards for Artificial Intelligence
  • News & Events

    •  NEWS   MERL's Scene-Aware Interaction Technology Featured in Mitsubishi Electric Corporation Press Release
      Date: July 22, 2020
      Where: Tokyo, Japan
      MERL Contacts: Siheng Chen; Anoop Cherian; Bret Harsham; Chiori Hori; Takaaki Hori; Jonathan Le Roux; Tim Marks; Alan Sullivan; Anthony Vetro
      Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
      Brief
      • Mitsubishi Electric Corporation announced that the company has developed what it believes to be the world’s first technology capable of highly natural and intuitive interaction with humans based on a scene-aware capability to translate multimodal sensing information into natural language.

        The novel technology, Scene-Aware Interaction, incorporates Mitsubishi Electric’s proprietary Maisart® compact AI technology to analyze multimodal sensing information for highly natural and intuitive interaction with humans through context-dependent generation of natural language. The technology recognizes contextual objects and events based on multimodal sensing information, such as images and video captured with cameras, audio information recorded with microphones, and localization information measured with LiDAR.

        Scene-Aware Interaction for car navigation, one target application, will provide drivers with intuitive route guidance. The technology is also expected to have applicability to human-machine interfaces for in-vehicle infotainment, interaction with service robots in building and factory automation systems, systems that monitor the health and well-being of people, surveillance systems that interpret complex scenes for humans and encourage social distancing, support for touchless operation of equipment in public areas, and much more. The technology is based on recent research by MERL's Speech & Audio and Computer Vision groups.


        Demonstration Video:

        Link: Mitsubishi Electric Corporation Press Release
    •  NEWS   Jonathan Le Roux gives Plenary Lecture at the JSALT 2020 Summer Workshop
      Date: July 10, 2020
      Where: Baltimore, MD (virtual)
      MERL Contact: Jonathan Le Roux
      Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
      Brief
      • MERL Senior Principal Research Scientist and Speech and Audio Senior Team Leader Jonathan Le Roux was invited by the Center for Language and Speech Processing at Johns Hopkins University to give a plenary lecture at the 2020 Frederick Jelinek Memorial Summer Workshop on Speech and Language Technology (JSALT). The talk, entitled "Deep Learning for Multifarious Speech Processing: Tackling Multiple Speakers, Microphones, and Languages", presented an overview of deep learning techniques developed at MERL towards the goal of cracking the Tower of Babel version of the cocktail party problem, that is, separating and/or recognizing the speech of multiple unknown speakers speaking simultaneously in multiple languages, in both single-channel and multi-channel scenarios: from deep clustering to chimera networks, phasebook and friends, and from seamless ASR to MIMO-Speech and Transformer-based multi-speaker ASR.

        JSALT 2020 is the seventh in a series of six-week-long research workshops on Machine Learning for Speech, Language, and Computer Vision Technology. A continuation of the well-known Johns Hopkins University summer workshops, these workshops bring together diverse "dream teams" of leading professionals, graduate students, and undergraduates in a truly cooperative, intensive, and substantive effort to advance the state of the science. MERL researchers led such teams in the JSALT 2015 workshop, on "Far-Field Speech Enhancement and Recognition in Mismatched Settings", and the JSALT 2018 workshop, on "Multi-lingual End-to-End Speech Recognition for Incomplete Data".

    See All News & Events for Artificial Intelligence
  • Research Highlights

  • Internships

    • SP1424: Advanced computational sensing technologies

      The Computational Sensing team at MERL is seeking motivated and qualified individuals to develop computational imaging algorithms for a variety of sensing applications. Ideal candidates are Ph.D. students with a solid background and publication record in any of the following or related areas: imaging inverse problems, learning for inverse problems, large-scale optimization, blind inverse scattering, radar/lidar/sonar imaging, or wave-based inversion. Experience with experimentally measured data is desirable. Interns are expected to publish the results produced during the internship. The anticipated duration is 3-6 months, with a flexible start date.

    • SP1419: Simulation of Multimodal Sensors

      MERL is seeking a motivated intern to assist in generating simulated multimodal data for machine learning applications. The project involves integrating several existing software components to generate optical and radar data in a variety of sensing scenarios and executing the simulations under a variety of conditions. The ideal candidate should have experience with C++, Python, and scripting methods. Knowledge of or experience with Blender, computer graphics, or computer vision is preferred but not required. The project duration is flexible, in the range of 1-2 months. The intern may work part-time or full-time and may start immediately.

    • MD1441: Advanced Phased Array Transceiver

      MERL is looking for a highly motivated and qualified individual to join our internship program on advanced phased-array research. The ideal candidate should be a senior Ph.D. student with rich experience in beamforming technologies. Knowledge of wireless communications, transceiver architecture, and digital signal processing is required, as are FPGA and/or MATLAB programming skills. Knowledge of RF circuits is a plus. The duration is 3-6 months, with a flexible start date.


    See All Internships for Artificial Intelligence
  • Recent Publications

    •  Han, M., Ozdenizci, O., Wang, Y., Koike-Akino, T., Erdogmus, D., "Disentangled Adversarial Transfer Learning for Physiological Biosignals", International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), July 2020.
      BibTeX TR2020-109 PDF Video
      • @inproceedings{Han2020jul,
      • author = {Han, Mo and Ozdenizci, Ozan and Wang, Ye and Koike-Akino, Toshiaki and Erdogmus, Deniz},
      • title = {Disentangled Adversarial Transfer Learning for Physiological Biosignals},
      • booktitle = {International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)},
      • year = 2020,
      • month = jul,
      • url = {https://www.merl.com/publications/TR2020-109}
      • }
    •  Seetharaman, P., Wichern, G., Le Roux, J., Pardo, B., "Bootstrapping Unsupervised Deep Music Separation from Primitive Auditory Grouping Principles", ICML 2020 Workshop on Self-supervision in Audio and Speech, July 2020.
      BibTeX TR2020-111 PDF
      • @inproceedings{Seetharaman2020jul,
      • author = {Seetharaman, Prem and Wichern, Gordon and Le Roux, Jonathan and Pardo, Bryan},
      • title = {Bootstrapping Unsupervised Deep Music Separation from Primitive Auditory Grouping Principles},
      • booktitle = {ICML 2020 Workshop on Self-supervision in Audio and Speech},
      • year = 2020,
      • month = jul,
      • url = {https://www.merl.com/publications/TR2020-111}
      • }
    •  Cherian, A., Aeron, S., "Representation Learning via Adversarially-Contrastive Optimal Transport", International Conference on Machine Learning (ICML), July 2020.
      BibTeX TR2020-093 PDF
      • @inproceedings{Cherian2020jul,
      • author = {Cherian, Anoop and Aeron, Shuchin},
      • title = {Representation Learning via Adversarially-Contrastive Optimal Transport},
      • booktitle = {International Conference on Machine Learning (ICML)},
      • year = 2020,
      • month = jul,
      • url = {https://www.merl.com/publications/TR2020-093}
      • }
    •  Koike-Akino, T., Wang, Y., "Stochastic Bottleneck: Rateless Auto-Encoder for Flexible Dimensionality Reduction", IEEE International Symposium on Information Theory (ISIT), June 2020.
      BibTeX TR2020-075 PDF Video Presentation
      • @inproceedings{Koike-Akino2020jun,
      • author = {Koike-Akino, Toshiaki and Wang, Ye},
      • title = {Stochastic Bottleneck: Rateless Auto-Encoder for Flexible Dimensionality Reduction},
      • booktitle = {IEEE International Symposium on Information Theory (ISIT)},
      • year = 2020,
      • month = jun,
      • url = {https://www.merl.com/publications/TR2020-075}
      • }
    •  Hu, Y., Chen, S., Zhang, Y., Gu, X., "Collaborative Motion Prediction via Neural Motion Message Passing", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
      BibTeX TR2020-072 PDF
      • @inproceedings{Hu2020jun,
      • author = {Hu, Yue and Chen, Siheng and Zhang, Ya and Gu, Xiao},
      • title = {Collaborative Motion Prediction via Neural Motion Message Passing},
      • booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
      • year = 2020,
      • month = jun,
      • url = {https://www.merl.com/publications/TR2020-072}
      • }
    •  Li, M., Chen, S., Zhao, Y., Zhang, Y., Wang, Y., Tian, Q., "Dynamic Multiscale Graph Neural Networks for 3D Skeleton-Based Human Motion Prediction", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
      BibTeX TR2020-073 PDF
      • @inproceedings{Li2020jun,
      • author = {Li, Maosen and Chen, Siheng and Zhao, Yangheng and Zhang, Ya and Wang, Yanfeng and Tian, Qi},
      • title = {Dynamic Multiscale Graph Neural Networks for 3D Skeleton-Based Human Motion Prediction},
      • booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
      • year = 2020,
      • month = jun,
      • url = {https://www.merl.com/publications/TR2020-073}
      • }
    •  Wang, Y., Koike-Akino, T., "Learning to Modulate for Non-coherent MIMO", IEEE International Conference on Communications (ICC), June 2020.
      BibTeX TR2020-071 PDF Video Presentation
      • @inproceedings{Wang2020jun,
      • author = {Wang, Ye and Koike-Akino, Toshiaki},
      • title = {Learning to Modulate for Non-coherent MIMO},
      • booktitle = {IEEE International Conference on Communications (ICC)},
      • year = 2020,
      • month = jun,
      • url = {https://www.merl.com/publications/TR2020-071}
      • }
    •  Kumar, A., Marks, T., Mou, W., Wang, Y., Cherian, A., Jones, M.J., Liu, X., Koike-Akino, T., Feng, C., "LUVLi Face Alignment: Estimating Landmarks’ Location, Uncertainty, and Visibility Likelihood", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2020.
      BibTeX TR2020-067 PDF Video Data
      • @inproceedings{Kumar2020jun,
      • author = {Kumar, Abhinav and Marks, Tim and Mou, Wenxuan and Wang, Ye and Cherian, Anoop and Jones, Michael J. and Liu, Xiaoming and Koike-Akino, Toshiaki and Feng, Chen},
      • title = {LUVLi Face Alignment: Estimating Landmarks’ Location, Uncertainty, and Visibility Likelihood},
      • booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
      • year = 2020,
      • month = jun,
      • url = {https://www.merl.com/publications/TR2020-067}
      • }
    See All Publications for Artificial Intelligence
  • Software Downloads