Artificial Intelligence

Making machines smarter for improved safety, efficiency and comfort.

Our AI research encompasses advances in computer vision, speech and audio processing, and data analytics. Key research themes include improved perception based on machine learning techniques, learning control policies through model-based reinforcement learning, and cognition and reasoning based on learned semantic representations. We apply our work to a broad range of automotive and robotics applications, as well as building and home systems.

  • Researchers

  • Awards

    •  AWARD   Best Poster Award and Best Video Award at the International Society for Music Information Retrieval Conference (ISMIR) 2020
      Date: October 15, 2020
      Awarded to: Ethan Manilow, Gordon Wichern, Jonathan Le Roux
      MERL Contacts: Jonathan Le Roux; Gordon Wichern
      Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
      Brief
      • Former MERL intern Ethan Manilow and MERL researchers Gordon Wichern and Jonathan Le Roux won the Best Poster Award and the Best Video Award at the 2020 International Society for Music Information Retrieval Conference (ISMIR 2020) for the paper "Hierarchical Musical Instrument Separation". The conference was held October 11-14 in a virtual format. Both awards were determined by popular vote among the conference attendees.

        The paper proposes a new method for isolating individual sounds in an audio mixture that accounts for the hierarchical relationship between sound sources. Many sounds of interest are hierarchical in nature: during a music performance, for example, a single hi-hat note is one of many hi-hat notes, which together form one part of a drum kit, itself one of many instruments in a band, which might be playing in a bar alongside other sounds. Inspired by this, the paper reframes audio source separation as a hierarchical problem, combining similar sounds at some levels of the hierarchy while separating them at others, and shows on a musical instrument separation task that the hierarchical approach outperforms non-hierarchical models while requiring less training data. The paper, poster, and video can be found on the paper's page on the ISMIR website.
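        As a rough illustration of the hierarchical idea (not the authors' implementation), the sketch below builds a PyTorch mask-based separator that predicts one estimate per level of an assumed three-level hierarchy (e.g., hi-hat, drum kit, full accompaniment) from the same mixture, with a loss applied at every level; the network sizes, level count, and L1 training loss are assumptions made for this example.

        # Minimal sketch (assumptions noted above): one mask head per hierarchy level,
        # trained so each level reconstructs its reference submix.
        import torch
        import torch.nn as nn

        class HierarchicalSeparator(nn.Module):
            def __init__(self, n_freq=513, hidden=256, n_levels=3):
                super().__init__()
                self.rnn = nn.LSTM(n_freq, hidden, num_layers=2,
                                   batch_first=True, bidirectional=True)
                # One mask head per level (e.g., hi-hat -> drum kit -> accompaniment).
                self.mask_heads = nn.ModuleList(
                    [nn.Linear(2 * hidden, n_freq) for _ in range(n_levels)])

            def forward(self, mix_mag):  # mix_mag: (batch, frames, n_freq) magnitudes
                feats, _ = self.rnn(mix_mag)
                # Sigmoid masks in [0, 1] applied to the mixture at every level.
                return [torch.sigmoid(head(feats)) * mix_mag for head in self.mask_heads]

        def hierarchical_loss(estimates, references):
            """Sum of per-level L1 losses between estimated and reference magnitudes."""
            return sum(torch.nn.functional.l1_loss(est, ref)
                       for est, ref in zip(estimates, references))

        model = HierarchicalSeparator()
        mix = torch.rand(4, 100, 513)                       # toy mixture magnitudes
        refs = [torch.rand(4, 100, 513) for _ in range(3)]  # toy per-level references
        loss = hierarchical_loss(model(mix), refs)
        loss.backward()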
    •  AWARD   Best Paper Award at the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019
      Date: December 18, 2019
      Awarded to: Xuankai Chang, Wangyou Zhang, Yanmin Qian, Jonathan Le Roux, Shinji Watanabe
      MERL Contact: Jonathan Le Roux
      Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
      Brief
      • MERL researcher Jonathan Le Roux and co-authors Xuankai Chang, Shinji Watanabe (Johns Hopkins University), Wangyou Zhang, and Yanmin Qian (Shanghai Jiao Tong University) won the Best Paper Award at the 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019) for the paper "MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition". MIMO-Speech is a fully neural end-to-end framework that can transcribe the speech of multiple speakers speaking simultaneously from multi-channel input. The system comprises a monaural masking network, a multi-source neural beamformer, and a multi-output speech recognition model, all jointly optimized solely through an automatic speech recognition (ASR) criterion. The award was received by lead author Xuankai Chang during the conference, which was held in Sentosa, Singapore, from December 14-18, 2019.
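        The heavily simplified sketch below (a structural stand-in, not the MIMO-Speech code) shows how such a pipeline can be composed so that a single ASR criterion trains every stage: here the beamformer is replaced by a toy mask-weighted channel average rather than the paper's MVDR neural beamformer, the recognizer is a tiny CTC model, and all shapes, sizes, and labels are placeholders.

        import torch
        import torch.nn as nn

        class MaskNet(nn.Module):
            """Monaural masking network: one mask per speaker for each channel."""
            def __init__(self, n_freq=257, n_spk=2):
                super().__init__()
                self.n_spk = n_spk
                self.net = nn.Sequential(nn.Linear(n_freq, 256), nn.ReLU(),
                                         nn.Linear(256, n_freq * n_spk), nn.Sigmoid())

            def forward(self, mag):  # mag: (batch, channels, frames, n_freq)
                b, c, t, f = mag.shape
                masks = self.net(mag).view(b, c, t, self.n_spk, f)
                return masks.permute(0, 3, 1, 2, 4)  # (batch, spk, chan, frames, n_freq)

        class TinyASR(nn.Module):
            """Small CTC recognizer standing in for the multi-output ASR back-end."""
            def __init__(self, n_freq=257, vocab=30):
                super().__init__()
                self.rnn = nn.LSTM(n_freq, 128, batch_first=True)
                self.out = nn.Linear(128, vocab)

            def forward(self, x):  # x: (batch, frames, n_freq)
                h, _ = self.rnn(x)
                return self.out(h).log_softmax(-1)

        masknet, asr, ctc = MaskNet(), TinyASR(), nn.CTCLoss()
        mix = torch.rand(2, 4, 100, 257)        # toy 4-channel magnitude spectrograms
        labels = torch.randint(1, 30, (2, 20))  # toy transcript (reused for both speakers)

        masks = masknet(mix)
        # Toy "beamformer": mask each channel, then average channels per speaker.
        enhanced = (masks * mix.unsqueeze(1)).mean(dim=2)  # (batch, spk, frames, n_freq)
        loss = sum(ctc(asr(enhanced[:, s]).transpose(0, 1), labels,
                       torch.full((2,), 100), torch.full((2,), 20))
                   for s in range(enhanced.shape[1]))
        loss.backward()  # the ASR loss alone drives gradients into the masking network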
    •  AWARD   MERL Researchers win Best Oral Paper Award at ICCV 2019 Workshop on Statistical Deep Learning in Computer Vision
      Date: October 27, 2019
      Awarded to: Abhinav Kumar, Tim K. Marks, Wenxuan Mou, Chen Feng, Xiaoming Liu
      MERL Contact: Tim Marks
      Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
      Brief
      • MERL researcher Tim Marks, former MERL interns Abhinav Kumar and Wenxuan Mou, and MERL consultants Professor Chen Feng (NYU) and Professor Xiaoming Liu (MSU) received the Best Oral Paper Award at the IEEE/CVF International Conference on Computer Vision (ICCV) 2019 Workshop on Statistical Deep Learning in Computer Vision (SDL-CV) held in Seoul, Korea. Their paper, entitled "UGLLI Face Alignment: Estimating Uncertainty with Gaussian Log-Likelihood Loss," describes a method which, given an image of a face, estimates not only the locations of facial landmarks but also the uncertainty of each landmark location estimate.
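        As a loose illustration of the loss idea (not the UGLLI implementation), the snippet below evaluates a 2D Gaussian negative log-likelihood for each landmark given a predicted mean and a Cholesky-parameterized covariance; the parameterization, tensor shapes, and the 68-landmark count are assumptions for this example, and the constant log(2*pi) term is dropped.

        import torch

        def landmark_gaussian_nll(mean, chol_diag, chol_off, target):
            """Per-landmark 2D Gaussian negative log-likelihood (up to a constant).

            mean:      (batch, n_landmarks, 2) predicted landmark locations
            chol_diag: (batch, n_landmarks, 2) positive diagonal of Cholesky factor L
            chol_off:  (batch, n_landmarks)    below-diagonal entry of L
            target:    (batch, n_landmarks, 2) ground-truth landmark locations
            """
            # Lower-triangular Cholesky factors L, so that Sigma = L @ L^T.
            L = torch.zeros(*mean.shape[:-1], 2, 2)
            L[..., 0, 0] = chol_diag[..., 0]
            L[..., 1, 1] = chol_diag[..., 1]
            L[..., 1, 0] = chol_off
            diff = (target - mean).unsqueeze(-1)            # (..., 2, 1)
            # Solve L z = diff, so that diff^T Sigma^{-1} diff = ||z||^2.
            z = torch.linalg.solve_triangular(L, diff, upper=False)
            mahalanobis = (z ** 2).sum(dim=(-2, -1))
            log_det = 2 * torch.log(chol_diag).sum(-1)      # log|Sigma|
            return (0.5 * (mahalanobis + log_det)).mean()

        b, n = 4, 68                                        # 68 landmarks is an assumption
        mean = torch.rand(b, n, 2, requires_grad=True)      # network outputs would go here
        chol_diag = torch.rand(b, n, 2) + 0.1               # keep the diagonal positive
        chol_off = torch.rand(b, n)
        target = torch.rand(b, n, 2)
        landmark_gaussian_nll(mean, chol_diag, chol_off, target).backward()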

    See All Awards for Artificial Intelligence
  • News & Events


    See All News & Events for Artificial Intelligence
  • Research Highlights

  • Internships

    • CA1400: Autonomous Vehicle Planning and Control

      The Control and Dynamical Systems (CD) group at MERL is seeking highly motivated interns at different levels of expertise to conduct research on planning and control for autonomous vehicles. The research domain includes algorithms for path planning, vehicle control, high-level decision making, sensor-based navigation, and driver-vehicle interaction. PhD students will be considered for algorithm development, analysis, and property proving. Master's students will be considered for development and implementation on a scaled robotic test bench for autonomous vehicles. For algorithm development and analysis, a strong background in one or more of the following is highly desirable: sampling-based planning methods, particle filtering, model predictive control, reachability methods, and formal methods and abstractions of dynamical systems, along with experience implementing them in Matlab/Python/C++. For algorithm implementation, working knowledge of Matlab, C++, and ROS is required, and background in some of the above-mentioned methods is a plus. The expected duration of the internship is 3-6 months.

    • SP1468: Quantum Machine Learning

      MERL is seeking an intern to work on research in quantum machine learning (QML). The ideal candidate is an experienced PhD student or post-graduate researcher with an excellent background in quantum computing, deep learning, and signal processing. Proficient programming skills with PyTorch, Qiskit, and PennyLane are additional assets for this position. Given the current situation with the COVID-19 pandemic, this internship will be done remotely from where you live. Also note that we wish to fill this position as soon as possible and expect the candidate to be available during this fall/winter.

    • SP1460: Advanced Vehicular Technologies

      MERL is seeking a highly motivated, qualified intern to collaborate with the Signal Processing group and the Control for Autonomy team in developing technologies for Connected Automated Vehicles. The intern will conduct research on collaborative learning between infrastructure and vehicles, developing learning-based technologies for vehicle coordination, estimation, and GNSS-based localization that exploit data and computation sharing between vehicle and infrastructure. Candidates should have knowledge of machine learning, connected vehicles, and V2X communications. Knowledge of one or more traffic and/or multi-vehicle simulators (SUMO, Vissim, etc.) and of GNSS is a plus. Candidates in their junior or senior years of a Ph.D. program are encouraged to apply. The expected duration of the internship is 3-6 months, with a start date in September/October 2020.


    See All Internships for Artificial Intelligence
  • Recent Publications

    •  Hori, T., Moritz, N., Hori, C., Le Roux, J., "Transformer-based Long-context End-to-end Speech Recognition", Annual Conference of the International Speech Communication Association (Interspeech), October 2020.
      BibTeX TR2020-139 PDF
      @inproceedings{Hori2020oct,
        author = {Hori, Takaaki and Moritz, Niko and Hori, Chiori and Le Roux, Jonathan},
        title = {Transformer-based Long-context End-to-end Speech Recognition},
        booktitle = {Annual Conference of the International Speech Communication Association (Interspeech)},
        year = 2020,
        month = oct,
        url = {https://www.merl.com/publications/TR2020-139}
      }
    •  Jayashankar, T., Le Roux, J., Moulin, P., "Detecting Audio Attacks on ASR Systems with Dropout Uncertainty", Annual Conference of the International Speech Communication Association (Interspeech), October 2020.
      BibTeX TR2020-137 PDF
      @inproceedings{Jayashankar2020oct,
        author = {Jayashankar, Tejas and Le Roux, Jonathan and Moulin, Pierre},
        title = {Detecting Audio Attacks on ASR Systems with Dropout Uncertainty},
        booktitle = {Annual Conference of the International Speech Communication Association (Interspeech)},
        year = 2020,
        month = oct,
        url = {https://www.merl.com/publications/TR2020-137}
      }
    •  Moritz, N., Wichern, G., Hori, T., Le Roux, J., "All-in-One Transformer: Unifying Speech Recognition, Audio Tagging, and Event Detection", Annual Conference of the International Speech Communication Association (Interspeech), October 2020.
      BibTeX TR2020-138 PDF
      @inproceedings{Moritz2020oct,
        author = {Moritz, Niko and Wichern, Gordon and Hori, Takaaki and Le Roux, Jonathan},
        title = {All-in-One Transformer: Unifying Speech Recognition, Audio Tagging, and Event Detection},
        booktitle = {Annual Conference of the International Speech Communication Association (Interspeech)},
        year = 2020,
        month = oct,
        url = {https://www.merl.com/publications/TR2020-138}
      }
    •  Manilow, E., Wichern, G., Le Roux, J., "Hierarchical Musical Instrument Separation", International Society for Music Information Retrieval (ISMIR) Conference, October 2020.
      BibTeX TR2020-136 PDF
      @inproceedings{Manilow2020oct,
        author = {Manilow, Ethan and Wichern, Gordon and Le Roux, Jonathan},
        title = {Hierarchical Musical Instrument Separation},
        booktitle = {International Society for Music Information Retrieval (ISMIR) Conference},
        year = 2020,
        month = oct,
        url = {https://www.merl.com/publications/TR2020-136}
      }
    •  Tang, Y., Kojima, K., Koike-Akino, T., Wang, Y., Wu, P., TaherSima, M., Jha, D., Parsons, K., Qi, M., "Generative Deep Learning Model for Inverse Design of Integrated Nanophotonic Devices", Laser & Photonics Reviews, DOI: 10.1002/lpor.202000287, Vol. 2020, pp. 2000287, October 2020.
      BibTeX TR2020-135 PDF
      @article{Tang2020oct,
        author = {Tang, Yingheng and Kojima, Keisuke and Koike-Akino, Toshiaki and Wang, Ye and Wu, Pengxiang and TaherSima, Mohammad and Jha, Devesh and Parsons, Kieran and Qi, Minghao},
        title = {Generative Deep Learning Model for Inverse Design of Integrated Nanophotonic Devices},
        journal = {Laser \& Photonics Reviews},
        year = 2020,
        volume = 2020,
        pages = 2000287,
        month = oct,
        doi = {10.1002/lpor.202000287},
        url = {https://www.merl.com/publications/TR2020-135}
      }
    •  Seetharaman, P., Wichern, G., Pardo, B., Le Roux, J., "Autoclip: Adaptive Gradient Clipping For Source Separation Networks", IEEE International Workshop on Machine Learning for Signal Processing (MLSP), September 2020.
      BibTeX TR2020-132 PDF
      @inproceedings{Seetharaman2020sep,
        author = {Seetharaman, Prem and Wichern, Gordon and Pardo, Bryan and Le Roux, Jonathan},
        title = {Autoclip: Adaptive Gradient Clipping For Source Separation Networks},
        booktitle = {IEEE International Workshop on Machine Learning for Signal Processing (MLSP)},
        year = 2020,
        month = sep,
        url = {https://www.merl.com/publications/TR2020-132}
      }
    •  Kojima, K., Tang, Y., Koike-Akino, T., Wang, Y., Jha, D., Parsons, K., TaherSima, M., Sang, F., Klamkin, J., Qi, M., "Inverse Design of Nanophotonic Devices using Deep Neural Networks", Asia Communications and Photonics Conference (ACP), September 2020.
      BibTeX TR2020-130 PDF
      @inproceedings{Kojima2020sep,
        author = {Kojima, Keisuke and Tang, Yingheng and Koike-Akino, Toshiaki and Wang, Ye and Jha, Devesh and Parsons, Kieran and TaherSima, Mohammad and Sang, Fengqiao and Klamkin, Jonathan and Qi, Minghao},
        title = {Inverse Design of Nanophotonic Devices using Deep Neural Networks},
        booktitle = {Asia Communications and Photonics Conference (ACP)},
        year = 2020,
        month = sep,
        url = {https://www.merl.com/publications/TR2020-130}
      }
    •  Han, M., Ozdenizci, O., Wang, Y., Koike-Akino, T., Erdogmus, D., "Disentangled Adversarial Autoencoder for Subject-Invariant Physiological Feature Extraction", IEEE Signal Processing Letters, DOI: 10.1109/LSP.2020.3020215, Vol. 27, pp. 1565-1569, September 2020.
      BibTeX TR2020-128 PDF
      @article{Han2020sep,
        author = {Han, Mo and Ozdenizci, Ozan and Wang, Ye and Koike-Akino, Toshiaki and Erdogmus, Deniz},
        title = {Disentangled Adversarial Autoencoder for Subject-Invariant Physiological Feature Extraction},
        journal = {IEEE Signal Processing Letters},
        year = 2020,
        volume = 27,
        pages = {1565--1569},
        month = sep,
        doi = {10.1109/LSP.2020.3020215},
        issn = {1558-2361},
        url = {https://www.merl.com/publications/TR2020-128}
      }
    See All Publications for Artificial Intelligence
  • Software Downloads