Artificial Intelligence
Making machines smarter for improved safety, efficiency and comfort.
Our AI research encompasses advances in computer vision, speech and audio processing, as well as data analytics. Key research themes include improved perception based on machine learning techniques, learning control policies through model-based reinforcement learning, as well as cognition and reasoning based on learned semantic representations. We apply our work to a broad range of automotive and robotics applications, as well as building and home systems.
-
Researchers
Jonathan Le Roux
Toshiaki Koike-Akino
Ye Wang
Gordon Wichern
Anoop Cherian
Tim K. Marks
Chiori Hori
Michael J. Jones
Daniel N. Nikovski
Kieran Parsons
Devesh K. Jha
Philip V. Orlik
Petros T. Boufounos
Matthew Brand
Suhas Lohit
Hassan Mansour
Diego Romeres
Siddarth Jain
William S. Yerazunis
Pu (Perry) Wang
Mouhacine Benosman
Francois Germain
Arvind Raghunathan
Radu Corcodel
Kuan-Chuan Peng
Hongbo Sun
Yebin Wang
Jianlin Guo
Chungwei Lin
Bingnan Wang
Stefano Di Cairano
Yanting Ma
Anthony Vetro
Jinyun Zhang
Jose Amaya
Karl Berntorp
Ankush Chakrabarty
Vedang M. Deshpande
Marcus Greiff
Dehong Liu
Wataru Tsujita
Abraham P. Vinod
Jing Liu
Zexu Pan
-
Awards
-
AWARD Joint University of Padua-MERL team wins Challenge 'AI Olympics With RealAIGym'
Date: August 25, 2023
Awarded to: Alberto Dalla Libera, Niccolo' Turcato, Giulio Giacomuzzo, Ruggero Carli, Diego Romeres
MERL Contact: Diego Romeres
Research Areas: Artificial Intelligence, Machine Learning, Robotics
Brief: A joint team from the University of Padua and MERL ranked 1st in the IJCAI 2023 challenge "AI Olympics With RealAIGym: Is AI Ready for Athletic Intelligence in the Real World?". The team consisted of MERL researcher Diego Romeres and, from the University of Padua (UniPD), Alberto Dalla Libera, Ph.D., Ph.D. candidates Niccolò Turcato and Giulio Giacomuzzo, and Prof. Ruggero Carli.
The International Joint Conference on Artificial Intelligence (IJCAI) is a premier gathering for AI researchers and organizes several competitions. This year the competition CC7 "AI Olympics With RealAIGym: Is AI Ready for Athletic Intelligence in the Real World?" consisted of two stages: simulation and real-robot experiments on two under-actuated robotic systems. The two robotic systems were treated as separate tracks, and one final winner was selected for each track based on specific performance criteria in the control tasks.
The UniPD-MERL team competed and won in both tracks. The team's system relied heavily on a model-based reinforcement learning algorithm called MC-PILCO, which we recently published in the journal IEEE Transactions on Robotics.
-
AWARD MERL Intern and Researchers Win ICASSP 2023 Best Student Paper Award
Date: June 9, 2023
Awarded to: Darius Petermann, Gordon Wichern, Aswin Subramanian, Jonathan Le Roux
MERL Contacts: Jonathan Le Roux; Gordon Wichern
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief: Former MERL intern Darius Petermann (Ph.D. Candidate at Indiana University) received a Best Student Paper Award at the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023) for the paper "Hyperbolic Audio Source Separation", co-authored with MERL researchers Gordon Wichern and Jonathan Le Roux, and former MERL researcher Aswin Subramanian. The paper presents work performed during Darius's internship at MERL in the summer of 2022. It introduces a framework for audio source separation using embeddings on a hyperbolic manifold that compactly represent the hierarchical relationship between sound sources and time-frequency features. The code associated with the paper is publicly available at https://github.com/merlresearch/hyper-unmix.
ICASSP is the flagship conference of the IEEE Signal Processing Society (SPS). ICASSP 2023 was held on the Greek island of Rhodes from June 4 to June 10, 2023, and it was the largest ICASSP in history, with more than 4000 participants, 6128 submitted papers and 2709 accepted papers. Darius's paper was first recognized as one of the Top 3% of all papers accepted at the conference, before receiving one of only 5 Best Student Paper Awards during the closing ceremony.
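As a rough illustration of why hyperbolic embeddings suit hierarchical relationships, the sketch below computes the geodesic distance in the Poincaré ball, one common model of hyperbolic space. It is not taken from the paper or from the hyper-unmix repository; the function name `poincare_distance` and the sample points are hypothetical, and the choice of the Poincaré ball model is an assumption made for this example.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball.

    Standard Poincare-ball formula (an illustrative assumption, not the
    paper's code):
        d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    denom = (1.0 - sq_norm_u) * (1.0 - sq_norm_v)
    return math.acosh(1.0 + 2.0 * sq_diff / denom)


# Hypothetical embeddings: a generic "parent" near the origin and two
# specific "children" near the boundary of the ball.
parent = (0.0, 0.0)
child_a = (0.9, 0.0)
child_b = (0.0, 0.9)

print(poincare_distance(parent, child_a))
print(poincare_distance(child_a, child_b))
```

Distances grow quickly toward the boundary, so two points near the edge are far from each other yet each stays comparatively close to the origin. This tree-like geometry is what lets points near the center act as general categories and points near the edge as specific ones.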
-
AWARD MERL’s Paper on Wi-Fi Sensing Earns Top 3% Paper Recognition at ICASSP 2023, Selected as a Best Student Paper Award Finalist
Date: June 9, 2023
Awarded to: Cristian J. Vaca-Rubio, Pu Wang, Toshiaki Koike-Akino, Ye Wang, Petros Boufounos and Petar Popovski
MERL Contacts: Petros T. Boufounos; Toshiaki Koike-Akino; Pu (Perry) Wang; Ye Wang
Research Areas: Artificial Intelligence, Communications, Computational Sensing, Dynamical Systems, Machine Learning, Signal Processing
Brief: A MERL paper on Wi-Fi sensing was recognized as a Top 3% Paper among all 2709 accepted papers at the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2023). Co-authored by Cristian Vaca-Rubio and Petar Popovski from Aalborg University, Denmark, and MERL researchers Pu Wang, Toshiaki Koike-Akino, Ye Wang, and Petros Boufounos, the paper "MmWave Wi-Fi Trajectory Estimation with Continuous-Time Neural Dynamic Learning" was also a Best Student Paper Award finalist.
Performed during Cristian’s stay at MERL first as a visiting Marie Skłodowska-Curie Fellow and then as a full-time intern in 2022, this work capitalizes on standards-compliant Wi-Fi signals to perform indoor localization and sensing. The paper uses a neural dynamic learning framework to address technical issues such as low sampling rate and irregular sampling intervals.
ICASSP, a flagship conference of the IEEE Signal Processing Society (SPS), was hosted on the Greek island of Rhodes from June 04 to June 10, 2023. ICASSP 2023 marked the largest ICASSP in history, boasting over 4000 participants and 6128 submitted papers, out of which 2709 were accepted.
-
News & Events
-
NEWS MERL researchers present 3 papers on Dexterous Manipulation at RSS 2023
Date: July 11, 2023
Where: Daegu, Korea
MERL Contacts: Siddarth Jain; Devesh K. Jha; Arvind Raghunathan
Research Areas: Artificial Intelligence, Machine Learning, Robotics
Brief: MERL researchers presented 3 papers at the 19th edition of the Robotics: Science and Systems (RSS) conference in Daegu, Korea. RSS is the flagship conference of the RSS foundation and is run as a single-track conference presenting a limited number of high-quality papers. This year the main conference had a total of 112 papers. MERL researchers presented 2 papers in the main conference on planning and perception for dexterous manipulation, and another paper in a workshop on learning for dexterous manipulation. More details can be found at https://roboticsconference.org.
-
NEWS MERL researchers presenting four papers and co-organizing a workshop at CVPR 2023
Date: June 18, 2023 - June 22, 2023
Where: Vancouver, Canada
MERL Contacts: Anoop Cherian; Michael J. Jones; Suhas Lohit; Kuan-Chuan Peng
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Brief: MERL researchers are presenting 4 papers and co-organizing a workshop at the CVPR 2023 conference, which will be held in Vancouver, Canada, June 18-22. CVPR is one of the most prestigious and competitive international conferences in computer vision. Details are provided below.
1. "Are Deep Neural Networks SMARTer than Second Graders?" by Anoop Cherian, Kuan-Chuan Peng, Suhas Lohit, Kevin Smith, and Joshua B. Tenenbaum
We present SMART: a Simple Multimodal Algorithmic Reasoning Task and the associated SMART-101 dataset for evaluating the abstraction, deduction, and generalization abilities of neural networks in solving visuo-linguistic puzzles designed for children in the 6-8 age group. Our experiments using SMART-101 reveal that powerful deep models are not better than random accuracy when analyzed for generalization. We also evaluate large language models (including ChatGPT) on a subset of SMART-101 and find that while these models show convincing reasoning abilities, their answers are often incorrect.
Paper: https://arxiv.org/abs/2212.09993
2. "EVAL: Explainable Video Anomaly Localization," by Ashish Singh, Michael J. Jones, and Erik Learned-Miller
This work presents a method for detecting unusual activities in videos by building a high-level model of activities found in nominal videos of a scene. The high-level features used in the model are human understandable and include attributes such as the object class and the directions and speeds of motion. Such high-level features allow our method to not only detect anomalous activity but also to provide explanations for why it is anomalous.
Paper: https://arxiv.org/abs/2212.07900
3. "Aligning Step-by-Step Instructional Diagrams to Video Demonstrations," by Jiahao Zhang, Anoop Cherian, Yanbin Liu, Yizhak Ben-Shabat, Cristian Rodriguez, and Stephen Gould
The rise of do-it-yourself (DIY) videos on the web has made it possible even for an unskilled person (or a skilled robot) to imitate and follow instructions to complete complex real world tasks. In this paper, we consider the novel problem of aligning instruction steps that are depicted as assembly diagrams (commonly seen in Ikea assembly manuals) with video segments from in-the-wild videos. We present a new dataset: Ikea Assembly in the Wild (IAW) and propose a contrastive learning framework for aligning instruction diagrams with video clips.
Paper: https://arxiv.org/pdf/2303.13800.pdf
4. "HaLP: Hallucinating Latent Positives for Skeleton-Based Self-Supervised Learning of Actions," by Anshul Shah, Aniket Roy, Ketul Shah, Shlok Kumar Mishra, David Jacobs, Anoop Cherian, and Rama Chellappa
In this work, we propose a new contrastive learning approach to train models for skeleton-based action recognition without labels. Our key contribution is a simple module, HaLP: Hallucinating Latent Positives for contrastive learning. HaLP explores the latent space of poses in suitable directions to generate new positives. Our experiments using HaLP demonstrate strong empirical improvements.
Paper: https://arxiv.org/abs/2304.00387
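Papers 3 and 4 above both build on contrastive learning. As a generic illustration only, the sketch below computes an InfoNCE-style contrastive loss for one anchor, one positive, and a few negatives. It is not the objective used in either paper and does not implement HaLP's hallucinated positives; the function names, the toy vectors, and the temperature value are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: -log softmax score of the positive pair.

    The loss is small when the anchor is much more similar to the
    positive than to any negative. The temperature is a hypothetical
    default chosen for illustration.
    """
    logits = [cosine_similarity(anchor, positive) / temperature]
    logits += [cosine_similarity(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum_exp - logits[0]


# Toy embeddings, e.g. a diagram embedding (anchor) and video-clip embeddings.
anchor = (1.0, 0.0)
matching_clip = (0.9, 0.1)
other_clips = [(0.0, 1.0), (-1.0, 0.0)]

print(info_nce_loss(anchor, matching_clip, other_clips))
```

Minimizing this loss pulls matching pairs together in the embedding space and pushes non-matching pairs apart, which is the shared idea behind aligning diagrams with video clips and learning action representations without labels.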
The 4th Workshop on Fair, Data-Efficient, and Trusted Computer Vision
MERL researcher Kuan-Chuan Peng is co-organizing the fourth Workshop on Fair, Data-Efficient, and Trusted Computer Vision (https://fadetrcv.github.io/2023/) in conjunction with CVPR 2023 on June 18, 2023. This workshop provides a focused venue for discussing and disseminating research in the areas of fairness, bias, and trust in computer vision, as well as adjacent domains such as computational social science and public policy.
-
Internships
-
ST1750: THz (Terahertz) Sensing
The Signal Processing (SP) group at MERL is seeking a highly motivated intern to conduct fundamental research in THz (Terahertz) sensing. Expertise in statistical inference, unsupervised anomaly detection, and deep learning (spatial-temporal representation learning) is required. Previous hands-on experience in THz data analysis is a plus. Familiarity with Python and deep learning libraries is a must. The intern will collaborate with a small group of MERL researchers to develop novel algorithms, design experiments with collaborators, and prepare results for patents and publication. The expected duration of the internship is 3 months with a flexible start date.
-
CI1950: Quantum Machine Learning
MERL is seeking an intern to work on research in quantum machine learning (QML). The ideal candidate is an experienced PhD student or post-graduate researcher with an excellent background in quantum computing, deep learning, and signal processing. Proficiency in programming with PyTorch and PennyLane is an additional asset.
-
CI2049: Efficient/Green AI
MERL is seeking highly motivated and qualified interns to work on efficient machine learning techniques. The ideal candidates would have significant research experience in federated learning, generative large language models, and efficient/green AI. A mature understanding of modern machine learning methods, proficiency with Python, and familiarity with deep learning frameworks are expected. Candidates at or beyond the middle of their Ph.D. program are encouraged to apply. The expected duration is 3 months, with flexible start dates.
-
Recent Publications
- "EARL: Eye-on-Hand Reinforcement Learner for Dynamic Grasping with Active Pose Estimation", 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), October 2023. (TR2023-118)
  @inproceedings{Huang2023oct,
    author = {Huang, Baichuan and Yu, Jingjin and Jain, Siddarth},
    title = {EARL: Eye-on-Hand Reinforcement Learner for Dynamic Grasping with Active Pose Estimation},
    booktitle = {2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    year = 2023,
    month = oct,
    url = {https://www.merl.com/publications/TR2023-118}
  }
- "Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks", IEEE/ACM Transactions on Audio, Speech, and Language Processing, September 2023. (TR2023-113)
  @article{Petermann2023sep,
    author = {Petermann, Darius and Wichern, Gordon and Subramanian, Aswin Shanmugam and Wang, Zhong-Qiu and Le Roux, Jonathan},
    title = {Tackling the Cocktail Fork Problem for Separation and Transcription of Real-World Soundtracks},
    journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
    year = 2023,
    month = sep,
    url = {https://www.merl.com/publications/TR2023-113}
  }
- "Location as supervision for weakly supervised multi-channel source separation of machine sounds", IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), August 2023. (TR2023-119)
  @inproceedings{FalconPerez2023aug,
    author = {Falcon Perez, Ricardo and Wichern, Gordon and Germain, Francois and Le Roux, Jonathan},
    title = {Location as supervision for weakly supervised multi-channel source separation of machine sounds},
    booktitle = {IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
    year = 2023,
    month = aug,
    url = {https://www.merl.com/publications/TR2023-119}
  }
- "Hyperbolic Unsupervised Anomalous Sound Detection", IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), August 2023. (TR2023-108)
  @inproceedings{Germain2023aug,
    author = {Germain, Francois and Wichern, Gordon and Le Roux, Jonathan},
    title = {Hyperbolic Unsupervised Anomalous Sound Detection},
    booktitle = {IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
    year = 2023,
    month = aug,
    url = {https://www.merl.com/publications/TR2023-108}
  }
- "Overview of the Tenth Dialog System Technology Challenge: DSTC10", IEEE/ACM Transactions on Audio, Speech, and Language Processing, August 2023. (TR2023-109)
  @article{Yoshino2023aug,
    author = {Yoshino, Koichiro and Chen, Yun-Nung and Crook, Paul and Kottur, Satwik and Li, Jinchao and Hedayatnia, Behnam and Moon, Seungwhan and Fe, Zhengcong and Li, Zekang and Zhang, Jinchao and Fen, Yang and Zhou, Jie and Kim, Seokhwan and Liu, Yang and Jin, Di and Papangelis, Alexandros and Gopalakrishnan, Karthik and Hakkani-Tur, Dilek and Damavandi, Babak and Geramifard, Alborz and Hori, Chiori and Shah, Ankit and Zhang, Chen and Li, Haizhou and Sedoc, João and D’Haro, Luis F. and Banchs, Rafael and Rudnicky, Alexander},
    title = {Overview of the Tenth Dialog System Technology Challenge: DSTC10},
    journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
    year = 2023,
    month = aug,
    url = {https://www.merl.com/publications/TR2023-109}
  }
- "Style-transfer based Speech and Audio-visual Scene understanding for Robot Action Sequence Acquisition from Videos", Interspeech, August 2023. (TR2023-104)
  @inproceedings{Hori2023aug,
    author = {Hori, Chiori and Peng, Puyuang and Harwath, David and Liu, Xinyu and Ota, Kei and Jain, Siddarth and Corcodel, Radu and Jha, Devesh K. and Romeres, Diego and Le Roux, Jonathan},
    title = {Style-transfer based Speech and Audio-visual Scene understanding for Robot Action Sequence Acquisition from Videos},
    booktitle = {Interspeech},
    year = 2023,
    month = aug,
    url = {https://www.merl.com/publications/TR2023-104}
  }
- "Joint Software-Hardware Design for Green AI", International Midwest Symposium on Circuits and Systems (MWSCAS), August 2023. (TR2023-096)
  @inproceedings{Ahmed2023aug,
    author = {Ahmed, Md Rubel and Koike-Akino, Toshiaki and Parsons, Kieran and Wang, Ye},
    title = {Joint Software-Hardware Design for Green AI},
    booktitle = {International Midwest Symposium on Circuits and Systems (MWSCAS)},
    year = 2023,
    month = aug,
    url = {https://www.merl.com/publications/TR2023-096}
  }
- "AutoHLS: Learning to Accelerate Design Space Exploration for HLS Designs", International Midwest Symposium on Circuits and Systems (MWSCAS), August 2023. (TR2023-097)
  @inproceedings{Ahmed2023aug2,
    author = {Ahmed, Md Rubel and Koike-Akino, Toshiaki and Parsons, Kieran and Wang, Ye},
    title = {AutoHLS: Learning to Accelerate Design Space Exploration for HLS Designs},
    booktitle = {International Midwest Symposium on Circuits and Systems (MWSCAS)},
    year = 2023,
    month = aug,
    url = {https://www.merl.com/publications/TR2023-097}
  }
-
Videos
-
Are Deep Neural Networks SMARTer than Second Graders?
-
[MERL Seminar Series Spring 2023] Fine-grained wildlife sound recognition: Towards the accuracy of a naturalist
-
[MERL Seminar Series Spring 2023] Pitfalls and Opportunities in Interpretable Machine Learning
-
Human Perspective Scene Understanding via Multimodal Sensing
-
[MERL Seminar Series Spring 2022] Self-Supervised Scene Representation Learning
-
[MERL Seminar Series Spring 2022] Learning Speech Representations with Multimodal Self-Supervision
-
[MERL Seminar Series 2021] Deep probabilistic regression
-
[MERL Seminar Series 2021] Learning to See by Moving: Self-supervising 3D scene representations for perception, control, and visual reasoning
-
[MERL Seminar Series 2021] Look and Listen: From Semantic to Spatial Audio-Visual Perception
-
Application of Deep Learning for Nanophotonic Device Design (Invited)
-
Machine Learning Power Amplifier
-
Scene-Aware Interaction Technology
-
-
Downloads
-
DeepBornFNO
-
Hyperbolic Audio Source Separation
-
Simple Multimodal Algorithmic Reasoning Task Dataset
-
SOurce-free Cross-modal KnowledgE Transfer
-
Audio-Visual-Language Embodied Navigation in 3D Environments
-
Nonparametric Score Estimators
-
Instance Segmentation GAN
-
Audio Visual Scene-Graph Segmentor
-
Generalized One-class Discriminative Subspaces
-
Goal directed RL with Safety Constraints
-
Hierarchical Musical Instrument Separation
-
Generating Visual Dynamics from Sound and Context
-
Adversarially-Contrastive Optimal Transport
-
Online Feature Extractor Network
-
MotionNet
-
FoldingNet++
-
Quasi-Newton Trust Region Policy Optimization
-
Landmarks’ Location, Uncertainty, and Visibility Likelihood
-
Robust Iterative Data Estimation
-
Gradient-based Nikaido-Isoda
-
Discriminative Subspace Pooling
-
Partial Group Convolutional Neural Networks
-