- Date: June 22, 2025
Where: IEEE International Symposium on Information Theory (ISIT)
MERL Contact: Toshiaki Koike-Akino
Research Areas: Artificial Intelligence, Communications, Data Analytics, Machine Learning, Optimization, Signal Processing, Human-Computer Interaction, Information Security
Brief - Toshiaki Koike-Akino is invited to present a tutorial talk at IEEE ISIT 2025 Quantum Hackathon, to be held at Ann Arbor, Michigan, USA. The talk, entitled "Emerging Quantum AI Technology", will discuss the recent trends, challenges, and applications of quantum artificial intelligence (QAI) technologies.
The ISIT 2025 Quantum Hackathon invites participants to explore the intersection of quantum computing and information theory. Participants will work with quantum simulators, available quantum hardware, and state-of-the-art development kits to create innovative solutions that connect quantum advancements with challenges in communication and signal processing.
The IEEE International Symposium on Information Theory (ISIT) is the flagship conference of the IEEE Information Theory Society. The symposium centers around the presentation in all of the areas of information theory, including source and channel coding, communication theory and systems, cryptography and security, detection and estimation, networks, pattern recognition and learning, statistics, stochastic processes and complexity, and signal processing.
-
- Date: June 11, 2025 - June 15, 2025
Where: Nashville, TN, USA
MERL Contacts: Matthew Brand; Moitreya Chatterjee; Anoop Cherian; François Germain; Michael J. Jones; Toshiaki Koike-Akino; Jing Liu; Suhas Lohit; Tim K. Marks; Pedro Miraldo; Kuan-Chuan Peng; Naoko Sawada; Pu (Perry) Wang; Ye Wang
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Signal Processing, Speech & Audio
Brief - MERL researchers are presenting 2 conference papers, co-organizing two workshops, and presenting 7 workshop papers at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2025 conference, which will be held in Nashville, TN, USA from June 11-15, 2025. CVPR is one of the most prestigious and competitive international conferences in the area of computer vision. Details of MERL contributions are provided below:
Main Conference Papers:
1. "UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing" by Y.H. Lai, J. Ebbers, Y. F. Wang, F. Germain, M. J. Jones, M. Chatterjee
This work deals with the task of weakly‑supervised Audio-Visual Video Parsing (AVVP) and proposes a novel, uncertainty-aware algorithm called UWAV towards that end. UWAV works by producing more reliable segment‑level pseudo‑labels while explicitly weighting each label by its prediction uncertainty. This uncertainty‑aware training, combined with a feature‑mixup regularization scheme, promotes inter‑segment consistency in the pseudo-labels. As a result, UWAV achieves state‑of‑the‑art performance on two AVVP datasets across multiple metrics, demonstrating both effectiveness and strong generalizability.
Paper: https://www.merl.com/publications/TR2025-072
2. "TailedCore: Few-Shot Sampling for Unsupervised Long-Tail Noisy Anomaly Detection" by Y. G. Jung, J. Park, J. Yoon, K.-C. Peng, W. Kim, A. B. J. Teoh, and O. Camps.
This work tackles unsupervised anomaly detection in complex scenarios where normal data is noisy and has an unknown, imbalanced class distribution. Existing models face a trade-off between robustness to noise and performance on rare (tail) classes. To address this, the authors propose TailSampler, which estimates class sizes from embedding similarities to isolate tail samples. Using TailSampler, they develop TailedCore, a memory-based model that effectively captures tail class features while remaining noise-robust, outperforming state-of-the-art methods in extensive evaluations.
paper: https://www.merl.com/publications/TR2025-077
MERL Co-Organized Workshops:
1. Multimodal Algorithmic Reasoning (MAR) Workshop, organized by A. Cherian, K.-C. Peng, S. Lohit, H. Zhou, K. Smith, L. Xue, T. K. Marks, and J. Tenenbaum.
Workshop link: https://marworkshop.github.io/cvpr25/
2. The 6th Workshop on Fair, Data-Efficient, and Trusted Computer Vision, organized by N. Ratha, S. Karanam, Z. Wu, M. Vatsa, R. Singh, K.-C. Peng, M. Merler, and K. Varshney.
Workshop link: https://fadetrcv.github.io/2025/
Workshop Papers:
1. "FreBIS: Frequency-Based Stratification for Neural Implicit Surface Representations" by N. Sawada, P. Miraldo, S. Lohit, T.K. Marks, and M. Chatterjee (Oral)
With their ability to model object surfaces in a scene as a continuous function, neural implicit surface reconstruction methods have made remarkable strides recently, especially over classical 3D surface reconstruction methods, such as those that use voxels or point clouds. Towards this end, we propose FreBIS - a neural implicit‑surface framework that avoids overloading a single encoder with every surface detail. It divides a scene into several frequency bands and assigns a dedicated encoder (or group of encoders) to each band, then enforces complementary feature learning through a redundancy‑aware weighting module. Swapping this frequency‑stratified stack into an off‑the‑shelf reconstruction pipeline markedly boosts 3D surface accuracy and view‑consistent rendering on the challenging BlendedMVS dataset.
paper: https://www.merl.com/publications/TR2025-074
2. "Multimodal 3D Object Detection on Unseen Domains" by D. Hegde, S. Lohit, K.-C. Peng, M. J. Jones, and V. M. Patel.
LiDAR-based object detection models often suffer performance drops when deployed in unseen environments due to biases in data properties like point density and object size. Unlike domain adaptation methods that rely on access to target data, this work tackles the more realistic setting of domain generalization without test-time samples. We propose CLIX3D, a multimodal framework that uses both LiDAR and image data along with supervised contrastive learning to align same-class features across domains and improve robustness. CLIX3D achieves state-of-the-art performance across various domain shifts in 3D object detection.
paper: https://www.merl.com/publications/TR2025-078
3. "Improving Open-World Object Localization by Discovering Background" by A. Singh, M. J. Jones, K.-C. Peng, M. Chatterjee, A. Cherian, and E. Learned-Miller.
This work tackles open-world object localization, aiming to detect both seen and unseen object classes using limited labeled training data. While prior methods focus on object characterization, this approach introduces background information to improve objectness learning. The proposed framework identifies low-information, non-discriminative image regions as background and trains the model to avoid generating object proposals there. Experiments on standard benchmarks show that this method significantly outperforms previous state-of-the-art approaches.
paper: https://www.merl.com/publications/TR2025-058
4. "PF3Det: A Prompted Foundation Feature Assisted Visual LiDAR 3D Detector" by K. Li, T. Zhang, K.-C. Peng, and G. Wang.
This work addresses challenges in 3D object detection for autonomous driving by improving the fusion of LiDAR and camera data, which is often hindered by domain gaps and limited labeled data. Leveraging advances in foundation models and prompt engineering, the authors propose PF3Det, a multi-modal detector that uses foundation model encoders and soft prompts to enhance feature fusion. PF3Det achieves strong performance even with limited training data. It sets new state-of-the-art results on the nuScenes dataset, improving NDS by 1.19% and mAP by 2.42%.
paper: https://www.merl.com/publications/TR2025-076
5. "Noise Consistency Regularization for Improved Subject-Driven Image Synthesis" by Y. Ni., S. Wen, P. Konius, A. Cherian
Fine-tuning Stable Diffusion enables subject-driven image synthesis by adapting the model to generate images containing specific subjects. However, existing fine-tuning methods suffer from two key issues: underfitting, where the model fails to reliably capture subject identity, and overfitting, where it memorizes the subject image and reduces background diversity. To address these challenges, two auxiliary consistency losses are porposed for diffusion fine-tuning. First, a prior consistency regularization loss ensures that the predicted diffusion noise for prior (non- subject) images remains consistent with that of the pretrained model, improving fidelity. Second, a subject consistency regularization loss enhances the fine-tuned model’s robustness to multiplicative noise modulated latent code, helping to preserve subject identity while improving diversity. Our experimental results demonstrate the effectiveness of our approach in terms of image diversity, outperforming DreamBooth in terms of CLIP scores, background variation, and overall visual quality.
paper: https://www.merl.com/publications/TR2025-073
6. "LatentLLM: Attention-Aware Joint Tensor Compression" by T. Koike-Akino, X. Chen, J. Liu, Y. Wang, P. Wang, M. Brand
We propose a new framework to convert a large foundation model such as large language models (LLMs)/large multi- modal models (LMMs) into a reduced-dimension latent structure. Our method uses a global attention-aware joint tensor decomposition to significantly improve the model efficiency. We show the benefit on several benchmark including multi-modal reasoning tasks.
paper: https://www.merl.com/publications/TR2025-075
7. "TuneComp: Joint Fine-Tuning and Compression for Large Foundation Models" by T. Koike-Akino, X. Chen, J. Liu, Y. Wang, P. Wang, M. Brand
To reduce model size during post-training, compression methods, including knowledge distillation, low-rank approximation, and pruning, are often applied after fine- tuning the model. However, sequential fine-tuning and compression sacrifices performance, while creating a larger than necessary model as an intermediate step. In this work, we aim to reduce this gap, by directly constructing a smaller model while guided by the downstream task. We propose to jointly fine-tune and compress the model by gradually distilling it to a pruned low-rank structure. Experiments demonstrate that joint fine-tuning and compression significantly outperforms other sequential compression methods.
paper: https://www.merl.com/publications/TR2025-079
-
- Date: May 19, 2025 - May 23, 2025
Where: IEEE ICRA
MERL Contacts: Stefano Di Cairano; Jianlin Guo; Chiori Hori; Siddarth Jain; Devesh K. Jha; Toshiaki Koike-Akino; Philip V. Orlik; Arvind Raghunathan; Diego Romeres; Yuki Shirai; Abraham P. Vinod; Yebin Wang
Research Areas: Artificial Intelligence, Computer Vision, Control, Dynamical Systems, Machine Learning, Optimization, Robotics, Human-Computer Interaction
Brief - MERL made significant contributions to both the organization and the technical program of the International Conference on Robotics and Automation (ICRA) 2025, which was held in Atlanta, Georgia, USA, from May 19th to May 23rd.
MERL was a Bronze sponsor of the conference, and MERL researchers chaired four sessions in the areas of Manipulation Planning, Human-Robot Collaboration, Diffusion Policy, and Learning for Robot Control.
MERL researchers presented four papers in the main conference on the topics of contact-implicit trajectory optimization, proactive robotic assistance in human-robot collaboration, diffusion policy with human preferences, and dynamic and model learning of robotic manipulators. In addition, five more papers were presented in the workshops: “Structured Learning for Efficient, Reliable, and Transparent Robots,” “Safely Leveraging Vision-Language Foundation Models in Robotics: Challenges and Opportunities,” “Long-term Human Motion Prediction,” and “The Future of Intelligent Manufacturing: From Innovation to Implementation.”
MERL researcher Diego Romeres delivered an invited talk titled “Dexterous Robotics: From Multimodal Sensing to Real-World Physical Interactions.”
MERL also collaborated with the University of Padua on one of the conference’s challenges: the “3rd AI Olympics with RealAIGym” (https://ai-olympics.dfki-bremen.de).
During the conference, MERL researchers received the IEEE Transactions on Automation Science and Engineering Best New Application Paper Award for their paper titled “Smart Actuation for End-Edge Industrial Control Systems.”
About ICRA
The IEEE International Conference on Robotics and Automation (ICRA) is the flagship conference of the IEEE Robotics and Automation Society and the world’s largest and most comprehensive technical conference focused on research advances and the latest technological developments in robotics. The event attracts over 7,000 participants, 143 partners and exhibitors, and receives more than 4,000 paper submissions.
-
- Date: March 31, 2025
Where: Northeastern University, Boston, MA
MERL Contact: Suhas Lohit
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Brief - MERL researcher Suhas Lohit was an invited speaker at Boston Symmetry Day, held at Northeastern University. Boston Symmetry Day, an annual workshop organized by researchers at MIT and Northeastern, brought together attendees interested in symmetry-informed machine learning and its applications. Suhas' talk, titled “Efficiency for Equivariance, and Efficiency through Equivariance” discussed recent MERL works that show how to build general and efficient equivariant neural networks, and how equivariance can be utilized in self-supervised learning to yield improved 3D object detection. The abstract and slides can be found in the link below.
-
- Date: March 4, 2025
Where: IEEE Robotics and Automation Society (RAS)
MERL Contact: Yuki Shirai
Research Areas: Artificial Intelligence, Optimization, Robotics
Brief - MERL researcher, Yuki Shirai, has been appointed to the editorial board of the IEEE Robotics and Automation Letters (RA-L) as an Associate Editor. IEEE RA-L publishes peer-reviewed articles in the areas of robotics and automation which can also be presented at the annual flagship conferences of IEEE Robotics and Automation Society (RAS), including IEEE International Conference on Robotics and Automation (ICRA) and International Conference on Intelligent Robots and Systems (IROS).
-
- Date: February 25, 2025 - March 4, 2025
Where: The Association for the Advancement of Artificial Intelligence (AAAI)
MERL Contacts: Ankush Chakrabarty; Toshiaki Koike-Akino; Jing Liu; Kuan-Chuan Peng; Diego Romeres; Ye Wang
Research Areas: Artificial Intelligence, Machine Learning, Optimization
Brief - MERL researchers presented 2 conference papers, 2 workshop papers, and co-organized 1 workshop at the AAAI 2025 conference, which was held in Philadelphia from Feb. 25 to Mar. 4, 2025. AAAI is one of the most prestigious and competitive international conferences in artificial intelligence (AI). Details of MERL contributions are provided below.
- AAAI Papers in Main Tracks:
1. "Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage" by M.R.U. Rashid, J. Liu, T. Koike-Akino, Y. Wang, and S. Mehnaz. [Oral Presentation]
This work proposes a novel unlearning-based model poisoning method that amplifies privacy breaches during fine-tuning. Extensive empirical studies show the proposed method’s efficacy on both membership inference and data extraction attacks. The attack is stealthy enough to bypass detection based defenses, and differential privacy cannot effectively defend against the attacks without significantly impacting model utility.
Paper: https://www.merl.com/publications/TR2025-017
2. "User-Preference Meets Pareto-Optimality: Multi-Objective Bayesian Optimization with Local Gradient Search" by J.H.S. Ip, A. Chakrabarty, A. Mesbah, and D. Romeres. [Poster Presentation]
This paper introduces a sample-efficient multi-objective Bayesian optimization method that integrates user preferences with gradient-based search to find near-Pareto optimal solutions. The proposed method achieves high utility and reduces distance to Pareto-front solutions across both synthetic and real-world problems, underscoring the importance of minimizing gradient uncertainty during gradient-based optimization. Additionally, the study introduces a novel utility function that respects Pareto dominance and effectively captures diverse user preferences.
Paper: https://www.merl.com/publications/TR2025-018
- AAAI Workshop Papers:
1. "Quantum Diffusion Models for Few-Shot Learning" by R. Wang, Y. Wang, J. Liu, and T. Koike-Akino.
This work presents the quantum diffusion model (QDM) as an approach to overcome the challenges of quantum few-shot learning (QFSL). It introduces three novel algorithms developed from complementary data-driven and algorithmic perspectives to enhance the performance of QFSL tasks. The extensive experiments demonstrate that these algorithms achieve significant performance gains over traditional baselines, underscoring the potential of QDM to advance QFSL by effectively leveraging quantum noise modeling and label guidance.
Paper: https://www.merl.com/publications/TR2025-025
2. "Quantum Implicit Neural Compression", by T. Fujihashi and T., Koike-Akino.
This work introduces a quantum counterpart of implicit neural representation (quINR) which leverages the exponentially rich expressivity of quantum neural networks to improve the classical INR-based signal compression methods. Evaluations using some benchmark datasets show that the proposed quINR-based compression could improve rate-distortion performance in image compression compared with traditional codecs and classic INR-based coding methods.
Paper: https://www.merl.com/publications/TR2025-024
- AAAI Workshops Contributed by MERL:
1. "Scalable and Efficient Artificial Intelligence Systems (SEAS)"
K.-C. Peng co-organized this workshop, which offers a timely forum for experts to share their perspectives in designing and developing robust computer vision (CV), machine learning (ML), and artificial intelligence (AI) algorithms, and translating them into real-world solutions.
Workshop link: https://seasworkshop.github.io/aaai25/index.html
2. "Quantum Computing and Artificial Intelligence"
T. Koike-Akino served a session chair of Quantum Neural Network in this workshop, which focuses on seeking contributions encompassing theoretical and applied advances in quantum AI, quantum computing (QC) to enhance classical AI, and classical AI to tackle various aspects of QC.
Workshop link: https://sites.google.com/view/qcai2025/
-
- Date: December 16, 2024 - December 19, 2024
Where: Milan, Italy
MERL Contacts: Ankush Chakrabarty; Vedang M. Deshpande; Stefano Di Cairano; Abraham P. Vinod; Avishai Weiss; Gordon Wichern
Research Areas: Artificial Intelligence, Control, Dynamical Systems, Machine Learning, Multi-Physical Modeling, Optimization, Robotics
Brief - MERL researchers presented 7 papers at the recently concluded Conference on Decision and Control (CDC) 2024 in Milan, Italy. The papers covered a wide range of topics including safety shielding for stochastic model predictive control, reinforcement learning using expert observations, physics-constrained meta learning for positioning, variational-Bayes Kalman filtering, Bayesian measurement masks for GNSS positioning, divert-feasible lunar landing, and centering and stochastic control using constrained zonotopes.
As a sponsor of the conference, MERL maintained a booth for open discussions with researchers and students, and hosted a special session to discuss highlights of MERL research and work philosophy.
In addition, Ankush Chakrabarty (Principal Research Scientist, Multiphysical Systems Team) was an invited speaker in the pre-conference Workshop on "Learning Dynamics From Data" where he gave a talk on few-shot meta-learning for black-box identification using data from similar systems.
-
- Date: December 10, 2024 - December 15, 2024
Where: Advances in Neural Processing Systems (NeurIPS)
MERL Contacts: Petros T. Boufounos; Matthew Brand; Ankush Chakrabarty; Anoop Cherian; François Germain; Toshiaki Koike-Akino; Christopher R. Laughman; Jonathan Le Roux; Jing Liu; Suhas Lohit; Tim K. Marks; Yoshiki Masuyama; Kieran Parsons; Kuan-Chuan Peng; Diego Romeres; Pu (Perry) Wang; Ye Wang; Gordon Wichern
Research Areas: Artificial Intelligence, Communications, Computational Sensing, Computer Vision, Control, Data Analytics, Dynamical Systems, Machine Learning, Multi-Physical Modeling, Optimization, Robotics, Signal Processing, Speech & Audio, Human-Computer Interaction, Information Security
Brief - MERL researchers will attend and present the following papers at the 2024 Advances in Neural Processing Systems (NeurIPS) Conference and Workshops.
1. "RETR: Multi-View Radar Detection Transformer for Indoor Perception" by Ryoma Yataka (Mitsubishi Electric), Adriano Cardace (Bologna University), Perry Wang (Mitsubishi Electric Research Laboratories), Petros Boufounos (Mitsubishi Electric Research Laboratories), Ryuhei Takahashi (Mitsubishi Electric). Main Conference. https://neurips.cc/virtual/2024/poster/95530
2. "Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads" by Anoop Cherian (Mitsubishi Electric Research Laboratories), Kuan-Chuan Peng (Mitsubishi Electric Research Laboratories), Suhas Lohit (Mitsubishi Electric Research Laboratories), Joanna Matthiesen (Math Kangaroo USA), Kevin Smith (Massachusetts Institute of Technology), Josh Tenenbaum (Massachusetts Institute of Technology). Main Conference, Datasets and Benchmarks track. https://neurips.cc/virtual/2024/poster/97639
3. "Probabilistic Forecasting for Building Energy Systems: Are Time-Series Foundation Models The Answer?" by Young-Jin Park (Massachusetts Institute of Technology), Jing Liu (Mitsubishi Electric Research Laboratories), François G Germain (Mitsubishi Electric Research Laboratories), Ye Wang (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories), Gordon Wichern (Mitsubishi Electric Research Laboratories), Navid Azizan (Massachusetts Institute of Technology), Christopher R. Laughman (Mitsubishi Electric Research Laboratories), Ankush Chakrabarty (Mitsubishi Electric Research Laboratories). Time Series in the Age of Large Models Workshop.
4. "Forget to Flourish: Leveraging Model-Unlearning on Pretrained Language Models for Privacy Leakage" by Md Rafi Ur Rashid (Penn State University), Jing Liu (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories), Shagufta Mehnaz (Penn State University), Ye Wang (Mitsubishi Electric Research Laboratories). Workshop on Red Teaming GenAI: What Can We Learn from Adversaries?
5. "Spatially-Aware Losses for Enhanced Neural Acoustic Fields" by Christopher Ick (New York University), Gordon Wichern (Mitsubishi Electric Research Laboratories), Yoshiki Masuyama (Mitsubishi Electric Research Laboratories), François G Germain (Mitsubishi Electric Research Laboratories), Jonathan Le Roux (Mitsubishi Electric Research Laboratories). Audio Imagination Workshop.
6. "FV-NeRV: Neural Compression for Free Viewpoint Videos" by Sorachi Kato (Osaka University), Takuya Fujihashi (Osaka University), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories), Takashi Watanabe (Osaka University). Machine Learning and Compression Workshop.
7. "GPT Sonography: Hand Gesture Decoding from Forearm Ultrasound Images via VLM" by Keshav Bimbraw (Worcester Polytechnic Institute), Ye Wang (Mitsubishi Electric Research Laboratories), Jing Liu (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories). AIM-FM: Advancements In Medical Foundation Models: Explainability, Robustness, Security, and Beyond Workshop.
8. "Smoothed Embeddings for Robust Language Models" by Hase Ryo (Mitsubishi Electric), Md Rafi Ur Rashid (Penn State University), Ashley Lewis (Ohio State University), Jing Liu (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories), Kieran Parsons (Mitsubishi Electric Research Laboratories), Ye Wang (Mitsubishi Electric Research Laboratories). Safe Generative AI Workshop.
9. "Slaying the HyDRA: Parameter-Efficient Hyper Networks with Low-Displacement Rank Adaptation" by Xiangyu Chen (University of Kansas), Ye Wang (Mitsubishi Electric Research Laboratories), Matthew Brand (Mitsubishi Electric Research Laboratories), Pu Wang (Mitsubishi Electric Research Laboratories), Jing Liu (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories). Workshop on Adaptive Foundation Models.
10. "Preference-based Multi-Objective Bayesian Optimization with Gradients" by Joshua Hang Sai Ip (University of California Berkeley), Ankush Chakrabarty (Mitsubishi Electric Research Laboratories), Ali Mesbah (University of California Berkeley), Diego Romeres (Mitsubishi Electric Research Laboratories). Workshop on Bayesian Decision-Making and Uncertainty. Lightning talk spotlight.
11. "TR-BEACON: Shedding Light on Efficient Behavior Discovery in High-Dimensions with Trust-Region-based Bayesian Novelty Search" by Wei-Ting Tang (Ohio State University), Ankush Chakrabarty (Mitsubishi Electric Research Laboratories), Joel A. Paulson (Ohio State University). Workshop on Bayesian Decision-Making and Uncertainty.
12. "MEL-PETs Joint-Context Attack for the NeurIPS 2024 LLM Privacy Challenge Red Team Track" by Ye Wang (Mitsubishi Electric Research Laboratories), Tsunato Nakai (Mitsubishi Electric), Jing Liu (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories), Kento Oonishi (Mitsubishi Electric), Takuya Higashi (Mitsubishi Electric). LLM Privacy Challenge. Special Award for Practical Attack.
13. "MEL-PETs Defense for the NeurIPS 2024 LLM Privacy Challenge Blue Team Track" by Jing Liu (Mitsubishi Electric Research Laboratories), Ye Wang (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories), Tsunato Nakai (Mitsubishi Electric), Kento Oonishi (Mitsubishi Electric), Takuya Higashi (Mitsubishi Electric). LLM Privacy Challenge. Won 3rd Place Award.
MERL members also contributed to the organization of the Multimodal Algorithmic Reasoning (MAR) Workshop (https://marworkshop.github.io/neurips24/). Organizers: Anoop Cherian (Mitsubishi Electric Research Laboratories), Kuan-Chuan Peng (Mitsubishi Electric Research Laboratories), Suhas Lohit (Mitsubishi Electric Research Laboratories), Honglu Zhou (Salesforce Research), Kevin Smith (Massachusetts Institute of Technology), Tim K. Marks (Mitsubishi Electric Research Laboratories), Juan Carlos Niebles (Salesforce AI Research), Petar Veličković (Google DeepMind).
-
- Date: November 14, 2024 - November 22, 2024
Where: Italian Consulate
MERL Contact: Diego Romeres
Research Area: Robotics
Brief - Prof. Zunino from the University of Genoa, with support from MERL Researcher Diego Romeres, organized a robotic workshop that introduced 6th-8th grade students from the greater Boston area to the fundamentals of robotics. The workshop provided students with hands-on experience in robotic technology using LEGO systems. Participants learned key principles of robotics, teamwork, and project planning. They worked collaboratively to design, program using visual-based software, and solve challenges as field engineers.
The workshop event was part of the Festival of Italian Creativity organized by the Italian consulate to honor the naming of Boston as a Capital of Italian Creativity.
-
- Date: July 10, 2024 - July 12, 2024
Where: Toronto, Canada
MERL Contacts: Ankush Chakrabarty; Vedang M. Deshpande; Stefano Di Cairano; Christopher R. Laughman; Arvind Raghunathan; Abraham P. Vinod; Yebin Wang; Avishai Weiss
Research Areas: Artificial Intelligence, Control, Dynamical Systems, Machine Learning, Multi-Physical Modeling, Optimization, Robotics
Brief - MERL researchers presented 9 papers at the recently concluded American Control Conference (ACC) 2024 in Toronto, Canada. The papers covered a wide range of topics including data-driven spatial monitoring using heterogenous robots, aircraft approach management near airports, computation fluid dynamics-based motion planning for drones facing winds, trajectory planning for coordinated monitoring using a team of drones and a ground carrier vehicle, ensemble Kalman smoothing-based model predictive control for motion planning for autonomous vehicles, system identification for Lithium-ion batteries, physics-constrained deep Kalman filters for vapor compression systems, switched reference governors for constrained systems, and distributed road-map monitoring using onboard sensors.
As a sponsor of the conference, MERL maintained a booth for open discussions with researchers and students, and hosted a special session to discuss highlights of MERL research and work philosophy.
In addition, Abraham Vinod served as a panelist at the Student Networking Event at the conference. The student networking event provides an opportunity for all interested students to network with professionals working in industry, academia, and national laboratories during a structured event, and encourages their continued participation as the future leaders in the field.
-
- Date: June 13, 2024
Where: IEEE International Conference on Communications (ICC)
MERL Contacts: Jianlin Guo; Philip V. Orlik; Kieran Parsons; Pu (Perry) Wang
Research Areas: Communications, Machine Learning, Signal Processing
Brief - Jianlin Guo delivered a keynote titled "Private IoT Networks" in the IEEE International Conference on Communications (ICC) 2024 Workshop "Industrial Private 5G-and-Beyond Wireless Networks", held in Denver, Colorado from June 9-13. The ICC is one of two IEEE Communications Society’s flagship conferences.
Abstract: With the advent of private 5G-and-Beyond communication technologies, private IoT networks have been emerging. In private IoT networks, network owners have full control on the network resource management. However, to fully realize private IoT networks, the upper layer technologies need to be developed as well. This keynote presents machine learning based anomaly detection in manufacturing systems, innovative multipath TCP technologies over heterogeneous wireless IoT networks, novel channel resource scheduling in private 5G networks and efficient wireless coexistence of the heterogeneous wireless systems.
-
- Date: May 13, 2024 - May 17, 2024
Where: Yokohama, Japan
MERL Contacts: Anoop Cherian; Radu Corcodel; Stefano Di Cairano; Chiori Hori; Siddarth Jain; Devesh K. Jha; Jonathan Le Roux; Diego Romeres; William S. Yerazunis
Research Areas: Artificial Intelligence, Machine Learning, Optimization, Robotics, Speech & Audio
Brief - MERL made significant contributions to both the organization and the technical program of the International Conference on Robotics and Automation (ICRA) 2024, which was held in Yokohama, Japan from May 13th to May 17th.
MERL was a Bronze sponsor of the conference, and exhibited a live robotic demonstration, which attracted a large audience. The demonstration showcased an Autonomous Robotic Assembly technology executed on MELCO's Assista robot arm and was the collaborative effort of the Optimization and Robotics Team together with the Advanced Technology department at Mitsubishi Electric.
MERL researchers from the Optimization and Robotics, Speech & Audio, and Control for Autonomy teams also presented 8 papers and 2 invited talks covering topics on robotic assembly, applications of LLMs to robotics, human robot interaction, safe and robust path planning for autonomous drones, transfer learning, perception and tactile sensing.
-
- Date: May 22, 2024
MERL Contact: Toshiaki Koike-Akino
Research Areas: Artificial Intelligence, Machine Learning
Brief - Toshiaki Koike-Akino is invited to present a seminar talk at EPFL, Switzerland. The talk, entitled "Post-Deep Learning: Emerging Quantum AI Technology", will discuss the recent trends, challenges, and applications of quantum machine learning (QML) technologies. The seminar is organized by Prof. Volkan Cevher and Prof. Giovanni De Micheli. The event invites students, researchers, scholars and professors through EPFL departments including School of Engineering, Communication Science, Life Science, Machine Learning and AI Center.
-
- Date: June 17, 2024 - June 21, 2024
Where: Seattle, WA
MERL Contacts: Petros T. Boufounos; Moitreya Chatterjee; Anoop Cherian; Michael J. Jones; Toshiaki Koike-Akino; Jonathan Le Roux; Suhas Lohit; Tim K. Marks; Pedro Miraldo; Jing Liu; Kuan-Chuan Peng; Pu (Perry) Wang; Ye Wang; Matthew Brand
Research Areas: Artificial Intelligence, Computational Sensing, Computer Vision, Machine Learning, Speech & Audio
Brief - MERL researchers are presenting 5 conference papers, 3 workshop papers, and are co-organizing two workshops at the CVPR 2024 conference, which will be held in Seattle, June 17-21. CVPR is one of the most prestigious and competitive international conferences in computer vision. Details of MERL contributions are provided below.
CVPR Conference Papers:
1. "TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models" by H. Ni, B. Egger, S. Lohit, A. Cherian, Y. Wang, T. Koike-Akino, S. X. Huang, and T. K. Marks
This work enables a pretrained text-to-video (T2V) diffusion model to be additionally conditioned on an input image (first video frame), yielding a text+image to video (TI2V) model. Other than using the pretrained T2V model, our method requires no ("zero") training or fine-tuning. The paper uses a "repeat-and-slide" method and diffusion resampling to synthesize videos from a given starting image and text describing the video content.
Paper: https://www.merl.com/publications/TR2024-059
Project page: https://merl.com/research/highlights/TI2V-Zero
2. "Long-Tailed Anomaly Detection with Learnable Class Names" by C.-H. Ho, K.-C. Peng, and N. Vasconcelos
This work aims to identify defects across various classes without relying on hard-coded class names. We introduce the concept of long-tailed anomaly detection, addressing challenges like class imbalance and dataset variability. Our proposed method combines reconstruction and semantic modules, learning pseudo-class names and utilizing a variational autoencoder for feature synthesis to improve performance in long-tailed datasets, outperforming existing methods in experiments.
Paper: https://www.merl.com/publications/TR2024-040
3. "Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling" by X. Liu, Y-W. Tai, C-T. Tang, P. Miraldo, S. Lohit, and M. Chatterjee
This work presents a new strategy for rendering dynamic scenes from novel viewpoints. Our approach is based on stratifying the scene into regions based on the extent of motion of the region, which is automatically determined. Regions with higher motion are permitted a denser spatio-temporal sampling strategy for more faithful rendering of the scene. Additionally, to the best of our knowledge, ours is the first work to enable tracking of objects in the scene from novel views - based on the preferences of a user, provided by a click.
Paper: https://www.merl.com/publications/TR2024-042
4. "SIRA: Scalable Inter-frame Relation and Association for Radar Perception" by R. Yataka, P. Wang, P. T. Boufounos, and R. Takahashi
Overcoming the limitations on radar feature extraction such as low spatial resolution, multipath reflection, and motion blurs, this paper proposes SIRA (Scalable Inter-frame Relation and Association) for scalable radar perception with two designs: 1) extended temporal relation, generalizing the existing temporal relation layer from two frames to multiple inter-frames with temporally regrouped window attention for scalability; and 2) motion consistency track with a pseudo-tracklet generated from observational data for better object association.
Paper: https://www.merl.com/publications/TR2024-041
5. "RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation" by Z. Yang, J. Liu, P. Chen, A. Cherian, T. K. Marks, J. L. Roux, and C. Gan
We leverage Large Language Models (LLM) for zero-shot semantic audio visual navigation. Specifically, by employing multi-modal models to process sensory data, we instruct an LLM-based planner to actively explore the environment by adaptively evaluating and dismissing inaccurate perceptual descriptions.
Paper: https://www.merl.com/publications/TR2024-043
CVPR Workshop Papers:
1. "CoLa-SDF: Controllable Latent StyleSDF for Disentangled 3D Face Generation" by R. Dey, B. Egger, V. Boddeti, Y. Wang, and T. K. Marks
This paper proposes a new method for generating 3D faces and rendering them to images by combining the controllability of nonlinear 3DMMs with the high fidelity of implicit 3D GANs. Inspired by StyleSDF, our model uses a similar architecture but enforces the latent space to match the interpretable and physical parameters of the nonlinear 3D morphable model MOST-GAN.
Paper: https://www.merl.com/publications/TR2024-045
2. “Tracklet-based Explainable Video Anomaly Localization” by A. Singh, M. J. Jones, and E. Learned-Miller
This paper describes a new method for localizing anomalous activity in video of a scene given sample videos of normal activity from the same scene. The method is based on detecting and tracking objects in the scene and estimating high-level attributes of the objects such as their location, size, short-term trajectory and object class. These high-level attributes can then be used to detect unusual activity as well as to provide a human-understandable explanation for what is unusual about the activity.
Paper: https://www.merl.com/publications/TR2024-057
MERL co-organized workshops:
1. "Multimodal Algorithmic Reasoning Workshop" by A. Cherian, K-C. Peng, S. Lohit, M. Chatterjee, H. Zhou, K. Smith, T. K. Marks, J. Mathissen, and J. Tenenbaum
Workshop link: https://marworkshop.github.io/cvpr24/index.html
2. "The 5th Workshop on Fair, Data-Efficient, and Trusted Computer Vision" by K-C. Peng, et al.
Workshop link: https://fadetrcv.github.io/2024/
3. "SuperLoRA: Parameter-Efficient Unified Adaptation for Large Vision Models" by X. Chen, J. Liu, Y. Wang, P. Wang, M. Brand, G. Wang, and T. Koike-Akino
This paper proposes a generalized framework called SuperLoRA that unifies and extends different variants of low-rank adaptation (LoRA). Introducing new options with grouping, folding, shuffling, projection, and tensor decomposition, SuperLoRA offers high flexibility and demonstrates superior performance up to 10-fold gain in parameter efficiency for transfer learning tasks.
Paper: https://www.merl.com/publications/TR2024-062
-
- Date: April 9, 2024
MERL Contact: Diego Romeres
Research Areas: Artificial Intelligence, Dynamical Systems, Machine Learning, Optimization, Robotics
Brief - Diego Romeres, Principal Research Scientist and Team Leader in the Optimization and Robotics Team, was invited to speak as a guest lecturer in the seminar series on "AI in Action" in the Department of Management and Engineering, at the University of Padua.
The talk, entitled "Machine Learning for Robotics and Automation" described MERL's recent research on machine learning and model-based reinforcement learning applied to robotics and automation.
-
- Date: April 12, 2024
MERL Contact: Saviz Mowlavi
Research Areas: Control, Dynamical Systems, Machine Learning, Optimization
Brief - Saviz Mowlavi was invited to present remotely at the Computational and Applied Mathematics seminar series in the Department of Mathematics at North Carolina State University.
The talk, entitled "Model-based and data-driven prediction and control of spatio-temporal systems", described the use of temporal smoothness to regularize the training of fast surrogate models for PDEs, user-friendly methods for PDE-constrained optimization, and efficient strategies for learning feedback controllers for PDEs.
-
- Date: December 9, 2024 - December 15, 2024
Where: NeurIPS 2024
MERL Contact: Devesh K. Jha
Research Areas: Artificial Intelligence, Machine Learning
Brief - Devesh Jha, a Principal Research Scientist in the Optimization & Intelligent Robtics team, has been appointed as an area chair for Conference on Neural Information Processing Systems (NeurIPS) 2024. NeurIPS is the premier Machine Learning (ML) and Artificial Intelligence (AI) conference that includes invited talks, demonstrations, symposia, and oral and poster presentations of refereed papers.
-
- Date: March 20, 2024
Where: Austin, TX
MERL Contact: Ankush Chakrabarty
Research Areas: Artificial Intelligence, Control, Data Analytics, Dynamical Systems, Machine Learning, Multi-Physical Modeling, Optimization
Brief - Ankush Chakrabarty, Principal Research Scientist in the Multiphysical Systems Team, was invited to speak as a guest lecturer in the seminar series on "Occupant-Centric Grid Interactive Buildings" in the Department of Civil, Architectural and Environmental Engineering (CAEE) at the University of Texas at Austin.
The talk, entitled "Deep Generative Networks and Fine-Tuning for Net-Zero Energy Buildings" described lessons learned from MERL's recent research on generative models for building simulation and control, along with meta-learning for on-the-fly fine-tuning to adapt and optimize energy expenditure.
-
- Date: December 7, 2023
Research Areas: Control, Dynamical Systems
Brief - Karl Berntorp has joined the Editorial Board of the IEEE Transactions on Control Systems Technology (T-CST) as an Associate Editor. The IEEE T-CST publishes peer-reviewed papers on technological advances in the design, realization, and operation of control systems, and bridges the gap between the theory and practice of control engineering.
-
- Date: November 14, 2023
Where: Istanbul, Turkey
MERL Contact: Ankush Chakrabarty
Research Areas: Control, Data Analytics, Machine Learning, Multi-Physical Modeling, Optimization
Brief - Ankush Chakrabarty, Principal Research Scientist in the Multiphysical Systems team at MERL, served as Co-Chair at the 3rd ACM International Workshop on Big Data and Machine Learning for Smart Buildings and Cities (BALANCES'23). The workshop places spotlights on two different IEA EBC Annexes: the Annex 81 - Data-Driven Smart Buildings and Annex 82 - Energy Flexible Buildings Towards Resilient Low Carbon Energy Systems.
-
- Date: November 28, 2023 - November 30, 2023
Where: Virtual
MERL Contacts: Toshiaki Koike-Akino; Pu (Perry) Wang
Research Areas: Artificial Intelligence, Communications, Computational Sensing, Machine Learning, Signal Processing
Brief - On November 28, 2023, MERL researchers Toshiaki Koike-Akino and Pu (Perry) Wang will give a 3-hour tutorial presentation at the first IEEE Virtual Conference on Communications (VCC). The talk, titled "Post-Deep Learning Era: Emerging Quantum Machine Learning for Sensing and Communications," addresses recent trends, challenges, and advances in sensing and communications. P. Wang presents use cases, industry trends, signal processing, and deep learning for Wi-Fi integrated sensing and communications (ISAC), while T. Koike-Akino discusses the future of deep learning, giving a comprehensive overview of artificial intelligence (AI) technologies, natural computing, emerging quantum AI, and their diverse applications. The tutorial is conducted virtually.
IEEE VCC is a new fully virtual conference launched from the IEEE Communications Society, gathering researchers from academia and industry who are unable to travel but wish to present their recent scientific results and engage in conducive interactive discussions with fellow researchers working in their fields. It is designed to resolve potential hardship such as pandemic restrictions, visa issues, travel problems, or financial difficulties.
-
- Date: September 26, 2023
Where: Virtual
MERL Contact: Anoop Cherian
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Brief - Anoop Cherian, a Senior Principal Research Scientist in the Computer Vision team at MERL, gave a podcast interview with award-winning journalist, Deborah Yao. Deborah is the editor of AI Business -- a leading content platform for artificial intelligence and its applications in the real world, delivering its readers up-to-the-minute insights into how AI technologies are currently affecting the global economy and society. The podcast was based on the recent research that Anoop and his colleagues did at MERL with his collaborators at MIT; this research attempts to objectively answer the pertinent question: are current deep neural networks smarter than second graders? The podcast discusses shortcomings in the recent artificial general intelligence systems with regard to their capabilities for knowledge abstraction, learning, and generalization, which are brought out by this research.
-
- Date: November 1, 2023
MERL Contact: Diego Romeres
Research Areas: Artificial Intelligence, Machine Learning, Robotics
Brief - Principal Research Scientist and Team Leader Diego Romeres gave an invited talk with title 'Applications of Machine Learning to Robotics' in the Machine Learning graduate course at Bentley University. The presentation focused mainly on Reinforcement Learning research applied to robotics. The audience consisted mostly of Master’s in Business Analytics (MSBA) students and students in the MBA w/ Business Analytics Concentration program.
-
- Date: January 23, 2023 - November 4, 2023
Where: International Symposium of Music Information Retrieval (ISMR)
MERL Contacts: Jonathan Le Roux; Gordon Wichern
Research Areas: Artificial Intelligence, Machine Learning, Speech & Audio
Brief - MERL Speech & Audio team members Gordon Wichern and Jonathan Le Roux co-organized the 2023 Sound Demixing Challenge along with researchers from Sony, Moises AI, Audioshake, and Meta.
The SDX2023 Challenge was hosted on the AI Crowd platform and had a prize pool of $42,000 distributed to the winning teams across two tracks: Music Demixing and Cinematic Sound Demixing. A unique aspect of this challenge was the ability to test the audio source separation models developed by challenge participants on non-public songs from Sony Music Entertainment Japan for the music demixing track, and movie soundtracks from Sony Pictures for the cinematic sound demixing track. The challenge ran from January 23rd to May 1st, 2023, and had 884 participants distributed across 68 teams submitting 2828 source separation models. The winners will be announced at the SDX2023 Workshop, which will take place as a satellite event at the International Symposium of Music Information Retrieval (ISMR) in Milan, Italy on November 4, 2023.
MERL’s contribution to SDX2023 focused mainly on the cinematic demixing track. In addition to sponsoring the prizes awarded to the winning teams for that track, the baseline system and initial training data were MERL’s Cocktail Fork separation model and Divide and Remaster dataset, respectively. MERL researchers also contributed to a Town Hall kicking off the challenge, co-authored a scientific paper describing the challenge outcomes, and co-organized the SDX2023 Workshop.
-
- Date: October 2, 2023 - October 6, 2023
Where: Paris/France
MERL Contacts: Moitreya Chatterjee; Anoop Cherian; Michael J. Jones; Toshiaki Koike-Akino; Suhas Lohit; Tim K. Marks; Pedro Miraldo; Kuan-Chuan Peng; Ye Wang
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Brief - MERL researchers are presenting 4 papers and organizing the VLAR-SMART-101 workshop at the ICCV 2023 conference, which will be held in Paris, France October 2-6. ICCV is one of the most prestigious and competitive international conferences in computer vision. Details are provided below.
1. Conference paper: “Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis,” by Nithin Gopalakrishnan Nair, Anoop Cherian, Suhas Lohit, Ye Wang, Toshiaki Koike-Akino, Vishal Patel, and Tim K. Marks
Conditional generative models typically demand large annotated training sets to achieve high-quality synthesis. As a result, there has been significant interest in plug-and-play generation, i.e., using a pre-defined model to guide the generative process. In this paper, we introduce Steered Diffusion, a generalized framework for fine-grained photorealistic zero-shot conditional image generation using a diffusion model trained for unconditional generation. The key idea is to steer the image generation of the diffusion model during inference via designing a loss using a pre-trained inverse model that characterizes the conditional task. Our model shows clear qualitative and quantitative improvements over state-of-the-art diffusion-based plug-and-play models, while adding negligible computational cost.
2. Conference paper: "BANSAC: A dynamic BAyesian Network for adaptive SAmple Consensus," by Valter Piedade and Pedro Miraldo
We derive a dynamic Bayesian network that updates individual data points' inlier scores while iterating RANSAC. At each iteration, we apply weighted sampling using the updated scores. Our method works with or without prior data point scorings. In addition, we use the updated inlier/outlier scoring for deriving a new stopping criterion for the RANSAC loop. Our method outperforms the baselines in accuracy while needing less computational time.
3. Conference paper: "Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes," by Fabien Delattre, David Dirnfeld, Phat Nguyen, Stephen Scarano, Michael J. Jones, Pedro Miraldo, and Erik Learned-Miller
We present a novel approach to estimating camera rotation in crowded, real-world scenes captured using a handheld monocular video camera. Our method uses a novel generalization of the Hough transform on SO3 to efficiently find the camera rotation most compatible with the optical flow. Because the setting is not addressed well by other data sets, we provide a new dataset and benchmark, with high-accuracy and rigorously annotated ground truth on 17 video sequences. Our method is more accurate by almost 40 percent than the next best method.
4. Workshop paper: "Tensor Factorization for Leveraging Cross-Modal Knowledge in Data-Constrained Infrared Object Detection" by Manish Sharma*, Moitreya Chatterjee*, Kuan-Chuan Peng, Suhas Lohit, and Michael Jones
While state-of-the-art object detection methods for RGB images have reached some level of maturity, the same is not true for Infrared (IR) images. The primary bottleneck towards bridging this gap is the lack of sufficient labeled training data in the IR images. Towards addressing this issue, we present TensorFact, a novel tensor decomposition method which splits the convolution kernels of a CNN into low-rank factor matrices with fewer parameters. This compressed network is first pre-trained on RGB images and then augmented with only a few parameters. This augmented network is then trained on IR images, while freezing the weights trained on RGB. This prevents it from over-fitting, allowing it to generalize better. Experiments show that our method outperforms state-of-the-art.
5. “Vision-and-Language Algorithmic Reasoning (VLAR) Workshop and SMART-101 Challenge” by Anoop Cherian, Kuan-Chuan Peng, Suhas Lohit, Tim K. Marks, Ram Ramrakhya, Honglu Zhou, Kevin A. Smith, Joanna Matthiesen, and Joshua B. Tenenbaum
MERL researchers along with researchers from MIT, GeorgiaTech, Math Kangaroo USA, and Rutgers University are jointly organizing a workshop on vision-and-language algorithmic reasoning at ICCV 2023 and conducting a challenge based on the SMART-101 puzzles described in the paper: Are Deep Neural Networks SMARTer than Second Graders?. A focus of this workshop is to bring together outstanding faculty/researchers working at the intersections of vision, language, and cognition to provide their opinions on the recent breakthroughs in large language models and artificial general intelligence, as well as showcase their cutting edge research that could inspire the audience to search for the missing pieces in our quest towards solving the puzzle of artificial intelligence.
Workshop link: https://wvlar.github.io/iccv23/
-