News & Events

NEWS MERL Papers and Workshops at CVPR 2025
Date: June 11, 2025 - June 15, 2025
Where: Nashville, TN, USA
MERL Contacts: Matthew Brand; Moitreya Chatterjee; Anoop Cherian; François Germain; Michael J. Jones; Toshiaki Koike-Akino; Jing Liu; Suhas Lohit; Tim K. Marks; Pedro Miraldo; Kuan-Chuan Peng; Pu (Perry) Wang; Ye Wang
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Signal Processing, Speech & Audio
Brief
- MERL researchers are presenting 2 conference papers and 7 workshop papers, and co-organizing 2 workshops, at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2025, which will be held in Nashville, TN, USA from June 11-15, 2025. CVPR is one of the most prestigious and competitive international conferences in the area of computer vision. Details of MERL contributions are provided below:
  
  Main Conference Papers:
  
  1. "UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing" by Y. H. Lai, J. Ebbers, Y. F. Wang, F. Germain, M. J. Jones, and M. Chatterjee
  
  This work deals with the task of weakly‑supervised Audio-Visual Video Parsing (AVVP) and proposes a novel, uncertainty-aware algorithm called UWAV towards that end. UWAV works by producing more reliable segment‑level pseudo‑labels while explicitly weighting each label by its prediction uncertainty. This uncertainty‑aware training, combined with a feature‑mixup regularization scheme, promotes inter‑segment consistency in the pseudo-labels. As a result, UWAV achieves state‑of‑the‑art performance on two AVVP datasets across multiple metrics, demonstrating both effectiveness and strong generalizability.
  
  Paper: https://www.merl.com/publications/TR2025-072
  
  2. "TailedCore: Few-Shot Sampling for Unsupervised Long-Tail Noisy Anomaly Detection" by Y. G. Jung, J. Park, J. Yoon, K.-C. Peng, W. Kim, A. B. J. Teoh, and O. Camps.
  
  This work tackles unsupervised anomaly detection in complex scenarios where normal data is noisy and has an unknown, imbalanced class distribution. Existing models face a trade-off between robustness to noise and performance on rare (tail) classes. To address this, the authors propose TailSampler, which estimates class sizes from embedding similarities to isolate tail samples. Using TailSampler, they develop TailedCore, a memory-based model that effectively captures tail class features while remaining noise-robust, outperforming state-of-the-art methods in extensive evaluations.
  
  Paper: https://www.merl.com/publications/TR2025-077
  
  MERL Co-Organized Workshops:
  
  1. Multimodal Algorithmic Reasoning (MAR) Workshop, organized by A. Cherian, K.-C. Peng, S. Lohit, H. Zhou, K. Smith, L. Xue, T. K. Marks, and J. Tenenbaum.
  
  Workshop link: https://marworkshop.github.io/cvpr25/
  
  2. The 6th Workshop on Fair, Data-Efficient, and Trusted Computer Vision, organized by N. Ratha, S. Karanam, Z. Wu, M. Vatsa, R. Singh, K.-C. Peng, M. Merler, and K. Varshney.
  
  Workshop link: https://fadetrcv.github.io/2025/
  
  Workshop Papers:
  
  1. "FreBIS: Frequency-Based Stratification for Neural Implicit Surface Representations" by N. Sawada, P. Miraldo, S. Lohit, T. K. Marks, and M. Chatterjee (Oral)
  
  With their ability to model object surfaces in a scene as a continuous function, neural implicit surface reconstruction methods have made remarkable strides recently, especially over classical 3D surface reconstruction methods, such as those that use voxels or point clouds. Building on this line of work, we propose FreBIS, a neural implicit‑surface framework that avoids overloading a single encoder with every surface detail. It divides a scene into several frequency bands and assigns a dedicated encoder (or group of encoders) to each band, then enforces complementary feature learning through a redundancy‑aware weighting module. Swapping this frequency‑stratified stack into an off‑the‑shelf reconstruction pipeline markedly boosts 3D surface accuracy and view‑consistent rendering on the challenging BlendedMVS dataset.
  
  Paper: https://www.merl.com/publications/TR2025-074
  
  2. "Multimodal 3D Object Detection on Unseen Domains" by D. Hegde, S. Lohit, K.-C. Peng, M. J. Jones, and V. M. Patel.
  
  LiDAR-based object detection models often suffer performance drops when deployed in unseen environments due to biases in data properties like point density and object size. Unlike domain adaptation methods that rely on access to target data, this work tackles the more realistic setting of domain generalization without test-time samples. We propose CLIX3D, a multimodal framework that uses both LiDAR and image data along with supervised contrastive learning to align same-class features across domains and improve robustness. CLIX3D achieves state-of-the-art performance across various domain shifts in 3D object detection.
  
  Paper: https://www.merl.com/publications/TR2025-078
  
  3. "Improving Open-World Object Localization by Discovering Background" by A. Singh, M. J. Jones, K.-C. Peng, M. Chatterjee, A. Cherian, and E. Learned-Miller.
  
  This work tackles open-world object localization, aiming to detect both seen and unseen object classes using limited labeled training data. While prior methods focus on object characterization, this approach introduces background information to improve objectness learning. The proposed framework identifies low-information, non-discriminative image regions as background and trains the model to avoid generating object proposals there. Experiments on standard benchmarks show that this method significantly outperforms previous state-of-the-art approaches.
  
  Paper: https://www.merl.com/publications/TR2025-058
  
  4. "PF3Det: A Prompted Foundation Feature Assisted Visual LiDAR 3D Detector" by K. Li, T. Zhang, K.-C. Peng, and G. Wang.
  
  This work addresses challenges in 3D object detection for autonomous driving by improving the fusion of LiDAR and camera data, which is often hindered by domain gaps and limited labeled data. Leveraging advances in foundation models and prompt engineering, the authors propose PF3Det, a multi-modal detector that uses foundation model encoders and soft prompts to enhance feature fusion. PF3Det achieves strong performance even with limited training data. It sets new state-of-the-art results on the nuScenes dataset, improving NDS by 1.19% and mAP by 2.42%.
  
  Paper: https://www.merl.com/publications/TR2025-076
  
  5. "Noise Consistency Regularization for Improved Subject-Driven Image Synthesis" by Y. Ni, S. Wen, P. Konius, and A. Cherian
  
  Fine-tuning Stable Diffusion enables subject-driven image synthesis by adapting the model to generate images containing specific subjects. However, existing fine-tuning methods suffer from two key issues: underfitting, where the model fails to reliably capture subject identity, and overfitting, where it memorizes the subject image and reduces background diversity. To address these challenges, two auxiliary consistency losses are proposed for diffusion fine-tuning. First, a prior consistency regularization loss ensures that the predicted diffusion noise for prior (non-subject) images remains consistent with that of the pretrained model, improving fidelity. Second, a subject consistency regularization loss enhances the fine-tuned model’s robustness to latent codes modulated by multiplicative noise, helping to preserve subject identity while improving diversity. Our experimental results demonstrate the effectiveness of our approach, which outperforms DreamBooth in terms of CLIP scores, background variation, and overall visual quality.
  
  Paper: https://www.merl.com/publications/TR2025-073
  
  6. "LatentLLM: Attention-Aware Joint Tensor Compression" by T. Koike-Akino, X. Chen, J. Liu, Y. Wang, P. Wang, M. Brand
  
  We propose a new framework to convert a large foundation model, such as a large language model (LLM) or large multi-modal model (LMM), into a reduced-dimension latent structure. Our method uses a global attention-aware joint tensor decomposition to significantly improve model efficiency. We show the benefit on several benchmarks, including multi-modal reasoning tasks.
  
  Paper: https://www.merl.com/publications/TR2025-075
  
  7. "TuneComp: Joint Fine-Tuning and Compression for Large Foundation Models" by T. Koike-Akino, X. Chen, J. Liu, Y. Wang, P. Wang, M. Brand
  
  To reduce model size during post-training, compression methods, including knowledge distillation, low-rank approximation, and pruning, are often applied after fine-tuning the model. However, sequential fine-tuning and compression sacrifices performance while creating a larger-than-necessary model as an intermediate step. In this work, we aim to reduce this gap by directly constructing a smaller model guided by the downstream task. We propose to jointly fine-tune and compress the model by gradually distilling it to a pruned low-rank structure. Experiments demonstrate that joint fine-tuning and compression significantly outperforms other sequential compression methods.
  
  Paper: https://www.merl.com/publications/TR2025-079
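
As a rough illustration of why the low-rank approximation mentioned in the compression summaries above shrinks a model, the sketch below counts the parameters of a dense weight matrix against those of a rank-r factorization. It is not the TuneComp or LatentLLM method; the function names and the 4096 x 4096 layer size are hypothetical choices made only for this example.

```python
def dense_param_count(d_out: int, d_in: int) -> int:
    """Parameters in a full d_out x d_in weight matrix W."""
    return d_out * d_in


def low_rank_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Parameters when W is replaced by A @ B, with A: d_out x rank and B: rank x d_in."""
    return rank * (d_out + d_in)


def compression_ratio(d_out: int, d_in: int, rank: int) -> float:
    """How many times fewer parameters the factorized layer stores."""
    return dense_param_count(d_out, d_in) / low_rank_param_count(d_out, d_in, rank)


# Hypothetical layer size and rank, chosen only for illustration.
dense = dense_param_count(4096, 4096)            # 16,777,216 parameters
factored = low_rank_param_count(4096, 4096, 64)  # 524,288 parameters
ratio = compression_ratio(4096, 4096, 64)        # 32.0
```

The saving comes purely from the choice of rank; how much accuracy survives at a given rank is what the papers above study and is not captured by this count.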
NEWS MERL Papers and Workshops at AAAI 2025
Date: February 25, 2025 - March 4, 2025
Where: The Association for the Advancement of Artificial Intelligence (AAAI)
MERL Contacts: Ankush Chakrabarty; Toshiaki Koike-Akino; Jing Liu; Kuan-Chuan Peng; Diego Romeres; Ye Wang
Research Areas: Artificial Intelligence, Machine Learning, Optimization
Brief
- MERL researchers presented 2 conference papers and 2 workshop papers, and co-organized 1 workshop, at the AAAI 2025 conference, which was held in Philadelphia from Feb. 25 to Mar. 4, 2025. AAAI is one of the most prestigious and competitive international conferences in artificial intelligence (AI). Details of MERL contributions are provided below.
  
  - AAAI Papers in Main Tracks:
  
  1. "Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage" by M.R.U. Rashid, J. Liu, T. Koike-Akino, Y. Wang, and S. Mehnaz. [Oral Presentation]
  
  This work proposes a novel unlearning-based model poisoning method that amplifies privacy breaches during fine-tuning. Extensive empirical studies show the proposed method’s efficacy on both membership inference and data extraction attacks. The attack is stealthy enough to bypass detection-based defenses, and differential privacy cannot effectively defend against it without significantly impacting model utility.
  
  Paper: https://www.merl.com/publications/TR2025-017
  
  2. "User-Preference Meets Pareto-Optimality: Multi-Objective Bayesian Optimization with Local Gradient Search" by J.H.S. Ip, A. Chakrabarty, A. Mesbah, and D. Romeres. [Poster Presentation]
  
  This paper introduces a sample-efficient multi-objective Bayesian optimization method that integrates user preferences with gradient-based search to find near-Pareto optimal solutions. The proposed method achieves high utility and reduces distance to Pareto-front solutions across both synthetic and real-world problems, underscoring the importance of minimizing gradient uncertainty during gradient-based optimization. Additionally, the study introduces a novel utility function that respects Pareto dominance and effectively captures diverse user preferences.
  
  Paper: https://www.merl.com/publications/TR2025-018
  
  - AAAI Workshop Papers:
  
  1. "Quantum Diffusion Models for Few-Shot Learning" by R. Wang, Y. Wang, J. Liu, and T. Koike-Akino.
  
  This work presents the quantum diffusion model (QDM) as an approach to overcome the challenges of quantum few-shot learning (QFSL). It introduces three novel algorithms developed from complementary data-driven and algorithmic perspectives to enhance the performance of QFSL tasks. The extensive experiments demonstrate that these algorithms achieve significant performance gains over traditional baselines, underscoring the potential of QDM to advance QFSL by effectively leveraging quantum noise modeling and label guidance.
  
  Paper: https://www.merl.com/publications/TR2025-025
  
  2. "Quantum Implicit Neural Compression" by T. Fujihashi and T. Koike-Akino.
  
  This work introduces a quantum counterpart of implicit neural representation (quINR), which leverages the exponentially rich expressivity of quantum neural networks to improve classical INR-based signal compression methods. Evaluations on benchmark datasets show that the proposed quINR-based compression could improve rate-distortion performance in image compression compared with traditional codecs and classical INR-based coding methods.
  
  Paper: https://www.merl.com/publications/TR2025-024
  
  - AAAI Workshops Contributed by MERL:
  
  1. "Scalable and Efficient Artificial Intelligence Systems (SEAS)"
  
  K.-C. Peng co-organized this workshop, which offers a timely forum for experts to share their perspectives in designing and developing robust computer vision (CV), machine learning (ML), and artificial intelligence (AI) algorithms, and translating them into real-world solutions.
  
  Workshop link: https://seasworkshop.github.io/aaai25/index.html
  
  2. "Quantum Computing and Artificial Intelligence"
  
  T. Koike-Akino served as a session chair for the Quantum Neural Network session of this workshop, which focuses on seeking contributions encompassing theoretical and applied advances in quantum AI, quantum computing (QC) to enhance classical AI, and classical AI to tackle various aspects of QC.
  
  Workshop link: https://sites.google.com/view/qcai2025/
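
The multi-objective Bayesian optimization summary above refers to Pareto dominance and Pareto-front solutions. As a minimal sketch of that standard definition only (not the paper's utility function or search method), the code below checks dominance between objective vectors and filters a candidate set down to its non-dominated points. All objectives are assumed to be minimized, and the candidate values are made up for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a Pareto-dominates b, assuming every objective is minimized."""
    if len(a) != len(b):
        raise ValueError("objective vectors must have the same length")
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the points that no other point dominates, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective candidates (for example, cost and error).
candidates = [(1.0, 5.0), (2.0, 2.0), (3.0, 3.0), (5.0, 1.0)]
front = pareto_front(candidates)
```

Here `(3.0, 3.0)` is dropped because `(2.0, 2.0)` is at least as good in both objectives and strictly better in one; the remaining three points trade one objective against the other, which is where user preferences come in.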
NEWS MERL Researchers to Present 2 Conference and 11 Workshop Papers at NeurIPS 2024
Date: December 10, 2024 - December 15, 2024
Where: Conference on Neural Information Processing Systems (NeurIPS)
MERL Contacts: Petros T. Boufounos; Matthew Brand; Ankush Chakrabarty; Anoop Cherian; François Germain; Toshiaki Koike-Akino; Christopher R. Laughman; Jonathan Le Roux; Jing Liu; Suhas Lohit; Tim K. Marks; Yoshiki Masuyama; Kieran Parsons; Kuan-Chuan Peng; Diego Romeres; Pu (Perry) Wang; Ye Wang; Gordon Wichern
Research Areas: Artificial Intelligence, Communications, Computational Sensing, Computer Vision, Control, Data Analytics, Dynamical Systems, Machine Learning, Multi-Physical Modeling, Optimization, Robotics, Signal Processing, Speech & Audio, Human-Computer Interaction, Information Security
Brief
- MERL researchers will attend and present the following papers at the 2024 Conference on Neural Information Processing Systems (NeurIPS) and its workshops.
  
  1. "RETR: Multi-View Radar Detection Transformer for Indoor Perception" by Ryoma Yataka (Mitsubishi Electric), Adriano Cardace (Bologna University), Perry Wang (Mitsubishi Electric Research Laboratories), Petros Boufounos (Mitsubishi Electric Research Laboratories), Ryuhei Takahashi (Mitsubishi Electric). Main Conference. https://neurips.cc/virtual/2024/poster/95530
  
  2. "Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads" by Anoop Cherian (Mitsubishi Electric Research Laboratories), Kuan-Chuan Peng (Mitsubishi Electric Research Laboratories), Suhas Lohit (Mitsubishi Electric Research Laboratories), Joanna Matthiesen (Math Kangaroo USA), Kevin Smith (Massachusetts Institute of Technology), Josh Tenenbaum (Massachusetts Institute of Technology). Main Conference, Datasets and Benchmarks track. https://neurips.cc/virtual/2024/poster/97639
  
  3. "Probabilistic Forecasting for Building Energy Systems: Are Time-Series Foundation Models The Answer?" by Young-Jin Park (Massachusetts Institute of Technology), Jing Liu (Mitsubishi Electric Research Laboratories), François G Germain (Mitsubishi Electric Research Laboratories), Ye Wang (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories), Gordon Wichern (Mitsubishi Electric Research Laboratories), Navid Azizan (Massachusetts Institute of Technology), Christopher R. Laughman (Mitsubishi Electric Research Laboratories), Ankush Chakrabarty (Mitsubishi Electric Research Laboratories). Time Series in the Age of Large Models Workshop.
  
  4. "Forget to Flourish: Leveraging Model-Unlearning on Pretrained Language Models for Privacy Leakage" by Md Rafi Ur Rashid (Penn State University), Jing Liu (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories), Shagufta Mehnaz (Penn State University), Ye Wang (Mitsubishi Electric Research Laboratories). Workshop on Red Teaming GenAI: What Can We Learn from Adversaries?
  
  5. "Spatially-Aware Losses for Enhanced Neural Acoustic Fields" by Christopher Ick (New York University), Gordon Wichern (Mitsubishi Electric Research Laboratories), Yoshiki Masuyama (Mitsubishi Electric Research Laboratories), François G Germain (Mitsubishi Electric Research Laboratories), Jonathan Le Roux (Mitsubishi Electric Research Laboratories). Audio Imagination Workshop.
  
  6. "FV-NeRV: Neural Compression for Free Viewpoint Videos" by Sorachi Kato (Osaka University), Takuya Fujihashi (Osaka University), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories), Takashi Watanabe (Osaka University). Machine Learning and Compression Workshop.
  
  7. "GPT Sonography: Hand Gesture Decoding from Forearm Ultrasound Images via VLM" by Keshav Bimbraw (Worcester Polytechnic Institute), Ye Wang (Mitsubishi Electric Research Laboratories), Jing Liu (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories). AIM-FM: Advancements In Medical Foundation Models: Explainability, Robustness, Security, and Beyond Workshop.
  
  8. "Smoothed Embeddings for Robust Language Models" by Hase Ryo (Mitsubishi Electric), Md Rafi Ur Rashid (Penn State University), Ashley Lewis (Ohio State University), Jing Liu (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories), Kieran Parsons (Mitsubishi Electric Research Laboratories), Ye Wang (Mitsubishi Electric Research Laboratories). Safe Generative AI Workshop.
  
  9. "Slaying the HyDRA: Parameter-Efficient Hyper Networks with Low-Displacement Rank Adaptation" by Xiangyu Chen (University of Kansas), Ye Wang (Mitsubishi Electric Research Laboratories), Matthew Brand (Mitsubishi Electric Research Laboratories), Pu Wang (Mitsubishi Electric Research Laboratories), Jing Liu (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories). Workshop on Adaptive Foundation Models.
  
  10. "Preference-based Multi-Objective Bayesian Optimization with Gradients" by Joshua Hang Sai Ip (University of California Berkeley), Ankush Chakrabarty (Mitsubishi Electric Research Laboratories), Ali Mesbah (University of California Berkeley), Diego Romeres (Mitsubishi Electric Research Laboratories). Workshop on Bayesian Decision-Making and Uncertainty. Lightning talk spotlight.
  
  11. "TR-BEACON: Shedding Light on Efficient Behavior Discovery in High-Dimensions with Trust-Region-based Bayesian Novelty Search" by Wei-Ting Tang (Ohio State University), Ankush Chakrabarty (Mitsubishi Electric Research Laboratories), Joel A. Paulson (Ohio State University). Workshop on Bayesian Decision-Making and Uncertainty.
  
  12. "MEL-PETs Joint-Context Attack for the NeurIPS 2024 LLM Privacy Challenge Red Team Track" by Ye Wang (Mitsubishi Electric Research Laboratories), Tsunato Nakai (Mitsubishi Electric), Jing Liu (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories), Kento Oonishi (Mitsubishi Electric), Takuya Higashi (Mitsubishi Electric). LLM Privacy Challenge. Special Award for Practical Attack.
  
  13. "MEL-PETs Defense for the NeurIPS 2024 LLM Privacy Challenge Blue Team Track" by Jing Liu (Mitsubishi Electric Research Laboratories), Ye Wang (Mitsubishi Electric Research Laboratories), Toshiaki Koike-Akino (Mitsubishi Electric Research Laboratories), Tsunato Nakai (Mitsubishi Electric), Kento Oonishi (Mitsubishi Electric), Takuya Higashi (Mitsubishi Electric). LLM Privacy Challenge. Won 3rd Place Award.
  
  MERL members also contributed to the organization of the Multimodal Algorithmic Reasoning (MAR) Workshop (https://marworkshop.github.io/neurips24/). Organizers: Anoop Cherian (Mitsubishi Electric Research Laboratories), Kuan-Chuan Peng (Mitsubishi Electric Research Laboratories), Suhas Lohit (Mitsubishi Electric Research Laboratories), Honglu Zhou (Salesforce Research), Kevin Smith (Massachusetts Institute of Technology), Tim K. Marks (Mitsubishi Electric Research Laboratories), Juan Carlos Niebles (Salesforce AI Research), Petar Veličković (Google DeepMind).
NEWS MERL Papers and Workshops at CVPR 2024
Date: June 17, 2024 - June 21, 2024
Where: Seattle, WA
MERL Contacts: Petros T. Boufounos; Moitreya Chatterjee; Anoop Cherian; Michael J. Jones; Toshiaki Koike-Akino; Jonathan Le Roux; Suhas Lohit; Tim K. Marks; Pedro Miraldo; Jing Liu; Kuan-Chuan Peng; Pu (Perry) Wang; Ye Wang; Matthew Brand
Research Areas: Artificial Intelligence, Computational Sensing, Computer Vision, Machine Learning, Speech & Audio
Brief
- MERL researchers are presenting 5 conference papers and 3 workshop papers, and are co-organizing 2 workshops, at the CVPR 2024 conference, which will be held in Seattle, June 17-21. CVPR is one of the most prestigious and competitive international conferences in computer vision. Details of MERL contributions are provided below.
  
  CVPR Conference Papers:
  
  1. "TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models" by H. Ni, B. Egger, S. Lohit, A. Cherian, Y. Wang, T. Koike-Akino, S. X. Huang, and T. K. Marks
  
  This work enables a pretrained text-to-video (T2V) diffusion model to be additionally conditioned on an input image (first video frame), yielding a text+image to video (TI2V) model. Other than using the pretrained T2V model, our method requires no ("zero") training or fine-tuning. The paper uses a "repeat-and-slide" method and diffusion resampling to synthesize videos from a given starting image and text describing the video content.
  
  Paper: https://www.merl.com/publications/TR2024-059
  Project page: https://merl.com/research/highlights/TI2V-Zero
  
  2. "Long-Tailed Anomaly Detection with Learnable Class Names" by C.-H. Ho, K.-C. Peng, and N. Vasconcelos
  
  This work aims to identify defects across various classes without relying on hard-coded class names. We introduce the concept of long-tailed anomaly detection, addressing challenges like class imbalance and dataset variability. Our proposed method combines reconstruction and semantic modules, learning pseudo-class names and utilizing a variational autoencoder for feature synthesis to improve performance in long-tailed datasets, outperforming existing methods in experiments.
  
  Paper: https://www.merl.com/publications/TR2024-040
  
  3. "Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling" by X. Liu, Y-W. Tai, C-T. Tang, P. Miraldo, S. Lohit, and M. Chatterjee
  
  This work presents a new strategy for rendering dynamic scenes from novel viewpoints. Our approach stratifies the scene into regions according to each region's extent of motion, which is determined automatically. Regions with higher motion are permitted a denser spatio-temporal sampling strategy for more faithful rendering of the scene. Additionally, to the best of our knowledge, ours is the first work to enable tracking of objects in the scene from novel views, based on the preferences of a user, provided by a click.
  
  Paper: https://www.merl.com/publications/TR2024-042
  
  4. "SIRA: Scalable Inter-frame Relation and Association for Radar Perception" by R. Yataka, P. Wang, P. T. Boufounos, and R. Takahashi
  
  To overcome limitations in radar feature extraction such as low spatial resolution, multipath reflection, and motion blur, this paper proposes SIRA (Scalable Inter-frame Relation and Association) for scalable radar perception with two designs: 1) extended temporal relation, generalizing the existing temporal relation layer from two frames to multiple inter-frames with temporally regrouped window attention for scalability; and 2) motion consistency track with a pseudo-tracklet generated from observational data for better object association.
  
  Paper: https://www.merl.com/publications/TR2024-041
  
  5. "RILA: Reflective and Imaginative Language Agent for Zero-Shot Semantic Audio-Visual Navigation" by Z. Yang, J. Liu, P. Chen, A. Cherian, T. K. Marks, J. Le Roux, and C. Gan
  
  We leverage large language models (LLMs) for zero-shot semantic audio-visual navigation. Specifically, by employing multi-modal models to process sensory data, we instruct an LLM-based planner to actively explore the environment by adaptively evaluating and dismissing inaccurate perceptual descriptions.
  
  Paper: https://www.merl.com/publications/TR2024-043
  
  CVPR Workshop Papers:
  
  1. "CoLa-SDF: Controllable Latent StyleSDF for Disentangled 3D Face Generation" by R. Dey, B. Egger, V. Boddeti, Y. Wang, and T. K. Marks
  
  This paper proposes a new method for generating 3D faces and rendering them to images by combining the controllability of nonlinear 3DMMs with the high fidelity of implicit 3D GANs. Inspired by StyleSDF, our model uses a similar architecture but enforces the latent space to match the interpretable and physical parameters of the nonlinear 3D morphable model MOST-GAN.
  
  Paper: https://www.merl.com/publications/TR2024-045
  
  2. “Tracklet-based Explainable Video Anomaly Localization” by A. Singh, M. J. Jones, and E. Learned-Miller
  
  This paper describes a new method for localizing anomalous activity in video of a scene given sample videos of normal activity from the same scene. The method is based on detecting and tracking objects in the scene and estimating high-level attributes of the objects such as their location, size, short-term trajectory and object class. These high-level attributes can then be used to detect unusual activity as well as to provide a human-understandable explanation for what is unusual about the activity.
  
  Paper: https://www.merl.com/publications/TR2024-057
  
  3. "SuperLoRA: Parameter-Efficient Unified Adaptation for Large Vision Models" by X. Chen, J. Liu, Y. Wang, P. Wang, M. Brand, G. Wang, and T. Koike-Akino
  
  This paper proposes a generalized framework called SuperLoRA that unifies and extends different variants of low-rank adaptation (LoRA). Introducing new options with grouping, folding, shuffling, projection, and tensor decomposition, SuperLoRA offers high flexibility and demonstrates superior performance, with up to a 10-fold gain in parameter efficiency for transfer learning tasks.
  
  Paper: https://www.merl.com/publications/TR2024-062
  
  MERL co-organized workshops:
  
  1. "Multimodal Algorithmic Reasoning Workshop" by A. Cherian, K-C. Peng, S. Lohit, M. Chatterjee, H. Zhou, K. Smith, T. K. Marks, J. Matthiesen, and J. Tenenbaum
  
  Workshop link: https://marworkshop.github.io/cvpr24/index.html
  
  2. "The 5th Workshop on Fair, Data-Efficient, and Trusted Computer Vision" by K-C. Peng, et al.
  
  Workshop link: https://fadetrcv.github.io/2024/
AWARD Best paper award at PHMAP 2023
Date: September 14, 2023
Awarded to: Dehong Liu, Anantaram Varatharajan, and Abraham Goldsmith
MERL Contacts: Abraham Goldsmith; Dehong Liu
Research Areas: Electric Systems, Signal Processing
Brief
- MERL researchers Dehong Liu, Anantaram Varatharajan, and Abraham Goldsmith were awarded one of three best paper awards at the Asia Pacific Conference of the Prognostics and Health Management Society 2023 (PHMAP23), held in Tokyo from September 11th to 14th, 2023, for their co-authored paper titled 'Extracting Broken-Rotor-Bar Fault Signature of Varying-Speed Induction Motors.'
  
  PHMAP is a biennial international conference specializing in prognostics and health management. PHMAP23 attracted more than 300 attendees from around the world and published more than 160 regular papers from academia and industry, spanning aerospace, production, civil engineering, electronics, and other fields.
NEWS MERL researchers presenting four papers and co-organizing a workshop at CVPR 2023
Date: June 18, 2023 - June 22, 2023
Where: Vancouver, Canada
MERL Contacts: Anoop Cherian; Michael J. Jones; Suhas Lohit; Kuan-Chuan Peng
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Brief
- MERL researchers are presenting 4 papers and co-organizing a workshop at the CVPR 2023 conference, which will be held in Vancouver, Canada, June 18-22. CVPR is one of the most prestigious and competitive international conferences in computer vision. Details are provided below.
  
  1. “Are Deep Neural Networks SMARTer than Second Graders?” by Anoop Cherian, Kuan-Chuan Peng, Suhas Lohit, Kevin Smith, and Joshua B. Tenenbaum
  
  We present SMART: a Simple Multimodal Algorithmic Reasoning Task and the associated SMART-101 dataset for evaluating the abstraction, deduction, and generalization abilities of neural networks in solving visuo-linguistic puzzles designed for children in the 6-8 age group. Our experiments using SMART-101 reveal that powerful deep models are not better than random accuracy when analyzed for generalization. We also evaluate large language models (including ChatGPT) on a subset of SMART-101 and find that while these models show convincing reasoning abilities, their answers are often incorrect.
  
  Paper: https://arxiv.org/abs/2212.09993
  
  2. “EVAL: Explainable Video Anomaly Localization,” by Ashish Singh, Michael J. Jones, and Erik Learned-Miller
  
  This work presents a method for detecting unusual activities in videos by building a high-level model of activities found in nominal videos of a scene. The high-level features used in the model are human understandable and include attributes such as the object class and the directions and speeds of motion. Such high-level features allow our method to not only detect anomalous activity but also to provide explanations for why it is anomalous.
  
  Paper: https://arxiv.org/abs/2212.07900
  
  3. "Aligning Step-by-Step Instructional Diagrams to Video Demonstrations," by Jiahao Zhang, Anoop Cherian, Yanbin Liu, Yizhak Ben-Shabat, Cristian Rodriguez, and Stephen Gould
  
  The rise of do-it-yourself (DIY) videos on the web has made it possible even for an unskilled person (or a skilled robot) to imitate and follow instructions to complete complex real world tasks. In this paper, we consider the novel problem of aligning instruction steps that are depicted as assembly diagrams (commonly seen in Ikea assembly manuals) with video segments from in-the-wild videos. We present a new dataset: Ikea Assembly in the Wild (IAW) and propose a contrastive learning framework for aligning instruction diagrams with video clips.
  
  Paper: https://arxiv.org/pdf/2303.13800.pdf
  
  4. "HaLP: Hallucinating Latent Positives for Skeleton-Based Self-Supervised Learning of Actions," by Anshul Shah, Aniket Roy, Ketul Shah, Shlok Kumar Mishra, David Jacobs, Anoop Cherian, and Rama Chellappa
  
  In this work, we propose a new contrastive learning approach to train models for skeleton-based action recognition without labels. Our key contribution is a simple module, HaLP: Hallucinating Latent Positives for contrastive learning. HaLP explores the latent space of poses in suitable directions to generate new positives. Our experiments using HaLP demonstrate strong empirical improvements.
  
  Paper: https://arxiv.org/abs/2304.00387
  
  The 4th Workshop on Fair, Data-Efficient, and Trusted Computer Vision
  
  MERL researcher Kuan-Chuan Peng is co-organizing the fourth Workshop on Fair, Data-Efficient, and Trusted Computer Vision (https://fadetrcv.github.io/2023/) in conjunction with CVPR 2023 on June 18, 2023. This workshop provides a focused venue for discussing and disseminating research in the areas of fairness, bias, and trust in computer vision, as well as adjacent domains such as computational social science and public policy.
NEWS MERL researchers presenting workshop papers at NeurIPS 2022
Date: December 2, 2022 - December 8, 2022
MERL Contacts: Matthew Brand; Toshiaki Koike-Akino; Jing Liu; Saviz Mowlavi; Kieran Parsons; Ye Wang
Research Areas: Artificial Intelligence, Control, Dynamical Systems, Machine Learning, Signal Processing
Brief
- In addition to 5 papers in recent news (https://www.merl.com/news/news-20221129-1450), MERL researchers presented 2 papers at the NeurIPS Conference Workshop, which was held Dec. 2-8. NeurIPS is one of the most prestigious and competitive international conferences in machine learning.
  
  - “Optimal control of PDEs using physics-informed neural networks” by Saviz Mowlavi and Saleh Nabi
  
  Physics-informed neural networks (PINNs) have recently become a popular method for solving forward and inverse problems governed by partial differential equations (PDEs). By incorporating the residual of the PDE into the loss function of a neural network-based surrogate model for the unknown state, PINNs can seamlessly blend measurement data with physical constraints. Here, we extend this framework to PDE-constrained optimal control problems, for which the governing PDE is fully known and the goal is to find a control variable that minimizes a desired cost objective. We validate the performance of the PINN framework by comparing it to state-of-the-art adjoint-based optimization, which performs gradient descent on the discretized control variable while satisfying the discretized PDE.
  
  - “Learning with noisy labels using low-dimensional model trajectory” by Vasu Singla, Shuchin Aeron, Toshiaki Koike-Akino, Matthew E. Brand, Kieran Parsons, Ye Wang
  
  Noisy annotations in real-world datasets pose a challenge for training deep neural networks (DNNs), detrimentally impacting generalization performance as incorrect labels may be memorized. In this work, we probe the observations that early stopping and low-dimensional subspace learning can help address this issue. First, we show that a prior method is sensitive to the early stopping hyper-parameter. Second, we investigate the effectiveness of PCA for approximating the optimization trajectory under noisy label information. We propose to estimate the low-rank subspace through robust and structured variants of PCA, namely Robust PCA and Sparse PCA. We find that the subspace estimated through these variants can be less sensitive to early stopping, and can outperform PCA to achieve better test error when trained on noisy labels.
  
  - In addition, new MERL researcher Jing Liu presented a paper entitled “CoPur: Certifiably Robust Collaborative Inference via Feature Purification,” based on work he did before joining MERL. His paper was selected as a spotlight paper to be highlighted in lightning talks and a featured paper panel.
NEWS Bingnan Wang gave seminar talk at WEMPEC in University of Wisconsin-Madison
Date: October 28, 2022
MERL Contacts: Dehong Liu; Bingnan Wang; Jinyun Zhang
Research Areas: Applied Physics, Data Analytics, Multi-Physical Modeling
Brief
- MERL researcher Bingnan Wang gave a seminar talk at the Wisconsin Electric Machines and Power Electronics Consortium (WEMPEC), which is recognized globally for its sustained contributions to electric machines and power electronics technology. He gave an overview of MERL research, especially on electric machines, and introduced our recent work on quantitative eccentricity fault diagnosis technologies for electric motors, including a physical-model approach using improved winding function theory and a data-driven approach using topological data analysis to effectively differentiate signals from different fault conditions.
  
  The seminar was given on Teams. MERL researchers Jin Zhang, Dehong Liu, Yusuke Sakamoto and Bingnan Wang held meetings with WEMPEC faculty members before the seminar to discuss various research topics, and met virtually with students after the talk.
NEWS MERL researchers presenting four papers and organizing two workshops at CVPR 2020 conference
Date: June 14, 2020 - June 19, 2020
MERL Contacts: Anoop Cherian; Michael J. Jones; Toshiaki Koike-Akino; Tim K. Marks; Kuan-Chuan Peng; Ye Wang
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Brief
- MERL researchers are presenting four papers (two oral papers and two posters) and organizing two workshops at the IEEE/CVF Computer Vision and Pattern Recognition (CVPR 2020) conference.
  
  CVPR 2020 Orals with MERL authors:
  1. "Dynamic Multiscale Graph Neural Networks for 3D Skeleton Based Human Motion Prediction," by Maosen Li, Siheng Chen, Yangheng Zhao, Ya Zhang, Yanfeng Wang, Qi Tian
  2. "Collaborative Motion Prediction via Neural Motion Message Passing," by Yue Hu, Siheng Chen, Ya Zhang, Xiao Gu
  
  CVPR 2020 Posters with MERL authors:
  3. "LUVLi Face Alignment: Estimating Landmarks’ Location, Uncertainty, and Visibility Likelihood," by Abhinav Kumar, Tim K. Marks, Wenxuan Mou, Ye Wang, Michael Jones, Anoop Cherian, Toshiaki Koike-Akino, Xiaoming Liu, Chen Feng
  4. "MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird’s Eye View Maps," by Pengxiang Wu, Siheng Chen, Dimitris N. Metaxas
  
  CVPR 2020 Workshops co-organized by MERL researchers:
  1. Fair, Data-Efficient and Trusted Computer Vision
  2. Deep Declarative Networks
AWARD MERL Researchers win Best Paper Award at ICCV 2019 Workshop on Statistical Deep Learning in Computer Vision
Date: October 27, 2019
Awarded to: Abhinav Kumar, Tim K. Marks, Wenxuan Mou, Chen Feng, Xiaoming Liu
MERL Contact: Tim K. Marks
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Brief
- MERL researcher Tim Marks, former MERL interns Abhinav Kumar and Wenxuan Mou, and MERL consultants Professor Chen Feng (NYU) and Professor Xiaoming Liu (MSU) received the Best Oral Paper Award at the IEEE/CVF International Conference on Computer Vision (ICCV) 2019 Workshop on Statistical Deep Learning in Computer Vision (SDL-CV) held in Seoul, Korea. Their paper, entitled "UGLLI Face Alignment: Estimating Uncertainty with Gaussian Log-Likelihood Loss," describes a method which, given an image of a face, estimates not only the locations of facial landmarks but also the uncertainty of each landmark location estimate.
NEWS Scene interpretation results of SA group members are listed as the leader of benchmark competition
Date: July 13, 2015 - July 17, 2015
Research Area: Machine Learning
Brief
- SA group members (M. Liu, S. Lin (intern), S. Ramalingam, O. Tuzel) presented a paper called 'Layered Interpretation of Street View Images' at the Robotics: Science and Systems Conference in Rome, July 13-17. The results they reported are now listed as the leader of the benchmark competition sponsored by Daimler. [Note that at that URL ref 2 is from a collaboration with Daimler and uses an FPGA for high speed, whereas the MERL result was obtained with a desktop computer and GPU.]
AWARD GRSS 2014 Symposium Prize Paper Award
Date: May 1, 2014
Awarded to: Dehong Liu and Petros T. Boufounos
Awarded for: "Synthetic Aperture Imaging Using a Randomly Steered Spotlight"
Awarded by: IEEE Geoscience and Remote Sensing Society (GRSS)
MERL Contacts: Dehong Liu; Petros T. Boufounos
Research Area: Computational Sensing
Brief
- Dehong Liu and Petros T. Boufounos are the recipients of the IEEE Geoscience and Remote Sensing Society 2014 Symposium Prize Paper Award for their paper "Synthetic Aperture Imaging Using a Randomly Steered Spotlight," presented at IGARSS 2013 (TR2013-070).
NEWS International Conference on 3DTV-Conference: publication by Ming-Yu Liu and others
Date: June 29, 2013
Where: International Conference on 3DTV-Conference
Research Area: Computer Vision
Brief
- The paper "Model-Based Vehicle Pose Estimation and Tracking in Videos Using Random Forests" by Hodlmoser, M., Micusik, B., Pollefeys, M., Liu, M-Y. and Kampel, M. was presented at the International Conference on 3DTV-Conference.
NEWS CVPR 2013: 3 publications by Yuichi Taguchi, Srikumar Ramalingam, C. Oncel Tuzel, Amit K. Agrawal and Ming-Yu Liu
Date: June 23, 2013
Where: IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Research Area: Computer Vision
Brief
- The papers "Single Image Calibration of Multi-Axial Imaging Systems" by Agrawal, A. and Ramalingam, S., "Joint Geodesic Upsampling of Depth Images" by Liu, M-Y, Tuzel, O. and Taguchi, Y. and "Manhattan Junction Catalogue for Spatial Reasoning of Indoor Scenes" by Ramalingam, S., Pillai, J.K., Jain, A. and Taguchi, Y. were presented at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
NEWS ICASSP 2013: 9 publications by Jonathan Le Roux, Dehong Liu, Robert A. Cohen, Dong Tian, Shantanu D. Rane, Jianlin Guo, John R. Hershey, Shinji Watanabe, Petros T. Boufounos, Zafer Sahinoglu and Anthony Vetro
Date: May 26, 2013
Where: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
MERL Contacts: Dehong Liu; Jianlin Guo; Anthony Vetro; Petros T. Boufounos; Jonathan Le Roux
Brief
- The papers "Stereo-based Feature Enhancement Using Dictionary Learning" by Watanabe, S. and Hershey, J.R., "Effectiveness of Discriminative Training and Feature Transformation for Reverberated and Noisy Speech" by Tachioka, Y., Watanabe, S. and Hershey, J.R., "Non-negative Dynamical System with Application to Speech and Audio" by Fevotte, C., Le Roux, J. and Hershey, J.R., "Source Localization in Reverberant Environments using Sparse Optimization" by Le Roux, J., Boufounos, P.T., Kang, K. and Hershey, J.R., "A Keypoint Descriptor for Alignment-Free Fingerprint Matching" by Garg, R. and Rane, S., "Transient Disturbance Detection for Power Systems with a General Likelihood Ratio Test" by Song, JX., Sahinoglu, Z. and Guo, J., "Disparity Estimation of Misaligned Images in a Scanline Optimization Framework" by Rzeszutek, R., Tian, D. and Vetro, A., "Screen Content Coding for HEVC Using Edge Modes" by Hu, S., Cohen, R.A., Vetro, A. and Kuo, C.C.J. and "Random Steerable Arrays for Synthetic Aperture Imaging" by Liu, D. and Boufounos, P.T. were presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).
NEWS WCNC 2013: publication by Philip V. Orlik, Kieran J. Parsons, Jianlin Guo and others
Date: April 7, 2013
Where: IEEE Wireless Communications and Networking Conference (WCNC)
MERL Contacts: Philip V. Orlik; Jianlin Guo; Kieran Parsons
Research Area: Communications
Brief
- The paper "Load Balanced Routing for Low Power and Lossy Networks" by Liu, X., Guo, J., Bhatti, G., Orlik, P. and Parsons, K. was presented at the IEEE Wireless Communications and Networking Conference (WCNC).
NEWS ICPR 2012: publication by Ming-Yu Liu and others
Date: November 11, 2012
Where: IEEE International Conference on Pattern Recognition (ICPR)
Research Area: Computer Vision
Brief
- The paper "A Grassmann Manifold-based Domain Adaptation Approach" by Zheng, J., Liu, M.-Y., Chellappa, R. and Phillips, P.J. was presented at the IEEE International Conference on Pattern Recognition (ICPR).
NEWS 3DIMPVT 2012: publication by Ming-Yu Liu and others
Date: October 13, 2012
Where: IEEE International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT)
Research Area: Machine Learning
Brief
- The paper "Classification and Pose Estimation of Vehicles in Videos by 3D Modeling within Discrete-Continuous Optimization" by Hodlmoser, M., Micusik, B., Liu, M.-Y., Pollefeys, M. and Kampel, M. was presented at the IEEE International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission (3DIMPVT).
NEWS Journal of Communications: publication by Jinyun Zhang and others
Date: September 1, 2012
Where: Journal of Communications
MERL Contact: Jinyun Zhang
Research Area: Communications
Brief
- The article "Combating Interference: MU-MIMO, CoMP, and HetNet" by Liu, L., Zhang, J., Yi, Y., Li, H. and Zhang, J. was published in the Journal of Communications.
NEWS IGARSS 2012: publication by Petros T. Boufounos and Dehong Liu
Date: July 22, 2012
Where: IEEE International Geoscience and Remote Sensing Symposium (IGARSS)
MERL Contacts: Dehong Liu; Petros T. Boufounos
Research Areas: Digital Video, Computational Sensing
Brief
- The paper "Pan-Sharpening with Multi-scale Wavelet Dictionary" by Liu, D. and Boufounos, P.T. was presented at the IEEE International Geoscience and Remote Sensing Symposium (IGARSS).
NEWS ICRA 2012: 3 publications by Yuichi Taguchi, Srikumar Ramalingam, Amit K. Agrawal, C. Oncel Tuzel and Ming-Yu Liu
Date: May 14, 2012
Where: IEEE International Conference on Robotics and Automation (ICRA)
Research Area: Computer Vision
Brief
- The papers "Voting-based Pose Estimation for Robotic Assembly Using a 3D Sensor" by Choi, C., Taguchi, Y., Tuzel, O., Liu, M.-Y. and Ramalingam, S., "Convex Bricks: A New Primitive for Visual Hull Modeling and Reconstruction" by Chari, V., Agrawal, A., Taguchi, Y. and Ramalingam, S. and "Coverage Optimized Active Learning for k-NN Classifiers" by Joshi, A.J., Porikli, F. and Papanikolopoulos, N. were presented at the IEEE International Conference on Robotics and Automation (ICRA).
NEWS The International Journal of Robotics Research: publication by Yuichi Taguchi, Tim K. Marks, C. Oncel Tuzel, Ming-Yu Liu and others
Date: May 8, 2012
Where: The International Journal of Robotics Research
MERL Contact: Tim K. Marks
Research Area: Computer Vision
Brief
- The article "Fast Object Localization and Pose Estimation in Heavy Clutter for Robotic Bin Picking" by Liu, M.-Y., Tuzel, O., Veeraraghavan, A., Taguchi, Y., Marks, T.K. and Chellappa, R. was published in The International Journal of Robotics Research.
NEWS ICASSP 2012: 8 publications by Petros T. Boufounos, Dehong Liu, John R. Hershey, Jonathan Le Roux and Zafer Sahinoglu
Date: March 25, 2012
Where: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
MERL Contacts: Dehong Liu; Jonathan Le Roux; Petros T. Boufounos
Brief
- The papers "Dictionary Learning Based Pan-Sharpening" by Liu, D. and Boufounos, P.T., "Multiple Dictionary Learning for Blocking Artifacts Reduction" by Wang, Y. and Porikli, F., "A Compressive Phase-Locked Loop" by Schnelle, S.R., Slavinsky, J.P., Boufounos, P.T., Davenport, M.A. and Baraniuk, R.G., "Indirect Model-based Speech Enhancement" by Le Roux, J. and Hershey, J.R., "A Clustering Approach to Optimize Online Dictionary Learning" by Rao, N. and Porikli, F., "Parametric Multichannel Adaptive Signal Detection: Exploiting Persymmetric Structure" by Wang, P., Sahinoglu, Z., Pun, M.-O. and Li, H., "Additive Noise Removal by Sparse Reconstruction on Image Affinity Nets" by Sundaresan, R. and Porikli, F. and "Depth Sensing Using Active Coherent Illumination" by Boufounos, P.T. were presented at the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).
NEWS DEPEND 2011: publication by Chunjie Duan, Jinyun Zhang, Toshiaki Koike and others
Date: August 21, 2011
Where: International Conference on Dependability (DEPEND)
MERL Contacts: Jinyun Zhang; Toshiaki Koike-Akino
Research Area: Communications
Brief
- The paper "Secret Key Sharing and Rateless Coding for Practical Secure Wireless Transmission" by Liu, W., Duan, C., Wang, Y., Koike-Akino, T., Annavajjala, R. and Zhang, J. was presented at the International Conference on Dependability (DEPEND).
NEWS IGARSS 2011: publication by Petros T. Boufounos and Dehong Liu
Date: July 24, 2011
Where: IEEE International Geoscience and Remote Sensing Symposium (IGARSS)
MERL Contacts: Dehong Liu; Petros T. Boufounos
Research Area: Computational Sensing
Brief
- The paper "High Resolution SAR Imaging Using Random Pulse Timing" by Liu, D. and Boufounos, P.T. was presented at the IEEE International Geoscience and Remote Sensing Symposium (IGARSS).