Computer Vision
Extracting meaning and building representations of visual objects and events in the world.
Our main research themes span deep learning and artificial intelligence for object and action detection, classification, and scene understanding; robotic vision and object manipulation; 3D processing and computational geometry; and simulation of physical systems to enhance machine learning systems.
Researchers
Anoop Cherian, Tim K. Marks, Michael J. Jones, Chiori Hori, Suhas Lohit, Hassan Mansour, Matthew Brand, Jonathan Le Roux, Moitreya Chatterjee, Devesh K. Jha, Radu Corcodel, Siddarth Jain, Diego Romeres, Petros T. Boufounos, Anthony Vetro, Pedro Miraldo, Daniel N. Nikovski, Kuan-Chuan Peng, Ye Wang, Dehong Liu, Gordon Wichern, Arvind Raghunathan, William S. Yerazunis, Stefano Di Cairano, François Germain, Sameer Khurana, Toshiaki Koike-Akino, Zexu Pan, Abraham P. Vinod, Avishai Weiss, Jose Amaya, Yanting Ma, Philip V. Orlik, Joshua Rapp, Huifang Sun, Pu (Perry) Wang, Yebin Wang, Jing Liu, Ryoma Yataka
Awards
AWARD: Best Paper - Honorable Mention Award at WACV 2021
Date: January 6, 2021
Awarded to: Rushil Anirudh, Suhas Lohit, Pavan Turaga
MERL Contact: Suhas Lohit
Research Areas: Computational Sensing, Computer Vision, Machine Learning
Brief: A team of researchers from Mitsubishi Electric Research Laboratories (MERL), Lawrence Livermore National Laboratory (LLNL) and Arizona State University (ASU) received the Best Paper Honorable Mention Award at WACV 2021 for their paper "Generative Patch Priors for Practical Compressive Image Recovery".
The paper proposes a novel model of natural images as a composition of small patches which are obtained from a deep generative network. This is unlike prior approaches where the networks attempt to model image-level distributions and are unable to generalize outside training distributions. The key idea in this paper is that learning patch-level statistics is far easier. As the authors demonstrate, this model can then be used to efficiently solve challenging inverse problems in imaging such as compressive image recovery and inpainting even from very few measurements for diverse natural scenes.
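As a rough illustration of how such a patch prior could be used (a minimal sketch, not the authors' released code), reconstruction can be posed as an optimization over the latent codes of a pre-trained patch generator so that the tiled patches agree with the compressive measurements. The generator patch_gen (with a latent_dim attribute mapping latent vectors to p x p patches) and the measurement matrix A are assumptions for this sketch.

    import torch

    # Sketch only: patch_gen is a hypothetical pre-trained patch generator
    # mapping latent vectors z -> (p, p) patches; A is the measurement matrix
    # and y = A @ x_true contains the compressive measurements.
    def recover_image(patch_gen, A, y, H=64, W=64, p=8, steps=500, lr=1e-2):
        n_patches = (H // p) * (W // p)
        z = torch.randn(n_patches, patch_gen.latent_dim, requires_grad=True)
        opt = torch.optim.Adam([z], lr=lr)
        for _ in range(steps):
            patches = patch_gen(z)                                   # (n_patches, p, p)
            # Tile the generated patches back into a full image estimate.
            x = patches.reshape(H // p, W // p, p, p).permute(0, 2, 1, 3).reshape(H, W)
            loss = torch.sum((A @ x.flatten() - y) ** 2)             # data fidelity
            opt.zero_grad()
            loss.backward()
            opt.step()
        return x.detach()

Because only patch-level statistics are learned, the same generator can act as a prior for images of different sizes and scene types; additional regularizers could be added to the loss if needed.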
AWARD: MERL Researchers Win Best Paper Award at ICCV 2019 Workshop on Statistical Deep Learning in Computer Vision
Date: October 27, 2019
Awarded to: Abhinav Kumar, Tim K. Marks, Wenxuan Mou, Chen Feng, Xiaoming Liu
MERL Contact: Tim K. Marks
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Brief: MERL researcher Tim Marks, former MERL interns Abhinav Kumar and Wenxuan Mou, and MERL consultants Professor Chen Feng (NYU) and Professor Xiaoming Liu (MSU) received the Best Oral Paper Award at the IEEE/CVF International Conference on Computer Vision (ICCV) 2019 Workshop on Statistical Deep Learning in Computer Vision (SDL-CV) held in Seoul, Korea. Their paper, entitled "UGLLI Face Alignment: Estimating Uncertainty with Gaussian Log-Likelihood Loss," describes a method which, given an image of a face, estimates not only the locations of facial landmarks but also the uncertainty of each landmark location estimate.
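As a simplified illustration of this kind of loss (not necessarily the paper's exact parameterization), the network can predict, for each landmark, a 2D mean and a 2x2 covariance via its Cholesky factor, and be trained with the Gaussian negative log-likelihood, which penalizes confident-but-wrong predictions most heavily:

    import torch

    def gaussian_nll_loss(mu, L_params, target):
        """Negative log-likelihood of 2D landmarks under predicted Gaussians.

        mu:       (B, K, 2) predicted landmark means
        L_params: (B, K, 3) parameters [log_l11, l21, log_l22] of the
                  lower-triangular Cholesky factor of each 2x2 covariance
        target:   (B, K, 2) ground-truth landmark locations
        """
        l11 = torch.exp(L_params[..., 0])
        l21 = L_params[..., 1]
        l22 = torch.exp(L_params[..., 2])
        d = target - mu
        # Solve L v = d for the 2x2 lower-triangular factor L.
        v1 = d[..., 0] / l11
        v2 = (d[..., 1] - l21 * v1) / l22
        mahalanobis = v1 ** 2 + v2 ** 2
        log_det = 2.0 * (torch.log(l11) + torch.log(l22))
        return 0.5 * (mahalanobis + log_det).mean()

The predicted covariance then doubles as the per-landmark uncertainty estimate at test time.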
AWARD: CVPR 2011 Longuet-Higgins Prize
Date: June 25, 2011
Awarded to: Paul A. Viola and Michael J. Jones
Awarded for: "Rapid Object Detection using a Boosted Cascade of Simple Features"
Awarded by: Conference on Computer Vision and Pattern Recognition (CVPR)
MERL Contact: Michael J. Jones
Research Area: Machine Learning
Brief: Recognized as the paper from 10 years earlier with the largest impact on the field: "Rapid Object Detection using a Boosted Cascade of Simple Features", originally published at the Conference on Computer Vision and Pattern Recognition (CVPR 2001).
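The boosted cascade detector from this paper is still widely deployed; for instance, OpenCV ships pre-trained Haar cascades that can be run in a few lines (a minimal usage sketch, assuming opencv-python is installed and a hypothetical input file example.jpg):

    import cv2

    # Load the pre-trained Viola-Jones frontal-face cascade bundled with OpenCV.
    cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    detector = cv2.CascadeClassifier(cascade_path)

    img = cv2.imread("example.jpg")          # hypothetical input image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    # Draw a box around each detected face and save the result.
    for (x, y, w, h) in faces:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imwrite("faces.jpg", img)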
See All Awards for MERL
News & Events
TALK: [MERL Seminar Series 2024] Melanie Mitchell presents a talk titled "The Debate Over 'Understanding' in AI's Large Language Models"
Date & Time: Tuesday, February 13, 2024; 1:00 PM
Speaker: Melanie Mitchell, Santa Fe Institute
MERL Host: Suhas Lohit
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Human-Computer Interaction
Abstract: I will survey a current, heated debate in the AI research community on whether large pre-trained language models can be said to "understand" language -- and the physical and social situations language encodes -- in any important sense. I will describe arguments that have been made for and against such understanding, and, more generally, will discuss what methods can be used to fairly evaluate understanding and intelligence in AI systems. I will conclude with key questions for the broader sciences of intelligence that have arisen in light of these discussions.
TALK: [MERL Seminar Series 2023] Dr. Kristina Monakhova presents a talk titled "Robust and Physics-informed machine learning for low light imaging"
Date & Time: Tuesday, November 28, 2023; 12:00 PM
Speaker: Kristina Monakhova, MIT and Cornell
MERL Host: Joshua Rapp
Research Areas: Computational Sensing, Computer Vision, Machine Learning, Signal Processing
Abstract: Imaging in low light settings is extremely challenging due to low photon counts, both in photography and in microscopy. In photography, imaging under low light, high gain settings often results in highly structured, non-Gaussian sensor noise that’s hard to characterize or denoise. In this talk, we address this by developing a GAN-tuned physics-based noise model to more accurately represent camera noise at the lowest light and highest gain settings. Using this noise model, we train a video denoiser using synthetic data and demonstrate photorealistic videography at starlight (submillilux levels of illumination) for the first time.
For multiphoton microscopy, which is a form of scanning microscopy, there’s a trade-off between field of view, phototoxicity, acquisition time, and image quality, often resulting in noisy measurements. While deep learning-based methods have shown compelling denoising performance, can we trust these methods enough for critical scientific and medical applications? In the second part of this talk, I’ll introduce a learned, distribution-free uncertainty quantification technique that can both denoise and predict pixel-wise uncertainty to gauge how much we can trust our denoiser’s performance. Furthermore, we propose to leverage this learned, pixel-wise uncertainty to drive an adaptive acquisition technique that rescans only the most uncertain regions of a sample. With our sample- and algorithm-informed adaptive acquisition, we demonstrate a 120X improvement in total scanning time and total light dose for multiphoton microscopy, while successfully recovering fine structures within the sample.
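As a loose sketch of the adaptive-acquisition idea (under the assumption that a denoiser returns a per-pixel uncertainty map; this is not the speaker's actual implementation), one could score image tiles by their mean predicted uncertainty and rescan only the highest-scoring fraction:

    import torch

    def select_rescan_tiles(uncertainty, tile=32, budget=0.1):
        """Return a boolean (H, W) mask marking the fraction `budget` of
        tiles with the highest mean predicted uncertainty; only these
        regions would be rescanned by the microscope.
        """
        H, W = uncertainty.shape
        tiles = uncertainty.reshape(H // tile, tile, W // tile, tile)
        tile_scores = tiles.mean(dim=(1, 3))            # (H/tile, W/tile)
        k = max(1, int(budget * tile_scores.numel()))
        thresh = tile_scores.flatten().topk(k).values.min()
        tile_mask = tile_scores >= thresh
        # Expand the tile-level mask back to pixel resolution.
        return tile_mask.repeat_interleave(tile, 0).repeat_interleave(tile, 1)

Rescanning only those regions concentrates the light dose where the denoiser is least reliable, which is the mechanism behind the reported savings in scan time and total light dose.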
See All News & Events for Computer Vision
Research Highlights
Internships
ST2083: Deep Learning for Radar Perception
The Computational Sensing team at MERL is seeking a highly motivated intern to conduct fundamental research in radar perception. Expertise in deep learning-based object detection, multiple object tracking, data association, and representation learning (detection points, heatmaps, and raw radar waveforms) is required. Previous hands-on experience with open indoor/outdoor radar datasets is a plus. Familiarity with the concepts of FMCW, MIMO, and the range-Doppler-angle spectrum is an asset. The intern will collaborate with a small group of MERL researchers to develop novel algorithms, design experiments with MERL's in-house testbed, and prepare results for patents and publication. The expected duration of the internship is 3 months, with a flexible start date.
CV2119: Conditional Video Generation
We seek a highly motivated intern to conduct original research in generative models for conditional video generation. We are interested in applications to various tasks such as video generation from text, images, and diagrams. The successful candidate will collaborate with MERL researchers to design and implement new models, conduct experiments, and prepare results for publication. The candidate should be a PhD student (or postdoc) in computer vision and machine learning with a strong publication record including at least one paper in a top-tier computer vision or machine learning venue such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS, AAAI, or TPAMI. Strong programming skills, experience developing and implementing new models in deep learning platforms such as PyTorch, and broad knowledge of machine learning and deep learning methods are expected, including experience in the latest advances in conditional video generation. Start date is flexible; duration should be at least 3 months.
CV2089: Visual Localization and Mapping
MERL is looking for a highly motivated intern to work on an original research project on visual localization and mapping. A strong background in 3D computer vision is required. Experience in robot vision and/or deep learning will be valued. The successful candidate is expected to have published at least one paper in a top-tier computer vision or robotics venue, such as CVPR, ECCV, ICCV, ICRA, IROS, or RSS, along with solid programming skills in Python and/or C/C++. The position is available for graduate students on a Ph.D. track. Duration and start dates are flexible.
See All Internships for Computer Vision
Recent Publications
- "Late Audio-Visual Fusion for In-The-Wild Speaker Diarization", Hands-free Speech Communication and Microphone Arrays (HSCMA), April 2024.BibTeX TR2024-029 PDF
- @inproceedings{Pan2024apr,
- author = {Pan, Zexu and Wichern, Gordon and Germain, François G and Subramanian, Aswin and Le Roux, Jonathan},
- title = {Late Audio-Visual Fusion for In-The-Wild Speaker Diarization},
- booktitle = {Hands-free Speech Communication and Microphone Arrays (HSCMA)},
- year = 2024,
- month = apr,
- url = {https://www.merl.com/publications/TR2024-029}
- }
, - "Oriented-grid Encoder for 3D Implicit Representations", International Conference on 3D Vision (3DV), March 2024.BibTeX TR2024-031 PDF
- @inproceedings{Gaur2024mar,
- author = {Gaur, Arihant and Pais, Goncalo and Miraldo, Pedro},
- title = {Oriented-grid Encoder for 3D Implicit Representations},
- booktitle = {International Conference on 3D Vision (3DV)},
- year = 2024,
- month = mar,
- url = {https://www.merl.com/publications/TR2024-031}
- }
, - "Single-pixel imaging of dynamic flows using Neural ODE regularization", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2024.BibTeX TR2024-024 PDF
- @inproceedings{Sholokhov2024mar,
- author = {Sholokhov, Aleksei and Rapp, Joshua and Nabi, Saleh and Brunton, Steven and Kutz, Nathan and Mansour, Hassan},
- title = {Single-pixel imaging of dynamic flows using Neural ODE regularization},
- booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
- year = 2024,
- month = mar,
- url = {https://www.merl.com/publications/TR2024-024}
- }
, - "Wi-Fi based Indoor Monitoring Enhanced by Multimodal Fusion", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2024.BibTeX TR2024-012 PDF
- @inproceedings{Hori2024mar,
- author = {Hori, Chiori and Wang, Pu and Rahman, Mahbub and Vaca-Rubio, Cristian and Khurana, Sameer and Cherian, Anoop and Le Roux, Jonathan},
- title = {Wi-Fi based Indoor Monitoring Enhanced by Multimodal Fusion},
- booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
- year = 2024,
- month = mar,
- url = {https://www.merl.com/publications/TR2024-012}
- }
, - "Lunar Landing with Feasible Divert using Controllable Sets", AIAA SciTech, DOI: 10.2514/6.2024-0324, January 2024, pp. AIAA 2024-0324.BibTeX TR2024-004 PDF
- @inproceedings{Srinivas2024jan,
- author = {Srinivas, Neeraj and Vinod, Abraham P. and Di Cairano, Stefano and Weiss, Avishai},
- title = {Lunar Landing with Feasible Divert using Controllable Sets},
- booktitle = {AIAA SCITECH 2024 Forum},
- year = 2024,
- pages = {AIAA 2024--0324},
- month = jan,
- doi = {10.2514/6.2024-0324},
- url = {https://www.merl.com/publications/TR2024-004}
- }
, - "Pixel-Grounded Prototypical Part Networks", IEEE Winter Conference on Applications of Computer Vision (WACV), January 2024.BibTeX TR2024-002 PDF Presentation
- @inproceedings{Carmichael2024jan,
- author = {Carmichael, Zachariah and Jones, Lohit, Suhas and Cherian, Anoop and Michael J. and Scheirer, Walter},
- title = {Pixel-Grounded Prototypical Part Networks},
- booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
- year = 2024,
- month = jan,
- url = {https://www.merl.com/publications/TR2024-002}
- }
, - "CAVEN: An Embodied Conversational Agent for Efficient Audio-Visual Navigation in Noisy Environments", AAAI Conference on Artificial Intelligence, December 2023.BibTeX TR2023-154 PDF
- @inproceedings{Liu2023dec2,
- author = {Liu, Xiulong and Paul, Sudipta and Chatterjee, Moitreya and Cherian, Anoop},
- title = {CAVEN: An Embodied Conversational Agent for Efficient Audio-Visual Navigation in Noisy Environments},
- booktitle = {AAAI Conference on Artificial Intelligence},
- year = 2023,
- month = dec,
- url = {https://www.merl.com/publications/TR2023-154}
- }
, - "Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction", IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), DOI: 10.1109/ASRU57964.2023.10389618, December 2023.BibTeX TR2023-152 PDF
- @inproceedings{Pan2023dec2,
- author = {Pan, Zexu and Wichern, Gordon and Masuyama, Yoshiki and Germain, François G and Khurana, Sameer and Hori, Chiori and Le Roux, Jonathan},
- title = {Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction},
- booktitle = {IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)},
- year = 2023,
- month = dec,
- doi = {10.1109/ASRU57964.2023.10389618},
- isbn = {979-8-3503-0689-7},
- url = {https://www.merl.com/publications/TR2023-152}
- }
Videos
Software & Data Downloads
- Pixel-Grounded Prototypical Part Networks
- BAyesian Network for adaptive SAmple Consensus
- Simple Multimodal Algorithmic Reasoning Task Dataset
- SOurce-free Cross-modal KnowledgE Transfer
- Audio-Visual-Language Embodied Navigation in 3D Environments
- Instance Segmentation GAN
- Audio Visual Scene-Graph Segmentor
- Generalized One-class Discriminative Subspaces
- Generating Visual Dynamics from Sound and Context
- Adversarially-Contrastive Optimal Transport
- MotionNet
- Street Scene Dataset
- FoldingNet++
- Landmarks’ Location, Uncertainty, and Visibility Likelihood
- Gradient-based Nikaido-Isoda
- Circular Maze Environment
- Discriminative Subspace Pooling
- Kernel Correlation Network
- Fast Resampling on Point Clouds via Graphs
- FoldingNet
- MERL Shopping Dataset
- Joint Geodesic Upsampling
- Plane Extraction using Agglomerative Clustering
- Partial Group Convolutional Neural Networks