Computer Vision
Extracting meaning and building representations of visual objects and events in the world.
Our main research themes cover deep learning and artificial intelligence for object and action detection, classification, and scene understanding; robotic vision and object manipulation; 3D processing and computational geometry; and simulation of physical systems to enhance machine learning systems.
Researchers
Tim K. Marks
Anoop Cherian
Michael J. Jones
Chiori Hori
Suhas Lohit
Matthew Brand
Hassan Mansour
Jonathan Le Roux
Devesh K. Jha
Moitreya Chatterjee
Radu Corcodel
Siddarth Jain
Diego Romeres
Petros T. Boufounos
Anthony Vetro
Daniel N. Nikovski
Kuan-Chuan Peng
Ye Wang
Dehong Liu
Pedro Miraldo
Arvind Raghunathan
William S. Yerazunis
Toshiaki Koike-Akino
Gordon Wichern
Jose Amaya
Stefano Di Cairano
Yanting Ma
Philip V. Orlik
Huifang Sun
Abraham P. Vinod
Yebin Wang
Avishai Weiss
Sameer Khurana
Jing Liu
Ryoma Yataka
Awards
AWARD: Best Paper Honorable Mention Award at WACV 2021
Date: January 6, 2021
Awarded to: Rushil Anirudh, Suhas Lohit, Pavan Turaga
MERL Contact: Suhas Lohit
Research Areas: Computational Sensing, Computer Vision, Machine Learning
Brief: A team of researchers from Mitsubishi Electric Research Laboratories (MERL), Lawrence Livermore National Laboratory (LLNL) and Arizona State University (ASU) received the Best Paper Honorable Mention Award at WACV 2021 for their paper "Generative Patch Priors for Practical Compressive Image Recovery".
The paper proposes a novel model of natural images as compositions of small patches obtained from a deep generative network. This is unlike prior approaches, in which networks attempt to model image-level distributions and are unable to generalize outside their training distributions. The key idea is that patch-level statistics are far easier to learn. As the authors demonstrate, the resulting model can be used to efficiently solve challenging inverse problems in imaging, such as compressive image recovery and inpainting of diverse natural scenes, even from very few measurements.
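To make the idea concrete, here is a minimal, hypothetical PyTorch sketch of recovery with a patch-level generative prior. The patch generator, measurement operator A, grid size, and optimizer settings are placeholders and assumptions, not the authors' released implementation.

```python
# Minimal sketch of compressive recovery with a patch-level generative prior.
# NOT the authors' code: "patch_generator" (maps latent codes to image patches)
# and the measurement operator A are placeholders for whatever models are used.
import torch

def recover(y, A, patch_generator, grid=(8, 8), patch=32, latent_dim=64,
            steps=500, lr=1e-2):
    gh, gw = grid
    # One latent code per patch; the image is the tiling of the generated patches.
    z = torch.randn(gh * gw, latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        patches = patch_generator(z).view(gh, gw, patch, patch)
        x = patches.permute(0, 2, 1, 3).reshape(1, 1, gh * patch, gw * patch)
        loss = ((A(x) - y) ** 2).mean()   # data fidelity in measurement space
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x.detach()
```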
AWARD: MERL Researchers win Best Paper Award at ICCV 2019 Workshop on Statistical Deep Learning in Computer Vision
Date: October 27, 2019
Awarded to: Abhinav Kumar, Tim K. Marks, Wenxuan Mou, Chen Feng, Xiaoming Liu
MERL Contact: Tim K. Marks
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Brief: MERL researcher Tim Marks, former MERL interns Abhinav Kumar and Wenxuan Mou, and MERL consultants Professor Chen Feng (NYU) and Professor Xiaoming Liu (MSU) received the Best Oral Paper Award at the IEEE/CVF International Conference on Computer Vision (ICCV) 2019 Workshop on Statistical Deep Learning in Computer Vision (SDL-CV) held in Seoul, Korea. Their paper, entitled "UGLLI Face Alignment: Estimating Uncertainty with Gaussian Log-Likelihood Loss," describes a method that, given an image of a face, estimates not only the locations of facial landmarks but also the uncertainty of each landmark location estimate.
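For readers curious what a Gaussian log-likelihood landmark loss can look like, the sketch below is illustrative only: the Cholesky-based covariance parameterization and tensor shapes are assumptions, not necessarily the paper's exact formulation.

```python
# Illustrative 2D Gaussian negative log-likelihood for landmark regression with
# uncertainty (not the authors' code). The network predicts, per landmark, a 2D
# mean and a covariance parameterized by a lower-triangular Cholesky factor.
import torch

def gaussian_nll(mu, chol_params, target):
    """mu, target: (B, L, 2); chol_params: (B, L, 3) -> [log s1, log s2, off-diagonal]."""
    l11 = chol_params[..., 0].exp()          # positive diagonal via exp
    l22 = chol_params[..., 1].exp()
    l21 = chol_params[..., 2]
    diff = target - mu                        # (B, L, 2)
    # Solve L^{-1} diff for the 2x2 lower-triangular factor without forming matrices.
    z1 = diff[..., 0] / l11
    z2 = (diff[..., 1] - l21 * z1) / l22
    mahalanobis = z1 ** 2 + z2 ** 2           # diff^T Sigma^{-1} diff
    log_det = 2 * (l11.log() + l22.log())     # log|Sigma| = 2 log|L|
    # The constant log(2*pi) term is dropped since it does not affect optimization.
    return 0.5 * (mahalanobis + log_det).mean()
```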
AWARD: CVPR 2011 Longuet-Higgins Prize
Date: June 25, 2011
Awarded to: Paul A. Viola and Michael J. Jones
Awarded for: "Rapid Object Detection using a Boosted Cascade of Simple Features"
Awarded by: Conference on Computer Vision and Pattern Recognition (CVPR)
MERL Contact: Michael J. Jones
Research Area: Machine Learning
Brief: Awarded to the paper from ten years earlier with the largest impact on the field: "Rapid Object Detection using a Boosted Cascade of Simple Features," originally published at the Conference on Computer Vision and Pattern Recognition (CVPR 2001).
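The boosted cascade remains easy to try in practice: OpenCV ships pretrained Haar cascades based on the Viola-Jones framework. A minimal usage sketch follows; the image file names are placeholders.

```python
# Run OpenCV's pretrained Viola-Jones-style Haar cascade on an image.
import cv2

img = cv2.imread("example.jpg")                       # placeholder input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
# Slide the cascade over an image pyramid; early stages reject most windows cheaply.
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", img)
```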
See All Awards for MERL
News & Events
NEWS: Anoop Cherian gives a podcast interview with AI Business
Date: September 26, 2023
Where: Virtual
MERL Contact: Anoop Cherian
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Brief: Anoop Cherian, a Senior Principal Research Scientist in the Computer Vision team at MERL, gave a podcast interview with award-winning journalist Deborah Yao. Deborah is the editor of AI Business -- a leading content platform for artificial intelligence and its applications in the real world, delivering its readers up-to-the-minute insights into how AI technologies are affecting the global economy and society. The podcast was based on recent research that Anoop and his colleagues at MERL conducted with collaborators at MIT; this research attempts to objectively answer a pertinent question: are current deep neural networks smarter than second graders? The podcast discusses the shortcomings of recent artificial general intelligence systems with regard to knowledge abstraction, learning, and generalization, which this research brings out.
TALK: [MERL Seminar Series 2023] Dr. Tanmay Gupta presents a talk titled "Visual Programming - A compositional approach to building General Purpose Vision Systems"
Date & Time: Tuesday, October 31, 2023; 2:00 PM
Speaker: Tanmay Gupta, Allen Institute for Artificial Intelligence
MERL Host: Moitreya Chatterjee
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Abstract: Building General Purpose Vision Systems (GPVs) that can perform a huge variety of tasks has been a long-standing goal for the computer vision community. However, end-to-end training of these systems to handle different modalities and tasks has proven extremely challenging. In this talk, I will describe a compelling neuro-symbolic alternative to the common end-to-end learning paradigm called Visual Programming. Visual Programming is a general framework that leverages the code-generation abilities of LLMs, existing neural models, and non-differentiable programs to enable powerful applications. Some of these applications remain elusive for the current generation of end-to-end trained GPVs.
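As a rough, purely illustrative sketch of the compositional idea (not VisProg or any specific system), the snippet below executes a hypothetical LLM-generated program as a sequence of calls into a registry of stub vision modules; every module name and the example program are made up for illustration.

```python
# Hypothetical illustration of the Visual Programming idea: an LLM maps a query to
# a short program whose steps invoke existing vision modules; a small interpreter
# executes the steps. All modules here are stubs, not real models.
MODULES = {
    "DETECT": lambda image, label: [f"{label}_box_1", f"{label}_box_2"],  # stub detector
    "COUNT": lambda boxes: len(boxes),                                    # stub counter
}

def execute(program, image):
    """Run a generated program: each step names a module, its arguments, and an output."""
    state = {"IMAGE": image}
    for step in program:
        args = [state.get(a, a) for a in step["args"]]   # resolve references to earlier outputs
        state[step["out"]] = MODULES[step["op"]](*args)
    return state[program[-1]["out"]]

# A program an LLM might emit for "How many dogs are in the picture?"
program = [
    {"op": "DETECT", "args": ["IMAGE", "dog"], "out": "BOXES"},
    {"op": "COUNT", "args": ["BOXES"], "out": "ANSWER"},
]
print(execute(program, "photo.jpg"))   # -> 2 with the stub modules above
```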
See All News & Events for Computer Vision
Internships
SA2073: Multimodal scene-understanding
We are looking for a graduate student interested in helping advance the field of multimodal scene understanding, with a focus on scene understanding using natural language for robot dialog and/or indoor monitoring using a large language model. The intern will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for publication. Internships regularly lead to one or more publications in top-tier venues, which can later become part of the intern's doctoral work. The ideal candidates are senior Ph.D. students with experience in deep learning for audio-visual, signal, and natural language processing. Good programming skills in Python and knowledge of deep learning frameworks such as PyTorch are essential. Multiple positions are available with flexible start date (not just Spring/Summer but throughout 2024) and duration (typically 3-6 months).
CV2071: Video Anomaly Detection
MERL is looking for a self-motivated intern to work on the problem of video anomaly detection. The intern will help to develop new ideas for improving the state of the art in detecting anomalous activity in videos. The ideal candidate would be a Ph.D. student with a strong background in machine learning and computer vision, and some experience with video anomaly detection in particular. Proficiency in Python programming and PyTorch is necessary. The successful candidate is expected to have published at least one paper in a top-tier computer vision or machine learning venue, such as CVPR, ECCV, ICCV, ICML, ICLR, NeurIPS or AAAI. The intern will collaborate with MERL researchers to develop and test algorithms and prepare manuscripts for scientific publications. The internship is for 3 months and the start date is flexible.
OR2116: Collaborative robotic manipulation
MERL is offering a new research internship opportunity in the field of robotic manipulation. The position requires a robotics background, excellent programming skills, and experience with deep RL and computer vision. The position is open to graduate students on a PhD track only, and the length of the internship is three months, with the possibility of extending if required. The intern is expected to disseminate this research at top-tier scientific conferences such as RSS, IROS, and ICRA, and, if applicable, help with filing associated patents. Start and end dates are flexible.
See All Internships for Computer Vision
Recent Publications
- "Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes", IEEE International Conference on Computer Vision (ICCV), October 2023.BibTeX TR2023-123 PDF
- @inproceedings{Delattre2023oct,
- author = {Delattre, Fabien and Dirnfeld, David and Nguyen, Phat and Scarano, Stephen and Jones, Michael J. and Miraldo, Pedro and Learned-Miller, Erik},
- title = {Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes},
- booktitle = {IEEE International Conference on Computer Vision (ICCV)},
- year = 2023,
- month = oct,
- url = {https://www.merl.com/publications/TR2023-123}
- }
, - "BANSAC: A dynamic BAyesian Network for adaptive SAmple Consensus", IEEE International Conference on Computer Vision (ICCV), October 2023.BibTeX TR2023-124 PDF Video Software
- @inproceedings{Miraldo2023oct,
- author = {Miraldo, Pedro and Piedade, Valter},
- title = {BANSAC: A dynamic BAyesian Network for adaptive SAmple Consensus},
- booktitle = {IEEE International Conference on Computer Vision (ICCV)},
- year = 2023,
- month = oct,
- url = {https://www.merl.com/publications/TR2023-124}
- }
, - "Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis", IEEE International Conference on Computer Vision (ICCV), October 2023.BibTeX TR2023-126 PDF Presentation
- @inproceedings{Nair2023sep,
- author = {Nair, Nithin Gopalakrishnan and Cherian, Anoop and Lohit, Suhas and Wang, Ye and Koike-Akino, Toshiaki and Patel, Vishal M. and Marks, Tim K.},
- title = {Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis},
- booktitle = {IEEE International Conference on Computer Vision (ICCV)},
- year = 2023,
- month = oct,
- url = {https://www.merl.com/publications/TR2023-126}
- }
, - "Tensor Factorization for Leveraging Cross-Modal Knowledge in Data-Constrained Infrared Object Detection", IEEE International Conference on Computer Vision Workshops (ICCV), October 2023, pp. 924-932.BibTeX TR2023-125 PDF Presentation
- @inproceedings{Sharma2023oct,
- author = {Sharma, Manish and Chatterjee, Moitreya and Peng, Kuan-Chuan and Lohit, Suhas and Jones, Michael J.},
- title = {Tensor Factorization for Leveraging Cross-Modal Knowledge in Data-Constrained Infrared Object Detection},
- booktitle = {IEEE International Conference on Computer Vision Workshops (ICCV)},
- year = 2023,
- pages = {924--932},
- month = oct,
- url = {https://www.merl.com/publications/TR2023-125}
- }
, - "Unrolled IPPG: Video Heart Rate Esitmation via Unrolling Proximal Gradient Descent", IEEE International Conference on Image Processing (ICIP), September 2023.BibTeX TR2023-116 PDF Video
- @inproceedings{Shenoy2023sep,
- author = {Shenoy, Vineet and Marks, Tim K. and Mansour, Hassan and Lohit, Suhas},
- title = {Unrolled IPPG: Video Heart Rate Esitmation via Unrolling Proximal Gradient Descent},
- booktitle = {IEEE International Conference on Image Processing (ICIP)},
- year = 2023,
- month = sep,
- url = {https://www.merl.com/publications/TR2023-116}
- }
, - "Overview of the Tenth Dialog System Technology Challenge: DSTC10", IEE/ACM Transactions on Audio, Speech, and Language Processing, DOI: 10.1109/TASLP.2023.3293030, pp. 1-14, August 2023.BibTeX TR2023-109 PDF
- @article{Yoshino2023aug,
- author = {Yoshino, Koichiro and Chen, Yun-Nung and Crook, Paul and Kottur, Satwik and Li, Jinchao and Hedayatnia, Behnam and Moon, Seungwhan and Fe, Zhengcong and Li, Zekang and Zhang, Jinchao and Fen, Yang and Zhou, Jie and Kim, Seokhwan and Liu, Yang and Jin, Di and Papangelis, Alexandros and Gopalakrishnan, Karthik and Hakkani-Tur, Dilek and Damavandi, Babak and Geramifard, Alborz and
Hori, Chiori and Shah, Ankit and Zhang, Chen and Li, Haizhou and Sedoc, João and D’Haro, Luis F. and Banchs, Rafael and Rudnicky, Alexander}, - title = {Overview of the Tenth Dialog System Technology Challenge: DSTC10},
- journal = {IEE/ACM Transactions on Audio, Speech, and Language Processing},
- year = 2023,
- pages = {1--14},
- month = aug,
- doi = {10.1109/TASLP.2023.3293030},
- issn = {2329-9290},
- url = {https://www.merl.com/publications/TR2023-109}
- }
, - "Style-transfer based Speech and Audio-visual Scene understanding for Robot Action Sequence Acquisition from Videos", Interspeech, DOI: 10.21437/Interspeech.2023-1983, August 2023, pp. 4663-4667.BibTeX TR2023-104 PDF
- @inproceedings{Hori2023aug,
- author = {Hori, Chiori and Peng, Puyuang and Harwath, David and Liu, Xinyu and Ota, Kei and Jain, Siddarth and Corcodel, Radu and Jha, Devesh K. and Romeres, Diego and Le Roux, Jonathan},
- title = {Style-transfer based Speech and Audio-visual Scene understanding for Robot Action Sequence Acquisition from Videos},
- booktitle = {Interspeech},
- year = 2023,
- pages = {4663--4667},
- month = aug,
- doi = {10.21437/Interspeech.2023-1983},
- url = {https://www.merl.com/publications/TR2023-104}
- }
, - "EVAL: Explainable Video Anomaly Localization", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2023.BibTeX TR2023-071 PDF Video Presentation
- @inproceedings{Singh2023jun,
- author = {Singh, Ashish and Jones, Michael J. and Learned-Miller, Erik},
- title = {EVAL: Explainable Video Anomaly Localization},
- booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
- year = 2023,
- month = jun,
- url = {https://www.merl.com/publications/TR2023-071}
- }
,
- "Robust Frame-to-Frame Camera Rotation Estimation in Crowded Scenes", IEEE International Conference on Computer Vision (ICCV), October 2023.
-
Videos
- [MERL Seminar Series Fall 2023] The Confluence of Vision, Language, and Robotics
- Are Deep Neural Networks SMARTer than Second Graders?
- [CVPR 2023] EVAL: Explainable Video Anomaly Localization
- [MERL Seminar Series Spring 2023] Pitfalls and Opportunities in Interpretable Machine Learning
- Human Perspective Scene Understanding via Multimodal Sensing
- [MERL Seminar Series Spring 2022] Self-Supervised Scene Representation Learning
- [MERL Seminar Series Spring 2022] Learning Speech Representations with Multimodal Self-Supervision
- HealthCam: A system for non-contact monitoring of vital signs
- [MERL Seminar Series 2021] Learning to See by Moving: Self-supervising 3D scene representations for perception, control, and visual reasoning
- [MERL Seminar Series 2021] Look and Listen: From Semantic to Spatial Audio-Visual Perception
- Towards Human-Level Learning of Complex Physical Puzzles
- Scene-Aware Interaction Technology
- 3D Object Discovery and Modeling Using Single RGB-D Images Containing Multiple Object Instances
- Joint 3D Reconstruction of a Static Scene and Moving Objects
- Direct Multichannel Tracking
- FoldingNet: Interpretable Unsupervised Learning on 3D Point Clouds
- FasTFit: A fast T-spline fitting algorithm
- CASENet: Deep Category-Aware Semantic Edge Detection
- Object Detection and Tracking in RGB-D SLAM via Hierarchical Feature Grouping
- Pinpoint SLAM: A Hybrid of 2D and 3D Simultaneous Localization and Mapping for RGB-D Sensors
- Action Detection Using A Deep Recurrent Neural Network
- Obstacle Detection
- MERL Research on Autonomous Vehicles
- 3D Reconstruction
- Saffron - Digital Type System
- Semantic Scene Labeling
- Robot Bin Picking
- Dose optimization for particle beam therapy
- Sapphire - High Accuracy NC Milling Simulation
- Deep Hierarchical Parsing for Semantic Segmentation
- Global Local Face Upsampling Network
- Gaussian Conditional Random Field Network for Semantic Segmentation
- Fast Graspability Evaluation on Single Depth Maps for Bin Picking with General Grippers
- Point-Plane SLAM for Hand-Held 3D Sensors
- Tracking an RGB-D Camera Using Points and Planes
- Fast Plane Extraction in Organized Point Clouds Using Agglomerative Hierarchical Clustering
- Calibration of Non-Overlapping Cameras Using an External SLAM System
- Voting-Based Pose Estimation for Robotic Assembly Using a 3D Sensor
- Fast Object Localization and Pose Estimation in Heavy Clutter for Robotic Bin Picking
- Learning to rank 3D features
Software & Data Downloads
- BAyesian Network for adaptive SAmple Consensus
- Simple Multimodal Algorithmic Reasoning Task Dataset
- SOurce-free Cross-modal KnowledgE Transfer
- Audio-Visual-Language Embodied Navigation in 3D Environments
- Instance Segmentation GAN
- Audio Visual Scene-Graph Segmentor
- Generalized One-class Discriminative Subspaces
- Generating Visual Dynamics from Sound and Context
- Adversarially-Contrastive Optimal Transport
- MotionNet
- Contact-Implicit Trajectory Optimization
- Street Scene Dataset
- FoldingNet++
- Landmarks’ Location, Uncertainty, and Visibility Likelihood
- Gradient-based Nikaido-Isoda
- Circular Maze Environment
- Discriminative Subspace Pooling
- Kernel Correlation Network
- Fast Resampling on Point Clouds via Graphs
- FoldingNet
- MERL Shopping Dataset
- Joint Geodesic Upsampling
- Plane Extraction using Agglomerative Clustering
- Partial Group Convolutional Neural Networks