Computer Vision
Extracting meaning and building representations of visual objects and events in the world.
Our main research themes are:
- deep learning and artificial intelligence for object and action detection, classification, and scene understanding;
- robotic vision and object manipulation;
- 3D processing and computational geometry;
- simulation of physical systems to improve machine learning systems.
Researchers
Tim K. Marks
Anoop Cherian
Michael J. Jones
Chiori Hori
Matthew Brand
Hassan Mansour
Suhas Lohit
Jonathan Le Roux
Petros T. Boufounos
Devesh K. Jha
Anthony Vetro
Radu Corcodel
Siddarth Jain
Daniel N. Nikovski
Diego Romeres
Dehong Liu
Kuan-Chuan Peng
Ye Wang
Arvind Raghunathan
William S. Yerazunis
Pedro Miraldo
Gordon Wichern
Jose Amaya
Stefano Di Cairano
Toshiaki Koike-Akino
Yanting Ma
Philip V. Orlik
Kei Ota
Huifang Sun
Abraham P. Vinod
Yebin Wang
Avishai Weiss
Moitreya Chatterjee
Jing Liu
Awards
AWARD: Best Paper - Honorable Mention Award at WACV 2021
Date: January 6, 2021
Awarded to: Rushil Anirudh, Suhas Lohit, Pavan Turaga
MERL Contact: Suhas Lohit
Research Areas: Computational Sensing, Computer Vision, Machine Learning
Brief: A team of researchers from Mitsubishi Electric Research Laboratories (MERL), Lawrence Livermore National Laboratory (LLNL) and Arizona State University (ASU) received the Best Paper Honorable Mention Award at WACV 2021 for their paper "Generative Patch Priors for Practical Compressive Image Recovery".
The paper proposes a novel model of natural images as a composition of small patches, each obtained from a deep generative network. Prior approaches differ: their networks attempt to model image-level distributions and are unable to generalize outside their training distributions. The key idea of this paper is that patch-level statistics are far easier to learn. As the authors demonstrate, the model can then efficiently solve challenging inverse problems in imaging for diverse natural scenes, even from very few measurements. Examples include compressive image recovery and inpainting.
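Patch-prior recovery can be posed as an optimization over latent codes. This is only a generic sketch under stated assumptions, not the paper's exact formulation. The symbols are illustrative:
- $A$ is a linear measurement operator, for example a random projection for compressive sensing or a binary mask for inpainting.
- $G$ is a pretrained patch generator.
- $\mathcal{P}$ is a hypothetical operator that tiles $K$ generated patches into a full image.

Any additional parameters the paper may optimize alongside the latent codes are omitted.

```latex
\begin{aligned}
\mathbf{y} &= A\,\mathbf{x} + \mathbf{n}, \\
\hat{\mathbf{z}}_1,\dots,\hat{\mathbf{z}}_K &= \arg\min_{\mathbf{z}_1,\dots,\mathbf{z}_K}
  \bigl\| \mathbf{y} - A\,\mathcal{P}\bigl(G(\mathbf{z}_1),\dots,G(\mathbf{z}_K)\bigr) \bigr\|_2^2, \\
\hat{\mathbf{x}} &= \mathcal{P}\bigl(G(\hat{\mathbf{z}}_1),\dots,G(\hat{\mathbf{z}}_K)\bigr).
\end{aligned}
```

In this form, $G$ only has to model small patches rather than whole images. This is what lets a patch-level prior cover scenes whose global layout differs from the training data.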
AWARD: MERL Researchers Win Best Paper Award at ICCV 2019 Workshop on Statistical Deep Learning in Computer Vision
Date: October 27, 2019
Awarded to: Abhinav Kumar, Tim K. Marks, Wenxuan Mou, Chen Feng, Xiaoming Liu
MERL Contact: Tim K. Marks
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Brief: The award went to the following team:
- MERL researcher Tim Marks;
- former MERL interns Abhinav Kumar and Wenxuan Mou;
- MERL consultants Professor Chen Feng (NYU) and Professor Xiaoming Liu (MSU).

They received the Best Oral Paper Award at the IEEE/CVF International Conference on Computer Vision (ICCV) 2019 Workshop on Statistical Deep Learning in Computer Vision (SDL-CV), held in Seoul, Korea. Their paper is "UGLLI Face Alignment: Estimating Uncertainty with Gaussian Log-Likelihood Loss." Given an image of a face, the method estimates the locations of facial landmarks and also the uncertainty of each landmark location estimate.
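The Gaussian log-likelihood idea can be sketched for a single 2D landmark. This is not the paper's code, and the parameterization is an assumption: the network outputs a predicted location μ and the lower-triangular Cholesky factor L of a 2×2 covariance Σ = LLᵀ. The function names `gaussian_nll_2d` and `landmark_loss` are hypothetical.

The loss is the negative log-likelihood of the ground-truth location under N(μ, Σ). A large error therefore costs less when the predicted uncertainty is large. The log-determinant term stops the model from simply inflating Σ everywhere.

```python
import math


def gaussian_nll_2d(point, mean, chol):
    """Negative log-likelihood of `point` under the 2D Gaussian N(mean, L L^T).

    `chol` = (l11, l21, l22) holds the lower-triangular Cholesky factor
    L = [[l11, 0], [l21, l22]] with l11, l22 > 0 (hypothetical parameterization).
    """
    l11, l21, l22 = chol
    if l11 <= 0 or l22 <= 0:
        raise ValueError("Cholesky diagonal entries must be positive")
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    # Forward substitution solves L z = d, so d^T Sigma^{-1} d = ||z||^2.
    z1 = dx / l11
    z2 = (dy - l21 * z1) / l22
    mahalanobis_sq = z1 * z1 + z2 * z2
    # For Sigma = L L^T, log det(Sigma) = 2 * (log l11 + log l22).
    log_det = 2.0 * (math.log(l11) + math.log(l22))
    # Constant term (k/2) * log(2*pi) with dimension k = 2.
    return 0.5 * (mahalanobis_sq + log_det) + math.log(2.0 * math.pi)


def landmark_loss(points, means, chols):
    """Average NLL over all landmarks of one face (hypothetical helper)."""
    losses = [gaussian_nll_2d(p, m, c) for p, m, c in zip(points, means, chols)]
    return sum(losses) / len(losses)


# A 2-pixel error costs less under a wider predicted Gaussian (sigma = 2) ...
wide = gaussian_nll_2d((2.0, 0.0), (0.0, 0.0), (2.0, 0.0, 2.0))
narrow = gaussian_nll_2d((2.0, 0.0), (0.0, 0.0), (1.0, 0.0, 1.0))
print(wide < narrow)  # True
```

Conversely, a perfect prediction costs less under a narrow Gaussian, so the loss rewards calibrated uncertainty rather than uniformly large Σ.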
AWARD: CVPR 2011 Longuet-Higgins Prize
Date: June 25, 2011
Awarded to: Paul A. Viola and Michael J. Jones
Awarded for: "Rapid Object Detection using a Boosted Cascade of Simple Features"
Awarded by: Conference on Computer Vision and Pattern Recognition (CVPR)
MERL Contact: Michael J. Jones
Research Area: Machine Learning
Brief: The prize recognizes the paper from ten years earlier with the largest impact on the field. The winning paper, "Rapid Object Detection using a Boosted Cascade of Simple Features", was originally published at the Conference on Computer Vision and Pattern Recognition (CVPR 2001).
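The paper rests on two central ideas, which can be sketched in a few lines:
- an integral image, which turns any rectangle sum into four lookups;
- a cascade of boosted stages, which rejects most candidate windows early.

This is a toy illustration, not the authors' implementation. The single feature, thresholds, polarity, and weight below are hypothetical. In the paper, such values are selected by AdaBoost from a large pool of Haar-like features.

```python
def integral_image(img):
    """Return an (h+1) x (w+1) table where ii[y][x] is the sum of img[r][c] for r < y, c < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]  # zero first row/column avoids bounds checks
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle with top-left corner (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half minus right half (w assumed even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


def cascade_accepts(ii, stages):
    """Evaluate a cascade; each stage is (weak_classifiers, stage_threshold).

    A weak classifier (feature_fn, threshold, polarity, alpha) contributes alpha
    when polarity * feature < polarity * threshold.
    """
    for weak_classifiers, stage_threshold in stages:
        score = sum(
            alpha
            for feature_fn, threshold, polarity, alpha in weak_classifiers
            if polarity * feature_fn(ii) < polarity * threshold
        )
        if score < stage_threshold:
            return False  # early rejection: later stages are never evaluated
    return True


# Hypothetical one-stage cascade that fires on a bright-left / dark-right edge.
stages = [([(lambda ii: two_rect_feature(ii, 0, 0, 4, 2), 10, -1, 1.0)], 0.5)]
edge = [[9, 9, 0, 0], [9, 9, 0, 0]]
flat = [[0, 0, 0, 0], [0, 0, 0, 0]]
print(cascade_accepts(integral_image(edge), stages))  # True
print(cascade_accepts(integral_image(flat), stages))  # False
```

Each early stage is cheap, and most windows fail one of them. The average cost per window therefore stays low, and only promising windows reach the later, more complex stages.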
News & Events
NEWS: MERL Researchers Present Thirteen Papers at the 2023 IEEE International Conference on Robotics and Automation (ICRA)
Date: May 29, 2023 - June 2, 2023
Where: 2023 IEEE International Conference on Robotics and Automation (ICRA)
MERL Contacts: Anoop Cherian; Radu Corcodel; Siddarth Jain; Devesh K. Jha; Toshiaki Koike-Akino; Tim K. Marks; Daniel N. Nikovski; Kei Ota; Arvind Raghunathan; Diego Romeres
Research Areas: Computer Vision, Machine Learning, Optimization, Robotics
Brief: MERL researchers will present thirteen papers at ICRA 2023, eight in the main conference and five in workshops. The conference will be held in London, UK, from May 29 to June 2. ICRA is one of the largest and most prestigious conferences in the robotics community. The papers cover a broad range of robotics topics, including:
- estimation and manipulation;
- vision-based object recognition and segmentation;
- tactile estimation and tool manipulation;
- robotic food handling;
- robot skill learning;
- model-based reinforcement learning.

In addition to the paper presentations, MERL robotics researchers will host an exhibition booth and look forward to discussing our research with visitors.
TALK: [MERL Seminar Series 2023] Dr. Suraj Srinivas presents talk titled "Pitfalls and Opportunities in Interpretable Machine Learning"
Date & Time: Tuesday, March 14, 2023; 1:00 PM
Speaker: Suraj Srinivas, Harvard University
MERL Host: Suhas Lohit
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Abstract: In this talk, I will discuss our recent research on understanding post-hoc interpretability. I will begin by characterizing post-hoc interpretability methods as local function approximators and discussing the implications of this viewpoint, including a no-free-lunch theorem for explanations. Next, I will challenge the assumption that post-hoc explanations provide information about a model's discriminative capabilities p(y|x). I will instead show that many common methods rely on a conditional generative model p(x|y). This observation underscores the need for caution when using such methods in practice. Finally, I will propose resolving this issue by regularizing model structure, specifically by training low-curvature neural networks, which improves model robustness and yields stable gradients.
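The "local function approximator" view can be made concrete with a gradient explanation. Such an explanation implicitly replaces the model f near an input x with the linear function f(x) + ∇f(x)·δ. The sketch below is a toy illustration, not the speaker's code; the two quadratic "models" and all function names are assumptions. It shows that this local linear approximation degrades as curvature grows. That offers one intuition for why low-curvature networks may give more stable gradient-based explanations.

```python
def numerical_gradient(f, x, eps=1e-5):
    """Central-difference estimate of the gradient of f at x (x is a list of floats)."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((f(x_plus) - f(x_minus)) / (2 * eps))
    return grad


def linearization_error(f, x, delta):
    """|f(x + delta) - (f(x) + grad f(x) . delta)|: how far the local linear
    model implied by a gradient explanation is from the actual function."""
    grad = numerical_gradient(f, x)
    linear_prediction = f(x) + sum(g * d for g, d in zip(grad, delta))
    x_shifted = [xi + di for xi, di in zip(x, delta)]
    return abs(f(x_shifted) - linear_prediction)


# Two hypothetical scalar "models" that differ only in curvature along the first input.
def low_curvature(v):
    return 0.1 * v[0] ** 2 + v[1]


def high_curvature(v):
    return 10.0 * v[0] ** 2 + v[1]


x, delta = [1.0, 1.0], [0.1, 0.0]
print(linearization_error(low_curvature, x, delta))   # about 0.001
print(linearization_error(high_curvature, x, delta))  # about 0.1
```

For a quadratic c·v₀², the error of the linear model is exactly c·δ₀². Scaling the curvature by 100 therefore scales the error by 100 for the same perturbation.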
Recent Publications
- "Generalizable Human-Robot Collaborative Assembly Using Imitation Learning and Force Control", European Control Conference (ECC), May 2023. (MERL TR2023-065)
  @inproceedings{Jha2023may,
    author = {Jha, Devesh K. and Jain, Siddarth and Romeres, Diego and Yerazunis, William S. and Nikovski, Daniel},
    title = {Generalizable Human-Robot Collaborative Assembly Using Imitation Learning and Force Control},
    booktitle = {European Control Conference (ECC)},
    year = 2023,
    month = may,
    url = {https://www.merl.com/publications/TR2023-065}
  }
- "MPC with Integrated Evasive Maneuvers for Failure-safe Automated Driving", American Control Conference (ACC), May 2023. (MERL TR2023-055)
  @inproceedings{Skibik2023may,
    author = {Skibik, Terrence and Vinod, Abraham P. and Weiss, Avishai and Di Cairano, Stefano},
    title = {MPC with Integrated Evasive Maneuvers for Failure-safe Automated Driving},
    booktitle = {American Control Conference (ACC)},
    year = 2023,
    month = may,
    url = {https://www.merl.com/publications/TR2023-055}
  }
- "Discriminative 3D Shape Modeling for Few-Shot Instance Segmentation", IEEE International Conference on Robotics and Automation (ICRA), May 2023. (MERL TR2023-010)
  @inproceedings{Cherian2023may,
    author = {Cherian, Anoop and Jain, Siddarth and Marks, Tim K. and Sullivan, Alan},
    title = {Discriminative 3D Shape Modeling for Few-Shot Instance Segmentation},
    booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
    year = 2023,
    month = may,
    url = {https://www.merl.com/publications/TR2023-010}
  }
- "H-SAUR: Hypothesize, Simulate, Act, Update, and Repeat for Understanding Object Articulations from Interactions", IEEE International Conference on Robotics and Automation (ICRA), May 2023. (MERL TR2023-009)
  @inproceedings{Ota2023may,
    author = {Ota, Kei and Tung, Hsiao-Yu and Smith, Kevin and Cherian, Anoop and Marks, Tim K. and Sullivan, Alan and Kanezaki, Asako and Tenenbaum, Joshua B.},
    title = {H-SAUR: Hypothesize, Simulate, Act, Update, and Repeat for Understanding Object Articulations from Interactions},
    booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
    year = 2023,
    month = may,
    url = {https://www.merl.com/publications/TR2023-009}
  }
- "HaLP: Hallucinating Latent Positives for Skeleton-based Self-Supervised Learning of Actions", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), May 2023. (MERL TR2023-035)
  @inproceedings{Shah2023may,
    author = {Shah, Anshul and Roy, Aniket and Shah, Ketul and Mishra, Shlok Kumar and Jacobs, David and Cherian, Anoop and Chellappa, Rama},
    title = {HaLP: Hallucinating Latent Positives for Skeleton-based Self-Supervised Learning of Actions},
    booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = 2023,
    month = may,
    url = {https://www.merl.com/publications/TR2023-035}
  }
- "Aligning Step-by-Step Instructional Diagrams to Video Demonstrations", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), May 2023. (MERL TR2023-034)
  @inproceedings{Zhang2023may,
    author = {Zhang, Jiahao and Cherian, Anoop and Liu, Yanbin and Shabat, Itzik Ben and Rodriguez, Cristian and Gould, Stephen},
    title = {Aligning Step-by-Step Instructional Diagrams to Video Demonstrations},
    booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = 2023,
    month = may,
    url = {https://www.merl.com/publications/TR2023-034}
  }
- "Robust Time Series Recovery and Classification Using Test-time Noise Simulator Networks", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2023. (MERL TR2023-021)
  @inproceedings{Jeon2023may,
    author = {Jeon, Eun Som and Lohit, Suhas and Anirudh, Rushil and Turaga, Pavan},
    title = {Robust Time Series Recovery and Classification Using Test-time Noise Simulator Networks},
    booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
    year = 2023,
    month = may,
    url = {https://www.merl.com/publications/TR2023-021}
  }
- "Are Deep Neural Networks SMARTer than Second Graders?", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), March 2023. (MERL TR2023-014)
  @inproceedings{Cherian2023mar,
    author = {Cherian, Anoop and Peng, Kuan-Chuan and Lohit, Suhas and Smith, Kevin and Tenenbaum, Joshua B.},
    title = {Are Deep Neural Networks SMARTer than Second Graders?},
    booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = 2023,
    month = mar,
    url = {https://www.merl.com/publications/TR2023-014}
  }
Videos
- Are Deep Neural Networks SMARTer than Second Graders?
- [CVPR 2023] EVAL: Explainable Video Anomaly Localization
- [MERL Seminar Series Spring 2023] Pitfalls and Opportunities in Interpretable Machine Learning
- Human Perspective Scene Understanding via Multimodal Sensing
- [MERL Seminar Series Spring 2022] Self-Supervised Scene Representation Learning
- [MERL Seminar Series Spring 2022] Learning Speech Representations with Multimodal Self-Supervision
- HealthCam: A system for non-contact monitoring of vital signs
- [MERL Seminar Series 2021] Learning to See by Moving: Self-supervising 3D scene representations for perception, control, and visual reasoning
- [MERL Seminar Series 2021] Look and Listen: From Semantic to Spatial Audio-Visual Perception
- Towards Human-Level Learning of Complex Physical Puzzles
- Scene-Aware Interaction Technology
- 3D Object Discovery and Modeling Using Single RGB-D Images Containing Multiple Object Instances
- Joint 3D Reconstruction of a Static Scene and Moving Objects
- Direct Multichannel Tracking
- FoldingNet: Interpretable Unsupervised Learning on 3D Point Clouds
- FasTFit: A fast T-spline fitting algorithm
- CASENet: Deep Category-Aware Semantic Edge Detection
- Object Detection and Tracking in RGB-D SLAM via Hierarchical Feature Grouping
- Pinpoint SLAM: A Hybrid of 2D and 3D Simultaneous Localization and Mapping for RGB-D Sensors
- Action Detection Using A Deep Recurrent Neural Network
- Saffron - Digital Type System
- MERL Research on Autonomous Vehicles
- 3D Reconstruction
- Obstacle Detection
- Semantic Scene Labeling
- Robot Bin Picking
- Dose optimization for particle beam therapy
- Sapphire - High Accuracy NC Milling Simulation
- Deep Hierarchical Parsing for Semantic Segmentation
- Global Local Face Upsampling Network
- Gaussian Conditional Random Field Network for Semantic Segmentation
- Fast Graspability Evaluation on Single Depth Maps for Bin Picking with General Grippers
- Point-Plane SLAM for Hand-Held 3D Sensors
- Tracking an RGB-D Camera Using Points and Planes
- Fast Plane Extraction in Organized Point Clouds Using Agglomerative Hierarchical Clustering
- Calibration of Non-Overlapping Cameras Using an External SLAM System
- Voting-Based Pose Estimation for Robotic Assembly Using a 3D Sensor
- Fast Object Localization and Pose Estimation in Heavy Clutter for Robotic Bin Picking
- Learning to rank 3D features
Downloads
- Simple Multimodal Algorithmic Reasoning Task Dataset
- SOurce-free Cross-modal KnowledgE Transfer
- Audio-Visual-Language Embodied Navigation in 3D Environments
- Instance Segmentation GAN
- Audio Visual Scene-Graph Segmentor
- Generalized One-class Discriminative Subspaces
- Generating Visual Dynamics from Sound and Context
- Adversarially-Contrastive Optimal Transport
- MotionNet
- Contact-Implicit Trajectory Optimization
- Street Scene Dataset
- FoldingNet++
- Landmarks’ Location, Uncertainty, and Visibility Likelihood
- Gradient-based Nikaido-Isoda
- Circular Maze Environment
- Discriminative Subspace Pooling
- Kernel Correlation Network
- Fast Resampling on Point Clouds via Graphs
- FoldingNet
- MERL Shopping Dataset
- Joint Geodesic Upsampling
- Plane Extraction using Agglomerative Clustering
- Partial Group Convolutional Neural Networks