Computer Vision
Extracting meaning and building representations of visual objects and events in the world.
Our main research themes cover the areas of deep learning and artificial intelligence for object and action detection, classification and scene understanding, robotic vision and object manipulation, 3D processing and computational geometry, as well as simulation of physical systems to enhance machine learning systems.
-
Researchers
Tim K. Marks
Anoop Cherian
Michael J. Jones
Chiori Hori
Matthew E. Brand
Hassan Mansour
Jonathan Le Roux
Jay Thornton
Anthony Vetro
Radu Corcodel
Devesh K. Jha
Suhas Lohit
Petros T. Boufounos
Dehong Liu
Daniel N. Nikovski
Diego Romeres
Ye Wang
Siddarth Jain
Arvind Raghunathan
Gordon Wichern
William S. Yerazunis
Jose Amaya
Toshiaki Koike-Akino
Pedro Miraldo
Philip V. Orlik
Kuan-Chuan Peng
Huifang Sun
Yebin Wang
-
Awards
-
AWARD Best Paper - Honorable Mention Award at WACV 2021
Date: January 6, 2021
Awarded to: Rushil Anirudh, Suhas Lohit, Pavan Turaga
MERL Contact: Suhas Lohit
Research Areas: Computational Sensing, Computer Vision, Machine Learning
Brief: A team of researchers from Mitsubishi Electric Research Laboratories (MERL), Lawrence Livermore National Laboratory (LLNL) and Arizona State University (ASU) received the Best Paper Honorable Mention Award at WACV 2021 for their paper "Generative Patch Priors for Practical Compressive Image Recovery".
The paper proposes a novel model of natural images as a composition of small patches which are obtained from a deep generative network. This is unlike prior approaches where the networks attempt to model image-level distributions and are unable to generalize outside training distributions. The key idea in this paper is that learning patch-level statistics is far easier. As the authors demonstrate, this model can then be used to efficiently solve challenging inverse problems in imaging such as compressive image recovery and inpainting even from very few measurements for diverse natural scenes.
-
AWARD MERL Researchers win Best Paper Award at ICCV 2019 Workshop on Statistical Deep Learning in Computer Vision
Date: October 27, 2019
Awarded to: Abhinav Kumar, Tim K. Marks, Wenxuan Mou, Chen Feng, Xiaoming Liu
MERL Contact: Tim K. Marks
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Brief: MERL researcher Tim Marks, former MERL interns Abhinav Kumar and Wenxuan Mou, and MERL consultants Professor Chen Feng (NYU) and Professor Xiaoming Liu (MSU) received the Best Oral Paper Award at the IEEE/CVF International Conference on Computer Vision (ICCV) 2019 Workshop on Statistical Deep Learning in Computer Vision (SDL-CV) held in Seoul, Korea. Their paper, entitled "UGLLI Face Alignment: Estimating Uncertainty with Gaussian Log-Likelihood Loss," describes a method that, given an image of a face, estimates not only the locations of facial landmarks but also the uncertainty of each landmark location estimate.
-
AWARD CVPR 2011 Longuet-Higgins Prize
Date: June 25, 2011
Awarded to: Paul A. Viola and Michael J. Jones
Awarded for: "Rapid Object Detection using a Boosted Cascade of Simple Features"
Awarded by: Conference on Computer Vision and Pattern Recognition (CVPR)
MERL Contact: Michael J. Jones
Research Area: Machine Learning
Brief: Awarded to the paper from 10 years ago with the largest impact on the field: "Rapid Object Detection using a Boosted Cascade of Simple Features", originally published at the Conference on Computer Vision and Pattern Recognition (CVPR 2001).
-
News & Events
-
NEWS MERL presenting 8 papers at ICASSP 2022
Date: May 22, 2022 - May 27, 2022
Where: Singapore
MERL Contacts: Anoop Cherian; Chiori Hori; Toshiaki Koike-Akino; Jonathan Le Roux; Tim K. Marks; Philip V. Orlik; Kuan-Chuan Peng; Pu (Perry) Wang; Gordon Wichern
Research Areas: Artificial Intelligence, Computer Vision, Signal Processing, Speech & Audio
Brief: MERL researchers are presenting 8 papers at the IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), which is being held in Singapore from May 22-27, 2022. A week of virtual presentations also took place earlier this month.
Topics to be presented include recent advances in speech recognition, audio processing, scene understanding, computational sensing, and classification.
ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.
-
NEWS MERL Scientists Presenting 5 Papers at IEEE International Conference on Communications (ICC) 2022
Date: May 16, 2022 - May 20, 2022
Where: Seoul, Korea
MERL Contacts: Jianlin Guo; Kyeong Jin (K.J.) Kim; Toshiaki Koike-Akino; Philip V. Orlik; Kieran Parsons; Pu (Perry) Wang; Ye Wang
Research Areas: Artificial Intelligence, Communications, Computational Sensing, Computer Vision, Machine Learning, Signal Processing
Brief: MERL Connectivity & Information Processing Team scientists remotely presented 5 papers at the IEEE International Conference on Communications (ICC) 2022, held in Seoul, Korea on May 16-20, 2022. Topics presented include recent advancements in communications technologies, deep learning methods, and quantum machine learning (QML). Presentation videos are also available on our YouTube channel. In addition, K. J. Kim organized the "Industrial Private 5G-and-beyond Wireless Networks Workshop" at the conference.
IEEE ICC is one of the IEEE Communications Society's two flagship conferences (ICC and Globecom). Each year, close to 2,000 attendees from over 70 countries attend IEEE ICC to take advantage of a program that consists of keynote sessions, technical paper sessions, tutorials and workshops, and industry sessions. This 5-day event is known for bringing together audiences from both industry and academia to learn about the latest research and innovations in communications and networking technology, share ideas and best practices, and collaborate on future projects.
-
Internships
-
CV1862: 6D pose estimation for machine vision
The Computer Vision group at MERL, in collaboration with Mitsubishi Electric Automotive America, is seeking a highly skilled graduate student for an internship on online pose estimation of vehicles using machine vision. The research work is expected to be deployed in real-world applications such as vehicle range and pose detection. Candidates should have advanced knowledge of full 6D pose estimation from stereo imaging using both machine learning and geometric approaches, as well as point cloud processing using machine learning and computational geometry. Exceptional programming skills are required, including ROS, C++ and Python. The expected start date is Sep/Oct 2022 and the duration of the internship is 6 months.
-
CV1703: Software development in ROS for robotic manipulation
MERL is offering an internship position for non-research software development for robotic manipulation. The scope of the internship is to develop robust ROS packages by refactoring existing experimental code. The position is open to prospective candidates with very strong programming skills in ROS (Robot Operating System), primarily in C++ and secondarily in Python. The selected intern will have a software engineering role rather than a research-oriented one. The position is open to both senior undergraduate students and master's students. Start and end dates are flexible.
-
Recent Publications
- "An Empirical Analysis of Boosting Deep Networks", International Joint Conference on Neural Networks (IJCNN), July 2022.
@inproceedings{Rambhatla2022jul,
  author = {Rambhatla, Sai and Jones, Michael J. and Chellappa, Rama},
  title = {An Empirical Analysis of Boosting Deep Networks},
  booktitle = {International Joint Conference on Neural Networks (IJCNN)},
  year = 2022,
  month = jul,
  url = {https://www.merl.com/publications/TR2022-075}
}
- "A Unified Model for Line Projections in Catadioptric Cameras with Rotationally Symmetric Mirrors", IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2022.
@inproceedings{Miraldo2022jun,
  author = {Miraldo, Pedro and Iglesias, Jose Pedro},
  title = {A Unified Model for Line Projections in Catadioptric Cameras with Rotationally Symmetric Mirrors},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = 2022,
  month = jun,
  url = {https://www.merl.com/publications/TR2022-084}
}
- "PointMotionNet: Point-Wise Motion Learning for Large-Scale LiDAR Point Clouds Sequences", CVPR Workshop on Autonomous Driving, June 2022.
@inproceedings{Sullivan2022jun,
  author = {Sullivan, Alan and Wang, Jun and Li, Xiaolong and Chen, Siheng and Abbot, Lynn},
  title = {PointMotionNet: Point-Wise Motion Learning for Large-Scale LiDAR Point Clouds Sequences},
  booktitle = {CVPR Workshop on Autonomous Driving},
  year = 2022,
  month = jun,
  url = {https://www.merl.com/publications/TR2022-083}
}
- "Quantifying Predictive Uncertainty for Stochastic Video Synthesis from Audio", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2022.
@inproceedings{Chatterjee2022jun,
  author = {Chatterjee, Moitreya and Ahuja, Narendra and Cherian, Anoop},
  title = {Quantifying Predictive Uncertainty for Stochastic Video Synthesis from Audio},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = 2022,
  month = jun,
  url = {https://www.merl.com/publications/TR2022-082}
}
- "Learning to Synthesize Volumetric Meshes from Vision-based Tactile Imprints", IEEE International Conference on Robotics and Automation (ICRA), May 2022.
@inproceedings{Zhu2022may2,
  author = {Zhu, Xinghao and Jain, Siddarth and Tomizuka, Masayoshi and van Baar, Jeroen},
  title = {Learning to Synthesize Volumetric Meshes from Vision-based Tactile Imprints},
  booktitle = {IEEE International Conference on Robotics and Automation (ICRA)},
  year = 2022,
  month = may,
  url = {https://www.merl.com/publications/TR2022-055}
}
- "Audio-Visual Scene-Aware Dialog and Reasoning Using Audio-Visual Transformers with Joint Student-Teacher Learning", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2022.
@inproceedings{Shah2022apr,
  author = {Shah, Ankit Parag and Geng, Shijie and Gao, Peng and Cherian, Anoop and Hori, Takaaki and Marks, Tim K. and Le Roux, Jonathan and Hori, Chiori},
  title = {Audio-Visual Scene-Aware Dialog and Reasoning Using Audio-Visual Transformers with Joint Student-Teacher Learning},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = 2022,
  month = apr,
  url = {https://www.merl.com/publications/TR2022-019}
}
- "Overview of Audio Visual Scene-Aware Dialog with Reasoning Track for Natural Language Generation in DSTC10", The 10th Dialog System Technology Challenge Workshop at AAAI, February 2022.
@inproceedings{Hori2022feb,
  author = {Hori, Chiori and Shah, Ankit Parag and Geng, Shijie and Gao, Peng and Cherian, Anoop and Hori, Takaaki and Le Roux, Jonathan and Marks, Tim K.},
  title = {Overview of Audio Visual Scene-Aware Dialog with Reasoning Track for Natural Language Generation in DSTC10},
  booktitle = {The 10th Dialog System Technology Challenge Workshop at AAAI},
  year = 2022,
  month = feb,
  url = {https://www.merl.com/publications/TR2022-016}
}
- "DSTC10-AVSD Submission System with Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning", February 2022.
@book{Shah2022feb,
  author = {Shah, Ankit Parag and Hori, Takaaki and Le Roux, Jonathan and Hori, Chiori},
  title = {DSTC10-AVSD Submission System with Reasoning using Audio-Visual Transformers with Joint Student-Teacher Learning},
  year = 2022,
  month = feb,
  url = {https://www.merl.com/publications/TR2022-025}
}
-
Videos
- [MERL Seminar Series Spring 2022] Self-Supervised Scene Representation Learning
- [MERL Seminar Series Spring 2022] Learning Speech Representations with Multimodal Self-Supervision
- HealthCam: A system for non-contact monitoring of vital signs
- [MERL Seminar Series 2021] Learning to See by Moving: Self-supervising 3D scene representations for perception, control, and visual reasoning
- [MERL Seminar Series 2021] Look and Listen: From Semantic to Spatial Audio-Visual Perception
- Towards Human-Level Learning of Complex Physical Puzzles
- Scene-Aware Interaction Technology
- 3D Object Discovery and Modeling Using Single RGB-D Images Containing Multiple Object Instances
- Joint 3D Reconstruction of a Static Scene and Moving Objects
- Direct Multichannel Tracking
- FoldingNet: Interpretable Unsupervised Learning on 3D Point Clouds
- FasTFit: A fast T-spline fitting algorithm
- CASENet: Deep Category-Aware Semantic Edge Detection
- Object Detection and Tracking in RGB-D SLAM via Hierarchical Feature Grouping
- Pinpoint SLAM: A Hybrid of 2D and 3D Simultaneous Localization and Mapping for RGB-D Sensors
- Action Detection Using A Deep Recurrent Neural Network
- Dose optimization for particle beam therapy
- 3D Reconstruction
- MERL Research on Autonomous Vehicles
- Saffron - Digital Type System
- Obstacle Detection
- Semantic Scene Labeling
- Robot Bin Picking
- Sapphire - High Accuracy NC Milling Simulation
- Deep Hierarchical Parsing for Semantic Segmentation
- Global Local Face Upsampling Network
- Gaussian Conditional Random Field Network for Semantic Segmentation
- Fast Graspability Evaluation on Single Depth Maps for Bin Picking with General Grippers
- Point-Plane SLAM for Hand-Held 3D Sensors
- Tracking an RGB-D Camera Using Points and Planes
- Fast Plane Extraction in Organized Point Clouds Using Agglomerative Hierarchical Clustering
- Calibration of Non-Overlapping Cameras Using an External SLAM System
- Voting-Based Pose Estimation for Robotic Assembly Using a 3D Sensor
- Fast Object Localization and Pose Estimation in Heavy Clutter for Robotic Bin Picking
- Learning to rank 3D features
-
Software Downloads
- SOurce-free Cross-modal KnowledgE Transfer
- Instance Segmentation GAN
- Audio Visual Scene-Graph Segmentor
- Generating Visual Dynamics from Sound and Context
- Adversarially-Contrastive Optimal Transport
- MotionNet
- Contact-Implicit Trajectory Optimization
- FoldingNet++
- Landmarks’ Location, Uncertainty, and Visibility Likelihood
- Gradient-based Nikaido-Isoda
- Circular Maze Environment
- Discriminative Subspace Pooling
- Kernel Correlation Network
- Fast Resampling on Point Clouds via Graphs
- FoldingNet
- Joint Geodesic Upsampling
- Plane Extraction using Agglomerative Clustering