Computer Vision
Extracting meaning and building representations of visual objects and events in the world.
Our main research themes cover deep learning and artificial intelligence for object and action detection, classification, and scene understanding; robotic vision and object manipulation; 3D processing and computational geometry; and simulation of physical systems to enhance machine learning.
Researchers
Jeroen van Baar
Tim Marks
Michael Jones
Anoop Cherian
Alan Sullivan
Matthew Brand
Chiori Hori
Hassan Mansour
Ronald Perry
Takaaki Hori
Jay Thornton
Radu Corcodel
Petros Boufounos
Devesh Jha
Dehong Liu
Suhas Lohit
Daniel Nikovski
Arvind Raghunathan
Diego Romeres
Anthony Vetro
Ye Wang
Gordon Wichern
William Yerazunis
Siddarth Jain
Toshiaki Koike-Akino
Jonathan Le Roux
Philip Orlik
Huifang Sun
Yebin Wang
Varun Haritsa
Kuan-Chuan Peng
Awards
AWARD Best Paper - Honorable Mention Award at WACV 2021 Date: January 6, 2021
Awarded to: Rushil Anirudh, Suhas Lohit, Pavan Turaga
MERL Contact: Suhas Lohit
Research Areas: Computational Sensing, Computer Vision, Machine Learning
Brief: A team of researchers from Mitsubishi Electric Research Laboratories (MERL), Lawrence Livermore National Laboratory (LLNL), and Arizona State University (ASU) received the Best Paper Honorable Mention Award at WACV 2021 for their paper "Generative Patch Priors for Practical Compressive Image Recovery".
The paper proposes a novel model of natural images as a composition of small patches obtained from a deep generative network. This is unlike prior approaches, in which networks attempt to model image-level distributions and fail to generalize outside their training distributions. The key idea is that patch-level statistics are far easier to learn. As the authors demonstrate, the resulting model can efficiently solve challenging inverse problems in imaging, such as compressive image recovery and inpainting, even from very few measurements and for diverse natural scenes.
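To make the idea concrete, here is a toy sketch (not the authors' code) in which the deep patch generator is replaced by a hypothetical linear one: each patch of the unknown image is the output of a shared generator applied to a low-dimensional latent code, and the image is recovered from far fewer linear measurements than pixels by solving for the latent codes instead of the pixels.

```python
import numpy as np

# Toy patch-prior compressive recovery (illustrative; the paper uses a deep
# generative patch model, replaced here by a linear "generator" G).
rng = np.random.default_rng(0)
d_latent, p = 3, 4                          # latent dimension, patch side length
G = rng.standard_normal((p * p, d_latent))  # shared patch generator

def assemble(z):
    """Compose an 8x8 image from four generated 4x4 patches; z has shape (4, d_latent)."""
    img = np.zeros((2 * p, 2 * p))
    for i in range(2):
        for j in range(2):
            img[p*i:p*i+p, p*j:p*j+p] = (G @ z[2*i + j]).reshape(p, p)
    return img.ravel()

z_true = rng.standard_normal((4, d_latent))
x = assemble(z_true)                     # ground-truth image (64 pixels)
A = rng.standard_normal((20, x.size))    # compressive measurements: 20 << 64
y = A @ x

# Because assemble() is linear in z, recovery reduces to least squares over
# the 12 latent unknowns rather than the 64 pixels -- the patch prior is what
# makes this underdetermined problem solvable.
P = np.column_stack([assemble(e.reshape(4, d_latent)) for e in np.eye(4 * d_latent)])
z_hat, *_ = np.linalg.lstsq(A @ P, y, rcond=None)
x_rec = P @ z_hat
print(np.max(np.abs(x_rec - x)))  # near zero: exact recovery in this noiseless toy case
```

In the actual method the generator is a nonlinear deep network, so the per-patch latent codes are found by gradient-based optimization of the measurement residual rather than by a single least-squares solve.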
AWARD MERL Researchers win Best Paper Award at ICCV 2019 Workshop on Statistical Deep Learning in Computer Vision Date: October 27, 2019
Awarded to: Abhinav Kumar, Tim K. Marks, Wenxuan Mou, Chen Feng, Xiaoming Liu
MERL Contact: Tim Marks
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Brief: MERL researcher Tim Marks, former MERL interns Abhinav Kumar and Wenxuan Mou, and MERL consultants Professor Chen Feng (NYU) and Professor Xiaoming Liu (MSU) received the Best Oral Paper Award at the IEEE/CVF International Conference on Computer Vision (ICCV) 2019 Workshop on Statistical Deep Learning in Computer Vision (SDL-CV), held in Seoul, Korea. Their paper, "UGLLI Face Alignment: Estimating Uncertainty with Gaussian Log-Likelihood Loss," describes a method that, given an image of a face, estimates not only the locations of facial landmarks but also the uncertainty of each landmark location estimate.
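The paper's exact formulation is not reproduced here, but a minimal sketch of a per-landmark Gaussian negative log-likelihood (the network predicts both a mean location and a covariance, so it is penalized for being confidently wrong) might look like:

```python
import numpy as np

def gaussian_nll(mu, L, y):
    """Negative log-likelihood of ground-truth landmark y under N(mu, Sigma),
    up to an additive constant. The network predicts the mean mu and a
    lower-triangular Cholesky factor L with Sigma = L @ L.T, which keeps the
    covariance positive definite. (Illustrative; not the paper's exact loss.)"""
    r = np.linalg.solve(L, y - mu)       # whitened residual
    mahalanobis = 0.5 * r @ r            # distance term, scaled by predicted uncertainty
    log_det = np.log(np.diag(L)).sum()   # equals 0.5 * log det Sigma
    return mahalanobis + log_det

# A perfect, unit-covariance prediction has zero loss (up to the constant):
print(gaussian_nll(np.zeros(2), np.eye(2), np.zeros(2)))  # 0.0
# Predicting a larger covariance reduces the penalty for a missed landmark:
far = np.array([3.0, 0.0])
print(gaussian_nll(np.zeros(2), np.eye(2), far) > gaussian_nll(np.zeros(2), 2 * np.eye(2), far))  # True
```

In a face-alignment setting, a loss like this would be summed over all landmarks, letting the model report low confidence for occluded or ambiguous points instead of being forced to commit to a single location.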
AWARD R&D100 award for Deep Learning-based Water Detector Date: November 16, 2018
Awarded to: Ziming Zhang, Alan Sullivan, Hideaki Maehara, Kenji Taira, Kazuo Sugimoto
MERL Contact: Alan Sullivan
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning
Brief: Researchers and developers from MERL, Mitsubishi Electric, and Mitsubishi Electric Engineering (MEE) have been recognized with an R&D100 award for the development of a deep learning-based water detector. Automatic detection of water levels in rivers and streams is critical for early warning of flash flooding. Existing systems require that a height gauge be placed in the river or stream, which is costly and sometimes impossible. The new deep learning-based water detector uses only images from a video camera, along with 3D measurements of the river valley, to determine water levels and warn of potential flooding. The system is robust to lighting and weather conditions, working well at night as well as in fog or rain. Deep learning is a relatively new technique that uses neural networks trained on real data to perform human-level recognition tasks. This work is powered by Mitsubishi Electric's Maisart AI technology.
See All Awards for Computer Vision
News & Events
NEWS Chiori Hori will give keynote on scene understanding via multimodal sensing at AI Electronics Symposium Date: February 15, 2021
Where: The 2nd International Symposium on AI Electronics
MERL Contact: Chiori Hori
Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
Brief: Chiori Hori, a Senior Principal Researcher in MERL's Speech and Audio Team, will be a keynote speaker at the 2nd International Symposium on AI Electronics, alongside Alex Acero, Senior Director of Apple Siri; Roberto Cipolla, Professor of Information Engineering at the University of Cambridge; and Hiroshi Amano, Professor at Nagoya University and winner of the Nobel Prize in Physics for his work on blue light-emitting diodes. The symposium, organized by Tohoku University, will be held online on February 15, 2021, 10am-4pm (JST).
Chiori's talk, titled "Human Perspective Scene Understanding via Multimodal Sensing", will present MERL's work toward the development of scene-aware interaction. One important technology still missing from human-machine interaction is natural, context-aware interaction, in which machines understand their surrounding scene from the human perspective and can share that understanding with humans using natural language. To bridge this communication gap, MERL has been working at the intersection of spoken dialog, audio-visual understanding, sensor signal understanding, and robotics to build a new AI paradigm, called scene-aware interaction, that enables machines to translate their perception and understanding of a scene into natural language and interact more effectively with humans. The talk will survey these technologies and introduce an application to future car navigation.
EVENT MERL Virtual Open House 2020 Date & Time: Wednesday, December 9, 2020; 1:00-5:00PM EST
MERL Contacts: Elizabeth Phillips; Jeroen van Baar; Anthony Vetro
Location: Virtual
Research Areas: Applied Physics, Artificial Intelligence, Communications, Computational Sensing, Computer Vision, Control, Data Analytics, Dynamical Systems, Electric Systems, Electronic and Photonic Devices, Machine Learning, Multi-Physical Modeling, Optimization, Robotics, Signal Processing, Speech & Audio
Brief: MERL will host a virtual open house on December 9, 2020. Live sessions will be held from 1-5pm EST, including an overview of recent activities by our research groups and a talk by Prof. Pierre Moulin of the University of Illinois at Urbana-Champaign on adversarial machine learning. Registered attendees will also be able to browse our virtual booths at their convenience and connect with our research staff about engagement opportunities, including internship, postdoc, and research scientist openings, as well as visiting faculty positions.
Registration: https://mailchi.mp/merl/merl-virtual-open-house-2020
Schedule: https://www.merl.com/events/voh20
Current internship and employment openings:
https://www.merl.com/internship/openings
https://www.merl.com/employment/employment
Information about working at MERL:
https://www.merl.com/employment
See All News & Events for Computer Vision
Research Highlights
Internships
CV1546: Vibration analysis in video sequences
MERL is looking for a self-motivated intern to work on vibration analysis in video sequences. The ideal candidate is a Ph.D. student with a strong background in machine learning, optimization, and computer vision; experience in computational photography and MATLAB/Python is a plus. The intern is expected to collaborate with MERL researchers to develop algorithms and prepare manuscripts for scientific publication. The internship is for a minimum of 3 months, and the start date is flexible. The internship is preferably onsite at MERL, but may be performed remotely if the COVID pandemic makes that necessary.
CV1568: Uncertainty Estimation in 3D Face Landmark Tracking
We are seeking a highly motivated intern to conduct original research extending MERL's work on uncertainty estimation in face landmark localization (the LUVLi model) to the domains of 3D faces and video sequences. The successful candidate will collaborate with MERL researchers to design and implement new models, conduct experiments, and prepare results for publication. The candidate should be a PhD student in computer vision and machine learning with a strong publication record. Experience in deep learning-based face landmark estimation, video tracking, and 3D face modeling is preferred. Strong programming skills, experience developing and implementing new models in deep learning platforms such as PyTorch, and broad knowledge of machine learning and deep learning methods are expected.
CV1569: Robot learning from videos of human demonstrations
MERL is looking for a highly motivated and qualified intern to work on developing algorithms for robot learning from videos of human demonstrations. The ideal candidate is a current Ph.D. student with a strong background in computer vision, deep learning, and robotics. Familiarity with imitation learning, learning from demonstrations (LfD), reinforcement learning, and machine learning for robotics will be valued. Proficiency in Python is necessary, and experience with a physics engine simulator such as MuJoCo or PyBullet is a plus. The successful candidate will collaborate with MERL researchers, and publication of the results is expected. The start date is flexible, and the expected duration of the internship is 3-4 months. Interested candidates are encouraged to apply with a recent CV and a list of publications on related topics. The internship is preferably onsite at MERL, but may be performed remotely if the COVID pandemic makes that necessary.
See All Internships for Computer Vision
Openings
See All Openings at MERL
Recent Publications
- "Fusion-Based Image Correlations Framework For Strain Measurement", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), February 2021. BibTeX TR2021-012 PDF
@inproceedings{Shi2021feb,
  author = {Shi, Laixi and Liu, Dehong and Umeda, Masaki and Hana, Norihiko},
  title = {Fusion-Based Image Correlations Framework For Strain Measurement},
  booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year = 2021,
  month = feb,
  url = {https://www.merl.com/publications/TR2021-012}
}
- "Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers", AAAI Conference on Artificial Intelligence, February 2021. BibTeX TR2021-010 PDF
@inproceedings{Geng2021feb,
  author = {Geng, Shijie and Gao, Peng and Chatterjee, Moitreya and Hori, Chiori and Le Roux, Jonathan and Zhang, Yongfeng and Li, Hongsheng and Cherian, Anoop},
  title = {Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year = 2021,
  month = feb,
  url = {https://www.merl.com/publications/TR2021-010}
}
- "Recovering Trajectories of Unmarked Joints in 3D Human Actions Using Latent Space Optimization", IEEE Winter Conference on Applications of Computer Vision (WACV), January 2021. BibTeX TR2021-004 PDF
@inproceedings{Lohit2021jan,
  author = {Lohit, Suhas and Anirudh, Rushil and Turaga, Pavan},
  title = {Recovering Trajectories of Unmarked Joints in 3D Human Actions Using Latent Space Optimization},
  booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
  year = 2021,
  month = jan,
  url = {https://www.merl.com/publications/TR2021-004}
}
- "Generative Patch Priors for Practical Compressive Image Recovery", IEEE Winter Conference on Applications of Computer Vision (WACV), January 2021. BibTeX TR2021-003 PDF
@inproceedings{Anirudh2021jan,
  author = {Anirudh, Rushil and Lohit, Suhas and Turaga, Pavan},
  title = {Generative Patch Priors for Practical Compressive Image Recovery},
  booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
  year = 2021,
  month = jan,
  url = {https://www.merl.com/publications/TR2021-003}
}
- "Near-Infrared Imaging Photoplethysmography During Driving", IEEE Transactions on Intelligent Transportation Systems, DOI: 10.1109/TITS.2020.3038317, pp. 1-12, December 2020. BibTeX TR2020-161 PDF
@article{Nowara2020dec,
  author = {Nowara, Ewa and Marks, Tim and Mansour, Hassan and Veeraraghavan, Ashok},
  title = {Near-Infrared Imaging Photoplethysmography During Driving},
  journal = {IEEE Transactions on Intelligent Transportation Systems},
  year = 2020,
  pages = {1--12},
  month = dec,
  doi = {10.1109/TITS.2020.3038317},
  url = {https://www.merl.com/publications/TR2020-161}
}
- "Symbiotic Graph Neural Networks for 3D Skeleton-based Human Action Recognition and Motion Prediction", IEEE Transactions on Pattern Analysis and Machine Intelligence, December 2020. BibTeX TR2020-166 PDF
@article{Chen2020dec,
  author = {Li, Maosen and Chen, Siheng and Chen, Xu and Zhang, Ya and Wang, Yanfeng and Tian, Qi},
  title = {Symbiotic Graph Neural Networks for 3D Skeleton-based Human Action Recognition and Motion Prediction},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year = 2020,
  month = dec,
  url = {https://www.merl.com/publications/TR2020-166}
}
- "Interactive Tactile Perception for Classification of Novel Object Instances", IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), DOI: 10.1109/IROS45743.2020.9341795, pp. 9861-9868, November 2020. BibTeX TR2020-143 PDF
@inproceedings{Corcodel2020nov,
  author = {Corcodel, Radu and Jain, Siddarth and van Baar, Jeroen},
  title = {Interactive Tactile Perception for Classification of Novel Object Instances},
  booktitle = {IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year = 2020,
  pages = {9861--9868},
  month = nov,
  publisher = {IEEE},
  doi = {10.1109/IROS45743.2020.9341795},
  url = {https://www.merl.com/publications/TR2020-143}
}
- "Sound2Sight: Generating Visual Dynamics from Sound and Context", European Conference on Computer Vision (ECCV), Vedaldi, A., Bischof, H., Brox, Th., Frahm, J.-M., Eds., August 2020. BibTeX TR2020-121 PDF Software
@inproceedings{Cherian2020aug,
  author = {Cherian, Anoop and Chatterjee, Moitreya and Ahuja, Narendra},
  title = {Sound2Sight: Generating Visual Dynamics from Sound and Context},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = 2020,
  editor = {Vedaldi, A. and Bischof, H. and Brox, Th. and Frahm, J.-M.},
  month = aug,
  publisher = {Springer},
  url = {https://www.merl.com/publications/TR2020-121}
}
Videos
- Towards Human-Level Learning of Complex Physical Puzzles
- 3D Object Discovery and Modeling Using Single RGB-D Images Containing Multiple Object Instances
- Joint 3D Reconstruction of a Static Scene and Moving Objects
- Direct Multichannel Tracking
- FoldingNet: Interpretable Unsupervised Learning on 3D Point Clouds
- FasTFit: A fast T-spline fitting algorithm
- CASENet: Deep Category-Aware Semantic Edge Detection
- Object Detection and Tracking in RGB-D SLAM via Hierarchical Feature Grouping
- Pinpoint SLAM: A Hybrid of 2D and 3D Simultaneous Localization and Mapping for RGB-D Sensors
- Action Detection Using A Deep Recurrent Neural Network
- Saffron - Digital Type System
- Sapphire - High Accuracy NC Milling Simulation
- MERL Research on Autonomous Vehicles
- Dose optimization for particle beam therapy
- 3D Reconstruction
- Robot Bin Picking
- Semantic Scene Labeling
- Obstacle Detection
- Deep Hierarchical Parsing for Semantic Segmentation
- Global Local Face Upsampling Network
- Gaussian Conditional Random Field Network for Semantic Segmentation
- Fast Graspability Evaluation on Single Depth Maps for Bin Picking with General Grippers
- Point-Plane SLAM for Hand-Held 3D Sensors
- Tracking an RGB-D Camera Using Points and Planes
- Fast Plane Extraction in Organized Point Clouds Using Agglomerative Hierarchical Clustering
- Calibration of Non-Overlapping Cameras Using an External SLAM System
- Voting-Based Pose Estimation for Robotic Assembly Using a 3D Sensor
- Fast Object Localization and Pose Estimation in Heavy Clutter for Robotic Bin Picking
- Learning to rank 3D features
Software Downloads
- Generating Visual Dynamics from Sound and Context
- Adversarially-Contrastive Optimal Transport
- MotionNet
- Contact-Implicit Trajectory Optimization
- FoldingNet++
- Landmarks' Location, Uncertainty, and Visibility Likelihood
- Gradient-based Nikaido-Isoda
- Circular Maze Environment
- Discriminative Subspace Pooling
- Kernel Correlation Network
- Fast Resampling on Point Clouds via Graphs
- FoldingNet
- Joint Geodesic Upsampling
- Plane Extraction using Agglomerative Clustering