Mitsubishi Electric Research Laboratories

Video Object Segmentation

The main purpose of video segmentation is to enable content-based representation by extracting objects of interest from a series of consecutive video frames. It is also key to many robotic vision applications: most vision-based autonomous vehicles acquire information about their surroundings by analyzing video. In particular, segmentation is required for high-level image understanding and scene interpretation, such as spotting and tracking special events in surveillance video. For instance, pedestrian and highway traffic can be regulated using density estimates obtained by segmenting people and vehicles. Object segmentation makes it possible to detect speeding and suspicious moving cars, road obstacles, and unusual activities, and to monitor forbidden zones, parking lots, and elevators automatically. It also supports gesture recognition and visual biometric extraction for user interfaces. We developed a novel algorithm for automatic and reliable segmentation of moving objects in color video sequences and for extraction of video object planes. A set of object descriptors is proposed to establish the relations between the different video objects hierarchically.

Background & Objective:  Our method has several advantages over conventional techniques: it is automatic and computationally efficient, extracts object shape precisely, generates a multi-resolution object tree to expedite content analysis, and can incorporate a priori information.

Technical Discussion:  After the input video is filtered, markers are selected. Markers serve as the seeds of volumes. A volume is defined as the aggregation of the video object planes of the same object over every frame of the sequence. Using the local color and texture characteristics, a volume is grown around each marker. The grown volumes are refined and their motion trajectories are extracted. Self-descriptors for each volume and mutual descriptors for each pair of volumes are computed from the trajectories. These descriptors are designed to capture the motion, shape, and spatial information of the volumes. In the clustering stage, volumes are merged into objects by evaluating their descriptors. Iterative clustering is carried out until the motion similarity of the merged objects becomes small. After clustering, an object tree that gives the video object planes for every possible number of objects is obtained.
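
The clustering stage described above can be illustrated with a minimal Python sketch. It is not the actual implementation: the similarity measure (based on frame-to-frame displacements), the weighted averaging of merged trajectories, and the names motion_similarity, build_object_tree, and min_similarity are all assumptions made for illustration. The sketch repeatedly merges the most similar pair of volumes and records the partition obtained at each number of objects, which gives one simple form of object tree.

```python
import math
from itertools import combinations


def displacements(traj):
    """Frame-to-frame displacement vectors of a trajectory [(x, y), ...]."""
    return [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(traj, traj[1:])]


def motion_similarity(traj_a, traj_b):
    """Hypothetical mutual descriptor: 1.0 for identical motion, falling
    toward 0 as the displacement vectors of the two trajectories diverge.
    Assumes equal-length trajectories with at least two frames."""
    da, db = displacements(traj_a), displacements(traj_b)
    mean_dist = sum(math.dist(u, v) for u, v in zip(da, db)) / len(da)
    return 1.0 / (1.0 + mean_dist)


def build_object_tree(trajectories, min_similarity=0.0):
    """Merge volumes pairwise by motion similarity.

    Returns a dict mapping each number of objects to the grouping of
    volume indices at that level. Merging stops early once the best
    remaining similarity falls below min_similarity (an assumed threshold).
    """
    clusters = {frozenset([i]): list(t) for i, t in enumerate(trajectories)}
    tree = {len(clusters): sorted(sorted(c) for c in clusters)}
    while len(clusters) > 1:
        # Find the pair of current objects with the most similar motion.
        (a, b), best = max(
            ((pair, motion_similarity(clusters[pair[0]], clusters[pair[1]]))
             for pair in combinations(clusters, 2)),
            key=lambda item: item[1],
        )
        if best < min_similarity:
            break
        ta, tb = clusters.pop(a), clusters.pop(b)
        wa, wb = len(a), len(b)
        # Assumption: the merged object's trajectory is the size-weighted mean.
        clusters[a | b] = [
            ((wa * xa + wb * xb) / (wa + wb), (wa * ya + wb * yb) / (wa + wb))
            for (xa, ya), (xb, yb) in zip(ta, tb)
        ]
        tree[len(clusters)] = sorted(sorted(c) for c in clusters)
    return tree


# Two volumes moving right together and one moving up.
right_a = [(0, 0), (1, 0), (2, 0)]
right_b = [(5, 5), (6, 5), (7, 5)]
upward = [(0, 0), (0, 1), (0, 2)]
object_tree = build_object_tree([right_a, right_b, upward])
```

In this example the two rightward-moving volumes are merged first, so the two-object level of object_tree groups volumes 0 and 1 together and keeps volume 2 separate. A real system would combine motion with the shape and spatial descriptors mentioned above instead of relying on trajectories alone.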

Technology Areas:
Audio Video Processing
Computer Vision

Modification Date:  September 12, 2007