Mitsubishi Electric Research Laboratories

A Fast Algorithm for Depth Segmentation

This document presents a fast algorithm for depth segmentation. The algorithm uses pre-computed disparity maps to detect regions of the scene that do not lie at a predetermined depth. The form of the algorithm allows it to take advantage of the single instruction, multiple data (SIMD) instruction set extensions that have recently become commonly available in consumer-grade microprocessors. We also present the application of the algorithm to the problem of detecting contact between a foreground object (a hand) and a geometrically static background (a display surface) in real-time video sequences. Because the segmentation algorithm is geometric in nature, the touch detector is invariant to lighting, color, and motion. It is therefore applicable to interactive projected displays. A simple extension to the algorithm allows the background surface to be given analytically.

Background & Objective:  The increasing availability of cheap cameras and high-quality projectors is enabling a wide array of novel interactive space applications. These interfaces present a particular challenge for computer vision algorithms, since surfaces lit by bright, high-contrast projection displays may have a continuously changing visual appearance. This means that standard background subtraction techniques that rely on a static background appearance will not work. Even worse, front-projected displays also cast light on foreground objects, making color tracking and other appearance-based methods difficult or impossible to use.

Technical Discussion:  One constraint that remains in these situations is the geometric configuration of the projection surfaces. A feature that exposes this constraint is stereo disparity. It is possible to use disparity for segmentation without computing a dense depth map: pre-computed disparity maps are used to rectify the input images, which are then compared by direct image subtraction. An example application called TouchIt detects touch events between foreground objects (the user's hand, for example) and the background surface. TouchIt uses two depth segmentation maps and simple logical operations to combine them into a touch map. We are able to generate 320x240 segmentation maps in 10 ms on a 1 GHz Intel Pentium III. This allows the TouchIt application to run in real time on an off-the-shelf PC.
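The warp-and-subtract idea can be illustrated with a minimal sketch. It is not the TouchIt implementation. It assumes grayscale images stored as lists of rows, a purely horizontal integer disparity per pixel, and a fixed difference threshold. The names segment, touch_map, and threshold are hypothetical. The summary does not say which logical operations combine the two segmentation maps, so the AND-NOT in touch_map is only one plausible choice.

```python
def segment(left, right, disparity, threshold=20):
    """Binary map: 1 where the scene departs from the calibrated surface.

    Assumptions (not from the source): grayscale images as lists of rows,
    horizontal integer disparities, and a fixed intensity threshold.
    """
    height, width = len(right), len(right[0])
    mask = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Where this right-image pixel would appear in the left image
            # if the observed point lay on the calibrated surface.
            xs = x + disparity[y][x]
            if 0 <= xs < width:
                # Direct subtraction of the warped left image from the right.
                diff = abs(left[y][xs] - right[y][x])
                mask[y][x] = 1 if diff > threshold else 0
    return mask


def touch_map(off_surface, off_margin):
    """Combine two segmentation maps with a per-pixel logical operation.

    Illustrative assumption only: a pixel is marked when it departs from
    the first calibrated surface but not from the second.
    """
    return [[a & (1 - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(off_surface, off_margin)]
```

The inner loop is the same independent operation on every pixel (a lookup, a subtraction, and a comparison), which is the kind of regular per-pixel work that SIMD instructions can process several pixels at a time.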

Recent improvements on this scheme include the ability to generate depth maps from mathematically specified surfaces. This enables the application of the algorithm to situations where it would be difficult to construct suitable calibration surfaces.
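As a hedged illustration of a map derived from an analytic surface, consider the simplest case: a plane viewed by a rectified pinhole stereo pair. The summary does not describe the camera model or the class of surfaces, so all of this is an assumption. For a plane a*X + b*Y + c*Z = h in the reference camera frame, with focal length f in pixels, baseline B, and principal point (cx, cy), the disparity at pixel (x, y) is d = (B / h) * (a*(x - cx) + b*(y - cy) + c*f). The function and parameter names below are hypothetical.

```python
def plane_disparity_map(width, height, f, baseline, cx, cy, plane):
    """Disparity map for a plane a*X + b*Y + c*Z = h (camera coordinates).

    Assumes a rectified pinhole stereo pair; f is in pixels, and baseline
    and h share the same length unit. Returns real-valued disparities in
    pixels as a list of rows.
    """
    a, b, c, h = plane
    scale = baseline / h
    return [[scale * (a * (x - cx) + b * (y - cy) + c * f)
             for x in range(width)]
            for y in range(height)]


# A fronto-parallel plane 2.0 units in front of the cameras gives the
# constant disparity f * baseline / Z.
flat = plane_disparity_map(4, 3, f=500.0, baseline=0.1, cx=2.0, cy=1.5,
                           plane=(0.0, 0.0, 1.0, 2.0))
```

For a tilted plane, the disparity varies linearly across the image, so no physical calibration target is needed to fill in the map.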

Publications:
Christopher R. Wren, Yuri A. Ivanov, "Volumetric Operations with Surface Margins", Computer Vision and Pattern Recognition Conference: Technical Sketches, 12/11/2001 (TR2001-047)

Technology Areas:
Computer Vision
Audio Video Processing

Modification Date:  November 18, 2008