Mitsubishi Electric Research Laboratories

Multilinear Face Models

A multilinear face model is a simple function that can generate a wide variety of realistic face images. The images can vary by many attributes, including identity, expression, speech articulation, and 3D pose. We have developed the tools needed to estimate such a model and to use it to compress and edit video of faces, for example to change the appearance and performance of an actor in existing video. The accompanying image shows these tools combining expression, identity, viseme (the visible mouth shape of a speech sound), and 3D pose taken from the two inset videos to synthesize the central video.
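As a minimal sketch of what "multilinear" means here, the Python function below contracts a core tensor with one parameter vector each for identity, expression, and viseme. The output (for example, stacked vertex coordinates) is linear in each vector when the other two are held fixed. The tensor layout, the name `synthesize`, and the toy numbers are hypothetical illustrations, not taken from the project; 3D pose and rendering are left out.

```python
from typing import List

Vector = List[float]
# Hypothetical layout: core[v][i][e][s] couples output coordinate v with
# identity basis i, expression basis e, and viseme basis s.
Core = List[List[List[List[float]]]]


def synthesize(core: Core, identity: Vector, expression: Vector, viseme: Vector) -> Vector:
    """Contract the core tensor with one parameter vector per attribute.

    The result is linear in each argument vector when the other two are fixed,
    which is what makes the model multilinear.
    """
    out: Vector = []
    for slab in core:  # one slab per output coordinate (e.g. a vertex x, y, or z)
        total = 0.0
        for i, a in enumerate(identity):
            for e, b in enumerate(expression):
                for s, c in enumerate(viseme):
                    total += slab[i][e][s] * a * b * c
        out.append(total)
    return out


# Toy core: 2 output coordinates, 2 identities, 2 expressions, 1 viseme.
core: Core = [
    [[[1.0], [0.0]], [[0.0], [2.0]]],
    [[[0.0], [3.0]], [[1.0], [0.0]]],
]
smile = [0.0, 1.0]  # one shared (hypothetical) expression code
print(synthesize(core, [1.0, 0.0], smile, [1.0]))  # [0.0, 3.0]
print(synthesize(core, [0.0, 1.0], smile, [1.0]))  # [2.0, 0.0]
```

Passing the same expression vector with two different identity vectors yields two different shapes: the expression has a single code, but each identity renders it in its own way.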

Background & Objective:  It has long been conjectured that the appearance, expression, mouth shape, pose, and other attributes of a photographed face could be encoded in a relatively short vector of numbers and yet be reconstructed with high accuracy. Finding such an encoding is essential for extreme video compression, biometrics, visual user interfaces, and related applications. We noted that multilinear functions could provide a parameterization of faces that gives an editor separate control over attributes such as appearance, expression, and age. One could then preserve the fact that every individual smiles in a unique manner while guaranteeing that the code for a smile is the same across all individuals. This would open up applications in film post-production, foreign-language dubbing, and surgical modeling. We therefore set out to develop methods to estimate multilinear models from facial scan data, to recover multilinear facial parameters from ordinary video, and to resynthesize such videos with interesting changes.

Technical Discussion:  We estimated a multilinear model from a set of high-resolution 3D face scans of several people in various expressions and speech articulations. Because a standard estimator would require a scan of every individual in every expression and articulation, we developed a fast imputative method, one that fills in the missing data during estimation, to estimate multilinear models from incomplete data. We then incorporated the model into an optical-flow-based visual tracker that estimates identity, 3D pose, expression, and other parameters from ordinary video. This encodes the facial area of the video and also gives a detailed estimate of the 3D shape of the face in every frame. We can then mix and match attributes from several such parameter streams, use the multilinear model to obtain the corresponding 3D shape and texture, and render the results back into the video.
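The impute-and-refit idea can be illustrated on the smallest possible case. The sketch below assumes a bilinear (identity by expression) model in which each scan is reduced to a single number, so the complete data would form a rank-1 matrix. Missing scans are filled with a guess, the model is refit, the missing cells are replaced by the model's predictions, and the loop repeats. This is generic EM-style low-rank imputation with alternating least squares, swapped in purely for illustration; the name `impute_rank1`, the iteration count, and the toy data are hypothetical, and the project's own fast estimator for full multilinear models over 3D scans is not reproduced here.

```python
from typing import List, Optional

Grid = List[List[Optional[float]]]


def impute_rank1(y: Grid, iters: int = 2000) -> List[List[float]]:
    """Fill missing (None) cells of y using a rank-1 model y[i][j] ~ u[i] * v[j].

    Toy impute-and-refit loop (EM-style low-rank imputation); it only
    illustrates the general idea and is not the project's estimator.
    """
    rows, cols = len(y), len(y[0])
    observed = [x for row in y for x in row if x is not None]
    mean = sum(observed) / len(observed)
    # Initial guess: every missing cell gets the mean of the observed cells.
    filled = [[mean if x is None else x for x in row] for row in y]
    v = [1.0] * cols
    for _ in range(iters):
        # One alternating-least-squares sweep: refit u given v, then v given u.
        vv = sum(x * x for x in v)
        u = [sum(filled[i][j] * v[j] for j in range(cols)) / vv for i in range(rows)]
        uu = sum(x * x for x in u)
        v = [sum(filled[i][j] * u[i] for i in range(rows)) / uu for j in range(cols)]
        # Impute: overwrite only the missing cells with the model's prediction.
        for i in range(rows):
            for j in range(cols):
                if y[i][j] is None:
                    filled[i][j] = u[i] * v[j]
    return filled


# Rows are people, columns are expressions; one scan was never taken.
scans: Grid = [[1.0, 2.0], [2.0, 4.0], [3.0, None]]
completed = impute_rank1(scans)
print(round(completed[2][1], 6))  # approximately 6.0
```

The two complete rows fix the ratio between the expression columns at 2, so the only value consistent with a rank-1 matrix for the missing cell is 3 × 2 = 6, and the loop converges toward it while leaving the observed cells untouched.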

Technology Areas:
Computer Vision
Graphics

Modification Date:  September 12, 2007