System Identification for Video Texture

The top images at left are taken from a synthetic video of trees blowing in the wind. The rightmost image shows the difference between the two synthesized images, highlighting the sway of branches and the roll of the clouds. The video has completely realistic visual and dynamic texture, demonstrating the correctness of our novel system identification algorithm for high-dimensional sources such as video. The bottom image shows a temporal coding of the motion; each horizontal stripe indicates the independent swaying of a branch or motion of a segment of cloud. Only the first 1/3 of the motion is real; the rest is synthesized.

Background & Objective:  System identification is the problem of finding a model that fits data well enough to generate synthetic datapoints that are indistinguishable from real ones. We consider the problem of identifying a system that can synthesize realistic-looking video given a short training sequence. We conjecture that a linear dynamical system (the state-space model underlying the Kalman filter) will suffice for a large range of natural phenomena, for example videos of rain, ocean waves, plants swaying in the breeze, the sky, facial motion, waterfalls, etc. The goal is to identify the system and its dimensionality, then solve for its parameters. The difficulty is that video images are extremely high-dimensional sources (one dimension per pixel), while the phenomenon being modeled is probably generated by a low-dimensional system.
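The gap between the number of pixels and the dimensionality of the underlying system can be illustrated with a small sketch. It assumes, purely for illustration, that the state dimension is chosen as the number of singular values needed to capture a fixed fraction of the data's energy. The function name estimate_state_dim, the 99% threshold, and the synthetic "video" are hypothetical and are not taken from the project's algorithm.

```python
import numpy as np

def estimate_state_dim(Y, energy=0.99):
    """Smallest n whose leading singular values hold `energy` of the total.

    Y is a (pixels x frames) matrix with one vectorized frame per column.
    This is a simple heuristic, not the project's method.
    """
    s = np.linalg.svd(Y, compute_uv=False)       # singular values, descending
    cum = np.cumsum(s**2) / np.sum(s**2)         # cumulative energy fraction
    return int(np.searchsorted(cum, energy) + 1)

# Synthetic stand-in for a video: 500 "pixels", 60 frames, driven by a
# 4-dimensional hidden signal plus a little pixel noise.
rng = np.random.default_rng(0)
pixels, frames, true_dim = 500, 60, 4
C = rng.standard_normal((pixels, true_dim))
X = rng.standard_normal((true_dim, frames))
Y = C @ X + 1e-3 * rng.standard_normal((pixels, frames))

n_est = estimate_state_dim(Y)
print("estimated state dimension:", n_est)
```

Although each frame lives in a 500-dimensional space, nearly all of the energy sits in a handful of directions, which is the situation the conjecture above relies on.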

Technical Discussion:  A linear dynamical system that generates observations Y=[y(1),...,y(T)] is defined by y(t)=C*x(t)+D*e(t)+v(t); x(t+1)=A*x(t)+B*e(t)+u(t), where A is the system evolution matrix, B is the input matrix, C is the observation-generating matrix, D is the feed-through matrix, e(t) is the input signal, X=[x(1),...,x(T)] is the "hidden" state sequence, and u and v are noise sources. We treat system identification (solving for A, B, C, D and the covariances of u and v) as a least-squares factorization problem and solve it explicitly. While not optimized for simulation and control, the solution does have excellent synthesis properties and is highly suitable for video and audio synthesis.
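To make the factorization view concrete, here is a minimal sketch of one common subspace approach, not necessarily the algorithm used in this project. It assumes an input-free model (no e(t), so B and D are dropped), factors the observation matrix with a truncated SVD to obtain C and X, and then fits A by least squares on consecutive states. The names identify_lds and synthesize, and the noise-free test system, are hypothetical.

```python
import numpy as np

def identify_lds(Y, n):
    """Fit y(t) = C x(t), x(t+1) = A x(t) + u(t) to Y (pixels x frames)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    C = U[:, :n]                       # observation matrix (orthonormal columns)
    X = s[:n, None] * Vt[:n]           # hidden states, one column per frame
    # Least squares for A in X[:, 1:] ~= A @ X[:, :-1] (solved in transposed form).
    A = np.linalg.lstsq(X[:, :-1].T, X[:, 1:].T, rcond=None)[0].T
    resid = X[:, 1:] - A @ X[:, :-1]
    Q = np.atleast_2d(np.cov(resid))   # sample covariance of the state noise u
    return C, A, Q, X

def synthesize(C, A, Q, x0, steps, rng):
    """Run the identified system forward to generate new frames."""
    n = A.shape[0]
    L = np.linalg.cholesky(Q + 1e-12 * np.eye(n))  # jitter keeps Q positive definite
    x, out = x0.copy(), []
    for _ in range(steps):
        x = A @ x + L @ rng.standard_normal(n)
        out.append(C @ x)
    return np.stack(out, axis=1)

def rot(theta, decay):
    c, s = np.cos(theta), np.sin(theta)
    return decay * np.array([[c, -s], [s, c]])

# Noise-free test system: two damped oscillators observed through 50 "pixels".
rng = np.random.default_rng(1)
A_true = np.zeros((4, 4))
A_true[:2, :2] = rot(0.3, 0.99)
A_true[2:, 2:] = rot(0.7, 0.98)
C_true = rng.standard_normal((50, 4))
x, states = np.ones(4), []
for _ in range(80):
    states.append(x)
    x = A_true @ x
Y = C_true @ np.stack(states, axis=1)

C_est, A_est, Q_est, X_est = identify_lds(Y, n=4)
Y_new = synthesize(C_est, A_est, Q_est, X_est[:, -1], steps=120, rng=rng)
```

The recovered A_est is expressed in a different state basis than A_true, so the two matrices are compared through their eigenvalues rather than entry by entry. With real video, Y would typically be mean-subtracted first and the residual covariance Q_est would be non-negligible, which is what keeps the synthesized sequence from decaying.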

Contact:  Matthew Brand

Publications:
Brand, M.E., "Subspace Mappings for Image Sequences", Statistical Methods in Video Processing, June 2002 (Final Program, TR2002-025)

Technology Areas:
Computer Vision
Audio Video Processing
Artificial Intelligence
Graphics

Modification Date:  July 23, 2003