Learning Concise Models of Visual Activity
We want machines to describe what is happening in video, in a form that is readable and useful. The basic approach is to learn a probabilistic model of the scene dynamics: the machine extracts a motion-based feature vector from each frame, then learns the signal dynamics with a hidden Markov model. There is a twist: a new entropic training algorithm removes excess parameters from the model, leaving a transition matrix so sparse that it can be read as a flowchart of the scene dynamics. Here is an example in which the algorithm was applied to half an hour of ambient video of office activity. The transition matrix is rendered automatically as a state graph. (Labels added by hand in the illustration.)
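The page does not describe its rendering step. A minimal sketch, assuming the trained transition matrix is available as a list of rows and that near-zero entries count as pruned, is to emit one Graphviz DOT edge per surviving transition. The function `transition_graph_dot`, the `threshold` cutoff, the matrix values, and the state names below are all hypothetical.

```python
def transition_graph_dot(A, labels=None, threshold=1e-3):
    """Render an HMM transition matrix as Graphviz DOT text.

    A[i][j] is the probability of moving from state i to state j.
    Entries below `threshold` are treated as pruned and get no edge.
    """
    n = len(A)
    labels = labels or [f"s{i}" for i in range(n)]
    lines = ["digraph hmm {"]
    for i in range(n):
        lines.append(f'  {i} [label="{labels[i]}"];')
    for i, row in enumerate(A):
        for j, p in enumerate(row):
            if p >= threshold:  # keep only transitions that survived training
                lines.append(f'  {i} -> {j} [label="{p:.2f}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical 3-state sparse matrix with made-up numbers and state names.
A = [
    [0.90, 0.10, 0.00],
    [0.00, 0.80, 0.20],
    [0.30, 0.00, 0.70],
]
print(transition_graph_dot(A, labels=["enter", "sit", "leave"]))
```

The printed text can be passed to Graphviz's `dot` tool (for example `dot -Tpng`) to draw the state graph; a sparser matrix yields proportionally fewer edges and a more readable picture.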
This normative signal model can be used to infer what you are doing (and adjust your environment accordingly) or to detect unusual behavior. The model depicted above proved quite adept at detecting coffee buzz roughly an hour after several espressos.
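One common way to use a normative model for spotting unusual behavior is to score each new sequence by its average per-frame log-likelihood and flag low scores. The sketch below is a minimal illustration under stated assumptions: observations are discrete symbols (for example, vector-quantized features) rather than the continuous motion features used on this page, the likelihood comes from a scaled forward pass, and the function names, toy model, and threshold are hypothetical.

```python
import math


def forward_loglik(pi, A, B, obs):
    """Log-likelihood of a discrete observation sequence under an HMM.

    pi[i]   : initial probability of state i
    A[i][j] : transition probability i -> j
    B[i][k] : probability that state i emits symbol k
    Normalizes alpha at every step to avoid numerical underflow.
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under the model
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


def is_unusual(pi, A, B, obs, threshold):
    """Flag a sequence whose per-frame log-likelihood is below threshold."""
    return forward_loglik(pi, A, B, obs) / len(obs) < threshold


# Hypothetical two-state model: each state mostly emits its own symbol.
pi = [1.0, 0.0]
A = [[0.9, 0.1], [0.1, 0.9]]
B = [[0.9, 0.1], [0.1, 0.9]]
typical = [0, 0, 0, 0, 1, 1, 1, 1]  # one state change, as the model expects
jittery = [0, 1, 0, 1, 0, 1, 0, 1]  # constant switching
print(is_unusual(pi, A, B, typical, threshold=-0.7))  # False
print(is_unusual(pi, A, B, jittery, threshold=-0.7))  # True
```

In practice the threshold would be set from the score distribution on held-out normal activity rather than fixed by hand.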
Note that each state has only a few transition options; in a conventionally trained model, each state would be connected to every other state. The concision of our model translates into faster run times (fewer transitions to evaluate per frame), better generalization, and better classification.
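The run-time benefit of sparsity shows up directly in the forward algorithm: a dense prediction step costs N² multiply-adds for N states, while iterating over only the surviving transitions costs one multiply-add per nonzero entry. A small sketch, with hypothetical helper names and a made-up 4-state matrix:

```python
def to_sparse(A, eps=0.0):
    """Keep only transitions with probability > eps, as (i, j, p) triples."""
    return [(i, j, p) for i, row in enumerate(A) for j, p in enumerate(row) if p > eps]


def predict_dense(alpha, A):
    """One forward-prediction step over the full matrix: N*N multiply-adds."""
    n = len(alpha)
    return [sum(alpha[i] * A[i][j] for i in range(n)) for j in range(n)]


def predict_sparse(alpha, edges):
    """The same step over surviving edges only: one multiply-add per edge."""
    out = [0.0] * len(alpha)
    for i, j, p in edges:
        out[j] += alpha[i] * p
    return out


# Hypothetical sparse transition matrix (rows sum to one).
A = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.6, 0.4, 0.0],
    [0.0, 0.0, 0.7, 0.3],
    [1.0, 0.0, 0.0, 0.0],
]
alpha = [0.25, 0.25, 0.25, 0.25]
edges = to_sparse(A)
print(len(edges), "of", len(A) ** 2, "transitions survive")  # 7 of 16 transitions survive
print(predict_sparse(alpha, edges))
```

Both functions give the same result; the sparse version simply skips the zero entries, so its cost tracks the number of arrows in the state graph.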
Background & Objective: The training algorithm is based on an entropic prior and a solution for its maximum a posteriori (MAP) estimator. The solution is quite general and yields entropy-minimization algorithms for estimating both the structure and the parameters of many kinds of probabilistic models. These E-MAP algorithms are fast and are guaranteed to improve the model as much as possible at each step.
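To make the estimator concrete for a single multinomial (for example, one row of an HMM transition matrix), here is a minimal sketch assuming the entropic prior P(θ) ∝ exp(-H(θ)) = ∏ θ_i^θ_i and expected counts ω_i from the E-step. The MAP estimate then maximizes Σ (ω_i + θ_i) log θ_i subject to Σ θ_i = 1. The cited papers give a closed form in terms of the Lambert W function; to stay dependency-free, this sketch swaps in plain bisection on the same stationarity condition. The names `entropic_map` and `_solve_x` are hypothetical, and the sketch assumes the largest count is at least 1.

```python
import math


def _solve_x(c, iters=200):
    """Solve x - log(x) = c on the branch x >= 1 by bisection (needs c >= 1)."""
    lo, hi = 1.0, 2.0 * c  # x - log(x) >= c already holds at x = 2c when c >= 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid - math.log(mid) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def entropic_map(counts, iters=200):
    """Entropic MAP estimate of multinomial parameters (sketch).

    Stationarity of sum_i (w_i + theta_i) log(theta_i) + lam * (sum(theta) - 1)
    gives w_i / theta_i + log(theta_i) + 1 + lam = 0. With x_i = w_i / theta_i
    this is x_i - log(x_i) = -1 - lam - log(w_i). We take the x_i >= 1 root,
    where the objective is locally concave, and bisect on lam until the
    thetas sum to one. Assumes max(counts) >= 1.
    """
    w_max = max(counts)

    def thetas(lam):
        out = []
        for w in counts:
            if w <= 0:
                out.append(0.0)  # no evidence: the parameter goes extinct
            else:
                c = max(-1.0 - lam - math.log(w), 1.0)
                out.append(w / _solve_x(c))
        return out

    hi = -2.0 - math.log(w_max)  # largest lam keeping every c >= 1; sum >= 1 here
    lo = hi - 1.0
    while sum(thetas(lo)) > 1.0:  # widen until the total drops below one
        lo -= 2.0 * (hi - lo)
    for _ in range(iters):  # sum(thetas(lam)) increases with lam
        mid = 0.5 * (lo + hi)
        if sum(thetas(mid)) < 1.0:
            lo = mid
        else:
            hi = mid
    theta = thetas(0.5 * (lo + hi))
    total = sum(theta)
    return [t / total for t in theta]  # remove residual rounding error


print(entropic_map([3.0, 1.0, 0.0]))
```

With counts [3, 1, 0], the zero-count outcome is extinguished and the first outcome receives about 0.81, sharper than the maximum-likelihood value of 0.75. This bias toward low-entropy distributions is what drives weakly supported transitions toward zero.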
Technical Discussion: Entropic estimation was first proposed in "Structure learning in conditional probability models via an entropic prior and parameter extinction" (M. Brand, 19 Oct. 1997, revised 16 Apr. 1998; to appear in Neural Computation). For a concise description, see "An entropic estimator for structure discovery" (M. Brand, 2 Feb. 1998; to appear in NIPS98). The framework is generalized to include deterministic annealing in "Pattern discovery via entropy minimization" (M. Brand, 8 Mar. 1998; to appear in Uncertainty99, AI & Statistics).
Technical Reports:
Learning concise models of human activity from ambient video via a structure-inducing M-step estimator
Technology Areas:
Computer Vision
Artificial Intelligence
Modification Date: July 12, 2001
