Cross-entropy Estimation

Statistical models lie at the heart of advanced technologies such as communications, sensing, language processing, and signal processing; they are important to any industrial process that depends on data collection and analysis. The better the model, the better the process that depends on it. We present a basic result on the optimal bias for learning these models from data, together with practical algorithms that translate this result into superior statistical models. Immediate applications include better classifiers (e.g., for predicting disease from weakly diagnostic tests) and a method for manipulating stylistic qualities of datasets such as those used for character animation.

Background & Objective:  Among the statistical models that can be formed from data, a density model generally contains the most information and can support the broadest variety of queries. A density model describes how the data points are distributed through the space of possible measurements, including where they are dense. However, specialized models often outperform density models on limited kinds of queries such as classification, largely because a good density model is much harder to estimate than a classifier.
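To illustrate how a single density model can serve several kinds of queries, the sketch below fits a one-dimensional Gaussian to each class and then answers both a density query and a classification query (via Bayes' rule). It is a minimal sketch under stated assumptions, not the estimation method described on this page: the Gaussian form, the toy measurements, and the names fit_gaussian, gaussian_pdf, classify, and marginal_density are all hypothetical.

```python
import math


def fit_gaussian(samples):
    """Maximum-likelihood mean and variance of 1-D samples."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, var


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def classify(x, models, priors):
    """Return the class with the largest prior-weighted density at x."""
    return max(models, key=lambda c: priors[c] * gaussian_pdf(x, *models[c]))


# Hypothetical toy measurements for two classes (illustration only).
data = {
    "healthy": [4.8, 5.1, 5.0, 5.3, 4.9],
    "ill": [6.9, 7.2, 7.0, 6.8, 7.4],
}

# One density model per class, plus class priors from the sample counts.
models = {c: fit_gaussian(xs) for c, xs in data.items()}
total = sum(len(xs) for xs in data.values())
priors = {c: len(xs) / total for c, xs in data.items()}


def marginal_density(x):
    """Density query: how dense is the data around x, over all classes?"""
    return sum(priors[c] * gaussian_pdf(x, *models[c]) for c in models)


print(classify(5.0, models, priors))
print(classify(7.1, models, priors))
print(marginal_density(5.0) > marginal_density(6.0))
```

The same fitted models answer both questions; a specialized classifier would answer only the second.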

Technical Discussion:  Density estimation is a kind of inference, and every inference begins with a prior belief. We showed that the optimal prior belief for extracting information from data is a preference for the least uniform density that still describes the distribution of the data well. Formally, one wants to minimize the entropy (uncertainty) of one's model of the data or, equivalently, maximize its cross-entropy with the uniform (zero-information) density. Building on this result, we developed efficient algorithms for estimating a density T from data w while maximizing cross-entropy (a measure of difference) with another density Z (and/or the uniform density U). This is depicted in the figure above, where T represents the manifold (surface) of all possible models available. These techniques have numerous applications, including estimating models that identify stylistic variations in datasets (see the "Style Machines" URL below) and estimating density models whose classification accuracy is competitive with the best current specialized classification methods. By maximizing the cross-entropy between the density models for each class of data, we minimize their overlap and hence the probability of misclassifying points near the boundary between classes.
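The equivalence between minimizing entropy and maximizing divergence from the uniform density can be checked numerically on a small discrete example. The sketch below assumes that "cross-entropy" is meant in the sense of relative entropy (Kullback-Leibler divergence), for which KL(T || U) = log n - H(T) on n outcomes. The distributions and function names are hypothetical, and the code only evaluates these quantities; it does not implement the estimation algorithms. It also compares two pairs of class models to show that, in this toy case, the pair with the larger divergence has the smaller overlap and the smaller Bayes error.

```python
import math


def entropy(p):
    """Shannon entropy H(p) in nats of a discrete distribution."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl_divergence(p, q):
    """Relative entropy KL(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def bayes_error(p, q):
    """Error of the optimal classifier for two equally likely classes.

    With equal priors this is half the overlap sum(min(p, q)).
    """
    return 0.5 * sum(min(pi, qi) for pi, qi in zip(p, q))


n = 4
uniform = [1.0 / n] * n

# Hypothetical models: one sharply peaked, one closer to uniform.
peaked = [0.85, 0.05, 0.05, 0.05]
flat = [0.40, 0.20, 0.20, 0.20]

for name, p in (("peaked", peaked), ("flat", flat)):
    h = entropy(p)
    d = kl_divergence(p, uniform)
    # Identity on n outcomes: KL(p || U) = log(n) - H(p).
    print(name, "H =", round(h, 4), "KL to uniform =", round(d, 4),
          "log(n) - H =", round(math.log(n) - h, 4))

# Hypothetical class-model pairs: one overlapping, one well separated.
close_a, close_b = [0.4, 0.3, 0.2, 0.1], [0.1, 0.2, 0.3, 0.4]
far_a, far_b = [0.85, 0.05, 0.05, 0.05], [0.05, 0.05, 0.05, 0.85]

for name, a, b in (("close", close_a, close_b), ("far", far_a, far_b)):
    print(name, "KL =", round(kl_divergence(a, b), 4),
          "Bayes error =", round(bayes_error(a, b), 4))
```

Lower entropy and larger divergence from uniform move together by the identity in the comment; the relation between divergence and overlap shown for the two class pairs is only an illustration on these particular numbers.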

Contact:  Matthew Brand

Technology Areas:
Artificial Intelligence
Graphics

Modification Date:  June 26, 2001