Predictive Modeling

Identification of accurate predictive models from collected data.

Our research on predictive modeling of systems and processes from collected sensor data is based on statistical machine learning algorithms that must operate under demanding industrial conditions: large data volumes, streaming measurements, multiple data sources, and, often, embedded and distributed computing devices. The unifying purpose of these models is to predict, quickly and accurately, the outputs and effects that result from applying controllable decision variables under uncontrollable operating conditions, so that optimal settings for the decision variables can be found. Examples of predictive models of physical systems include thermodynamic models of air conditioners and entire buildings, as well as models of electromagnetic fields and electrical motors. Other models predict variables corresponding to diverse natural and man-made phenomena and processes, such as demand for electricity, travel times on highways, arrival rates of passengers for trains and elevators, ambient temperature and humidity, and document flows in enterprise software systems. A third kind of model captures aspects of human behavior, such as the likelihood of initiating an interaction with a machine, selecting a product, or liking a book or a film.

To address these widely different modeling problems, the DA group has developed various multi-purpose algorithms for predictive modeling. We have proposed system identification algorithms for physical systems, both physically motivated and purely data-driven, that can re-estimate system models continuously as new measurements are collected. Our exemplar learning algorithms address the problem of model identification in limited memory on embedded devices. We have applied support vector machine learning algorithms to power demand and highway travel time prediction, as well as to equipment condition monitoring, and have enhanced them to handle seasonality and heteroscedasticity in the measured time series. Various dimensionality reduction, feature selection, and variable partitioning algorithms have been employed to process large-volume data sets, now commonly known as Big Data.
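
The idea of re-estimating a model continuously as measurements arrive can be illustrated with textbook recursive least squares. The sketch below is not one of the group's algorithms; it is a generic stand-in, and the class name `RecursiveLeastSquares`, its parameters, and the synthetic linear system are all assumptions made for illustration. It keeps only the current parameter vector and a small matrix in memory, so each new sample is absorbed without storing past data.

```python
from typing import List


class RecursiveLeastSquares:
    """Hypothetical online estimator for y ~ theta . x (illustrative only)."""

    def __init__(self, n_params: int, forgetting: float = 1.0,
                 init_cov: float = 1e6) -> None:
        self.theta: List[float] = [0.0] * n_params
        # P tracks the inverse of the (weighted) regressor covariance.
        self.P: List[List[float]] = [
            [init_cov if i == j else 0.0 for j in range(n_params)]
            for i in range(n_params)
        ]
        # forgetting < 1 discounts old samples; 1.0 weights all equally.
        self.lam = forgetting

    def predict(self, x: List[float]) -> float:
        return sum(t * xi for t, xi in zip(self.theta, x))

    def update(self, x: List[float], y: float) -> None:
        n = len(x)
        # Px = P x (P is symmetric, so this also gives x^T P).
        Px = [sum(self.P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = self.lam + sum(x[i] * Px[i] for i in range(n))
        gain = [v / denom for v in Px]
        error = y - self.predict(x)
        # Correct the parameters in proportion to the prediction error.
        self.theta = [t + g * error for t, g in zip(self.theta, gain)]
        # Rank-one downdate of P, rescaled by the forgetting factor.
        self.P = [
            [(self.P[i][j] - gain[i] * Px[j]) / self.lam for j in range(n)]
            for i in range(n)
        ]


# Assumed synthetic system: y = 2.0 * u + 0.5, observed one sample at a time.
model = RecursiveLeastSquares(n_params=2)
for k in range(50):
    u = 0.1 * k
    model.update([u, 1.0], 2.0 * u + 0.5)
```

After the loop, `model.theta` should be close to the assumed coefficients of the synthetic system. Setting `forgetting` slightly below 1.0 would let the estimate track slowly drifting parameters, at the cost of more sensitivity to noise.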