Mitsubishi Electric Research Laboratories

Data-mining and Recommending

The Instant Movie Recommender (IMR) predicts how you would rate over 1000 movies on the basis of your ratings of a few familiar ones. As you vary the rating of one movie, all the other movies are instantly re-ranked and re-sorted. Predictions are made on the basis of correlations between ratings by previous users -- people who like "X" tend not to like "Y", etc. The IMR "learns" from your ratings -- updating the correlational model "on the fly". This update is a form of data mining, but with substantially better economics and usability than the standard practice of warehousing and batch-processing data.  The IMR showcases a new technology for fast updating of a "thin" Singular Value Decomposition -- a decomposition of tabular data into simple factors. The technology is distinguished both by its speed -- it is the first linear-time single-pass algorithm -- and by its ability to handle tables with many missing elements -- a common problem in data mining.

Background & Objective:  The SVD forms the core of many data mining algorithms and thousands of algorithms in signal processing. The "thin" SVD decomposes tabular data into the product of two small matrices, and is very useful for data compression, noise suppression, and prediction. For very large tables, computing an SVD is impractical because the compute time grows quadratically with the size of the table. Worse, the SVD is not uniquely defined if the table is missing some entries. We set out to develop an SVD algorithm that is suitable for data sets that are far too large to fit into the computer's memory and that are missing many elements.

Technical Discussion:  The new Incremental Imputative SVD (IISVD) allows an SVD to be computed from streaming data. The data table arrives one row or column at a time, and the SVD is updated to reflect the newly arrived information. The data need not be stored. If the row is missing entries, the algorithm chooses values that are most consistent with the correlational structure exhibited in previous updates. This is how the IMR predicts ratings of unseen movies. Equally fast rank-1 SVD updates allow users to change or retract their ratings. Experiments with a dataset used by the data mining community for benchmarking indicate that the IMR is quite accurate, making predictions within 2 rating points of the "true rating" more than 99% of the time.

Publications:
Brand, M., "Fast Online SVD Revisions for Lightweight Recommender Systems", SIAM International Conference on Data Mining (SDM), May 2003 (SDM 2003, TR2003-014)

Brand, M.E., "Incremental Singular Value Decomposition of Uncertain Data with Missing Values", European Conference on Computer Vision (ECCV), Vol 2350, pps 707-720, May 2002 (Lecture Notes in Computer Science, TR2002-024)

Technology Areas:
Artificial Intelligence
Computer Vision
Net Services

Modification Date:  February 9, 2005