TR99-04

Separating style and content with bilinear models


    •  Joshua B. Tenenbaum, William T. Freeman, "Separating style and content with bilinear models", Tech. Rep. TR99-04, Mitsubishi Electric Research Laboratories, Cambridge, MA, January 1999.
      BibTeX:
      @techreport{MERL_TR99-04,
        author = {Joshua B. Tenenbaum and William T. Freeman},
        title = {Separating style and content with bilinear models},
        institution = {MERL - Mitsubishi Electric Research Laboratories},
        address = {Cambridge, MA 02139},
        number = {TR99-04},
        month = jan,
        year = 1999,
        url = {https://www.merl.com/publications/TR99-04/}
      }
  • Research Areas:

    Artificial Intelligence, Computer Vision, Machine Learning

Abstract:

Perceptual systems routinely separate "content" from "style", classifying familiar words spoken in an unfamiliar accent, identifying a font or handwriting style across letters, or recognizing a familiar face or object seen under unfamiliar viewing conditions. Yet a general and tractable computational model of this ability to untangle the underlying factors of perceptual observations remains elusive. Existing factor models are either insufficiently rich to capture the complex interactions of perceptually meaningful factors such as phoneme and speaker accent or letter and font, or do not allow efficient learning algorithms. Here we show how perceptual systems may learn to solve these crucial tasks using surprisingly simple bilinear models. We report promising results in three realistic perceptual domains: spoken vowel classification with a benchmark multi-speaker database, extrapolation of fonts to unseen letters, and translation of faces to novel illuminants.
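
For readers unfamiliar with the term: in its generic mathematical sense, a bilinear model produces an output that is linear in each of two factor vectors when the other is held fixed. The following Python sketch illustrates only that generic form, with one vector standing in for "style" and one for "content". The function name, the toy dimensions, and the weight values are all hypothetical and are not taken from the report, and the sketch does not show how the report fits or applies its models.

```python
def bilinear_output(weights, style, content):
    """Return y with y[k] = sum_i sum_j weights[i][j][k] * style[i] * content[j]."""
    n_out = len(weights[0][0])
    y = [0.0] * n_out
    for i, a_i in enumerate(style):
        for j, b_j in enumerate(content):
            for k in range(n_out):
                y[k] += weights[i][j][k] * a_i * b_j
    return y


# Hypothetical toy sizes: 2 style dims, 2 content dims, 3 observation dims.
weights = [
    [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]],
    [[0.5, 0.5, 0.0], [1.0, 0.0, 1.0]],
]
style_a = [1.0, 2.0]
style_b = [0.0, 1.0]
content = [3.0, 1.0]

y_a = bilinear_output(weights, style_a, content)
y_b = bilinear_output(weights, style_b, content)

# Linearity in the style factor with content held fixed:
# the output for (style_a + style_b) equals y_a + y_b.
style_sum = [p + q for p, q in zip(style_a, style_b)]
y_sum = bilinear_output(weights, style_sum, content)
```

Because the interaction between the two factors is multiplicative, the same content vector can yield very different observations under different style vectors, while each factor on its own still enters linearly.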