TR2005-078

Layered Dynamic Mixture Model for Pattern Discovery in Asynchronous Multi-Modal Streams


    •  Xie, L.; Kennedy, L.; Chang, S.-F.; Divakaran, A.; Sun, H.; Lin, C.-Y., "Layered Dynamic Mixture Model for Pattern Discovery in Asynchronous Multi-Modal Streams", IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), ISSN: 1520-6149, March 2005, vol. 2, pp. 1053-1056.
    @inproceedings{Xie2005mar,
      author = {Xie, L. and Kennedy, L. and Chang, S.-F. and Divakaran, A. and Sun, H. and Lin, C.-Y.},
      title = {Layered Dynamic Mixture Model for Pattern Discovery in Asynchronous Multi-Modal Streams},
      booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
      year = 2005,
      volume = 2,
      pages = {1053--1056},
      month = mar,
      issn = {1520-6149},
      url = {http://www.merl.com/publications/TR2005-078}
    }
  • Research Area: Multimedia

We propose a layered dynamic mixture model for asynchronous multi-modal fusion and unsupervised pattern discovery in video. The lower layer of the model uses generative temporal structures, such as a hierarchical hidden Markov model, to convert the audio-visual streams into mid-level labels; it also models the correlations in the text stream with probabilistic latent semantic analysis (pLSA). The upper layer fuses the statistical evidence across these diverse modalities with a flexible meta-mixture model that assumes only loose temporal correspondence. Evaluation on a large news video database shows that multi-modal clusters correspond to news topics better than audio-visual clusters alone; novel analysis techniques suggest that meaningful clusters arise when the salient features predicted by the model concur with those observed in the story clusters.
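The abstract gives no equations, but the text-modeling component it names, pLSA, is a standard latent-variable model fit by expectation-maximization. As a hedged illustration (not the authors' implementation; all function and variable names here are ours), a minimal pLSA fit over a document-word count matrix looks roughly like this:

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM on a (docs x words) count matrix.

    Returns P(z|d) with shape (docs, topics) and P(w|z) with
    shape (topics, words). Illustrative sketch only.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random initialization; rows normalized to valid distributions.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: posterior P(z|d,w) proportional to P(z|d) P(w|z),
        # shape (docs, words, topics).
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        post = joint / (joint.sum(axis=2, keepdims=True) + 1e-12)
        # M-step: re-estimate parameters from expected counts.
        weighted = counts[:, :, None] * post
        p_w_z = weighted.sum(axis=0).T
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=1)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z
```

In the paper's layered scheme, the resulting per-story topic posteriors P(z|d) would serve as mid-level evidence from the text modality, to be fused with the audio-visual labels by the upper-layer meta-mixture model.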