TR2004-128

Discovering Meaningful Multimedia Patterns with Audio-Visual Concepts and Associated Text


    •  Xie, L.; Kennedy, L.; Chang, S.-F.; Divakaran, A.; Sun, H.; Lin, C.-Y., "Discovering Meaningful Multimedia Patterns with Audio-Visual Concepts and Associated Text", IEEE International Conference on Image Processing (ICIP), ISSN: 1522-4880, October 2004, vol. 4, pp. 2383-2386.
      BibTeX:
      @inproceedings{Xie2004oct,
        author = {Xie, L. and Kennedy, L. and Chang, S.-F. and Divakaran, A. and Sun, H. and Lin, C.-Y.},
        title = {Discovering Meaningful Multimedia Patterns with Audio-Visual Concepts and Associated Text},
        booktitle = {IEEE International Conference on Image Processing (ICIP)},
        year = 2004,
        volume = 4,
        pages = {2383--2386},
        month = oct,
        issn = {1522-4880},
        url = {http://www.merl.com/publications/TR2004-128}
      }
  • Research Area: Multimedia


This paper presents algorithms for finding the meanings of audio-visual patterns obtained through unsupervised discovery in video. The problem is of interest in domains where neither the perceptual patterns nor the semantic concepts have simple structures. Patterns in the video are modeled with hierarchical hidden Markov models, using efficient algorithms that jointly learn the model parameters, the optimal model complexity, and the relevant feature subsets. The meanings are drawn from words in the video's speech transcript. Pattern-word associations are obtained via co-occurrence analysis and machine translation models. Results on TRECVID news videos are promising: video patterns associated with distinct topics such as El Niño and politics are identified, and a temporal structure model compares favorably to a non-temporal clustering algorithm.
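
The co-occurrence step mentioned in the abstract can be illustrated with a small sketch. This is not the paper's implementation: it assumes each video segment already carries a discrete pattern label (for example, a state label from a learned model) and a list of transcript words, and it scores each pattern-word pair with a simple lift ratio, P(word | pattern) / P(word). The function names `cooccurrence` and `lift`, the scoring choice, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def cooccurrence(labels, words_per_segment):
    """Count, per pattern label, the number of segments containing each word."""
    counts = defaultdict(Counter)
    for label, words in zip(labels, words_per_segment):
        for w in set(words):  # count a word at most once per segment
            counts[label][w] += 1
    return counts


def lift(counts, labels, words_per_segment):
    """Return P(word | label) / P(word) for every observed (label, word) pair.

    Values above 1 suggest the word occurs with the pattern more often
    than its overall frequency would predict.
    """
    n = len(labels)
    label_freq = Counter(labels)
    word_freq = Counter(w for words in words_per_segment for w in set(words))
    return {
        (q, w): (c / label_freq[q]) / (word_freq[w] / n)
        for q, ws in counts.items()
        for w, c in ws.items()
    }


# Toy data, invented for illustration only (not from TRECVID).
labels = ["A", "A", "B", "B"]
transcripts = [
    ["storm", "rain"],
    ["storm", "flood"],
    ["senate", "vote"],
    ["vote", "rain"],
]

counts = cooccurrence(labels, transcripts)
scores = lift(counts, labels, transcripts)

# Print the pairs sorted from strongest to weakest association.
for (pattern, word), score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pattern, word, round(score, 2))
```

A lift ratio is only one of several possible association measures. The machine translation models named in the abstract go further by estimating word-given-pattern probabilities jointly rather than from raw counts, and they are not covered by this sketch.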