Video Mining Using Combinations of Unsupervised and Supervised Learning Techniques
Citation: Divakaran, A.; Miyahara, K.; Peker, K.A.; Radhakrishnan, R.; Xiong, Z., "Video Mining Using Combinations of Unsupervised and Supervised Learning Techniques", SPIE Conference on Storage and Retrieval for Multimedia Databases, Vol. 5307, pp. 235-243, January 2004 (SPIE Proceedings)
MERL Report: TR2004-007
We discuss the meaning and significance of the video mining problem and present our work on some aspects of it. A simple definition of video mining is the unsupervised discovery of patterns in audio-visual content. Such purely unsupervised discovery is readily applicable to video surveillance as well as to consumer video browsing applications. We interpret video mining as content-adaptive or blind content processing, in which the first stage is content characterization and the second stage is event discovery based on the characterization obtained in the first stage. We discuss the target applications and find that a purely unsupervised approach is too computationally complex, and generally too unmanageable, to implement on our product platform. We then describe various combinations of unsupervised and supervised learning techniques that help discover patterns useful to the end user of the application. We target consumer video browsing applications such as commercial message detection and sports highlights extraction, and we employ both audio and video features. We find that supervised audio classification combined with unsupervised unusual event discovery enables accurate supervised detection of desired events. Our techniques are computationally simple and robust to common variations such as differences in production style.
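
The two-stage idea in the abstract (characterize the content first, then discover events from that characterization) can be illustrated with a minimal sketch. It is not the method of the report: it assumes that a supervised audio classifier has already assigned one class label per short audio segment, and it swaps in a simple stand-in for unusual event discovery, namely scoring fixed-length windows by the KL divergence between their label histogram and the histogram of the whole program. The function names, the window length, and the class names are all hypothetical.

```python
import math
from collections import Counter


def label_histogram(labels, classes):
    """Relative frequency of each class in a sequence of labels."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[c] / total for c in classes]


def kl_divergence(p, q, eps=1e-9):
    """KL(p || q), with a small eps to avoid division by zero and log(0)."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def unusual_windows(labels, classes, window=4, top_k=1):
    """Return start indices of the top_k windows that deviate most
    from the background label distribution (assumed scoring rule)."""
    background = label_histogram(labels, classes)
    scores = []
    # Non-overlapping windows; a trailing partial window is ignored.
    for start in range(0, len(labels) - window + 1, window):
        hist = label_histogram(labels[start:start + window], classes)
        scores.append((kl_divergence(hist, background), start))
    scores.sort(reverse=True)
    return [start for _, start in scores[:top_k]]
```

With a hypothetical label sequence of 16 "speech" segments, 4 "applause" segments, and 12 more "speech" segments, `unusual_windows(labels, ["speech", "applause"])` returns `[16]`, the start of the applause burst. In this sketch the supervised stage supplies semantically meaningful labels, while the unsupervised stage only needs to find where their statistics depart from the background.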