Blind Summarization: Content-Adaptive Video Summarization Using Time-Series Analysis
| Citation: | Divakaran, A.; Radhakrishnan, R.; Peker, K.A., "Blind Summarization: Content-Adaptive Video Summarization Using Time-Series Analysis", SPIE Conference Multimedia Content Analysis, Management and Retrieval, Vol. 6073, pp. 6-10, January 2006 (SPIE Proceedings) |
| MERL Report: | TR2006-026 |
Severe complexity constraints on consumer electronic devices motivate us to investigate general-purpose video summarization techniques that are able to apply a common hardware setup to multiple content genres. On the other hand, we know that high-quality summaries can only be produced with domain-specific processing. In this paper, we present a video summarization technique based on time-series analysis that provides a general core to which we are able to add small content-specific extensions for each genre. The proposed time-series analysis technique consists of unsupervised clustering of samples taken through sliding windows from the time series of features obtained from the content. We classify content into two broad categories: scripted content such as news and drama, and unscripted content such as sports and surveillance. The summarization problem then reduces to either finding semantic boundaries in the scripted content or detecting highlights in the unscripted content. The proposed technique is essentially an event detection technique and is thus best suited to unscripted content; however, we also find applications to scripted content. We thoroughly examine the trade-off between content-neutral and content-specific processing for effective summarization across a number of genres, and find that our core technique enables us to minimize the complexity of the content-specific processing and to postpone it to the final stage. We achieve the best results with unscripted content such as sports and surveillance video, both in the quality of the summaries and in minimizing content-specific processing. For other genres such as drama, we find that more content-specific processing is required. We also find that a judicious choice of key audio-visual object detectors enables us to minimize the complexity of the content-specific processing while maintaining its applicability to a broad range of genres. We will present a demonstration of our proposed technique at the conference.
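The core idea in the abstract, unsupervised clustering of sliding-window samples drawn from a feature time series, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the one-dimensional feature, the per-window descriptor (mean and standard deviation), the use of plain 2-means as the clustering step, the rule that the smaller cluster marks candidate events, and all function names and parameter values.

```python
import numpy as np

def sliding_windows(series, width, step):
    """Return an array of shape (n_windows, width) of overlapping windows."""
    series = np.asarray(series, dtype=float)
    starts = range(0, len(series) - width + 1, step)
    return np.stack([series[s:s + width] for s in starts])

def window_descriptors(windows):
    """Summarize each window by its mean and standard deviation (illustrative choice)."""
    return np.column_stack([windows.mean(axis=1), windows.std(axis=1)])

def two_means(points, n_iter=50):
    """Plain 2-means clustering, a stand-in for the unspecified clustering step."""
    # Deterministic start: the points with the smallest and largest first coordinate.
    centers = points[[points[:, 0].argmin(), points[:, 0].argmax()]].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(2)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def candidate_event_windows(series, width=20, step=10):
    """Return start indices of windows that fall in the smaller (unusual) cluster."""
    windows = sliding_windows(series, width, step)
    labels = two_means(window_descriptors(windows))
    counts = np.bincount(labels, minlength=2)
    minority = counts.argmin()  # assumption: rare windows are the candidate events
    return [int(i) * step for i in np.flatnonzero(labels == minority)]

if __name__ == "__main__":
    # Synthetic feature track: a quiet background with one short burst of activity.
    t = np.arange(600)
    feature = 0.1 * np.sin(0.3 * t)
    feature[300:340] += 5.0
    print(candidate_event_windows(feature))
```

In this sketch the content-neutral part ends with the list of candidate window positions. Any genre-specific step, such as deciding whether a flagged window is a sports highlight or a surveillance event, would be applied afterwards to those candidates only, which mirrors the abstract's point about postponing content-specific processing to the final stage.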