Sound Spotter: Recognition and Extraction from Mixed Audio
Sound Spotter is a system for analysis and tracking of individual sound sources in mixed audio scenes. This technology has applications in artificial hearing and audio content description. Additionally, the core technology has been included in the MPEG-7 standard for indexing and extraction of audio content in digital media applications.
Background & Objective: Audio recognition systems, such as automatic speech recognition (ASR) and Internet audio search engines, require that a sound signal be isolated from all other sounds, such as noise in the environment. But sounds in the real world occur in context, as shown in the figure above: a news broadcast is delivered against a background of screams, gunshots and crowd noise. The Sound Spotter isolates and separates independent source sounds from the mixture; in this case the speech and the screams are separated. This technique may enable the use of ASR in non-ideal environments, e.g. a conversation in a busy street or office, or recognition of actors' voices in a film sound track. The method also enables identification and extraction of non-speech sounds, such as musical instruments or sound effects.
Technical Discussion: Sound Spotter employs a higher-order statistics framework, using independent component analysis (ICA), to decompose the output of an auditory filter-bank into statistically independent features. These features are organized into clusters using mean-field theory annealing. The resulting clusters are groups of sound features that correspond to persistent underlying sources, such as a segment of speech or a gunshot. One major advantage of the technique is that a single mixed audio channel is sufficient for successful operation. This contrasts with blind signal separation algorithms, which generally require multi-channel input and are thus limited in application scope.
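The pipeline described above (filter-bank, ICA, annealed clustering) can be illustrated with a minimal NumPy sketch. It is not the Sound Spotter implementation. It rests on several assumptions: a windowed FFT magnitude stands in for the auditory filter-bank, a symmetric FastICA with a tanh nonlinearity stands in for the ICA step, and deterministic annealing (soft clustering at a gradually lowered temperature) stands in for mean-field annealing. All function names, parameters and the synthetic tone-plus-noise mixture are hypothetical.

```python
import numpy as np

def filterbank_frames(signal, frame_len=256, hop=128):
    """Windowed FFT magnitudes: a crude stand-in for an auditory filter-bank."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # shape (n_frames, n_bins)

def fast_ica(X, n_components, n_iter=200, seed=0):
    """Symmetric FastICA (tanh nonlinearity). Rows of X are observations."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    # Whiten by projecting onto the leading principal directions.
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    Z = U[:, :n_components] * np.sqrt(len(X))  # unit-variance, uncorrelated
    W = rng.standard_normal((n_components, n_components))
    for _ in range(n_iter):
        G = np.tanh(Z @ W.T)
        # Fixed-point update: E[z g(w.z)] - E[g'(w.z)] w, one row per component.
        W_new = (G.T @ Z) / len(Z) - np.diag((1.0 - G ** 2).mean(axis=0)) @ W
        # Symmetric decorrelation keeps the unmixing matrix orthogonal.
        u, _, vt = np.linalg.svd(W_new)
        W = u @ vt
    return Z @ W.T  # shape (n_frames, n_components)

def anneal_clusters(Y, n_clusters, t_start=5.0, t_end=0.05, cooling=0.9, seed=0):
    """Deterministic annealing: soft assignments that harden as temperature drops."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), n_clusters, replace=False)].copy()
    temp = t_start
    while temp > t_end:
        d2 = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        logits = -d2 / temp
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)            # soft memberships
        centers = (p.T @ Y) / (p.sum(axis=0)[:, None] + 1e-12)
        temp *= cooling
    return p.argmax(axis=1), centers

# Synthetic single-channel mixture: a tone in the first second, noise in the second.
sr = 8000
time_axis = np.arange(2 * sr) / sr
noise_rng = np.random.default_rng(1)
tone = np.sin(2 * np.pi * 440 * time_axis) * (time_axis < 1.0)
burst = 0.5 * noise_rng.standard_normal(len(time_axis)) * (time_axis >= 1.0)
mixture = tone + burst + 0.01 * noise_rng.standard_normal(len(time_axis))

S = filterbank_frames(mixture)                 # filter-bank output per frame
Y = fast_ica(np.log1p(S), n_components=4)      # independent features per frame
labels, centers = anneal_clusters(Y, n_clusters=2)  # group frames by source
```

Note that only one channel (`mixture`) enters the pipeline; the ICA step operates across filter-bank channels of that single signal rather than across multiple microphones, which is the property the discussion above highlights.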
Contact: Bent Schmidt-Nielsen
Technology Area: Audio Video Processing
Modification Date: November 1, 2007
