Acoustic Doppler for Denoising Speech Signals

Acoustic Doppler readings measure the movements of a talker's face. These measurements are typically not corrupted by the noise sources that corrupt a speech signal, so they can serve as secondary evidence for denoising speech.

Background & Objective:  Speech-based devices and applications such as cellphones and kiosks are frequently used in very noisy environments. Denoising techniques that work directly on the speech signal are often ineffective in these environments. Their performance can be greatly improved by secondary sensors that measure other characteristics of the speech that are not affected by the noise. Such secondary sensors are, however, very expensive.  Our goal is to develop an inexpensive but effective secondary sensing mechanism, based on acoustic Doppler, for denoising.

Technical Discussion:  Our goal is to denoise speech signals for improved coding, transmission, recognition, etc.  Any speech activity is accompanied by corresponding movements of facial features such as the lips and cheeks. These movements are correlated with the speech signal. Any measurement of these movements whose noise is uncorrelated with the noise corrupting the speech signal can therefore be used to restrict the space of possible values for parameters derived from the speech signal. These restrictions can in turn be used to improve denoising of the speech signal.  We obtain our measurements of the movements of the talker's face with an acoustic Doppler radar. We direct a 40 kHz tone at the talker and capture the reflections. These reflections are FM demodulated. Finally, the joint distribution of the Doppler and audio signals is modelled by a time-series model. On noisy data, predictions of the values of audio features are obtained from the Doppler signal and used to denoise the speech.
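
As a rough illustration of the FM demodulation step, the Python sketch below simulates a 40 kHz tone reflected off a slowly moving surface and recovers the Doppler shift from it. It is not the project's demodulator: the quadrature (mix-and-differentiate) scheme, the 192 kHz sample rate, the 343 m/s speed of sound, the 5 cm/s peak velocity, and the names doppler_shift_hz and fm_demodulate are all assumptions made for this example.

```python
import numpy as np

SOUND_SPEED = 343.0  # m/s in air at roughly 20 C (assumed value)


def doppler_shift_hz(velocity, carrier_hz, sound_speed=SOUND_SPEED):
    """Approximate shift of a tone reflected off a surface moving toward the
    sensor at `velocity` m/s (valid when velocity << sound_speed)."""
    return 2.0 * velocity * carrier_hz / sound_speed


def fm_demodulate(x, fs, carrier_hz, smooth_len):
    """Return the frequency deviation from the carrier, in Hz.

    The output is delayed by `smooth_len` samples relative to the input and
    is 2 * smooth_len - 1 samples shorter, because both moving averages keep
    only fully overlapped ("valid") windows.
    """
    n = np.arange(len(x))
    kernel = np.ones(smooth_len) / smooth_len
    # Mix the carrier down to 0 Hz; this also creates a term near 2 * carrier.
    baseband = x * np.exp(-2j * np.pi * carrier_hz * n / fs)
    # Moving-average low-pass to suppress the double-frequency term.
    baseband = np.convolve(baseband, kernel, mode="valid")
    # Instantaneous frequency is the scaled derivative of the unwrapped phase.
    phase = np.unwrap(np.angle(baseband))
    deviation = np.diff(phase) * fs / (2.0 * np.pi)
    # Differentiation amplifies high-frequency residue, so smooth once more.
    return np.convolve(deviation, kernel, mode="valid")


# Synthetic reflection: a surface oscillating at 5 Hz with 5 cm/s peak speed.
fs = 192_000            # assumed sample rate, Hz
carrier_hz = 40_000.0   # tone frequency from the text
t = np.arange(int(0.2 * fs)) / fs
velocity = 0.05 * np.sin(2.0 * np.pi * 5.0 * t)
true_shift = doppler_shift_hz(velocity, carrier_hz)
reflection = np.cos(2.0 * np.pi * np.cumsum(carrier_hz + true_shift) / fs)

smooth_len = 240        # 1.25 ms, a whole number of periods of 2 * carrier
recovered = fm_demodulate(reflection, fs, carrier_hz, smooth_len)
aligned = true_shift[smooth_len:smooth_len + len(recovered)]
max_error_hz = np.max(np.abs(recovered - aligned))
```

The shifts involved are small (a 5 cm/s motion moves a 40 kHz tone by only about 12 Hz under these assumptions), which is why the sketch filters both before and after differentiating the phase.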

Outside Collaborations:  We are currently collaborating with MIT to develop portable devices that heterodyne high-frequency Doppler signals into audio-range frequencies, so that the signals can be captured using conventional sound cards on a PC. MIT will deploy our technology in information kiosks at the Stata Center.
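
The heterodyning idea can be sketched in software, although the devices described above are hardware. The Python example below multiplies a 40 kHz tone by a local oscillator, low-pass filters the product, and decimates it to a sound-card rate. The 38 kHz oscillator, the resulting 2 kHz output tone, the 192 kHz and 48 kHz sample rates, the moving-average filter, and the name heterodyne_to_audio are assumptions chosen for illustration, not details of the actual devices.

```python
import numpy as np


def heterodyne_to_audio(x, fs, lo_hz, smooth_len, decimation):
    """Shift the content of `x` down by `lo_hz` and reduce the sample rate.

    Returns the decimated signal and its new sample rate.
    """
    n = np.arange(len(x))
    # Mixing produces components at the difference and sum frequencies.
    mixed = x * np.cos(2.0 * np.pi * lo_hz * n / fs)
    # Two passes of a moving average act as a crude low-pass filter that
    # keeps the difference term and suppresses the sum term.
    kernel = np.ones(smooth_len) / smooth_len
    low = np.convolve(mixed, kernel, mode="valid")
    low = np.convolve(low, kernel, mode="valid")
    return low[::decimation], fs / decimation


fs = 192_000                      # assumed capture rate, Hz
t = np.arange(int(0.1 * fs)) / fs
ultrasound = np.cos(2.0 * np.pi * 40_000.0 * t)

# A 38 kHz oscillator maps 40 kHz to 2 kHz (difference) and 78 kHz (sum).
# A 32-sample average spans exactly 13 periods of 78 kHz at this rate.
audio, audio_fs = heterodyne_to_audio(
    ultrasound, fs, lo_hz=38_000.0, smooth_len=32, decimation=4
)

# Locate the strongest frequency in the audio-rate output.
spectrum = np.abs(np.fft.rfft(audio))
freqs = np.fft.rfftfreq(len(audio), d=1.0 / audio_fs)
peak_hz = freqs[np.argmax(spectrum)]
```

Because a Doppler shift is a frequency offset, it survives the downward shift unchanged: a reflection at 40 kHz plus some small offset would appear at 2 kHz plus the same offset in this sketch.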

Future Direction:  We are developing more detailed models for the joint distribution of speech and Doppler signals. Future work will complete this task and develop a real-time denoising mechanism that uses the Doppler sensor.

Contact:  Bhiksha Raj

Technology Area:  Multimedia

Modification Date:  September 12, 2007