TR2016-001

Sequence Summarizing Neural Network for Speaker Adaptation


    •  Vesely, K., Watanabe, S., Zmolikova, K., Karafiat, M., Burget, L., Cernocky, J.H., "Sequence Summarizing Neural Network for Speaker Adaptation", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP.2016.7472692, March 2016, pp. 5315-5319.
      @inproceedings{Vesely2016mar,
        author = {Vesely, Karel and Watanabe, Shinji and Zmolikova, Katerina and Karafiat, Martin and Burget, Lukas and Cernocky, Jan Honza},
        title = {Sequence Summarizing Neural Network for Speaker Adaptation},
        booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
        year = 2016,
        pages = {5315--5319},
        month = mar,
        doi = {10.1109/ICASSP.2016.7472692},
        url = {https://www.merl.com/publications/TR2016-001}
      }
  • Research Areas: Artificial Intelligence, Speech & Audio

Abstract:

In this paper, we propose a DNN adaptation technique in which the i-vector extractor is replaced by a Sequence Summarizing Neural Network (SSNN). Similarly to the i-vector extractor, the SSNN produces a "summary vector" that represents an acoustic summary of an utterance. This vector is then appended to the input of the main network, and the two networks are trained jointly, optimizing a single loss function. Both the i-vector and SSNN speaker adaptation methods are compared on AMI meeting data. The results show comparable performance of the two techniques on an FBANK system with frame-classification training. Moreover, appending both the i-vector and the "summary vector" to the FBANK features leads to an additional improvement, comparable to the performance of the FMLLR-adapted DNN system.
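
The architecture described above can be illustrated with a minimal sketch. The following assumes PyTorch and mean-pooling over frame-level outputs as the summarization step; the layer sizes, feature dimension, summary-vector dimension, and number of output states are arbitrary placeholders, not values from the paper, and this is not the authors' implementation.

import torch
import torch.nn as nn


class SequenceSummarizingNN(nn.Module):
    """Maps an utterance (frames x features) to one summary vector."""

    def __init__(self, feat_dim: int, summary_dim: int):
        super().__init__()
        self.frame_net = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, summary_dim),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (num_frames, feat_dim); average per-frame outputs over time
        # (mean-pooling is an assumption used here for illustration)
        return self.frame_net(frames).mean(dim=0)


class AdaptedAcousticModel(nn.Module):
    """Main DNN whose input is the features concatenated with the summary."""

    def __init__(self, feat_dim: int, summary_dim: int, num_states: int):
        super().__init__()
        self.ssnn = SequenceSummarizingNN(feat_dim, summary_dim)
        self.main_net = nn.Sequential(
            nn.Linear(feat_dim + summary_dim, 1024), nn.ReLU(),
            nn.Linear(1024, num_states),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        summary = self.ssnn(frames)                    # (summary_dim,)
        expanded = summary.expand(frames.size(0), -1)  # append to every frame
        return self.main_net(torch.cat([frames, expanded], dim=1))


# Joint training with a single loss: gradients flow into both networks.
model = AdaptedAcousticModel(feat_dim=40, summary_dim=100, num_states=2000)
utterance = torch.randn(300, 40)          # 300 frames of 40-dim FBANK features
targets = torch.randint(0, 2000, (300,))  # frame-level state labels
loss = nn.CrossEntropyLoss()(model(utterance), targets)
loss.backward()

Because the summary vector is produced by a network rather than a separate i-vector extractor, both components can be optimized together under the single frame-classification loss, which is the joint-training property highlighted in the abstract.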

 
