TR2016-001

Sequence Summarizing Neural Network for Speaker Adaptation


    •  Vesely, K.; Watanabe, S.; Zmolikova, K.; Karafiat, M.; Burget, L.; Cernocky, J.H., "Sequence Summarizing Neural Network for Speaker Adaptation", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP.2016.7472692, March 2016, pp. 5315-5319.
      @inproceedings{Vesely2016mar,
        author = {Vesely, K. and Watanabe, S. and Zmolikova, K. and Karafiat, M. and Burget, L. and Cernocky, J.H.},
        title = {Sequence Summarizing Neural Network for Speaker Adaptation},
        booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
        year = 2016,
        pages = {5315--5319},
        month = mar,
        doi = {10.1109/ICASSP.2016.7472692},
        url = {http://www.merl.com/publications/TR2016-001}
      }
  • Research Area: Speech & Audio


In this paper, we propose a DNN adaptation technique in which the i-vector extractor is replaced by a Sequence Summarizing Neural Network (SSNN). Like an i-vector extractor, the SSNN produces a "summary vector" that represents an acoustic summary of an utterance. This vector is appended to the input of the main network, and both networks are trained jointly to optimize a single loss function. The i-vector and SSNN speaker adaptation methods are compared on AMI meeting data. The results show that the two techniques perform comparably on an FBANK system with frame-classification training. Moreover, appending both the i-vector and the summary vector to the FBANK features leads to an additional improvement, with performance comparable to that of an fMLLR-adapted DNN system.
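
The data flow described in the abstract can be illustrated with a minimal forward-pass sketch in plain Python. It rests on assumptions the abstract does not confirm: the summary vector is taken as the mean over time of a one-layer network's frame-level outputs, the main network has a single hidden layer, and the names and dimensions (summary_vector, forward, FEAT_DIM, SUMMARY_DIM, and so on) are hypothetical. Joint training is not shown; in a real system, gradients of the single loss would flow through both networks.

```python
import math
import random

# Hypothetical toy dimensions, chosen only for illustration.
FEAT_DIM, SUMMARY_DIM, HIDDEN_DIM, NUM_CLASSES, NUM_FRAMES = 4, 3, 5, 2, 6

def init_layer(rng, n_out, n_in):
    """Small random weight matrix (list of rows) and a zero bias."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

def affine(x, W, b):
    """Compute W x + b for a single input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def dense_tanh(x, W, b):
    """Affine layer followed by a tanh nonlinearity."""
    return [math.tanh(v) for v in affine(x, W, b)]

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def summary_vector(frames, W, b):
    """Summarizing network: per-frame outputs averaged over the utterance.
    Mean pooling is an assumption of this sketch."""
    outputs = [dense_tanh(f, W, b) for f in frames]
    return [sum(col) / len(outputs) for col in zip(*outputs)]

def forward(frames, params):
    """Append the utterance-level summary to every frame, then classify."""
    (Ws, bs), (Wh, bh), (Wo, bo) = params
    s = summary_vector(frames, Ws, bs)
    posteriors = []
    for f in frames:
        h = dense_tanh(f + s, Wh, bh)  # list concatenation: features + summary
        posteriors.append(softmax(affine(h, Wo, bo)))
    return s, posteriors

rng = random.Random(0)
params = (
    init_layer(rng, SUMMARY_DIM, FEAT_DIM),               # summarizing network
    init_layer(rng, HIDDEN_DIM, FEAT_DIM + SUMMARY_DIM),  # main network, hidden layer
    init_layer(rng, NUM_CLASSES, HIDDEN_DIM),             # main network, output layer
)
# Random stand-ins for the FBANK frames of one utterance.
frames = [[rng.gauss(0.0, 1.0) for _ in range(FEAT_DIM)] for _ in range(NUM_FRAMES)]
summary, posteriors = forward(frames, params)
print(len(summary), len(posteriors), len(posteriors[0]))
```

Because the same summary is appended to every frame of an utterance, it acts as an utterance-level input in the same way an i-vector would; appending an i-vector as well would simply widen the concatenated input further.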