Student-Teacher Network Learning with Enhanced Features

    •  Watanabe, S., Hori, T., Le Roux, J., Hershey, J.R., "Student-Teacher Network Learning with Enhanced Features", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2017.
      BibTeX TR2017-011 PDF
      • @inproceedings{Watanabe2017mar,
      • author = {Watanabe, Shinji and Hori, Takaaki and Le Roux, Jonathan and Hershey, John R.},
      • title = {Student-Teacher Network Learning with Enhanced Features},
      • booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
      • year = 2017,
      • month = mar,
      • url = {}
      • }
  • MERL Contact:
  • Research Areas:

    Artificial Intelligence, Speech & Audio


Recent advances in distant-talking ASR research have confirmed that speech enhancement is an essential technique for improving the ASR performance, especially in the multichannel scenario. However, speech enhancement inevitably distorts speech signals, which can cause significant degradation when enhanced signals are used as training data. Thus, distant-talking ASR systems often resort to using the original noisy signals as training data and the enhanced signals only at test time, and give up on taking advantage of enhancement techniques in the training stage. This paper proposes to make use of enhanced features in the student-teacher learning paradigm. The enhanced features are used as input to a teacher network to obtain soft targets, while a student network tries to mimic the teacher network's outputs using the original noisy features as input, so that speech enhancement is implicitly performed within the student network. Compared with conventional student-teacher learning, which uses a better network as teacher, the proposed selfsupervised method uses better (enhanced) inputs to a teacher. This setup matches the above scenario of making use of enhanced features in network training. Experiments with the CHiME-4 challenge real dataset show significant ASR improvements with an error reduction rate of 12% in the single-channel track and 15% in the 2-channel track, respectively, by using 6-channel beamformed features for the teacher model.


  • Related News & Events

    •  NEWS    MERL to present 10 papers at ICASSP 2017
      Date: March 5, 2017 - March 9, 2017
      Where: New Orleans
      MERL Contacts: Petros T. Boufounos; Jonathan Le Roux; Dehong Liu; Hassan Mansour; Anthony Vetro; Ye Wang
      Research Areas: Computer Vision, Computational Sensing, Digital Video, Information Security, Speech & Audio
      • MERL researchers will presented 10 papers at the upcoming IEEE International Conference on Acoustics, Speech & Signal Processing (ICASSP), to be held in New Orleans from March 5-9, 2017. Topics to be presented include recent advances in speech recognition and audio processing; graph signal processing; computational imaging; and privacy-preserving data analysis.

        ICASSP is the flagship conference of the IEEE Signal Processing Society, and the world's largest and most comprehensive technical conference focused on the research advances and latest technological development in signal and information processing. The event attracts more than 2000 participants each year.