TR2016-114

Bidirectional LSTM-HMM Hybrid System for Polyphonic Sound Event Detection

- Hayashi, T., Watanabe, S., Toda, T., Hori, T., Le Roux, J., Takeda, K., "Bidirectional LSTM-HMM Hybrid System for Polyphonic Sound Event Detection", Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), September 2016, pp. 35-39.
  @inproceedings{Hayashi2016sep,
    author = {Hayashi, Tomoki and Watanabe, Shinji and Toda, Tomoki and Hori, Takaaki and {Le Roux}, Jonathan and Takeda, Kazuya},
    title = {{Bidirectional LSTM-HMM Hybrid System for Polyphonic Sound Event Detection}},
    booktitle = {Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE)},
    year = 2016,
    pages = {35--39},
    month = sep,
    url = {https://www.merl.com/publications/TR2016-114}
  }
MERL Contact:
- Jonathan Le Roux
Research Areas:

Artificial Intelligence, Speech & Audio

Abstract:

In this study, we propose a new method of polyphonic sound event detection based on a Bidirectional Long Short-Term Memory Hidden Markov Model hybrid system (BLSTM-HMM). We extend the hybrid model of neural network and HMM, which achieved state-of-the-art performance in the field of speech recognition, to the multi-label classification problem. Unlike a straightforward application of BLSTM-RNN, this extension provides an explicit duration model for the output labels. We compare the performance of our proposed method with that of conventional methods such as non-negative matrix factorization (NMF) and standard BLSTM-RNN, using the DCASE2016 task 2 dataset. Our proposed method outperformed the conventional approaches in both monophonic and polyphonic tasks, achieving an average F1 score of 67.1% (error rate of 64.5%) on the event-based evaluation and an average F1 score of 76.0% (error rate of 50.0%) on the segment-based evaluation.
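
The abstract reports both an F1 score and an error rate without defining them. As a rough illustration, the sketch below computes the two segment-based quantities in the way they are commonly defined for polyphonic sound event detection: F1 from pooled true positives, false positives, and false negatives, and error rate as substitutions plus deletions plus insertions divided by the number of active reference events. It is an assumption that these definitions match the evaluation used in the paper; the function name `segment_based_scores` and the toy activity matrices are hypothetical.

```python
import numpy as np


def segment_based_scores(reference, estimate):
    """Return (F1, error rate) for binary activity matrices.

    Both inputs have shape (n_segments, n_classes); a 1 means the
    event class is active in that segment. Hypothetical helper that
    follows commonly used segment-based definitions.
    """
    ref = np.asarray(reference, dtype=bool)
    est = np.asarray(estimate, dtype=bool)

    # Per-segment counts over all event classes.
    tp = np.sum(ref & est, axis=1)
    fp = np.sum(~ref & est, axis=1)
    fn = np.sum(ref & ~est, axis=1)

    # F1 from counts pooled over all segments.
    denom = 2 * tp.sum() + fp.sum() + fn.sum()
    f1 = 2 * tp.sum() / denom if denom else 0.0

    # A false positive paired with a false negative in the same
    # segment counts as one substitution; leftovers are deletions
    # (missed events) or insertions (extra events).
    subs = np.minimum(fp, fn)
    dels = np.maximum(0, fn - fp)
    ins = np.maximum(0, fp - fn)

    n_ref = ref.sum()  # number of active reference events
    er = (subs.sum() + dels.sum() + ins.sum()) / n_ref if n_ref else 0.0
    return float(f1), float(er)


# Toy example: 4 segments, 3 event classes.
ref = [[1, 0, 0], [1, 1, 0], [0, 0, 0], [0, 1, 0]]
est = [[1, 0, 0], [1, 0, 1], [0, 0, 1], [0, 0, 0]]
f1, er = segment_based_scores(ref, est)
print(f"F1 = {f1:.2f}, ER = {er:.2f}")
```

Under these definitions the error rate is not simply one minus the F1 score, since insertions are counted against the number of reference events, which is why the two figures are reported separately.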
