TR2015-153

A sampling-based speaker clustering using utterance-oriented Dirichlet process mixture model and its evaluation on large scale data


    •  Tawara, N., Ogawa, T., Watanabe, S., Nakamura, A., Kobayashi, T., "A Sampling-Based Speaker Clustering Using Utterance-Oriented Dirichlet Process Mixture Model and Its Evaluation on Large Scale Data", APSIPA Transactions on Signal and Information Processing, DOI: 10.1017/ATSIP.2015.19, Vol. 4, October 2015.
      @article{Tawara2015oct,
        author = {Tawara, N. and Ogawa, T. and Watanabe, S. and Nakamura, A. and Kobayashi, T.},
        title = {A Sampling-Based Speaker Clustering Using Utterance-Oriented Dirichlet Process Mixture Model and Its Evaluation on Large Scale Data},
        journal = {APSIPA Transactions on Signal and Information Processing},
        year = 2015,
        volume = 4,
        month = oct,
        doi = {10.1017/ATSIP.2015.19},
        issn = {2048-7703},
        url = {https://www.merl.com/publications/TR2015-153}
      }
  • Research Areas:

    Artificial Intelligence, Speech & Audio

Abstract:

An infinite mixture model is applied to model-based speaker clustering with sampling-based optimization so that the number of speakers can be estimated. For this purpose, a framework of nonparametric Bayesian modeling is implemented with Markov chain Monte Carlo (MCMC) sampling and incorporated in the utterance-oriented speaker model. The proposed model is called the utterance-oriented Dirichlet process mixture model (UO-DPMM). The present paper demonstrates that UO-DPMM can be applied successfully to large-scale data and outperforms conventional hierarchical agglomerative clustering, especially for large numbers of utterances.
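
To illustrate the general idea of sampling-based clustering with an infinite mixture model, the following is a minimal sketch of a collapsed Gibbs sampler for a generic Dirichlet process mixture. It is not the UO-DPMM described in the paper: it assumes each item is a single scalar drawn from a Gaussian with known variance, whereas the paper models whole utterances. The function names (`log_predictive`, `gibbs_dpmm`) and all parameter values (`alpha`, `mu0`, `tau2`, `sigma2`) are hypothetical choices made for this example only.

```python
import math
import random


def log_predictive(x, n, s, mu0=0.0, tau2=100.0, sigma2=1.0):
    """Log predictive density of x for a cluster holding n points with sum s.

    Assumes a Gaussian likelihood with known variance sigma2 and a
    Gaussian prior N(mu0, tau2) on the cluster mean (illustrative choice).
    """
    prec = 1.0 / tau2 + n / sigma2            # posterior precision of the mean
    mean = (mu0 / tau2 + s / sigma2) / prec   # posterior mean of the cluster mean
    var = 1.0 / prec + sigma2                 # predictive variance
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def gibbs_dpmm(data, alpha=1.0, sweeps=50, seed=0):
    """Return cluster labels after collapsed Gibbs sampling.

    The number of clusters is not fixed in advance: each point may join
    an existing cluster or open a new one with weight alpha.
    """
    rng = random.Random(seed)
    z = [0] * len(data)              # start with every point in one cluster
    counts = {0: len(data)}
    sums = {0: float(sum(data))}
    next_id = 1
    for _ in range(sweeps):
        for i, x in enumerate(data):
            # Remove point i from its current cluster.
            k = z[i]
            counts[k] -= 1
            sums[k] -= x
            if counts[k] == 0:
                del counts[k]
                del sums[k]
            # Weight of each existing cluster: size times predictive density.
            labels = list(counts)
            logw = [math.log(counts[c]) + log_predictive(x, counts[c], sums[c])
                    for c in labels]
            # Weight of a brand-new cluster: alpha times prior predictive.
            labels.append(next_id)
            logw.append(math.log(alpha) + log_predictive(x, 0, 0.0))
            # Sample a label in a numerically stable way.
            top = max(logw)
            weights = [math.exp(w - top) for w in logw]
            k_new = rng.choices(labels, weights=weights)[0]
            if k_new == next_id:
                counts[k_new] = 0
                sums[k_new] = 0.0
                next_id += 1
            counts[k_new] += 1
            sums[k_new] += x
            z[i] = k_new
    return z
```

Calling `gibbs_dpmm(values)` on a list of floats returns one label per value; the number of distinct labels is the sampled number of clusters, which is the quantity the abstract refers to as the estimated number of speakers.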