NEWS    MERL's breakthrough speech separation technology featured in Mitsubishi Electric Corporation's Annual R&D Open House

Date released: June 3, 2017


  • Date:

    May 24, 2017

  • Where:

    Tokyo, Japan

  • Description:

    Mitsubishi Electric Corporation announced that it has created the world's first technology that separates, in real time, the simultaneous speech of multiple unknown speakers recorded with a single microphone. This is a key step towards building machines that can interact in noisy environments, in the same way that humans can hold meaningful conversations while many other conversations are going on around them. In tests, the simultaneous speech of two and three people was separated with up to 90 and 80 percent accuracy, respectively. The technology, which was realized with Mitsubishi Electric's proprietary "Deep Clustering" method based on artificial intelligence (AI), is expected to contribute to more intelligible voice communications and more accurate automatic speech recognition. A characteristic feature of the approach is its versatility: voices can be separated regardless of the language spoken or the gender of the speakers. A live speech separation demonstration held on May 24 in Tokyo, Japan, was widely covered by the Japanese media, with reports by three of the main Japanese TV stations and multiple articles in print and online newspapers. The technology is based on recent research by MERL's Speech and Audio team.

    Links:
    Mitsubishi Electric Corporation Press Release
    MERL Deep Clustering Demo

    Media Coverage:

    Fuji TV, News, "Minna no Mirai" (Japanese)
    The Nikkei (Japanese)
    Nikkei Technology Online (Japanese)
    Sankei Biz (Japanese)
    EE Times Japan (Japanese)
    ITpro (Japanese)
    Nikkan Sports (Japanese)
    Nikkan Kogyo Shimbun (Japanese)
    Dempa Shimbun (Japanese)
    Il Sole 24 Ore (Italian)
    IEEE Spectrum (English)

  • Research Area:

    Speech & Audio

    •  Isik, Y., Le Roux, J., Chen, Z., Watanabe, S., Hershey, J.R., "Single-Channel Multi-Speaker Separation using Deep Clustering", Interspeech, DOI: 10.21437/Interspeech.2016-1176, September 2016, pp. 545-549.
      @inproceedings{Isik2016sep,
        author = {Isik, Yusuf and Le Roux, Jonathan and Chen, Zhuo and Watanabe, Shinji and Hershey, John R.},
        title = {Single-Channel Multi-Speaker Separation using Deep Clustering},
        booktitle = {Interspeech},
        year = 2016,
        pages = {545--549},
        month = sep,
        doi = {10.21437/Interspeech.2016-1176},
        url = {https://www.merl.com/publications/TR2016-073}
      }
    •  Hershey, J.R., Chen, Z., Le Roux, J., Watanabe, S., "Deep Clustering: Discriminative Embeddings for Segmentation and Separation", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), DOI: 10.1109/ICASSP.2016.7471631, March 2016, pp. 31-35.
      @inproceedings{Hershey2016mar,
        author = {Hershey, John R. and Chen, Zhuo and Le Roux, Jonathan and Watanabe, Shinji},
        title = {Deep Clustering: Discriminative Embeddings for Segmentation and Separation},
        booktitle = {IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
        year = 2016,
        pages = {31--35},
        month = mar,
        doi = {10.1109/ICASSP.2016.7471631},
        url = {https://www.merl.com/publications/TR2016-003}
      }