Takaaki Hori
  • Biography

    Before joining MERL in 2015, Takaaki spent 15 years doing research on speech and language technology at Nippon Telegraph and Telephone (NTT) in Japan. His work includes studies on speech recognition algorithms using weighted finite-state transducers (WFSTs), efficient search algorithms for spoken document retrieval, spoken language understanding, and automatic meeting analysis.

  • News & Events

    •  EVENT   SANE 2018 - Speech and Audio in the Northeast
      Date: Thursday, October 18, 2018
      MERL Contacts: Takaaki Hori; Jonathan Le Roux
      Location: Google, Cambridge, MA
      Research Area: Speech & Audio
      Brief
      • SANE 2018, a one-day event gathering researchers and students in speech and audio from the Northeast of the American continent, will be held on Thursday, October 18, 2018, at Google in Cambridge, MA. MERL is one of the organizers and sponsors of the workshop.

        It is the 7th edition in the SANE series of workshops, which started at MERL in 2012. Since the first edition, the audience has steadily grown, with a record 180 participants in 2017.

        SANE 2018 will feature invited talks by leading researchers from the Northeast, as well as from the international community. It will also feature a lively poster session, open to both students and researchers.
    •  NEWS   Takaaki Hori leads speech technology workshop
      Date: June 25, 2018 - August 3, 2018
      Where: Johns Hopkins University, Baltimore, MD
      MERL Contacts: Takaaki Hori; Jonathan Le Roux
      Research Area: Speech & Audio
      Brief
      • MERL Speech & Audio Team researcher Takaaki Hori led a team of 27 senior researchers and Ph.D. students from different organizations around the world, working on "Multi-lingual End-to-End Speech Recognition for Incomplete Data" as part of the Jelinek Memorial Summer Workshop on Speech and Language Technology (JSALT). JSALT is a renowned 6-week hands-on workshop held yearly since 1995. This year, the workshop was held at Johns Hopkins University in Baltimore from June 25 to August 3, 2018. Takaaki's team developed new methods for end-to-end Automatic Speech Recognition (ASR), with a focus on low-resource languages with limited labeled data.

        End-to-end ASR can significantly reduce the burden of developing ASR systems for new languages by eliminating the need for linguistic information such as pronunciation dictionaries. Some end-to-end systems have recently achieved performance comparable to or better than that of conventional systems on several tasks. However, current training algorithms essentially require paired data, i.e., speech recordings together with their transcriptions (see the short illustration after this brief). A sufficient amount of such data is usually unavailable for low-resource languages, and creating it is expensive and time consuming.

        The goal of Takaaki's team project was to expand the applicability of end-to-end models to multilingual ASR, and to develop new technology that would make it possible to build highly accurate systems even for low-resource languages without a large amount of paired data. Major accomplishments of the team include building multi-lingual end-to-end ASR systems for 17 languages, developing novel architectures and training methods for end-to-end ASR, building an end-to-end ASR-TTS (text-to-speech) chain for training on unpaired data, and developing ESPnet, an open-source end-to-end speech processing toolkit. Three papers stemming from the team's work have already been accepted to the 2018 IEEE Spoken Language Technology Workshop (SLT), with several more to be submitted to upcoming conferences.
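
        To make the paired-data requirement concrete: a standard end-to-end ASR objective such as the CTC loss is computed jointly over an utterance and its transcription, so neither speech-only nor text-only data can be used directly. The sketch below is a minimal, hypothetical illustration using PyTorch's CTCLoss with a toy encoder and placeholder shapes; it is not ESPnet code or the team's actual training setup.

          # Minimal, hypothetical sketch (not ESPnet): an end-to-end ASR loss such
          # as CTC is defined over a *pair* of speech features and a transcription,
          # which is why standard training cannot use unpaired data directly.
          import torch
          import torch.nn as nn

          vocab_size = 30    # placeholder token inventory, blank symbol at index 0
          feat_dim = 80      # placeholder acoustic feature size (e.g., log-mel)

          # Toy encoder mapping acoustic frames to per-frame token scores.
          encoder = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                  nn.Linear(256, vocab_size))
          ctc_loss = nn.CTCLoss(blank=0)

          # One *paired* example: speech features AND the matching transcription.
          speech = torch.randn(200, 1, feat_dim)               # (time, batch, feats)
          transcript = torch.randint(1, vocab_size, (1, 25))   # token IDs (no blanks)

          log_probs = encoder(speech).log_softmax(dim=-1)      # (time, batch, vocab)
          loss = ctc_loss(log_probs, transcript,
                          input_lengths=torch.tensor([200]),
                          target_lengths=torch.tensor([25]))
          loss.backward()   # removing either half of the pair leaves no objective
          print(f"CTC loss for one paired utterance: {loss.item():.3f}")

        The ASR-TTS chain mentioned above is one way around this constraint: roughly, unpaired speech can be transcribed by the ASR model and re-synthesized by a TTS model (or unpaired text synthesized and then recognized), providing a reconstruction-style training signal without manual transcriptions.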

    See All News & Events for Takaaki
  • Awards

    •  AWARD   MERL's Speech Team Achieves World's 2nd Best Performance at the Third CHiME Speech Separation and Recognition Challenge
      Date: December 15, 2015
      Awarded to: John R. Hershey, Takaaki Hori, Jonathan Le Roux and Shinji Watanabe
      MERL Contacts: Takaaki Hori; Jonathan Le Roux
      Research Areas: Speech & Audio, Artificial Intelligence
      Brief
      • The results of the third 'CHiME' Speech Separation and Recognition Challenge were publicly announced on December 15 at the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2015) held in Scottsdale, Arizona, USA. MERL's Speech and Audio Team, in collaboration with SRI, ranked 2nd out of 26 teams from Europe, Asia and the US. The task this year was to recognize speech recorded using a tablet in real environments such as cafes, buses, or busy streets. Due to the high levels of noise and the distance from the speaker's mouth to the microphones, this is a very challenging task, on which the baseline system achieved only a 33.4% word error rate. The MERL/SRI system featured state-of-the-art techniques including a multi-channel front-end, noise-robust feature extraction, and deep learning for speech enhancement, acoustic modeling, and language modeling, leading to a dramatic 73% relative reduction in word error rate, down to 9.1%. The core of the system has since been released as a new official challenge baseline for the community to use.
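
        As a quick sanity check, the 73% figure is the relative reduction from the challenge baseline; the numbers below are taken directly from the brief above.

          # Relative word error rate (WER) reduction, using the figures quoted above.
          baseline_wer = 33.4   # baseline system WER (%)
          merl_sri_wer = 9.1    # MERL/SRI system WER (%)
          relative_reduction = (baseline_wer - merl_sri_wer) / baseline_wer
          print(f"Relative WER reduction: {relative_reduction:.1%}")   # ~72.8%, i.e. about 73%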
    See All Awards for MERL
  • Research Highlights

  • Internships with Takaaki

    • SA1132: End-to-end acoustic analysis recognition and inference

      MERL is looking for an intern to work on fundamental research in the area of end-to-end acoustic analysis, recognition, and inference using machine learning techniques such as deep learning. The intern will collaborate with MERL researchers to derive and implement new models and optimization methods, conduct experiments, and prepare results for high-impact publication. The ideal candidate would be a senior Ph.D. student with experience in one or more of source separation, speech recognition, and natural language processing, along with practical machine learning experience and related programming skills. The duration of the internship is expected to be 3-6 months.

    See All Internships at MERL
  • MERL Publications

    •  Watanabe, S., Hori, T., Karita, S., Hayashi, T., Nishitoba, J., Unno, Y., Enrique Yalta Soplin, N., Heymann, J., Wiesner, M., Chen, N., Renduchintala, A., Ochiai, T., "ESPnet: End-to-End Speech Processing Toolkit", Interspeech, September 2018.
    •  Seki, H., Hori, T., Watanabe, S., Le Roux, J., Hershey, J., "A Purely End-to-end System for Multi-speaker Speech Recognition", Annual Meeting of the Association for Computational Linguistics (ACL), Jul 16, 2018.
    •  Hori, C., Alamri, H., Wang, J., Wichern, G., Hori, T., Cherian, A., Marks, T.K., Cartillier, V., Lopes, R., Das, A., Essa, I., Batra, D., Parikh, D., "End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features", arXiv, July 13, 2018.
      BibTeX (TR2018-085):
        @techreport{MERL_TR2018-085,
          author = {Hori, C. and Alamri, H. and Wang, J. and Wichern, G. and Hori, T. and Cherian, A. and Marks, T.K. and Cartillier, V. and Lopes, R. and Das, A. and Essa, I. and Batra, D. and Parikh, D.},
          title = {End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features},
          institution = {MERL - Mitsubishi Electric Research Laboratories},
          address = {Cambridge, MA 02139},
          number = {TR2018-085},
          month = jul,
          year = 2018,
          url = {http://www.merl.com/publications/TR2018-085/}
        }
    •  Seki, H., Hori, T., Watanabe, S., Le Roux, J., Hershey, J., "A Purely End-to-end System for Multi-speaker Speech Recognition", arXiv, July 10, 2018.
      BibTeX (TR2018-058):
        @techreport{MERL_TR2018-058,
          author = {Seki, H. and Hori, T. and Watanabe, S. and Le Roux, J. and Hershey, J.},
          title = {A Purely End-to-end System for Multi-speaker Speech Recognition},
          institution = {MERL - Mitsubishi Electric Research Laboratories},
          address = {Cambridge, MA 02139},
          number = {TR2018-058},
          month = jul,
          year = 2018,
          url = {http://www.merl.com/publications/TR2018-058/}
        }
    •  Seki, H., Watanabe, S., Hori, T., Le Roux, J., Hershey, J.R., "An End-to-End Language-Tracking Speech Recognizer for Mixed-Language Speech", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), April 2018.
    See All Publications for Takaaki
  • MERL Issued Patents

    • Title: "Method and System for Role Dependent Context Sensitive Spoken and Textual Language Understanding with Neural Networks"
      Inventors: Hori, Chiori; Hori, Takaaki; Watanabe, Shinji; Hershey, John R.
      Patent No.: 9,842,106
      Issue Date: Dec 12, 2017
    See All Patents for MERL