Takaaki Hori

  • Biography

    Before joining MERL in 2015, Takaaki spent 15 years doing research on speech and language technology at Nippon Telegraph and Telephone (NTT) in Japan. His work includes studies on speech recognition algorithms using weighted finite-state transducers (WFSTs), efficient search algorithms for spoken document retrieval, spoken language understanding, and automatic meeting analysis.

  • Recent News & Events

    •  NEWS   Chiori Hori will give keynote on scene understanding via multimodal sensing at AI Electronics Symposium
      Date: February 15, 2021
      Where: The 2nd International Symposium on AI Electronics
      MERL Contact: Chiori Hori
      Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
      Brief
      • Chiori Hori, a Senior Principal Researcher in MERL's Speech and Audio Team, will be a keynote speaker at the 2nd International Symposium on AI Electronics, alongside Alex Acero, Senior Director of Apple Siri, Roberto Cipolla, Professor of Information Engineering at the University of Cambridge, and Hiroshi Amano, Professor at Nagoya University and winner of the Nobel Prize in Physics for his work on blue light-emitting diodes. The symposium, organized by Tohoku University, will be held online on February 15, 2021, 10am-4pm (JST).

        Chiori's talk, titled "Human Perspective Scene Understanding via Multimodal Sensing", will present MERL's work towards the development of scene-aware interaction. One important capability that is still missing from human-machine interaction is natural and context-aware communication, in which machines understand their surrounding scene from the human perspective and can share that understanding with humans using natural language. To bridge this communication gap, MERL has been working at the intersection of research fields such as spoken dialog, audio-visual understanding, sensor signal understanding, and robotics to build a new AI paradigm, called scene-aware interaction, that enables machines to translate their perception and understanding of a scene into natural language and thereby interact more effectively with humans. The talk will survey these technologies and introduce an application for future car navigation.
    •  NEWS   MERL's Scene-Aware Interaction Technology Featured in Mitsubishi Electric Corporation Press Release
      Date: July 22, 2020
      Where: Tokyo, Japan
      MERL Contacts: Anoop Cherian; Chiori Hori; Takaaki Hori; Jonathan Le Roux; Tim K. Marks; Alan Sullivan; Anthony Vetro
      Research Areas: Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio
      Brief
      • Mitsubishi Electric Corporation announced that the company has developed what it believes to be the world’s first technology capable of highly natural and intuitive interaction with humans based on a scene-aware capability to translate multimodal sensing information into natural language.

        The novel technology, Scene-Aware Interaction, incorporates Mitsubishi Electric’s proprietary Maisart® compact AI technology to analyze multimodal sensing information for highly natural and intuitive interaction with humans through context-dependent generation of natural language. The technology recognizes contextual objects and events based on multimodal sensing information, such as images and video captured with cameras, audio information recorded with microphones, and localization information measured with LiDAR.

        Scene-Aware Interaction for car navigation, one target application, will provide drivers with intuitive route guidance. The technology is also expected to have applicability to human-machine interfaces for in-vehicle infotainment, interaction with service robots in building and factory automation systems, systems that monitor the health and well-being of people, surveillance systems that interpret complex scenes for humans and encourage social distancing, support for touchless operation of equipment in public areas, and much more. The technology is based on recent research by MERL's Speech & Audio and Computer Vision groups.


        Link:

        Mitsubishi Electric Corporation Press Release

    See All News & Events for Takaaki
  • Awards

    •  AWARD   MERL's Speech Team Achieves World's 2nd Best Performance at the Third CHiME Speech Separation and Recognition Challenge
      Date: December 15, 2015
      Awarded to: John R. Hershey, Takaaki Hori, Jonathan Le Roux and Shinji Watanabe
      MERL Contacts: Takaaki Hori; Jonathan Le Roux
      Research Area: Speech & Audio
      Brief
      • The results of the third 'CHiME' Speech Separation and Recognition Challenge were publicly announced on December 15 at the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2015), held in Scottsdale, Arizona, USA. MERL's Speech and Audio Team, in collaboration with SRI, ranked 2nd out of 26 teams from Europe, Asia and the US. The task this year was to recognize speech recorded using a tablet in real environments such as cafes, buses, or busy streets. Due to the high levels of noise and the distance from the speaker's mouth to the microphones, this is a very challenging task, on which the baseline system achieved a word error rate of 33.4%. The MERL/SRI system featured state-of-the-art techniques including a multi-channel front-end, noise-robust feature extraction, and deep learning for speech enhancement, acoustic modeling, and language modeling, leading to a dramatic 73% relative reduction in word error rate, down to 9.1%. The core of the system has since been released as a new official challenge baseline for the community to use.
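      The relative error-rate reduction quoted above can be checked with a few lines of Python (a minimal sketch; the 33.4% and 9.1% word error rates are taken from the challenge results described above):

      ```python
      # Word error rates (%) from the CHiME-3 results described above.
      baseline_wer = 33.4   # official challenge baseline
      merl_sri_wer = 9.1    # MERL/SRI submission

      # Relative reduction: the fraction of the baseline's errors that were eliminated.
      relative_reduction = (baseline_wer - merl_sri_wer) / baseline_wer * 100
      print(f"{relative_reduction:.1f}% relative WER reduction")  # ~72.8%, i.e. roughly 73%
      ```

      Note that this is a relative reduction (share of baseline errors removed), not the absolute difference of 24.3 percentage points.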
    See All Awards for MERL
  • Research Highlights

  • Internships with Takaaki

    • SA1612: End-to-end speech and audio processing

      MERL is looking for interns to work on fundamental research in end-to-end speech and audio processing for new and challenging environments using advanced machine learning techniques. The intern will collaborate with MERL researchers to derive and implement new models and learning methods, conduct experiments, and prepare results for high-impact publication. The ideal candidates are senior Ph.D. students with experience in one or more of automatic speech recognition, speech enhancement, sound event detection, and natural language processing, including good theoretical and practical knowledge of relevant machine learning algorithms and related programming skills. The internship will take place during fall/winter 2021, with an expected duration of 3-6 months and a flexible start date. The internship is preferred to be onsite at MERL but may be done remotely if the COVID pandemic makes it necessary.

    See All Internships at MERL
  • MERL Publications

    •  Shah, A.P., Geng, S., Gao, P., Cherian, A., Hori, T., Marks, T.K., Le Roux, J., Hori, C., "Audio-Visual Scene-Aware Dialog and Reasoning Using Audio-Visual Transformers with Joint Student-Teacher Learning", arXiv, October 2021.
      BibTeX
      • @misc{Shah2021oct,
      • author = {Shah, Ankit Parag and Geng, Shijie and Gao, Peng and Cherian, Anoop and Hori, Takaaki and Marks, Tim K. and Le Roux, Jonathan and Hori, Chiori},
      • title = {Audio-Visual Scene-Aware Dialog and Reasoning Using Audio-Visual Transformers with Joint Student-Teacher Learning},
      • howpublished = {arXiv},
      • year = 2021,
      • month = oct
      • }
    •  Higuchi, Y., Moritz, N., Le Roux, J., Hori, T., "Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition", Annual Conference of the International Speech Communication Association (Interspeech), DOI: 10.21437/Interspeech.2021-571, September 2021, pp. 726-730.
      BibTeX TR2021-103 PDF
      • @inproceedings{Higuchi2021sep,
      • author = {Higuchi, Yosuke and Moritz, Niko and Le Roux, Jonathan and Hori, Takaaki},
      • title = {Momentum Pseudo-Labeling for Semi-Supervised Speech Recognition},
      • booktitle = {Annual Conference of the International Speech Communication Association (Interspeech)},
      • year = 2021,
      • pages = {726--730},
      • month = sep,
      • doi = {10.21437/Interspeech.2021-571},
      • url = {https://www.merl.com/publications/TR2021-103}
      • }
    •  Hori, T., Moritz, N., Hori, C., Le Roux, J., "Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers", Annual Conference of the International Speech Communication Association (Interspeech), DOI: 10.21437/Interspeech.2021-1643, August 2021, pp. 2097-2101.
      BibTeX TR2021-100 PDF
      • @inproceedings{Hori2021aug3,
      • author = {Hori, Takaaki and Moritz, Niko and Hori, Chiori and Le Roux, Jonathan},
      • title = {Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers},
      • booktitle = {Annual Conference of the International Speech Communication Association (Interspeech)},
      • year = 2021,
      • pages = {2097--2101},
      • month = aug,
      • doi = {10.21437/Interspeech.2021-1643},
      • url = {https://www.merl.com/publications/TR2021-100}
      • }
    •  Hori, C., Hori, T., Le Roux, J., "Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers", Annual Conference of the International Speech Communication Association (Interspeech), DOI: 10.21437/Interspeech.2021-1975, August 2021, pp. 586-590.
      BibTeX TR2021-093 PDF
      • @inproceedings{Hori2021aug2,
      • author = {Hori, Chiori and Hori, Takaaki and Le Roux, Jonathan},
      • title = {Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers},
      • booktitle = {Annual Conference of the International Speech Communication Association (Interspeech)},
      • year = 2021,
      • pages = {586--590},
      • month = aug,
      • publisher = {ISCA},
      • doi = {10.21437/Interspeech.2021-1975},
      • url = {https://www.merl.com/publications/TR2021-093}
      • }
    •  Moritz, N., Hori, T., Le Roux, J., "Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition", Annual Conference of the International Speech Communication Association (Interspeech), DOI: 10.21437/Interspeech.2021-1693, August 2021, pp. 1822-1826.
      BibTeX TR2021-094 PDF
      • @inproceedings{Moritz2021aug,
      • author = {Moritz, Niko and Hori, Takaaki and Le Roux, Jonathan},
      • title = {Dual Causal/Non-Causal Self-Attention for Streaming End-to-End Speech Recognition},
      • booktitle = {Annual Conference of the International Speech Communication Association (Interspeech)},
      • year = 2021,
      • pages = {1822--1826},
      • month = aug,
      • doi = {10.21437/Interspeech.2021-1693},
      • url = {https://www.merl.com/publications/TR2021-094}
      • }
    See All Publications for Takaaki
  • Videos

  • MERL Issued Patents

    • Title: "System and Method for Multichannel End-to-End Speech Recognition"
      Inventors: Ochiai, Tsubasa; Watanabe, Shinji; Hori, Takaaki; Hershey, John R.
      Patent No.: 11,133,011
      Issue Date: Sep 28, 2021
    • Title: "System and Method for End-to-End Speech Recognition with Triggered Attention"
      Inventors: Moritz, Niko; Hori, Takaaki; Le Roux, Jonathan
      Patent No.: 11,100,920
      Issue Date: Aug 24, 2021
    • Title: "Method and System for Multi-Label Classification"
      Inventors: Hori, Takaaki; Hori, Chiori; Watanabe, Shinji; Hershey, John R.; Harsham, Bret A.; Le Roux, Jonathan
      Patent No.: 11,086,918
      Issue Date: Aug 10, 2021
    • Title: "Methods and Systems for Recognizing Simultaneous Speech by Multiple Speakers"
      Inventors: Le Roux, Jonathan; Hori, Takaaki; Settle, Shane; Seki, Hiroshi; Watanabe, Shinji; Hershey, John R.
      Patent No.: 10,811,000
      Issue Date: Oct 20, 2020
    • Title: "Method and Apparatus for Open-Vocabulary End-to-End Speech Recognition"
      Inventors: Hori, Takaaki; Watanabe, Shinji; Hershey, John R.
      Patent No.: 10,672,388
      Issue Date: May 2, 2020
    • Title: "Method and Apparatus for Multi-Lingual End-to-End Speech Recognition"
      Inventors: Watanabe, Shinji; Hori, Takaaki; Seki, Hiroshi; Le Roux, Jonathan; Hershey, John R.
      Patent No.: 10,593,321
      Issue Date: Mar 17, 2020
    • Title: "Method and System for Multi-Modal Fusion Model"
      Inventors: Hori, Chiori; Hori, Takaaki; Hershey, John R.; Marks, Tim
      Patent No.: 10,417,498
      Issue Date: Sep 17, 2019
    • Title: "Method and System for Training Language Models to Reduce Recognition Errors"
      Inventors: Hori, Takaaki; Hori, Chiori; Watanabe, Shinji; Hershey, John R.
      Patent No.: 10,176,799
      Issue Date: Jan 8, 2019
    • Title: "Method and System for Role Dependent Context Sensitive Spoken and Textual Language Understanding with Neural Networks"
      Inventors: Hori, Chiori; Hori, Takaaki; Watanabe, Shinji; Hershey, John R.
      Patent No.: 9,842,106
      Issue Date: Dec 12, 2017
    See All Patents for MERL