Collaborative Spoken-Language Interfaces
We have developed a prototype spoken-language interface for programming a video-cassette recorder (VCR), in which a user collaborates with an intelligent software agent in natural spoken English. In the collaborative conversation, the user communicates goals to the software agent, and the agent asks for more information when necessary. The agent takes care of the details of managing the recording schedule and operating the VCR controls. This prototype demonstrates generic technology we have developed for building collaborative spoken-language interfaces for varied applications.
Background & Objective: Speech recognition technology has matured dramatically in the past few years, with the first generation of products with embedded speech recognition now coming to market. These products typically use a small set of command words as an alternative to pushing the buttons on a cell phone, car dashboard, or other device. Our goal is a second generation of spoken-language interfaces, which will support much more complex interactions, such as programming a VCR, operating a home network, or retrieving information from a large database.
Technical Discussion: The architecture of a collaborative spoken-language interface has three main components: spoken-language understanding (which starts with speech recognition and then maps from sequences of words to their meanings), spoken-language generation (which maps from meaning representations to sequences of words and then to speech), and collaboration management (which maintains a model of the conversation structure and the task status). We are currently using commercially available products for the first two components and the COLLAGEN (COLLaborative AGENt) system for the third. Each of the three components above operates by applying application-independent algorithms, such as parsing or plan recognition, to application- and/or language-specific data sets, such as a grammar of English or a hierarchical task model for VCR programming. Assembling complete data sets for all of these components for a particular application is a very large engineering effort, which we have not undertaken for the VCR programming prototype.
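Illustrative sketch: The three-component flow described above can be pictured with a toy example. The Python below is only a minimal sketch under stated assumptions and is not the COLLAGEN API or the prototype's code. The slot names, the keyword-matching "understanding" step, the canned prompts, and the names understand, CollaborationManager, and generate are all hypothetical. Text stands in for speech, and a flat list of required slots stands in for a hierarchical task model.

```python
from dataclasses import dataclass, field

# Hypothetical application-specific data: the information needed
# before a recording can be scheduled, and a prompt for each item.
REQUIRED_SLOTS = ("channel", "day", "start", "end")
PROMPTS = {
    "channel": "Which channel should I record?",
    "day": "On which day?",
    "start": "What time should the recording start?",
    "end": "What time should it end?",
}


def understand(words: str) -> dict:
    """Toy understanding: map a word sequence to a meaning (slot -> value).

    Assumes the user says a slot name followed directly by its value.
    """
    tokens = words.lower().split()
    meaning = {}
    for i, token in enumerate(tokens[:-1]):
        if token in REQUIRED_SLOTS:
            meaning[token] = tokens[i + 1]
    return meaning


@dataclass
class CollaborationManager:
    """Toy collaboration management: tracks task status across turns."""

    filled: dict = field(default_factory=dict)

    def step(self, meaning: dict) -> dict:
        # Merge what the user just said into the task status.
        self.filled.update(meaning)
        missing = [s for s in REQUIRED_SLOTS if s not in self.filled]
        if missing:
            # Ask for more information when necessary.
            return {"act": "ask", "slot": missing[0]}
        return {"act": "confirm", **self.filled}


def generate(act: dict) -> str:
    """Toy generation: map a meaning representation back to words."""
    if act["act"] == "ask":
        return PROMPTS[act["slot"]]
    return "OK, recording channel {channel} on {day} from {start} to {end}.".format(**act)


manager = CollaborationManager()
first_reply = generate(manager.step(understand("record channel 5 day monday")))
second_reply = generate(manager.step(understand("start 8pm end 9pm")))
print(first_reply)
print(second_reply)
```

In this sketch the functions play the role of the application-independent algorithms, while REQUIRED_SLOTS and PROMPTS play the role of the application-specific data sets, so retargeting the toy to another task would mean replacing the data rather than the functions.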
Contact: Richard (Dick) Waters
Technology Areas:
Spoken Language Interfaces
Artificial Intelligence
Modification Date: September 14, 2007

