FormsTalk: Multimodal Mixed-Initiative Form Filling
FormsTalk is middleware for building form-filling applications that support a mixture of speech and non-speech interaction modes. Examples of non-speech modes include touch screen, keyboard and mouse, and telephone keypad. FormsTalk also supports flexible, mixed-initiative interaction, in which either the user or the system can take the lead, depending on circumstances. In 2004 we built a set of extensions to FormsTalk to provide remote access to industrial plant control data via cell phone.
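To make the idea of mixed initiative concrete, here is a minimal Java sketch of a form that either side can drive. It is not FormsTalk code: the class MixedInitiativeForm, its methods, and the field names used below are all hypothetical, chosen only for illustration. The user may fill any field at any time (user initiative), and the system can ask for whichever field is still missing (system initiative).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these names come from FormsTalk itself.
class MixedInitiativeForm {
    // Field name -> value (null while unfilled). Insertion order is the prompting order.
    private final Map<String, String> fields = new LinkedHashMap<>();

    MixedInitiativeForm(String... fieldNames) {
        for (String name : fieldNames) {
            fields.put(name, null);
        }
    }

    // User initiative: any field may be filled at any time, from any input mode.
    void fill(String field, String value) {
        if (!fields.containsKey(field)) {
            throw new IllegalArgumentException("Unknown field: " + field);
        }
        fields.put(field, value);
    }

    // System initiative: prompt for the first field that is still empty.
    // Returns null once every field has a value.
    String nextPrompt() {
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            if (entry.getValue() == null) {
                return "Please provide " + entry.getKey();
            }
        }
        return null;
    }

    boolean isComplete() {
        return nextPrompt() == null;
    }
}
```

In this sketch a user could supply "date" and "origin" in one utterance before being asked, and the system would then prompt only for the remaining field. A real dialogue manager would of course do much more (confirmation, error recovery, context tracking).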
Background & Objective: Form filling is a common framework for many different kinds of applications. A multimodal interface improves the accessibility of these applications by allowing users to choose whichever mode (speech, touch, etc.) best matches their capabilities and the current task. Until now, developing such interfaces has tended to be very labor-intensive and to require a large amount of application-specific code. The goal of FormsTalk is to reduce that application-specific code, so that most of the labor for a new application goes into authoring the content of the forms, and deployment on different platforms requires a minimum of additional effort. To date, FormsTalk applications have been deployed on PCs, phones, and kiosks in three languages.
Technical Discussion: FormsTalk is built on top of DiamondTalk, an application-independent Java architecture for building conversational, multimodal spoken-language interfaces. DiamondTalk allows us to easily substitute different speech recognition and generation engines (e.g., from different vendors) as technologies and applications change. FormsTalk uses Collagen as its dialogue manager; Collagen provides the mixed-initiative capabilities.
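One common way to make speech engines substitutable in Java is to have the application depend only on small interfaces and to supply the vendor-specific implementation from outside. The sketch below illustrates that general pattern; it is not DiamondTalk's actual API. The names SpeechRecognizer, SpeechSynthesizer, and FormApplication are assumptions made for this example.

```java
import java.util.Locale;

// Hypothetical interfaces: illustrative only, not DiamondTalk's API.
interface SpeechRecognizer {
    // Turns raw audio into recognized text.
    String recognize(byte[] audio);
}

interface SpeechSynthesizer {
    // Turns text into audio to be played to the user.
    byte[] synthesize(String text);
}

// The application sees only the interfaces, so engines from different
// vendors can be swapped without touching application code.
class FormApplication {
    private final SpeechRecognizer recognizer;
    private final SpeechSynthesizer synthesizer;

    FormApplication(SpeechRecognizer recognizer, SpeechSynthesizer synthesizer) {
        this.recognizer = recognizer;
        this.synthesizer = synthesizer;
    }

    // Normalizes engine output so that downstream logic does not depend
    // on one vendor's capitalization or whitespace conventions.
    String hear(byte[] audio) {
        return recognizer.recognize(audio).trim().toLowerCase(Locale.ROOT);
    }

    byte[] say(String text) {
        return synthesizer.synthesize(text);
    }
}
```

Under these assumptions, changing vendors amounts to constructing FormApplication with a different SpeechRecognizer implementation; the form-handling code stays the same.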
A key part of FormsTalk is its modular architecture, which supports special-purpose components in application domains where they are necessary. Extending FormsTalk to the current cell phone application included implementing components for integration with an Apache web server and with a Dialogic telephony board. These components can be reused in future applications. We also designed a special-purpose interaction protocol that allows both the voice and data channels of the cell network to be used with an off-the-shelf phone handset. Smoothly integrating the differing capabilities of the two channels was a fundamental part of our work this year.
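The protocol itself is not described here, so the following Java sketch shows only one plausible way to divide output between a voice channel and a data channel. Everything in it is an assumption: the names Channel, OutputMessage, and ChannelRouter, the length threshold, and the rule that long content goes to the data channel with a short spoken notice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the actual FormsTalk protocol is not specified in the text.
enum Channel { VOICE, DATA }

class OutputMessage {
    final Channel channel;
    final String payload;

    OutputMessage(Channel channel, String payload) {
        this.channel = channel;
        this.payload = payload;
    }
}

class ChannelRouter {
    // Assumed rule: content longer than this many characters is sent as data.
    private final int maxSpokenChars;

    ChannelRouter(int maxSpokenChars) {
        this.maxSpokenChars = maxSpokenChars;
    }

    // Short content is spoken. Long content is sent over the data channel,
    // followed by a brief spoken notice so the user knows where to look.
    List<OutputMessage> route(String content) {
        List<OutputMessage> messages = new ArrayList<>();
        if (content.length() <= maxSpokenChars) {
            messages.add(new OutputMessage(Channel.VOICE, content));
        } else {
            messages.add(new OutputMessage(Channel.DATA, content));
            messages.add(new OutputMessage(Channel.VOICE,
                    "The details have been sent to your screen."));
        }
        return messages;
    }
}
```

The point of the sketch is the design choice rather than the specific rule: each channel carries the content it handles well, and the two are coordinated so the user experiences a single interaction.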
Contacts:
Bent Schmidt-Nielsen
Bret Harsham
Derek Schwenke
Technology Areas:
Off the Desktop Interaction and Display
Net Services
Spoken Language Interfaces
Modification Date: September 14, 2007

