MERL – DiamondTalk: A Java Architecture for Spoken-Language Interfaces

DiamondTalk is an application-independent Java architecture for building conversational, multimodal spoken-language interfaces, especially for embedded applications such as automobiles, cell phones, home appliances, and robots. The top-level components of DiamondTalk are shown in the diagram at the left. DiamondTalk was first used as the platform for implementing a multimodal form-filling application (see the FormsTalk project).

Background & Objective:  Building spoken-language interfaces, especially conversational and multimodal ones (e.g., combining speech with touch or direct manipulation), currently requires a large amount of specialized development for each application, and the resulting code tends to be closely tied to the particular speech recognition and generation engines chosen. The DiamondTalk architecture has two goals. The first is to reduce the specialized development required for each new application by increasing code reuse (including testing and data-collection tools). The second is to make it easy to substitute different speech engines in an existing application, so that it can take advantage of improved technology.
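
One common way to meet the engine-substitution goal is to have application code depend only on small engine-neutral interfaces. The sketch below assumes that approach. The names `SpeechRecognizer`, `SpeechSynthesizer`, `CannedRecognizer`, `LoggingSynthesizer`, and `EchoApplication` are hypothetical and are not DiamondTalk's actual API. The stand-in engines only simulate recognition and synthesis; real ones would wrap vendor engines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical engine-neutral interfaces (not DiamondTalk's actual API).
interface SpeechRecognizer {
    /** Returns the best transcription of the given audio samples. */
    String recognize(short[] audio);
}

interface SpeechSynthesizer {
    /** Speaks the given text. */
    void speak(String text);
}

// Stand-in recognizer: ignores the audio and always returns a fixed phrase.
final class CannedRecognizer implements SpeechRecognizer {
    private final String phrase;
    CannedRecognizer(String phrase) { this.phrase = phrase; }
    @Override public String recognize(short[] audio) { return phrase; }
}

// Stand-in synthesizer: records what it would have spoken instead of producing audio.
final class LoggingSynthesizer implements SpeechSynthesizer {
    final List<String> spoken = new ArrayList<>();
    @Override public void speak(String text) { spoken.add(text); }
}

// Application code sees only the interfaces, so swapping engines means
// passing a different implementation to the constructor.
final class EchoApplication {
    private final SpeechRecognizer recognizer;
    private final SpeechSynthesizer synthesizer;

    EchoApplication(SpeechRecognizer recognizer, SpeechSynthesizer synthesizer) {
        this.recognizer = recognizer;
        this.synthesizer = synthesizer;
    }

    void handleAudio(short[] audio) {
        String heard = recognizer.recognize(audio);
        synthesizer.speak("You said: " + heard);
    }
}
```

`SpeechRecognizer` has a single abstract method, so a replacement engine can even be supplied as a lambda. For example, `new EchoApplication(audio -> "play music", tts)` runs the same application code against a different recognizer without any other changes.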

Technical Discussion:  DiamondTalk is based on the JavaBeans component architecture; components written in other programming languages can be integrated through the Java Native Interface (JNI). DiamondTalk also takes full advantage of Java's internationalization facilities, so it can be used, for example, to build interfaces in either English or Japanese. DiamondTalk components communicate by sending events (shown by the arrows in the figure above) and by querying each other's state (not shown). The dialogue manager is at the center of the architecture. It receives information about device state changes and about the user's actions, utterances, and state changes. Based on this information, it can update the device state and produce representations of utterance meaning, which it sends to the spoken-language generation component. Collagen (see project description) is the default implementation of the dialogue manager. The spoken-language understanding component is typically decomposed into a speech recognizer and a semantic analyzer; the spoken-language generation component is typically decomposed into a language generator and a speech synthesizer.
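
Event-based communication between JavaBeans components conventionally uses a `java.util.EventObject` subclass plus a listener interface extending `java.util.EventListener`. The sketch below applies that pattern to the flow just described. It is a hypothetical illustration, not DiamondTalk's code. `UtteranceEvent`, `MeaningEvent`, the listener interfaces, the keyword-matching "semantic analyzer," and the string meaning representations are all assumptions chosen for brevity.

```java
import java.util.ArrayList;
import java.util.EventListener;
import java.util.EventObject;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Event carrying the semantic analyzer's interpretation of a user utterance.
class UtteranceEvent extends EventObject {
    final String intent;
    UtteranceEvent(Object source, String intent) { super(source); this.intent = intent; }
}

// Event carrying a meaning representation destined for language generation.
class MeaningEvent extends EventObject {
    final String meaning;
    MeaningEvent(Object source, String meaning) { super(source); this.meaning = meaning; }
}

interface UtteranceListener extends EventListener {
    void utteranceUnderstood(UtteranceEvent e);
}

interface MeaningListener extends EventListener {
    void meaningProduced(MeaningEvent e);
}

// Understanding: takes a recognizer's transcription, runs a toy semantic
// analysis, and fires an UtteranceEvent to registered listeners.
class UnderstandingComponent {
    private final List<UtteranceListener> listeners = new CopyOnWriteArrayList<>();
    void addUtteranceListener(UtteranceListener l) { listeners.add(l); }
    void removeUtteranceListener(UtteranceListener l) { listeners.remove(l); }

    void transcriptionReceived(String text) {
        String intent = text.toLowerCase().contains("radio") ? "RADIO_ON" : "UNKNOWN";
        UtteranceEvent e = new UtteranceEvent(this, intent);
        for (UtteranceListener l : listeners) l.utteranceUnderstood(e);
    }
}

// Dialogue manager: updates (simulated) device state and emits a meaning
// representation for the generation component.
class DialogueManager implements UtteranceListener {
    private final List<MeaningListener> listeners = new CopyOnWriteArrayList<>();
    boolean radioOn = false; // stand-in for device state
    void addMeaningListener(MeaningListener l) { listeners.add(l); }

    @Override public void utteranceUnderstood(UtteranceEvent e) {
        String meaning;
        if ("RADIO_ON".equals(e.intent)) {
            radioOn = true;
            meaning = "CONFIRM(radio, on)";
        } else {
            meaning = "REQUEST_REPEAT";
        }
        MeaningEvent out = new MeaningEvent(this, meaning);
        for (MeaningListener l : listeners) l.meaningProduced(out);
    }
}

// Generation: a toy language generator; a real component would hand the
// resulting text to a speech synthesizer instead of storing it.
class GenerationComponent implements MeaningListener {
    final List<String> prompts = new ArrayList<>();
    @Override public void meaningProduced(MeaningEvent e) {
        String text = e.meaning.startsWith("CONFIRM")
                ? "OK, the radio is on."
                : "Sorry, please say that again.";
        prompts.add(text);
    }
}
```

Wiring is done by registering listeners, e.g. `nlu.addUtteranceListener(dm)` and `dm.addMeaningListener(nlg)`. Because each component knows only the listener interfaces, any one of them (such as the dialogue manager) can be replaced without touching the others.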

Contacts:
Bret Harsham

Technology Area:  Spoken Language Interfaces

Modification Date:  September 14, 2007