Context Sensitive Spoken Language Understanding using Role Dependent LSTM layers


Neural network models have recently become a focus of research in spoken language understanding (SLU). To understand speaker intentions accurately in a dialog, it is important to interpret each sentence in the context of the surrounding sequence of dialog turns. In this study, we use long short-term memory (LSTM) recurrent neural networks (RNNs) to train a context-sensitive model that predicts sequences of dialog concepts from spoken word sequences. In this model, the words of each utterance are input one at a time, and concept tags are output at the end of each utterance. The model is trained on human-to-human dialog data annotated with concept tags representing client and agent intentions for a hotel reservation task. The LSTM layers jointly represent both the context within each utterance and the context within the dialog. The different roles of client and agent are modeled by switching between role-dependent layers. To evaluate our models, we compared the label accuracies of logistic regression (LR) and LSTM models. The label accuracies were 70.8% for LR, 72.4% for LR with word2vec, 78.8% for context-sensitive LSTMs, and 84.0% for role-dependent LSTMs. These results confirm a significant improvement from using context-sensitive, role-dependent LSTMs.
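A minimal, hypothetical sketch of the role-switching idea in plain Python (standard library only) follows. It is not the authors' implementation. Several details are assumptions rather than facts from this work: a single LSTM cell per speaker role, a recurrent state `(h, c)` that is carried across turns so earlier turns condition later ones, a shared output layer with an independent sigmoid score per concept tag, and random untrained weights. The names `RoleDependentTagger`, `LSTMCell`, `toy_embed`, and the example tag names are all illustrative.

```python
import math
import random

# Hypothetical sketch: untrained random weights, pure-Python vector math.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply matrix W (a list of rows) by vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def toy_embed(word, dim=8):
    # Hypothetical stand-in for learned word embeddings: a fixed random vector per word.
    rng = random.Random(word)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

class LSTMCell:
    """Standard LSTM cell; gates are computed from [input; previous hidden state]."""

    def __init__(self, input_dim, hidden_dim, rng):
        d = input_dim + hidden_dim
        # One weight matrix and bias per gate: input (i), forget (f), output (o), candidate (g).
        self.W = {k: rand_matrix(hidden_dim, d, rng) for k in "ifog"}
        self.b = {k: [0.0] * hidden_dim for k in "ifog"}

    def step(self, x, h, c):
        z = x + h  # list concatenation: [x; h]
        pre = {k: [wx + bk for wx, bk in zip(matvec(self.W[k], z), self.b[k])] for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["g"]]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

class RoleDependentTagger:
    """Hypothetical role-dependent tagger: one LSTM cell per speaker role."""

    def __init__(self, embed_dim, hidden_dim, tags, roles=("client", "agent"), seed=0):
        rng = random.Random(seed)
        self.tags = list(tags)
        self.hidden_dim = hidden_dim
        # Separate parameters per role; the recurrent state itself is shared across turns.
        self.cells = {r: LSTMCell(embed_dim, hidden_dim, rng) for r in roles}
        # Assumed shared output layer with an independent sigmoid score per concept tag.
        self.W_out = rand_matrix(len(self.tags), hidden_dim, rng)
        self.b_out = [0.0] * len(self.tags)

    def tag_dialog(self, dialog, embed):
        """dialog: list of (role, [words]); returns one {tag: score} dict per utterance."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        results = []
        for role, words in dialog:
            cell = self.cells[role]  # switch layers according to who is speaking
            for w in words:  # words are fed one at a time
                h, c = cell.step(embed(w), h, c)
            # Tags are emitted only at the end of each utterance.
            logits = [s + b for s, b in zip(matvec(self.W_out, h), self.b_out)]
            results.append(dict(zip(self.tags, (sigmoid(v) for v in logits))))
        return results

dialog = [
    ("client", ["i", "need", "a", "room", "for", "two", "nights"]),
    ("agent", ["what", "date", "will", "you", "arrive"]),
]
tagger = RoleDependentTagger(embed_dim=8, hidden_dim=16, tags=["request", "question", "room"])
outputs = tagger.tag_dialog(dialog, toy_embed)
for (role, _), scores in zip(dialog, outputs):
    print(role, {t: round(p, 3) for t, p in scores.items()})
```

Because `h` and `c` are never reset between turns, the scores for the agent's turn depend on the client's preceding words. This gives the dialog-level context in the sketch. Changing an utterance's `role` changes which parameter set processes its words. A trained model would learn these weights from the annotated dialogs rather than drawing them at random.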