TR2020-029

Overview of the seventh Dialog System Technology Challenge: DSTC7

- D’Haro, L.F., Yoshino, K., Hori, C., Marks, T.K., Polymenakos, L., Kummerfeld, J.K., Galley, M., Gao, X., "Overview of the seventh Dialog System Technology Challenge: DSTC7", Computer Speech and Language, DOI: 10.1016/j.csl.2020.101068, Vol. 62, March 2020.
  @article{D’Haro2020mar,
    author = {D’Haro, Luis Fernando and Yoshino, Koichiro and Hori, Chiori and Marks, Tim K. and Polymenakos, Lazaros and Kummerfeld, Jonathan K. and Galley, Michel and Gao, Xiang},
    title = {{Overview of the seventh Dialog System Technology Challenge: DSTC7}},
    journal = {Computer Speech and Language},
    year = 2020,
    volume = 62,
    month = mar,
    doi = {10.1016/j.csl.2020.101068},
    url = {https://www.merl.com/publications/TR2020-029}
  }
MERL Contacts:
- Chiori Hori
- Tim K. Marks
Research Areas:

Artificial Intelligence, Computer Vision, Machine Learning, Speech & Audio

Abstract:

This paper provides detailed information about the seventh Dialog System Technology Challenge (DSTC7) and its three tracks, which aim to explore the problem of building robust and accurate end-to-end dialog systems. Specifically, DSTC7 focuses on developing and exploring end-to-end technologies for three pragmatic challenges: (1) sentence selection for multiple domains, (2) generation of informational responses grounded in external knowledge, and (3) audio-visual scene-aware dialog to allow conversations with users about objects and events around them. This paper summarizes the overall setup and results of DSTC7, including detailed descriptions of the different tracks, the provided datasets and annotations, and an overview of the submitted systems and their final results. For Track 1, LSTM-based models performed best across both datasets, allowing teams to effectively handle task variants in which no correct answer was present or multiple paraphrases were included. For Track 2, the best results came from RNN-based architectures augmented to incorporate facts through two types of encoders (a dialog encoder and a fact encoder), combined with attention mechanisms and a pointer-generator approach. Finally, for Track 3, the best model used hierarchical attention mechanisms to combine the text and vision information, obtaining a human rating score 22% better than that of the baseline LSTM system. More than 220 participants registered, and about 40 teams participated in the final challenge. At the one-day wrap-up workshop at AAAI-19, 32 scientific papers reporting the systems submitted to DSTC7 and 3 general technical papers on dialog technologies were presented. During the workshop, we reviewed the state-of-the-art systems, shared novel approaches to the DSTC7 tasks, and discussed future directions for the challenge (DSTC8).
