End-to-end spoken language understanding (SLU) predicts intent directly from
audio using a single model. It promises to improve the performance of assistant
systems by leveraging acoustic information lost in the intermediate textual
representation and preventing cascading errors from Automatic Speech
Recognition (ASR). Further, having one unified model has efficiency advantages
when deploying assistant systems on-device. However, the limited number of
public audio datasets with semantic parse labels hinders research progress
in this area. In this paper, we release the Spoken Task-Oriented semantic
Parsing (STOP) dataset, the largest and most complex publicly available SLU
dataset. Additionally, we define low-resource splits to establish a benchmark
for improving SLU when limited labeled data is available. Furthermore, in
addition to the human-recorded audio, we are releasing a TTS-generated version
to benchmark low-resource domain adaptation of end-to-end
SLU systems. Initial experiments show end-to-end SLU models performing
slightly worse than their cascaded counterparts, which we hope encourages
future work in this direction.