Personalized Predictive ASR for Latency Reduction in Voice Assistants

Abstract

Streaming Automatic Speech Recognition (ASR) in voice assistants can utilize prefetching to partially hide the latency of response generation. Prefetching involves passing a preliminary ASR hypothesis to downstream systems in order to prefetch and cache a response. If the final ASR hypothesis after endpoint detection matches the preliminary one, the cached response can be delivered to the user, thus saving latency. In this paper, we extend this idea by introducing predictive automatic speech recognition, where we predict the full utterance from a partially observed utterance, and prefetch the response based on the predicted utterance. We introduce two personalization approaches and investigate the tradeoff between potential latency gains from successful predictions and the cost increase from failed predictions. We evaluate our methods on an internal voice assistant dataset as well as the public SLURP dataset.

Comment: Accepted for Interspeech 202
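As a rough illustration of the prefetching flow summarized in the abstract, the sketch below caches a response keyed by the predicted full utterance and serves it only if the final hypothesis after endpoint detection matches. The names `predict_full_utterance` and `generate_response` are hypothetical placeholders, not components from the paper.

```python
# Minimal sketch of predictive prefetching, assuming a hypothetical utterance
# predictor and downstream response generator supplied by the caller.

def handle_partial(partial_hyp, cache, predict_full_utterance, generate_response):
    """Speculatively generate and cache a response for the predicted full utterance."""
    predicted = predict_full_utterance(partial_hyp)      # e.g. a (personalized) completion model
    if predicted not in cache:
        cache[predicted] = generate_response(predicted)  # cost is incurred even if the prediction fails
    return predicted

def handle_endpoint(final_hyp, cache, generate_response):
    """Serve the cached response on a match; otherwise fall back to fresh generation."""
    if final_hyp in cache:
        return cache[final_hyp]           # prediction succeeded: latency saved
    return generate_response(final_hyp)   # prediction failed: full latency, prefetch wasted
```

The tradeoff studied in the paper corresponds to the two branches of `handle_endpoint`: successful predictions hit the cache and hide generation latency, while failed predictions pay for an extra, discarded `generate_response` call.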
