Personalized Predictive ASR for Latency Reduction in Voice Assistants
Streaming Automatic Speech Recognition (ASR) in voice assistants can utilize
prefetching to partially hide the latency of response generation. Prefetching
involves passing a preliminary ASR hypothesis to downstream systems in order to
prefetch and cache a response. If the final ASR hypothesis after endpoint
detection matches the preliminary one, the cached response can be delivered to
the user, thus reducing latency. In this paper, we extend this idea by
introducing predictive automatic speech recognition, where we predict the full
utterance from a partially observed utterance, and prefetch the response based
on the predicted utterance. We introduce two personalization approaches and
investigate the tradeoff between potential latency gains from successful
predictions and the cost increase from failed predictions. We evaluate our
methods on an internal voice assistant dataset as well as the public SLURP
dataset.
Comment: Accepted for Interspeech 202
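The prefetching scheme described above, extended with prediction, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The `PrefetchCache` class, the exact-string match rule, and the `predict_from_history` predictor (which picks the user's most frequent past utterance that begins with the partial hypothesis) are all hypothetical stand-ins. The abstract does not specify how its two personalization approaches work.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class PrefetchCache:
    """Hypothetical cache of downstream responses keyed by the hypothesis that produced them."""
    fetch: Callable[[str], str]              # stand-in for the downstream response generator
    entries: Dict[str, str] = field(default_factory=dict)
    downstream_calls: int = 0                # each call is a cost, whether or not it is used

    def prefetch(self, hypothesis: str) -> None:
        # Send an early (preliminary or predicted) hypothesis downstream and cache the response.
        if hypothesis not in self.entries:
            self.entries[hypothesis] = self.fetch(hypothesis)
            self.downstream_calls += 1

    def resolve(self, final_hypothesis: str) -> Tuple[str, bool]:
        # After endpoint detection: reuse the cached response on an exact match (latency saved),
        # otherwise fall back to a fresh downstream call (the prefetch was wasted).
        if final_hypothesis in self.entries:
            return self.entries[final_hypothesis], True
        response = self.fetch(final_hypothesis)
        self.downstream_calls += 1
        return response, False


def predict_from_history(partial: str, history: List[str]) -> Optional[str]:
    """Hypothetical personalized predictor: the user's most frequent past utterance
    that starts with the partial hypothesis, or None if there is no such utterance."""
    candidates = Counter(u for u in history if u.startswith(partial))
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]
```

Counting `downstream_calls` makes the tradeoff visible. A correct prediction serves the final request from the cache after a single downstream call. A wrong prediction still pays for the prefetch and then issues a second call for the actual final hypothesis.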