CUSIDE: Chunking, Simulating Future Context and Decoding for Streaming ASR

An, Keyu; Ding, Ke; Ou, Zhijian; Wan, Guanglu; Xiang, Hongyu; Zheng, Huahuan

Publication date: 30 March 2022

Abstract

History and future contextual information are known to be important for accurate acoustic modeling. However, acquiring future context brings latency for streaming ASR. In this paper, we propose a new framework, Chunking, Simulating Future Context and Decoding (CUSIDE), for streaming speech recognition. A new simulation module is introduced to recursively simulate the future contextual frames, without waiting for future context. The simulation module is jointly trained with the ASR model using a self-supervised loss; the ASR model is optimized with the usual ASR loss, e.g., CTC-CRF as used in our experiments. Experiments show that, compared to using real future frames as right context, using simulated future context can drastically reduce latency while maintaining recognition accuracy. With CUSIDE, we obtain new state-of-the-art streaming ASR results on the AISHELL-1 dataset.

Comment: submitted to INTERSPEECH 202
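
The data flow described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation and rests on several assumptions: frames are plain lists of floats; the learned simulation module is replaced by a hypothetical linear-extrapolation predictor (`linear_extrapolate`); the self-supervised loss is assumed to be an L1 distance between simulated and real future frames; and all function names (`simulate_future`, `chunk_with_context`, `chunk_ready_frame`, `l1_loss`) are hypothetical. It shows chunking with left (history) and right (future) context, recursive simulation of the right context, and why simulation removes the wait for future frames.

```python
from typing import Callable, List, Sequence

Frame = List[float]


def linear_extrapolate(history: Sequence[Frame]) -> Frame:
    """Hypothetical stand-in for the learned simulation network:
    predicts the next frame by extrapolating the last two frames."""
    if len(history) < 2:
        return list(history[-1])
    prev, last = history[-2], history[-1]
    return [2 * b - a for a, b in zip(prev, last)]


def simulate_future(chunk: Sequence[Frame], num_frames: int,
                    predictor: Callable[[Sequence[Frame]], Frame] = linear_extrapolate
                    ) -> List[Frame]:
    """Recursively simulate num_frames future frames: each prediction is
    appended to the context and fed back to predict the next frame."""
    context = [list(f) for f in chunk]
    simulated: List[Frame] = []
    for _ in range(num_frames):
        nxt = predictor(context)
        simulated.append(nxt)
        context.append(nxt)
    return simulated


def l1_loss(simulated: Sequence[Frame], real: Sequence[Frame]) -> float:
    """Assumed self-supervised loss: mean absolute error between
    simulated and real future frames."""
    total = sum(abs(s - r) for fs, fr in zip(simulated, real) for s, r in zip(fs, fr))
    count = sum(len(fs) for fs in simulated)
    return total / count


def chunk_with_context(frames: List[Frame], chunk_size: int, left: int,
                       right: int, simulate: bool) -> List[List[Frame]]:
    """Split frames into chunks; each model input is history + chunk + right
    context, where the right context is either real or simulated."""
    inputs = []
    for start in range(0, len(frames), chunk_size):
        chunk = frames[start:start + chunk_size]
        history = frames[max(0, start - left):start]
        if simulate:
            future = simulate_future(chunk, right)
        else:
            future = frames[start + chunk_size:start + chunk_size + right]
        inputs.append(history + chunk + future)
    return inputs


def chunk_ready_frame(chunk_index: int, chunk_size: int, right: int,
                      simulate: bool) -> int:
    """Number of input frames that must have arrived before the given
    chunk can be processed."""
    end = (chunk_index + 1) * chunk_size
    return end if simulate else end + right


if __name__ == "__main__":
    frames = [[float(t)] for t in range(12)]  # toy 1-D "features"
    print(simulate_future(frames[:4], 2))      # [[4.0], [5.0]]
    print(chunk_ready_frame(0, 4, 2, simulate=False),
          chunk_ready_frame(0, 4, 2, simulate=True))  # 6 4
```

With real right context of R frames, every chunk must wait for R additional frames before it can be processed; with simulated right context, a chunk can be processed as soon as its own frames arrive, which is the latency saving the abstract refers to. On this perfectly linear toy signal the extrapolated frames match the real ones exactly, so the assumed L1 loss is zero; a learned simulator on real speech features would only approximate them.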


Available versions: arXiv.org e-Print Archive (oai:arXiv.org:2203.16758)

Last updated on 24/04/2022