Blank Collapse: Compressing CTC emission for the faster decoding
The Connectionist Temporal Classification (CTC) model is a very efficient method
for modeling sequences, especially speech data. To use a CTC model for
Automatic Speech Recognition (ASR), beam search decoding with an external
language model, such as an n-gram LM, is necessary to obtain reasonable
results. In this paper we analyze the blank label in CTC beam search in depth
and propose a very simple method that reduces the amount of computation,
resulting in faster beam search decoding. With this method, we achieve up to
78% faster decoding than ordinary beam search decoding, with a very small loss
of accuracy on the LibriSpeech datasets. We show this method is effective not
only practically, through experiments, but also theoretically, through
mathematical reasoning. We also observe that this reduction is more pronounced
when the accuracy of the model is higher.

Comment: Accepted at Interspeech 202
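The core idea described above, dropping emission frames that are dominated by the blank label before running beam search, can be sketched as follows. This is a minimal illustration of the general idea, not the paper's exact algorithm; the function name, the `(T, V)` log-probability layout, the blank index, and the threshold value are all assumptions.

```python
import numpy as np

def blank_collapse(emission, blank_id=0, threshold=0.99):
    """Compress a CTC emission by discarding frames whose blank
    probability exceeds `threshold` (sketch of the idea only).

    emission: (T, V) array of per-frame log-probabilities.
    Returns the compressed (T', V) emission, T' <= T.
    """
    probs = np.exp(emission)              # back to probabilities
    keep = probs[:, blank_id] < threshold # frames worth decoding
    return emission[keep]

# Hypothetical usage: 4 frames, 3 labels (blank is index 0).
probs = np.array([[0.95, 0.03, 0.02],
                  [0.50, 0.30, 0.20],
                  [0.99, 0.005, 0.005],
                  [0.10, 0.40, 0.50]])
compressed = blank_collapse(np.log(probs), blank_id=0, threshold=0.9)
```

Because beam search cost grows with the number of frames, shrinking the emission this way directly reduces decoding time, which matches the abstract's observation that more accurate models (whose blank frames are more confidently blank) benefit more.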