Segmentation for continuous Automatic Speech Recognition (ASR) has
traditionally relied on silence timeouts or voice activity detectors (VADs),
both of which are limited to acoustic features. This segmentation is often
overly aggressive, given that people naturally pause to think as they speak.
Consequently, segmentation happens mid-sentence, hindering both punctuation and
downstream tasks like machine translation for which high-quality segmentation
is critical. Model-based segmentation methods that leverage acoustic features
are powerful, but without an understanding of the language itself, these
approaches are limited. We present a hybrid approach that leverages both
acoustic and language information to improve segmentation. Furthermore, we show
that including one word as a look-ahead boosts segmentation quality. On
average, our models improve the segmentation F0.5 score by 9.8% over the baseline. We
show that this approach works for multiple languages. For the downstream task
of machine translation, it improves the BLEU score by an average of
1.05 points.