111 research outputs found

Four-in-One: A Joint Approach to Inverse Text Normalization, Punctuation, Capitalization, and Disfluency for Automatic Speech Recognition

Author: Alphonso Issac
Behre Piyush
Chang Shuangyu
Kibre Nick
Tan Sharman
Publication date: 26/10/2022

Features such as punctuation, capitalization, and formatting of entities are important for readability, understanding, and natural language processing tasks. However, Automatic Speech Recognition (ASR) systems produce spoken-form text devoid of formatting, and tagging approaches to formatting address just one or two features at a time. In this paper, we unify spoken-to-written text conversion via a two-stage process: First, we use a single transformer tagging model to jointly produce token-level tags for inverse text normalization (ITN), punctuation, capitalization, and disfluencies. Then, we apply the tags to generate written-form text and use weighted finite state transducer (WFST) grammars to format tagged ITN entity spans. Despite joining four models into one, our unified tagging approach matches or outperforms task-specific models across all four tasks on benchmark test sets across several domains.

arXiv.org e-Print Archive
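
The second stage described in the abstract above (applying token-level tags to produce written-form text) can be illustrated with a minimal sketch. The `TaggedToken` fields and the `apply_tags` function are hypothetical names chosen for illustration, not the authors' tag schema or code, and the WFST-based formatting of ITN entity spans is left out entirely.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaggedToken:
    """Hypothetical per-token tags; not the paper's actual tag set."""
    text: str                # spoken-form token
    punct: str = ""          # punctuation mark to append, if any
    cap: bool = False        # whether to capitalize the first letter
    disfluent: bool = False  # whether to drop the token from the output


def apply_tags(tokens: List[TaggedToken]) -> str:
    """Turn tagged spoken-form tokens into written-form text."""
    words = []
    for tok in tokens:
        if tok.disfluent:
            continue  # disfluencies are removed, along with any tags on them
        word = tok.text[:1].upper() + tok.text[1:] if tok.cap else tok.text
        words.append(word + tok.punct)
    return " ".join(words)


example = [
    TaggedToken("um", disfluent=True),
    TaggedToken("hello", cap=True, punct=","),
    TaggedToken("world", punct="."),
]
print(apply_tags(example))  # Hello, world.
```

In this sketch a single pass over the tokens handles three of the four tasks; an ITN step would additionally need to rewrite tagged spans (for example number words) before joining.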

Automatic truecasing of video subtitles using BERT: a multilingual adaptable approach

Author: Batista F.
Nuno Miguel Guerreiro
Ricardo Rei
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

This paper describes an approach for automatic capitalization of text without case information, such as spoken transcripts of video subtitles, produced by automatic speech recognition systems. Our approach is based on pre-trained contextualized word embeddings, requires only a small portion of data for training when compared with traditional approaches, and is able to achieve state-of-the-art results. The paper reports experiments both on general written data from the European Parliament, and on video subtitles, revealing that the proposed approach is suitable for performing capitalization, not only in each one of the domains, but also in a cross-domain scenario. We have also created a versatile multilingual model, and the conducted experiments show that good results can be achieved both for monolingual and multilingual data. Finally, we applied domain adaptation by fine-tuning models, initially trained on general written data, on video subtitles, revealing gains over other approaches not only in performance but also in terms of computational cost.

Crossref

Repositório Institucional do ISCTE-IUL

Automatic punctuation restoration with BERT models

Author: Bial Bence
Nagy Attila
Ács Judit
Publication date: 01/01/2021

We present an approach for automatic punctuation restoration with BERT models for English and Hungarian. For English, we conduct our experiments on TED Talks, a commonly used benchmark for punctuation restoration, while for Hungarian we evaluate our models on the Szeged Treebank dataset. Our best models achieve a macro-averaged F1-score of 79.8 in English and 82.2 in Hungarian. Our code is publicly available.

University of Szeged
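
For readers unfamiliar with the metric reported in the abstract above, here is a minimal sketch of a macro-averaged F1 computation over per-token punctuation labels. The label names, the toy data, and the choice to average only over punctuation classes (excluding the no-punctuation label "O") are assumptions for illustration; the abstract does not state which classes the authors average over.

```python
from typing import List, Sequence


def macro_f1(gold: Sequence[str], pred: Sequence[str], labels: List[str]) -> float:
    """Unweighted mean of per-label F1 scores over the given labels."""
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)


# Toy example: one label per token, "O" meaning no punctuation follows.
gold = ["O", ",", "O", "."]
pred = ["O", ",", ",", "."]
score = macro_f1(gold, pred, labels=[",", "."])
print(round(score, 4))
```

Because each class contributes equally to the mean, rare punctuation marks weigh as much as frequent ones, which is why the macro average is a common choice for imbalanced label sets.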

Text Injection for Capitalization and Turn-Taking Prediction in Speech Models

Author: Bijwadia Shaan
Chang Shuo-yiin
Meng Zhong
Sainath Tara N.
Wang Weiran
Zhang Hao
Publication date: 14/08/2023

Text injection for automatic speech recognition (ASR), wherein unpaired text-only data is used to supplement paired audio-text data, has shown promising improvements for word error rate. This study examines the use of text injection for auxiliary tasks, which are the non-ASR tasks often performed by an E2E model. In this work, we use joint end-to-end and internal language model training (JEIT) as our text injection algorithm to train an ASR model which performs two auxiliary tasks. The first is capitalization, which is a de-normalization task. The second is turn-taking prediction, which attempts to identify whether a user has completed their conversation turn in a digital assistant interaction. We show results demonstrating that our text injection method boosts capitalization performance for long-tail data, and improves turn-taking detection recall.

arXiv.org e-Print Archive

Punctuation Prediction for Norwegian: Using Established Approaches for Under-Resourced Languages

Author: Prestegard Guro Sivertsen
Publication venue: The University of Bergen
Publication date: 01/01/2021

Master's thesis in information science (INFO390, MASV-INF)

University of Bergen

NORA - Norwegian Open Research Archives