    Neural Transducer Training: Reduced Memory Consumption with Sample-wise Computation

    The neural transducer is an end-to-end model for automatic speech recognition (ASR). While the model is well-suited for streaming ASR, the training process remains challenging. During training, the memory requirements may quickly exceed the capacity of state-of-the-art GPUs, limiting batch size and sequence lengths. In this work, we analyze the time and space complexity of a typical transducer training setup. We propose a memory-efficient training method that computes the transducer loss and gradients sample by sample. We present optimizations to increase the efficiency and parallelism of the sample-wise method. In a set of thorough benchmarks, we show that our sample-wise method significantly reduces memory usage, and performs at competitive speed when compared to the default batched computation. As a highlight, we manage to compute the transducer loss and gradients for a batch size of 1024, and audio length of 40 seconds, using only 6 GB of memory. Comment: 5 pages, 4 figures, 1 table, 1 algorithm
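    The core idea above — trading batched computation for a per-sample loop so that peak memory scales with one sample rather than the whole batch — can be sketched in a few lines. The paper's actual loss is the transducer (RNN-T) loss over a (time × label) lattice; as a hedged stand-in, the sketch below uses a simple quadratic loss, and the function names are illustrative, not the authors' API. The key invariant is that the accumulated sample-wise gradient matches the batched gradient exactly.

    ```python
    import numpy as np

    def batched_loss_grad(W, X, Y):
        # Batched: materializes predictions and residuals for the
        # whole batch at once -> peak memory grows with batch size.
        P = X @ W                       # (B, T) predictions
        diff = P - Y
        loss = 0.5 * np.sum(diff ** 2) / diff.size
        grad = X.T @ diff / diff.size   # dL/dW, shape (D, T)
        return loss, grad

    def samplewise_loss_grad(W, X, Y):
        # Sample-wise: process one sample at a time and accumulate
        # loss and gradient -> peak memory is one sample's worth of
        # intermediates, independent of batch size.
        total_loss = 0.0
        grad = np.zeros_like(W)
        n = Y.size
        for x, y in zip(X, Y):
            diff = x @ W - y                  # (T,) residual for one sample
            total_loss += 0.5 * np.sum(diff ** 2)
            grad += np.outer(x, diff)         # this sample's dL/dW contribution
        return total_loss / n, grad / n
    ```

    Because the gradients are mathematically identical, the trade-off is purely time (a Python-level loop) versus space — which is why the paper's optimizations focus on recovering parallelism within the sample-wise scheme.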

    The CMU GALE Speech-to-Text System

    This paper describes the latest Speech-to-Text system developed for the Global Autonomous Language Exploitation ("GALE") domain by Carnegie Mellon University (CMU). This system uses discriminative training, bottle-neck features and other techniques that were not used in previous versions of our system, and is trained on 1150 hours of data from a variety of Arabic speech sources. In this paper, we show how different lexica, pre-processing, and system combination techniques can be used to improve the final output, and provide analysis of the improvements achieved by the individual techniques.

    Eleven generations of selection for the duration of fertility in the intergeneric crossbreeding of ducks

    A 12-generation selection experiment involving a selected line (S) and a control line (C) has been conducted since 1992 with the aim of increasing the number of fertile eggs laid by the Brown Tsaiya duck after a single artificial insemination (AI) with pooled Muscovy semen. On average, 28.9% of the females and 17.05% of the males were selected. The selection responses and the predicted responses showed similar trends. The average predicted genetic responses per generation, in genetic standard deviation units, were 0.40 for the number of fertile eggs, 0.45 for the maximum duration of fertility, and 0.32 for the number of hatched mule ducklings. The fertility rates for days 2–8 after AI were 89.14% in the S line and 61.46% in the C line. Embryo viability was not impaired by this selection. The largest increase in fertility rate per day after a single AI was observed from d5 to d11. In G12, the fertility rate in the selected line was 91% at d2, 94% at d3, 92% at days 4 and 5, then decreased to 81% at d8, 75% at d9, 58% at d10 and 42% at d11. In contrast, the fertility rate in the control line showed an abrupt decrease from d4 (74%). The same tendencies were observed for the hatchability of eggs set. It was concluded that selection for the number of fertile eggs after a single AI with pooled Muscovy semen could effectively increase the duration of the fertile period in ducks, and that research should now be focused on ways to improve the viability of the hybrid mule duck embryo.

    Variable Attention Masking for Configurable Transformer Transducer Speech Recognition

    This work studies the use of attention masking in transformer transducer based speech recognition for building a single configurable model for different deployment scenarios. We present a comprehensive set of experiments comparing fixed masking, where the same attention mask is applied at every frame, with chunked masking, where the attention mask for each frame is determined by chunk boundaries, in terms of recognition accuracy and latency. We then explore the use of variable masking, where the attention masks are sampled from a target distribution at training time, to build models that can work in different configurations. Finally, we investigate how a single configurable model can be used to perform both first pass streaming recognition and second pass acoustic rescoring. Experiments show that chunked masking achieves a better accuracy vs latency trade-off compared to fixed masking, both with and without FastEmit. We also show that variable masking improves the accuracy by up to 8% relative in the acoustic re-scoring scenario. Comment: 5 pages, 4 figures, 2 tables
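    The chunked masking described above can be made concrete with a small mask-construction sketch. This is an assumed, minimal formulation (the function name, `chunk_size`, and `num_left_chunks` parameters are illustrative, not taken from the paper): each frame may attend to keys in its own chunk and in a fixed number of preceding chunks, which bounds latency by the chunk size.

    ```python
    import numpy as np

    def chunked_attention_mask(seq_len, chunk_size, num_left_chunks=1):
        # Boolean (seq_len, seq_len) mask; True = attention allowed.
        # A query frame in chunk q may attend to key frames in chunks
        # [q - num_left_chunks, q] -- its own chunk plus limited left context.
        idx = np.arange(seq_len)
        chunk_id = idx // chunk_size
        q = chunk_id[:, None]   # chunk index of each query frame
        k = chunk_id[None, :]   # chunk index of each key frame
        return (k <= q) & (k >= q - num_left_chunks)
    ```

    Under this formulation, "variable masking" would amount to sampling `chunk_size` (and possibly `num_left_chunks`) from a target distribution at each training step, so the same weights learn to operate across the latency configurations seen at deployment.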