Bridging the Granularity Gap for Acoustic Modeling
While the Transformer has become the de-facto standard for speech, modeling on
fine-grained frame-level features remains an open challenge: it is difficult to
capture long-distance dependencies and to distribute the attention weights
sensibly. We propose \textit{Progressive Down-Sampling} (PDS), which gradually
compresses the acoustic features into coarser-grained units carrying more
complete semantic information, akin to a text-level representation. In
addition, we develop a representation fusion method to alleviate the
information loss that inevitably occurs under high compression. In this way, we
compress the acoustic features to 1/32 of their initial length while achieving
better or comparable performance on the speech recognition task. As a bonus,
this yields inference speedups ranging from 1.20× to 1.47×. By reducing the
modeling burden, we also achieve competitive results when training on the more
challenging speech translation task.
Comment: ACL 2023 Findings
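The abstract above does not give implementation details; as a rough illustration of the idea, the sketch below progressively halves the time axis of a frame-level feature sequence five times, reaching the 1/32 length mentioned in the abstract, and keeps the intermediate sequences that a fusion step could draw on. It uses plain average pooling in place of the paper's learned down-sampling, and all function names are hypothetical.

```python
import numpy as np

def avg_pool_1d(x, stride=2):
    """Average-pool a (T, d) feature sequence along time with the given stride.
    Frames that do not fill a complete window are dropped."""
    T, d = x.shape
    T_out = T // stride
    return x[:T_out * stride].reshape(T_out, stride, d).mean(axis=1)

def progressive_downsample(features, stages=5):
    """Halve the time axis at each stage; 5 stages give a 1/32-length sequence.
    Returns the final coarse sequence plus all intermediates, which a
    representation-fusion step could combine to recover lost detail."""
    intermediates = [features]
    x = features
    for _ in range(stages):
        x = avg_pool_1d(x, stride=2)
        intermediates.append(x)
    return x, intermediates

# 320 frames of 80-dim filterbank-style features -> 10 coarse units
frames = np.random.randn(320, 80)
coarse, inters = progressive_downsample(frames)
print(coarse.shape)  # (10, 80)
```

In the paper the compression is interleaved with Transformer layers and learned end-to-end; this sketch only shows the length bookkeeping.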
GFM-Voc: A real-time voice quality modification system
An investigation of speaker independent phrase break models in End-to-End TTS systems
This paper presents our work on phrase break prediction in the context of
end-to-end TTS systems, motivated by the following questions: (i) is there any
utility in incorporating an explicit phrasing model in an end-to-end TTS
system? and (ii) how do you evaluate the effectiveness of a phrasing model in
an end-to-end TTS system? In particular, the utility and effectiveness of
phrase break prediction models are evaluated in the context of children's
story synthesis, using listener comprehension. We show by means of perceptual
listening evaluations that there is a clear preference for stories synthesized
after predicting the location of phrase breaks using a trained phrasing model,
over stories synthesized directly without predicting the location of phrase
breaks.
Comment: Submitted for review to IEEE Access
Automatic vocalisation-based detection of fragile X syndrome and Rett syndrome
Fragile X syndrome (FXS) and Rett syndrome (RTT) are developmental disorders currently not diagnosed before toddlerhood. Even though speech-language deficits are among the key symptoms of both conditions, little is known about infant vocalisation acoustics for an automatic earlier identification of affected individuals. To bridge this gap, we applied intelligent audio analysis methodology to a compact dataset of 4454 home-recorded vocalisations of 3 individuals with FXS and 3 individuals with RTT aged 6 to 11 months, as well as 6 age- and gender-matched typically developing controls (TD). On the basis of a standardised set of 88 acoustic features, we trained linear-kernel support vector machines to evaluate the feasibility of automatic classification of (a) FXS vs TD, (b) RTT vs TD, (c) atypical development (FXS+RTT) vs TD, and (d) FXS vs RTT vs TD. In paradigms (a)–(c), all infants were correctly classified; in paradigm (d), 9 of the 12 infants were. Spectral/cepstral and energy-related features were most relevant for classification across all paradigms. Despite the small sample size, this study reveals new insights into early vocalisation characteristics in FXS and RTT, and provides technical underpinnings for a future earlier identification of affected individuals, enabling earlier intervention and family counselling.
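The pipeline in the abstract above is standard: extract a fixed-length acoustic feature vector per vocalisation, standardise, and train a linear-kernel SVM. As a minimal self-contained sketch, the code below trains a linear SVM with a hand-rolled Pegasos-style subgradient loop on synthetic 88-dimensional vectors standing in for the standardised feature set (the paper's actual feature extraction and evaluation protocol are not reproduced here; all names and the toy data are illustrative).

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style subgradient training of a linear SVM.
    X: (n, d) standardised feature matrix; y: labels in {-1, +1}.
    Returns weight vector w and bias b."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            if y[i] * (X[i] @ w + b) < 1:  # hinge loss is active
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
                b += eta * y[i]
            else:                          # only regularisation shrinks w
                w = (1 - eta * lam) * w
    return w, b

# Toy stand-in for 88-dimensional acoustic feature vectors:
# one class shifted in mean so the problem is linearly separable.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(20, 88)),
               rng.normal(0.8, 1.0, size=(20, 88))])
y = np.array([-1] * 20 + [1] * 20)

# standardise each feature, as is common before SVM training
X = (X - X.mean(axis=0)) / X.std(axis=0)
w, b = train_linear_svm(X, y)
train_acc = np.mean(np.sign(X @ w + b) == y)
```

In practice one would use an established SVM implementation and leave-one-subject-out cross-validation, which matters greatly at this sample size; the sketch only shows the classifier itself.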