Search CORE

14,356 research outputs found

Machine Assisted Analysis of Vowel Length Contrasts in Wolof

Author: Besacier Laurent
Gauthier Elodie
Voisin Sylvie
Publication venue
Publication date: 01/06/2017
Field of study

Growing digital archives and improving algorithms for automatic analysis of text and speech create new research opportunities for fundamental research in phonetics. Such empirical approaches allow statistical evaluation of a much larger set of hypothesis about phonetic variation and its conditioning factors (among them geographical / dialectal variants). This paper illustrates this vision and proposes to challenge automatic methods for the analysis of a not easily observable phenomenon: vowel length contrast. We focus on Wolof, an under-resourced language from Sub-Saharan Africa. In particular, we propose multiple features to make a fine evaluation of the degree of length contrast under different factors such as: read vs semi spontaneous speech ; standard vs dialectal Wolof. Our measures made fully automatically on more than 20k vowel tokens show that our proposed features can highlight different degrees of contrast for each vowel considered. We notably show that contrast is weaker in semi-spontaneous speech and in a non standard semi-spontaneous dialect.Comment: Accepted to Interspeech 201

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

HAL

Recommended from our members

Parallels in the sequential organization of birdsong and human speech.

Author: Gentner Timothy Q
Sainburg Tim
Theilman Brad
Thielk Marvin
Publication venue: eScholarship, University of California
Publication date: 01/08/2019
Field of study

Human speech possesses a rich hierarchical structure that allows for meaning to be altered by words spaced far apart in time. Conversely, the sequential structure of nonhuman communication is thought to follow non-hierarchical Markovian dynamics operating over only short distances. Here, we show that human speech and birdsong share a similar sequential structure indicative of both hierarchical and Markovian organization. We analyze the sequential dynamics of song from multiple songbird species and speech from multiple languages by modeling the information content of signals as a function of the sequential distance between vocal elements. Across short sequence-distances, an exponential decay dominates the information in speech and birdsong, consistent with underlying Markovian processes. At longer sequence-distances, the decay in information follows a power law, consistent with underlying hierarchical processes. Thus, the sequential organization of acoustic elements in two learned vocal communication signals (speech and birdsong) shows functionally equivalent dynamics, governed by similar processes

eScholarship - University of California

Are words easier to learn from infant- than adult-directed speech? A quantitative corpus-based investigation

Author: Cristia Alejandrina
Dupoux Emmanuel
Guevara-Rukoz Adriana
Ludusan Bogdan
Martin Andrew
Mazuka Reiko
Thiollière Roland
Publication venue
Publication date: 23/12/2017
Field of study

We investigate whether infant-directed speech (IDS) could facilitate word form learning when compared to adult-directed speech (ADS). To study this, we examine the distribution of word forms at two levels, acoustic and phonological, using a large database of spontaneous speech in Japanese. At the acoustic level we show that, as has been documented before for phonemes, the realizations of words are more variable and less discriminable in IDS than in ADS. At the phonological level, we find an effect in the opposite direction: the IDS lexicon contains more distinctive words (such as onomatopoeias) than the ADS counterpart. Combining the acoustic and phonological metrics together in a global discriminability score reveals that the bigger separation of lexical categories in the phonological space does not compensate for the opposite effect observed at the acoustic level. As a result, IDS word forms are still globally less discriminable than ADS word forms, even though the effect is numerically small. We discuss the implication of these findings for the view that the functional role of IDS is to improve language learnability.Comment: Draf

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

Language modeling and transcription of the TED corpus lectures

Author: Cettolo M.
Federico M.
Leeuwis E.
Publication venue: IEEE
Publication date: 01/01/2003
Field of study

Transcribing lectures is a challenging task, both in acoustic and in language modeling. In this work, we present our first results on the automatic transcription of lectures from the TED corpus, recently released by ELRA and LDC. In particular, we concentrated our effort on language modeling. Baseline acoustic and language models were developed using respectively 8 hours of TED transcripts and various types of texts: conference proceedings, lecture transcripts, and conversational speech transcripts. Then, adaptation of the language model to single speakers was investigated by exploiting different kinds of information: automatic transcripts of the talk, the title of the talk, the abstract and, finally, the paper. In the last case, a 39.2% WER was achieved

Archivio della ricerca - Fondazione Bruno Kessler

University of Twente Research Information

Transfer learning of language-independent end-to-end ASR with language model fusion

Author: Baskar Murali Karthick
Cho Jaejin
Inaguma Hirofumi
Kawahara Tatsuya
Watanabe Shinji
Publication venue
Publication date: 07/05/2019
Field of study

This work explores better adaptation methods to low-resource languages using an external language model (LM) under the framework of transfer learning. We first build a language-independent ASR system in a unified sequence-to-sequence (S2S) architecture with a shared vocabulary among all languages. During adaptation, we perform LM fusion transfer, where an external LM is integrated into the decoder network of the attention-based S2S model in the whole adaptation stage, to effectively incorporate linguistic context of the target language. We also investigate various seed models for transfer learning. Experimental evaluations using the IARPA BABEL data set show that LM fusion transfer improves performances on all target five languages compared with simple transfer learning when the external text data is available. Our final system drastically reduces the performance gap from the hybrid systems.Comment: Accepted at ICASSP201

arXiv.org e-Print Archive

Crossref

A corpus-based study of Spanish L2 mispronunciations by Japanese speakers

Author: Carranza Mario
Cucchiarini Catia
Llisterri Joaquim
Machuca Ayuso María Jesús
Ríos Antonio
Publication venue: 'IATED Academy'
Publication date: 01/01/2014
Field of study

In a companion paper (Carranza et al.) submitted to this conference we discuss the importance of collecting specific L1-L2 speech corpora for the sake of developing effective Computer Assisted Pronunciation Training (CAPT) programs. In this paper we examine this point more deeply by reporting on a study that was aimed at compiling and analysing such a corpus to draw up an inventory of recurrent pronunciation errors to be addressed in a CAPT application that makes use of Automatic Speech Recognition (ASR). In particular we discuss some of the results obtained in the analyses of this corpus and some of the methodological issues we had to deal with. The corpus features 8.9 hours of spontaneous, semi-spontaneous and read speech recorded from 20 Japanese students of Spanish L2. The speech data was segmented and transcribed at the orthographic, canonical-phonemic and narrow-phonetic level using Praat software [1]. We adopted the SAMPA phonemic inventory for the phonemic transcription adapted to Spanish [2] and added 11 new symbols and 7 diacritics taken from X-SAMPA [3] for the narrow-phonetic transcription. Non linguistic phenomena and incidents were also annotated with XML tags in independent tiers. Standards for transcribing and annotating non-native spontaneous speech ([4], [5]), as well as the error encoding system used in the project will be addressed. Up to 13410 errors were segmented, aligned with the canonical-phonemic tier and the narrow-phonetic tier, and annotated following an encoding system that specifies the type of error (substitutions, insertion and deletion), the affected phone and the preceding and following phonemic contexts where the error occurred. We then carried out additional analyses to check the accuracy of the transcriptions by asking two other annotators to transcribe a subset of the speech material. We calculated intertranscriber agreement coefficients. The data was automatically recovered by Praat scripts and statistically analyzed with R. The resulting frequency ratios obtained for the most frequent errors and the most frequent contexts of appearance were statistically tested to determine their significance values. We report on the analyses of the combined annotations and draw up an inventory of errors that should be addressed in the training. We then consider how ASR can be employed to properly detect these errors. Furthermore, we suggest possible exercises that may be included in the training to improve the errors identified

Diposit Digital de Documents de la UAB