
    The role of input frequency in the acquisition of Spanish phonology as an L1: a corpus-based study

    This study presents the phonological system exhibited by children (n=59) aged 3;0 to 6;0 and focuses on the role of input frequency. Using a spontaneous child speech corpus of Spanish (CHIEDE) as a data source, together with computational processing techniques (including an automatic phonological transcriber), data relating to the phonological level were retrieved. This resulted in a phonological inventory of Spanish-speaking children, ordered by frequency of use, which may serve as a model for research on typical and atypical child language development. Additionally, the stability of the participants' phonological systems was studied by calculating the variability displayed by the different age groups, and the outcomes were compared with other similar corpora. Comparison of the children's phonological inventory with the adult one shows a relationship between frequency of use in adult speech and the order in which phonemes are acquired.
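    A minimal sketch of how a frequency-ordered phoneme inventory can be computed once utterances have been reduced to phoneme sequences. The toy data and the phoneme_inventory helper below are illustrative assumptions, not the CHIEDE pipeline or the authors' automatic transcriber.

        # Build a frequency-ordered phoneme inventory from phonologically
        # transcribed utterances (each utterance is a list of phoneme symbols).
        # Toy data only; the real study derives sequences with an automatic
        # phonological transcriber applied to the CHIEDE corpus.
        from collections import Counter

        def phoneme_inventory(transcribed_utterances):
            """Return (phoneme, relative_frequency) pairs, most frequent first."""
            counts = Counter()
            for phonemes in transcribed_utterances:
                counts.update(phonemes)
            total = sum(counts.values())
            return [(ph, n / total) for ph, n in counts.most_common()]

        # Hypothetical child and adult samples for illustration
        child_speech = [["m", "a", "m", "a"], ["p", "a", "p", "a"]]
        adult_speech = [["m", "a", "s", "a"], ["k", "a", "s", "a"]]

        child_inv = phoneme_inventory(child_speech)
        adult_inv = phoneme_inventory(adult_speech)
        print(child_inv)  # e.g. [('a', 0.5), ('m', 0.25), ('p', 0.25)]

    Comparing the two rankings (e.g. by rank correlation) is one simple way to relate frequency of use in adult speech to the order of acquisition in child speech.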

    Dealing with linguistic mismatches for automatic speech recognition

    Recent breakthroughs in automatic speech recognition (ASR) have resulted in a word error rate (WER) on par with human transcribers on the English Switchboard benchmark. However, dealing with linguistic mismatches between the training and testing data remains a significant, unsolved challenge. In the monolingual setting, it is well known that the performance of ASR systems degrades significantly when they are presented with speech from speakers whose accents, dialects, and speaking styles differ from those encountered during system training. In the multilingual setting, ASR systems trained on a source language perform even worse when tested on another target language because of mismatches in the number of phonemes, in lexical ambiguity, and in the power of the phonotactic constraints provided by phone-level n-grams. To address these linguistic mismatches in current ASR systems, my dissertation investigates both knowledge-gnostic and knowledge-agnostic solutions. In the first part, classic theories from acoustics and articulatory phonetics that can be transferred across a dialect continuum, from local dialects to a standardized language, are revisited. Experiments demonstrate the potential of acoustic correlates in the vicinity of landmarks to bridge mismatches across different local or global varieties in a dialect continuum. In the second part, we design an end-to-end acoustic modeling approach based on the connectionist temporal classification loss and propose to link the training of acoustics and accent in a manner similar to the learning process in human speech perception. This joint model not only performed well on ASR with multiple accents but also boosted the accuracy of accent identification in comparison to separately trained models.
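    A hedged sketch of the kind of joint objective the abstract describes: a CTC acoustic model trained together with an utterance-level accent classifier. The architecture, dimensions, and loss weighting below are assumptions chosen for illustration (PyTorch), not the dissertation's actual model.

        # Joint CTC acoustic model + accent identification head (multi-task).
        # All sizes and the 0.3 loss weight are illustrative assumptions.
        import torch
        import torch.nn as nn

        class JointCtcAccentModel(nn.Module):
            def __init__(self, n_feats=80, hidden=256, n_phones=40, n_accents=5):
                super().__init__()
                self.encoder = nn.LSTM(n_feats, hidden, num_layers=2,
                                       batch_first=True, bidirectional=True)
                self.ctc_head = nn.Linear(2 * hidden, n_phones + 1)   # +1 for CTC blank
                self.accent_head = nn.Linear(2 * hidden, n_accents)   # utterance-level accent

            def forward(self, feats):                  # feats: (batch, time, n_feats)
                enc, _ = self.encoder(feats)
                ctc_logits = self.ctc_head(enc)        # per-frame phone posteriors
                accent_logits = self.accent_head(enc.mean(dim=1))  # pooled over time
                return ctc_logits, accent_logits

        model = JointCtcAccentModel()
        ctc_loss = nn.CTCLoss(blank=40)
        ce_loss = nn.CrossEntropyLoss()

        feats = torch.randn(2, 100, 80)                # 2 utterances, 100 frames each
        targets = torch.randint(0, 40, (2, 20))        # phone label sequences
        input_lens = torch.full((2,), 100, dtype=torch.long)
        target_lens = torch.full((2,), 20, dtype=torch.long)
        accents = torch.tensor([0, 3])                 # accent labels

        ctc_logits, accent_logits = model(feats)
        log_probs = ctc_logits.log_softmax(-1).transpose(0, 1)  # (time, batch, classes)
        loss = ctc_loss(log_probs, targets, input_lens, target_lens) \
               + 0.3 * ce_loss(accent_logits, accents)           # joint objective
        loss.backward()

    The shared encoder is what couples the two tasks: gradients from the accent head shape the same representations used for phone recognition.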

    Recognising realistic emotions and affect in speech: State of the art and lessons learnt from the first challenge

    More than a decade has passed since research on automatic recognition of emotion from speech became a new field of research alongside its 'big brothers', speech and speaker recognition. This article attempts to provide a short overview of where we are today, how we got there, and what this can tell us about where to go next and how we could get there. In the first part, we address the basic phenomenon, reflecting on the last fifteen years and commenting on databases, modelling and annotation, the unit of analysis, and prototypicality. We then shift to automatic processing, including discussions of features, classification, robustness, evaluation, and implementation and system integration. From there we move to the first comparative challenge on emotion recognition from speech, the INTERSPEECH 2009 Emotion Challenge, organised by (part of) the authors, covering the Challenge's database, Sub-Challenges, participants and their approaches, the winners, and the fusion of results, through to the actual lessons learnt, before we finally address the ever-lasting problems and promising future attempts. (C) 2011 Elsevier B.V. All rights reserved. Schuller B., Batliner A., Steidl S., Seppi D., 'Recognising realistic emotions and affect in speech: state of the art and lessons learnt from the first challenge', Speech Communication, vol. 53, no. 9-10, pp. 1062-1087, November 2011.
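    A minimal sketch of the generic pipeline this survey covers: frame-wise acoustic descriptors collapsed into utterance-level functionals and fed to a static classifier. The features, random toy data, and linear SVM below are placeholders for illustration, not the INTERSPEECH 2009 Challenge baseline.

        # Utterance-level functionals of low-level descriptors + static classifier.
        # Toy data only; real systems extract descriptors such as MFCCs or pitch
        # from audio and use much larger feature sets.
        import numpy as np
        from sklearn.svm import SVC
        from sklearn.model_selection import cross_val_score

        def functionals(frame_features):
            """Collapse a (frames x descriptors) matrix into one utterance vector."""
            return np.concatenate([frame_features.mean(axis=0),
                                   frame_features.std(axis=0)])

        rng = np.random.default_rng(0)
        # 40 hypothetical utterances, each 100 frames x 13 descriptors
        X = np.stack([functionals(rng.normal(size=(100, 13))) for _ in range(40)])
        y = np.repeat([0, 1], 20)  # binary labels, e.g. negative vs. idle

        clf = SVC(kernel="linear")
        print(cross_val_score(clf, X, y, cv=5).mean())  # chance level on random data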

    Publications and talks 2003 by the members of the Faculty of Informatics
