72 research outputs found

    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems, and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes.

    Machine Assisted Analysis of Vowel Length Contrasts in Wolof

    Growing digital archives and improving algorithms for automatic analysis of text and speech create new opportunities for fundamental research in phonetics. Such empirical approaches allow statistical evaluation of a much larger set of hypotheses about phonetic variation and its conditioning factors (among them geographical / dialectal variants). This paper illustrates this vision and proposes to challenge automatic methods with the analysis of a phenomenon that is not easily observable: vowel length contrast. We focus on Wolof, an under-resourced language from Sub-Saharan Africa. In particular, we propose multiple features for a fine-grained evaluation of the degree of length contrast under different factors, such as read vs. semi-spontaneous speech and standard vs. dialectal Wolof. Our measures, made fully automatically on more than 20k vowel tokens, show that the proposed features can highlight different degrees of contrast for each vowel considered. We notably show that contrast is weaker in semi-spontaneous speech and in a non-standard semi-spontaneous dialect. Comment: Accepted to Interspeech 201

    Rhythmic unit extraction and modelling for automatic language identification

    This paper deals with an approach to Automatic Language Identification based on rhythmic modelling. Besides phonetics and phonotactics, rhythm is one of the most promising features to be considered for language identification, even if its extraction and modelling are not straightforward. One of the main problems to address is what to model. In this paper, a rhythm extraction algorithm is described: using a vowel detection algorithm, rhythmic units related to syllables are segmented. Several parameters are extracted (consonantal and vowel duration, cluster complexity) and modelled with a Gaussian Mixture. Experiments are performed on read speech for 7 languages (English, French, German, Italian, Japanese, Mandarin and Spanish), and results reach up to 86 ± 6% correct discrimination between stress-timed, mora-timed and syllable-timed classes of languages, and 67 ± 8% correct language identification on average for the 7 languages with utterances of 21 seconds. These results are discussed and compared with those obtained with a standard acoustic Gaussian mixture modelling approach (88 ± 5% correct identification for the 7-language identification task).

    Using Statistical Models of Morphology in the Search for Optimal Units of Representation in the Human Mental Lexicon

    Determining the optimal units for representing morphologically complex words in the mental lexicon is a central question in psycholinguistics. Here, we utilize advances in computational sciences to study human morphological processing using statistical models of morphology, particularly the unsupervised Morfessor model, which works on the principle of optimization. The aim was to see what kind of model structure corresponds best to human word recognition costs for multimorphemic Finnish nouns: a model incorporating units resembling linguistically defined morphemes, a whole-word model, or a model that seeks an optimal balance between these two extremes. Our results showed that human word recognition was predicted best by a combination of two models: a model that decomposes words at some morpheme boundaries while keeping others unsegmented, and a whole-word model. The results support dual-route models that assume that both decomposed and full-form representations are utilized to optimally process complex words within the mental lexicon. Peer reviewed

    Acta Cybernetica : Volume 18. Number 3.


    Literacy as a performing art: a phenomenological study of oral dramatic reading

    Based on semiotic, aesthetic response, reader response, and drama in education theories, this phenomenological study seeks to describe the literary experience of text through oral interpretation for middle-to-high SES fourth- and eighth-grade students as compared to low-SES fourth- and eighth-grade students. Using the research methodology of Moustakas (1994) and the data analysis of Teddlie (2000), this study proposes to describe and understand the relation of literary understanding and oral dramatic expression implicit in the descriptive paralinguistic and chronemic patternizations of the oral rendition of text, and to describe the act of reading as phenomenology. Descriptions of the perceptions and reading experiences of Low Socioeconomic Status (SES) and Middle-High SES dramatic readers were obtained through multiple interviews and recorded readings. Rich descriptions were used as the basis for a reflective structural analysis. Ultimately, the goal was to determine the effect of the voice of interpretation on the perception of the reader and to determine the benefit of dramatization as a tool for comprehension across varied educational and experiential backgrounds. Results reflected an across-the-board positive correlation between students' perceptions of reading as a significant and meaningful learning experience and students' use of dramatic interpretation through the indices of the voice. For oral dramatic readers, the purpose for reading was the process, not just the product. Dramatic readers see reading as something composed that must be performed. They are able to perform the story much like a musical score, backing for patterns, beats, and rhythms. Literacy, then, is a performing art, by definition a form of aesthetic response that is autobiographical in essence, constructivist in nature, and a highly personal phenomenon.

    Speech recognition system based on auditory features

    Carried out in collaboration with the centre or company: Sony Deutschland Gmb