803 research outputs found

    Syntactic error modeling and scoring normalization in speech recognition: Error modeling and scoring normalization in the speech recognition task for adult literacy training

    The purpose was to develop a speech recognition system able to detect incorrectly pronounced speech, given that the text of the spoken speech is known to the recognizer. Better mechanisms are provided for using speech recognition in a literacy tutor application. Using a combination of scoring normalization techniques and cheater-mode decoding, a reasonable acceptance/rejection threshold was provided. In tests on continuous speech, the system achieved above 80% correct acceptance of words while correctly rejecting over 80% of incorrectly pronounced words.

    Syntactic error modeling and scoring normalization in speech recognition

    The objective was to develop a speech recognition system able to detect incorrectly pronounced speech, given that the text of the spoken speech is known to the recognizer. Research was performed in the following areas: (1) syntactic error modeling; (2) score normalization; and (3) phoneme error modeling. The study of the types of errors that a reader makes will provide the basis for creating tests that approximate the use of the system in the real world. NASA-Johnson will develop this technology into a 'Literacy Tutor' in order to bring innovative concepts to the task of teaching adults to read.

    A computational simulation of children's performance across three nonword repetition tests

    The nonword repetition test has been regularly used to examine children’s vocabulary acquisition, and yet there is no clear explanation of all of the effects seen in nonword repetition. This paper presents a study of 5- to 6-year-old children’s repetition performance on three nonword repetition tests that vary in their degree of lexicality. EPAM-VOC, a model of children’s vocabulary acquisition, is then presented that captures the children’s performance in all three repetition tests. The model offers a clear explanation of how working memory and long-term lexical and sub-lexical knowledge interact, simulating repetition performance across three nonword tests within the same model and without the need for test-specific parameter settings.

    Parametrised phonological event parsing

    This paper describes a phonological event parser for spoken language recognition, together with a parametrisable development environment for examining the extent to which linguistically significant issues such as linguistic competence (structural constraints) and linguistic performance (robustness) can play a role in the spoken language recognition task. The environment supports not only the development of the underlying computational phonological model and the checking of its consistency and completeness, but also the targeted evaluation of selected linguistically motivated constraints for robust spoken language recognition.

    A Multilingual Phonological Resource Toolkit for Ubiquitous Speech Technology

    This paper outlines the generation process of a specific computational linguistic representation termed the Multilingual Time Map, conceptually a multi-tape finite state transducer encoding linguistic data at different levels of granularity. The first component acquires phonological data from syllable-labeled speech data, the second component defines feature profiles, the third component generates feature hierarchies and augments the acquired data with the defined feature profiles, and the fourth component displays the Multilingual Time Map as a graph.

    Modelling the formation of phonotactic restrictions across the mental lexicon

    Experimental data shows that adult learners of an artificial language with a phonotactic restriction learned this restriction better when being trained on word types (e.g. when they were presented with 80 different words twice each) than when being trained on word tokens (e.g. when presented with 40 different words four times each) (Hamann & Ernestus submitted). These findings support Pierrehumbert’s (2003) observation that phonotactic co-occurrence restrictions are formed across lexical entries, since only lexical levels of representation can be sensitive to type frequencies
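The type/token contrast between the two training conditions can be made concrete with a short sketch. The counts (80 words presented twice, 40 words presented four times) come from the abstract above; the helper name `exposure` is hypothetical and only illustrates the arithmetic, not the authors' experimental materials.

```python
def exposure(n_types: int, repetitions: int) -> dict:
    """Return type and token counts for a training set in which each of
    n_types distinct words is presented `repetitions` times."""
    return {"types": n_types, "tokens": n_types * repetitions}


# Counts taken from the abstract; variable names are illustrative.
type_condition = exposure(n_types=80, repetitions=2)
token_condition = exposure(n_types=40, repetitions=4)

print(type_condition)   # {'types': 80, 'tokens': 160}
print(token_condition)  # {'types': 40, 'tokens': 160}
```

Both conditions give learners the same number of tokens, so in this sketch the only difference between them is the number of distinct word types.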

    Revisiting the Status of Speech Rhythm

    Text-to-Speech synthesis offers an interesting manner of synthesising various knowledge components related to speech production. To a certain extent, it provides a new way of testing the coherence of our understanding of speech production in a highly systematic manner. For example, speech rhythm and temporal organisation of speech have to be well-captured in order to mimic a speaker correctly. The simulation approach used in our laboratory for two languages supports our original hypothesis of multidimensionality and non-linearity in the production of speech rhythm. This paper presents an overview of our approach towards this issue, as it has been developed over the last years. We conceive the production of speech rhythm as a multidimensional task, and the temporal organisation of speech as a key component of this task (i.e., the establishment of temporal boundaries and durations). As a result of this multidimensionality, text-to-speech systems have to accommodate a number of systematic transformations and computations at various levels. Our model of the temporal organisation of read speech in French and German emerges from a combination of quantitative and qualitative parameters, organised according to psycholinguistic and linguistic structures. (An ideal speech synthesiser would also take into account subphonemic as well as pragmatic parameters. However such systems are not yet available)

    Loanword adaptation as first-language phonological perception

    We show that loanword adaptation can be understood entirely in terms of phonological and phonetic comprehension and production mechanisms in the first language. We provide explicit accounts of several loanword adaptation phenomena (in Korean) in terms of an Optimality-Theoretic grammar model with the same three levels of representation that are needed to describe L1 phonology: the underlying form, the phonological surface form, and the auditory-phonetic form. The model is bidirectional, i.e., the same constraints and rankings are used by the listener and by the speaker. These constraints and rankings are the same for L1 processing and loanword adaptation

    The cross-linguistic performance of word segmentation models over time.

    We select three word segmentation models with psycholinguistic foundations - transitional probabilities, the diphone-based segmenter, and PUDDLE - which track phoneme co-occurrence and positional frequencies in input strings, and in the case of PUDDLE build lexical and diphone inventories. The models are evaluated on caregiver utterances in 132 CHILDES corpora representing 28 languages and 11.9 million words. PUDDLE shows the best performance overall, albeit with wide cross-linguistic variation. We explore the reasons for this variation, fitting regression models to performance scores with linguistic properties which capture lexico-phonological characteristics of the input: word length, utterance length, diversity in the lexicon, the frequency of one-word utterances, the regularity of phoneme patterns at word boundaries, and the distribution of diphones in each language. These properties together explain four-tenths of the observed variation in segmentation performance, a strong outcome and a solid foundation for studying further variables which make the segmentation task difficult.
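As a rough illustration of the first of these model families, the sketch below segments utterances using forward transitional probabilities between adjacent phonemes. It is a minimal toy version under stated assumptions, not the implementation evaluated in the paper: the function names, the absolute-threshold boundary rule, the one-character-per-phoneme encoding, and the six-utterance corpus are all hypothetical.

```python
from collections import Counter


def train_tp(utterances):
    """Estimate forward transitional probabilities P(b | a) from
    unsegmented utterances (each a string of one-character phonemes)."""
    left_counts, pair_counts = Counter(), Counter()
    for utt in utterances:
        for a, b in zip(utt, utt[1:]):
            left_counts[a] += 1        # a counted only as a left context
            pair_counts[(a, b)] += 1
    return {pair: n / left_counts[pair[0]] for pair, n in pair_counts.items()}


def segment(utt, tp, threshold):
    """Place a word boundary wherever the transitional probability of an
    adjacent phoneme pair falls below `threshold` (an assumed decision
    rule; unseen pairs are treated as probability 0)."""
    if not utt:
        return []
    words, current = [], [utt[0]]
    for a, b in zip(utt, utt[1:]):
        if tp.get((a, b), 0.0) < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words


# Toy corpus built from three invented "words": ba, du, ki.
corpus = ["badu", "duki", "kiba", "baki", "duba", "kidu"]
tp = train_tp(corpus)

print(segment("badu", tp, threshold=0.75))    # ['ba', 'du']
print(segment("kiduba", tp, threshold=0.75))  # ['ki', 'du', 'ba']
```

In this toy corpus, word-internal transitions such as b→a always occur (probability 1.0), while transitions across word boundaries such as a→d occur only half the time, so a threshold between the two recovers the invented words.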