
Syntactic error modeling and scoring normalization in speech recognition: Error modeling and scoring normalization in the speech recognition task for adult literacy training

Author: Olorenshaw Lex
Trawick David

The purpose was to develop a speech recognition system able to detect incorrectly pronounced speech, given that the recognizer knows the text being spoken. The work provides better mechanisms for using speech recognition in a literacy tutor application. A combination of score normalization techniques and cheater-mode decoding yielded a reasonable acceptance/rejection threshold. On continuous speech, the system correctly accepted more than 80% of words while correctly rejecting more than 80% of incorrectly pronounced words.
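The abstract does not spell out which normalization was used. A common way to make such a threshold meaningful is to compare the score of a cheater-mode (forced) decode against the known text with the score of an unconstrained decode of the same audio, normalized by duration. The sketch below assumes that approach; `WordScore`, `forced_loglik`, `free_loglik` and the threshold value are hypothetical, not taken from the report. It also shows how the two reported rates (correct acceptance and correct rejection) would be computed for a given threshold.

```python
from dataclasses import dataclass


@dataclass
class WordScore:
    """Hypothetical per-word decoder scores (log-likelihoods)."""
    word: str
    forced_loglik: float  # cheater-mode decode constrained to the known text
    free_loglik: float    # unconstrained decode of the same audio (e.g. a phone loop)
    n_frames: int


def normalized_score(s: WordScore) -> float:
    # Per-frame log-likelihood ratio: near 0 when the known text explains the
    # audio about as well as the best free decode, strongly negative otherwise.
    return (s.forced_loglik - s.free_loglik) / max(s.n_frames, 1)


def accept(s: WordScore, threshold: float) -> bool:
    return normalized_score(s) >= threshold


def rates(scores, labels, threshold):
    """labels[i] is True if word i was pronounced correctly.
    Returns (correct-acceptance rate, correct-rejection rate)."""
    good = [s for s, ok in zip(scores, labels) if ok]
    bad = [s for s, ok in zip(scores, labels) if not ok]
    ca = sum(accept(s, threshold) for s in good) / len(good) if good else 0.0
    cr = sum(not accept(s, threshold) for s in bad) / len(bad) if bad else 0.0
    return ca, cr


# Hypothetical scores: the last two words were mispronounced.
scores = [WordScore("the", -50.0, -48.0, 10), WordScore("cat", -100.0, -90.0, 20),
          WordScore("sat", -120.0, -80.0, 20), WordScore("mat", -60.0, -30.0, 10)]
labels = [True, True, False, False]
print(rates(scores, labels, threshold=-1.0))  # (1.0, 1.0)
```

Raising the threshold rejects more mispronunciations at the cost of rejecting some correct words; the reported 80%/80% figures describe one such operating point.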

Syntactic error modeling and scoring normalization in speech recognition

Author: Olorenshaw Lex

The objective was to develop the speech recognition system so that it can detect incorrectly pronounced speech, given that the recognizer knows the text being spoken. Research was performed in three areas: (1) syntactic error modeling; (2) score normalization; and (3) phoneme error modeling. Studying the types of errors that readers make will provide the basis for tests that approximate real-world use of the system. NASA-Johnson will develop this technology into a 'Literacy Tutor' to bring innovative concepts to the task of teaching adults to read.
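Syntactic error modeling starts from the reader's deviations from the known text. As a minimal sketch, assuming word-level transcripts of what was actually read are available, a standard minimum-edit-distance alignment labels each deviation as a substitution, omission or insertion. The function names are hypothetical, and the report's own error taxonomy may be finer-grained.

```python
from collections import Counter


def align_reading(reference, spoken):
    """Align the known text with the words actually read (minimum edit distance)
    and label each position as match, substitution, omission or insertion."""
    n, m = len(reference), len(spoken)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # every reference word omitted
    for j in range(1, m + 1):
        cost[0][j] = j  # every spoken word inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if reference[i - 1] == spoken[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match or substitution
                             cost[i - 1][j] + 1,        # omission
                             cost[i][j - 1] + 1)        # insertion
    # Backtrace from the bottom-right cell to recover one optimal alignment.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            same = reference[i - 1] == spoken[j - 1]
            if cost[i][j] == cost[i - 1][j - 1] + (0 if same else 1):
                ops.append(("match" if same else "substitution", reference[i - 1], spoken[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("omission", reference[i - 1], None))
            i -= 1
        else:
            ops.append(("insertion", None, spoken[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def error_counts(ops):
    # Tally only the deviations, not the matches.
    return Counter(kind for kind, _, _ in ops if kind != "match")


ops = align_reading("the cat sat on the mat".split(), "the cat sit on mat".split())
print(error_counts(ops))  # Counter({'substitution': 1, 'omission': 1})
```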

A computational simulation of children's performance across three nonword repetition tests

Author: Baddeley
Bailey
Bhatarah
Bishop
Briscoe
Case
Chi
Conti-Ramsden
Cowan
Croker
Dollaghan
Dunn
Edwards
Ellis Weismer
Feigenbaum
Freudenthal
Gary Jones
Gathercole
Gobet
Jones
Jusczyk
Munson
Schneider
Theakston
Vitevitch
Zhang
Publication venue: Elsevier BV
Publication date: 14/08/2010

The nonword repetition test has been regularly used to examine children’s vocabulary acquisition, yet there is no clear explanation of all the effects seen in nonword repetition. This paper presents a study of 5- to 6-year-old children’s repetition performance on three nonword repetition tests that vary in their degree of lexicality. EPAM-VOC, a model of children’s vocabulary acquisition, is then presented; it captures the children’s performance on all three repetition tests. The model provides a clear explanation of how working memory and long-term lexical and sub-lexical knowledge interact, simulating repetition performance across all three nonword tests within a single model and without test-specific parameter settings.
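To make the interaction between long-term knowledge and working memory concrete, here is a toy sketch in the spirit of EPAM-VOC, not the published model. A nonword is segmented greedily into the longest phoneme chunks already known, and repetition is assumed to succeed only if the resulting chunks fit a limited working memory. The chunk inventory, the greedy segmentation and the chunk-count `capacity` are all hypothetical simplifications; the model's actual learning mechanism and memory limit are not reproduced here.

```python
def segment_into_chunks(phonemes, known_chunks):
    """Greedy longest-match segmentation of a phoneme list into known chunks.
    Phonemes not covered by any known chunk become single-phoneme chunks."""
    max_len = max((len(c) for c in known_chunks), default=1)
    chunks, i = [], 0
    while i < len(phonemes):
        for size in range(min(max_len, len(phonemes) - i), 0, -1):
            candidate = tuple(phonemes[i:i + size])
            if size == 1 or candidate in known_chunks:
                chunks.append(candidate)
                i += size
                break
    return chunks


def can_repeat(phonemes, known_chunks, capacity=4):
    # Hypothetical working-memory limit expressed as a number of chunks.
    return len(segment_into_chunks(phonemes, known_chunks)) <= capacity


known = {("b", "l", "a"), ("k", "e", "t")}  # hypothetical learned chunks
print(segment_into_chunks(list("blaket"), known))  # [('b', 'l', 'a'), ('k', 'e', 't')]
print(can_repeat(list("zupogi"), known))           # False: six single-phoneme chunks
```

More wordlike nonwords map onto fewer, larger chunks, which is the intuition behind better repetition of items higher in lexicality.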

Parametrised phonological event parsing

Author: Carson-Berndsen Julie
Drexel Guido
Publication venue: Other institutions. DFKI Deutsches Forschungszentrum für Künstliche Intelligenz (German Research Center for Artificial Intelligence)
Publication date: 01/01/1996

This paper describes a phonological event parser for spoken language recognition, provided with a parametrisable development environment for examining the extent to which linguistically significant issues such as linguistic competence (structural constraints) and linguistic performance (robustness) can play a role in the spoken language recognition task. The environment serves not only to develop the underlying computational phonological model and check it for consistency and completeness, but also enables targeted evaluation of selected linguistically motivated constraints for robust spoken language recognition.
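The trade-off between structural constraints and robustness can be pictured as a single parser parameter. The sketch below is an assumption-laden toy, not the authors' parser: phonological events are reduced to consonant (`C`) and vowel (`V`) labels, the structural constraints are a hypothetical one-syllable network of the shape (C)(C)V(C), and the hypothetical parameter `max_skips` decides how many events that violate the network may be skipped before parsing fails.

```python
# Hypothetical phonotactic network for one syllable of the shape (C)(C)V(C).
SYLLABLE_NET = {
    "start": {"C": "onset1", "V": "nucleus"},
    "onset1": {"C": "onset2", "V": "nucleus"},
    "onset2": {"V": "nucleus"},
    "nucleus": {"C": "coda"},
    "coda": {},
}
FINAL_STATES = {"nucleus", "coda"}


def parse_events(events, net=SYLLABLE_NET, finals=FINAL_STATES, max_skips=0):
    """Return (accepted, skipped_events).

    max_skips=0 enforces the structural constraints strictly; larger values let
    the parser skip events it cannot attach, trading strictness for robustness."""
    state = "start"
    skipped = []
    for event in events:
        nxt = net[state].get(event)
        if nxt is None:
            skipped.append(event)
            if len(skipped) > max_skips:
                return False, skipped
            continue
        state = nxt
    return state in finals, skipped


print(parse_events(["C", "V", "C"]))                   # (True, [])
print(parse_events(["C", "C", "C", "V"]))              # (False, ['C'])
print(parse_events(["C", "C", "C", "V"], max_skips=1))  # (True, ['C'])
```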

A Multilingual Phonological Resource Toolkit for Ubiquitous Speech Technology

Author: Aioanei Daniel
Carson-Berndsen Julie
Geumann Anja
Kelly Robert
Neugebauer Moritz
Wilson Stephen
Publication venue: Paris : European Language Resources Association
Publication date: 13/12/2016

This paper outlines the generation process of a specific computational linguistic representation termed the Multilingual Time Map, conceptually a multi-tape finite-state transducer encoding linguistic data at different levels of granularity. The first component acquires phonological data from syllable-labeled speech data; the second component defines feature profiles; the third component generates feature hierarchies and augments the acquired data with the defined feature profiles; and the fourth component displays the Multilingual Time Map as a graph.
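A multi-tape finite-state machine reads one symbol from each tape on every transition, so each arc relates aligned units across levels of representation. The sketch below assumes synchronous tapes of equal length (a simplification) and uses a hypothetical segment tape and voicing-feature tape in place of real feature profiles. The `to_dot` method loosely mimics the fourth component by emitting a Graphviz DOT description of the graph; class and method names are hypothetical.

```python
from collections import defaultdict


class MultiTapeFSA:
    """Toy multi-tape automaton: each arc reads one symbol on every tape at once."""

    def __init__(self, n_tapes):
        self.n_tapes = n_tapes
        self.arcs = defaultdict(dict)  # state -> {tuple of symbols: next state}
        self.finals = set()

    def add_arc(self, src, symbols, dst):
        if len(symbols) != self.n_tapes:
            raise ValueError("one symbol per tape is required")
        self.arcs[src][tuple(symbols)] = dst

    def accepts(self, tapes, start=0):
        # All tapes must be present and of equal length (synchronous reading).
        if len(tapes) != self.n_tapes or len({len(t) for t in tapes}) > 1:
            return False
        state = start
        for column in zip(*tapes):
            state = self.arcs.get(state, {}).get(column)
            if state is None:
                return False
        return state in self.finals

    def to_dot(self):
        lines = ["digraph TimeMap {"]
        for src, out in self.arcs.items():
            for symbols, dst in out.items():
                lines.append(f'  {src} -> {dst} [label="{":".join(symbols)}"];')
        lines.append("}")
        return "\n".join(lines)


fsa = MultiTapeFSA(n_tapes=2)
fsa.add_arc(0, ("k", "-voice"), 1)
fsa.add_arc(1, ("a", "+voice"), 2)
fsa.add_arc(2, ("t", "-voice"), 3)
fsa.finals.add(3)
print(fsa.accepts([list("kat"), ["-voice", "+voice", "-voice"]]))  # True
print(fsa.to_dot())
```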

Publication server of the Institut für Deutsche Sprache

Modelling the formation of phonotactic restrictions across the mental lexicon

Author: Apoussidou Diana
Boersma Paul
Hamann Silke
Publication date: 01/01/2009

Experimental data show that adult learners of an artificial language with a phonotactic restriction learned this restriction better when trained on word types (e.g., 80 different words presented twice each) than when trained on word tokens (e.g., 40 different words presented four times each) (Hamann & Ernestus submitted). Both example conditions involve the same number of exposures (160), so they differ only in the number of distinct words. These findings support Pierrehumbert’s (2003) observation that phonotactic co-occurrence restrictions are formed across lexical entries, since only lexical levels of representation can be sensitive to type frequencies.
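The type/token manipulation can be checked with a short count. In this sketch the word forms are hypothetical placeholders that all contain the target pattern; both conditions yield the same token frequency (160) but different type frequencies (80 vs. 40), so only a learner that counts across lexical entries can tell them apart.

```python
def pattern_frequencies(exposures, pattern):
    """Return (type frequency, token frequency) of a substring pattern.

    exposures: every word form as presented, repetitions included."""
    token_freq = sum(1 for w in exposures if pattern in w)
    type_freq = sum(1 for w in set(exposures) if pattern in w)
    return type_freq, token_freq


# Hypothetical word forms that all contain the target pattern "ku".
type_condition = [f"w{i}ku" for i in range(80)] * 2   # 80 words, twice each
token_condition = [f"w{i}ku" for i in range(40)] * 4  # 40 words, four times each
print(pattern_frequencies(type_condition, "ku"))   # (80, 160)
print(pattern_frequencies(token_condition, "ku"))  # (40, 160)
```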

CiteSeerX

UvA-DARE

Revisiting the Status of Speech Rhythm

Author: Zellner Keller Brigitte
Publication date: 01/01/2002

Text-to-speech synthesis offers an interesting way of bringing together various knowledge components related to speech production. To a certain extent, it provides a new way of testing, in a highly systematic manner, the coherence of our understanding of speech production. For example, speech rhythm and the temporal organisation of speech have to be captured well in order to mimic a speaker correctly. The simulation approach used in our laboratory for two languages supports our original hypothesis that the production of speech rhythm is multidimensional and non-linear. This paper presents an overview of our approach to this issue as it has developed over recent years. We conceive of the production of speech rhythm as a multidimensional task, with the temporal organisation of speech (i.e., the establishment of temporal boundaries and durations) as a key component of this task. As a result of this multidimensionality, text-to-speech systems have to accommodate a number of systematic transformations and computations at various levels. Our model of the temporal organisation of read speech in French and German emerges from a combination of quantitative and qualitative parameters, organised according to psycholinguistic and linguistic structures. (An ideal speech synthesiser would also take subphonemic as well as pragmatic parameters into account; however, such systems are not yet available.)

CiteSeerX

Serveur académique lausannois

CogPrints Cognitive Sciences Eprint Archive

Loanword adaptation as first-language phonological perception

Author: Boersma Paul
Hamann Silke
Publication date: 01/01/2009

We show that loanword adaptation can be understood entirely in terms of phonological and phonetic comprehension and production mechanisms in the first language. We provide explicit accounts of several Korean loanword adaptation phenomena in terms of an Optimality-Theoretic grammar model with the same three levels of representation that are needed to describe L1 phonology: the underlying form, the phonological surface form, and the auditory-phonetic form. The model is bidirectional, i.e., the same constraints and rankings are used by the listener and by the speaker. These constraints and rankings are also the same for L1 processing and loanword adaptation.
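In Optimality Theory the winning candidate is the one with the best violation profile under strict domination, which amounts to a lexicographic comparison of violation vectors ordered by ranking. The sketch below shows that evaluation run in both directions with one ranking, as in a bidirectional model. The constraints, the forms `tak`/`takɨ` and the `released`/`unreleased` auditory labels are hypothetical toys, not Boersma and Hamann's actual Korean analysis.

```python
def optimal(candidates, ranking):
    """Pick the candidate whose violation vector is lexicographically smallest;
    with constraints listed from highest to lowest rank this is strict domination."""
    return min(candidates, key=lambda cand: tuple(c(cand) for c in ranking))


def perceive(auditory, surfaces, ranking):
    # Listener: auditory form given, choose a phonological surface form.
    return optimal([(s, auditory) for s in surfaces], ranking)[0]


def produce(surface, auditories, ranking):
    # Speaker: surface form given, choose an auditory form with the same ranking.
    return optimal([(surface, a) for a in auditories], ranking)[1]


# Hypothetical toy constraints over (surface, auditory) pairs.
def released_burst_not_coda(pair):
    surface, auditory = pair
    return int(surface.endswith("k") and auditory == "released")


def vowel_needs_release(pair):
    surface, auditory = pair
    return int(surface.endswith("ɨ") and auditory == "unreleased")


def no_epenthesis(pair):
    surface, _ = pair
    return int(surface.endswith("ɨ"))


RANKING = [released_burst_not_coda, vowel_needs_release, no_epenthesis]
SURFACES = ["tak", "takɨ"]
AUDITORIES = ["unreleased", "released"]

print(perceive("released", SURFACES, RANKING))  # takɨ
print(produce("tak", AUDITORIES, RANKING))      # unreleased
```

With this toy ranking a released final burst is perceived as an epenthetic vowel, while the same ranking has the speaker produce an unreleased stop for /tak/.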

CiteSeerX

UvA-DARE

The cross-linguistic performance of word segmentation models over time

Author: Andrew Caines
Basbøll
Bernard
Bird
Braginsky
Emma Altmann-Richer
Grønnum
Krogh
Ladefoged
MacWhinney
Mampe
McCauley
Nespor
Paula Buttery
Zipf
Publication venue: Journal of Child Language
Publication date: 01/11/2019

We select three word segmentation models with psycholinguistic foundations (transitional probabilities, the diphone-based segmenter, and PUDDLE), which track phoneme co-occurrence and positional frequencies in input strings; PUDDLE additionally builds lexical and diphone inventories. The models are evaluated on caregiver utterances in 132 CHILDES corpora representing 28 languages and 11.9 million words. PUDDLE performs best overall, albeit with wide cross-linguistic variation. We explore the reasons for this variation by fitting regression models that relate performance scores to linguistic properties capturing lexico-phonological characteristics of the input: word length, utterance length, lexical diversity, the frequency of one-word utterances, the regularity of phoneme patterns at word boundaries, and the distribution of diphones in each language. Together these properties explain four-tenths of the observed variation in segmentation performance, a strong outcome and a solid foundation for studying further variables that make the segmentation task difficult.
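Of the three models, the transitional-probability segmenter is the simplest to sketch. One common decision rule (assumed here; the variant evaluated in the paper may differ) computes forward transitional probabilities between adjacent phonemes over the corpus and posits a word boundary wherever the probability dips below both neighbouring values. Phonemes are represented as single characters for brevity, and the toy corpus is hypothetical.

```python
from collections import Counter


def transitional_probabilities(utterances):
    """Forward TP(a -> b) = count(a followed by b) / count(a followed by anything)."""
    pair_counts, first_counts = Counter(), Counter()
    for utt in utterances:
        for a, b in zip(utt, utt[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def segment(utt, tp):
    """Insert a boundary wherever TP is a local minimum (lower than both neighbours)."""
    if len(utt) < 2:
        return [utt]
    probs = [tp.get((a, b), 0.0) for a, b in zip(utt, utt[1:])]
    words, start = [], 0
    for i, p in enumerate(probs):
        left = probs[i - 1] if i > 0 else float("inf")
        right = probs[i + 1] if i + 1 < len(probs) else float("inf")
        if p < left and p < right:
            words.append(utt[start:i + 1])
            start = i + 1
    words.append(utt[start:])
    return words


# Hypothetical corpus built from the "words" abc, def and ghi.
corpus = ["abcdef", "defabc", "abcghi", "ghidef", "defghi", "ghiabc"]
tp = transitional_probabilities(corpus)
print(segment("abcdef", tp))  # ['abc', 'def']
```

Within-word transitions here have probability 1.0 and cross-word transitions 0.5, so the dips fall exactly at the word edges; real child-directed input is far noisier, which is part of what the paper's cross-linguistic comparison measures.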