The Unsupervised Acquisition of a Lexicon from Continuous Speech

Author: de Marcken Carl
Publication date: 01/01/1995

We present an unsupervised learning algorithm that acquires a natural-language lexicon from raw speech. The algorithm is based on the optimal encoding of symbol sequences in an MDL framework, and uses a hierarchical representation of language that overcomes many of the problems that have stymied previous grammar-induction procedures. The forward mapping from symbol sequences to the speech stream is modeled using features based on articulatory gestures. We present results on the acquisition of lexicons and language models from raw speech, text, and phonetic transcripts, and demonstrate that our algorithm compares very favorably to other reported results with respect to segmentation performance and statistical efficiency. Comment: 27-page technical report
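As a rough illustration of the MDL idea mentioned in this abstract, the sketch below scores two candidate segmentations of a toy string with a flat two-part code: the cost of spelling out the lexicon plus the cost of encoding the token sequence. This is not de Marcken's hierarchical algorithm. The function name `description_length`, the constant `BITS_PER_CHAR`, and the toy text are all assumptions made for illustration.

```python
import math
from collections import Counter

# Assumption: every character of a lexicon entry costs a flat 5 bits,
# and each entry pays for one extra end-of-word marker.
BITS_PER_CHAR = 5

def description_length(tokens):
    """Two-part code length in bits: lexicon cost + data cost."""
    counts = Counter(tokens)
    total = len(tokens)
    # Cost of writing each distinct lexicon entry once.
    lexicon_bits = sum((len(word) + 1) * BITS_PER_CHAR for word in counts)
    # Cost of the token sequence under the empirical unigram distribution.
    data_bits = sum(-n * math.log2(n / total) for n in counts.values())
    return lexicon_bits + data_bits

text = "thedogthedogthedog"
as_chars = list(text)            # lexicon of single characters
as_words = ["the", "dog"] * 3    # lexicon of two word-like units

print(description_length(as_chars))
print(description_length(as_words))
```

Under these assumptions the word-level segmentation yields the shorter total description, which is the kind of preference an MDL criterion uses to favour a reusable lexicon.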

arXiv.org e-Print Archive

CiteSeerX

DSpace@MIT

The Role of Speech Rhythm Sensitivity in Children's Reading Development

Author: Holliman Andrew John
Publication date: 01/01/2009

This thesis examines whether speech rhythm sensitivity is related to children's reading development, phonological awareness, and non-speech rhythm sensitivity, whether children at risk of reading difficulties have a specific speech rhythm sensitivity deficit, and whether speech rhythm sensitivity is predictive of children's reading development over time. Study One investigated the relatedness of speech rhythm, non-speech rhythm, reading ability and phonological awareness. A hierarchical regression analysis revealed that non-speech rhythm sensitivity was unable to predict unique variance in reading attainment after controlling for speech rhythm sensitivity and phonological awareness. In contrast, sensitivity to speech rhythm was able to predict a significant amount of unique variance in reading attainment after age, vocabulary, phonological awareness, short-term memory, and non-speech rhythm had been accounted for. These results suggest that speech rhythm sensitivity is not merely an aspect of general phonological awareness or rhythmic appreciation; it is a skill that explains new variance in reading ability. Study Two investigated whether a measure of speech rhythm sensitivity administered to 5- to 7-year-old children could predict the different components of reading ability one year later. A series of hierarchical regression analyses revealed that speech rhythm sensitivity was able to predict a significant amount of unique variance in word reading, reading comprehension, and the phrasing component of a reading fluency measure after controlling for receptive vocabulary, age and phonological awareness. Study Three investigated whether apparent speech rhythm sensitivity deficits in young poor readers represent a specific deficit in these children who were at risk of reading difficulties. It was found that after controlling for receptive vocabulary and phonological awareness, the 'at risk' children were outperformed by their chronological-age matched controls but not by their reading-age matched controls on measures of speech rhythm sensitivity. This is suggestive of a maturational lag as opposed to a specific deficit in speech rhythm sensitivity. The overall findings from these concurrent, longitudinal, and cross-sectional data suggest that speech rhythm sensitivity is an important, yet neglected aspect of English-speaking children's phonological representations, which needs to be incorporated into theoretical accounts of reading development.

Open Research Online (The Open University)

OpenGrey Repository

Are words easier to learn from infant- than adult-directed speech? A quantitative corpus-based investigation

Author: Cristia Alejandrina
Dupoux Emmanuel
Guevara-Rukoz Adriana
Ludusan Bogdan
Martin Andrew
Mazuka Reiko
Thiollière Roland
Publication date: 23/12/2017

We investigate whether infant-directed speech (IDS) could facilitate word form learning when compared to adult-directed speech (ADS). To study this, we examine the distribution of word forms at two levels, acoustic and phonological, using a large database of spontaneous speech in Japanese. At the acoustic level we show that, as has been documented before for phonemes, the realizations of words are more variable and less discriminable in IDS than in ADS. At the phonological level, we find an effect in the opposite direction: the IDS lexicon contains more distinctive words (such as onomatopoeias) than the ADS counterpart. Combining the acoustic and phonological metrics together in a global discriminability score reveals that the bigger separation of lexical categories in the phonological space does not compensate for the opposite effect observed at the acoustic level. As a result, IDS word forms are still globally less discriminable than ADS word forms, even though the effect is numerically small. We discuss the implication of these findings for the view that the functional role of IDS is to improve language learnability. Comment: Draft

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

A Review of Accent-Based Automatic Speech Recognition Models for E-Learning Environment

Author: Omojokun Gabriel Aju
Veronica Ijebusomma Osubor
Publication venue: Covenant University, Ota, Nigeria
Publication date: 16/12/2022

The adoption of electronic learning (e-learning) as a method of disseminating knowledge in the global educational system is growing at a rapid rate, and has created a shift in knowledge acquisition methods from conventional classrooms and tutors to the distributed e-learning technique, which enables access to various learning resources much more conveniently and flexibly. However, notwithstanding the adaptive advantages of the learner-centric content of e-learning programmes, the distributed e-learning environment has unconsciously adopted a few international languages as the languages of communication among the participants, despite the various accents (mother-language influence) among these participants. Adjusting to and accommodating these various accents has brought about the introduction of accent-based automatic speech recognition into e-learning to resolve the effects of the accent differences. This paper reviews over 50 research papers to determine the progress made so far in the design and implementation of accent-based automatic recognition models for e-learning between 2001 and 2021. The analysis shows that 50% of the models reviewed adopted the English language, 46.50% adopted the major Chinese and Indian languages, and 3.50% adopted the Swedish language as the mode of communication. It is therefore found that the majority of the ASR models are centred on European, American and Asian accents, while unconsciously excluding the accent peculiarities associated with the less technologically resourced continents.

Covenant Journals (Covenant University)

Phoneme and sentence-level ensembles for speech recognition

Author: Bengio Samy
Dimitrakakis Christos
Publication date: 01/01/2011

We address the question of whether and how boosting and bagging can be used for speech recognition. In order to do this, we compare two different boosting schemes, one at the phoneme level and one at the utterance level, with a phoneme-level bagging scheme. We control for many parameters and other choices, such as the state inference scheme used. In an unbiased experiment, we clearly show that the gain of boosting methods compared to a single hidden Markov model is in all cases only marginal, while bagging significantly outperforms all other methods. We thus conclude that bagging methods, which have so far been overlooked in favour of boosting, should be examined more closely as a potentially useful ensemble learning technique for speech recognition.
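For readers unfamiliar with bagging, the following is a minimal generic sketch of the technique: each ensemble member is trained on a bootstrap resample of the training set, and predictions are combined by majority vote. It uses a toy nearest-centroid classifier on one-dimensional data rather than the hidden Markov models studied in the paper; all function names and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter

def train_centroid(sample):
    """Toy base learner: store the mean feature value of each class."""
    sums, counts = Counter(), Counter()
    for x, y in sample:
        sums[y] += x
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in counts}

def predict_centroid(model, x):
    """Predict the class whose centroid is closest to x."""
    return min(model, key=lambda y: abs(x - model[y]))

def bagging_fit(data, n_models=25, seed=0):
    """Train n_models base learners, each on a bootstrap resample of data."""
    rng = random.Random(seed)
    n_classes = len({y for _, y in data})
    models = []
    while len(models) < n_models:
        # Bootstrap: draw len(data) items with replacement.
        sample = [rng.choice(data) for _ in data]
        # Simplification: skip resamples that miss a class entirely.
        if len({y for _, y in sample}) == n_classes:
            models.append(train_centroid(sample))
    return models

def bagging_predict(models, x):
    """Combine the base learners by majority vote."""
    votes = Counter(predict_centroid(m, x) for m in models)
    return votes.most_common(1)[0][0]

data = [(0.0, 0), (1.0, 0), (2.0, 0), (9.0, 1), (10.0, 1), (11.0, 1)]
models = bagging_fit(data)
print(bagging_predict(models, 1.5), bagging_predict(models, 9.5))
```

Boosting differs in that later learners are trained with more weight on the examples earlier learners got wrong, whereas here every resample is drawn independently.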

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Chalmers Research

Hochschulschriftenserver - Universität Frankfurt am Main

Unsupervised Lexicon Discovery from Acoustic Input

Author: Glass James R.
Lee Chia-ying
O'Donnell Timothy John
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/02/2015

We present a model of unsupervised phonological lexicon discovery -- the problem of simultaneously learning phoneme-like and word-like units from acoustic input. Our model builds on earlier models of unsupervised phone-like unit discovery from acoustic data (Lee and Glass, 2012), and unsupervised symbolic lexicon discovery using the Adaptor Grammar framework (Johnson et al., 2006), integrating these earlier approaches using a probabilistic model of phonological variation. We show that the model is competitive with state-of-the-art spoken term discovery systems, and present analyses exploring the model's behavior and the kinds of linguistic structures it learns.

CiteSeerX

DSpace@MIT

Subword lexical modelling for speech recognition

Author: Lau Raymond, 1971-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1998

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998. Includes bibliographical references (p. 155-160). by Raymond Lau. Ph.D.

DSpace@MIT

An acoustic-phonetic approach in automatic Arabic speech recognition

Author: Marwan Al-Zabibi
Publication date: 01/01/1990

In a large vocabulary speech recognition system the broad phonetic classification technique is used instead of detailed phonetic analysis to overcome the variability in the acoustic realisation of utterances. The broad phonetic description of a word is used as a means of lexical access, where the lexicon is structured into sets of words sharing the same broad phonetic labelling. This approach has been applied to a large vocabulary isolated word Arabic speech recognition system. Statistical studies have been carried out on 10,000 Arabic words (converted to phonemic form) involving different combinations of broad phonetic classes. Some particular features of the Arabic language have been exploited. The results show that vowels represent about 43% of the total number of phonemes. They also show that about 38% of the words can uniquely be represented at this level by using eight broad phonetic classes. When introducing detailed vowel identification the percentage of uniquely specified words rises to 83%. These results suggest that a fully detailed phonetic analysis of the speech signal is perhaps unnecessary. In the adopted word recognition model, the consonants are classified into four broad phonetic classes, while the vowels are described by their phonemic form. A set of 100 words uttered by several speakers has been used to test the performance of the implemented approach. In the implemented recognition model, three procedures have been developed, namely voiced-unvoiced-silence segmentation, vowel detection and identification, and automatic spectral transition detection between phonemes within a word. The accuracy of both the V-UV-S and vowel recognition procedures is almost perfect. A broad phonetic segmentation procedure has been implemented, which exploits information from the above mentioned three procedures. Simple phonological constraints have been used to improve the accuracy of the segmentation process. 
The resultant sequence of labels is used for lexical access to retrieve the word or a small set of words sharing the same broad phonetic labelling. When there is more than one word candidate, a verification procedure is used to choose the most likely one.
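The lexical-access scheme described in this abstract can be sketched roughly as follows: consonants are collapsed into broad classes, vowels keep their phonemic identity, and the lexicon is indexed by the resulting label sequence. The abstract does not name the four consonant classes, so the class inventory, the phoneme symbols, and the toy word list below are all assumptions made for illustration, not the thesis's actual data.

```python
from collections import defaultdict

# Assumption: four hypothetical broad consonant classes.
BROAD_CLASS = {
    "p": "STOP", "b": "STOP", "t": "STOP", "d": "STOP", "k": "STOP",
    "f": "FRIC", "s": "FRIC", "z": "FRIC",
    "m": "NASAL", "n": "NASAL",
    "l": "LIQUID", "r": "LIQUID",
}
# Assumption: a small toy vowel set; vowels are kept in phonemic form.
VOWELS = {"a", "i", "u"}

def broad_label(phonemes):
    """Map a phoneme sequence to its broad phonetic label sequence."""
    return tuple(p if p in VOWELS else BROAD_CLASS[p] for p in phonemes)

def build_index(lexicon):
    """Group words that share the same broad phonetic labelling."""
    index = defaultdict(list)
    for word, phonemes in lexicon.items():
        index[broad_label(phonemes)].append(word)
    return index

def lookup(index, phonemes):
    """Return the candidate words for a recognised phoneme sequence."""
    return index.get(broad_label(phonemes), [])

# Toy lexicon of made-up words (not real Arabic data).
lexicon = {
    "bat": ["b", "a", "t"],
    "dak": ["d", "a", "k"],
    "sam": ["s", "a", "m"],
    "bit": ["b", "i", "t"],
}
index = build_index(lexicon)
print(lookup(index, ["t", "a", "d"]))  # candidates sharing STOP-a-STOP
print(lookup(index, ["s", "a", "m"]))  # uniquely specified word
```

When a lookup returns several candidates, a separate verification step would be needed to pick one, as the abstract notes.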

Loughborough University Institutional Repository