896 research outputs found

Articulatory features for conversational speech recognition

Author: Metze Florian
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2005

Conversational Arabic Automatic Speech Recognition

Author: Al-Shareef Sarah
Publication venue: University of Sheffield Conference Proceedings
Publication date: 01/05/2015

Colloquial Arabic (CA) is the set of spoken variants of modern Arabic that exist as regional dialects and are generally considered to be mother tongues in those regions. CA has limited textual resources because it exists only as a spoken language, without a standardised written form. Normally the Modern Standard Arabic (MSA) writing convention is employed, which has limitations in representing CA phonetically. Without phonetic dictionaries the pronunciation of CA words is ambiguous and can only be obtained through word and/or sentence context. Moreover, CA inherits the complex word structure of MSA, in which words can be created by attaching affixes to a word. In automatic speech recognition (ASR), commonly used approaches to modelling acoustic, pronunciation and word variability are language independent. However, one can observe significant differences in performance between English and CA, with the latter yielding up to three times higher error rates. This thesis investigates the main causes of the under-performance of CA ASR systems. The work focuses on two directions: first, the impact of limited lexical coverage and insufficient training data for written CA on language modelling is investigated; second, better models for the acoustics and pronunciations are obtained by learning to transfer between written and spoken forms. Several original contributions result from each direction. Data-driven classes derived from decomposed text are shown to reduce the out-of-vocabulary rate. A novel colloquialisation system to import additional data is introduced; automatic diacritisation to restore the missing short vowels was found to yield good performance; and a new acoustic set for describing CA was defined. The proposed methods improved ASR performance, in terms of word error rate, on a CA conversational telephone speech ASR task.

White Rose E-theses Online

Feature extraction and event detection for automatic speech recognition

Author: Stouten Frederik
Publication venue: Ghent University. Faculty of Engineering
Publication date: 01/01/2008

Ghent University Academic Bibliography

Sound structure and sound change: A modeling approach

Author: Morley Rebecca L.
Publication venue: Language Science Press
Publication date: 23/09/2019

Research in linguistics, as in most other scientific domains, is usually approached in a modular way – narrowing the domain of inquiry in order to allow for increased depth of study. This is necessary and productive for a topic as wide-ranging and complex as human language. However, precisely because language is a complex system, tied to perception, learning, memory, and social organization, the assumption of modularity can also be an obstacle to understanding language at a deeper level. This book examines the consequences of enforcing non-modularity along two dimensions: the temporal, and the cognitive. Along the temporal dimension, synchronic and diachronic domains are linked by the requirement that sound changes must lead to viable, stable language states. Along the cognitive dimension, sound change and variation are linked to speech perception and production by requiring non-trivial transformations between acoustic and articulatory representations. The methodological focus of this work is on computational modeling. By formalising and implementing theoretical accounts, modeling can expose theoretical gaps and covert assumptions. To do so, it is necessary to formally assess the functional equivalence of specific implementational choices, as well as their mapping to theoretical structures. This book applies this analytic approach to a series of implemented models of sound change. As theoretical inconsistencies are discovered, possible solutions are proposed, incrementally constructing a set of sufficient properties for a working model. Because internal theoretical consistency is enforced, this model corresponds to an explanatorily adequate theory. And because explicit links between modules are required, this is a theory, not only of sound change, but of many aspects of phonological competence. 
The book highlights two aspects of modeling work that receive relatively little attention: the formal mapping from model to theory, and the scalability of demonstration models. Focusing on these aspects of modeling makes it clear that any specific theory of sound change is impossible without a more general theory of language: of the relationship between perception and production, the relationship between phonetics and phonology, the learning of linguistic units, and the nature of underlying representations. Theories of sound change that do not explicitly address these aspects of language are making tacit, untested assumptions about their properties. Addressing so many aspects of language may seem to complicate the linguist's task. However, as this book shows, it actually helps impose boundary conditions of ecological validity that reduce the theoretical search space.

Language Science Press