Mage - Reactive articulatory feature control of HMM-based parametric speech synthesis

Author: Astrinaki Maria
Dutoit Thierry
King Simon
Ling Zhen-Hua
Moinet Alexis
Richmond Korin
Yamagishi Junichi
Publication date: 01/01/2013

In this paper, we present the integration of articulatory control into MAGE, a framework for realtime and interactive (reactive) parametric speech synthesis using hidden Markov models (HMMs). MAGE is based on the speech synthesis engine from HTS and uses acoustic features (spectrum and f0) to model and synthesize speech. In this work, we replace the standard acoustic models with models combining acoustic and articulatory features, such as tongue, lips and jaw positions. We then use feature-space-switched articulatory-to-acoustic regression matrices to enable us to control the spectral acoustic features by manipulating the articulatory features. Combining this synthesis model with MAGE allows us to interactively and intuitively modify phones synthesized in real time, for example transforming one phone into another, by controlling the configuration of the articulators in a visual display. Index Terms: speech synthesis, reactive, articulators
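The idea of feature-space-switched regression mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes hard switching by nearest centroid (a real system would more likely weight regions probabilistically), and the function names, centroids, matrices and biases are hypothetical toy values chosen only for illustration.

```python
def nearest_region(x, centroids):
    """Index of the centroid closest to articulatory vector x (squared Euclidean)."""
    dists = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centroids]
    return dists.index(min(dists))


def articulatory_to_acoustic(x, centroids, matrices, biases):
    """Map x to acoustic features using the affine transform of its region."""
    k = nearest_region(x, centroids)
    A, b = matrices[k], biases[k]
    # Affine map y = A x + b, written out row by row.
    return [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]


# Hypothetical toy setup: 2 articulatory dims, 2 acoustic dims, 2 regions.
centroids = [[0.0, 0.0], [1.0, 1.0]]
matrices = [
    [[1.0, 0.0], [0.0, 1.0]],  # region 0: identity mapping
    [[2.0, 0.0], [0.0, 2.0]],  # region 1: scaled mapping
]
biases = [[0.0, 0.0], [0.5, 0.5]]

near_origin = articulatory_to_acoustic([0.1, 0.2], centroids, matrices, biases)
near_one = articulatory_to_acoustic([0.9, 1.0], centroids, matrices, biases)
```

Moving the articulatory vector from one region to the other changes which matrix is applied, which is the sense in which manipulating articulatory features steers the acoustic output in this sketch.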

CiteSeerX

Edinburgh Research Explorer

Recognizing Speech in a Novel Accent: The Motor Theory of Speech Perception Reframed

The motor theory of speech perception holds that we perceive the speech of another in terms of a motor representation of that speech. However, when we have learned to recognize a foreign accent, it seems plausible that recognition of a word rarely involves reconstruction of the speech gestures of the speaker rather than the listener. To better assess the motor theory and this observation, we proceed in three stages. Part 1 places the motor theory of speech perception in a larger framework based on our earlier models of the adaptive formation of mirror neurons for grasping, and for viewing extensions of that mirror system as part of a larger system for neuro-linguistic processing, augmented by the present consideration of recognizing speech in a novel accent. Part 2 then offers a novel computational model of how a listener comes to understand the speech of someone speaking the listener's native language with a foreign accent. The core tenet of the model is that the listener uses hypotheses about the word the speaker is currently uttering to update probabilities linking the sound produced by the speaker to phonemes in the native language repertoire of the listener. This, on average, improves the recognition of later words. This model is neutral regarding the nature of the representations it uses (motor vs. auditory). It serves as a reference point for the discussion in Part 3, which proposes a dual-stream neuro-linguistic architecture to revisit claims for and against the motor theory of speech perception and the relevance of mirror neurons, and extracts some implications for the reframing of the motor theory.
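The core tenet described in this abstract, updating sound-to-phoneme probabilities from a word hypothesis, can be sketched as follows. This is an illustrative toy, not the paper's model: it assumes the heard sounds and the hypothesised phonemes are already aligned one-to-one, uses a simple fixed-rate update, and every name and symbol in it (`make_table`, `update`, `decode`, `rate`, the sound labels) is hypothetical.

```python
def make_table(sounds, phonemes):
    """Start with a uniform P(phoneme | sound) row for every heard sound."""
    p = 1.0 / len(phonemes)
    return {s: {ph: p for ph in phonemes} for s in sounds}


def update(table, heard, hypothesis, rate=0.2):
    """Shift probability toward the phonemes of the hypothesised word.

    heard and hypothesis are assumed to be equal-length and position-aligned.
    Each row stays normalised because the update is a convex combination of
    the old row and a one-hot target.
    """
    for sound, target in zip(heard, hypothesis):
        row = table[sound]
        for ph in row:
            goal = 1.0 if ph == target else 0.0
            row[ph] += rate * (goal - row[ph])
    return table


def decode(table, heard):
    """Most probable native phoneme for each heard sound."""
    return [max(table[s], key=table[s].get) for s in heard]


# Hypothetical accent in which the listener's "th" is produced as sound "z*".
table = make_table(["z*"], ["s", "z", "th"])
before = decode(table, ["z*"])
update(table, ["z*"], ["th"])  # the listener guessed a word containing "th"
after = decode(table, ["z*"])
```

After a single update the accented sound is decoded as "th", so later words containing that sound are more likely to be recognised, which mirrors the claim that recognition improves on average.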

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Individual differences in speech production and maximum speech performance

Author: Shen C.
Publication venue: Radboud University Nijmegen
Publication date: 01/01/2022

MPG.PuRe

Multi-View Multi-Task Representation Learning for Mispronunciation Detection

Author: Ali Ahmed
Chowdhury Shammur Absar
Kheir Yassine El
Publication date: 02/06/2023

The disparity in phonology between a learner's native (L1) and target (L2) languages poses a significant challenge for mispronunciation detection and diagnosis (MDD) systems. This challenge is further intensified by the lack of annotated L2 data. This paper proposes a novel MDD architecture that exploits multiple 'views' of the same input data assisted by auxiliary tasks to learn more distinctive phonetic representation in a low-resource setting. Using the mono- and multilingual encoders, the model learns multiple views of the input, and captures the sound properties across diverse languages and accents. These encoded representations are further enriched by learning articulatory features in a multi-task setup. Our reported results using the L2-ARCTIC data outperformed the SOTA models, with a phoneme error rate reduction of 11.13% and 8.60% and absolute F1 score increase of 5.89% and 2.49% compared to the single-view mono- and multilingual systems, with a limited L2 dataset. Comment: 5 page

arXiv.org e-Print Archive

Stages of lexical access

Author: Levelt W.J.M.
Schriefers H.
Publication venue: Dordrecht [etc.] : Martinus Nijhoff
Publication date: 01/01/1987

Contains fulltext: 5660.pdf (publisher's version) (Open Access)

Radboud Repository

MPG.PuRe

Lexical Access Model for Italian -- Modeling human speech processing: identification of words in running speech toward lexical access based on the detection of landmarks and other acoustic cues to features

Author: Arango Javier
Chan Ian
Choi Jeung-Yoon
De Nardis Luca
DeCaprio Alec
Di Benedetto Maria-Gabriella
Shattuck-Hufnagel Stefanie
Publication date: 01/01/2021

Modelling the process that a listener actuates in deriving the words intended by a speaker requires setting a hypothesis on how lexical items are stored in memory. This work aims at developing a system that imitates humans when identifying words in running speech and, in this way, at providing a framework to better understand human speech processing. We build a speech recognizer for Italian based on the principles of Stevens' model of Lexical Access in which words are stored as hierarchical arrangements of distinctive features (Stevens, K. N. (2002). "Toward a model for lexical access based on acoustic landmarks and distinctive features," J. Acoust. Soc. Am., 111(4):1872-1891). Over the past few decades, the Speech Communication Group at the Massachusetts Institute of Technology (MIT) developed a speech recognition system for English based on this approach. Italian will be the first language beyond English to be explored; the extension to another language provides the opportunity to test the hypothesis that words are represented in memory as a set of hierarchically-arranged distinctive features, and to reveal which of the underlying mechanisms may have a language-independent nature. This paper also introduces a new Lexical Access corpus, the LaMIT database, created and labeled specifically for this work, that will be provided freely to the speech research community. Future developments will test the hypothesis that specific acoustic discontinuities - called landmarks - that serve as cues to features, are language independent, while other cues may be language-dependent, with powerful implications for understanding how the human brain recognizes speech. Comment: Submitted to Language and Speech, 202

arXiv.org e-Print Archive

Archivio della ricerca- Università di Roma La Sapienza

Stages of lexical access

Author: Levelt W.
Schriefers H.
Publication date: 01/01/1987

MPG.PuRe

Remembering with your tongue: articulatory embodiment in memory and speech

Author: St John Alexander
Publication date: 01/01/2015

Articulatory factors are typically relegated to a peripheral role in theoretical accounts of cognitive function. For example, verbal short-term memory functions are thought to be serviced by dedicated mechanisms that operate on abstract phonological (i.e., non-articulatory) items. An alternative tested here is that memory functions are supported by motor control processes that embody articulatory detail. To provide evidence for this viewpoint, this thesis focuses on the influence of articulatory effort-minimisation processes on memory and speech.

Online Research @ Cardiff

Mage-HMM-based speech synthesis reactively controlled by the articulators

Author: Astrinaki Maria
Dutoit Thierry
King Simon
Ling Zhen-Hua
Moinet Alexis
Richmond Korin
Yamagishi Junichi
Publication date: 01/09/2013

Edinburgh Research Explorer

Proceedings of the 18th International Congress of Phonetic Sciences (ICPhS 2015) Satellite Event: The Evolution of Phonetic Capabilities: Causes, constraints, consequences

Publication venue: ICPhS
Publication date: 12/08/2015

MPG.PuRe