
    Phonetic variability and grammatical knowledge: an articulatory study of Korean place assimilation.

    The study reported here uses articulatory data to investigate Korean place assimilation of coronal stops followed by labial or velar stops, both within words and across words. The results show that this place-assimilation process is highly variable, both within and across speakers, and is also sensitive to factors such as the place of articulation of the following consonant, the presence of a word boundary and, to some extent, speech rate. Gestures affected by the process are generally reduced categorically (deleted), while sporadic gradient reduction of gestures is also observed. We further compare the results for coronals to our previous findings on the assimilation of labials, discussing implications of the results for grammatical models of phonological/phonetic competence. The results suggest that speakers’ language-particular knowledge of place assimilation has to be relatively detailed and context-sensitive, and has to encode systematic regularities about its obligatory/variable application as well as categorical/gradient realisation.
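
    A hypothetical sketch of what such detailed, context-sensitive knowledge might look like if stated explicitly (in Python): for each context, defined by the place of the following stop and the presence of a word boundary, a probability that assimilation applies and a split between categorical gesture deletion and gradient reduction. All probabilities are invented placeholders, not the study's data.

        # Hypothetical, illustrative encoding of a variable, context-sensitive
        # assimilation process; every number is an invented placeholder.
        ASSIMILATION_OF_CORONALS = {
            # (following_place, across_word_boundary): (p_apply, p_categorical_given_apply)
            ("labial", False): (0.9, 0.95),
            ("labial", True):  (0.6, 0.90),
            ("velar",  False): (0.8, 0.95),
            ("velar",  True):  (0.5, 0.90),
        }

        def assimilation_outcome(following_place, across_boundary):
            p_apply, p_cat = ASSIMILATION_OF_CORONALS[(following_place, across_boundary)]
            return {
                "applies": p_apply,
                "categorical deletion": p_apply * p_cat,
                "gradient reduction": p_apply * (1 - p_cat),
            }

        print(assimilation_outcome("labial", True))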

    Experimental phonetic study of the timing of voicing in English obstruents

    The treatment given to the timing of voicing in three areas of phonetic research -- phonetic taxonomy, speech production modelling, and speech synthesis -- is considered in the light of an acoustic study of the timing of voicing in British English obstruents. In each case, it is found to be deficient. The underlying cause is the difficulty of applying a rigid segmental approach to an aspect of speech production characterised by important inter-articulator asynchronies, coupled with the limited quantitative data available concerning the systematic properties of the timing of voicing in languages. It is argued that the categories and labels used to describe the timing of voicing in obstruents are inadequate for fulfilling the descriptive goals of phonetic theory. One possible alternative descriptive strategy is proposed, based on incorporating aspects of the parametric organisation of speech into the descriptive framework. Within the domain of speech production modelling, no satisfactory account has been given of fine-grained variability in the timing of voicing that is not explicable in terms of general properties of motor programming and utterance execution. The experimental results support claims in the literature that the phonetic control of an utterance may be somewhat less abstract than has been suggested in some previous reports. A schematic outline is given of one way in which the timing of voicing could be controlled in speech production. The success of a speech synthesis-by-rule system depends to a great extent on a comprehensive encoding of the systematic phonetic characteristics of the target language. Only limited success has been achieved in the past thirty years. A set of rules is proposed for generating more naturalistic patterns of voicing in obstruents, reflecting those observed in the experimental component of this study. Consideration is given to strategies for evaluating the effect of fine-grained phonetic rules in speech synthesis.
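
    As a purely illustrative example of the kind of parametric description argued for here (not the rules proposed in the study), voicing in an obstruent can be stated as timing offsets relative to the constriction interval rather than as a single segmental [voice] label; the contexts and millisecond values below are invented.

        # Illustrative only: a parametric statement of voicing timing, with invented
        # contexts and values, standing in for a binary segmental [+/-voice] label.
        def voicing_timing_ms(voiced, word_initial, prevocalic):
            """Return (voicing offset before release, voice onset after release) in ms."""
            if voiced:
                # Voicing tends to persist into the closure and resume quickly.
                return (20 if word_initial else 0), 10
            # Voiceless obstruents: voicing ceases early; onset is delayed
            # (an aspiration-like interval) word-initially before a vowel.
            return 60, (70 if (word_initial and prevocalic) else 30)

        print(voicing_timing_ms(voiced=False, word_initial=True, prevocalic=True))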

    Multi-View Multi-Task Representation Learning for Mispronunciation Detection

    The disparity in phonology between a learner's native (L1) and target (L2) languages poses a significant challenge for mispronunciation detection and diagnosis (MDD) systems. This challenge is further intensified by the lack of annotated L2 data. This paper proposes a novel MDD architecture that exploits multiple 'views' of the same input data, assisted by auxiliary tasks, to learn more distinctive phonetic representations in a low-resource setting. Using mono- and multilingual encoders, the model learns multiple views of the input and captures sound properties across diverse languages and accents. These encoded representations are further enriched by learning articulatory features in a multi-task setup. Our results on the L2-ARCTIC data outperform the SOTA models, with phoneme error rate reductions of 11.13% and 8.60% and absolute F1 score increases of 5.89% and 2.49% compared to the single-view mono- and multilingual systems, respectively, with a limited L2 dataset.
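
    A minimal sketch of the general idea, under assumed encoder types, dimensions, and label counts (the paper's actual configuration is not reproduced here): two encoders provide mono- and multilingual views of the same utterance, the views are fused, and the fused representation feeds a phoneme-recognition head plus an auxiliary articulatory-feature head.

        # Sketch of a multi-view, multi-task model for mispronunciation detection.
        # Encoder choices, dimensions, and label counts are illustrative assumptions.
        import torch
        import torch.nn as nn

        class MultiViewMDD(nn.Module):
            def __init__(self, feat_dim=80, hidden=256, n_phonemes=42, n_artic=24):
                super().__init__()
                # Two 'views' of the same input, e.g. mono- and multilingual encoders.
                self.mono_enc = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
                self.multi_enc = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
                self.fuse = nn.Linear(4 * hidden, hidden)
                self.phoneme_head = nn.Linear(hidden, n_phonemes + 1)  # +1 for a CTC blank
                self.artic_head = nn.Linear(hidden, n_artic)           # auxiliary task

            def forward(self, x):
                mono, _ = self.mono_enc(x)                 # (B, T, 2*hidden)
                multi, _ = self.multi_enc(x)               # (B, T, 2*hidden)
                fused = torch.tanh(self.fuse(torch.cat([mono, multi], dim=-1)))
                return self.phoneme_head(fused), self.artic_head(fused)

        model = MultiViewMDD()
        x = torch.randn(2, 100, 80)                        # 2 utterances, 100 frames, 80-dim features
        phoneme_logits, artic_logits = model(x)
        print(phoneme_logits.shape, artic_logits.shape)    # (2, 100, 43) and (2, 100, 24)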

    PRESENCE: A human-inspired architecture for speech-based human-machine interaction

    Recent years have seen steady improvements in the quality and performance of speech-based human-machine interaction, driven by a significant convergence in the methods and techniques employed. However, the quantity of training data required to improve state-of-the-art systems seems to be growing exponentially, and performance appears to be asymptotic to a level that may be inadequate for many real-world applications. This suggests that there may be a fundamental flaw in the underlying architecture of contemporary systems, as well as a failure to capitalize on the combinatorial properties of human spoken language. This paper addresses these issues and presents a novel architecture for speech-based human-machine interaction inspired by recent findings in the neurobiology of living systems. Called PRESENCE ("PREdictive SENsorimotor Control and Emulation"), this new architecture blurs the distinction between the core components of a traditional spoken language dialogue system and instead focuses on a recursive hierarchical feedback control structure. Cooperative and communicative behavior emerges as a by-product of an architecture that is founded on a model of interaction in which the system has in mind the needs and intentions of the user, and the user has in mind the needs and intentions of the system.

    Phonetic Segments and the Organization of Speech

    According to mainstream linguistic phonetics, speech can be modeled as a string of discrete sound segments or “phones” drawn from a universal phonetic inventory. Recent work has argued that a mature phonetics should refrain from theorizing about speech and speech processing using sound segments, and that the phone concept should be eliminated from linguistic theory. This paper lays out the tenets of the phone methodology and evaluates its prospects in light of the eliminativist arguments. I claim that the eliminativist arguments fail to show that the phone concept should be eliminated from linguistic theory.

    Advancing Electromyographic Continuous Speech Recognition: Signal Preprocessing and Modeling

    Speech is the natural medium of human communication, but audible speech can be overheard by bystanders and excludes speech-disabled people. This work presents a speech recognizer based on surface electromyography, where electric potentials of the facial muscles are captured by surface electrodes, allowing speech to be processed nonacoustically. A system that was state-of-the-art at the beginning of this book is substantially improved in terms of accuracy, flexibility, and robustness.
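
    A generic sketch of the kind of preprocessing such a system needs (common choices from the EMG-speech literature, not the specific pipeline developed in this work): band-limit the raw electrode signal, cut it into overlapping frames, and compute simple time-domain features per frame.

        # Generic surface-EMG preprocessing sketch; the filter band, frame sizes,
        # and feature set are assumed typical values, not this work's pipeline.
        import numpy as np
        from scipy.signal import butter, filtfilt

        def emg_features(signal, fs=2000, frame_ms=27, shift_ms=10):
            b, a = butter(4, [20 / (fs / 2), 500 / (fs / 2)], btype="band")
            x = filtfilt(b, a, signal)                     # remove drift and high-frequency noise
            frame, shift = int(fs * frame_ms / 1000), int(fs * shift_ms / 1000)
            feats = []
            for start in range(0, len(x) - frame + 1, shift):
                w = x[start:start + frame]
                feats.append([
                    w.mean(),                              # low-frequency mean
                    np.mean(w ** 2),                       # frame power
                    np.mean(np.abs(np.diff(np.sign(w)))) / 2,  # zero-crossing rate
                ])
            return np.array(feats)

        feats = emg_features(np.random.randn(2 * 2000))    # two seconds of synthetic "EMG"
        print(feats.shape)                                 # (number of frames, 3)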

    Voice and speech functions (B310-B340)

    The International Classification of Functioning, Disability and Health for Children and Youth (ICF-CY) domain ‘voice and speech functions’ (b3) includes production and quality of voice (b310), articulation functions (b320), fluency and rhythm of speech (b330) and alternative vocalizations (b340, such as making musical sounds and crying, which are not reviewed here).
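
    For reference, the codes mentioned above as a simple lookup table (descriptions paraphrased from the abstract):

        # ICF-CY 'voice and speech functions' codes, paraphrased from the abstract.
        ICF_CY_VOICE_SPEECH = {
            "b310": "production and quality of voice",
            "b320": "articulation functions",
            "b330": "fluency and rhythm of speech",
            "b340": "alternative vocalizations (e.g. musical sounds, crying)",
        }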