26,960 research outputs found

    Voice-preserving Zero-shot Multiple Accent Conversion

    Most people who have tried to learn a foreign language have experienced difficulties understanding or speaking with a native speaker's accent. For native speakers, understanding or speaking a new accent is likewise a difficult task. An accent conversion system that changes a speaker's accent but preserves that speaker's voice identity, such as timbre and pitch, has the potential for a range of applications, such as communication, language learning, and entertainment. Existing accent conversion models tend to change the speaker identity and accent at the same time. Here, we use adversarial learning to disentangle accent-dependent features while retaining other acoustic characteristics. What sets our work apart from existing accent conversion models is the capability to convert an unseen speaker's utterance to multiple accents while preserving its original voice identity. Subjective evaluations show that our model generates audio that sounds closer to the target accent while still sounding like the original speaker. Comment: Submitted to IEEE ICASSP 202

    Evaluating Methods for Ground-Truth-Free Foreign Accent Conversion

    Foreign accent conversion (FAC) is a special application of voice conversion (VC) which aims to convert the accented speech of a non-native speaker to native-sounding speech with the same speaker identity. FAC is difficult since native speech from the desired non-native speaker, which would serve as the training target, is impossible to collect. In this work, we evaluate three recently proposed methods for ground-truth-free FAC, all of which aim to harness the power of sequence-to-sequence (seq2seq) and non-parallel VC models to properly convert the accent and control the speaker identity. Our experimental evaluation results show that no single method was significantly better than the others in all evaluation axes, which is in contrast to conclusions drawn in previous studies. We also explain the effectiveness of these methods with the training input and output of the seq2seq model and examine the design choice of the non-parallel VC model, and show that intelligibility measures such as word error rates do not correlate well with subjective accentedness. Finally, our implementation is open-sourced to promote reproducible research and help future researchers improve upon the compared systems. Comment: Accepted to the 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). Demo page: https://unilight.github.io/Publication-Demos/publications/fac-evaluate. Code: https://github.com/unilight/seq2seq-v

    The new accent technologies: recognition, measurement and manipulation of accented speech


    Intonation in neurogenic foreign accent syndrome

    Foreign accent syndrome (FAS) is a motor speech disorder in which changes to segmental as well as suprasegmental aspects lead to the perception of a foreign accent in speech. This paper focuses on one suprasegmental aspect, namely that of intonation. It provides an in-depth analysis of the intonation system of four speakers with FAS with the aim of establishing the intonational changes that have taken place as well as their underlying origin. Using the autosegmental-metrical framework of intonational analysis, four different levels of intonation, i.e. inventory, distribution, realisation and function, were examined. Results revealed that the speakers with FAS had the same structural inventory at their disposal as the control speakers, but that they differed from the latter in relation to the distribution, implementation and functional use of their inventory. In contrast to previous findings, the current results suggest that these intonational changes cannot be entirely attributed to an underlying intonation deficit but also reflect secondary manifestations of physiological constraints affecting speech support systems and compensatory strategies. These findings have implications for the debate surrounding intonational deficits in FAS, advocating a reconsideration of current assumptions regarding the underlying nature of intonation impairment in FAS.

    Multidisciplinary Assessment and Diagnosis of Conversion Disorder in a Patient with Foreign Accent Syndrome

    Multiple reports have described patients with disordered articulation and prosody, often following acute aphasia, dysarthria, or apraxia of speech, which results in the perception by listeners of a foreign-like accent. These features led to the term foreign accent syndrome (FAS), a speech disorder with perceptual features that suggest an indistinct, non-native speaking accent. Also correctly known as pseudoforeign accent, the speech does not typically match a specific foreign accent, but is rather a constellation of speech features that result in the perception of a foreign accent by listeners. The primary etiologies of FAS are cerebrovascular accidents or traumatic brain injuries which affect cortical and subcortical regions critical to expressive speech and language production. Far fewer cases of FAS associated with psychiatric conditions have been reported. We will present the clinical history, neurological examination, neuropsychological assessment, cognitive-behavioral and biofeedback assessments, and motor speech examination of a patient with FAS without a known vascular, traumatic, or infectious precipitant. Repeated multidisciplinary examinations of this patient provided convergent evidence in support of FAS secondary to conversion disorder. We discuss these findings and their implications for evaluation and treatment of rare neurological and psychiatric conditions.

    ACCDIST: A Metric for comparing speakers' accents

    This paper introduces a new metric for the quantitative assessment of the similarity of speakers' accents. The ACCDIST metric is based on the correlation of inter-segment distance tables across speakers or groups. Basing the metric on segment similarity within a speaker ensures that it is sensitive to the speaker's pronunciation system rather than to his or her voice characteristics. The metric is shown to have an error rate of only 11% on the accent classification of speakers into 14 English regional accents of the British Isles, half the error rate of a metric based on spectral information directly. The metric may also be useful for cluster analysis of accent groups.
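
    The idea of correlating inter-segment distance tables can be illustrated with a minimal Python sketch. It is not the ACCDIST implementation: the abstract does not specify the segment features or the distance measure, so the two-dimensional feature vectors, the Euclidean distance, and the function names `distance_table`, `pearson`, and `accent_similarity` are all assumptions made for illustration, and the numbers are invented.

```python
import math
from itertools import combinations


def distance_table(segments):
    """Flatten the upper triangle of one speaker's inter-segment distance table.

    `segments` maps a segment label to a feature vector (hypothetical features).
    Euclidean distance is an assumption, not taken from the abstract.
    """
    labels = sorted(segments)
    return [math.dist(segments[a], segments[b]) for a, b in combinations(labels, 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def accent_similarity(speaker_a, speaker_b):
    """Correlate the two speakers' distance tables over their shared segments."""
    shared = set(speaker_a) & set(speaker_b)
    table_a = distance_table({k: speaker_a[k] for k in shared})
    table_b = distance_table({k: speaker_b[k] for k in shared})
    return pearson(table_a, table_b)


# Invented example data: four segments per speaker, two features each.
spk_a = {"i": [300.0, 2300.0], "a": [750.0, 1300.0],
         "u": [320.0, 800.0], "e": [500.0, 1900.0]}
# Same pronunciation system, different voice: every feature shifted by a constant.
spk_b = {k: [v[0] + 50.0, v[1] + 100.0] for k, v in spk_a.items()}
# Different pronunciation system: "a" is realised close to "e".
spk_c = dict(spk_a, a=[520.0, 1850.0])

print(accent_similarity(spk_a, spk_b))
print(accent_similarity(spk_a, spk_c))
```

    Because each table is built from distances within a single speaker, a uniform shift of that speaker's features (speaker B above) leaves the table unchanged, whereas a change in how segments relate to each other (speaker C) lowers the correlation. This is the sense in which such a metric responds to the pronunciation system rather than to voice characteristics.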