322 research outputs found

    Towards Personalized Synthesized Voices for Individuals with Vocal Disabilities: Voice Banking and Reconstruction

    Get PDF
    When individuals lose the ability to produce their own speech, due to degenerative diseases such as motor neurone disease (MND) or Parkinson’s, they lose not only a functional means of communication but also a display of their individual and group identity. In order to build personalized synthetic voices, attempts have been made to capture the voice before it is lost, using a process known as voice banking. But, for some patients, the speech deterioration frequently coincides or quickly follows diagnosis. Using HMM-based speech synthesis, it is now possible to build personalized synthetic voices with minimal data recordings and even disordered speech. The power of this approach is that it is possible to use the patient’s recordings to adapt existing voice models pre-trained on many speakers. When the speech has begun to deteriorate, the adapted voice model can be further modified in order to compensate for the disordered characteristics found in the patient’s speech. The University of Edinburgh has initiated a project for voice banking and reconstruction based on this speech synthesis technology. At the current stage of the project, more than fifteen patients with MND have already been recorded and five of them have been delivered a reconstructed voice. In this paper, we present an overview of the project as well as subjective assessments of the reconstructed voices and feedback from patients and their families

    Analysis of Speaker Clustering Strategies for HMM-Based Speech Synthesis

    Get PDF
    This paper describes a method for speaker clustering, with the application of building average voice models for speakeradaptive HMM-based speech synthesis that are a good basis for adapting to specific target speakers. Our main hypothesis is that using perceptually similar speakers to build the average voice model will be better than use unselected speakers, even if the amount of data available from perceptually similar speakers is smaller. We measure the perceived similarities among a group of 30 female speakers in a listening test and then apply multiple linear regression to automatically predict these listener judgements of speaker similarity and thus to identify similar speakers automatically. We then compare a variety of average voice models trained on either speakers who were perceptually judged to be similar to the target speaker, or speakers selected by the multiple linear regression, or a large global set of unselected speakers. We find that the average voice model trained on perceptually similar speakers provides better performance than the global model, even though the latter is trained on more data, confirming our main hypothesis. However, the average voice model using speakers selected automatically by the multiple linear regression does not reach the same level of performance. Index Terms: Statistical parametric speech synthesis, hidden Markov models, speaker adaptatio

    The DeepZen Speech Synthesis System for Blizzard Challenge 2023

    Full text link
    This paper describes the DeepZen text to speech (TTS) system for Blizzard Challenge 2023. The goal of this challenge is to synthesise natural and high-quality speech in French, from a large monospeaker dataset (hub task) and from a smaller dataset by speaker adaptation (spoke task). We participated to both tasks with the same model architecture. Our approach has been to use an auto-regressive model, which retains an advantage for generating natural sounding speech but to improve prosodic control in several ways. Similarly to non-attentive Tacotron, the model uses a duration predictor and gaussian upsampling at inference, but with a simpler unsupervised training. We also model the speaking style at both sentence and word levels by extracting global and local style tokens from the reference speech. At inference, the global and local style tokens are predicted from a BERT model run on text. This BERT model is also used to predict specific pronunciation features like schwa elision and optional liaisons. Finally, a modified version of HifiGAN trained on a large public dataset and fine-tuned on the target voices is used to generate speech waveform. Our team is identified as O in the the Blizzard evaluation and MUSHRA test results show that our system performs second ex aequo in both hub task (median score of 0.75) and spoke task (median score of 0.68), over 18 and 14 participants, respectively.Comment: Blizzard Challenge 202

    Zofia Romanowiczowa (1922–2010)

    Get PDF
    Brak

    Prosodic control of unit-selection speech synthesis: A probabilistic approach

    Get PDF

    Feeling, Knowledge, Self-Preservation: Audre Lorde’s Oppositional Agency and Some Implications for Ethics

    Get PDF
    Throughout her work, Audre Lorde maintains that her self-preservation in the face of oppression depends on acting from the recognition and valorization of her feelings as a deep source of knowledge. This claim, taken as a portrayal of agency, poses challenges to standard positions in ethics, epistemology, and moral psychology. This article examines the oppositional agency articulated by Lorde’s thought, locating feeling, poetry, and the power she calls “the erotic” within her avowed project of self-preservation. It then explores the implications of taking seriously Lorde’s account, particularly for theorists examining ethics and epistemology under nonideal social conditions. For situations of sexual intimacy, for example, Lorde’s account unsettles prevailing assumptions about the role of consent in responsibility between sexual partners. I argue that obligations to solicit consent and respect refusal are not sufficient to acknowledge the value of agency in intimate encounters when agency is oppositional in the way Lorde describes

    A Project Based Approach to Statistics and Data Science

    Full text link
    In an increasingly data-driven world, facility with statistics is more important than ever for our students. At institutions without a statistician, it often falls to the mathematics faculty to teach statistics courses. This paper presents a model that a mathematician asked to teach statistics can follow. This model entails connecting with faculty from numerous departments on campus to develop a list of topics, building a repository of real-world datasets from these faculty, and creating projects where students interface with these datasets to write lab reports aimed at consumers of statistics in other disciplines. The end result is students who are well prepared for interdisciplinary research, who are accustomed to coping with the idiosyncrasies of real data, and who have sharpened their technical writing and speaking skills
    corecore