The Validation of Speech Corpora
1.2 Intended audience
Towards Personalized Synthesized Voices for Individuals with Vocal Disabilities: Voice Banking and Reconstruction
When individuals lose the ability to produce their own speech, due to degenerative diseases such as motor neurone disease (MND) or Parkinson’s, they lose not only a functional means of communication but also a display of their individual and group identity. In order to build personalized synthetic voices, attempts have been made to capture the voice before it is lost, using a process known as voice banking. For some patients, however, speech deterioration frequently coincides with or quickly follows diagnosis. Using HMM-based speech synthesis, it is now possible to build personalized synthetic voices from minimal recordings and even from disordered speech. The power of this approach is that the patient’s recordings can be used to adapt existing voice models pre-trained on many speakers. When the speech has begun to deteriorate, the adapted voice model can be further modified to compensate for the disordered characteristics found in the patient’s speech. The University of Edinburgh has initiated a project for voice banking and reconstruction based on this speech synthesis technology. At the current stage of the project, more than fifteen patients with MND have been recorded and five of them have received a reconstructed voice. In this paper, we present an overview of the project as well as subjective assessments of the reconstructed voices and feedback from patients and their families.
CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap
After addressing the state of the art during the first year of Chorus and establishing the existing landscape in
multimedia search engines, we identified and analyzed gaps within the European research effort during our second year.
In this period we focused on three directions: technological issues, user-centred issues and use-cases, and
socio-economic and legal aspects. These were assessed by two central studies: first, a concerted vision of the functional
breakdown of a generic multimedia search engine, and second, representative use-case descriptions with a related
discussion of the requirements for technological challenges. Both studies were carried out in cooperation and
consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several
meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU project
coordinators as well as national initiative coordinators. Based on the feedback obtained, we identified two types of
gaps: core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical
research challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as
emerging legal challenges.
Coco-Nut: Corpus of Japanese Utterance and Voice Characteristics Description for Prompt-based Control
In text-to-speech, controlling voice characteristics is important for
synthesizing speech for various purposes. Considering the success of
text-conditioned generation, such as text-to-image, free-form text instruction
should be useful for intuitive and complicated control of voice
characteristics. A sufficiently large corpus of high-quality and diverse voice
samples with corresponding free-form descriptions can advance such control
research. However, neither an open corpus nor a scalable method is currently
available. To this end, we develop Coco-Nut, a new corpus including diverse
Japanese utterances, along with text transcriptions and free-form voice
characteristics descriptions. Our methodology to construct this corpus consists
of 1) automatic collection of voice-related audio data from the Internet, 2)
quality assurance, and 3) manual annotation using crowdsourcing. Additionally,
we benchmark our corpus on the prompt embedding model trained by contrastive
speech-text learning. Comment: Submitted to ASRU202