VoiceBank-2023: A Multi-Speaker Mandarin Speech Corpus for Constructing Personalized TTS Systems for the Speech Impaired
Personalized TTS services for Mandarin-speaking people with speech impairments are rarely reported. Taiwan launched the VoiceBanking project in 2020, aiming to build a complete set of services for delivering personalized Mandarin TTS systems to amyotrophic lateral sclerosis patients. This paper reports the corpus design, corpus recording, data purging and correction, and evaluations of the personalized TTS systems developed for the VoiceBanking project. The corpus is named the VoiceBank-2023 speech corpus after its release year. It contains 29.78 hours of utterances with
prompts of short paragraphs and common phrases spoken by 111 native Mandarin
speakers. The corpus is labeled with information about gender, degree of speech
impairment, types of users, transcription, SNRs, and speaking rates. The
VoiceBank-2023 corpus is available upon request for non-commercial use, and all parties are welcome to join the VoiceBanking project to improve services for the speech impaired.
Comment: submitted to 26th International Conference of the ORIENTAL-COCOSD
System Fingerprint Recognition for Deepfake Audio: An Initial Dataset and Investigation
The malicious use of deep speech synthesis models may pose a significant threat to society, and many studies have therefore emerged to detect so-called "deepfake audio". However, these studies focus on binary detection of real versus fake audio. In some realistic application scenarios, one also needs to know which tool or model generated the deepfake audio. This raises a question: can we recognize the system fingerprints of deepfake audio? In this
paper, we propose a deepfake audio dataset for system fingerprint recognition
(SFR) and conduct an initial investigation. We collected the dataset from five
speech synthesis systems using the latest state-of-the-art deep learning
technologies, and it includes both clean and compressed sets. In addition, to facilitate the further development of system fingerprint recognition methods, we provide benchmarks for comparison along with our research findings. The dataset will be publicly available.
Comment: 12 pages, 3 figures. arXiv admin note: text overlap with
arXiv:2208.0964
Current trends in multilingual speech processing
In this paper, we describe recent work at Idiap Research Institute in the domain of multilingual speech processing and provide some insights into emerging challenges for the research community. Multilingual speech processing has been a topic of ongoing interest to the research community for many years, and the field is now receiving renewed interest owing to two strong driving forces. Firstly, technical advances in speech recognition and synthesis are posing new challenges and opportunities to researchers. For example, discriminative features are seeing wide application in the speech recognition community, but additional issues arise when using such features in a multilingual setting. Another example is the apparent convergence of speech recognition and speech synthesis technologies in the form of statistical parametric methodologies. This convergence enables the investigation of new approaches to unified modelling for automatic speech recognition and text-to-speech synthesis (TTS), as well as cross-lingual speaker adaptation for TTS. The second driving force is the impetus being provided by both government and industry for technologies that help break down domestic and international language barriers, these also being barriers to the expansion of policy and commerce. Speech-to-speech and speech-to-text translation are thus emerging as key technologies, at the heart of which lies multilingual speech processing.