
    CDSD: Chinese Dysarthria Speech Database

    We present the Chinese Dysarthria Speech Database (CDSD) as a valuable resource for dysarthria research. The database comprises speech data from 24 participants with dysarthria: each participant recorded one hour of speech, and one of them recorded an additional 10 hours, for a total of 34 hours of speech material. To accommodate participants with varying cognitive levels, the text pool draws primarily on the AISHELL-1 dataset and on speeches by primary and secondary school students. Participants read these texts aloud and were recorded either on a mobile device or with a ZOOM F8n multi-track field recorder. In this paper, we describe the data collection and annotation processes and present an approach for establishing a baseline for dysarthric speech recognition. Furthermore, we conducted a speaker-dependent dysarthric speech recognition experiment using the additional 10 hours of speech data from one of our participants. Our findings indicate that, given a model trained on large amounts of data, fine-tuning on a limited quantity of an individual's speech yields commendable speaker-dependent recognition results. However, we observe significant variation in recognition results across dysarthric speakers. These insights provide valuable reference points for speaker-dependent dysarthric speech recognition. Comment: 9 pages, 3 figures
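The speaker-dependent baseline described above would typically be scored with word error rate (or character error rate for Chinese). As an illustration of the metric only, and not code from the CDSD paper itself, here is a minimal WER implementation based on word-level edit distance:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over words, normalised by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the quick brown fox", "the quick brown box"))  # one substitution in four words -> 0.25
```

For Chinese, the same routine applied per character (rather than per whitespace-split word) gives character error rate.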

    Wavelet-based voice morphing

    This paper presents a new multi-scale voice morphing algorithm. The algorithm transforms one person's speech so that it takes on the distinct characteristics, and hence the identity, of another speaker while preserving the original content. It performs the morphing in separate subbands obtained through wavelet decomposition and models the spectral conversion using Radial Basis Function neural networks. Results on the TIMIT speech database demonstrate effective transformation of speaker identity.
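The subband decomposition underlying such a morphing system can be illustrated with a one-level Haar wavelet transform. The sketch below (plain NumPy, not the authors' implementation) shows the analysis/synthesis pair between which a per-subband spectral mapping, such as an RBF network trained on aligned source/target frames, would sit:

```python
import numpy as np

def haar_analysis(x):
    """One-level Haar DWT: split an even-length signal into approximation and detail subbands."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail

def haar_synthesis(approx, detail):
    """Inverse one-level Haar DWT (perfect reconstruction)."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

# In a morphing system, each subband would be transformed toward the target
# speaker before synthesis; here we only verify perfect reconstruction.
x = np.array([1.0, 2.0, 3.0, 4.0, 0.0, -1.0, 2.0, 5.0])
a, d = haar_analysis(x)
print(np.allclose(haar_synthesis(a, d), x))  # True
```

Cascading the analysis step on the approximation band yields the multi-scale decomposition the paper refers to.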

    Design of a phonetic corpus for speech recognition in catalan

    In this paper, we present the design of a corpus for speech recognition to be used in recording a speech database in Catalan. An earlier database in Spanish served as the reference for specifying the characteristics of the sentences and the minimum number of units required. An analysis of unit frequencies was carried out to determine which units were relevant for training and to compare the results with the figures from the designed corpus. Three different sub-corpora were generated, one for training, ...

    EMOVO Corpus: an Italian Emotional Speech Database

    Get PDF
    This article describes EMOVO, the first emotional speech corpus for the Italian language. The database was built from the voices of 6 actors, each performing 14 sentences while simulating 6 emotional states (disgust, fear, anger, joy, surprise, sadness) plus the neutral state. These emotions are the well-known Big Six found in most of the literature on emotional speech. The recordings were made with professional equipment in the Fondazione Ugo Bordoni laboratories. The paper also describes a subjective validation test of the corpus, based on emotion discrimination of two sentences, carried out by two different groups of 24 listeners. The test was successful, yielding an overall recognition accuracy of 80%. Joy and disgust proved the hardest emotions to recognize, whereas anger, sadness, and the neutral state were the easiest to detect.
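Overall accuracy in a listener test like this is simply the pooled ratio of correct responses to total responses across all emotion classes. A small sketch with illustrative per-emotion counts (these numbers are made up for the example, not EMOVO's actual results):

```python
# Per-emotion listener responses: (correct, total). Illustrative values only.
counts = {
    "anger":    (22, 24),
    "sadness":  (21, 24),
    "neutral":  (22, 24),
    "fear":     (19, 24),
    "surprise": (19, 24),
    "joy":      (16, 24),
    "disgust":  (15, 24),
}

# Per-class accuracy and the pooled overall accuracy.
per_emotion = {e: c / n for e, (c, n) in counts.items()}
overall = sum(c for c, _ in counts.values()) / sum(n for _, n in counts.values())
print(f"overall accuracy: {overall:.2f}")  # overall accuracy: 0.80
```

Reporting per-class accuracy alongside the overall figure is what lets a study single out hard classes (here, "joy" and "disgust" score lowest).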

    Gender classification in two emotional speech databases

    Gender classification is a challenging problem with applications in speaker indexing, speaker recognition, speaker diarization, annotation and retrieval of multimedia databases, voice synthesis, smart human-computer interaction, biometrics, social robots, etc. Although it has been studied for more than thirty years, it is by no means a solved problem. Processing emotional speech in order to identify a speaker's gender makes the problem even more interesting. A large pool of 1379 features is created, including 605 novel features. A branch-and-bound feature selection algorithm is applied to select a subset of 15 features from the 1379 originally extracted. Support vector machines with various kernels are tested as gender classifiers on two databases, namely the Berlin Database of Emotional Speech and the Danish Emotional Speech database. The reported classification results outperform those obtained by state-of-the-art techniques, as perfect classification accuracy is obtained. © 2008 IEEE
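Branch-and-bound feature selection is exact when the selection criterion is monotone (adding features never decreases it): whole branches of the subset tree can then be pruned without losing the optimal subset. The toy sketch below uses an additive per-feature relevance score as a stand-in for the class-separability criteria used in practice; it is an illustration of the search strategy, not the paper's implementation:

```python
def bnb_select(features, scores, d):
    """Branch-and-bound: pick the d-feature subset maximising a monotone
    criterion J. Here J is a toy additive score (sum of per-feature
    relevance values), which is trivially monotone under set inclusion."""
    def J(subset):
        return sum(scores[f] for f in subset)

    best = {"score": float("-inf"), "subset": None}

    def search(current, start):
        # Monotonicity bound: every subset of `current` has J <= J(current),
        # so if J(current) cannot beat the incumbent, prune this branch.
        if J(current) <= best["score"]:
            return
        if len(current) == d:
            best["score"], best["subset"] = J(current), tuple(current)
            return
        # Branch by removing one more feature; removing in non-decreasing
        # index order ensures each subset is visited at most once.
        for i in range(start, len(current)):
            search(current[:i] + current[i + 1:], i)

    search(tuple(features), 0)
    return best["subset"]

scores = {"pitch": 0.9, "jitter": 0.4, "shimmer": 0.7, "hnr": 0.2, "mfcc1": 0.8}
print(bnb_select(list(scores), scores, 2))  # selects the two top-scoring features
```

With the real (non-additive) separability criteria used in such papers, the same pruning rule applies as long as the criterion remains monotone; the pool of 1379 features makes pruning essential, since exhaustive search over all 15-feature subsets is infeasible.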