2,971 research outputs found

    Spanish Expressive Voices: corpus for emotion research in Spanish

    A new emotional multimedia database has been recorded and aligned. The database comprises speech and video recordings of one actor and one actress simulating a neutral state and the Big Six emotions: happiness, sadness, anger, surprise, fear and disgust. Due to its careful design and its size (more than 100 minutes per emotion), the recorded database allows comprehensive studies on emotional speech synthesis, prosodic modelling, speech conversion, far-field speech recognition, and speech- and video-based emotion identification. The database has been automatically labelled for prosodic purposes (5% was manually revised). The whole database has been validated through objective and perceptual tests, achieving a validation score as high as 89%.
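    As a rough illustration of the kind of frame-level prosodic information such automatic labelling builds on (the corpus's actual annotation pipeline is not described here), the sketch below extracts F0 and energy contours from a single utterance; the file name is hypothetical.

```python
# Minimal sketch: frame-level F0 and energy contours for one utterance,
# the raw material of automatic prosodic labelling. This is NOT the SEV
# corpus pipeline; "utterance.wav" is a hypothetical file name.
import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=None)

# Fundamental frequency (F0) via probabilistic YIN; NaN marks unvoiced frames.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
)

# Frame-level energy (RMS), a common correlate of prosodic prominence.
rms = librosa.feature.rms(y=y)[0]

times = librosa.times_like(f0, sr=sr)
for t, pitch, energy in zip(times, f0, rms):
    pitch_str = "unvoiced" if np.isnan(pitch) else f"{pitch:.1f} Hz"
    print(f"{t:7.3f}s  F0={pitch_str:>10}  RMS={energy:.4f}")
```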

    Review of Research on Speech Technology: Main Contributions From Spanish Research Groups

    In the last two decades, there has been an important increase in research on speech technology in Spain, mainly due to a higher level of funding from European, Spanish and local institutions and also due to a growing interest in these technologies for developing new services and applications. This paper provides a review of the main areas of speech technology addressed by research groups in Spain, their main contributions in recent years and their current focus of interest. The description is organized into five main areas: audio processing including speech, speaker characterization, speech and language processing, text-to-speech conversion, and spoken language applications. The paper also introduces the Spanish Network of Speech Technologies (RTTH, Red Temática en Tecnologías del Habla), the research network that includes almost all the researchers working in this area, presenting some figures, its objectives and its main activities in recent years.

    Speech impairment in Parkinson’s disease: acoustic analysis of unvoiced consonants in Italian native speakers

    The study of the influence of Parkinson’s Disease (PD) on vocal signals has received much attention over the last decades, with increasing interest devoted to the articulation and acoustic characterization of different phonemes. Method: In this study we propose the analysis of the Transition Regions (TR) of specific phonetic groups to model the loss of motor control and the difficulty in starting/stopping movements typical of PD patients. For this purpose, we extracted 60 features from pre-processed vocal signals and used them as input to several machine learning models. We employed two datasets, containing samples from Italian native speakers, for training and testing. The first dataset (28 PD patients and 22 Healthy Controls, HC) included recordings made in optimal conditions, while in the second one (26 PD patients and 18 HC) signals were collected at home, using non-professional microphones. Results: We optimized two support vector machine models for application in controlled noise conditions and home environments, achieving 98 ± 1.1% and 88 ± 2.8% accuracy in 10-fold cross-validation, respectively. Conclusion: This study confirms the high capability of the TRs to discriminate between PD patients and healthy controls, and the feasibility of automatic PD assessment using voice recordings. Moreover, the promising performance of the implemented model opens up the possibility of voice processing using low-cost devices and home recordings, possibly self-managed by the patients themselves.
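    As a rough sketch of the evaluation setup described above (a support vector machine on a 60-dimensional feature matrix scored with 10-fold cross-validation), the snippet below uses synthetic placeholder features; the actual Transition Region features and tuned hyperparameters are not reproduced here.

```python
# Hedged sketch of the reported setup: SVM + 10-fold cross-validation on a
# 60-feature matrix (28 PD patients vs. 22 healthy controls). The feature
# values are random placeholders, not the paper's Transition-Region features.
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_pd, n_hc, n_features = 28, 22, 60               # first dataset in the paper
X = rng.normal(size=(n_pd + n_hc, n_features))    # placeholder feature matrix
y = np.array([1] * n_pd + [0] * n_hc)             # 1 = PD, 0 = healthy control

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"10-fold accuracy: {scores.mean():.2%} ± {scores.std():.2%}")
```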

    Emotion recognition based on the energy distribution of plosive syllables

    We usually encounter two problems during speech emotion recognition (SER): expression and perception, which vary considerably between speakers, languages, and sentence pronunciation. Finding an optimal system that characterizes emotions while overcoming all these differences is therefore a promising prospect. With this in mind, we considered two emotional databases: the Moroccan Arabic dialect emotional database (MADED) and the Ryerson audio-visual database of emotional speech and song (RAVDESS), which present notable differences in terms of type (natural/acted) and language (Arabic/English). We proposed a detection process based on 27 acoustic features extracted from consonant-vowel (CV) syllabic units (/ba/, /du/, /ki/, /ta/) common to both databases. We tested two classification strategies: multiclass (all emotions combined: joy, sadness, neutral, anger) and binary (neutral vs. others, positive emotions (joy) vs. negative emotions (sadness, anger), and sadness vs. anger). These strategies were tested three times: i) on MADED, ii) on RAVDESS, iii) on MADED and RAVDESS combined. The proposed method gave better recognition accuracy in the case of binary classification: the rates reach an average of 78% for the multiclass classification, 100% for neutral vs. others, 100% for the negative emotions (i.e. anger vs. sadness), and 96% for the positive vs. negative emotions.
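    The binary strategies above amount to re-mapping the four emotion labels onto two classes before training. The sketch below shows one way to derive those three binary tasks from a label sequence; the feature extraction from the CV syllables is not reproduced, and the label list is a hypothetical example.

```python
# Illustrative re-labelling for the three binary tasks named in the abstract
# (neutral vs. others, positive vs. negative, sadness vs. anger). The 27
# acoustic features from the /ba/, /du/, /ki/, /ta/ syllables are not
# reproduced; `labels` is a hypothetical sequence of 4-class annotations.
labels = ["joy", "anger", "neutral", "sadness", "joy", "anger", "neutral"]

def relabel(labels, task):
    """Map 4-class emotion labels onto one of the binary classification tasks."""
    if task == "neutral_vs_others":
        return [("neutral" if lab == "neutral" else "other") for lab in labels]
    if task == "positive_vs_negative":  # joy vs. sadness/anger; neutral dropped
        return [("positive" if lab == "joy" else "negative")
                for lab in labels if lab != "neutral"]
    if task == "sadness_vs_anger":      # only the two negative emotions kept
        return [lab for lab in labels if lab in ("sadness", "anger")]
    raise ValueError(f"unknown task: {task}")

for task in ("neutral_vs_others", "positive_vs_negative", "sadness_vs_anger"):
    print(f"{task:22s} -> {relabel(labels, task)}")
```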

    Methods in prosody

    This book presents a collection of pioneering papers reflecting current methods in prosody research, with a focus on Romance languages. The rapid expansion of the field of prosody research in the last decades has given rise to a proliferation of methods that has left little room for their critical assessment. The aim of this volume is to bridge this gap by bringing together original contributions in which experts in the field assess, reflect on, and discuss different methods of data gathering and analysis. The book should thus be of interest to scholars and established researchers as well as to students and young academics who wish to explore prosody, an expanding and promising area of study.