2,971 research outputs found

    Spanish Expressive Voices: corpus for emotion research in Spanish

    A new emotional multimedia database has been recorded and aligned. The database comprises speech and video recordings of one actor and one actress simulating a neutral state and the Big Six emotions: happiness, sadness, anger, surprise, fear and disgust. Due to its careful design and its size (more than 100 minutes per emotion), the recorded database allows comprehensive studies on emotional speech synthesis, prosodic modelling, speech conversion, far-field speech recognition, and speech- and video-based emotion identification. The database has been automatically labelled for prosodic purposes (5% was manually revised). The whole database has been validated through objective and perceptual tests, achieving a validation score as high as 89%.
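    As a rough illustration of the kind of frame-level prosodic information such automatic labelling builds on (the corpus's actual annotation pipeline is not described here), the sketch below extracts F0 and energy contours from a single utterance; the file name is hypothetical.

```python
# Minimal sketch: frame-level F0 and energy contours for one utterance,
# the raw material of automatic prosodic labelling. This is NOT the SEV
# corpus pipeline; "utterance.wav" is a hypothetical file name.
import librosa
import numpy as np

y, sr = librosa.load("utterance.wav", sr=None)

# Fundamental frequency (F0) via probabilistic YIN; NaN marks unvoiced frames.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
)

# Frame-level energy (RMS), a common correlate of prosodic prominence.
rms = librosa.feature.rms(y=y)[0]

times = librosa.times_like(f0, sr=sr)
for t, pitch, energy in zip(times, f0, rms):
    pitch_str = "unvoiced" if np.isnan(pitch) else f"{pitch:.1f} Hz"
    print(f"{t:7.3f}s  F0={pitch_str:>10}  RMS={energy:.4f}")
```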

    Review of Research on Speech Technology: Main Contributions From Spanish Research Groups

    In the last two decades, there has been an important increase in research on speech technology in Spain, mainly due to a higher level of funding from European, Spanish and local institutions and also due to a growing interest in these technologies for developing new services and applications. This paper provides a review of the main areas of speech technology addressed by research groups in Spain, their main contributions in recent years and their current focus of interest. The description is organized into five main areas: audio processing including speech, speaker characterization, speech and language processing, text-to-speech conversion, and spoken language applications. The paper also introduces the Spanish Network of Speech Technologies (RTTH, Red Temática en Tecnologías del Habla), the research network that includes almost all the researchers working in this area, presenting some figures, its objectives and its main activities in recent years.

    Speech impairment in Parkinson’s disease: acoustic analysis of unvoiced consonants in Italian native speakers

    The study of the influence of Parkinson’s Disease (PD) on vocal signals has received much attention over the last decades, with increasing interest devoted to the articulation and acoustic characterization of different phonemes. Method: In this study we propose the analysis of the Transition Regions (TR) of specific phonetic groups to model the loss of motor control and the difficulty in starting/stopping movements typical of PD patients. For this purpose, we extracted 60 features from pre-processed vocal signals and used them as input to several machine learning models. We employed two datasets, containing samples from Italian native speakers, for training and testing. The first dataset (28 PD patients and 22 Healthy Controls, HC) included recordings made in optimal conditions, while in the second one (26 PD patients and 18 HC) signals were collected at home, using non-professional microphones. Results: We optimized two support vector machine models for application in controlled noise conditions and home environments, achieving 98 ± 1.1% and 88 ± 2.8% accuracy in 10-fold cross-validation, respectively. Conclusion: This study confirms the high capability of the TRs to discriminate between PD patients and healthy controls, and the feasibility of automatic PD assessment using voice recordings. Moreover, the promising performance of the implemented model opens up the possibility of voice processing using low-cost devices and home recordings, possibly self-managed by the patients themselves.
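    As a rough sketch of the evaluation setup described above (a support vector machine on a 60-dimensional feature matrix scored with 10-fold cross-validation), the snippet below uses synthetic placeholder features; the actual Transition Region features and tuned hyperparameters are not reproduced here.

```python
# Hedged sketch of the reported setup: SVM + 10-fold cross-validation on a
# 60-feature matrix (28 PD patients vs. 22 healthy controls). The feature
# values are random placeholders, not the paper's Transition-Region features.
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_pd, n_hc, n_features = 28, 22, 60               # first dataset in the paper
X = rng.normal(size=(n_pd + n_hc, n_features))    # placeholder feature matrix
y = np.array([1] * n_pd + [0] * n_hc)             # 1 = PD, 0 = healthy control

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"10-fold accuracy: {scores.mean():.2%} ± {scores.std():.2%}")
```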

    Emotion recognition based on the energy distribution of plosive syllables

    We usually encounter two problems during speech emotion recognition (SER): expression and perception, which vary considerably between speakers, languages, and sentence pronunciation. Finding an optimal system that characterizes emotions while overcoming all these differences is therefore a promising prospect. With this in mind, we considered two emotional databases: the Moroccan Arabic dialect emotional database (MADED) and the Ryerson audio-visual database of emotional speech and song (RAVDESS), which present notable differences in terms of type (natural/acted) and language (Arabic/English). We proposed a detection process based on 27 acoustic features extracted from consonant-vowel (CV) syllabic units (/ba/, /du/, /ki/, /ta/) common to both databases. We tested two classification strategies: multiclass (all emotions combined: joy, sadness, neutral, anger) and binary (neutral vs. others, positive emotions (joy) vs. negative emotions (sadness, anger), and sadness vs. anger). These strategies were tested three times: i) on MADED, ii) on RAVDESS, iii) on MADED and RAVDESS combined. The proposed method gave better recognition accuracy in the case of binary classification: the rates reach an average of 78% for the multiclass classification, 100% for neutral vs. others, 100% for the negative emotions (i.e. anger vs. sadness), and 96% for the positive vs. negative emotions.
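    The binary strategies above amount to re-mapping the four emotion labels onto two classes before training. The sketch below shows one way to derive those three binary tasks from a label sequence; the feature extraction from the CV syllables is not reproduced, and the label list is a hypothetical example.

```python
# Illustrative re-labelling for the three binary tasks named in the abstract
# (neutral vs. others, positive vs. negative, sadness vs. anger). The 27
# acoustic features from the /ba/, /du/, /ki/, /ta/ syllables are not
# reproduced; `labels` is a hypothetical sequence of 4-class annotations.
labels = ["joy", "anger", "neutral", "sadness", "joy", "anger", "neutral"]

def relabel(labels, task):
    """Map 4-class emotion labels onto one of the binary classification tasks."""
    if task == "neutral_vs_others":
        return [("neutral" if lab == "neutral" else "other") for lab in labels]
    if task == "positive_vs_negative":  # joy vs. sadness/anger; neutral dropped
        return [("positive" if lab == "joy" else "negative")
                for lab in labels if lab != "neutral"]
    if task == "sadness_vs_anger":      # only the two negative emotions kept
        return [lab for lab in labels if lab in ("sadness", "anger")]
    raise ValueError(f"unknown task: {task}")

for task in ("neutral_vs_others", "positive_vs_negative", "sadness_vs_anger"):
    print(f"{task:22s} -> {relabel(labels, task)}")
```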

    Methods in prosody

    This book presents a collection of pioneering papers reflecting current methods in prosody research, with a focus on Romance languages. The rapid expansion of the field of prosody research in the last decades has given rise to a proliferation of methods that has left little room for their critical assessment. The aim of this volume is to bridge this gap by bringing together original contributions in which experts in the field assess, reflect on, and discuss different methods of data gathering and analysis. The book should thus be of interest to scholars and established researchers as well as to students and young academics who wish to explore prosody, an expanding and promising area of study.