8 research outputs found

    Speech Emotion Recognition System

    Speech Emotion Recognition (SER) is a research topic with a wide range of applications. Speech features such as Mel Frequency Cepstral Coefficients (MFCC) are extracted from the uttered speech. A Support Vector Machine (SVM) is used as the classifier to distinguish emotional states such as boredom, happiness, sadness, neutrality, and anger in emotional soundtracks drawn from a database of emotional speech, and the classification accuracy obtained with the SVM is high. DOI: 10.17762/ijritcc2321-8169.15057
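
    The pipeline described here (MFCC features fed to an SVM) can be sketched as below. This is a minimal illustration rather than the paper's implementation: librosa and scikit-learn are assumed, and the file list and emotion label set are placeholders.

        # Minimal sketch of an MFCC + SVM emotion classifier (illustrative only).
        import numpy as np
        import librosa
        from sklearn.svm import SVC
        from sklearn.model_selection import train_test_split

        EMOTIONS = ["anger", "boredom", "happiness", "neutral", "sadness"]  # placeholder label set

        def mfcc_features(wav_path, n_mfcc=13):
            """Average MFCC frames into one fixed-length vector per utterance."""
            y, sr = librosa.load(wav_path, sr=None)
            mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
            return mfcc.mean(axis=1)

        def train_ser(files):
            """files: list of (wav_path, emotion_label) pairs from an emotional speech corpus."""
            X = np.array([mfcc_features(path) for path, _ in files])
            y = np.array([EMOTIONS.index(label) for _, label in files])
            X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
            clf = SVC(kernel="rbf")  # SVM classifier over MFCC features
            clf.fit(X_tr, y_tr)
            print("held-out accuracy:", clf.score(X_te, y_te))
            return clf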

    Spanish Expressive Voices: corpus for emotion research in Spanish

    A new emotional multimedia database has been recorded and aligned. The database comprises speech and video recordings of one actor and one actress simulating a neutral state and the Big Six emotions: happiness, sadness, anger, surprise, fear and disgust. Thanks to its careful design and its size (more than 100 minutes per emotion), the recorded database allows comprehensive studies on emotional speech synthesis, prosodic modelling, speech conversion, far-field speech recognition, and speech- and video-based emotion identification. The database has been automatically labelled for prosodic purposes (5% was manually revised). The whole database has been validated through objective and perceptual tests, achieving a validation score as high as 89%.

    Expressive Speech Identifications based on Hidden Markov Model

    This paper concerns a sub-area of the larger research field of Affective Computing, focusing on affect-recognition systems that use the speech modality. It is proposed that speech-based affect identification systems could play an important role as next-generation biometric identification systems aimed at determining a person’s ‘state of mind’, or psycho-physiological state. Possible areas for the deployment of voice-affect recognition technology are discussed. Additionally, experiments and results for emotion identification in speech based on a Hidden Markov Model (HMM) classifier are presented. The experimental results suggest that certain speech features are better suited to identifying particular emotional states, and that happiness is the most difficult emotion to detect.
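
    A common way to realise such an HMM classifier is to train one model per emotion on frame-level features and to label a test utterance with the emotion whose model scores it highest. The sketch below follows that scheme under stated assumptions (hmmlearn for the HMMs, feature extraction left to the caller); it is not the paper's exact configuration.

        # One Gaussian HMM per emotion; classification by maximum log-likelihood.
        import numpy as np
        from hmmlearn.hmm import GaussianHMM

        def train_emotion_hmms(train_data, n_states=5):
            """train_data: dict mapping emotion -> list of (n_frames, n_features) arrays."""
            models = {}
            for emotion, sequences in train_data.items():
                X = np.vstack(sequences)                   # stack all frames
                lengths = [len(seq) for seq in sequences]  # per-utterance boundaries
                model = GaussianHMM(n_components=n_states, covariance_type="diag", n_iter=20)
                model.fit(X, lengths)
                models[emotion] = model
            return models

        def identify(models, features):
            """Return the emotion whose HMM assigns the highest log-likelihood to the utterance."""
            return max(models, key=lambda emotion: models[emotion].score(features))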

    Desarrollo de un Robot-Guía con Integración de un Sistema de Diálogo y Expresión de Emociones: Proyecto ROBINT

    This article presents the integration of a spoken dialogue system into an autonomous robot, conceived as an interactive element in a science museum capable of giving guided tours and holding simple dialogues with visitors. To make its operation more engaging, the robot has been given features (such as gestural expressiveness and emotional speech synthesis) that humanise its interventions. The speech recogniser is a speaker-independent subsystem (it can recognise anyone's speech) that incorporates confidence measures to improve recognition performance, since they filter out a large amount of spurious speech. The understanding system relies on rule-based learning, which allows it to infer explicit information from a set of examples without having to previously produce a grammar or a set of rules to guide the understanding module. These subsystems had previously been evaluated on a voice-control task for a HIFI system, using our robot as the interface, obtaining 95.9% correctly recognised words and 92.8% correctly recognised concepts. For the text-to-speech system, a set of segmental and prosodic modifications applied to a neutral voice generates emotions such as happiness, anger, sadness, and surprise in the voice synthesised by the robot. The reliability of these emotions has been measured with several perceptual experiments, which yield identification rates above 70% for most emotions (87% for sadness, 79.1% for surprise).
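
    As a rough illustration of the prosodic-modification idea (a neutral voice reshaped to convey emotion), the sketch below applies per-emotion pitch and rate changes. The scaling values and function names are hypothetical placeholders, not the settings reported in the paper; librosa is assumed for the signal transforms.

        # Illustrative pitch/rate profiles per emotion (made-up values, not the paper's).
        import librosa

        PROFILES = {
            "happiness": (2.0, 1.15),   # semitone shift, rate factor: higher and faster
            "sadness":   (-2.0, 0.85),  # lower and slower
            "anger":     (1.0, 1.25),
            "surprise":  (3.0, 1.05),
        }

        def emotionalise(neutral_wav, emotion, sr=16000):
            """Apply a simple pitch/rate transform to a neutral synthetic utterance."""
            y, sr = librosa.load(neutral_wav, sr=sr)
            semitones, rate = PROFILES[emotion]
            y = librosa.effects.pitch_shift(y, sr=sr, n_steps=semitones)
            y = librosa.effects.time_stretch(y, rate=rate)
            return y, sr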

    Reconocimiento De Emociones Empleando Procesamiento Digital De La Señal De Voz

    This work presents a methodology for characterising the voice signal, applied to the recognition of emotional states. A speaker's different emotional states produce physiological changes in the vocal apparatus, which are reflected in the variation of several voice parameters. The processing techniques employed are time-frequency transforms, linear prediction analysis, and the raw data.
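
    The three signal views mentioned above (a time-frequency transform, linear prediction analysis, and the raw data) could be extracted roughly as follows. This is an illustrative sketch assuming librosa; the LPC order and transform settings are placeholders rather than the paper's values.

        # Extract the three feature views from one recording (illustrative only).
        import numpy as np
        import librosa

        def extract_features(wav_path, lpc_order=12):
            y, sr = librosa.load(wav_path, sr=None)       # raw data (waveform)
            spectrogram = np.abs(librosa.stft(y))         # time-frequency representation
            lpc_coeffs = librosa.lpc(y, order=lpc_order)  # linear prediction analysis
            return {"raw": y, "time_frequency": spectrogram, "lpc": lpc_coeffs}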

    Context-aware speech synthesis: A human-inspired model for monitoring and adapting synthetic speech

    The aim of this PhD thesis is to illustrate the development of a computational model for speech synthesis which mimics the behaviour of human speakers when they adapt their production to their communicative conditions. The project was motivated by the observed differences between state-of-the-art synthesisers' speech and human production. In particular, synthesiser output does not exhibit any adaptation to the communicative context, such as environmental disturbances, the listener's needs, or the meaning of the speech content, as human speech does. No evaluation is performed by standard synthesisers to check whether their production is suitable for the communication requirements.

    Inspired by Lindblom's Hyper and Hypo articulation (H&H) theory of speech production, the computational model of Hyper and Hypo articulation (C2H) is proposed. This novel computational model for automatic speech production is designed to monitor its output and to control the effort involved in generating the synthetic speech. Speech transformations are based on the hypothesis that low-effort attractors for a human speech production system can be identified; such acoustic configurations are close to the minimum possible effort that a speaker can make in speech production. Interpolation and extrapolation along the key dimension of hypo/hyper-articulation can be motivated by energetic considerations of phonetic contrast. Fully reactive speech synthesis is enabled by adding a negative perception feedback loop to the speech production chain, in order to constantly assess the communicative effectiveness of the proposed adaptation. The distance to the original communicative intent is the control signal that drives the speech transformations.

    A hidden Markov model (HMM)-based speech synthesiser, together with continuous adaptation of its statistical models, is used to implement the C2H model. A standard version of the synthesis software does not allow transformations of speech during parameter generation; therefore, the generation algorithm of one of the most well-known speech synthesis frameworks, the HMM/DNN-based speech synthesis framework (HTS), is modified. The short-time implementation of the speech intelligibility index (SII), named the extended speech intelligibility index (eSII), is chosen as the main perception measure in the feedback loop to control the transformation.

    The effectiveness of the proposed model is tested through acoustic analysis and objective and subjective evaluations. A key assessment is to measure the control of speech clarity in noisy conditions and the similarity between the emerging modifications and human behaviour. Two objective scoring methods are used to assess the speech intelligibility of the implemented system: the speech intelligibility index (SII) and an index based on the Dau measure (Dau). Results indicate that the intelligibility of C2H-generated speech can be continuously controlled. The effectiveness of reactive speech synthesis and of the phonetic-contrast-motivated transforms is confirmed by the acoustic and objective results. More precisely, for the maximum-strength hyper-articulation transformations, the improvement with respect to non-adapted speech is above 10% for all intelligibility indices and tested noise conditions.
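
    The reactive loop described above (synthesise, estimate intelligibility, adjust articulation effort) can be outlined schematically as below. The synthesiser and the intelligibility estimator are stubs standing in for HTS and the thesis's eSII measure; the target, gain, and effort bounds are illustrative assumptions, not the thesis's parameters.

        # Schematic negative-feedback loop driving hyper/hypo-articulation effort.
        def estimate_intelligibility(speech, noise):
            """Stub for a short-time intelligibility index such as eSII (returns 0..1)."""
            raise NotImplementedError

        def synthesise(text, effort):
            """Stub for HTS-style parameter generation at a given articulation effort."""
            raise NotImplementedError

        def reactive_synthesis(text_chunks, noise_stream, target=0.8, gain=0.5):
            effort = 0.0  # 0 = neutral; positive = hyper-articulated; negative = hypo-articulated
            for text, noise in zip(text_chunks, noise_stream):
                speech = synthesise(text, effort)
                score = estimate_intelligibility(speech, noise)
                effort += gain * (target - score)     # raise effort when intelligibility falls short
                effort = max(-1.0, min(1.0, effort))  # keep effort within the articulation range
                yield speech, effort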