3 research outputs found

    Dual-level segmentation method for feature extraction enhancement strategy in speech emotion recognition

    The speech segmentation approach can be a significant factor in the overall performance of a Speech Emotion Recognition (SER) system. An utterance may contain more than one perceived emotion, and the boundaries between changes of emotion within an utterance are challenging to determine. Speech segmented with the conventional fixed window does not correspond to the signal changes: because the segment point is random, an arbitrarily segmented frame is produced, and the segment boundary might fall within the sentence or in between emotional changes. This study introduces an improvement to segment-based segmentation on a fixed-window Relative Time Interval (RTI) by using a Signal Change (SC) segmentation approach to discover the signal boundary with respect to the signal transition. A segment-based feature extraction enhancement strategy using a dual-level segmentation method is proposed: RTI-SC segmentation, which utilizes the conventional approach. Instead of segmenting the whole utterance at the relative time interval, this study applies peak analysis to obtain segment boundaries defined by the maximum peak value within each temporary RTI segment. During peak selection, over-segmentation might occur due to connections with the input signal, affecting the boundary selection decision. Two approaches to finding the maximum peaks were implemented: first, peak selection by distance allocation, and second, peak selection by the Maximum function. Substituting the temporary RTI segment with a segment based on signal change was intended to better capture high-level statistical features within the signal transition. The signal's prosodic, spectral, and wavelet properties were integrated to build a fine feature set based on the proposed method. Thirty-six low-level descriptors, 12 statistical features, and their derivatives were extracted from each segment, resulting in a fixed vector dimension.
Correlation-based Feature Subset Selection (CFS) with the Best First search method was applied for dimensionality reduction before a Support Vector Machine (SVM) with Sequential Minimal Optimization (SMO) was used for classification. The performance of the feature fusion constructed from the proposed method was evaluated through speaker-dependent and speaker-independent tests on the EMO-DB and RAVDESS databases. The results indicated that the prosodic and spectral features derived from the dual-level segmentation method offered a higher recognition rate for most speaker-independent tasks, with a significant improvement in overall accuracy to 82.2% (150 features), the highest accuracy among the segmentation approaches used in this study. The proposed method outperformed the baseline approach in single-emotion assessment with both the full-dimension set and the optimized set, and it accounted for the highest accuracy for most emotions. On the EMO-DB database, accuracy was enhanced specifically for happy (67.6%), anger (89%), fear (85.5%), and disgust (79.3%), while neutral and sadness obtained accuracy similar to the baseline method (91% and 93.5%, respectively). An accuracy of 100% for boredom (female speaker) was observed in the speaker-dependent test, the highest single-emotion result reported in this study.
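The dual-level idea described in the abstract above (temporary RTI segments first, then boundaries moved to the maximum peak inside each one) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the use of absolute amplitude as the peak criterion, and the choice to turn each peak index directly into a cut point are all assumptions made for illustration, and only the "Maximum function" style of peak selection is shown.

```python
def rti_boundaries(n_samples, n_segments):
    """Level 1: fixed-window edges at equal relative time intervals."""
    return [i * n_samples // n_segments for i in range(n_segments + 1)]


def rti_sc_boundaries(signal, n_segments):
    """Level 2: replace fixed edges with signal-driven cut points.

    Assumption: the boundary inside each temporary RTI segment is the
    index of its largest absolute amplitude (first one wins on ties).
    """
    edges = rti_boundaries(len(signal), n_segments)
    peaks = []
    for start, end in zip(edges[:-1], edges[1:]):
        if end <= start:
            continue  # empty temporary segment (signal shorter than n_segments)
        peaks.append(max(range(start, end), key=lambda i: abs(signal[i])))
    # Keep the signal start and end so the segments cover the whole utterance.
    return sorted(set([0] + peaks + [len(signal)]))


def split_at(signal, cuts):
    """Slice the signal into consecutive segments between cut points."""
    return [signal[a:b] for a, b in zip(cuts[:-1], cuts[1:])]
```

For a toy signal `[0, 1, 5, 1, 0, 2, 0, 9, 0, 0, 3, 0]` with three temporary segments, the fixed edges are `[0, 4, 8, 12]`, while the peak-based cut points become `[0, 2, 7, 10, 12]`. Per-segment statistics would then be computed on the output of `split_at` rather than on the fixed windows.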

    Evaluation of optimal solutions in multicriteria models for intelligent decision support

    This thesis falls within the field of optimization and its use in decision making. The logical sequence followed was modeling, implementation, solution, and validation leading to a decision. To this end, we used tools from multicriteria analysis, multiobjective optimization, and artificial intelligence techniques. The work is structured in two parts (each divided into three chapters), corresponding to the theoretical part and the experimental part. The first part analyzes the context of the field of study through a review of its historical framework; a chapter is then devoted to multicriteria optimization, gathering well-known models together with original contributions of this work. The third chapter, devoted to artificial intelligence, presents the foundations of statistical learning and the machine learning and deep learning techniques needed for the contributions in the second part. The second part contains seven real cases to which the described techniques were applied. Its first chapter studies two cases: the academic performance of students at the Universidad Industrial de Santander (Colombia) and an objective system for awarding the NBA MVP award. The following chapter applies artificial intelligence techniques to musical similarity (plagiarism detection on YouTube), the prediction of the closing price of a company on the New York stock market, and the automatic classification of spatial acoustic signals in immersive environments. In the last chapter, the power of artificial intelligence is combined with multicriteria analysis techniques to detect university academic failure early (at the Universidad Industrial de Santander), and multicriteria methods are used to establish a ranking of artificial intelligence models.
To close the thesis, although each chapter contains a partial conclusion, chapter 8 gathers the main conclusions of the whole thesis along with a fairly exhaustive bibliography on the topics covered. In addition, the work ends with three appendices containing the programs and tools which, although useful for understanding the thesis, have been placed separately so that the chapters read more fluently.

    Technology, Science and Culture: A Global Vision, Volume IV
