
    Forecasting with time series imaging

    Feature-based time series representations have attracted substantial attention in a wide range of time series analysis methods. Recently, the use of time series features for forecast model averaging has become an emerging research focus in the forecasting community. Nonetheless, most existing approaches depend on a manual choice of an appropriate set of features. Exploiting machine learning methods to extract features from time series automatically has therefore become crucial in state-of-the-art time series analysis. In this paper, we introduce an automated approach to extracting time series features based on time series imaging. We first transform time series into recurrence plots, from which local features can be extracted using computer vision algorithms. The extracted features are then used for forecast model averaging. Our experiments show that forecasting based on automatically extracted features, with less human intervention and a more comprehensive view of the raw time series data, yields performance highly comparable to the best methods on the largest forecasting competition dataset (M4) and outperforms the top methods on the Tourism forecasting competition dataset.
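    As a minimal illustration of the imaging step described above, the sketch below builds a thresholded recurrence plot from a univariate series with NumPy; the distance threshold and the absence of any time-delay embedding are simplifying assumptions for illustration, not the paper's exact settings.

        import numpy as np

        def recurrence_plot(series, eps=None):
            # R[i, j] = 1 when |x_i - x_j| <= eps, producing a binary image
            # that computer-vision feature extractors can consume.
            x = np.asarray(series, dtype=float)
            dist = np.abs(x[:, None] - x[None, :])      # pairwise distances
            if eps is None:
                # Illustrative default: 10th percentile of all pairwise distances.
                eps = np.percentile(dist, 10)
            return (dist <= eps).astype(np.uint8)

        t = np.linspace(0, 8 * np.pi, 200)
        rp = recurrence_plot(np.sin(t) + 0.1 * np.random.randn(200))
        print(rp.shape)  # (200, 200) recurrence image for one series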

    Comprehensive Study of Automatic Speech Emotion Recognition Systems

    Speech emotion recognition (SER) is the technology that recognizes psychological characteristics and feelings from speech signals through various techniques and methodologies. SER is challenging because of the considerable variation in arousal and valence levels across languages. Technical developments in artificial intelligence and signal processing have encouraged and made it possible to interpret emotions. SER plays a vital role in remote communication. This paper offers a recent survey of SER using machine learning (ML)- and deep learning (DL)-based techniques. It focuses on the various feature representation and classification techniques used for SER, and further describes the databases and evaluation metrics used for speech emotion recognition.
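    As a hedged illustration of the kind of feature-plus-classifier pipeline such surveys cover, the sketch below computes utterance-level MFCC statistics with librosa and fits a small SVM; the feature set, the classifier, and the tiny synthetic "dataset" are placeholders chosen for illustration, not a system described in the paper.

        import numpy as np
        import librosa
        from sklearn.svm import SVC

        # A synthetic one-second clip stands in for a labelled emotional utterance.
        sr = 16000
        y = 0.1 * np.random.randn(sr).astype(np.float32)

        # Frame-level MFCCs summarised into an utterance-level vector
        # (mean and standard deviation), a common hand-crafted SER representation.
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)            # (13, n_frames)
        feat = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])  # (26,)

        # A conventional classifier on top; real systems are trained on labelled
        # emotional speech corpora, so this two-sample fit is purely illustrative.
        X = np.stack([feat, feat + 0.01])
        clf = SVC(kernel="rbf").fit(X, ["neutral", "happy"])
        print(clf.predict(X[:1]))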

    Learning Local to Global Feature Aggregation for Speech Emotion Recognition

    Transformers have recently emerged in speech emotion recognition (SER). However, their equal patch division not only damages frequency information but also ignores local emotion correlations across frames, which are key cues for representing emotion. To handle this issue, we propose Local to Global Feature Aggregation learning (LGFA) for SER, which aggregates long-term emotion correlations at different scales, both inside frames and segments, with entire frequency information, to enhance the emotion discrimination of utterance-level speech features. For this purpose, we nest a Frame Transformer inside a Segment Transformer. First, the Frame Transformer is designed to excavate local emotion correlations between frames for frame embeddings. Then, the frame embeddings and their corresponding segment features are aggregated as different-level complements and fed into the Segment Transformer to learn utterance-level global emotion features. Experimental results show that the performance of LGFA is superior to state-of-the-art methods. Comment: This paper has been accepted at INTERSPEECH 202
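    A minimal PyTorch sketch of the nesting idea is given below: a frame-level Transformer encoder runs within each segment, and its pooled outputs feed a segment-level encoder for utterance-level prediction. The layer sizes, mean pooling, and classification head are assumptions and do not reproduce the paper's LGFA architecture.

        import torch
        import torch.nn as nn

        class LocalToGlobalSketch(nn.Module):
            """Illustrative two-level aggregation: frame-level attention inside
            each segment, then segment-level attention across the utterance."""

            def __init__(self, feat_dim=80, d_model=128, n_classes=4):
                super().__init__()
                self.proj = nn.Linear(feat_dim, d_model)
                frame_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
                seg_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
                self.frame_encoder = nn.TransformerEncoder(frame_layer, num_layers=2)
                self.segment_encoder = nn.TransformerEncoder(seg_layer, num_layers=2)
                self.head = nn.Linear(d_model, n_classes)

            def forward(self, x):
                # x: (batch, segments, frames, feat_dim) spectrogram patches
                b, s, f, d = x.shape
                frames = self.proj(x.view(b * s, f, d))
                frames = self.frame_encoder(frames)              # local, intra-segment
                seg_tokens = frames.mean(dim=1).view(b, s, -1)   # one embedding per segment
                utt = self.segment_encoder(seg_tokens).mean(dim=1)  # global aggregation
                return self.head(utt)                            # utterance-level emotion logits

        logits = LocalToGlobalSketch()(torch.randn(2, 6, 25, 80))
        print(logits.shape)  # torch.Size([2, 4])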

    Learning Audio Sequence Representations for Acoustic Event Classification

    Acoustic Event Classification (AEC) has become a significant task for machines perceiving the surrounding auditory scene. However, extracting effective representations that capture the underlying characteristics of acoustic events remains challenging. Previous methods mainly focused on designing audio features in a 'hand-crafted' manner. Interestingly, data-learnt features have recently been reported to show better performance; up to now, however, these have only been considered at the frame level. In this paper, we propose an unsupervised learning framework to learn a vector representation of an audio sequence for AEC. The framework consists of a Recurrent Neural Network (RNN) encoder and an RNN decoder, which respectively transform the variable-length audio sequence into a fixed-length vector and reconstruct the input sequence from the generated vector. After training the encoder-decoder, we feed the audio sequences to the encoder and take the learnt vectors as the audio sequence representations. Compared with previous methods, the proposed approach can not only deal with audio streams of arbitrary length, but also learn the salient information of the sequence. Extensive evaluation on a large acoustic event database shows that the learnt audio sequence representation outperforms state-of-the-art hand-crafted sequence features for AEC by a large margin.
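    The sketch below gives a minimal PyTorch version of such a sequence autoencoder: a GRU encoder compresses a variable-length frame sequence into its final hidden state, and a GRU decoder reconstructs the sequence from that state. The cell type, teacher-forced decoding, and MSE objective are assumptions rather than the paper's exact setup.

        import torch
        import torch.nn as nn

        class SeqAutoencoder(nn.Module):
            """Encoder-decoder that maps a variable-length feature sequence to a
            fixed-length vector and reconstructs the sequence from that vector."""

            def __init__(self, feat_dim=40, hidden=256):
                super().__init__()
                self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)
                self.decoder = nn.GRU(feat_dim, hidden, batch_first=True)
                self.out = nn.Linear(hidden, feat_dim)

            def encode(self, x):
                _, h = self.encoder(x)      # h: (1, batch, hidden)
                return h.squeeze(0)         # fixed-length sequence representation

            def forward(self, x):
                h = self.encode(x)
                # Teacher forcing: decode conditioned on the encoder's final state.
                dec, _ = self.decoder(x, h.unsqueeze(0))
                return self.out(dec)        # reconstruction of the input sequence

        model = SeqAutoencoder()
        frames = torch.randn(8, 120, 40)    # batch of 120-frame log-mel sequences
        loss = nn.functional.mse_loss(model(frames), frames)
        loss.backward()
        embedding = model.encode(frames)    # (8, 256) vectors used downstream for AEC
        print(embedding.shape)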