User independent Emotion Recognition with Residual Signal-Image Network
User-independent emotion recognition with large-scale physiological signals is a challenging problem. Many advanced methods exist, but they are evaluated on relatively small datasets with only dozens of subjects. Here, we propose Res-SIN, a novel end-to-end framework that uses Electrodermal Activity (EDA) signal images to classify human emotion. We first apply convex optimization-based EDA analysis (cvxEDA) to decompose the signals and mine static and dynamic emotion changes. We then transform the decomposed signals into images so that they can be processed effectively by CNN frameworks. Res-SIN combines individual emotion features and external emotion benchmarks to accelerate convergence. We evaluate our approach on the PMEmo dataset, currently the largest emotion dataset containing music and EDA signals. To the best of the authors' knowledge, our method is the first attempt at large-scale subject-independent emotion classification, using 7962 EDA signals from 457 subjects. Experimental results demonstrate the reliability of our model, and the binary classification accuracies of 73.65% and 73.43% on the arousal and valence dimensions can serve as a baseline.
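The abstract does not specify how the decomposed EDA signals are rendered as images, so the following is a minimal, hypothetical sketch of one common approach: resample the 1-D signal to a fixed length, normalize it, and reshape it into a square grayscale image that a CNN can consume.

```python
import numpy as np

def signal_to_image(signal, side=32):
    """Resample a 1-D signal to side*side samples, min-max normalize,
    and reshape into a square grayscale image (hypothetical sketch)."""
    signal = np.asarray(signal, dtype=float)
    # Linear interpolation onto a fixed-length grid of side*side points.
    x = np.interp(np.linspace(0, len(signal) - 1, side * side),
                  np.arange(len(signal)), signal)
    # Scale to [0, 1] so pixel intensities are comparable across subjects.
    x = (x - x.min()) / (x.max() - x.min() + 1e-8)
    return x.reshape(side, side)

img = signal_to_image(np.sin(np.linspace(0, 20, 1000)))
print(img.shape)  # (32, 32)
```

The `side=32` resolution and the min-max scaling are illustrative choices, not values taken from the paper.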
Predicting music emotion with social media discourse
Predicting the average affect of a piece of music is a task of recent interest in the field of music information retrieval. We investigate the use of sentiment analysis on online social media conversations to predict a song's valence and arousal. Using four music emotion datasets (DEAM, AMG1608, Deezer, and PMEmo), we create a corpus of social media commentary surrounding the songs contained in these datasets by extracting comments from YouTube, Twitter, and Reddit. Two learning approaches are compared: a bag-of-words model using dictionaries of affective terms to extract emotive features, and a DistilBERT transformer model fine-tuned on our social media discourse to perform direct comment-level valence and arousal prediction. We find that transformer models are better suited to the task of predicting music emotion directly from social media conversations.
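The bag-of-words baseline described above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the mini-lexicon below is invented for the example, whereas real systems draw on affective dictionaries such as ANEW or the VADER lexicon.

```python
# Hypothetical mini-lexicon mapping words to valence scores in [0, 1].
VALENCE = {"love": 0.9, "beautiful": 0.8, "calm": 0.6, "sad": 0.2, "hate": 0.1}

def comment_valence(comment):
    """Average the lexicon valence of known words in a comment
    (bag-of-words: word order is ignored)."""
    hits = [VALENCE[w] for w in comment.lower().split() if w in VALENCE]
    return sum(hits) / len(hits) if hits else 0.5  # neutral default

print(comment_valence("I love this beautiful track"))  # ~0.85
```

A song-level prediction would then aggregate such comment-level scores (e.g. by averaging) across all retrieved comments for the song.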
Music emotion recognition based on segment-level two-stage learning
In most Music Emotion Recognition (MER) tasks, researchers tend to use supervised learning models based on music features and corresponding annotations. However, few researchers have considered applying unsupervised learning approaches to labeled data beyond feature representation. In this paper, we propose a segment-based two-stage model combining unsupervised and supervised learning. In the first stage, we split each music excerpt into contiguous segments and then use an autoencoder to generate segment-level feature representations. In the second stage, we feed these time-series music segments to a bidirectional long short-term memory deep learning model to achieve the final music emotion classification. Compared with whole music excerpts, segments as model inputs can be the proper granularity for model training and augment the scale of training samples, reducing the risk of overfitting during deep learning. In addition, we apply frequency and time masking to segment-level inputs in the unsupervised learning stage to enhance training performance. We evaluate our model on two datasets. The results show that our model outperforms state-of-the-art models, some of which even use multimodal architectures, and the performance comparison also demonstrates the effectiveness of audio segmentation and of the autoencoder with masking trained in an unsupervised way.
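The segmentation plus frequency/time masking step described above can be sketched as follows. The segment length and mask widths here are illustrative guesses, not the paper's settings; the masking itself is the standard SpecAugment-style operation of zeroing a random frequency band and time span.

```python
import numpy as np

def segment_and_mask(spec, seg_len=50, f_width=8, t_width=10, rng=None):
    """Split a (freq, time) spectrogram into contiguous segments, then zero a
    random frequency band and a random time span in each segment."""
    rng = rng or np.random.default_rng(0)
    segs = [spec[:, i:i + seg_len]
            for i in range(0, spec.shape[1] - seg_len + 1, seg_len)]
    out = []
    for s in segs:
        s = s.copy()
        f0 = int(rng.integers(0, s.shape[0] - f_width))
        t0 = int(rng.integers(0, s.shape[1] - t_width))
        s[f0:f0 + f_width, :] = 0.0   # frequency masking
        s[:, t0:t0 + t_width] = 0.0   # time masking
        out.append(s)
    return out

segs = segment_and_mask(np.ones((64, 200)))
print(len(segs), segs[0].shape)  # 4 (64, 50)
```

Each masked segment would then be fed to the autoencoder, whose bottleneck representations feed the BiLSTM classifier in the second stage.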
The multiple voices of musical emotions: source separation for improving music emotion recognition models and their interpretability
Despite the manifold developments in music emotion recognition and related areas, estimating the emotional impact of music still poses many challenges. These are often associated with the complexity of the acoustic codes of emotion and the lack of large amounts of data with robust gold standards. In this paper, we propose a new computational model (EmoMucs) that considers the role of different musical voices in the prediction of the emotions induced by music. We combine source separation algorithms for breaking up music signals into independent song elements (vocals, bass, drums, other) and end-to-end state-of-the-art machine learning techniques for feature extraction and emotion modelling (valence and arousal regression). Through a series of computational experiments on a benchmark dataset using source-specialised models trained independently and different fusion strategies, we demonstrate that EmoMucs outperforms state-of-the-art approaches with the advantage of providing insights into the relative contribution of different musical elements to the emotions perceived by listeners.
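One of the fusion strategies mentioned above, late fusion, can be sketched as a weighted average of the per-source predictions. This is a hypothetical illustration of the general idea, not EmoMucs's actual fusion module, and the prediction values below are invented.

```python
# Late-fusion sketch: each source-specialised model (vocals, bass, drums,
# other) emits a (valence, arousal) pair; fuse by weighted averaging.
def fuse_predictions(preds, weights=None):
    sources = list(preds)
    # Equal weights by default; a learned fusion would tune these.
    weights = weights or {s: 1.0 / len(sources) for s in sources}
    valence = sum(weights[s] * preds[s][0] for s in sources)
    arousal = sum(weights[s] * preds[s][1] for s in sources)
    return valence, arousal

preds = {"vocals": (0.6, 0.4), "bass": (0.2, 0.3),
         "drums": (0.1, 0.8), "other": (0.5, 0.5)}
print(fuse_predictions(preds))  # ~(0.35, 0.5)
```

Inspecting the learned weights (or ablating one source at a time) is what yields the interpretability claimed in the abstract: it reveals how much each musical element contributes to the predicted emotion.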
Digital music interventions for stress with bio-sensing: a survey
Music therapy is used to treat stress and anxiety in patients for a broad range of reasons such as cancer treatment, substance abuse, addressing trauma, and simply daily stress in life. However, access to treatment is limited by the need for trained music therapists and the difficulty of quantitatively measuring efficacy of treatment. We present a survey of digital music systems that utilize biosensing for the purpose of reducing stress and anxiety through therapeutic use of music. The survey analyzes biosensing instruments for brain activity, cardiovascular, electrodermal, and respiratory measurements for their efficacy in reducing stress and anxiety. The survey also emphasizes digital music systems where biosensing is utilized to adapt music playback to the subject, forming a biofeedback loop. We also discuss how these digital music systems can use biofeedback coupled with machine learning to provide improved efficacy. Lastly, we posit that such digital music systems can be realized using consumer-grade biosensing wearables coupled with smartphones. Such systems can provide benefit to music therapists as well as to anyone wanting to treat stress from daily living.
MusiCoder: A Universal Music-Acoustic Encoder Based on Transformers
Music annotation has always been one of the critical topics in the field of Music Information Retrieval (MIR). Traditional models use supervised learning for music annotation tasks. However, as supervised machine learning approaches increase in complexity, the growing need for annotated training data often cannot be met with available data. In this paper, a new self-supervised music acoustic representation learning approach named MusiCoder is proposed. Inspired by the success of BERT, MusiCoder builds upon the architecture of self-attention bidirectional transformers. Two pre-training objectives, Contiguous Frames Masking (CFM) and Contiguous Channels Masking (CCM), are designed to adapt BERT-like masked-reconstruction pre-training to the continuous acoustic frame domain. The performance of MusiCoder is evaluated on two downstream music annotation tasks. The results show that MusiCoder outperforms state-of-the-art models in both music genre classification and auto-tagging. The effectiveness of MusiCoder indicates great potential for a new self-supervised approach to understanding music: first pre-train a transformer-based model with masked-reconstruction tasks on massive unlabeled music acoustic data, then fine-tune the model on specific downstream tasks with labeled data.
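The Contiguous Frames Masking (CFM) objective can be sketched as follows: zero out a contiguous span of acoustic frames and train the model to reconstruct them. The span length and the frame dimensions below are illustrative, not the paper's hyperparameters, and a full implementation would also randomize the number of spans and the masking strategy.

```python
import numpy as np

def contiguous_frame_mask(frames, span=5, rng=None):
    """Zero out one contiguous span of acoustic frames; return the masked
    input and a boolean mask marking the frames to be reconstructed."""
    rng = rng or np.random.default_rng(1)
    masked = frames.copy()
    start = int(rng.integers(0, len(frames) - span))
    mask = np.zeros(len(frames), dtype=bool)
    mask[start:start + span] = True
    masked[mask] = 0.0   # reconstruction loss is computed on these frames
    return masked, mask

x = np.random.default_rng(0).normal(size=(100, 40))  # (time, mel bins)
masked, mask = contiguous_frame_mask(x)
print(mask.sum())  # 5
```

CCM is the analogous operation along the channel (frequency) axis instead of the time axis; the reconstruction loss compares the transformer's output at the masked positions against the original frames.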
ENSA dataset: a dataset of songs by non-superstar artists tested with an emotional analysis based on time-series
This paper presents a novel dataset of songs by non-superstar artists in which a set of musical data is collected, identifying for each song its musical structure and the emotional perception of the artist through a categorical emotional labeling process. The generation of this preliminary dataset is motivated by the existence of biases that have been detected in the analysis of the most used datasets in the field of emotion-based music recommendation. This new dataset contains 234 minutes of audio and 60 complete and labeled songs. In addition, an emotional analysis is carried out based on the representation of dynamic emotional perception through a time-series approach, in which the similarity values generated by the dynamic time warping (DTW) algorithm are analyzed and then used to implement a clustering process with the K-means algorithm. Clustering is also implemented with the Uniform Manifold Approximation and Projection (UMAP) technique, a manifold learning and dimension reduction algorithm, and the HDBSCAN algorithm is applied to determine the optimal number of clusters. The results obtained from the different clustering strategies are compared and, in a preliminary analysis, significant consistency is found between them. With the findings and experimental results obtained, a discussion is presented highlighting the importance of working with complete songs, preferably with a well-defined musical structure, considering the emotional variation that characterizes a song during the listening experience, in which the intensity of the emotion usually changes between verse, bridge, and chorus.
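The DTW similarity values at the heart of the analysis above can be computed with the classic dynamic-programming recurrence. This is a generic textbook sketch, not the paper's code; it uses absolute difference as the local cost between two 1-D emotion time series.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic-time-warping distance between two
    1-D time series, with absolute difference as the local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible alignments.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Identical shapes align perfectly even when one series repeats a value.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

A pairwise matrix of such distances over all songs is what the K-means and UMAP/HDBSCAN clustering stages would then operate on.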
Understanding Agreement and Disagreement in Listeners’ Perceived Emotion in Live Music Performance
Emotion perception of music is subjective and time-dependent. Most computational music emotion recognition (MER) systems overlook time- and listener-dependent factors by averaging emotion judgments across listeners. In this work, we investigate the influence of music, setting (live vs. lab vs. online), and individual factors on music emotion perception over time. In an initial study, we explore changes in perceived music emotions among audience members during live classical music performances. Fifteen audience members used a mobile application to annotate time-varying emotion judgments based on the valence-arousal model. Inter-rater reliability analyses indicate that consistency in emotion judgments varies significantly across rehearsal segments, with systematic disagreements in certain segments. In a follow-up study, we examine listeners' reasons for their ratings in segments with high and low agreement. We relate these reasons to acoustic features and individual differences. Twenty-one listeners annotated perceived emotions while watching a recorded video of the live performance. They then reflected on their judgments and provided explanations retrospectively. Disagreements were attributed to listeners attending to different musical features or being uncertain about the expressed emotions. Emotion judgments were significantly associated with personality traits, gender, cultural background, and music preference. Thematic analysis of explanations revealed cognitive processes underlying music emotion perception, highlighting attributes less frequently discussed in MER studies, such as instrumentation, arrangement, musical structure, and multimodal factors related to performer expression. Exploratory models incorporating these semantic features and individual factors were developed to predict perceived music emotion over time.
Regression analyses confirmed the significance of listener-informed semantic features as independent variables, with individual factors acting as moderators between loudness, pitch range, and arousal. In our final study, we analyzed the effects of individual differences on music emotion perception among 128 participants with diverse backgrounds. Participants annotated perceived emotions for 51 piano performances of different compositions from the Western canon, spanning various eras. Linear mixed-effects models revealed significant variations in valence and arousal ratings, as well as in the frequency of emotion ratings, with regard to several individual factors: music sophistication, music preferences, personality traits, and mood states. Additionally, participants' ratings of arousal, valence, and emotional agreement were significantly associated with the historical time periods of the examined clips. This research highlights the complexity of music emotion perception, revealing it to be a dynamic, individual, and context-dependent process. It paves the way for the development of more individually nuanced, time-based models in music psychology, opening up new avenues for personalised music emotion recognition and recommendation, music emotion-driven generation, and therapeutic applications.