3 research outputs found
From Motion to Emotion: Accelerometer Data Predict Subjective Experience of Music
Music is often described as emotional because it reflects expressive movements in audible form. A valid approach to measuring musical emotion could therefore be to assess movement stimulated by music. In two experiments we evaluated the discriminative power of mobile-device-generated acceleration data, produced by free movement during music listening, for predicting ratings on the Geneva Emotion Music Scales (GEMS-9). The quality of prediction varied between experiments for tenderness (R² = 0.50 in the first experiment vs. 0.39 in the second), nostalgia (0.42 vs. 0.30), wonder (0.25 vs. 0.34), sadness (0.24 vs. 0.35), peacefulness (0.20 vs. 0.35), joy (0.19 vs. 0.33) and transcendence (0.14 vs. 0.00). For others, such as power (0.42 vs. 0.49) and tension (0.28 vs. 0.27), the results were nearly reproduced across experiments. Furthermore, we extracted two principal components from the GEMS ratings, one representing the arousal and the other the valence of the experienced feeling. Both arousal and valence could be predicted from the acceleration data, indicating that it carries information on both the quantity and the quality of the experience. On the one hand, these findings show how music-evoked movement patterns relate to music-evoked feelings; on the other hand, they help integrate findings from the field of embodied music cognition into music recommender systems.
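A minimal sketch of the kind of analysis described above, assuming GEMS-9 ratings and per-trial accelerometer summary features are already available as arrays; the random data, feature dimensions and the choice of ridge regression are illustrative assumptions, not the authors' exact pipeline.

```python
# Sketch: predict arousal/valence components of GEMS-9 ratings from
# accelerometer features (hypothetical data; not the paper's exact method).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_trials = 200
gems = rng.random((n_trials, 9))             # GEMS-9 ratings per listening trial
accel_features = rng.random((n_trials, 24))  # e.g. mean/std/energy of x, y, z acceleration

# Two principal components of the GEMS ratings, interpreted as arousal and valence.
pca = PCA(n_components=2)
arousal_valence = pca.fit_transform(gems)

# Predict each component from the movement features; R^2 via cross-validation.
for name, target in zip(["arousal", "valence"], arousal_valence.T):
    model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
    r2 = cross_val_score(model, accel_features, target, cv=5, scoring="r2")
    print(f"{name}: mean cross-validated R^2 = {r2.mean():.2f}")
```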
Prediction of user emotion and dialogue success using audio spectrograms and convolutional neural networks
In this paper we aim to predict dialogue success and user satisfaction, as well as emotion, at the turn level. To achieve this, we investigate the use of spectrogram representations, extracted from audio files, in combination with several types of convolutional neural networks. The experiments were performed on the Let's Go V2 database, comprising 5065 audio files with labels for subjective and objective dialogue turn success as well as the emotional state of the user. Results show that, using only audio, it is possible to predict turn success with very high accuracy for all three labels (90%). The best-performing input representation was 1 s long mel-spectrograms in combination with a CNN with a bottleneck architecture. The resulting system has the potential to be used in real time. Our results significantly surpass the state of the art for dialogue success prediction based only on audio.
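A minimal sketch of the input pipeline and model family the abstract describes: a 1 s mel-spectrogram fed to a small CNN with a narrow bottleneck layer. Layer sizes, the two-class output and the spectrogram parameters are assumptions for illustration; the paper's exact architecture is not reproduced here.

```python
# Sketch: 1 s mel-spectrogram -> small "bottleneck" CNN classifier.
# Hypothetical parameters; not the architecture from the paper.
import librosa
import numpy as np
import torch
import torch.nn as nn

def one_second_melspec(path, sr=16000, n_mels=64):
    """Load audio, keep the first second, return a log-mel spectrogram."""
    y, sr = librosa.load(path, sr=sr)
    y = y[:sr]                                   # first 1 s of the user turn
    if len(y) < sr:                              # pad short turns with silence
        y = np.pad(y, (0, sr - len(y)))
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                         hop_length=256, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)  # shape: (n_mels, ~63 frames)

class BottleneckCNN(nn.Module):
    """Conv layers that widen, squeeze through a narrow dense layer, then classify."""
    def __init__(self, n_classes=2, bottleneck=16):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 4 * 4, bottleneck),   # the bottleneck layer
            nn.ReLU(),
            nn.Linear(bottleneck, n_classes),    # e.g. successful vs. unsuccessful turn
        )

    def forward(self, x):                        # x: (batch, 1, n_mels, frames)
        return self.head(self.conv(x))

# Usage with a random tensor standing in for a batch of spectrograms:
model = BottleneckCNN(n_classes=2)
dummy = torch.randn(8, 1, 64, 63)
print(model(dummy).shape)                        # torch.Size([8, 2])
```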
Prediction of dialogue success with spectral and rhythm acoustic features using DNNs and SVMs
In this paper we investigate the novel use of audio alone to predict whether a spoken dialogue will be successful or not, in both a subjective and an objective sense. To achieve this, multiple spectral and rhythmic features are fed into support vector machines and deep neural networks. We report results on data from 3267 spoken dialogues, using both the full user response and parts of it. Experiments show that an average accuracy of 74% can be achieved using just 5 acoustic features when analysing merely one user turn, which allows a real-time yet fairly accurate prediction of dialogue success after only one short interaction unit. Among the features tested, those related to speech rate, signal energy and cepstrum are the most informative. The results presented here outperform the state of the art in spoken dialogue success prediction through acoustic features alone.
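A minimal sketch of the feature-based approach the abstract describes: a handful of spectral and rhythm descriptors computed from one user turn, fed to an SVM. The specific features (onset rate as a speech-rate proxy, RMS energy, low-order MFCCs) and the commented-out training setup are illustrative assumptions, not the paper's exact feature set.

```python
# Sketch: ~5 spectral/rhythm features from one user turn -> SVM classifier.
# Feature choices and labels are hypothetical, not taken from the paper.
import librosa
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def turn_features(path, sr=16000):
    """Return a small acoustic feature vector for one audio file (user turn)."""
    y, sr = librosa.load(path, sr=sr)
    duration = len(y) / sr
    onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")
    onset_rate = len(onsets) / max(duration, 1e-6)       # crude speech-rate proxy
    rms = librosa.feature.rms(y=y)                       # frame-wise signal energy
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # cepstral features
    return np.array([
        onset_rate,
        rms.mean(), rms.std(),
        mfcc[1].mean(),                                  # low-order cepstral coefficients
        mfcc[2].mean(),
    ])

# Hypothetical training setup: one feature vector per dialogue turn plus a
# success/failure label obtained elsewhere.
# X = np.stack([turn_features(p) for p in audio_paths])
# y = np.array(success_labels)
# clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
# clf.fit(X, y)
# print(clf.predict(X[:5]))
```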