7,787 research outputs found

    Multimodal music information processing and retrieval: survey and future challenges

    Towards improving performance in various music information processing tasks, recent studies exploit different modalities able to capture diverse aspects of music. Such modalities include audio recordings, symbolic music scores, mid-level representations, motion and gestural data, video recordings, editorial or cultural tags, lyrics, and album cover art. This paper critically reviews the various approaches adopted in Music Information Processing and Retrieval and highlights how multimodal algorithms can help Music Computing applications. First, we categorize the related literature based on the application it addresses. Subsequently, we analyze existing information fusion approaches, and we conclude with the set of challenges that the Music Information Retrieval and Sound and Music Computing research communities should focus on in the coming years.
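    As a minimal illustration of the fusion strategies such surveys analyze, the sketch below contrasts early fusion (concatenating modality features before classification) with late fusion (averaging per-modality classifier posteriors). The synthetic "audio" and "lyrics" features, labels, and all names are illustrative placeholders, not any surveyed paper's actual pipeline.

        # Sketch of early vs. late multimodal fusion; features are synthetic
        # stand-ins for e.g. audio and lyrics embeddings (hypothetical).
        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(0)
        n = 400
        audio = rng.normal(size=(n, 32))   # placeholder audio features
        lyrics = rng.normal(size=(n, 16))  # placeholder lyrics features
        y = (audio[:, 0] + lyrics[:, 0] > 0).astype(int)  # toy labels

        A_tr, A_te, L_tr, L_te, y_tr, y_te = train_test_split(
            audio, lyrics, y, random_state=0)

        # Early fusion: concatenate modality features, train one classifier.
        early = LogisticRegression().fit(np.hstack([A_tr, L_tr]), y_tr)
        print("early fusion acc:", early.score(np.hstack([A_te, L_te]), y_te))

        # Late fusion: per-modality classifiers, averaged posteriors.
        clf_a = LogisticRegression().fit(A_tr, y_tr)
        clf_l = LogisticRegression().fit(L_tr, y_tr)
        p = (clf_a.predict_proba(A_te) + clf_l.predict_proba(L_te)) / 2
        print("late fusion acc:", (p.argmax(axis=1) == y_te).mean())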

    Fusion of Learned Multi-Modal Representations and Dense Trajectories for Emotional Analysis in Videos

    When designing a video affective content analysis algorithm, one of the most important steps is the selection of discriminative features for the effective representation of video segments. The majority of existing affective content analysis methods either use low-level audio-visual features or generate handcrafted higher-level representations based on these low-level features. We propose in this work to use deep learning methods, in particular convolutional neural networks (CNNs), in order to automatically learn and extract mid-level representations from raw data. To this end, we exploit the audio and visual modalities of videos by employing Mel-Frequency Cepstral Coefficients (MFCC) and color values in the HSV color space. We also incorporate dense-trajectory-based motion features in order to further enhance the performance of the analysis. By means of multi-class support vector machines (SVMs) and fusion mechanisms, music video clips are classified into one of four affective categories representing the four quadrants of the Valence-Arousal (VA) space. Results obtained on a subset of the DEAP dataset show (1) that higher-level representations perform better than low-level features, and (2) that incorporating motion information leads to a notable performance gain, independently of the chosen representation.
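    The fusion step lends itself to a short sketch. Under stated assumptions, random placeholders stand in for the CNN-learned audio/visual representations and the dense-trajectory descriptors; per-modality multi-class SVMs are trained and their class posteriors averaged to pick one of the four VA quadrants. This mirrors score-level fusion in spirit but is not the authors' exact configuration.

        # Score-level fusion of three modalities into 4 VA-quadrant classes.
        # All features and labels below are synthetic placeholders.
        import numpy as np
        from sklearn.svm import SVC

        rng = np.random.default_rng(1)
        n, n_classes = 200, 4                   # 4 Valence-Arousal quadrants
        audio = rng.normal(size=(n, 64))        # placeholder CNN audio features
        visual = rng.normal(size=(n, 64))       # placeholder CNN visual features
        motion = rng.normal(size=(n, 128))      # placeholder trajectory features
        y = rng.integers(0, n_classes, size=n)  # toy quadrant labels

        svms = {
            "audio": SVC(probability=True).fit(audio, y),
            "visual": SVC(probability=True).fit(visual, y),
            "motion": SVC(probability=True).fit(motion, y),
        }

        # Late fusion: average per-modality class posteriors, then argmax.
        # For brevity we score the training data; a real evaluation would
        # hold out a test split.
        probs = (svms["audio"].predict_proba(audio)
                 + svms["visual"].predict_proba(visual)
                 + svms["motion"].predict_proba(motion)) / 3
        print(probs.argmax(axis=1)[:10])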

    AudioPairBank: Towards A Large-Scale Tag-Pair-Based Audio Content Analysis

    Recently, sound recognition has been used to identify sounds such as a car or a river. However, sounds have nuances that may be better described by adjective-noun pairs such as "slow car" and verb-noun pairs such as "flying insects", which remain underexplored. Therefore, in this work we investigate the relation between audio content and both adjective-noun and verb-noun pairs. Due to the lack of datasets with these kinds of annotations, we collected and processed the AudioPairBank corpus, consisting of a combined total of 1,123 pairs and over 33,000 audio files. One contribution is the previously unavailable documentation of the challenges and implications of collecting audio recordings with this type of label. A second contribution is to show the degree of correlation between the audio content and the labels through sound recognition experiments, which yielded results of 70% accuracy, hence also providing a performance benchmark. The results and study in this paper encourage further exploration of the nuances in audio and are meant to complement similar research performed on images and text in multimedia analysis. Comment: this paper is a revised version of "AudioSentibank: Large-scale Semantic Ontology of Acoustic Concepts for Audio Content Analysis".
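    To make the benchmark concrete, here is a hedged sketch of the kind of experiment reported: a classifier trained on audio features labeled with tag pairs, scored by accuracy. The features, the four example pairs, and the resulting number are synthetic placeholders; the actual corpus covers 1,123 pairs over 33,000 files, and the reported 70% accuracy is not reproduced here.

        # Toy tag-pair classification benchmark on synthetic audio features.
        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.svm import LinearSVC

        rng = np.random.default_rng(2)
        pairs = ["slow_car", "fast_car", "flying_insects", "calm_river"]
        X = rng.normal(size=(400, 40))             # placeholder audio features
        y = rng.integers(0, len(pairs), size=400)  # toy pair-label indices

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        clf = LinearSVC().fit(X_tr, y_tr)
        print("pair-label accuracy:", clf.score(X_te, y_te))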

    Music-aided affective interaction between human and service robot

    This study proposes a music-aided framework for affective interaction between service robots and humans. The framework consists of three systems, for perception, memory, and expression respectively, modeled on the human brain mechanism. We propose a novel approach to identifying human emotions in the perception system. Conventional approaches use speech and facial expressions as representative bimodal indicators for emotion recognition; our approach additionally uses the mood of music as a supplementary indicator to determine emotions more reliably alongside speech and facial expressions. For multimodal emotion recognition, we propose an effective decision criterion that uses records of bimodal recognition results relevant to the musical mood. The memory and expression systems also utilize musical data to provide natural and affective reactions to human emotions. To evaluate our approach, we simulated the proposed human-robot interaction with a service robot, iRobiQ. Our perception system exhibited superior performance over the conventional approach, and most human participants reacted favorably to the music-aided affective interaction.
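    A toy sketch of decision-level fusion with music mood as a supplementary indicator follows; the decision criterion, scores, and label names here are illustrative assumptions, not the paper's actual rule. The idea shown: when the speech- and face-based posteriors disagree, a small bonus toward the current musical mood breaks the tie.

        # Illustrative fusion of speech and face posteriors, biased by
        # the mood of the music the user is listening to (hypothetical).
        def fuse_emotion(speech_probs, face_probs, music_mood, mood_bonus=0.15):
            """Average bimodal posteriors; bias toward the music mood."""
            emotions = ["happy", "sad", "angry", "neutral"]
            combined = {e: (speech_probs[e] + face_probs[e]) / 2
                        for e in emotions}
            if music_mood in combined:
                combined[music_mood] += mood_bonus  # supplementary indicator
            return max(combined, key=combined.get)

        speech = {"happy": 0.40, "sad": 0.35, "angry": 0.15, "neutral": 0.10}
        face = {"happy": 0.30, "sad": 0.45, "angry": 0.10, "neutral": 0.15}
        print(fuse_emotion(speech, face, music_mood="happy"))  # -> "happy"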

    Affective Recommendation of Movies Based on Selected Connotative Features

    The apparent difficulty in assessing emotions elicited by movies and the undeniably high variability in subjects' emotional responses to filmic content have recently been tackled by exploring film connotative properties: the set of shooting and editing conventions that help transmit meaning to the audience. Connotation provides an intermediate representation that exploits the objectivity of audiovisual descriptors to predict the subjective emotional reaction of single users. This is done without the need to record users' physiological signals or to employ other people's highly variable emotional ratings, relying instead on the inter-subjectivity of connotative concepts and on the knowledge of users' reactions to similar stimuli. This work extends previous work by extracting audiovisual and film grammar descriptors and, driven by users' ratings of connotative properties, creates a shared framework in which movie scenes are placed, compared, and recommended according to connotation. We evaluate the potential of the proposed system by asking users to assess the ability of connotation to suggest filmic content that targets their affective requests.
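    As a minimal sketch of recommendation in a connotative space, scenes can be treated as points along a few connotative axes (the three dimensions below are assumptions for illustration, not the paper's feature set), and the scenes nearest one the user liked are suggested.

        # Nearest-neighbor scene recommendation in a toy connotative space.
        import numpy as np

        rng = np.random.default_rng(3)
        # 50 scenes x 3 hypothetical connotative axes (e.g. warm/cold).
        scenes = rng.uniform(-1, 1, size=(50, 3))

        def recommend(liked_idx, k=3):
            """Return indices of the k scenes nearest to the liked scene."""
            d = np.linalg.norm(scenes - scenes[liked_idx], axis=1)
            d[liked_idx] = np.inf  # exclude the query scene itself
            return np.argsort(d)[:k]

        print(recommend(liked_idx=0))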

    Motion and emotion : Semantic knowledge for hollywood film indexing

    Ph.D. thesis (Doctor of Philosophy)