
    Multimodal music information processing and retrieval: survey and future challenges

    Towards improving the performance in various music information processing tasks, recent studies exploit different modalities able to capture diverse aspects of music. Such modalities include audio recordings, symbolic music scores, mid-level representations, motion and gestural data, video recordings, editorial or cultural tags, lyrics, and album cover art. This paper critically reviews the various approaches adopted in Music Information Processing and Retrieval and highlights how multimodal algorithms can help Music Computing applications. First, we categorize the related literature based on the applications they address. Subsequently, we analyze existing information fusion approaches, and we conclude with the set of challenges that the Music Information Retrieval and Sound and Music Computing research communities should focus on in the coming years.
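
    Information fusion approaches of the kind this survey analyzes are commonly grouped into early (feature-level) and late (decision-level) fusion. The following is a minimal sketch of that distinction only, not the paper's own code; the feature names, dimensions, labels, and the probability-averaging rule are all invented for illustration:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical per-modality features for the same 100 tracks
    # (dimensions and the mood label are made up for illustration).
    rng = np.random.default_rng(0)
    X_audio = rng.normal(size=(100, 32))   # e.g. audio descriptors
    X_lyrics = rng.normal(size=(100, 16))  # e.g. lyrics embeddings
    y = rng.integers(0, 2, size=100)       # e.g. a binary mood tag

    # Early fusion: concatenate features, train one model.
    early = LogisticRegression(max_iter=1000).fit(
        np.hstack([X_audio, X_lyrics]), y)

    # Late fusion: one model per modality, predicted probabilities
    # averaged at decision time.
    m_audio = LogisticRegression(max_iter=1000).fit(X_audio, y)
    m_lyrics = LogisticRegression(max_iter=1000).fit(X_lyrics, y)
    late_proba = (m_audio.predict_proba(X_audio)
                  + m_lyrics.predict_proba(X_lyrics)) / 2
    late_pred = late_proba.argmax(axis=1)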

    Fusion of Learned Multi-Modal Representations and Dense Trajectories for Emotional Analysis in Videos

    When designing a video affective content analysis algorithm, one of the most important steps is the selection of discriminative features for the effective representation of video segments. The majority of existing affective content analysis methods either use low-level audio-visual features or generate handcrafted higher-level representations based on these low-level features. In this work, we propose to use deep learning methods, in particular convolutional neural networks (CNNs), to automatically learn and extract mid-level representations from raw data. To this end, we exploit the audio and visual modalities of videos by employing Mel-Frequency Cepstral Coefficients (MFCC) and color values in the HSV color space. We also incorporate dense-trajectory-based motion features to further enhance the performance of the analysis. By means of multi-class support vector machines (SVMs) and fusion mechanisms, music video clips are classified into one of four affective categories representing the four quadrants of the Valence-Arousal (VA) space. Results obtained on a subset of the DEAP dataset show (1) that higher-level representations perform better than low-level features, and (2) that incorporating motion information leads to a notable performance gain, independently of the chosen representation.
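
    A minimal sketch of the classification and fusion stage described above, assuming the CNN-based audio (MFCC) features, CNN-based visual (HSV) features, and dense-trajectory motion features have already been extracted. The feature dimensions and the probability-averaging fusion rule are illustrative assumptions, not the paper's exact setup:

    import numpy as np
    from sklearn.svm import SVC

    # Placeholder per-modality feature matrices for n video clips
    # (shapes are assumptions; the paper does not specify them here).
    rng = np.random.default_rng(0)
    n = 200
    feats = {
        "audio":  rng.normal(size=(n, 128)),  # CNN features from MFCCs
        "visual": rng.normal(size=(n, 128)),  # CNN features from HSV frames
        "motion": rng.normal(size=(n, 64)),   # dense-trajectory descriptors
    }
    y = rng.integers(0, 4, size=n)  # four Valence-Arousal quadrants

    # One multi-class SVM per modality; decisions fused by averaging
    # class probabilities (one simple late-fusion scheme among several).
    models = {k: SVC(probability=True).fit(X, y) for k, X in feats.items()}
    proba = np.mean([models[k].predict_proba(feats[k]) for k in feats], axis=0)
    quadrant = proba.argmax(axis=1)  # predicted VA quadrant, 0..3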

    Towards responsive Sensitive Artificial Listeners

    This paper describes work in the recently started project SEMAINE, which aims to build a set of Sensitive Artificial Listeners: conversational agents designed to sustain an interaction with a human user despite limited verbal skills, through robust real-time recognition and generation of non-verbal behaviour, both when the agent is speaking and when it is listening. We report on data collection and on the design of a system architecture with a view to real-time responsiveness.

    First impressions: A survey on vision-based apparent personality trait analysis

    Personality analysis has been widely studied in psychology, neuropsychology, and signal processing, among other fields. In the past few years, it has also become an attractive research area in visual computing. From the computational point of view, speech and text have by far been the most considered cues of information for analyzing personality. Recently, however, there has been increasing interest from the computer vision community in analyzing personality from visual data. Recent computer vision approaches are able to accurately analyze human faces, body postures, and behaviors, and use this information to infer apparent personality traits. Because of the overwhelming research interest in this topic, and of the potential impact that this sort of method could have on society, we present in this paper an up-to-date review of existing vision-based approaches for apparent personality trait recognition. We describe seminal and cutting-edge works on the subject, discussing and comparing their distinctive features and limitations. Future avenues of research in the field are identified and discussed. Furthermore, we review aspects of subjectivity in data labeling/evaluation, as well as current datasets and challenges organized to push research in the field forward.

    A common neural scale for the subjective pleasantness of different primary rewards.

    When an economic decision is taken, it is between goals with different values, and the values must be on the same scale. Here, we used functional MRI to search for a brain region that represents the subjective pleasantness of two different rewards on the same neural scale. We found activity in the ventral prefrontal cortex that correlated with the subjective pleasantness of two fundamentally different rewards: taste in the mouth and warmth on the hand. The evidence came from two different investigations: a between-group comparison of two independent fMRI studies, and a within-subject study. In the latter, we showed that neural activity in the same voxels in the ventral prefrontal cortex correlated with the subjective pleasantness of the different rewards. Moreover, the slopes and intercepts of the regression lines describing the relationship between activations and subjective pleasantness were highly similar for the different rewards. We also provide evidence that the activations did not simply represent multisensory integration or the salience of the rewards. The findings demonstrate the existence of a specific region in the human brain where neural activity scales with the subjective pleasantness of qualitatively different primary rewards. This suggests a principle of brain processing of importance in reward valuation and decision-making.
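
    The slope-and-intercept comparison can be pictured with a small sketch: fit activation against subjective pleasantness separately for each reward and compare the fitted lines. Everything below is synthetic and illustrates only the analysis logic, not the study's data or pipeline:

    import numpy as np
    from scipy.stats import linregress

    rng = np.random.default_rng(1)
    pleasantness = rng.uniform(-2, 2, size=40)  # subjective ratings

    # Hypothetical BOLD signal in the same voxels for the two rewards,
    # generated here with a shared slope and intercept plus noise.
    bold_taste = 0.5 * pleasantness + 0.1 + rng.normal(0, 0.1, 40)
    bold_warmth = 0.5 * pleasantness + 0.1 + rng.normal(0, 0.1, 40)

    fit_taste = linregress(pleasantness, bold_taste)
    fit_warmth = linregress(pleasantness, bold_warmth)
    print(f"taste : slope={fit_taste.slope:.2f}, intercept={fit_taste.intercept:.2f}")
    print(f"warmth: slope={fit_warmth.slope:.2f}, intercept={fit_warmth.intercept:.2f}")
    # A "common neural scale" would show near-identical fitted lines.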

    Multi-Moji: Combining Thermal, Vibrotactile and Visual Stimuli to Expand the Affective Range of Feedback

    This paper explores the combination of multiple concurrent modalities for conveying emotional information in HCI: temperature, vibration, and abstract visual displays. Each modality has been studied individually, but can only convey a limited range of emotions within two-dimensional valence-arousal space. This paper is the first to systematically combine multiple modalities to expand the available affective range. Three studies were conducted: Study 1 measured the emotionality of vibrotactile feedback by itself; Study 2 measured the perceived emotional content of three bimodal combinations: vibrotactile + thermal, vibrotactile + visual, and visual + thermal; Study 3 then combined all three modalities. Results show that combining modalities increases the available range of emotional states, particularly in the problematic top-right and bottom-left quadrants of the dimensional model. We also provide a novel lookup resource for designers to identify stimuli that convey a range of emotions.
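
    The paper's lookup resource is not reproduced here, but a designer-facing lookup of this kind can be sketched as a table mapping rated stimulus combinations to their (valence, arousal) coordinates and returning the combination nearest a target emotion. All stimulus names and ratings below are invented placeholders:

    import math

    # Toy lookup: multimodal stimulus combination -> (valence, arousal).
    # Values are fabricated; the paper's actual ratings are not shown here.
    STIMULI = {
        ("warm", "slow vibration", "red pulse"):  (0.6, -0.4),
        ("warm", "fast vibration", "red pulse"):  (0.5,  0.7),
        ("cool", "slow vibration", "blue fade"):  (-0.5, -0.6),
        ("cool", "fast vibration", "blue flash"): (-0.6,  0.5),
    }

    def nearest_stimulus(valence, arousal):
        """Return the stimulus combination closest to a target VA point."""
        return min(STIMULI, key=lambda s: math.dist(STIMULI[s], (valence, arousal)))

    # e.g. a positive, high-arousal state (top-right quadrant):
    print(nearest_stimulus(0.7, 0.6))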