
    EMIR: A novel emotion-based music retrieval system

    Music is inherently expressive of emotional meaning and affects people's mood. In this paper, we present EMIR (Emotional Music Information Retrieval), a novel system that uses latent emotion elements in both music and non-descriptive queries (NDQs) to detect implicit emotional associations between users and music and thereby enhance Music Information Retrieval (MIR). We try to understand the latent emotional intent of queries via machine learning for emotion classification and compare the performance of emotion detection approaches on different feature sets. For this purpose, we extract music emotion features from lyrics and social tags crawled from the Internet, label a subset for training, model them in a high-dimensional emotion space, and recognize the latent emotion of users through query emotion analysis. The similarity between queries and music is computed with a verified BM25 model.
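    For readers unfamiliar with BM25, the snippet below is a minimal, self-contained sketch of standard Okapi BM25 scoring between a tokenized query and documents such as lyrics or social tags. It is not the paper's "verified" BM25 variant, whose exact parameterization the abstract does not give; the function name and toy data are illustrative.

```python
# Minimal sketch of standard Okapi BM25 scoring (illustrative, not the
# paper's verified variant).
import math
from collections import Counter

def bm25_scores(query_terms, documents, k1=1.5, b=0.75):
    """Score each tokenized document against the query terms."""
    N = len(documents)
    avg_len = sum(len(d) for d in documents) / N
    # Document frequency per term.
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    scores = []
    for doc in documents:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

# Toy usage: rank two "songs" (lyric tokens) against an emotional query.
docs = [["lonely", "night", "tears", "rain"], ["happy", "dance", "sunshine"]]
print(bm25_scores(["lonely", "rain"], docs))
```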

    Deep Cross-Modal Correlation Learning for Audio and Lyrics in Music Retrieval

    Deep cross-modal learning has demonstrated excellent performance in cross-modal multimedia retrieval, with the aim of learning joint representations between different data modalities. Unfortunately, little research focuses on cross-modal correlation learning in which the temporal structures of different data modalities, such as audio and lyrics, are taken into account. Motivated by the inherently temporal structure of music, we set out to learn the deep sequential correlation between audio and lyrics. In this work, we propose a deep cross-modal correlation learning architecture involving two-branch deep neural networks for the audio modality and the text modality (lyrics). Data in the two modalities are converted to the same canonical space, where inter-modal canonical correlation analysis is used as an objective function to calculate the similarity of temporal structures. This is the first study to use deep architectures for learning the temporal correlation between audio and lyrics. A pre-trained Doc2Vec model followed by fully connected layers is used to represent lyrics. Two significant contributions are made in the audio branch: i) we propose an end-to-end network to learn the cross-modal correlation between audio and lyrics, in which feature extraction and correlation learning are performed simultaneously and a joint representation is learned with temporal structure taken into account; ii) for feature extraction, we represent an audio signal by a short sequence of local summaries (VGG16 features) and apply a recurrent neural network to compute a compact feature that better captures the temporal structure of music audio. Experimental results, using audio to retrieve lyrics and lyrics to retrieve audio, verify the effectiveness of the proposed deep correlation learning architectures in cross-modal music retrieval.
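    For orientation, here is a minimal PyTorch sketch of such a two-branch architecture: fully connected layers over pre-computed Doc2Vec lyric vectors on one side, a recurrent network over a sequence of local audio summaries (e.g., pre-extracted VGG16 features) on the other, trained to align the two embeddings in a shared space. The layer sizes are assumptions, and a simple cosine-based objective stands in for the paper's full inter-modal CCA loss.

```python
# Sketch of a two-branch audio/lyrics network; dimensions and the
# simplified correlation loss are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LyricsBranch(nn.Module):
    """Fully connected layers over pre-computed Doc2Vec vectors."""
    def __init__(self, doc2vec_dim=300, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(doc2vec_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, x):          # x: (batch, doc2vec_dim)
        return self.net(x)

class AudioBranch(nn.Module):
    """RNN over a short sequence of local audio summaries
    (e.g., pre-extracted VGG16 features per audio segment)."""
    def __init__(self, feat_dim=512, embed_dim=128):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, embed_dim, batch_first=True)

    def forward(self, x):          # x: (batch, time, feat_dim)
        _, h = self.rnn(x)         # final hidden state summarizes the sequence
        return h.squeeze(0)        # (batch, embed_dim)

def correlation_loss(a, t):
    """Pull paired audio/text embeddings together in the shared space
    (cosine surrogate for the paper's inter-modal CCA objective)."""
    return 1.0 - F.cosine_similarity(a, t).mean()

# Toy forward/backward pass with random stand-in features.
audio = AudioBranch()(torch.randn(4, 10, 512))
text = LyricsBranch()(torch.randn(4, 300))
loss = correlation_loss(audio, text)
loss.backward()
```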

    Multimodal music information processing and retrieval: survey and future challenges

    To improve performance in various music information processing tasks, recent studies exploit different modalities capable of capturing diverse aspects of music. Such modalities include audio recordings, symbolic music scores, mid-level representations, motion and gestural data, video recordings, editorial or cultural tags, lyrics, and album cover art. This paper critically reviews the various approaches adopted in Music Information Processing and Retrieval and highlights how multimodal algorithms can help Music Computing applications. First, we categorize the related literature based on the application it addresses. Subsequently, we analyze existing information fusion approaches, and we conclude with the set of challenges on which the Music Information Retrieval and Sound and Music Computing research communities should focus in the coming years.
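    To make the fusion terminology concrete, here is a minimal sketch of late (decision-level) fusion, one of the strategies such surveys typically contrast with early (feature-level) fusion. The modality names, weights, and probabilities are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of late (decision-level) fusion: combine per-modality
# class posteriors by a weighted average. Weights and data are toy values.
import numpy as np

def late_fusion(modality_probs, weights):
    """Weighted average of per-modality class probabilities."""
    stacked = np.stack([w * p for w, p in zip(weights, modality_probs)])
    return stacked.sum(axis=0) / sum(weights)

# Toy genre posteriors from an audio model and a lyrics model.
audio_p = np.array([0.7, 0.2, 0.1])
lyrics_p = np.array([0.4, 0.5, 0.1])
print(late_fusion([audio_p, lyrics_p], weights=[0.6, 0.4]))
```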

    The Use of Rhyme, Rhythm, and Melody as a Form of Repetition Priming to Aid in Encoding, Storage, and Retrieval of Semantic Memories in Alzheimer’s Patients

    Millions of people are diagnosed with Alzheimer's disease annually, and the disease can have debilitating effects on patient memory. Thus, finding new ways to help facilitate memory in these patients, especially through non-pharmaceutical means, has become increasingly important. I examined the use of melody, rhyme, and rhythm as encoding mechanisms to aid the retrieval of long-term semantic information by juxtaposing scholarly articles detailing experiments, each of which examined the effects of various facets of memory facilitation; this helped produce an idea of which devices are most effective. Additionally, I surveyed studies highlighting the limitations of song implementation in order to craft an effective plan to aid Alzheimer's patients. Melody, rhyme, and rhythm provide an organizational structure that facilitates the encoding of information. Specifically, chunking, the grouping of smaller units into larger 'chunks', helps facilitate long-term encoding in patients and is a byproduct of the organizational structure of a text. A major drawback of using these devices is a loss in the depth of encoding of semantic information; however, it is important to recognize that music still assists general content memory. Therefore, Alzheimer's patients would benefit from the use of melody, as it would provide moral support and help them stay familiar with their surroundings, although they would not benefit from instructional song. Future experiments may study the combination of the discussed factors in various settings to examine the unique benefits of music on memory in Alzheimer's patients.

    Current Challenges and Visions in Music Recommender Systems Research

    Music recommender systems (MRS) have experienced a boom in recent years, thanks to the emergence and success of online streaming services, which nowadays make almost all of the world's music available at the user's fingertips. While today's MRS considerably help users find interesting music in these huge catalogs, MRS research still faces substantial challenges. In particular, when it comes to building, incorporating, and evaluating recommendation strategies that integrate information beyond simple user-item interactions or content-based descriptors and dig deep into the very essence of listener needs, preferences, and intentions, MRS research becomes a major endeavor, and related publications remain quite sparse. The purpose of this trends-and-survey article is twofold. First, we identify and shed light on what we believe are the most pressing challenges MRS research faces, from both academic and industry perspectives; we review the state of the art towards solving these challenges and discuss its limitations. Second, we detail possible future directions and visions we contemplate for the further evolution of the field. The article should therefore serve two purposes: giving the interested reader an overview of current challenges in MRS research and providing guidance for young researchers by identifying interesting, yet under-researched, directions in the field.
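    To ground the phrase "simple user-item interactions", the sketch below implements the kind of baseline the article argues MRS research must move beyond: implicit-feedback matrix factorization trained with plain SGD. All dimensions, hyperparameters, and data are illustrative assumptions.

```python
# Baseline sketch: matrix factorization over binary implicit feedback,
# trained with per-entry SGD. Toy sizes and hyperparameters throughout.
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 5, 8, 3
# Binary implicit feedback: 1 if the user streamed the track.
R = (rng.random((n_users, n_items)) > 0.6).astype(float)

U = rng.normal(scale=0.1, size=(n_users, k))
V = rng.normal(scale=0.1, size=(n_items, k))
lr, reg = 0.05, 0.01

for _ in range(200):
    for u in range(n_users):
        for i in range(n_items):
            err = R[u, i] - U[u] @ V[i]
            U[u] += lr * (err * V[i] - reg * U[u])
            V[i] += lr * (err * U[u] - reg * V[i])

# Recommend the unheard track with the highest predicted score for user 0.
scores = U[0] @ V.T
scores[R[0] == 1] = -np.inf
print("recommend item", int(scores.argmax()))
```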

    Affective Music Information Retrieval

    Much of the appeal of music lies in its power to convey emotions and moods and to evoke them in listeners. In consequence, the past decade has witnessed growing interest in modeling emotions from musical signals in the music information retrieval (MIR) community. In this article, we present a novel generative approach to music emotion modeling, with a specific focus on the valence-arousal (VA) dimensional model of emotion. The presented generative model, called acoustic emotion Gaussians (AEG), better accounts for the subjectivity of emotion perception through the use of probability distributions. Specifically, from the emotion annotations of multiple subjects it learns a Gaussian mixture model in the VA space, with prior constraints tied to the corresponding acoustic features of the training music pieces. Such a computational framework is technically sound, capable of learning in an online fashion, and thus applicable to a variety of applications, including user-independent (general) and user-dependent (personalized) emotion recognition and emotion-based music retrieval. We report evaluations of the aforementioned applications of AEG on a large-scale emotion-annotated corpus, AMG1608, to demonstrate the effectiveness of AEG and to showcase how evaluations are conducted for research on emotion-based MIR. Directions for future work are also discussed.
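    As a rough illustration of one ingredient of this approach, the sketch below fits a Gaussian mixture model to toy valence-arousal annotations from multiple subjects using scikit-learn. It captures only the GMM-in-VA-space idea; AEG additionally conditions the mixture on acoustic features of the music, which this sketch omits, and all data here are synthetic.

```python
# Sketch: fit a GMM in the valence-arousal (VA) plane to per-subject
# emotion annotations of one clip. Captures only the GMM ingredient of
# AEG; the acoustic conditioning is omitted.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Toy VA annotations from multiple subjects: two clusters, reflecting
# the subjectivity of emotion perception.
annotations = np.vstack([
    rng.normal([0.6, 0.5], 0.1, size=(20, 2)),    # happy/excited region
    rng.normal([-0.4, -0.3], 0.1, size=(10, 2)),  # sad/calm region
])

gmm = GaussianMixture(n_components=2, covariance_type="full").fit(annotations)
print("component means (valence, arousal):\n", gmm.means_)
print("mixture weights:", gmm.weights_)
```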