23,288 research outputs found
Deep Cross-Modal Correlation Learning for Audio and Lyrics in Music Retrieval
Deep cross-modal learning has successfully demonstrated excellent performance in cross-modal multimedia retrieval, with the aim of learning joint representations between different data modalities. Unfortunately, little research focuses on cross-modal correlation learning where temporal structures of different data modalities such as audio and lyrics should be taken into account. Stemming from the characteristic of temporal structures of music in nature, we are motivated to learn the deep sequential correlation between audio and lyrics. In this work, we propose a deep cross-modal correlation learning architecture involving two-branch deep neural networks for audio modality and text modality (lyrics). Data in different modalities are converted to the same canonical space where inter modal canonical correlation analysis is utilized as an objective function to calculate the similarity of temporal structures. This is the first study that uses deep architectures for learning the temporal correlation between audio and lyrics. A pre-trained Doc2Vec model followed by fully-connected layers is used to represent lyrics. Two significant contributions are made in the audio branch, as follows: i) We propose an end-to-end network to learn cross-modal correlation between audio and lyrics, where feature extraction and correlation learning are simultaneously performed and joint representation is learned by considering temporal structures. ii) As for feature extraction, we further represent an audio signal by a short sequence of local summaries (VGG16 features) and apply a recurrent neural network to compute a compact feature that better learns temporal structures of music audio. Experimental results, using audio to retrieve lyrics or using lyrics to retrieve audio, verify the effectiveness of the proposed deep correlation learning architectures in cross-modal music retrieval
Multimodal music information processing and retrieval: survey and future challenges
Towards improving the performance in various music information processing
tasks, recent studies exploit different modalities able to capture diverse
aspects of music. Such modalities include audio recordings, symbolic music
scores, mid-level representations, motion, and gestural data, video recordings,
editorial or cultural tags, lyrics and album cover arts. This paper critically
reviews the various approaches adopted in Music Information Processing and
Retrieval and highlights how multimodal algorithms can help Music Computing
applications. First, we categorize the related literature based on the
application they address. Subsequently, we analyze existing information fusion
approaches, and we conclude with the set of challenges that Music Information
Retrieval and Sound and Music Computing research communities should focus in
the next years
Recommended from our members
Organising music for movies
Purpose - The purpose of this paper is to examine and discuss the classification of commercial popular music when large digital collections are organised for use in films.
Design/methodology/approach - A range of systems are investigated and their organization is discussed, focusing on an analysis of the metadata used by the systems and choices given to the end-user to construct a query. The indexing of the music is compared to a checklist of music facets which has been derived from recent musicological literature on semiotic analysis of popular music. These facets include aspects of communication, cultural and musical expression, codes and competences.
Findings -In addition to bibliographic detail, descriptive metadata is used to organise music in these systems. Genre, subject and mood are used widely; some musical facets also appear. The extent to which attempts are being made to reflect these facets in the organization of these systems is discussed. A number of recommendations are made which may help to improve this process.
Originality/value - This paper discusses an area of creative music search which has not previously been investigated in any depth and makes recommendations based on findings and the literature which may be used in the development of commercial systems as well as making a contribution to the literature
- …