13,697 research outputs found
Multimodal music information processing and retrieval: survey and future challenges
Towards improving the performance in various music information processing
tasks, recent studies exploit different modalities able to capture diverse
aspects of music. Such modalities include audio recordings, symbolic music
scores, mid-level representations, motion, and gestural data, video recordings,
editorial or cultural tags, lyrics and album cover arts. This paper critically
reviews the various approaches adopted in Music Information Processing and
Retrieval and highlights how multimodal algorithms can help Music Computing
applications. First, we categorize the related literature based on the
application they address. Subsequently, we analyze existing information fusion
approaches, and we conclude with the set of challenges that Music Information
Retrieval and Sound and Music Computing research communities should focus in
the next years
The relationship between IR and multimedia databases
Modern extensible database systems support multimedia data through ADTs. However, because of the problems with multimedia query formulation, this support is not sufficient.\ud
\ud
Multimedia querying requires an iterative search process involving many different representations of the objects in the database. The support that is needed is very similar to the processes in information retrieval.\ud
\ud
Based on this observation, we develop the miRRor architecture for multimedia query processing. We design a layered framework based on information retrieval techniques, to provide a usable query interface to the multimedia database.\ud
\ud
First, we introduce a concept layer to enable reasoning over low-level concepts in the database.\ud
\ud
Second, we add an evidential reasoning layer as an intermediate between the user and the concept layer.\ud
\ud
Third, we add the functionality to process the users' relevance feedback.\ud
\ud
We then adapt the inference network model from text retrieval to an evidential reasoning model for multimedia query processing.\ud
\ud
We conclude with an outline for implementation of miRRor on top of the Monet extensible database system
Recommended from our members
Non-Negative Tensor Factorization Applied to Music Genre Classification
Music genre classification techniques are typically applied to the data matrix whose columns are the feature vectors extracted from music recordings. In this paper, a feature vector is extracted using a texture window of one sec, which enables the representation of any 30 sec long music recording as a time sequence of feature vectors, thus yielding a feature matrix. Consequently, by stacking the feature matrices associated to any dataset recordings, a tensor is created, a fact which necessitates studying music genre classification using tensors. First, a novel algorithm for non-negative tensor factorization (NTF) is derived that extends the non-negative matrix factorization. Several variants of the NTF algorithm emerge by employing different cost functions from the class of Bregman divergences. Second, a novel supervised NTF classifier is proposed, which trains a basis for each class separately and employs basis orthogonalization. A variety of spectral, temporal, perceptual, energy, and pitch descriptors is extracted from 1000 recordings of the GTZAN dataset, which are distributed across 10 genre classes. The NTF classifier performance is compared against that of the multilayer perceptron and the support vector machines by applying a stratified 10-fold cross validation. A genre classification accuracy of 78.9% is reported for the NTF classifier demonstrating the superiority of the aforementioned multilinear classifier over several data matrix-based state-of-the-art classifiers
Inpainting of long audio segments with similarity graphs
We present a novel method for the compensation of long duration data loss in
audio signals, in particular music. The concealment of such signal defects is
based on a graph that encodes signal structure in terms of time-persistent
spectral similarity. A suitable candidate segment for the substitution of the
lost content is proposed by an intuitive optimization scheme and smoothly
inserted into the gap, i.e. the lost or distorted signal region. Extensive
listening tests show that the proposed algorithm provides highly promising
results when applied to a variety of real-world music signals
- …