Sequential Complexity as a Descriptor for Musical Similarity

Author: Dixon S
Foster P
Mauch M
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014

We propose string compressibility as a descriptor of temporal structure in audio, for the purpose of determining musical similarity. Our descriptors are based on computing track-wise compression rates of quantised audio features, using multiple temporal resolutions and quantisation granularities. To verify that our descriptors capture musically relevant information, we incorporate our descriptors into similarity rating prediction and song year prediction tasks. We base our evaluation on a dataset of 15500 track excerpts of Western popular music, for which we obtain 7800 web-sourced pairwise similarity ratings. To assess the agreement among similarity ratings, we perform an evaluation under controlled conditions, obtaining a rank correlation of 0.33 between intersected sets of ratings. Combined with bag-of-features descriptors, we obtain performance gains of 31.1% and 10.9% for similarity rating prediction and song year prediction. For both tasks, analysis of selected descriptors reveals that representing features at multiple time scales benefits prediction accuracy. Comment: 13 pages, 9 figures, 8 tables. Accepted version
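The idea of a track-wise compression rate computed over several temporal resolutions and quantisation granularities can be sketched as follows. This is a minimal illustration, not the authors' implementation: the choice of zlib as compressor, equal-width quantisation bins, block averaging for downsampling, and the function names and default parameters are all assumptions made for the example.

```python
import zlib


def quantise(values, levels):
    """Map a 1-D feature sequence onto `levels` equal-width bins (assumed scheme)."""
    if not 1 <= levels <= 256:
        raise ValueError("levels must fit in one byte per symbol")
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant input
    return bytes(min(levels - 1, int((v - lo) / span * levels)) for v in values)


def downsample(values, factor):
    """Reduce temporal resolution by averaging consecutive blocks of `factor` frames."""
    blocks = (values[i:i + factor] for i in range(0, len(values), factor))
    return [sum(block) / len(block) for block in blocks]


def compression_rate(symbols):
    """Compressed size divided by original size; lower means more regular structure."""
    return len(zlib.compress(symbols, 9)) / len(symbols)


def complexity_descriptor(values, factors=(1, 2, 4), levels=(4, 16)):
    """One compression rate per (downsampling factor, quantisation level) pair."""
    return [
        compression_rate(quantise(downsample(values, f), q))
        for f in factors
        for q in levels
    ]
```

A strictly periodic feature sequence should yield lower rates than a noise-like one at every resolution, which is the property that makes the rates usable as a descriptor of temporal structure.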

arXiv.org e-Print Archive

CiteSeerX

Crossref

Queen Mary Research Online

A database and challenge for acoustic scene classification and event detection

Author: Benetos E.
Giannoulis D.
Lagrange M.
Plumbley M. D.
Rossignol M.
Stowell D.
Publication date: 01/01/2013

City Research Online

Weakly Labelled AudioSet Tagging with Attention Neural Networks

Author: Iqbal Turab
Kong Qiuqiang
Plumbley Mark D.
Wang Wenwu
Xu Yong
Yu Changsong
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 30/08/2019

Audio tagging is the task of predicting the presence or absence of sound classes within an audio clip. Previous work in audio tagging focused on relatively small datasets limited to recognising a small number of sound classes. We investigate audio tagging on AudioSet, which is a dataset consisting of over 2 million audio clips and 527 classes. AudioSet is weakly labelled, in that only the presence or absence of sound classes is known for each clip, while the onset and offset times are unknown. To address the weakly-labelled audio tagging problem, we propose attention neural networks as a way to attend to the most salient parts of an audio clip. We bridge the connection between attention neural networks and multiple instance learning (MIL) methods, and propose decision-level and feature-level attention neural networks for audio tagging. We investigate attention neural networks modeled by different functions, depths and widths. Experiments on AudioSet show that the feature-level attention neural network achieves a state-of-the-art mean average precision (mAP) of 0.369, outperforming the best multiple instance learning (MIL) method of 0.317 and Google's deep neural network baseline of 0.314. In addition, we discover that the audio tagging performance on AudioSet embedding features has a weak correlation with the number of training samples and the quality of labels of each sound class. Comment: 13 pages
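Decision-level attention over a weakly labelled clip can be illustrated with a small sketch: each segment contributes its own class probability, and a normalised attention weight decides how much that segment counts towards the clip-level prediction. The softmax normalisation and the function name below are assumptions for illustration; the abstract does not specify the attention function, and the paper compares several.

```python
import math


def attention_pool(segment_probs, attention_scores):
    """Clip-level probability for one class as an attention-weighted average.

    segment_probs:    per-segment probabilities that the class is present
    attention_scores: per-segment unnormalised attention scores (hypothetical input)
    """
    if len(segment_probs) != len(attention_scores):
        raise ValueError("one attention score is needed per segment")
    # Softmax with max-subtraction for numerical stability (assumed normalisation).
    peak = max(attention_scores)
    exps = [math.exp(s - peak) for s in attention_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return sum(w * p for w, p in zip(weights, segment_probs))
```

With equal scores this reduces to average pooling over segments; as one score dominates, the clip-level output approaches that segment's probability, which is how the pooling can focus on the salient part of a clip whose onset and offset times are unknown.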

arXiv.org e-Print Archive

University of Surrey

Surrey Research Insight

IDENTIFICATION OF COVER SONGS USING INFORMATION THEORETIC MEASURES OF SIMILARITY

Author: Dixon S
Foster P
IEEE
Klapuri A
Publication date: 01/01/2013

13 pages, 5 figures, 4 tables. v3: Accepted version

Queen Mary Research Online

On Classification with Bags, Groups and Sets

Author: Cheplygina Veronika
Loog Marco
Tax David M. J.
Publication venue: 'Elsevier BV'
Publication date: 07/10/2014

Many classification problems can be difficult to formulate directly in terms of the traditional supervised setting, where both training and test samples are individual feature vectors. There are cases in which samples are better described by sets of feature vectors, in which labels are only available for sets rather than individual samples, or in which individual labels are available but are not independent. To better deal with such problems, several extensions of supervised learning have been proposed, where training objects, test objects, or both are sets of feature vectors. However, having been proposed rather independently of each other, their mutual similarities and differences have hitherto not been mapped out. In this work, we provide an overview of such learning scenarios, propose a taxonomy to illustrate the relationships between them, and discuss directions for further research in these areas.

arXiv.org e-Print Archive

CiteSeerX

Copenhagen University Research Information System

Multi-label Ferns for Efficient Recognition of Musical Instruments in Recordings

Author: A.A. Wieczorkowska
D. Niewiadomy
E..z. Kubera
J.G.A. Barbedo
K. Kashino
L. Breiman
S. Essid
T. Kitahara
W. Jiang
Publication date: 01/01/2014

In this paper we introduce multi-label ferns and apply this technique to the automatic classification of musical instruments in audio recordings. We compare the performance of our proposed method to a set of binary random ferns, using jazz recordings as input data. Our main result is much faster classification and a higher F-score. We also achieve a substantial reduction of the model size.

arXiv.org e-Print Archive

Crossref