2,129 research outputs found
Large scale musical instrument identification
In this paper, automatic musical instrument identification using a variety of classifiers is addressed. Experiments are performed on a large set of recordings that stem from 20 instrument classes. Several features from general audio data classification applications as well as MPEG-7 descriptors are measured for 1000 recordings. Branch-and-bound feature selection is applied in order to select the most discriminating features for instrument classification. The first classifier is based on non-negative matrix factorization (NMF) techniques, where training is performed for each audio class individually. A novel NMF testing method is proposed, where each recording is projected onto several training matrices, which have been Gram-Schmidt orthogonalized. Several NMF variants are utilized besides the standard NMF method, such as the local NMF and the sparse NMF. In addition, 3-layered multilayer perceptrons, normalized Gaussian radial basis function networks, and support vector machines employing a polynomial kernel have also been tested as classifiers. The classification accuracy is high, ranging between 88.7% to 95.3%, outperforming the state-of-the-art techniques tested in the aforementioned experiment
Sequential Complexity as a Descriptor for Musical Similarity
We propose string compressibility as a descriptor of temporal structure in
audio, for the purpose of determining musical similarity. Our descriptors are
based on computing track-wise compression rates of quantised audio features,
using multiple temporal resolutions and quantisation granularities. To verify
that our descriptors capture musically relevant information, we incorporate our
descriptors into similarity rating prediction and song year prediction tasks.
We base our evaluation on a dataset of 15500 track excerpts of Western popular
music, for which we obtain 7800 web-sourced pairwise similarity ratings. To
assess the agreement among similarity ratings, we perform an evaluation under
controlled conditions, obtaining a rank correlation of 0.33 between intersected
sets of ratings. Combined with bag-of-features descriptors, we obtain
performance gains of 31.1% and 10.9% for similarity rating prediction and song
year prediction. For both tasks, analysis of selected descriptors reveals that
representing features at multiple time scales benefits prediction accuracy.Comment: 13 pages, 9 figures, 8 tables. Accepted versio
Clustering by compression
We present a new method for clustering based on compression. The method
doesn't use subject-specific features or background knowledge, and works as
follows: First, we determine a universal similarity distance, the normalized
compression distance or NCD, computed from the lengths of compressed data files
(singly and in pairwise concatenation). Second, we apply a hierarchical
clustering method. The NCD is universal in that it is not restricted to a
specific application area, and works across application area boundaries. A
theoretical precursor, the normalized information distance, co-developed by one
of the authors, is provably optimal but uses the non-computable notion of
Kolmogorov complexity. We propose precise notions of similarity metric, normal
compressor, and show that the NCD based on a normal compressor is a similarity
metric that approximates universality. To extract a hierarchy of clusters from
the distance matrix, we determine a dendrogram (binary tree) by a new quartet
method and a fast heuristic to implement it. The method is implemented and
available as public software, and is robust under choice of different
compressors. To substantiate our claims of universality and robustness, we
report evidence of successful application in areas as diverse as genomics,
virology, languages, literature, music, handwritten digits, astronomy, and
combinations of objects from completely different domains, using statistical,
dictionary, and block sorting compressors. In genomics we presented new
evidence for major questions in Mammalian evolution, based on
whole-mitochondrial genomic analysis: the Eutherian orders and the Marsupionta
hypothesis against the Theria hypothesis.Comment: LaTeX, 27 pages, 20 figure
Perceptual musical similarity metric learning with graph neural networks
Sound retrieval for assisted music composition depends on evaluating similarity between musical instrument sounds, which is partly influenced by playing techniques. Previous methods utilizing Euclidean nearest neighbours over acoustic features show some limitations in retrieving sounds sharing equivalent timbral properties, but potentially generated using a different instrument, playing technique, pitch or dynamic. In this paper, we present a metric learning system designed to approximate human similarity judgments between extended musical playing techniques using graph neural networks. Such structure is a natural candidate for solving similarity retrieval tasks, yet have seen little application in modelling perceptual music similarity. We optimize a Graph Convolutional Network (GCN) over acoustic features via a proxy metric learning loss to learn embeddings that reflect perceptual similarities. Specifically, we construct the graph's adjacency matrix from the acoustic data manifold with an example-wise adaptive k-nearest neighbourhood graph: Adaptive Neighbourhood Graph Neural Network (AN-GNN). Our approach achieves 96.4% retrieval accuracy compared to 38.5% with a Euclidean metric and 86.0% with a multilayer perceptron (MLP), while effectively considering retrievals from distinct playing techniques to the query example
Speaker-independent negative emotion recognition
This work aims to provide a method able to distinguish between negative and non-negative emotions in vocal interaction. A large pool of 1418 features is extracted for that purpose. Several of those features are tested in emotion recognition for the first time. Next, feature selection is applied separately to male and female utterances. In particular, a bidirectional Best First search with backtracking is applied. The first contribution is the demonstration that a significant number of features, first tested here, are retained after feature selection. The selected features are then fed as input to support vector machines with various kernel functions as well as to the K nearest neighbors classifier. The second contribution is in the speaker-independent experiments conducted in order to cope with the limited number of speakers present in the commonly used emotion speech corpora. Speaker-independent systems are known to be more robust and present a better generalization ability than the speaker-dependent ones. Experimental results are reported for the Berlin emotional speech database. The best performing classifier is found to be the support vector machine with the Gaussian radial basis function kernel. Correctly classified utterances are 86.73%±3.95% for male subjects and 91.73%±4.18% for female subjects. The last contribution is in the statistical analysis of the performance of the support vector machine classifier against the K nearest neighbors classifier as well as the statistical analysis of the various support vector machine kernels impact. © 2010 IEEE
Effect of nano black rice husk ash on the chemical and physical properties of porous concrete pavement
Black rice husk is a waste from this agriculture industry. It has been found that majority inorganic element in rice husk is silica. In this study, the effect of Nano from black rice husk ash (BRHA) on the chemical and physical properties of concrete pavement was investigated. The BRHA produced from uncontrolled burning at rice factory was taken. It was then been ground using laboratory mill with steel balls and steel rods. Four different grinding grades of BRHA were examined. A rice husk ash dosage of 10% by weight of binder was used throughout the experiments. The chemical and physical properties of the Nano BRHA mixtures were evaluated using fineness test, X-ray Fluorescence spectrometer (XRF) and X-ray diffraction (XRD). In addition, the compressive strength test was used to evaluate the performance of porous concrete pavement. Generally, the results show that the optimum grinding time was 63 hours. The result also indicated that the use of Nano black rice husk ash ground for 63hours produced concrete with good strengt
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions
This paper introduces a video dataset of spatio-temporally localized Atomic
Visual Actions (AVA). The AVA dataset densely annotates 80 atomic visual
actions in 430 15-minute video clips, where actions are localized in space and
time, resulting in 1.58M action labels with multiple labels per person
occurring frequently. The key characteristics of our dataset are: (1) the
definition of atomic visual actions, rather than composite actions; (2) precise
spatio-temporal annotations with possibly multiple annotations for each person;
(3) exhaustive annotation of these atomic actions over 15-minute video clips;
(4) people temporally linked across consecutive segments; and (5) using movies
to gather a varied set of action representations. This departs from existing
datasets for spatio-temporal action recognition, which typically provide sparse
annotations for composite actions in short video clips. We will release the
dataset publicly.
AVA, with its realistic scene and action complexity, exposes the intrinsic
difficulty of action recognition. To benchmark this, we present a novel
approach for action localization that builds upon the current state-of-the-art
methods, and demonstrates better performance on JHMDB and UCF101-24 categories.
While setting a new state of the art on existing datasets, the overall results
on AVA are low at 15.6% mAP, underscoring the need for developing new
approaches for video understanding.Comment: To appear in CVPR 2018. Check dataset page
https://research.google.com/ava/ for detail
- …