Identifying Cover Songs Using Information-Theoretic Measures of Similarity
This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/ This paper investigates methods for quantifying similarity between audio signals, specifically for the task of cover song detection. We consider an information-theoretic approach, where we compute pairwise measures of predictability between time series. We compare discrete-valued approaches operating on quantized audio features to continuous-valued approaches. In the discrete case, we propose a method for computing the normalized compression distance, where we account for correlation between time series. In the continuous case, we propose to compute information-based measures of similarity as statistics of the prediction error between time series. We evaluate our methods on two cover song identification tasks, using a dataset comprising 300 jazz standards and using the Million Song Dataset. For both datasets, we observe that continuous-valued approaches outperform discrete-valued approaches. We consider approaches to estimating the normalized compression distance (NCD) based on string compression and prediction, where we observe that our proposed normalized compression distance with alignment (NCDA) improves average performance over NCD for sequential compression algorithms. Finally, we demonstrate that continuous-valued distances may be combined to improve performance with respect to baseline approaches. Using a large-scale filter-and-refine approach, we demonstrate state-of-the-art performance for cover song identification using the Million Song Dataset. The work of P. Foster was supported by an Engineering and Physical Sciences Research Council Doctoral Training Account studentship.
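For readers unfamiliar with the normalized compression distance mentioned in the abstract above, the following is a minimal sketch of the standard NCD, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. It is not the paper's NCDA variant and not the authors' implementation. The choice of zlib as compressor, the function names, and the toy byte sequences standing in for quantized audio features are all assumptions made for illustration.

```python
import hashlib
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Standard normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical quantized feature sequences (one symbol per frame).
theme = bytes([0, 4, 7, 4] * 500)
cover = bytes([0, 4, 7, 4] * 450 + [0, 5, 7, 5] * 50)  # mostly shared material
# Pseudo-random bytes stand in for an unrelated recording.
unrelated = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(64))

print(ncd(theme, cover), ncd(theme, unrelated))
```

With these toy inputs the cover-like pair should score a clearly smaller distance than the unrelated pair. Real feature sequences are far less regular, so absolute values will differ.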
Biometric presentation attack detection: beyond the visible spectrum
The increased need for unattended authentication in
multiple scenarios has motivated a wide deployment of biometric
systems in the last few years. This has in turn led to the
disclosure of security concerns specifically related to biometric
systems. Among them, presentation attacks (PAs, i.e., attempts
to log into the system with a fake biometric characteristic or
presentation attack instrument) pose a severe threat to the
security of the system: any person could eventually fabricate
or order a gummy finger or face mask to impersonate someone
else. In this context, we present a novel fingerprint presentation
attack detection (PAD) scheme based on i) a new capture device
able to acquire images within the short wave infrared (SWIR)
spectrum, and ii) an in-depth analysis of several state-of-the-art
techniques based on both handcrafted and deep learning
features. The approach is evaluated on a database comprising
over 4700 samples, stemming from 562 different subjects and
35 different presentation attack instrument (PAI) species. The
results show the soundness of the proposed approach with a
detection equal error rate (D-EER) as low as 1.35% even in a
realistic scenario where five different PAI species are considered
only for testing purposes (i.e., unknown attacks).
Deep Learning for Audio Signal Processing
Given the recent surge in developments of deep learning, this article
provides a review of the state-of-the-art deep learning techniques for audio
signal processing. Speech, music, and environmental sound processing are
considered side-by-side, in order to point out similarities and differences
between the domains, highlighting general methods, problems, key references,
and potential for cross-fertilization between areas. The dominant feature
representations (in particular, log-mel spectra and raw waveform) and deep
learning models are reviewed, including convolutional neural networks, variants
of the long short-term memory architecture, as well as more audio-specific
neural network models. Subsequently, prominent deep learning application areas
are covered, i.e. audio recognition (automatic speech recognition, music
information retrieval, environmental sound detection, localization and
tracking) and synthesis and transformation (source separation, audio
enhancement, generative models for speech, sound, and music synthesis).
Finally, key issues and future questions regarding deep learning applied to
audio signal processing are identified. Comment: 15 pages, 2 PDF figures
Composition of Jupiter irregular satellites sheds light on their origin
Irregular satellites of Jupiter with their highly eccentric, inclined and
distant orbits suggest that their capture took place just before the giant
planet migration. We aim to improve our understanding of the surface
composition of irregular satellites of Jupiter to gain insight into a narrow
time window when our Solar System was forming. We observed three Jovian
irregular satellites, Himalia, Elara, and Carme, using a medium-resolution
0.8-5.5 µm spectrograph on the National Aeronautics and Space
Administration (NASA) Infrared Telescope Facility (IRTF). Using a linear
spectral unmixing model we have constrained the major mineral phases on the
surface of these three bodies. Our results confirm that the surfaces of Himalia,
Elara, and Carme are dominated by opaque materials such as those seen in
carbonaceous chondrite meteorites. Our spectral modeling of NIR spectra of
Himalia and Elara confirms that their surface composition is the same and
that magnetite is the dominant mineral. A comparison of the spectral shape of
Himalia with the two large main C-type asteroids, Themis (D 176 km) and Europa
(D 352 km), suggests surface composition similar to Europa. The NIR spectrum of
Carme exhibits a blue slope up to 1.5 µm and is spectrally distinct from
those of Himalia and Elara. Our model suggests that it is compositionally
similar to amorphous carbon. Himalia and Elara are compositionally similar but
differ significantly from Carme. These results support the hypotheses that the
Jupiter irregular satellites are captured bodies that were subject to further
breakup events and clustered as families based on their similar physical and
surface compositions.
A Corpus-based Study Of Rhythm Patterns
We present a corpus-based study of musical rhythm, based on a collection of 4.8 million bar-length drum patterns extracted from 48,176 pieces of symbolic music. Approaches to the analysis of rhythm in music information retrieval to date have focussed on low-level features for retrieval or on the detection of tempo, beats and drums in audio recordings. Musicological approaches are usually concerned with the description or implementation of manmade music theories. In this paper, we present a quantitative bottom-up approach to the study of rhythm that relies upon well-understood statistical methods from natural language processing. We adapt these methods to our corpus of music, based on the realisation that, unlike words, bar-length drum patterns can be systematically decomposed into sub-patterns both in time and by instrument. We show that, in some respects, our rhythm corpus behaves like natural language corpora, particularly in the sparsity of vocabulary. The same methods that detect word collocations allow us to quantify and rank idiomatic combinations of drum patterns. In other respects, our corpus has properties absent from language corpora, in particular, the high amount of repetition and strong mutual information rates between drum instruments. Our findings may be of direct interest to musicians and musicologists, and can inform the design of ground truth corpora and computational models of musical rhythm.
Uncertainty-Aware Organ Classification for Surgical Data Science Applications in Laparoscopy
Objective: Surgical data science is evolving into a research field that aims
to observe everything occurring within and around the treatment process to
provide situation-aware data-driven assistance. In the context of endoscopic
video analysis, the accurate classification of organs in the field of view of
the camera poses a technical challenge. Herein, we propose a new approach to
anatomical structure classification and image tagging that features an
intrinsic measure of confidence to estimate its own performance with high
reliability and which can be applied to both RGB and multispectral imaging (MI)
data. Methods: Organ recognition is performed using a superpixel classification
strategy based on textural and reflectance information. Classification
confidence is estimated by analyzing the dispersion of class probabilities.
Assessment of the proposed technology is performed through a comprehensive in
vivo study with seven pigs. Results: When applied to image tagging, mean
accuracy in our experiments increased from 65% (RGB) and 80% (MI) to 90% (RGB)
and 96% (MI) with the confidence measure. Conclusion: Results showed that the
confidence measure had a significant influence on the classification accuracy,
and MI data are better suited for anatomical structure labeling than RGB data.
Significance: This work significantly enhances the state of the art in automatic
labeling of endoscopic videos by introducing the use of the confidence metric,
and by being the first study to use MI data for in vivo laparoscopic tissue
classification. The data of our experiments will be released as the first in
vivo MI dataset upon publication of this paper. Comment: 7 pages, 6 images, 2 tables
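The abstract above says that classification confidence is estimated from the dispersion of class probabilities, but it does not say which dispersion measure is used. As a rough illustration of the idea only, the sketch below assumes normalized Shannon entropy as the dispersion measure and rejects low-confidence predictions. The function names, the threshold of 0.5, and the organ labels are hypothetical and are not taken from the paper.

```python
import math


def confidence(probs):
    """Confidence as 1 minus normalized Shannon entropy (an assumed measure).

    Returns 1.0 for a one-hot distribution and about 0.0 for a uniform one.
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))


def tag_with_rejection(probs, labels, threshold=0.5):
    """Return the most probable label, or None if confidence is too low."""
    if confidence(probs) < threshold:
        return None  # abstain rather than emit an unreliable tag
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best]


labels = ["liver", "spleen", "fat"]  # hypothetical class names
print(tag_with_rejection([0.90, 0.05, 0.05], labels))  # peaked distribution
print(tag_with_rejection([0.40, 0.35, 0.25], labels))  # dispersed distribution
```

Abstaining on dispersed distributions trades coverage for accuracy, which is one plausible reading of why accuracy on the retained images rises when a confidence measure is applied.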