
    Identifying Cover Songs Using Information-Theoretic Measures of Similarity

    This work is licensed under a Creative Commons Attribution 3.0 License. For more information, see http://creativecommons.org/licenses/by/3.0/. This paper investigates methods for quantifying similarity between audio signals, specifically for the task of cover song detection. We consider an information-theoretic approach, where we compute pairwise measures of predictability between time series. We compare discrete-valued approaches operating on quantized audio features to continuous-valued approaches. In the discrete case, we propose a method for computing the normalized compression distance that accounts for correlation between time series. In the continuous case, we propose to compute information-based measures of similarity as statistics of the prediction error between time series. We evaluate our methods on two cover song identification tasks, using a data set comprising 300 Jazz standards and using the Million Song Dataset. For both datasets, we observe that continuous-valued approaches outperform discrete-valued approaches. We consider approaches to estimating the normalized compression distance (NCD) based on string compression and on prediction, and observe that our proposed normalized compression distance with alignment (NCDA) improves average performance over NCD for sequential compression algorithms. Finally, we demonstrate that continuous-valued distances may be combined to improve performance with respect to baseline approaches. Using a large-scale filter-and-refine approach, we demonstrate state-of-the-art performance for cover song identification on the Million Song Dataset. The work of P. Foster was supported by an Engineering and Physical Sciences Research Council Doctoral Training Account studentship.
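    For reference, the standard NCD that this abstract uses as its baseline is NCD(x, y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}, where C(.) is the compressed length of a string and xy is concatenation. The sketch below is a minimal illustration only. It assumes zlib (DEFLATE, a sequential compressor) stands in for C, and that quantized audio features have already been mapped to byte strings. The sample sequences are hypothetical. It does not implement the paper's alignment-based NCDA or its prediction-based estimators.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input at the highest level (9)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Standard normalized compression distance, estimated with zlib.

    Values near 0 mean y is almost fully predictable from x (and vice versa).
    Values near 1 mean the two strings share little structure. Real
    compressors can return values slightly above 1.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Hypothetical symbol sequences, e.g. quantized feature frames encoded as bytes.
    query = bytes([0, 1, 2, 3] * 200)
    cover = bytes([0, 1, 2, 3] * 150 + [0, 1, 3, 2] * 50)
    unrelated = bytes(range(256))
    print(ncd(query, cover), ncd(query, unrelated))
```

    DEFLATE only looks back 32 KB, so with long concatenated sequences zlib can no longer "see" x while compressing y, and the estimate degrades. This is one reason the choice of compressor matters for NCD.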

    Biometric presentation attack detection: beyond the visible spectrum

    The increased need for unattended authentication in multiple scenarios has motivated a wide deployment of biometric systems in the last few years. This has in turn led to the disclosure of security concerns specifically related to biometric systems. Among them, presentation attacks (PAs, i.e., attempts to log into the system with a fake biometric characteristic or presentation attack instrument) pose a severe threat to the security of the system: anyone could potentially fabricate or order a gummy finger or face mask to impersonate someone else. In this context, we present a novel fingerprint presentation attack detection (PAD) scheme based on i) a new capture device able to acquire images within the short wave infrared (SWIR) spectrum, and ii) an in-depth analysis of several state-of-the-art techniques based on both handcrafted and deep learning features. The approach is evaluated on a database comprising over 4700 samples, stemming from 562 different subjects and 35 different presentation attack instrument (PAI) species. The results show the soundness of the proposed approach, with a detection equal error rate (D-EER) as low as 1.35%, even in a realistic scenario where five different PAI species are used only for testing (i.e., unknown attacks).
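    The D-EER quoted above is the operating point at which the attack presentation classification error rate (APCER, attacks accepted as bona fide) equals the bona fide presentation classification error rate (BPCER, bona fide samples rejected). The sketch below shows one simple way to estimate it by sweeping a threshold over the observed scores. It assumes that higher scores mean "more likely bona fide", and the function name and sample scores are hypothetical. Published evaluations often interpolate between thresholds instead of taking the closest discrete point, as this sketch does.

```python
def detection_eer(bona_fide_scores, attack_scores):
    """Approximate D-EER by sweeping the decision threshold over all observed scores.

    Assumed convention: a sample is classified as bona fide when score >= threshold.
    Returns (eer, threshold) at the threshold where APCER and BPCER are closest.
    """
    if not bona_fide_scores or not attack_scores:
        raise ValueError("need at least one bona fide and one attack score")
    best_gap, best_eer, best_threshold = float("inf"), 0.0, None
    for t in sorted(set(bona_fide_scores) | set(attack_scores)):
        # APCER: fraction of attacks wrongly accepted as bona fide.
        apcer = sum(s >= t for s in attack_scores) / len(attack_scores)
        # BPCER: fraction of bona fide presentations wrongly rejected.
        bpcer = sum(s < t for s in bona_fide_scores) / len(bona_fide_scores)
        gap = abs(apcer - bpcer)
        if gap < best_gap:
            best_gap, best_eer, best_threshold = gap, (apcer + bpcer) / 2, t
    return best_eer, best_threshold
```

    For example, `detection_eer([0.9, 0.8, 0.3], [0.1, 0.2, 0.85])` finds the threshold 0.8, where one of three attacks is accepted and one of three bona fide samples is rejected.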

    Deep Learning for Audio Signal Processing

    Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side by side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e., audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified. Comment: 15 pages, 2 PDF figures

    Composition of Jupiter irregular satellites sheds light on their origin

    The highly eccentric, inclined, and distant orbits of Jupiter's irregular satellites suggest that they were captured just before the giant planet migration. We aim to improve our understanding of the surface composition of Jupiter's irregular satellites to gain insight into a narrow time window when our Solar System was forming. We observed three Jovian irregular satellites, Himalia, Elara, and Carme, using a medium-resolution 0.8-5.5 μm spectrograph on the National Aeronautics and Space Administration (NASA) Infrared Telescope Facility (IRTF). Using a linear spectral unmixing model, we have constrained the major mineral phases on the surfaces of these three bodies. Our results confirm that the surfaces of Himalia, Elara, and Carme are dominated by opaque materials such as those seen in carbonaceous chondrite meteorites. Our spectral modeling of the near-infrared (NIR) spectra of Himalia and Elara confirms that their surface compositions are the same and that magnetite is the dominant mineral. A comparison of the spectral shape of Himalia with the two large main-belt C-type asteroids Themis (D = 176 km) and Europa (D = 352 km) suggests a surface composition similar to that of Europa. The NIR spectrum of Carme exhibits a blue slope up to 1.5 μm and is spectrally distinct from those of Himalia and Elara. Our model suggests that it is compositionally similar to amorphous carbon. Himalia and Elara are compositionally similar but differ significantly from Carme. These results support the hypotheses that Jupiter's irregular satellites are captured bodies that were subject to further breakup events and clustered as families based on their similar physical and surface compositions.
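    In a linear spectral unmixing model, an observed reflectance spectrum is treated as a weighted sum of endmember spectra (pure mineral phases), and the weights are fitted by least squares. The sketch below covers only the simplest fully constrained case, with two endmembers, one mixing fraction f, and 0 <= f <= 1, where the optimum has a closed form. The function name and reflectance values are hypothetical, and the sketch is not the authors' model. Fits with more endmembers usually need a non-negative least squares solver such as `scipy.optimize.nnls`.

```python
def unmix_two(observed, endmember_a, endmember_b):
    """Fraction f of endmember A in the assumed linear mixture f*A + (1 - f)*B.

    Minimizes the squared residual ||observed - (f*A + (1 - f)*B)||^2.
    The unconstrained optimum is <observed - B, A - B> / ||A - B||^2. Because the
    objective is a 1-D convex quadratic, clamping to [0, 1] gives the constrained optimum.
    """
    diff = [a - b for a, b in zip(endmember_a, endmember_b)]
    resid = [o - b for o, b in zip(observed, endmember_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("endmember spectra are identical; fraction is undetermined")
    f = sum(r * d for r, d in zip(resid, diff)) / denom
    return min(1.0, max(0.0, f))
```

    With hypothetical three-band spectra A = [0.05, 0.06, 0.07] and B = [0.10, 0.08, 0.04], the mixture 0.7*A + 0.3*B is recovered as f ≈ 0.7.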

    A Corpus-based Study Of Rhythm Patterns

    We present a corpus-based study of musical rhythm, based on a collection of 4.8 million bar-length drum patterns extracted from 48,176 pieces of symbolic music. Approaches to the analysis of rhythm in music information retrieval have so far focussed on low-level features for retrieval or on the detection of tempo, beats and drums in audio recordings. Musicological approaches are usually concerned with the description or implementation of man-made music theories. In this paper, we present a quantitative bottom-up approach to the study of rhythm that relies upon well-understood statistical methods from natural language processing. We adapt these methods to our corpus of music, based on the realisation that, unlike words, bar-length drum patterns can be systematically decomposed into sub-patterns both in time and by instrument. We show that, in some respects, our rhythm corpus behaves like natural language corpora, particularly in the sparsity of vocabulary. The same methods that detect word collocations allow us to quantify and rank idiomatic combinations of drum patterns. In other respects, our corpus has properties absent from language corpora, in particular, the high amount of repetition and strong mutual information rates between drum instruments. Our findings may be of direct interest to musicians and musicologists, and can inform the design of ground truth corpora and computational models of musical rhythm.
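    One common collocation statistic from NLP is pointwise mutual information, PMI(a, b) = log2(P(a, b) / (P(a) P(b))). It scores how much more often two tokens co-occur than they would by chance. The sketch below ranks adjacent pairs of pattern tokens by PMI. Two things are assumptions, not taken from the abstract: treating each bar-length drum pattern as a string token, and using adjacency as the co-occurrence relation. The abstract does not say which collocation measures the authors used. The token names are hypothetical.

```python
import math
from collections import Counter


def pmi_ranking(tokens, min_count=1):
    """Rank adjacent token pairs by pointwise mutual information (base 2).

    Unigram probabilities are estimated over all tokens and bigram probabilities
    over all adjacent pairs. Pairs seen fewer than min_count times are skipped,
    because PMI is known to overrate rare pairs.
    """
    if len(tokens) < 2:
        return []
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni, n_bi = len(tokens), len(tokens) - 1
    scores = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue
        p_ab = count / n_bi
        p_a, p_b = unigrams[a] / n_uni, unigrams[b] / n_uni
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

    On the sequence `["A", "B", "A", "B", "A", "B", "C", "D"]`, the single occurrence of ("C", "D") gets the highest PMI, which illustrates the rare-pair bias. With `min_count=2`, the repeated pair ("A", "B") ranks first.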

    Uncertainty-Aware Organ Classification for Surgical Data Science Applications in Laparoscopy

    Objective: Surgical data science is evolving into a research field that aims to observe everything occurring within and around the treatment process to provide situation-aware data-driven assistance. In the context of endoscopic video analysis, the accurate classification of organs in the field of view of the camera poses a technical challenge. Herein, we propose a new approach to anatomical structure classification and image tagging that features an intrinsic confidence measure for estimating its own performance with high reliability, and that can be applied to both RGB and multispectral imaging (MI) data. Methods: Organ recognition is performed using a superpixel classification strategy based on textural and reflectance information. Classification confidence is estimated by analyzing the dispersion of class probabilities. The proposed technology is assessed through a comprehensive in vivo study with seven pigs. Results: When applied to image tagging, mean accuracy in our experiments increased from 65% (RGB) and 80% (MI) to 90% (RGB) and 96% (MI) with the confidence measure. Conclusion: Results showed that the confidence measure had a significant influence on the classification accuracy, and MI data are better suited for anatomical structure labeling than RGB data. Significance: This work significantly enhances the state of the art in automatic labeling of endoscopic videos by introducing the use of the confidence metric, and by being the first study to use MI data for in vivo laparoscopic tissue classification. The data of our experiments will be released as the first in vivo MI dataset upon publication of this paper. Comment: 7 pages, 6 images, 2 tables
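    The abstract does not state which dispersion statistic is used, so the sketch below is only one plausible choice. It takes confidence as one minus the normalized Shannon entropy of a superpixel's class-probability vector. The value is 1 when all probability sits on one class and 0 when the distribution is uniform. The image-level rule is also a hypothetical simplification: keep only superpixels above a confidence threshold and tag the image with their majority label. The organ labels, threshold, and function names are likewise hypothetical.

```python
import math
from collections import Counter


def confidence(probs):
    """1 - normalized Shannon entropy of a class-probability vector (needs >= 2 classes)."""
    k = len(probs)
    if k < 2:
        raise ValueError("need at least two classes")
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(k)


def tag_image(superpixel_probs, labels, threshold=0.5):
    """Majority label among superpixels whose confidence reaches the threshold.

    Returns None when no superpixel is confident enough, i.e. the image is left untagged.
    """
    votes = Counter()
    for probs in superpixel_probs:
        if confidence(probs) >= threshold:
            best_class = max(range(len(probs)), key=probs.__getitem__)
            votes[labels[best_class]] += 1
    return votes.most_common(1)[0][0] if votes else None
```

    Discarding low-confidence superpixels trades coverage for accuracy. Raising the threshold leaves more images untagged but makes the tags that remain more reliable.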