
    Deep Learning and Music Adversaries

    An adversary is essentially an algorithm intent on making a classification system perform in some particular way given an input, e.g., increase the probability of a false negative. Recent work builds adversaries for deep learning systems applied to image object recognition, which exploit the parameters of the system to find the minimal perturbation of the input image such that the network misclassifies it with high confidence. We adapt this approach to construct and deploy an adversary of deep learning systems applied to music content analysis. In our case, however, the input to the systems is magnitude spectral frames, which requires special care in order to produce valid input audio signals from network-derived perturbations. For two different train-test partitionings of two benchmark datasets, and two different deep architectures, we find that this adversary is very effective in defeating the resulting systems. We find, however, that the convolutional networks are more robust than systems based on a majority vote over individually classified audio frames. Furthermore, we integrate the adversary into the training of new deep systems, but do not find that this improves their resilience against the same adversary.
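    The attack described here can be pictured with a short sketch: ascend the loss gradient with respect to the magnitude spectral frames while keeping them valid (nonnegative), then invert back to audio. This is a minimal illustration assuming a generic PyTorch frame classifier; the model, shapes, and step sizes are placeholders, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def perturb_frames(model, mag_frames, true_label, eps=0.01, steps=40):
    """Iteratively nudge magnitude frames so the model stops predicting true_label.

    mag_frames: (n_frames, n_bins) nonnegative spectral magnitudes;
    model: maps a batch of frames to class logits.
    """
    x = mag_frames.clone().detach().requires_grad_(True)
    target = torch.full((x.shape[0],), true_label, dtype=torch.long)
    for _ in range(steps):
        loss = F.cross_entropy(model(x), target)
        (grad,) = torch.autograd.grad(loss, x)
        # Gradient-ascent step on the loss, then project back to valid
        # (nonnegative) magnitudes so an audio signal can still be recovered.
        x = (x + eps * grad.sign()).clamp_min(0.0).detach().requires_grad_(True)
    return x.detach()
```

    To obtain a playable adversarial signal, the perturbed magnitudes would then be paired with phases (for instance, those of the original recording) and inverted with an ISTFT, which is the "special care" step the abstract alludes to.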

    The scientific evaluation of music content analysis systems: Valid empirical foundations for future real-world impact

    We discuss the problem of music content analysis within the formal framework of experimental design.

    Dataset artefacts in anti-spoofing systems: a case study on the ASVspoof 2017 benchmark

    The Automatic Speaker Verification Spoofing and Countermeasures Challenges motivate research in protecting speech biometric systems against a variety of different access attacks. The 2017 edition focused on replay spoofing attacks, and involved participants building and training systems on a provided dataset (ASVspoof 2017). More than 60 research papers have so far been published with this dataset, but none have sought to answer why countermeasures appear successful in detecting spoofing attacks. This article shows how artefacts inherent to the dataset may be contributing to the apparent success of published systems. We first inspect the ASVspoof 2017 dataset and summarize the various artefacts present in it. Second, we demonstrate how countermeasure models can exploit these artefacts to appear successful on this dataset. Third, for reliable and robust performance estimates on this dataset, we propose discarding nonspeech segments and the silence before and after the speech utterance during training and inference. We create speech start- and endpoint annotations for the dataset, and demonstrate how using them helps countermeasure models become less vulnerable to manipulation via the artefacts found in the dataset. Finally, we provide several new benchmark results for both frame-level and utterance-level models that can serve as new baselines on this dataset.
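    A minimal sketch of the proposed mitigation, with librosa's energy-based trim standing in for the paper's hand-made start/end annotations (the sampling rate and threshold are placeholder choices):

```python
import librosa

def load_trimmed(path, sr=16000, top_db=30):
    """Load an utterance and strip leading/trailing low-energy regions."""
    y, _ = librosa.load(path, sr=sr)
    # Keep only the span whose energy is within top_db of the peak,
    # so leading/trailing silence cannot be exploited as a shortcut.
    y_trimmed, _ = librosa.effects.trim(y, top_db=top_db)
    return y_trimmed  # feed this, not y, to the countermeasure front end
```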

    ¿El Caballo Viejo? Latin Genre Recognition with Deep Learning and Spectral Periodicity

    The "winning" system in the 2013 MIREX Latin Genre Classification Task was a deep neural network trained with simple features. An explanation for its winning performance has yet to be found. In previous work, we built similar systems using the BALLROOM music dataset, and found their performances to be greatly affected by slightly changing the tempo of the music of a test recording. In the MIREX task, however, systems are trained and tested using the Latin Music Dataset (LMD), which is 4.5 times larger than BALLROOM, and which does not seem to show as strong a relationship between tempo and label as BALLROOM. In this paper, we reproduce the "winning" deep learning system using LMD, and measure the effects of time dilation on its performance. We find that tempo changes of at most ±6% can greatly diminish or improve its performance. Interpreted in light of the low-level nature of the input features, this supports the conclusion that the system is exploiting some low-level absolute time characteristics to reproduce the ground truth of LMD.
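    The time-dilation test can be sketched as follows, with librosa's time stretching standing in for whatever resampling the paper used, and a placeholder classify function; the ±6% range matches the abstract:

```python
import librosa

def tempo_sensitivity(classify, path, sr=22050,
                      dilations=(-0.06, -0.03, 0.0, 0.03, 0.06)):
    """classify: any function mapping an audio array to a genre label."""
    y, _ = librosa.load(path, sr=sr)
    labels = {}
    for d in dilations:
        # rate > 1 shortens the recording (faster tempo), rate < 1 slows it
        y_d = librosa.effects.time_stretch(y, rate=1.0 + d)
        labels[f"{d:+.0%}"] = classify(y_d)
    return labels  # labels that flip across dilations indicate tempo reliance
```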

    Working Toward Computer-Augmented Music Traditions

    We discuss our work in modelling and generating music transcriptions using deep recurrent neural networks. In contrast to similar work, we focus on creating a rich evaluation methodology that seeks to address questions related to what a model has learned about the music, how useful it is for music practices, and its broader implications for music tradition. We engage with a specific homophonic music practice (session music), and present several examples of using our models for music composition in and out of the conventions of that idiom. We are currently exploring how these computer models can contribute to the tradition by engaging with its practitioners.
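    A minimal sketch of this kind of model: a token-level LSTM language model over music transcriptions (e.g., ABC-like tokens), trained to predict the next token and sampled to generate new tunes. The vocabulary, class names, and hyperparameters are placeholders, not the authors' system.

```python
import torch
import torch.nn as nn

class TranscriptionLM(nn.Module):
    """Next-token language model over tokenized transcriptions."""
    def __init__(self, vocab_size, emb=128, hidden=256, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.lstm = nn.LSTM(emb, hidden, layers, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens, state=None):
        h, state = self.lstm(self.embed(tokens), state)
        return self.out(h), state  # next-token logits at each position

@torch.no_grad()
def sample(model, start_id, length, temperature=1.0):
    """Generate a transcription by sampling one token at a time."""
    tok = torch.tensor([[start_id]])
    state, out = None, [start_id]
    for _ in range(length):
        logits, state = model(tok, state)
        probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
        tok = torch.multinomial(probs, 1).view(1, 1)
        out.append(tok.item())
    return out  # token ids, to be decoded back into a transcription
```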