
    Invariances and Data Augmentation for Supervised Music Transcription

    This paper explores a variety of models for frame-based music transcription, with an emphasis on the methods needed to reach state-of-the-art on human recordings. The translation-invariant network discussed in this paper, which combines a traditional filterbank with a convolutional neural network, was the top-performing model in the 2017 MIREX Multiple Fundamental Frequency Estimation evaluation. This class of models shares parameters in the log-frequency domain, which exploits the frequency invariance of music to reduce the number of model parameters and avoid overfitting to the training data. All models in this paper were trained with supervision by labeled data from the MusicNet dataset, augmented by random label-preserving pitch-shift transformations.
    Comment: 6 page
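
The parameter sharing mentioned in this abstract can be illustrated with a small sketch. It is not the paper's model: it only shows that when one small filter is reused at every log-frequency bin, shifting the input in pitch shifts the filter response by the same number of bins. The helper names, the 48-bin frame, the 5-tap kernel, and the choice of 12 bins per octave are all assumptions made for illustration.

```python
import numpy as np

def shared_filter_response(frame, kernel):
    # Slide the same kernel across every log-frequency bin ("valid" positions only).
    return np.correlate(frame, kernel, mode="valid")

def shift_bins(frame, n):
    # Move the frame up by n bins (n >= 0), filling the vacated low bins with zeros.
    out = np.zeros_like(frame)
    out[n:] = frame[:len(frame) - n]
    return out

rng = np.random.default_rng(0)

# Hypothetical log-frequency frame with energy only in bins 10..19,
# so that shifting it upward does not push any content off the top edge.
frame = np.zeros(48)
frame[10:20] = rng.random(10)

kernel = rng.random(5)   # one set of shared weights (illustrative values)
shift = 12               # one octave, assuming 12 bins per octave

base = shared_filter_response(frame, kernel)
shifted = shared_filter_response(shift_bins(frame, shift), kernel)

# The response to the pitch-shifted frame is the original response moved by `shift` bins.
equivariant = np.allclose(shifted[shift:], base[:len(base) - shift])
```

Because the response follows the pitch shift, a single kernel serves every pitch, which is one way to read the abstract's claim about reducing the number of parameters.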

    Improving Trust in Deep Neural Networks with Nearest Neighbors

    Deep neural networks are used increasingly for perception and decision-making in UAVs. For example, they can be used to recognize objects from images and decide what actions the vehicle should take. While deep neural networks can perform very well at complex tasks, their decisions may be unintuitive to a human operator. When a human disagrees with a neural network prediction, due to the black box nature of deep neural networks, it can be unclear whether the system knows something the human does not or whether the system is malfunctioning. This uncertainty is problematic when it comes to ensuring safety. As a result, it is important to develop technologies for explaining neural network decisions for trust and safety. This paper explores a modification to the deep neural network classification layer to produce both a predicted label and an explanation to support its prediction. Specifically, at test time, we replace the final output layer of the neural network classifier by a k-nearest neighbor classifier. The nearest neighbor classifier produces 1) a predicted label through voting and 2) the nearest neighbors involved in the prediction, which represent the most similar examples from the training dataset. Because prediction and explanation are derived from the same underlying process, this approach guarantees that the explanations are always relevant to the predictions. We demonstrate the approach on a convolutional neural network for a UAV image classification task. We perform experiments using a forest trail image dataset and show empirically that the hybrid classifier can produce intuitive explanations without loss of predictive performance compared to the original neural network. We also show how the approach can be used to help identify potential issues in the network and training process.
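
The test-time procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes feature vectors have already been extracted from the layer before the network's output, and the function name, the Euclidean distance, and the toy data are all hypothetical.

```python
import numpy as np

def knn_predict_with_explanation(train_feats, train_labels, query_feat, k=3):
    """Return (predicted_label, neighbor_indices) for one query feature vector."""
    # Euclidean distance from the query to every training feature vector.
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    # Indices of the k closest training examples, nearest first.
    neighbor_idx = np.argsort(dists, kind="stable")[:k]
    # Majority vote over the neighbors' labels (labels are non-negative integers).
    votes = np.bincount(train_labels[neighbor_idx])
    predicted = int(np.argmax(votes))
    return predicted, neighbor_idx

# Toy stand-ins for network features of five training images (illustrative only).
train_feats = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
train_labels = np.array([0, 0, 1, 1, 1])
query_feat = np.array([5.2, 5.1])

label, neighbors = knn_predict_with_explanation(train_feats, train_labels, query_feat, k=3)
```

The returned indices point at the training examples that produced the vote, so the same lookup yields both the prediction and the examples offered as its explanation.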