    Deep Image Features in Music Information Retrieval

    Applications of Convolutional Neural Networks (CNNs) to various problems have been the subject of a number of recent studies, ranging from image classification and object detection to scene parsing, segmentation of 3D volumetric images, and action recognition in videos. In this study, CNNs were applied to Music Information Retrieval (MIR), in particular to musical genre recognition. The model was trained on ILSVRC-2012 (more than 1 million natural images) to perform image classification and was reused to perform genre classification on spectrogram images. Harmonic and percussive separation was applied, because these components are characteristic of musical genre. At the final stage, various strategies for merging Support Vector Machines (SVMs) were evaluated on the GTZAN dataset, which is well known in the MIR community. Even though the model was trained on natural images, the results achieved in this study were close to the state-of-the-art.
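    A minimal sketch of the pipeline this abstract describes, assuming librosa for audio processing and torchvision for the pretrained CNN; the model choice (ResNet-18 as a stand-in for the paper's ILSVRC-2012-trained network), the mel-spectrogram settings, and the merging step are illustrative assumptions, not the paper's exact configuration:

        # Transfer learning: ImageNet CNN features from spectrogram images,
        # computed separately for the full, harmonic, and percussive signals.
        import librosa
        import numpy as np
        import torch
        import torchvision.models as models
        from sklearn.svm import SVC

        def spectrogram_image(y, sr):
            """Log-mel spectrogram scaled to [0, 1] and tiled to 3 channels."""
            S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
            S_db = librosa.power_to_db(S, ref=np.max)
            S_norm = (S_db - S_db.min()) / (S_db.max() - S_db.min() + 1e-8)
            img = torch.tensor(S_norm, dtype=torch.float32)[None, None]  # (1, 1, F, T)
            img = torch.nn.functional.interpolate(img, size=(224, 224), mode="bilinear")
            return img.repeat(1, 3, 1, 1)  # pretrained CNNs expect 3 channels

        # ImageNet-pretrained CNN with the classification head removed.
        cnn = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
        cnn.fc = torch.nn.Identity()
        cnn.eval()

        def deep_features(path):
            y, sr = librosa.load(path, sr=22050)
            y_harm, y_perc = librosa.effects.hpss(y)  # harmonic/percussive separation
            with torch.no_grad():
                return [cnn(spectrogram_image(s, sr)).squeeze(0).numpy()
                        for s in (y, y_harm, y_perc)]  # one 512-d vector per stream

        # One SVM per stream; their outputs can then be merged, e.g. by voting:
        # svms = [SVC().fit(X_stream, labels) for X_stream in per_stream_features]

    One SVM would be trained per signal stream, and the merging strategies evaluated in the paper would combine these per-stream decisions.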

    Music Artist Classification with WaveNet Classifier for Raw Waveform Audio Data

    Models for music artist classification usually operate in the frequency domain, where the input audio samples are processed by a spectral transformation. In this paper, we propose an end-to-end architecture in the time domain for this task, built on WaveNet, an architecture originally designed for speech and music generation. A WaveNet classifier is introduced that models features directly from the raw audio waveform: the network takes the waveform as input, and several subsequent downsampling layers discriminate which artist the input belongs to. In addition, the proposed method is applied to singer identification. The best-performing model obtains an average F1 score of 0.854 on the Artist20 benchmark dataset, a significant improvement over related work. To show the effectiveness of the feature learning in the proposed method, the bottleneck layer of the model is visualized.
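    As a rough illustration of such a time-domain classifier, the sketch below (PyTorch assumed) stacks gated, dilated convolutions in the WaveNet style and follows them with strided downsampling layers and a linear head; layer counts, channel widths, and the pooling scheme are illustrative guesses rather than the paper's exact architecture:

        import torch
        import torch.nn as nn

        class WaveNetClassifier(nn.Module):
            def __init__(self, n_artists=20, channels=64, n_blocks=6):
                super().__init__()
                self.input_conv = nn.Conv1d(1, channels, kernel_size=1)
                self.filters = nn.ModuleList()
                self.gates = nn.ModuleList()
                for i in range(n_blocks):
                    d = 2 ** i  # exponentially growing dilation widens the receptive field
                    self.filters.append(nn.Conv1d(channels, channels, 3, dilation=d, padding=d))
                    self.gates.append(nn.Conv1d(channels, channels, 3, dilation=d, padding=d))
                # Strided convolutions downsample the feature sequence over time.
                self.downsample = nn.Sequential(
                    nn.Conv1d(channels, channels, 4, stride=4), nn.ReLU(),
                    nn.Conv1d(channels, channels, 4, stride=4), nn.ReLU(),
                )
                self.head = nn.Linear(channels, n_artists)

            def forward(self, wav):  # wav: (batch, 1, samples)
                x = self.input_conv(wav)
                for f, g in zip(self.filters, self.gates):
                    x = x + torch.tanh(f(x)) * torch.sigmoid(g(x))  # gated residual block
                x = self.downsample(x)
                x = x.mean(dim=-1)  # global average pooling over time
                return self.head(x)

        model = WaveNetClassifier()
        logits = model(torch.randn(2, 1, 16000))  # two 1-second clips at 16 kHz
        print(logits.shape)  # torch.Size([2, 20])

    Here n_artists=20 matches the Artist20 dataset; the gated tanh/sigmoid residual blocks follow the original WaveNet design, while the strided-convolution downsampling reflects the paper's description of layers that progressively reduce temporal resolution before classification.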

    AI and Tempo Estimation: A Review

    The author's goal in this paper is to explore how artificial intelligence (AI) has been utilised to inform our understanding of, and ability to estimate at scale, a critical aspect of musical creativity - musical tempo. The central importance of tempo to musical creativity can be seen in how it is used to express specific emotions (Eerola and Vuoskoski 2013), suggest particular musical styles (Li and Chan 2011), influence perception of expression (Webster and Weir 2005) and mediate the urge to move one's body in time to the music (Burger et al. 2014). Traditional tempo estimation methods typically detect signal periodicities that reflect the underlying rhythmic structure of the music, often using some form of autocorrelation of the amplitude envelope (Lartillot and Toiviainen 2007). Recently, AI-based methods utilising convolutional or recurrent neural networks (CNNs, RNNs) on spectral representations of the audio signal have enjoyed significant improvements in accuracy (Aarabi and Peeters 2022). Common AI-based techniques include those based on probability (e.g., Bayesian approaches, hidden Markov models (HMM)), classification and statistical learning (e.g., support vector machines (SVM)), and artificial neural networks (ANNs) (e.g., self-organising maps (SOMs), CNNs, RNNs, deep learning (DL)). The aim here is to provide an overview of some of the more common AI-based tempo estimation algorithms and to shine a light on notable benefits and potential drawbacks of each. Limitations of AI in this field in general are also considered, as is the capacity for such methods to account for idiosyncrasies inherent in tempo perception, i.e., how well AI-based approaches are able to think and act like humans.
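    To make the traditional baseline concrete, the sketch below estimates tempo by autocorrelating the onset-strength envelope and picking the strongest periodicity in a plausible BPM range, in the spirit of the autocorrelation approach cited above (Lartillot and Toiviainen 2007); it assumes librosa, and the BPM search range is an illustrative choice:

        import librosa
        import numpy as np

        def estimate_tempo(path, bpm_min=40.0, bpm_max=200.0):
            y, sr = librosa.load(path, sr=22050)
            # Onset-strength envelope: a smoothed measure of energy change over time.
            env = librosa.onset.onset_strength(y=y, sr=sr)
            env = env - env.mean()
            # Autocorrelation peaks appear at lags matching the rhythmic period.
            ac = np.correlate(env, env, mode="full")[len(env) - 1:]
            fps = sr / 512  # onset_strength uses a 512-sample hop by default
            lags = np.arange(1, len(ac))
            bpms = 60.0 * fps / lags  # convert candidate lags (frames) to BPM
            valid = (bpms >= bpm_min) & (bpms <= bpm_max)
            best_lag = lags[valid][np.argmax(ac[1:][valid])]
            return 60.0 * fps / best_lag

        # print(estimate_tempo("track.wav"))  # e.g. ~120 for a 120 BPM track

    The AI-based methods surveyed in the paper replace this hand-crafted periodicity analysis with learned models (CNNs, RNNs, etc.) operating on spectral representations.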