Deep Image Features in Music Information Retrieval
Applications of Convolutional Neural Networks (CNNs) to various problems have been the subject of a number of recent studies, ranging from image classification and object detection to scene parsing, segmentation of 3D volumetric images, and action recognition in videos. In this study, CNNs were applied to Music Information Retrieval (MIR), in particular to musical genre recognition. A model trained on ILSVRC-2012 (more than 1 million natural images) for image classification was reused to perform genre classification from spectrogram images. Harmonic and percussive separation was applied, because the balance of these components is characteristic of musical genre. In the final stage, various strategies for merging Support Vector Machines (SVMs) were evaluated on the GTZAN dataset, which is well known in the MIR community. Even though the model was trained on natural images, the results achieved in this study were close to the state of the art.
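The abstract does not give implementation details, but the harmonic/percussive separation step it mentions can be illustrated with a minimal numpy sketch of median-filtering HPSS (Fitzgerald's method): the magnitude spectrogram is median-filtered across time to emphasise harmonic content and across frequency to emphasise percussive content, and soft masks split each bin between the two. All function names and parameters here are illustrative assumptions, not the paper's code.

```python
import numpy as np

def stft(x, n_fft=1024, hop=256):
    """Short-time Fourier transform via a Hann-windowed sliding frame."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T  # shape: (freq_bins, n_frames)

def median_filter_1d(a, size, axis):
    """Running median along one axis (simple loop, fine for a sketch)."""
    pad = size // 2
    padded = np.pad(a, [(pad, pad) if ax == axis else (0, 0)
                        for ax in range(a.ndim)], mode='edge')
    return np.stack(
        [np.median(np.take(padded, range(i, i + size), axis=axis), axis=axis)
         for i in range(a.shape[axis])], axis=axis)

def hpss_masks(spec_mag, kernel=17):
    """Fitzgerald-style HPSS: median-filter magnitudes across time for the
    harmonic component and across frequency for the percussive component,
    then derive soft masks from the relative strength of each."""
    harm = median_filter_1d(spec_mag, kernel, axis=1)  # smooth over time
    perc = median_filter_1d(spec_mag, kernel, axis=0)  # smooth over frequency
    total = harm + perc + 1e-10
    return harm / total, perc / total
```

Each mask would be applied to the complex spectrogram before rendering the log-magnitude "image" fed to the pretrained CNN; kernel size and spectrogram parameters are placeholder values.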
Music Artist Classification with WaveNet Classifier for Raw Waveform Audio Data
Models for music artist classification have usually operated in the frequency
domain, where the input audio samples are processed by a spectral
transformation. The WaveNet architecture was originally designed for speech
and music generation. In this paper, we propose an end-to-end architecture in
the time domain for this task: a WaveNet classifier that models features
directly from the raw audio waveform. The WaveNet takes the waveform as
input, and several subsequent downsampling layers discriminate which artist
the input belongs to. In addition, the proposed method is applied to singer
identification. The best-performing model obtains an average F1 score of
0.854 on the Artist20 benchmark dataset, a significant improvement over
related works. To show the effectiveness of the feature learning in the
proposed method, the bottleneck layer of the model is visualized.
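The core building block implied here — a causal dilated convolution, whose receptive field doubles with each layer — can be sketched in plain numpy. This is a toy illustration of the mechanism, not the paper's architecture: layer counts, kernel size, random weights, and the pooled "bottleneck" feature are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def causal_dilated_conv(x, w, dilation):
    """1-D causal dilated convolution: y[t] = sum_i w[i] * x[t - i*dilation],
    left-padded with zeros so the output length equals the input length and
    no output sample depends on future input."""
    k = len(w)
    pad = dilation * (k - 1)
    xp = np.concatenate([np.zeros(pad), x])
    t = np.arange(len(x))
    return sum(w[i] * xp[pad - i * dilation + t] for i in range(k))

def wavenet_like_embed(x, n_layers=4, kernel=2):
    """Stack of dilated conv layers with dilations 1, 2, 4, ... and a tanh
    nonlinearity, followed by strided downsampling and a crude fixed-size
    'bottleneck' (mean/std pooling). A real classifier head would follow."""
    h = x
    for layer in range(n_layers):
        w = rng.standard_normal(kernel) / kernel  # untrained random weights
        h = np.tanh(causal_dilated_conv(h, w, dilation=2 ** layer))
    h = h[::4]  # downsample in time
    return np.array([h.mean(), h.std()])
```

With kernel 2 and doubling dilations, four layers give a receptive field of 16 samples; a full raw-audio classifier would stack many more layers and learn the weights.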
AI and Tempo Estimation: A Review
The author's goal in this paper is to explore how artificial intelligence
(AI) has been utilised to inform our understanding of and ability to estimate
at scale a critical aspect of musical creativity - musical tempo. The central
importance of tempo to musical creativity can be seen in how it is used to
express specific emotions (Eerola and Vuoskoski 2013), suggest particular
musical styles (Li and Chan 2011), influence perception of expression (Webster
and Weir 2005) and mediate the urge to move one's body in time to the music
(Burger et al. 2014). Traditional tempo estimation methods typically detect
signal periodicities that reflect the underlying rhythmic structure of the
music, often using some form of autocorrelation of the amplitude envelope
(Lartillot and Toiviainen 2007). Recently, AI-based methods utilising
convolutional or recurrent neural networks (CNNs, RNNs) on spectral
representations of the audio signal have enjoyed significant improvements in
accuracy (Aarabi and Peeters 2022). Common AI-based techniques include those
based on probability (e.g., Bayesian approaches, hidden Markov models (HMM)),
classification and statistical learning (e.g., support vector machines (SVM)),
and artificial neural networks (ANNs) (e.g., self-organising maps (SOMs), CNNs,
RNNs, deep learning (DL)). The aim here is to provide an overview of some of
the more common AI-based tempo estimation algorithms and to shine a light on
notable benefits and potential drawbacks of each. Limitations of AI in this
field in general are also considered, as is the capacity for such methods to
account for idiosyncrasies inherent in tempo perception, i.e., how well
AI-based approaches are able to think and act like humans.
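The traditional baseline the review describes — autocorrelation of the amplitude envelope (Lartillot and Toiviainen 2007) — can be sketched in a few lines of numpy. The hop size, envelope definition, and BPM search range below are illustrative assumptions, not taken from any of the cited systems.

```python
import numpy as np

def estimate_tempo(x, sr, min_bpm=60, max_bpm=200):
    """Estimate tempo by autocorrelating an onset-strength envelope and
    picking the strongest lag within the plausible beat-period range."""
    hop = 256
    # Amplitude envelope: RMS over short hops, then half-wave-rectified
    # first difference as a crude onset-strength signal.
    n = len(x) // hop
    env = np.sqrt(np.mean(x[:n * hop].reshape(n, hop) ** 2, axis=1))
    onset = np.maximum(np.diff(env), 0.0)
    onset = onset - onset.mean()
    # Autocorrelation evaluated only at lags corresponding to the BPM range.
    frame_rate = sr / hop
    lags = np.arange(int(frame_rate * 60 / max_bpm),
                     int(frame_rate * 60 / min_bpm) + 1)
    ac = np.array([np.dot(onset[:-lag], onset[lag:]) for lag in lags])
    best_lag = lags[np.argmax(ac)]
    return 60.0 * frame_rate / best_lag
```

Restricting the lag search to a BPM window is one simple guard against the octave (half/double tempo) errors that the review's AI-based methods also have to contend with.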