AI and Tempo Estimation: A Review
The author's goal in this paper is to explore how artificial intelligence
(AI) has been utilised to inform our understanding of and ability to estimate
at scale a critical aspect of musical creativity: musical tempo. The central
importance of tempo to musical creativity can be seen in how it is used to
express specific emotions (Eerola and Vuoskoski 2013), suggest particular
musical styles (Li and Chan 2011), influence perception of expression (Webster
and Weir 2005) and mediate the urge to move one's body in time to the music
(Burger et al. 2014). Traditional tempo estimation methods typically detect
signal periodicities that reflect the underlying rhythmic structure of the
music, often using some form of autocorrelation of the amplitude envelope
(Lartillot and Toiviainen 2007). Recently, AI-based methods utilising
convolutional or recurrent neural networks (CNNs, RNNs) on spectral
representations of the audio signal have enjoyed significant improvements in
accuracy (Aarabi and Peeters 2022). Common AI-based techniques include those
based on probability (e.g., Bayesian approaches, hidden Markov models (HMM)),
classification and statistical learning (e.g., support vector machines (SVM)),
and artificial neural networks (ANNs) (e.g., self-organising maps (SOMs), CNNs,
RNNs, deep learning (DL)). The aim here is to provide an overview of some of
the more common AI-based tempo estimation algorithms and to shine a light on
notable benefits and potential drawbacks of each. Limitations of AI in this
field in general are also considered, as is the capacity for such methods to
account for idiosyncrasies inherent in tempo perception, i.e., how well
AI-based approaches are able to think and act like humans.
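A rough, hedged sketch of the traditional approach described above (periodicity detection via autocorrelation of an amplitude/onset envelope) is given below. It uses librosa; the hop length, tempo range, and peak-picking rule are illustrative assumptions, not any specific algorithm from the review.

```python
# Sketch: classical autocorrelation-based tempo estimation.
# Assumptions: librosa is available; parameter values are illustrative.
import numpy as np
import librosa

def estimate_tempo_autocorr(path, sr=22050, hop_length=512,
                            min_bpm=40.0, max_bpm=240.0):
    y, sr = librosa.load(path, sr=sr)
    # Onset-strength envelope stands in for the amplitude envelope's rhythmic content.
    onset_env = librosa.onset.onset_strength(y=y, sr=sr, hop_length=hop_length)
    # Autocorrelate the envelope to expose periodicities.
    ac = librosa.autocorrelate(onset_env)
    frames_per_sec = sr / hop_length
    lags = np.arange(1, len(ac))           # candidate lags in frames (skip lag 0)
    bpms = 60.0 * frames_per_sec / lags    # convert each lag to a tempo hypothesis
    valid = (bpms >= min_bpm) & (bpms <= max_bpm)
    best_lag = lags[valid][np.argmax(ac[1:][valid])]
    return 60.0 * frames_per_sec / best_lag
```

AI-based methods such as the CNN and RNN approaches mentioned above instead learn the mapping from a spectral representation of the audio to tempo directly from data.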
A computational study on outliers in world music
The comparative analysis of world music cultures has been the focus of several ethnomusicological studies in the last century. With advances in Music Information Retrieval and the increased accessibility of sound archives, large-scale analysis of world music with computational tools is feasible today. We investigate music similarity in a corpus of 8200 recordings of folk and traditional music from 137 countries around the world. In particular, we aim to identify music recordings that are most distinct compared to the rest of our corpus; we refer to these recordings as ‘outliers’. We use signal processing tools to extract music information from audio recordings, data mining to quantify similarity and detect outliers, and spatial statistics to account for geographical correlation. Our findings suggest that Botswana is the country with the most distinct recordings in the corpus, and China is the country with the most distinct recordings when spatial correlation is taken into account. Our analysis includes a comparison of the musical attributes and styles that contribute to the ‘uniqueness’ of each country's music.
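The outlier-detection step described above could be sketched, in very simplified form, as flagging recordings whose feature vectors lie unusually far from the corpus distribution. The example below uses Mahalanobis distance and omits the study's actual feature extraction and spatial-statistics correction; all names are hypothetical.

```python
# Simplified sketch: flag recordings whose feature vectors are far from the
# corpus distribution (Mahalanobis distance). Not the paper's exact pipeline.
import numpy as np

def mahalanobis_outliers(features, z=3.0):
    """features: (n_recordings, n_features) array; returns a boolean outlier mask."""
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    inv_cov = np.linalg.pinv(cov)                       # pseudo-inverse for stability
    diffs = features - mean
    d = np.sqrt(np.einsum('ij,jk,ik->i', diffs, inv_cov, diffs))
    return d > d.mean() + z * d.std()                   # simple z-score cut-off
```

A country could then be ranked by the fraction of its recordings flagged as outliers, which is roughly the notion of 'most distinct' used above.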
Final Research Report on Auto-Tagging of Music
The deliverable D4.7 concerns the work achieved by IRCAM up to M36 on the “auto-tagging of music”. The deliverable is a research report. The software libraries resulting from the research have been integrated into the Fincons/HearDis! Music Library Manager or are used by TU Berlin. The final software libraries are described in D4.5.
The research work on auto-tagging has concentrated on four aspects:
1) Further improving IRCAM’s machine-learning system ircamclass. This has been done by developing the new MASSS audio features and by integrating audio augmentation and audio segmentation into ircamclass. The system has then been applied to train the HearDis! “soft” features (Vocals-1, Vocals-2, Pop-Appeal, Intensity, Instrumentation, Timbre, Genre, Style). This is described in Part 3.
2) Developing two sets of “hard” features (i.e. features related to musical or musicological concepts) as specified by HearDis! (for integration into the Fincons/HearDis! Music Library Manager) and TU Berlin (as input for the prediction model of the GMBI attributes). Such features are either derived from previously estimated higher-level concepts (such as structure, key or chord succession) or obtained by developing new signal processing algorithms (such as HPSS, illustrated in the sketch after this list, or main melody estimation). This is described in Part 4.
3) Developing audio features to characterize the audio quality of a music track. The goal is to describe the quality of the audio independently of its apparent encoding. This is then used to estimate audio degradation or music decade. This is to be used to ensure that playlists contain tracks with similar audio quality. This is described in Part 5.
4) Developing innovative algorithms to extract specific audio features to improve music mixes. So far, innovative techniques (based on various Blind Audio Source Separation algorithms and Convolutional Neural Networks) have been developed for singing voice separation, singing voice segmentation, music structure boundary estimation, and DJ cue-region estimation. This is described in Part 6.
Funding: EC/H2020/688122/EU/Artist-to-Business-to-Business-to-Consumer Audio Branding System/ABC D
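As a generic illustration of the HPSS idea mentioned in item 2 (not IRCAM's algorithm from this report), median-filtering-based harmonic/percussive separation is available off the shelf in librosa; the input filename below is a placeholder.

```python
# Generic HPSS illustration using librosa's median-filtering implementation.
# This is not the algorithm developed in the report; "track.wav" is a placeholder.
import librosa
import soundfile as sf

y, sr = librosa.load("track.wav", sr=None)            # load at native sample rate
y_harmonic, y_percussive = librosa.effects.hpss(y)    # split into two components
sf.write("track_harmonic.wav", y_harmonic, sr)
sf.write("track_percussive.wav", y_percussive, sr)
```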
Scale transform in rhythmic similarity of music
As a special case of the Mellin transform, the scale transform has been applied in various signal processing areas to obtain a signal description that is invariant to scale changes. In this paper, the scale transform is applied to autocorrelation sequences derived from music signals. It is shown that two such sequences, when derived from similar rhythms at different tempi, differ mainly by a scaling factor. By using the scale transform, the proposed descriptors are robust to tempo changes and are especially suited to comparing pieces with different tempi but similar rhythm. As music with such characteristics is widely encountered in traditional forms of music, the performance of the descriptors is evaluated in a classification task on Greek traditional dances and Turkish traditional songs. On these datasets, accuracy improves by more than 20% over non-tempo-robust approaches, while on a dataset of Western music the achieved accuracy improves over previously presented results.
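A minimal sketch of the descriptor construction follows, under the assumption that the scale-transform magnitude is approximated by resampling the autocorrelation sequence on a logarithmic lag axis and taking an FFT of the weighted result; the grid size and interpolation choices are illustrative.

```python
# Sketch: approximate scale-transform magnitude of an autocorrelation sequence.
# Log-lag resampling + exp(tau/2) weighting + FFT; the magnitude is (approximately)
# invariant to a scaling of the lag axis, i.e. to tempo changes.
import numpy as np

def scale_transform_magnitude(autocorr, n_points=512):
    """autocorr: 1-D autocorrelation sequence indexed by lag >= 1."""
    n = len(autocorr)
    lags = np.arange(1, n + 1, dtype=float)
    # Resample on an exponentially spaced lag grid (logarithmic time warping).
    tau = np.linspace(np.log(lags[0]), np.log(lags[-1]), n_points)
    warped = np.interp(np.exp(tau), lags, autocorr)
    weighted = warped * np.exp(tau / 2.0)   # Mellin/scale-transform weighting
    return np.abs(np.fft.rfft(weighted))
```

Descriptors computed this way for two rhythmically similar pieces played at different tempi should then be close under a standard distance such as cosine or Euclidean distance.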