Pop Music Highlighter: Marking the Emotion Keypoints
The goal of music highlight extraction is to extract a short, consecutive
segment of a piece of music that effectively represents the whole piece. In a
previous work, we introduced an attention-based convolutional recurrent neural
network that uses music emotion classification as a surrogate task for music
highlight extraction in Pop songs. The rationale behind that approach is that
the highlight of a song is usually its most emotional part.
This paper extends our previous work in the following two aspects. First,
methodology-wise we experiment with a new architecture that does not need any
recurrent layers, making the training process faster. Moreover, we compare a
late-fusion variant and an early-fusion variant to study which one better
exploits the attention mechanism. Second, we conduct and report an extensive
set of experiments comparing the proposed attention-based methods against a
heuristic energy-based method, a structural repetition-based method, and a few
other simple feature-based methods for this task. Due to the lack of
public-domain labeled data for highlight extraction, following our previous
work we use the RWC POP 100-song data set to evaluate how the detected
highlights overlap with any chorus sections of the songs. The experiments
demonstrate the effectiveness of our methods over competing methods. For
reproducibility, we open source the code and pre-trained model at
https://github.com/remyhuang/pop-music-highlighter/.
Comment: Transactions of the ISMIR vol. 1, no.
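As a rough illustration of the attention-based selection idea: once a network has assigned an attention score to each short chunk of a song, the highlight can be read off as the consecutive window of chunks with the highest total attention. The function and variable names below are illustrative sketches, not taken from the authors' released code.

```python
import numpy as np

def select_highlight(attention, window):
    """Pick the consecutive run of `window` chunks with the highest
    total attention score (a minimal sketch of highlight selection)."""
    # Sum attention over every length-`window` sliding window.
    scores = np.convolve(attention, np.ones(window), mode="valid")
    start = int(np.argmax(scores))
    return start, start + window

# Toy per-chunk attention scores peaking around chunks 4-5.
att = np.array([0.05, 0.05, 0.1, 0.1, 0.35, 0.25, 0.05, 0.05])
start, end = select_highlight(att, window=3)  # -> (3, 6)
```

In practice each chunk would correspond to a fixed duration of audio, so the returned indices map directly to a time range in the song.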
Music Boundary Detection using Convolutional Neural Networks: A comparative analysis of combined input features
The analysis of the structure of musical pieces is a task that remains a
challenge for Artificial Intelligence, especially in the field of Deep
Learning. It requires prior identification of structural boundaries of the
music pieces. This structural boundary analysis has recently been studied with
unsupervised methods and end-to-end techniques such as Convolutional
Neural Networks (CNNs) trained on human annotations, using Mel-Scaled
Log-magnitude Spectrograms (MLS), Self-Similarity Matrices (SSM), or
Self-Similarity Lag Matrices (SSLM) as inputs. Published studies, divided into
unsupervised and end-to-end methods, perform pre-processing in different ways,
using different distance metrics and audio characteristics, so a generalized
pre-processing method to compute model inputs is missing. The objective of this
work is to establish a general method
of pre-processing these inputs by comparing the inputs calculated from
different pooling strategies, distance metrics and audio characteristics, also
taking into account the computing time needed to obtain them. We also determine
the most effective combination of inputs to deliver to the CNN, so as to
extract the boundaries of the structure of the music pieces most efficiently.
With an adequate combination of input matrices and pooling strategies, we
obtain a measurement accuracy of 0.411, which outperforms the current result
obtained under the same conditions.
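To make the input-computation step concrete, here is a minimal pure-Python sketch of one pooling strategy (max-pooling frames in time) and one distance metric (Euclidean) for building a self-similarity input; the paper compares several alternatives for each, and these helper names are our own, not from the study.

```python
import math

def max_pool(frames, factor):
    """Reduce time resolution by max-pooling groups of `factor`
    feature frames (one of several possible pooling strategies)."""
    return [
        [max(f[d] for f in frames[i:i + factor]) for d in range(len(frames[0]))]
        for i in range(0, len(frames) - factor + 1, factor)
    ]

def ssm(frames):
    """Distance-based self-similarity matrix using Euclidean distance
    (low values = similar frames); other metrics are possible."""
    n = len(frames)
    return [
        [math.sqrt(sum((a - b) ** 2 for a, b in zip(frames[i], frames[j])))
         for j in range(n)]
        for i in range(n)
    ]

# Toy 2-D feature frames, pooled by a factor of 2, then compared pairwise.
pooled = max_pool([[1, 2], [3, 0], [5, 1], [2, 2]], 2)  # -> [[3, 2], [5, 2]]
matrix = ssm(pooled)
```

Both the pooling factor and the distance metric change the resulting matrix, which is exactly the design space the study explores alongside computing time.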
Barwise Music Structure Analysis with the Correlation Block-Matching Segmentation Algorithm
Music Structure Analysis (MSA) is a Music Information Retrieval task
consisting of representing a song in a simplified, organized manner by breaking
it down into sections typically corresponding to "chorus", "verse",
"solo", etc. In this work, we extend an MSA algorithm called the Correlation
Block-Matching (CBM) algorithm, introduced by Marmoret et al. (2020, 2022b).
The CBM algorithm is a dynamic programming algorithm that segments
self-similarity matrices, which are a standard description used in MSA and in
numerous other applications. In this work, self-similarity matrices are
computed from the feature representation of an audio signal and time is sampled
at the bar-scale. This study examines three different standard similarity
functions for the computation of self-similarity matrices. Results show that,
in optimal conditions, the proposed algorithm achieves a level of performance
which is competitive with supervised state-of-the-art methods while only
requiring knowledge of bar positions. In addition, the algorithm is made
open-source and is highly customizable.
Comment: 19 pages, 13 figures, 11 tables, 1 algorithm, published in
Transactions of the International Society for Music Information Retrieval
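A highly simplified stand-in for the dynamic-programming idea behind such segmentation: score each candidate segment by the summed self-similarity inside its diagonal block, and let the DP choose the boundary set with the best total. This toy assumes similarity values can be negative (e.g., correlation of zero-mean features) and is not the published CBM score.

```python
def segment(sim, max_len=4):
    """Segment a (bar-level) self-similarity matrix by dynamic
    programming; a toy sketch, not the CBM algorithm itself."""
    n = len(sim)
    best = [float("-inf")] * (n + 1)  # best[e]: best total score up to bar e
    best[0] = 0.0
    cut = [0] * (n + 1)               # cut[e]: start of the segment ending at e
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            # Block score: total self-similarity inside the candidate segment.
            block = sum(sim[i][j]
                        for i in range(start, end)
                        for j in range(start, end))
            if best[start] + block > best[end]:
                best[end] = best[start] + block
                cut[end] = start
    bounds, end = [], n
    while end > 0:  # backtrack the chosen boundaries
        bounds.append(end)
        end = cut[end]
    return sorted(bounds)

# Toy bar-level SSM with two homogeneous blocks of two bars each.
sim = [[1, 1, -1, -1],
       [1, 1, -1, -1],
       [-1, -1, 1, 1],
       [-1, -1, 1, 1]]
boundaries = segment(sim)  # -> [2, 4]
```

The full CBM algorithm uses a more elaborate block-matching score and operates on bar-synchronized features, but the DP skeleton above captures why only bar positions (not labels) are needed at segmentation time.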