79 research outputs found
A regularity-constrained Viterbi algorithm and its application to the structural segmentation of songs
This paper presents a general approach for the structural segmentation of songs. It is formalized as a cost optimization problem that combines properties of the musical content and a prior regularity assumption on the segment length. A versatile implementation of this approach is proposed by means of a Viterbi algorithm, and the design of the costs is discussed. We then present two systems derived from this approach, based on acoustic and symbolic features respectively. The advantages of the regularity constraint are evaluated on a database of 100 popular songs by showing a significant improvement of the segmentation performance in terms of F-measure.
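The cost formulation described in this abstract can be illustrated with a minimal dynamic-programming sketch. It is not the authors' system: the content cost (within-segment variance of a scalar feature), the regularity cost (relative deviation from a target length), and the names `segment`, `target_len`, and `reg_weight` are all assumptions chosen for illustration.

```python
def segment(features, target_len, reg_weight=1.0, max_len=None):
    """Return boundary indices minimizing content cost + regularity cost.

    `features` is a list of scalar values, one per time unit (e.g. per beat).
    Both cost functions below are hypothetical stand-ins, not the paper's costs.
    """
    n = len(features)
    max_len = max_len or n

    def content_cost(start, end):
        # Assumed content cost: a homogeneous segment has low variance.
        seg = features[start:end]
        mean = sum(seg) / len(seg)
        return sum((x - mean) ** 2 for x in seg)

    def regularity_cost(length):
        # Assumed regularity prior: penalize deviation from the target length.
        return reg_weight * abs(length - target_len) / target_len

    # best[e] is the minimal cost of segmenting features[0:e];
    # back[e] is the start of the last segment in that optimal solution.
    best = [float("inf")] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            cost = (best[start] + content_cost(start, end)
                    + regularity_cost(end - start))
            if cost < best[end]:
                best[end] = cost
                back[end] = start

    # Backtrack from the end of the song to recover the boundaries.
    bounds = [n]
    while bounds[-1] > 0:
        bounds.append(back[bounds[-1]])
    return bounds[::-1]
```

With three homogeneous blocks of four units each and `target_len=4`, the only zero-cost solution places boundaries at 0, 4, 8, and 12; raising `reg_weight` pulls the result toward equal-length segments even when the content cost alone would prefer otherwise.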
Supplementary material to the article: Estimating the structural segmentation of popular music pieces under regularity constraints
This document gathers descriptions of the structural segmentation systems considered in the IEEE/ACM TASLP paper by the same authors
A music structure inference algorithm based on symbolic data analysis
The present document describes a music structure inference algorithm submitted to the MIREX 2011 evaluation campaign (structural segmentation task). It consists of three stages: symbolic feature extraction, structural segment boundary estimation, and structural segment clustering. We consider as inputs chord estimations from the system of Ueda et al., expressed at the 2-beat scale. Beats and downbeats are estimated by the system of Davies et al. The structural segmentation step uses a regularity-constrained Viterbi approach. It assumes that the structure of pop songs is generally based on a few typical segments, whose sizes are called structural pulsation periods. The segments are then clustered according to their similarity, through the minimization of an adaptive model selection criterion.
Music Boundary Detection using Convolutional Neural Networks: A comparative analysis of combined input features
The analysis of the structure of musical pieces is a task that remains a
challenge for Artificial Intelligence, especially in the field of Deep
Learning. It requires prior identification of structural boundaries of the
music pieces. This structural boundary analysis has recently been studied with
unsupervised methods and end-to-end techniques such as Convolutional
Neural Networks (CNN) using Mel-Scaled Log-magnitude Spectrogram features
(MLS), Self-Similarity Matrices (SSM) or Self-Similarity Lag Matrices (SSLM) as
inputs and trained with human annotations. Several studies have been published,
divided into unsupervised and end-to-end methods, in which
pre-processing is done in different ways, using different distance metrics and
audio characteristics, so a generalized pre-processing method to compute model
inputs is missing. The objective of this work is to establish a general method
of pre-processing these inputs by comparing the inputs calculated from
different pooling strategies, distance metrics and audio characteristics, also
taking into account the computing time to obtain them. We also establish the
most effective combination of inputs to be delivered to the CNN in order to
establish the most efficient way to extract the limits of the structure of the
music pieces. With an adequate combination of input matrices and pooling
strategies we obtain a measurement accuracy of 0.411 that outperforms the
current one obtained under the same conditions
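To make the input representations named in this abstract concrete, the following sketch computes a pooled feature sequence, an SSM, and a simplified SSLM from a list of feature frames. It is an illustration under stated assumptions, not the paper's pre-processing: cosine distance and max-pooling are only one of the combinations the abstract says were compared, and the function names and the zero-padding of undefined lags are hypothetical choices.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two feature vectors (1.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 1.0
    return 1.0 - dot / (nu * nv)


def max_pool(frames, factor):
    """Max-pool consecutive frames along the time axis by `factor`."""
    pooled = []
    for i in range(0, len(frames) - factor + 1, factor):
        block = frames[i:i + factor]
        pooled.append([max(col) for col in zip(*block)])
    return pooled


def self_similarity_matrix(frames, distance=cosine_distance):
    """Entry (i, j) is the similarity between frame i and frame j."""
    n = len(frames)
    return [[1.0 - distance(frames[i], frames[j]) for j in range(n)]
            for i in range(n)]


def self_similarity_lag_matrix(frames, max_lag, distance=cosine_distance):
    """Entry (i, l-1) is the similarity between frame i and frame i-l.

    Lags that reach before the first frame are filled with 0.0 (an assumption).
    """
    sslm = []
    for i in range(len(frames)):
        row = []
        for lag in range(1, max_lag + 1):
            j = i - lag
            row.append(1.0 - distance(frames[i], frames[j]) if j >= 0 else 0.0)
        sslm.append(row)
    return sslm
```

Swapping `cosine_distance` for another metric, or changing the pooling factor, changes only the arguments, which is the kind of comparison the abstract describes.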
Design of Soft Viterbi Algorithm Decoder Enhanced With Non-Transmittable Codewords for Storage Media
The Viterbi Algorithm Decoder Enhanced with Non-transmittable Codewords (NTCs) is one of
the best decoding algorithms, and it effectively improves forward error correction
performance. However, the Viterbi decoder enhanced with NTCs has not yet been designed to
work in storage media devices. Currently, the Reed Solomon (RS) algorithm is nearly
always the algorithm used to correct errors in storage media. However,
recent studies show that data reliability in storage media remains low
while the demand for storage media increases drastically. This study
proposes a design of the Soft Viterbi Algorithm decoder enhanced with
Non-transmittable Codewords (SVAD-NTCs) to be used in storage media for error
correction. Matlab simulation was used in this design to investigate the
behavior and effectiveness of SVAD-NTCs in correcting errors in data retrieved
from storage media. Sample data of one million bits were randomly generated,
Additive White Gaussian Noise (AWGN) was used as the data distortion model, and
Binary Phase-Shift Keying (BPSK) was applied as the modulation scheme.
Results show that SVAD-NTC performance improves as the number of NTCs
increases, but beyond 6 NTCs there is no significant change, and that the SVAD-NTCs design
drastically reduces the total residual error from 216,878 for Reed Solomon to
23,900.
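The simulation setup described in this abstract (random bits, BPSK modulation, AWGN distortion) can be sketched as follows. This is only the uncoded channel with hard decisions, written in Python rather than Matlab; it contains no Viterbi decoder, no NTCs, and no Reed Solomon code. The function names and the Eb/N0 parameterization are assumptions made for illustration.

```python
import math
import random


def simulate_bpsk_awgn(num_bits, ebn0_db, seed=0):
    """Count bit errors for uncoded BPSK over an AWGN channel (hard decisions)."""
    rng = random.Random(seed)
    ebn0 = 10 ** (ebn0_db / 10)
    # With unit-energy symbols, the noise variance is N0/2 = 1 / (2 * Eb/N0).
    sigma = math.sqrt(1 / (2 * ebn0))
    errors = 0
    for _ in range(num_bits):
        bit = rng.getrandbits(1)
        symbol = 1.0 if bit else -1.0               # BPSK mapping: 0 -> -1, 1 -> +1
        received = symbol + rng.gauss(0.0, sigma)   # add Gaussian noise
        decided = 1 if received > 0 else 0          # threshold detector
        errors += decided != bit
    return errors


def theoretical_ber(ebn0_db):
    """Closed-form bit error rate of uncoded BPSK over AWGN."""
    ebn0 = 10 ** (ebn0_db / 10)
    return 0.5 * math.erfc(math.sqrt(ebn0))
```

Comparing the simulated error count against `theoretical_ber` is a common sanity check before a decoder is placed in the chain; the residual error totals quoted in the abstract come from the authors' coded system and are not reproduced by this sketch.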
Semiotic Description of Music Structure: an Introduction to the Quaero/Metiss Structural Annotations
Interest has been steadily growing in semantic audio and music information retrieval for the description of music structure, i.e., the global organization of music pieces in terms of large-scale structural units. This article presents a detailed methodology for the semiotic description of music structure, based on concepts and criteria which are formulated as generically as possible. We sum up the essential principles and practices developed during an annotation effort deployed by our research group (Metiss) on audio data in the context of the Quaero project, which has led to the public release of over 380 annotations of pop songs from three different data sets. The paper also includes a few case studies and a concise statistical overview of the annotated data.
Self-Similarity-Based and Novelty-based loss for music structure analysis
Music Structure Analysis (MSA) is the task aiming at identifying musical
segments that compose a music track and possibly label them based on their
similarity. In this paper we propose a supervised approach for the task of
music boundary detection. In our approach we simultaneously learn features and
convolution kernels. For this we jointly optimize (i) a loss based on the
Self-Similarity-Matrix (SSM) obtained with the learned features, denoted by
SSM-loss, and (ii) a loss based on the novelty score obtained by applying the
learned kernels to the estimated SSM, denoted by novelty-loss. We also
demonstrate that relative feature learning, through self-attention, is
beneficial for the task of MSA. Finally, we compare the performances of our
approach to previously proposed approaches on the standard RWC-Pop, and various
subsets of SALAMI
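The novelty score mentioned in this abstract is obtained by sliding a kernel along the main diagonal of an SSM. The sketch below uses a fixed checkerboard kernel, a classic hand-designed choice, in place of the learned kernels of the paper; nothing here is trained, and the function names are hypothetical.

```python
def checkerboard_kernel(half_size):
    """+1 on the two diagonal quadrants, -1 on the two off-diagonal quadrants."""
    size = 2 * half_size
    return [[1.0 if (r < half_size) == (c < half_size) else -1.0
             for c in range(size)]
            for r in range(size)]


def novelty_curve(ssm, half_size):
    """Correlate the kernel with the SSM along its main diagonal.

    novelty[t] scores a boundary between frame t-1 and frame t; positions
    where the kernel does not fit entirely inside the SSM are left at 0.0.
    """
    n = len(ssm)
    kernel = checkerboard_kernel(half_size)
    novelty = [0.0] * n
    for t in range(half_size, n - half_size + 1):
        total = 0.0
        for r in range(2 * half_size):
            for c in range(2 * half_size):
                total += kernel[r][c] * ssm[t - half_size + r][t - half_size + c]
        novelty[t] = total
    return novelty
```

On an SSM made of two homogeneous blocks, the curve peaks at the frame where the second block starts, which is why peaks of such a curve are commonly read as boundary candidates.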
Methodological and musicological investigation of the System & Contrast model for musical form description
The semiotic description of music structure aims at representing the high-level organization of music pieces in a concise, generic and reproducible way as a low-rate stream of arbitrary symbols from a limited alphabet, which results in a sequence of "semiotic units". In this context, the purpose of the System & Contrast model is to address the internal organization of the semiotic units. In this report, the System & Contrast model is approached from different angles in relation to varied disciplines: cognitive psychology, music analysis and information theory. After establishing a number of links between the System & Contrast model and other approaches to music structure, the model is illustrated on studio-based popular music pieces, as well as on music from the classical Viennese period.
A Cross-Cultural Analysis of Music Structure
Music signal analysis is a research field concerning the extraction of meaningful information
from musical audio signals. This thesis analyses the music signals from the note-level
to the song-level in a bottom-up manner and situates the research in two Music information
retrieval (MIR) problems: audio onset detection (AOD) and music structural
segmentation (MSS).
Most MIR tools are developed for and evaluated on Western music with specific musical
knowledge encoded. This thesis approaches the investigated tasks from a cross-cultural
perspective by developing audio features and algorithms applicable for both Western and
non-Western genres. Two Chinese Jingju databases are collected to facilitate respectively
the AOD and MSS tasks investigated.
New features and algorithms for AOD are presented relying on fusion techniques. We
show that fusion can significantly improve the performance of the constituent baseline
AOD algorithms. A large-scale parameter analysis is carried out to identify the relations
between system configurations and the musical properties of different music types.
Novel audio features are developed to summarise music timbre, harmony and rhythm for
its structural description. The new features serve as effective alternatives to commonly
used ones, showing comparable performance on existing datasets, and surpass them on
the Jingju dataset. A new segmentation algorithm is presented which effectively captures
the structural characteristics of Jingju. By evaluating the presented audio features and
different segmentation algorithms incorporating different structural principles for the
investigated music types, this thesis also identifies the underlying relations between audio
features, segmentation methods and music genres in the scenario of music structural
analysis.
China Scholarship Council,
EPSRC C4DM Travel Funding,
EPSRC Fusing Semantic and Audio Technologies for Intelligent Music Production and
Consumption (EP/L019981/1),
EPSRC Platform Grant on Digital Music (EP/K009559/1),
European Research Council project CompMusic, International Society for Music Information Retrieval Student Grant,
QMUL Postgraduate Research Fund,
QMUL-BUPT Joint Programme Funding,
Women in Music Information Retrieval Grant
Proceedings of the 7th Sound and Music Computing Conference
Proceedings of the SMC2010 - 7th Sound and Music Computing Conference, July 21st - July 24th 2010