35 research outputs found

Self-Similarity-Based and Novelty-based loss for music structure analysis

Author: Peeters Geoffroy
Publication date: 05/09/2023

Music Structure Analysis (MSA) is the task of identifying the musical segments that compose a music track and, possibly, labeling them based on their similarity. In this paper we propose a supervised approach for the task of music boundary detection. In our approach we simultaneously learn features and convolution kernels. For this we jointly optimize (1) a loss based on the Self-Similarity Matrix (SSM) obtained with the learned features, denoted SSM-loss, and (2) a loss based on the novelty score obtained by applying the learned kernels to the estimated SSM, denoted novelty-loss. We also demonstrate that relative feature learning, through self-attention, is beneficial for the task of MSA. Finally, we compare the performance of our approach to previously proposed approaches on the standard RWC-Pop and various subsets of SALAMI.
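
To make the SSM and novelty-score terminology concrete, here is a minimal sketch of the classic fixed-kernel baseline: a cosine self-similarity matrix followed by a checkerboard kernel slid along its diagonal. It is not the method of the paper, which learns both the features and the kernels; the function names `cosine_ssm` and `checkerboard_novelty`, the cosine similarity, and the toy two-section input are assumptions made for illustration.

```python
import math


def cosine_ssm(features):
    """Return the cosine self-similarity matrix of a list of feature vectors."""
    # A zero vector gets norm 1.0 so that the division below stays defined.
    norms = [math.sqrt(sum(x * x for x in f)) or 1.0 for f in features]
    n = len(features)
    return [
        [
            sum(a * b for a, b in zip(features[i], features[j]))
            / (norms[i] * norms[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


def checkerboard_novelty(ssm, half_width):
    """Slide a +1/-1 checkerboard kernel along the main diagonal of the SSM.

    Cells comparing frames on the same side of time t count positively,
    cells comparing frames across t count negatively. Cells that fall
    outside the matrix are ignored (zero padding).
    """
    n = len(ssm)
    novelty = []
    for t in range(n):
        score = 0.0
        for di in range(-half_width, half_width):
            for dj in range(-half_width, half_width):
                i, j = t + di, t + dj
                if 0 <= i < n and 0 <= j < n:
                    sign = 1.0 if (di < 0) == (dj < 0) else -1.0
                    score += sign * ssm[i][j]
        novelty.append(score)
    return novelty


# Toy input (assumption): four frames of one "section", then four of another.
toy_features = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
toy_novelty = checkerboard_novelty(cosine_ssm(toy_features), half_width=2)
toy_boundary = toy_novelty.index(max(toy_novelty))
```

On the toy input the novelty curve peaks at the frame where the second section starts. The zero padding also produces a smaller spurious value at the very first frame, which is one reason practical systems window or normalise the kernel.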

arXiv.org e-Print Archive

The Temperament Police: The Truth, the Ground Truth, and Nothing but the Truth

Author: Benetos E.
Dixon S.
Tidhar D.
Publication venue: 'University of Miami'
Publication date: 01/01/2011

The tuning system of a keyboard instrument is chosen so that frequently used musical intervals sound as consonant as possible. Temperament refers to the compromise arising from the fact that not all intervals can be maximally consonant simultaneously. Recent work showed that it is possible to estimate temperament from audio recordings with no prior knowledge of the musical score, using a conservative (high precision, low recall) automatic transcription algorithm followed by frequency estimation using quadratic interpolation and bias correction from the log magnitude spectrum. In this paper we develop a harpsichord-specific transcription system to analyse over 500 recordings of solo harpsichord music for which the temperament is specified on the CD sleeve notes. We compare the measured temperaments with the annotations and discuss the differences between temperament as a theoretical construct and as a practical issue for professional performers and tuners. The implications are that ground truth is not always scientific truth, and that content-based analysis has an important role in the study of historical performance practice.
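
The frequency-refinement step named in the abstract (quadratic interpolation on the log magnitude spectrum) can be sketched as below. This is the textbook three-point parabolic fit, not the authors' implementation: the bias-correction step is omitted, and the names `parabolic_peak`, `bin_to_hz`, and `cents` are hypothetical helpers introduced for illustration.

```python
import math


def parabolic_peak(log_mag, k):
    """Refine a spectral peak at bin k by fitting a parabola through
    the log magnitudes at bins k-1, k, and k+1.

    Returns (fractional_bin, interpolated_log_magnitude).
    """
    if not 0 < k < len(log_mag) - 1:
        raise ValueError("bin k needs a neighbour on each side")
    a, b, c = log_mag[k - 1], log_mag[k], log_mag[k + 1]
    curvature = a - 2.0 * b + c
    if curvature >= 0.0:
        # The three points do not form a strict local maximum: no refinement.
        return float(k), b
    offset = 0.5 * (a - c) / curvature  # lies in [-0.5, 0.5] at a true peak
    height = b - 0.25 * (a - c) * offset
    return k + offset, height


def bin_to_hz(bin_position, sample_rate, fft_size):
    """Convert a (possibly fractional) FFT bin index to a frequency in Hz."""
    return bin_position * sample_rate / fft_size


def cents(frequency, reference):
    """Signed interval from reference to frequency, in cents."""
    return 1200.0 * math.log2(frequency / reference)
```

Once partial frequencies are refined this way, deviations from equal temperament are naturally expressed in cents, which is what `cents` computes relative to a chosen reference frequency.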

CiteSeerX

City Research Online

Joint Multi-Pitch Detection Using Harmonic Envelope Estimation for Polyphonic Music Transcription

Author: Emmanouil Benetos
Simon Dixon
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011

In this paper, a method for automatic transcription of music signals based on joint multiple-F0 estimation is proposed. As a time-frequency representation, the constant-Q resonator time-frequency image is employed, while a novel noise suppression technique based on pink noise assumption is applied in a preprocessing step. In the multiple-F0 estimation stage, the optimal tuning and inharmonicity parameters are computed and a salience function is proposed in order to select pitch candidates. For each pitch candidate combination, an overlapping partial treatment procedure is used, which is based on a novel spectral envelope estimation procedure for the log-frequency domain, in order to compute the harmonic envelope of candidate pitches. In order to select the optimal pitch combination for each time frame, a score function is proposed which combines spectral and temporal characteristics of the candidate pitches and also aims to suppress harmonic errors. For postprocessing, hidden Markov models (HMMs) and conditional random fields (CRFs) trained on MIDI data are employed, in order to boost transcription accuracy. The system was trained on isolated piano sounds from the MAPS database and was tested on classic and jazz recordings from the RWC database, as well as on recordings from a Disklavier piano. A comparison with several state-of-the-art systems is provided using a variety of error metrics, where encouraging results are indicated

CiteSeerX

City Research Online

Crossref

Music Boundary Detection using Convolutional Neural Networks: A comparative analysis of combined input features

Author: Beltran Jose R.
Diaz-Guerra David
Hernandez-Olivan Carlos
Publication venue: 'Universidad Internacional de La Rioja'
Publication date: 01/01/2021

The analysis of the structure of musical pieces is a task that remains a challenge for Artificial Intelligence, especially in the field of Deep Learning. It requires prior identification of structural boundaries of the music pieces. This structural boundary analysis has recently been studied with unsupervised methods and end-to-end techniques such as Convolutional Neural Networks (CNN) using Mel-Scaled Log-magnitude Spectrograms features (MLS), Self-Similarity Matrices (SSM) or Self-Similarity Lag Matrices (SSLM) as inputs and trained with human annotations. Several studies have been published divided into unsupervised and end-to-end methods in which pre-processing is done in different ways, using different distance metrics and audio characteristics, so a generalized pre-processing method to compute model inputs is missing. The objective of this work is to establish a general method of pre-processing these inputs by comparing the inputs calculated from different pooling strategies, distance metrics and audio characteristics, also taking into account the computing time to obtain them. We also establish the most effective combination of inputs to be delivered to the CNN in order to establish the most efficient way to extract the limits of the structure of the music pieces. With an adequate combination of input matrices and pooling strategies we obtain a measurement accuracy F_1 of 0.411 that outperforms the current one obtained under the same conditions.
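
For readers unfamiliar with how an F_1 score is computed for boundary detection, the following is a minimal sketch of a tolerance-window hit rate. The abstract does not state the evaluation settings, so the 0.5 s default tolerance, the one-to-one greedy matching, and the function name `boundary_f1` are all assumptions made for illustration.

```python
def boundary_f1(reference, estimated, tolerance=0.5):
    """Precision, recall, and F1 for boundary times given in seconds.

    An estimated boundary is a hit when it lies within `tolerance` seconds
    of a reference boundary; each boundary is matched at most once.
    """
    ref = sorted(reference)
    est = sorted(estimated)
    i = j = hits = 0
    # Two-pointer sweep over both sorted lists.
    while i < len(ref) and j < len(est):
        diff = est[j] - ref[i]
        if abs(diff) <= tolerance:
            hits += 1
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # estimate is too early to match this or any later reference
        else:
            i += 1  # reference is too early to match this or any later estimate
    precision = hits / len(est) if est else 0.0
    recall = hits / len(ref) if ref else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2.0 * precision * recall / (precision + recall)
```

For example, with reference boundaries at 10, 20, and 30 s and estimates at 10.2, 25, 30.4, and 40 s, two of the four estimates are hits and two of the three references are found.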

arXiv.org e-Print Archive

Repositorio Universidad de Zaragoza

Re-UNIR

Towards a (better) Definition of Annotated MIR Corpora

Author: Fort Karen
Peeters Geoffroy
Publication venue: HAL CCSD
Publication date: 08/10/2012

Today, annotated MIR corpora are provided by various research labs or companies, each one using its own annotation methodology, concept definitions, and formats. This is not an issue as such. However, the lack of descriptions of the methodology used (how the corpus was actually annotated, and by whom) and of the annotated concepts, i.e. what is actually described, is a problem with respect to the sustainability, usability, and sharing of the corpora. Experience shows that it is essential to define precisely how annotations are supplied and described. We propose here a survey and consolidation report on the nature of the annotated corpora used and shared in MIR, with proposals for the axes against which corpora can be described so as to enable effective comparison and the inherent influence this has on tasks performed using them.

HAL-Paris 13

Subjective Similarity of Music: Data Collection for Individuality Analysis

Author: Chiyomi Miyajima
Kazuya Takeda
Norihide Kitaoka
Shota Kawabuchi
Publication date: 11/04/2020

We describe a method of estimating subjective music similarity from acoustic music similarity. Recently, there have been many studies on the topic of music information retrieval, but there continues to be difficulty improving retrieval precision. For this reason, in this study we analyze the individuality of subjective music similarity. We collected subjective music similarity evaluation data for individuality analysis using songs in the RWC music database, a widely used database in the field of music information processing. A total of 27 subjects listened to pairs of music tracks, and evaluated each pair as similar or dissimilar. They also selected the components of the music (melody, tempo/rhythm, vocals, instruments) that were similar. Each subject evaluated the same 200 pairs of songs, thus the individuality of the evaluation can be easily analyzed. Using the collected data, we trained individualized distance functions between songs, in order to estimate subjective similarity and analyze individuality.

CiteSeerX

On the Use of Perceptual Properties for Melody Estimation

Author: Liao Wei-Hsiang
Roebel Axel
Su Alvin. Wen-Yu
Yeh Chunghsin
Publication venue: HAL CCSD
Publication date: 01/01/2011

IRCAM internal reference: Liao11a. This paper is about the use of perceptual principles for melody estimation. The melody stream is understood as generated by the most dominant source. Since the source with the strongest energy may not be perceptually the most dominant one, it is proposed to study the perceptual properties for melody estimation: loudness, masking effect and timbre similarity. The related criteria are integrated into a melody estimation system and their respective contributions are evaluated. The effectiveness of these perceptual criteria is confirmed by the evaluation results using more than one hundred excerpts of music recordings.