35 research outputs found

    Self-Similarity-Based and Novelty-based loss for music structure analysis

    Music Structure Analysis (MSA) is the task of identifying the musical segments that compose a music track and, possibly, labeling them based on their similarity. In this paper we propose a supervised approach for the task of music boundary detection. In our approach we simultaneously learn features and convolution kernels. For this we jointly optimize two losses: a loss based on the Self-Similarity Matrix (SSM) obtained with the learned features, denoted SSM-loss, and a loss based on the novelty score obtained by applying the learned kernels to the estimated SSM, denoted novelty-loss. We also demonstrate that relative feature learning, through self-attention, is beneficial for the task of MSA. Finally, we compare the performance of our approach to previously proposed approaches on the standard RWC-Pop and various subsets of SALAMI.
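
    The SSM and novelty score named in this abstract can be illustrated with a minimal sketch. It does not reproduce the paper's method: the features here are fixed toy vectors rather than learned ones, and the kernel is a fixed checkerboard kernel (the classic hand-designed choice) rather than a learned convolution kernel. All function names are hypothetical.

```python
import math

def cosine_ssm(features):
    """Self-similarity matrix: cosine similarity between every pair of frames."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v)) or 1.0
    unit = [[x / norm(v) for x in v] for v in features]
    n = len(unit)
    return [[sum(a * b for a, b in zip(unit[i], unit[j])) for j in range(n)]
            for i in range(n)]

def checkerboard_kernel(half):
    """Fixed (2*half x 2*half) kernel: +1 on the diagonal blocks, -1 elsewhere.
    Assumption: this stands in for the learned kernels of the paper."""
    size = 2 * half
    return [[1.0 if (r < half) == (c < half) else -1.0 for c in range(size)]
            for r in range(size)]

def novelty_curve(ssm, half):
    """Slide the kernel along the main diagonal of the SSM.
    Frames too close to the edges keep a score of 0."""
    n = len(ssm)
    kernel = checkerboard_kernel(half)
    scores = [0.0] * n
    for t in range(half, n - half + 1):
        total = 0.0
        for r in range(2 * half):
            for c in range(2 * half):
                total += kernel[r][c] * ssm[t - half + r][t - half + c]
        scores[t] = total
    return scores

# Toy track: four frames of "section A" followed by four frames of "section B".
frames = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
novelty = novelty_curve(cosine_ssm(frames), half=2)
boundary = max(range(len(novelty)), key=novelty.__getitem__)
print(boundary)  # index of the first frame of section B
```

    On this toy input the novelty curve peaks at the frame where the second section starts, which is the behaviour a boundary detector relies on.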

    The Temperament Police: The Truth, the Ground Truth, and Nothing but the Truth

    The tuning system of a keyboard instrument is chosen so that frequently used musical intervals sound as consonant as possible. Temperament refers to the compromise arising from the fact that not all intervals can be maximally consonant simultaneously. Recent work showed that it is possible to estimate temperament from audio recordings with no prior knowledge of the musical score, using a conservative (high precision, low recall) automatic transcription algorithm followed by frequency estimation using quadratic interpolation and bias correction from the log magnitude spectrum. In this paper we develop a harpsichord-specific transcription system to analyse over 500 recordings of solo harpsichord music for which the temperament is specified on the CD sleeve notes. We compare the measured temperaments with the annotations and discuss the differences between temperament as a theoretical construct and as a practical issue for professional performers and tuners. The implications are that ground truth is not always scientific truth, and that content-based analysis has an important role in the study of historical performance practice.
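
    The quadratic interpolation step mentioned in this abstract can be sketched as follows. This is a generic parabolic peak refinement on a log magnitude spectrum, not the authors' implementation: the bias correction they describe is omitted, and the function names, sample rate, and FFT size are illustrative assumptions. The `cents` helper shows how a measured interval is compared with a theoretical one.

```python
import math

def parabolic_peak(log_mag, k):
    """Fit a parabola through bins k-1, k, k+1 of a log magnitude spectrum
    and return the fractional bin position of its vertex.
    Assumes bin k is a strict local maximum."""
    a, b, c = log_mag[k - 1], log_mag[k], log_mag[k + 1]
    offset = 0.5 * (a - c) / (a - 2.0 * b + c)
    return k + offset

def bin_to_hz(bin_position, sample_rate, fft_size):
    """Convert a (possibly fractional) FFT bin index to a frequency in Hz."""
    return bin_position * sample_rate / fft_size

def cents(freq, ref):
    """Interval between two frequencies in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(freq / ref)

# Synthetic log spectrum that is exactly parabolic around bin 10.3.
spectrum = [-(i - 10.3) ** 2 for i in range(20)]
refined_bin = parabolic_peak(spectrum, 10)
freq_hz = bin_to_hz(refined_bin, sample_rate=44100, fft_size=4096)

# A pure major third (ratio 5/4) is narrower than the 400-cent equal-tempered third.
pure_third = cents(5.0, 4.0)
```

    The gap between the pure third and 400 cents is the kind of compromise that a temperament distributes across the keyboard.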

    Joint Multi-Pitch Detection Using Harmonic Envelope Estimation for Polyphonic Music Transcription

    In this paper, a method for automatic transcription of music signals based on joint multiple-F0 estimation is proposed. As a time-frequency representation, the constant-Q resonator time-frequency image is employed, while a novel noise suppression technique based on a pink noise assumption is applied in a preprocessing step. In the multiple-F0 estimation stage, the optimal tuning and inharmonicity parameters are computed and a salience function is proposed in order to select pitch candidates. For each pitch candidate combination, an overlapping partial treatment procedure is used, which is based on a novel spectral envelope estimation procedure for the log-frequency domain, in order to compute the harmonic envelope of candidate pitches. In order to select the optimal pitch combination for each time frame, a score function is proposed which combines spectral and temporal characteristics of the candidate pitches and also aims to suppress harmonic errors. For postprocessing, hidden Markov models (HMMs) and conditional random fields (CRFs) trained on MIDI data are employed, in order to boost transcription accuracy. The system was trained on isolated piano sounds from the MAPS database and was tested on classical and jazz recordings from the RWC database, as well as on recordings from a Disklavier piano. A comparison with several state-of-the-art systems is provided using a variety of error metrics, where encouraging results are indicated.
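
    To make the role of an inharmonicity parameter concrete, here is a small sketch of where the partials of a piano-like tone are expected to fall. It assumes the standard stiff-string model, f_k = k * f0 * sqrt(1 + B * k^2); the abstract does not state which model the authors use, and the function name and the value of B are illustrative.

```python
import math

def partial_frequencies(f0, inharmonicity, count):
    """Frequencies of the first `count` partials of a stiff string.
    With inharmonicity == 0 this reduces to exact harmonics k * f0."""
    return [k * f0 * math.sqrt(1.0 + inharmonicity * k * k)
            for k in range(1, count + 1)]

harmonic = partial_frequencies(100.0, 0.0, 5)    # ideal string
stretched = partial_frequencies(100.0, 4e-4, 5)  # hypothetical inharmonicity value

# Each stretched partial lies above its ideal harmonic, and the gap grows with k.
gaps = [s - h for s, h in zip(stretched, harmonic)]
```

    A salience function that only looks at exact multiples of the fundamental would miss energy at the higher partials, which is why estimating this parameter before scoring pitch candidates is useful.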

    Music Boundary Detection using Convolutional Neural Networks: A comparative analysis of combined input features

    The analysis of the structure of musical pieces is a task that remains a challenge for Artificial Intelligence, especially in the field of Deep Learning. It requires prior identification of the structural boundaries of the music pieces. This structural boundary analysis has recently been studied with unsupervised methods and end-to-end techniques such as Convolutional Neural Networks (CNN) using Mel-Scaled Log-magnitude Spectrogram features (MLS), Self-Similarity Matrices (SSM) or Self-Similarity Lag Matrices (SSLM) as inputs and trained with human annotations. The published studies, whether unsupervised or end-to-end, pre-process their inputs in different ways, using different distance metrics and audio characteristics, so a generalized pre-processing method to compute model inputs is missing. The objective of this work is to establish a general method of pre-processing these inputs by comparing the inputs calculated from different pooling strategies, distance metrics and audio characteristics, also taking into account the computing time to obtain them. We also establish the most effective combination of inputs to be delivered to the CNN in order to establish the most efficient way to extract the boundaries of the structure of the music pieces. With an adequate combination of input matrices and pooling strategies we obtain an F1 score of 0.411 that outperforms the current one obtained under the same conditions.
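
    For readers unfamiliar with how an F1 score is computed for boundary detection, the following is a minimal sketch. It assumes the common hit-rate convention in which an estimated boundary counts as correct if it lies within a tolerance window of an unmatched annotated boundary; the abstract does not state the tolerance used, so the 0.5 s default is an assumption, as are the function name and the simple in-order matching.

```python
def boundary_f1(estimated, reference, tolerance=0.5):
    """F1 of estimated boundary times (seconds) against reference times.
    Each reference boundary can be matched by at most one estimate."""
    est = sorted(estimated)
    ref = sorted(reference)
    hits = 0
    i = j = 0
    while i < len(est) and j < len(ref):
        if abs(est[i] - ref[j]) <= tolerance:
            hits += 1          # matched pair: consume both
            i += 1
            j += 1
        elif est[i] < ref[j]:
            i += 1             # spurious estimate (false positive)
        else:
            j += 1             # missed reference (false negative)
    if hits == 0:
        return 0.0
    precision = hits / len(est)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)

# Two of four estimates fall within 0.5 s of the three annotated boundaries.
score = boundary_f1([10.2, 21.0, 29.8, 40.0], [10.0, 20.0, 30.0])
```

    Here precision is 2/4 and recall is 2/3, so the score is 4/7. Widening the tolerance would also accept the estimate at 21.0 s, which is why reported F1 values are only comparable under the same conditions.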

    Towards a (better) Definition of Annotated MIR Corpora

    Today, annotated MIR corpora are provided by various research labs or companies, each one using its own annotation methodology, concept definitions, and formats. This is not an issue as such. However, the lack of descriptions of the methodology used (how the corpus was actually annotated, and by whom) and of the annotated concepts, i.e. what is actually described, is a problem with respect to the sustainability, usability, and sharing of the corpora. Experience shows that it is essential to define precisely how annotations are supplied and described. We propose here a survey and consolidation report on the nature of the annotated corpora used and shared in MIR, with proposals for the axes against which corpora can be described so as to enable effective comparison, and the inherent influence this has on tasks performed using them.

    Subjective Similarity of Music: Data Collection for Individuality Analysis

    We describe a method of estimating subjective music similarity from acoustic music similarity. Recently, there have been many studies on the topic of music information retrieval, but there continues to be difficulty improving retrieval precision. For this reason, in this study we analyze the individuality of subjective music similarity. We collected subjective music similarity evaluation data for individuality analysis using songs in the RWC music database, a widely used database in the field of music information processing. A total of 27 subjects listened to pairs of music tracks, and evaluated each pair as similar or dissimilar. They also selected the components of the music (melody, tempo/rhythm, vocals, instruments) that were similar. Each subject evaluated the same 200 pairs of songs, so the individuality of the evaluation can be easily analyzed. Using the collected data, we trained individualized distance functions between songs, in order to estimate subjective similarity and analyze individuality.

    On the Use of Perceptual Properties for Melody Estimation

    IRCAM internal reference: Liao11a. This paper is about the use of perceptual principles for melody estimation. The melody stream is understood as generated by the most dominant source. Since the source with the strongest energy may not be perceptually the most dominant one, it is proposed to study the perceptual properties for melody estimation: loudness, masking effect and timbre similarity. The related criteria are integrated into a melody estimation system and their respective contributions are evaluated. The effectiveness of these perceptual criteria is confirmed by the evaluation results using more than one hundred excerpts of music recordings.