Search CORE

80 research outputs found

Recommended from our members

Singing voice separation with deep U-Net convolutional networks

Author: Bittner R.
Humphrey E.
Jansson A.
Kumar A.
Montecchio N.
Weyde T.
Publication venue
Publication date: 23/10/2017
Field of study

The decomposition of a music audio signal into its vocal and backing track components is analogous to image-to-image translation, where a mixed spectrogram is transformed into its constituent sources. We propose a novel application of the U-Net architecture — initially developed for medical imaging — for the task of source separation, given its proven capacity for recreating the fine, low-level detail required for high-quality audio reproduction. Through both quantitative evaluation and subjective assessment, experiments demonstrate that the proposed algorithm achieves state-of-the-art performance

City Research Online

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Improving Structure Evaluation Through Automatic Hierarchy Expansion

Author: Kinnaird Katherine M.
McFee Brian
Publication venue: Smith ScholarWorks
Publication date: 01/11/2019
Field of study

Structural segmentation is the task of partitioning a recording into non-overlapping time intervals, and labeling each segment with an identifying marker such as A, B, or verse. Hierarchical structure annotation expands this idea to allow an annotator to segment a song with multiple levels of granularity. While there has been recent progress in developing evaluation criteria for comparing two hierarchical annotations of the same recording, the existing methods have known deficiencies when dealing with inexact label matchings and sequential label repetition. In this article, we investigate methods for automatically enhancing structural annotations by inferring (and expanding) hierarchical information from the segment labels. The proposed method complements existing techniques for comparing hierarchical structural annotations by coarsening or refining labels with variation markers to either collapse similarly labeled segments together, or separate identically labeled segments from each other. Using the multi-level structure annotations provided in the SALAMI dataset, we demonstrate that automatic hierarchy expansion allows structure comparison methods to more accurately assess similarity between annotations

Smith College: Smith ScholarWorks

A Comparison of Deep Learning Methods for Timbre Analysis in Polyphonic Automatic Music Transcription

Author: Beltrán Blázquez José Ramón
Hernández-López Carlos
Hernández-Oliván Carlos
Zay Pinilla Ignacio
Publication venue: 'MDPI AG'
Publication date: 01/01/2021
Field of study

Automatic music transcription (AMT) is a critical problem in the field of music information retrieval (MIR). When AMT is faced with deep neural networks, the variety of timbres of different instruments can be an issue that has not been studied in depth yet. The goal of this work is to address AMT transcription by analyzing how timbre affect monophonic transcription in a first approach based on the CREPE neural network and then to improve the results by performing polyphonic music transcription with different timbres with a second approach based on the Deep Salience model that performs polyphonic transcription based on the Constant-Q Transform. The results of the first method show that the timbre and envelope of the onsets have a high impact on the AMT results and the second method shows that the developed model is less dependent on the strength of the onsets than other state-of-the-art models that deal with AMT on piano sounds such as Google Magenta Onset and Frames (OaF). Our polyphonic transcription model for non-piano instruments outperforms the state-of-the-art model, such as for bass instruments, which has an F-score of 0.9516 versus 0.7102. In our latest experiment we also show how adding an onset detector to our model can outperform the results given in this work

Repositorio Universidad de Zaragoza

Pitchclass2vec: Symbolic Music Structure Segmentation with Chord Embeddings

Author: Lazzari Nicolas
Poltronieri Andrea
Presutti Valentina
Publication venue
Publication date: 01/01/2022
Field of study

Structure perception is a fundamental aspect of music cognition in humans. Historically, the hierarchical organization of music into structures served as a narrative device for conveying meaning, creating expectancy, and evoking emotions in the listener. Thereby, musical structures play an essential role in music composition, as they shape the musical discourse through which the composer organises his ideas. In this paper, we present a novel music segmentation method, pitchclass2vec, based on symbolic chord annotations, which are embedded into continuous vector representations using both natural language processing techniques and custom-made encodings. Our algorithm is based on long-short term memory (LSTM) neural network and outperforms the state-of-the-art techniques based on symbolic chord annotations in the field

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

A standard format proposal for hierarchical analyses and representations

Author: Marsden Alan Alexander
Rizo David
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 12/08/2016
Field of study

In the realm of digital musicology, standardizations efforts to date have mostly concentrated on the representation of music. Analyses of music are increasingly being generated or communicated by digital means. We demonstrate that the same arguments for the desirability of standardization in the representation of music apply also to the representation of analyses of music: proper preservation, sharing of data, and facilitation of digital processing. We concentrate here on analyses which can be described as hierarchical and show that this covers a broad range of existing analytical formats. We propose an extension of MEI (Music Encoding Initiative) to allow the encoding of analyses unambiguously associated with and aligned to a representation of the music analysed, making use of existing mechanisms within MEI's parent TEI (Text Encoding Initiative) for the representation of trees and graphs

Lancaster E-Prints

Deep Polyphonic ADSR Piano Note Transcription

Author: Böck Sebastian
Kelz Rainer
Widmer Gerhard
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 21/06/2019
Field of study

We investigate a late-fusion approach to piano transcription, combined with a strong temporal prior in the form of a handcrafted Hidden Markov Model (HMM). The network architecture under consideration is compact in terms of its number of parameters and easy to train with gradient descent. The network outputs are fused over time in the final stage to obtain note segmentations, with an HMM whose transition probabilities are chosen based on a model of attack, decay, sustain, release (ADSR) envelopes, commonly used for sound synthesis. The note segments are then subject to a final binary decision rule to reject too weak note segment hypotheses. We obtain state-of-the-art results on the MAPS dataset, and are able to outperform other approaches by a large margin, when predicting complete note regions from onsets to offsets.Comment: 5 pages, 2 figures, published as ICASSP'1

arXiv.org e-Print Archive

Crossref

Recommended from our members

An RNN-based Music Language Model for Improving Automatic Music Transcription

Author: Benetos E.
Cherla S.
Dixon S.
Garcez A.
Sigtia S.
Weyde T.
Publication venue: International Society for Music Information Retrieval
Publication date: 01/01/2014
Field of study

In this paper, we investigate the use of Music Language Models (MLMs) for improving Automatic Music Transcription performance. The MLMs are trained on sequences of symbolic polyphonic music from the Nottingham dataset. We train Recurrent Neural Network (RNN)-based models, as they are capable of capturing complex temporal structure present in symbolic music data. Similar to the function of language models in automatic speech recognition, we use the MLMs to generate a prior probability for the occurrence of a sequence. The acoustic AMT model is based on probabilistic latent component analysis, and prior information from the MLM is incorporated into the transcription framework using Dirichlet priors. We test our hybrid models on a dataset of multiple-instrument polyphonic music and report a significant 3% improvement in terms of F-measure, when compared to using an acoustic-only model

City Research Online

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

AN EVALUATION OF AUDIO FEATURE EXTRACTION TOOLBOXES

Author: Moffat D
Reiss JD
Ronan D
Publication venue
Publication date: 18/01/2016
Field of study

Audio feature extraction underpins a massive proportion of audio processing, music information retrieval, audio effect design and audio synthesis. Design, analysis, synthesis and evaluation often rely on audio features, but there are a large and diverse range of feature extraction tools presented to the community. An evaluation of existing audio feature extraction libraries was undertaken. Ten libraries and toolboxes were evaluated with the Cranfield Model for evaluation of information retrieval systems, reviewing the cov-erage, effort, presentation and time lag of a system. Comparisons are undertaken of these tools and example use cases are presented as to when toolboxes are most suitable. This paper allows a soft-ware engineer or researcher to quickly and easily select a suitable audio feature extraction toolbox. 1

CiteSeerX

Queen Mary Research Online