Search CORE

2 research outputs found

Deep Polyphonic ADSR Piano Note Transcription

Author: Böck Sebastian
Kelz Rainer
Widmer Gerhard
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 21/06/2019
Field of study

We investigate a late-fusion approach to piano transcription, combined with a strong temporal prior in the form of a handcrafted Hidden Markov Model (HMM). The network architecture under consideration is compact in terms of its number of parameters and easy to train with gradient descent. The network outputs are fused over time in the final stage to obtain note segmentations, with an HMM whose transition probabilities are chosen based on a model of attack, decay, sustain, release (ADSR) envelopes, commonly used for sound synthesis. The note segments are then subject to a final binary decision rule to reject too weak note segment hypotheses. We obtain state-of-the-art results on the MAPS dataset, and are able to outperform other approaches by a large margin, when predicting complete note regions from onsets to offsets.Comment: 5 pages, 2 figures, published as ICASSP'1

arXiv.org e-Print Archive

Crossref

A PARALLEL FUSION APPROACH TO PIANO MUSIC TRANSCRIPTION BASED ON CONVOLUTIONAL NEURAL NETWORK

Author: Cong F
Guo L
IEEE
Liu S
Wiggins GA
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

In this paper, a supervised approach based on Convolutional Neural Networks (CNN) for polyphonic piano transcription is presented. The system consists of pitch detection model, onset/offset detection model, and note search model. The pitch detection model is a single-channel CNN predicting the probabilities of pitches contained in one frame of the audio. The onset/offset model based on dual-channel CNN is used for estimating the probabilities of each pitch's onset or offset in a frame. The note search model is rule-based; it integrates the outputs of the pitch model and onset/offset model to determine the final onset, offset and pitch of notes in audio. Two experiments with different dataset conditions are accomplished to compare with state-of-the-art approaches on the same datasets. Experimental results reveal that the proposed approach preforms better in both frame- and note-based metrics

Crossref

Queen Mary Research Online