ISOLATED INSTRUMENT TRANSCRIPTION USING A DEEP BELIEF NETWORK

Author: Abram Hindle
Gregory Burlet
Publication date: 24/04/2020

ABSTRACT Automatic music transcription is a difficult task that has provoked extensive research on transcription systems that are predominantly general purpose, processing any number or type of instruments sounding simultaneously. This paper presents a polyphonic transcription system that is constrained to processing the output of a single instrument with an upper bound on polyphony. For example, a guitar has six strings and is limited to producing six notes simultaneously. The transcription system consists of a novel pitch estimation algorithm that uses a deep belief network and multi-label learning techniques to generate multiple pitch estimates for each audio analysis frame, such that the polyphony does not exceed that of the instrument. The implemented transcription system is evaluated on a compiled dataset of synthesized guitar recordings. Comparing these results to a prior single-instrument polyphonic transcription system that received exceptional results, this paper demonstrates the effectiveness of deep, multi-label learning for the task of polyphonic transcription.
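
The polyphony bound described in the abstract can be illustrated with a small sketch. The abstract does not specify how the network outputs are turned into pitch estimates, so the code below is an assumption rather than the authors' method: it supposes that a multi-label model emits one probability per candidate pitch for each frame, keeps the pitches at or above a threshold, and retains at most six of them (the guitar's string count). The function name, the threshold value, and the example probabilities are all hypothetical.

```python
def select_pitches(probabilities, max_polyphony=6, threshold=0.5):
    """Pick pitch indices for one analysis frame.

    probabilities: per-pitch scores in [0, 1] (assumed output of a
        multi-label model; not taken from the paper).
    max_polyphony: upper bound on simultaneous notes (6 for a guitar).
    threshold: hypothetical cut-off below which a pitch is ignored.
    """
    # Keep only pitches the model considers likely enough.
    candidates = [(p, i) for i, p in enumerate(probabilities) if p >= threshold]
    # Most probable first; ties broken by lower pitch index.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    # Enforce the instrument's polyphony bound, then return in pitch order.
    return sorted(i for _, i in candidates[:max_polyphony])


# Seven pitches pass the threshold, but only the six most probable are kept.
frame = [0.9, 0.1, 0.8, 0.7, 0.6, 0.95, 0.55, 0.51, 0.2]
print(select_pitches(frame))
```

Capping after thresholding means a frame with few confident pitches yields fewer than six notes, while a noisy frame can never yield more than the instrument can physically play.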

CiteSeerX

Table 5: Effect of number of hidden layers on transcription.

Author: Barbancho
Bello
Benetos
Bengio
Bergstra
Boulanger-Lewandowski
Burlet
Costantini
Courbariaux
Dessein
Dixon
Hainsworth
Heijink
Hinton
Huang
Humphrey
Klapuri
Lee
Maher
Marolt
Martin
Moorer
Nam
Poliner
Radicioni
Radisavljevic
Raphael
Ryynänen
Sigtia
Singer
Smaragdis
Tang
Tuohy
Tzanetakis
Utgoff
Yeh
Zhang
Zhou
Publication venue: 'PeerJ'

Crossref

An End-to-End Neural Network for Polyphonic Piano Music Transcription

Author: Benetos E
Dixon S
Sigtia S
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 11/02/2016

We present a supervised neural network model for polyphonic piano music transcription. The architecture of the proposed model is analogous to speech recognition systems and comprises an acoustic model and a music language model. The acoustic model is a neural network used for estimating the probabilities of pitches in a frame of audio. The language model is a recurrent neural network that models the correlations between pitch combinations over time. The proposed model is general and can be used to transcribe polyphonic music without imposing any constraints on the polyphony. The acoustic and language model predictions are combined using a probabilistic graphical model. Inference over the output variables is performed using the beam search algorithm. We perform two sets of experiments. We investigate various neural network architectures for the acoustic models and also investigate the effect of combining acoustic and music language model predictions using the proposed architecture. We compare performance of the neural network based acoustic models with two popular unsupervised acoustic models. Results show that convolutional neural network acoustic models yield the best performance across all evaluation metrics. We also observe improved performance with the application of the music language models. Finally, we present an efficient variant of beam search that improves performance and reduces run-times by an order of magnitude, making the model suitable for real-time applications.
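
To make the combination of acoustic and language model predictions concrete, here is a minimal beam search sketch. It is not the authors' model or their efficient variant: the abstract gives no implementation details, so the sketch assumes each frame comes with a small dictionary of candidate states and their acoustic log-probabilities, and it stands in a toy "sticky" transition function for the recurrent language model. All names (beam_search, sticky, the states "C" and "E") are hypothetical.

```python
import math


def beam_search(acoustic_logp, transition_logp, beam_width=2):
    """Find a high-scoring state sequence over frames.

    acoustic_logp: list with one dict per frame, mapping a candidate
        state to its acoustic log-probability (assumed input format).
    transition_logp(prev, state): log-probability of `state` following
        `prev`; a stand-in for a learned language model.
    Returns (best_sequence, best_log_score).
    """
    beams = [((), 0.0)]  # each beam is (sequence so far, cumulative log score)
    for frame in acoustic_logp:
        expanded = []
        for seq, score in beams:
            prev = seq[-1] if seq else None
            for state, logp in frame.items():
                new_score = score + logp + transition_logp(prev, state)
                expanded.append((seq + (state,), new_score))
        # Keep only the beam_width best partial sequences.
        expanded.sort(key=lambda b: -b[1])
        beams = expanded[:beam_width]
    return beams[0]


def sticky(prev, state):
    """Toy language model: staying on the same state is more likely."""
    if prev is None:
        return 0.0
    return math.log(0.8) if prev == state else math.log(0.2)


frames = [
    {"C": math.log(0.60), "E": math.log(0.40)},
    {"C": math.log(0.45), "E": math.log(0.55)},
    {"C": math.log(0.60), "E": math.log(0.40)},
]
best_sequence, best_score = beam_search(frames, sticky)
print(best_sequence)
```

In this toy input the acoustic scores alone would pick "E" for the middle frame, while the transition prior smooths the sequence; that interaction is the general motivation for pairing an acoustic model with a language model.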

arXiv.org e-Print Archive

Crossref

Queen Mary Research Online

An End-to-End Neural Network for Polyphonic Music Transcription

Author: Benetos E
Dixon S
Sigtia S
Publication venue: 'Center for Open Science'
Publication date: 18/11/2015

We present a neural network model for polyphonic music transcription. The architecture of the proposed model is analogous to speech recognition systems and comprises an acoustic model and a music language model. The acoustic model is a neural network used for estimating the probabilities of pitches in a frame of audio. The language model is a recurrent neural network that models the correlations between pitch combinations over time. The proposed model is general and can be used to transcribe polyphonic music without imposing any constraints on the polyphony or the number or type of instruments. The acoustic and language model predictions are combined using a probabilistic graphical model. Inference over the output variables is performed using the beam search algorithm. We investigate various neural network architectures for the acoustic models and compare their performance to two popular state-of-the-art acoustic models. We also present an efficient variant of beam search that improves performance and reduces run-times by an order of magnitude, making the model suitable for real-time applications. We evaluate the model's performance on the MAPS dataset and show that the proposed model outperforms state-of-the-art transcription systems.

Queen Mary Research Online

Deep Learning for Audio Signal Processing

Author: Chang Shuo-yiin
Li Bo
Purwins Hendrik
Sainath Tara
Schlüter Jan
Virtanen Tuomas
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2019

Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side-by-side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e. audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified. Comment: 15 pages, 2 pdf figures

arXiv.org e-Print Archive

VBN

POLYPHONIC MUSIC SEQUENCE TRANSDUCTION WITH METER-CONSTRAINED LSTM NETWORKS

Author: Benetos E
IEEE
Ycart A
Publication date: 13/02/2018

Queen Mary Research Online

16th Sound and Music Computing Conference SMC 2019 (28–31 May 2019, Malaga, Spain)

Author: A. Peinado
A.M. Barbancho
F. Avanzini
I. Barbancho
L.J. Tardón
S. Serafin
Publication venue: 'MDPI AG'
Publication date: 01/01/2019

The 16th Sound and Music Computing Conference (SMC 2019) took place in Malaga, Spain, 28-31 May 2019 and was organized by the Application of Information and Communication Technologies Research group (ATIC) of the University of Malaga (UMA). The SMC 2019 associated Summer School took place 25-28 May 2019. The First International Day of Women in Inclusive Engineering, Sound and Music Computing Research (WiSMC 2019) took place on 28 May 2019. The SMC 2019 topics of interest included a wide selection of topics related to acoustics, psychoacoustics, music, technology for music, audio analysis, musicology, sonification, music games, machine learning, serious games, immersive audio, sound synthesis, etc.

AIR Universita degli studi di Milano

VBN