1,353 research outputs found

Automatic music transcription: challenges and future directions

Author: Anssi Klapuri
Dimitrios Giannoulis
Emmanouil Benetos
Holger Kirchhoff
Simon Dixon
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Automatic music transcription is considered by many to be a key enabling technology in music signal processing. However, the performance of transcription systems is still significantly below that of a human expert, and accuracies reported in recent years seem to have reached a limit, although the field is still very active. In this paper we analyse limitations of current methods and identify promising directions for future research. Current transcription methods use general purpose models which are unable to capture the rich diversity found in music signals. One way to overcome the limited performance of transcription systems is to tailor algorithms to specific use-cases. Semi-automatic approaches are another way of achieving a more reliable transcription. Also, the wealth of musical scores and corresponding audio data now available is a rich potential source of training data, via forced alignment of audio to scores, but large scale utilisation of such data has yet to be attempted. Other promising approaches include the integration of information from multiple algorithms and different musical aspects.

CiteSeerX

City Research Online

Queen Mary Research Online

Deep Learning for Audio Signal Processing

Author: Chang Shuo-yiin
Li Bo
Purwins Hendrik
Sainath Tara
Schlüter Jan
Virtanen Tuomas
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2019

Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side-by-side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e. audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified. Comment: 15 pages, 2 pdf figures

arXiv.org e-Print Archive

VBN

Musical source separation using time-frequency source priors

Author: E. Vincent
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Coded excitation and sub-band processing for blood velocity estimation in medical ultrasound

Author: Gran Fredrik
Jensen Jørgen Arendt
Udesen Jesper
Publication venue: 'Acoustical Society of America (ASA)'
Publication date: 01/01/2007

Online Research Database In Technology

Monaural Audio Separation Using Spectral Template and Isolated Note Information

Author: Anil Lal
Wenwu Wang
Publication venue: 'IntechOpen'
Publication date: 10/10/2012

IntechOpen

C-HiLasso: A Collaborative Hierarchical Sparse Modeling Framework

Author: Eldar Yonina
Ramírez Ignacio
Sapiro Guillermo
Sprechmann Pablo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010

Sparse modeling is a powerful framework for data analysis and processing. Traditionally, encoding in this framework is performed by solving an L1-regularized linear regression problem, commonly referred to as Lasso or Basis Pursuit. In this work we combine the sparsity-inducing property of the Lasso model at the individual feature level, with the block-sparsity property of the Group Lasso model, where sparse groups of features are jointly encoded, obtaining a hierarchically structured sparsity pattern. This results in the Hierarchical Lasso (HiLasso), which shows important practical modeling advantages. We then extend this approach to the collaborative case, where a set of simultaneously coded signals share the same sparsity pattern at the higher (group) level, but not necessarily at the lower (inside the group) level, obtaining the collaborative HiLasso model (C-HiLasso). Such signals then share the same active groups, or classes, but not necessarily the same active set. This model is very well suited for applications such as source identification and separation. An efficient optimization procedure, which guarantees convergence to the global optimum, is developed for these new models. The underlying presentation of the new framework and optimization approach is complemented with experimental examples and theoretical results regarding recovery guarantees for the proposed models.

arXiv.org e-Print Archive

CiteSeerX

University of Minnesota Digital Conservancy

Signal separation of musical instruments: simulation-based methods for musical signal decomposition and transcription

Author: Walmsley Paul Joseph
Publication venue: University of Cambridge
Publication date: 29/05/2001

This thesis presents techniques for the modelling of musical signals, with particular regard to monophonic and polyphonic pitch estimation. Musical signals are modelled as a set of notes, each comprising a set of harmonically-related sinusoids. An hierarchical model is presented that is very general and applicable to any signal that can be decomposed as the sum of basis functions. Parameter estimation is posed within a Bayesian framework, allowing for the incorporation of prior information about model parameters. The resulting posterior distribution is of variable dimension and so reversible jump MCMC simulation techniques are employed for the parameter estimation task. The extension of the model to time-varying signals with high posterior correlations between model parameters is described. The parameters and hyperparameters of several frames of data are estimated jointly to achieve a more robust detection. A general model for the description of time-varying homogeneous and heterogeneous multiple component signals is developed, and then applied to the analysis of musical signals. The importance of high level musical and perceptual psychological knowledge in the formulation of the model is highlighted, and attention is drawn to the limitation of pure signal processing techniques for dealing with musical signals. Gestalt psychological grouping principles motivate the hierarchical signal model, and component identifiability is considered in terms of perceptual streaming where each component establishes its own context. A major emphasis of this thesis is the practical application of MCMC techniques, which are generally deemed to be too slow for many applications. Through the design of efficient transition kernels highly optimised for harmonic models, and by careful choice of assumptions and approximations, implementations approaching the order of realtime are viable. Engineering and Physical Sciences Research Council

Apollo (Cambridge)

A computational framework for sound segregation in music signals

Author: Martins Luís Gustavo Pereira Marques
Publication date: 01/01/2008

Doctoral thesis. Electrical and Computer Engineering. Faculdade de Engenharia. Universidade do Porto. 200

Repositório Aberto da Universidade do Porto

The DESAM toolbox: spectral analysis of musical audio

Author: Badeau Roland
Bertin Nancy
Daudet Laurent
David Bertrand
Derrien Olivier
Echeveste Jose
Lagrange Mathieu
Marchand Sylvain
Publication venue: HAL CCSD
Publication date: 01/09/2010

This paper presents the DESAM Toolbox, a set of Matlab functions dedicated to the estimation of widely used spectral models for music signals. Although those models can be used in Music Information Retrieval (MIR) tasks, the core functions of the toolbox do not focus on any specific application. Rather, the toolbox aims to provide a range of state-of-the-art signal processing tools that decompose music files according to different signal models, giving rise to different "mid-level" representations. After motivating the need for such a toolbox, this paper offers an overview of the overall organization of the toolbox, and describes all available functionalities.

HAL-CentraleSupelec

HAL AMU

INRIA a CCSD electronic archive server

Hal-Diderot

HAL-Rennes 1