Optical Music Recognition with Convolutional Sequence-to-Sequence Models

Author: Ullrich Karen
van der Wel Eelco
Publication date: 16/07/2017

Optical Music Recognition (OMR) is an important technology within Music Information Retrieval. Deep learning models show promising results on OMR tasks, but symbol-level annotated data sets of sufficient size to train such models are not available and are difficult to develop. We present a deep learning architecture called a Convolutional Sequence-to-Sequence model, both to move towards an end-to-end trainable OMR pipeline and to apply a learning process that trains on full sentences of sheet music instead of individually labeled symbols. The model is trained and evaluated on a human-generated data set, with various image augmentations based on real-world scenarios. This data set is the first publicly available set in OMR research of sufficient size to train and evaluate deep learning models. With the introduced augmentations, a pitch recognition accuracy of 81% and a duration accuracy of 94% are achieved, resulting in a note-level accuracy of 80%. Finally, the model is compared to commercially available methods, showing large improvements over these applications. Comment: ISMIR 201

arXiv.org e-Print Archive

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Recognition of Harmonic Sounds in Polyphonic Audio using a Missing Feature Approach: Extended Report

Author: Giannoulis Dimitrios
Klapuri Anssi
Plumbley Mark
Publication date: 01/01/2013

A method based on local spectral features and missing feature techniques is proposed for the recognition of harmonic sounds in mixture signals. A mask estimation algorithm is proposed for identifying spectral regions that contain reliable information for each sound source, and then bounded marginalization is employed to treat the feature vector elements that are determined to be unreliable. The proposed method is tested on musical instrument sounds due to the extensive availability of data, but it can be applied to other sounds (e.g. animal sounds, environmental sounds) whenever these are harmonic. In simulations, the proposed method clearly outperformed a baseline method for mixture signals.

Crossref

University of Surrey

Queen Mary Research Online

Surrey Research Insight

Musical notes classification with Neuromorphic Auditory System using FPGA and a Convolutional Spiking Network

Author: Cerezuela Escudero Elena
Domínguez Morales Manuel Jesús
Jiménez Fernández Ángel Francisco
Jiménez Moreno Gabriel
Linares Barranco Alejandro
Paz Vicente Rafael
Publication venue: IEEE Computer Society
Publication date: 01/01/2015

In this paper, we explore the capabilities of a sound classification system that combines a novel FPGA cochlear model implementation with a bio-inspired technique based on a trained convolutional spiking network. The neuromorphic auditory system used in this work produces a representation that is analogous to the spike outputs of the biological cochlea. The auditory system has been developed using a set of spike-based processing building blocks in the frequency domain. These blocks form a set of band-pass filters in the spike domain that splits the audio information into 128 frequency channels, 64 for each of two audio sources. Address Event Representation (AER) is used to connect the auditory system to the convolutional spiking network. A convolutional spiking network layer is developed and trained on a computer to detect two kinds of sound: artificial pure tones in the presence of white noise, and electronic musical notes. After the training process, the presented system is able to distinguish the different sounds in real time, even in the presence of white noise. Ministerio de Economía y Competitividad TEC2012-37868-C04-0

idUS. Depósito de Investigación Universidad de Sevilla

Deep Learning for Audio Signal Processing

Author: Chang Shuo-yiin
Li Bo
Purwins Hendrik
Sainath Tara
Schlüter Jan
Virtanen Tuomas
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/05/2019

Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side-by-side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross-fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e. audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified. Comment: 15 pages, 2 pdf figure

arXiv.org e-Print Archive

VBN

Drum Transcription via Classification of Bar-level Rhythmic Patterns

Author: Dixon S
Mauch M
Thompson L
Publication venue: 15th International Society for Music Information Retrieval Conference
Publication date: 01/01/2014

Matthias Mauch is supported by a Royal Academy of Engineering Research Fellowship.

Queen Mary Research Online

D-touch: A Consumer-Grade Tangible Interface Module and Musical Applications

Author: Costanza Enrico
Robinson John
Shelley Simon B
Publication date: 01/01/2003

We define a class of tangible media applications that can be implemented on consumer-grade personal computers. These applications interpret user manipulation of physical objects in a restricted space and produce unlocalized outputs. We propose a generic approach to the implementation of such interfaces using flexible fiducial markers, which identify objects to a robust and fast video-processing algorithm, so they can be recognized and tracked in real time. We describe an implementation of the technology, then report two new, flexible music performance applications that demonstrate and validate it.

Southampton (e-Prints Soton)

UCL Discovery