Search CORE

10,427 research outputs found

Score-Informed Source Separation for Musical Audio Recordings [An overview]

Author: Ewert S
Mueller M
Pardo B
Plumbley MD
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014

(c) 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.

CiteSeerX

Crossref

Queen Mary Research Online

Surrey Research Insight

The Sound Manifesto

Author: Bisnovatyi Ilia
O'Donnell Michael J.
Publication venue: 'SPIE-Intl Soc Optical Eng'
Publication date: 08/07/2000

Computing practice today depends on visual output to drive almost all user interaction. Other senses, such as audition, may be totally neglected, or used tangentially, or used in highly restricted specialized ways. We have excellent audio rendering through D-A conversion, but we lack rich general facilities for modeling and manipulating sound comparable in quality and flexibility to graphics. We need co-ordinated research in several disciplines to improve the use of sound as an interactive information channel. Incremental and separate improvements in synthesis, analysis, speech processing, audiology, acoustics, music, etc. will not alone produce the radical progress that we seek in sonic practice. We also need to create a new central topic of study in digital audio research. The new topic will assimilate the contributions of different disciplines on a common foundation. The key central concept that we lack is sound as a general-purpose information channel. We must investigate the structure of this information channel, which is driven by the co-operative development of auditory perception and physical sound production. Particular audible encodings, such as speech and music, illuminate sonic information by example, but they are no more sufficient for a characterization than typography is sufficient for a characterization of visual information.

Comment: To appear in the conference on Critical Technologies for the Future of Computing, part of SPIE's International Symposium on Optical Science and Technology, 30 July to 4 August 2000, San Diego, CA

arXiv.org e-Print Archive

Crossref

ARTSTREAM: A Neural Network Model of Auditory Scene Analysis and Source Segregation

Author: Cohen Michael
Govindarajan Krishna
Grossberg Stephen
Wyse Lonce
Publication venue: Boston University Center for Adaptive Systems and Department of Cognitive and Neural Systems
Publication date: 01/06/2003

Multiple sound sources often contain harmonics that overlap and may be degraded by environmental noise. The auditory system is capable of teasing apart these sources into distinct mental objects, or streams. Such an "auditory scene analysis" enables the brain to solve the cocktail party problem. A neural network model of auditory scene analysis, called the ARTSTREAM model, is presented to propose how the brain accomplishes this feat. The model clarifies how the frequency components that correspond to a given acoustic source may be coherently grouped together into distinct streams based on pitch and spatial cues. The model also clarifies how multiple streams may be distinguished and separated by the brain. Streams are formed as spectral-pitch resonances that emerge through feedback interactions between frequency-specific spectral representations of a sound source and its pitch. First, the model transforms a sound into a spatial pattern of frequency-specific activation across a spectral stream layer. The sound has multiple parallel representations at this layer. A sound's spectral representation activates a bottom-up filter that is sensitive to harmonics of the sound's pitch. The filter activates a pitch category which, in turn, activates a top-down expectation that allows one voice or instrument to be tracked through a noisy multiple-source environment. Spectral components are suppressed if they do not match harmonics of the top-down expectation that is read out by the selected pitch, thereby allowing another stream to capture these components, as in the "old-plus-new heuristic" of Bregman. Multiple simultaneously occurring spectral-pitch resonances can thereby emerge. These resonance and matching mechanisms are specialized versions of Adaptive Resonance Theory, or ART, which clarifies how pitch representations can self-organize during learning of harmonic bottom-up filters and top-down expectations.
The model also clarifies how spatial location cues can help to disambiguate two sources with similar spectral cues. Data are simulated from psychophysical grouping experiments, such as how a tone sweeping upwards in frequency creates a bounce percept by grouping with a downward sweeping tone due to proximity in frequency, even if noise replaces the tones at their intersection point. Illusory auditory percepts are also simulated, such as the auditory continuity illusion of a tone continuing through a noise burst even if the tone is not present during the noise, and the scale illusion of Deutsch, whereby downward and upward scales presented alternately to the two ears are regrouped based on frequency proximity, leading to a bounce percept. Since related sorts of resonances have been used to quantitatively simulate psychophysical data about speech perception, the model strengthens the hypothesis that ART-like mechanisms are used at multiple levels of the auditory system. Proposals for developing the model to explain more complex streaming data are also provided.

Air Force Office of Scientific Research (F49620-01-1-0397, F49620-92-J-0225); Office of Naval Research (N00014-01-1-0624); Advanced Research Projects Agency (N00014-92-J-4015); British Petroleum (89A-1204); National Science Foundation (IRI-90-00530); American Society of Engineering Education
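The grouping loop described above (a bottom-up harmonic filter, selection of a pitch category, top-down suppression of non-matching components, and capture of the leftovers by a further stream) can be caricatured in a few lines of Python. This is a minimal sketch, not ARTSTREAM itself: it swaps the model's resonant neural dynamics for a plain harmonic sieve over a hypothetical list of candidate pitches, and the helper names (`is_harmonic`, `pitch_score`, `segregate`) and the 2% matching tolerance are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def is_harmonic(freq: float, f0: float, tol: float = 0.02) -> bool:
    """True if freq lies within a relative tolerance of an integer multiple of f0."""
    n = round(freq / f0)
    return n >= 1 and abs(freq - n * f0) <= tol * n * f0


def pitch_score(components: Dict[float, float], f0: float) -> float:
    """Bottom-up 'harmonic filter': summed amplitude of components matching f0's harmonics."""
    return sum(amp for freq, amp in components.items() if is_harmonic(freq, f0))


def segregate(components: Dict[float, float], candidates: List[float],
              n_streams: int = 2) -> List[Tuple[float, List[float]]]:
    """Hypothetical harmonic-sieve stand-in for old-plus-new stream capture."""
    remaining = dict(components)  # frequency (Hz) -> amplitude
    streams: List[Tuple[float, List[float]]] = []
    for _ in range(n_streams):
        if not remaining:
            break
        # Choose the pitch category with the strongest bottom-up support.
        best = max(candidates, key=lambda f0: pitch_score(remaining, f0))
        # Top-down expectation: keep only components that match best's harmonics.
        captured = sorted(f for f in remaining if is_harmonic(f, best))
        if not captured:
            break
        streams.append((best, captured))
        # Unmatched components stay available for the next stream to capture.
        for f in captured:
            del remaining[f]
    return streams
```

For a mixture of partials at 200, 400, 600 and 800 Hz plus 310, 620 and 930 Hz, with candidates 200, 310 and 440 Hz, the sketch assigns the first four partials to a 200 Hz stream and the remaining three to a 310 Hz stream. A practical harmonic sieve must also guard against subharmonic candidates (a 100 Hz candidate matches every partial of a 200 Hz tone), which this sketch ignores.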

Boston University Institutional Repository (OpenBU)

ARTSTREAM: A Neural Network Model of Auditory Scene Analysis and Source Segregation

Author: Grossberg Stephen
Govindarajan Krishna
Wyse Lonce
Cohen Michael
Publication venue: Boston University Center for Adaptive Systems and Department of Cognitive and Neural Systems
Publication date: 01/01/1997

Boston University Institutional Repository (OpenBU)

Post-tonal analytical techniques: Stravinsky’s symphonies of wind instruments

Author: Matthews Jeremy
Publication venue
Publication date: 01/01/1998

The analysis of post-tonal music remains problematic. Analytical methodologies designed specifically for tonal or atonal music require substantial modification if they are to effectively analyse post-tonal works. Stravinsky's Symphonies of Wind Instruments is a fine model of post-tonal originality and is a difficult piece for analysis. Following a discussion of various analytical approaches, this paper presents a detailed analytical examination of Symphonies of Wind Instruments. The paper closes by developing conclusions and acknowledging the continuing advances of music analysis.

Durham e-Theses

Score-Informed Source Separation for Music Signals

Author: Ewert Sebastian
Publication venue: Dagstuhl Follow-Ups. Multimodal Music Processing
Publication date: 01/01/2012

In recent years, the processing of audio recordings by exploiting additional musical knowledge has turned out to be a promising research direction. In particular, additional note information as specified by a musical score or a MIDI file has been employed to support various audio processing tasks such as source separation, audio parameterization, performance analysis, or instrument equalization. In this contribution, we provide an overview of approaches for score-informed source separation and illustrate their potential by discussing innovative applications and interfaces. Additionally, to illustrate some basic principles behind these approaches, we demonstrate how score information can be integrated into the well-known non-negative matrix factorization (NMF) framework. Finally, we compare this approach to advanced methods based on parametric models.
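One common way to fold score information into NMF, sketched below under stated assumptions, is to set activation entries to zero wherever the score says a pitch is silent: multiplicative updates never turn a zero into a nonzero value, so the constraint persists through every iteration. The sketch assumes a magnitude spectrogram `V` and a boolean note mask derived from an already aligned score (alignment is not shown), uses the Euclidean-cost Lee-Seung updates for brevity (KL-divergence updates are also common), and leaves the spectral templates unconstrained. The function names are hypothetical, and this is not presented as the contribution's exact formulation.

```python
import numpy as np


def score_informed_nmf(V, pitch_mask, n_iter=200, eps=1e-12, seed=0):
    """Factor V ~= W @ H with activations H constrained by a score-derived mask.

    V          -- non-negative magnitude spectrogram, shape (n_freq, n_frames)
    pitch_mask -- boolean array, shape (n_pitches, n_frames); True where the
                  (assumed pre-aligned) score allows pitch p to sound in frame t
    """
    rng = np.random.default_rng(seed)
    n_freq = V.shape[0]
    n_pitches = pitch_mask.shape[0]
    W = rng.random((n_freq, n_pitches)) + eps      # spectral templates, one per pitch
    H = rng.random(pitch_mask.shape) * pitch_mask  # zero wherever the score has no note
    for _ in range(n_iter):
        # Lee-Seung multiplicative updates for the Euclidean cost.
        # Entries of H that start at zero stay at zero, so the score constraint holds.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def separate(V, W, H, k, eps=1e-12):
    """Soft-mask (Wiener-like) estimate of the part of V explained by pitch k."""
    return V * np.outer(W[:, k], H[k]) / (W @ H + eps)
```

`separate` distributes each time-frequency bin of the mixture in proportion to each pitch's share of the model, so the per-pitch estimates sum back to `V` up to the small `eps` regularizer.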

Dagstuhl Research Online Publication Server

Stable super-resolution limit and smallest singular value of restricted Fourier matrices

Author: Li Weilin
Liao Wenjing
Publication venue
Publication date: 16/10/2018

Super-resolution refers to the process of recovering the locations and amplitudes of a collection of point sources, represented as a discrete measure, given $M+1$ of its noisy low-frequency Fourier coefficients. The recovery process is highly sensitive to noise whenever the distance $\Delta$ between the two closest point sources is less than $1/M$. This paper studies the fundamental difficulty of super-resolution and the performance guarantees of a subspace method called MUSIC in the regime that $\Delta < 1/M$. The most important quantity in our theory is the minimum singular value of the Vandermonde matrix whose nodes are specified by the source locations. Under the assumption that the nodes are closely spaced within several well-separated clumps, we derive a sharp and non-asymptotic lower bound for this quantity. Our estimate is given as a weighted $\ell^2$ sum, where each term only depends on the configuration of each individual clump. This implies that, as the noise increases, the super-resolution capability of MUSIC degrades according to a power law where the exponent depends on the cardinality of the largest clump. Numerical experiments validate our theoretical bounds for the minimum singular value and the resolution limit of MUSIC. When there are $S$ point sources located on a grid with spacing $1/N$, the fundamental difficulty of super-resolution can be quantitatively characterized by a min-max error, which is the reconstruction error incurred by the best possible algorithm in the worst-case scenario. We show that the min-max error is closely related to the minimum singular value of Vandermonde matrices, and we provide a non-asymptotic and sharp estimate for the min-max error, where the dominant term is $(N/M)^{2S-1}$.

Comment: 47 pages, 8 figures
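To see numerically how the minimum singular value behaves once $\Delta < 1/M$, one can build the $(M+1) \times S$ Vandermonde matrix directly and take its SVD. The sketch below assumes nodes in $[0, 1)$, entries $e^{-2\pi i k x_j}$ for $k = 0, \dots, M$, and no normalization; the paper's own conventions (sign, scaling by $1/\sqrt{M+1}$) may differ, and the helper names `vandermonde` and `sigma_min` are hypothetical. It compares one clump of three equally spaced nodes at several separations with a well-separated configuration.

```python
import numpy as np


def vandermonde(nodes, M):
    """(M+1) x S matrix with entries exp(-2*pi*i*k*x_j), k = 0..M (assumed convention)."""
    k = np.arange(M + 1)[:, None]                # column vector of frequencies 0..M
    x = np.asarray(nodes, dtype=float)[None, :]  # row vector of node locations
    return np.exp(-2j * np.pi * k * x)


def sigma_min(nodes, M):
    """Smallest singular value; numpy returns singular values in descending order."""
    return np.linalg.svd(vandermonde(nodes, M), compute_uv=False)[-1]


M = 50
print(f"well separated: sigma_min = {sigma_min([0.1, 0.4, 0.7], M):.3e}")
for factor in (1.0, 0.5, 0.25):
    delta = factor / M
    # One clump of three nodes with spacing delta = factor / M.
    nodes = [0.3, 0.3 + delta, 0.3 + 2 * delta]
    print(f"M*Delta = {factor:.2f}: sigma_min = {sigma_min(nodes, M):.3e}")
```

For the well-separated nodes the columns are nearly orthogonal and $\sigma_{\min}$ stays close to $\sqrt{M+1}$; shrinking the clump spacing below $1/M$ drives $\sigma_{\min}$ down quickly, which is the qualitative behaviour the abstract describes. The sketch makes no attempt to reproduce the paper's exact exponents or bounds.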

arXiv.org e-Print Archive