3 research outputs found
Audio source separation using hierarchical phase-invariant models
2009 ISCA Tutorial and Research Workshop on Non-linear Speech Processing (NOLISP)
Audio source separation consists of analyzing a given audio recording so as to estimate the signal produced by each sound source, for listening or information retrieval purposes. In the last five years, algorithms based on hierarchical phase-invariant models, such as single-channel or multichannel hidden Markov models (HMMs) and nonnegative matrix factorization (NMF), have become popular. In this paper, we provide an overview of these models and discuss their advantages over established algorithms, such as non-Gaussianity-based frequency-domain independent component analysis (FDICA) and sparse component analysis (SCA), for the separation of complex mixtures involving many sources or reverberation. We argue that hierarchical phase-invariant modeling could form the basis of future modular source separation systems.
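The NMF approach mentioned in this abstract factorizes a nonnegative magnitude spectrogram into spectral templates and temporal activations. The following is a minimal sketch of that idea using the classic multiplicative updates for the Euclidean cost (Lee and Seung's rule, not the specific hierarchical models of the paper); the toy data and all variable names are illustrative assumptions:

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize a nonnegative matrix V (freq x time) as W @ H using
    multiplicative updates for the Euclidean cost. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps   # spectral templates (freq x rank)
    H = rng.random((rank, T)) + eps   # temporal activations (rank x time)
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy "spectrogram": two synthetic sources with distinct spectral templates.
rng = np.random.default_rng(1)
true_W = np.abs(rng.normal(size=(64, 2)))
true_H = np.abs(rng.normal(size=(2, 100)))
V = true_W @ true_H
W, H = nmf(V, rank=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Each column of `W` can be read as one source's characteristic spectrum, which is why this factorization lends itself to source separation when sources have stable spectral shapes.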
Probabilistic Modeling Paradigms for Audio Source Separation
This is the author's final version of the article, first published as E. Vincent, M. G. Jafari, S. A. Abdallah, M. D. Plumbley, M. E. Davies. Probabilistic Modeling Paradigms for Audio Source Separation. In W. Wang (Ed.), Machine Audition: Principles, Algorithms and Systems. Chapter 7, pp. 162-185. IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch007
Most sound scenes result from the superposition of several sources, which can be separately perceived and analyzed by human listeners. Source separation aims to provide machine listeners with similar skills by extracting the sounds of individual sources from a given scene. Existing separation systems operate either by emulating the human auditory system or by inferring the parameters of probabilistic sound models. In this chapter, the authors focus on the latter approach and provide a joint overview of established and recent models, including independent component analysis, local time-frequency models and spectral template-based models. They show that most models are instances of one of two general paradigms: linear modeling or variance modeling. They compare the merits of each paradigm and report objective performance figures. They conclude by discussing promising combinations of probabilistic priors and inference algorithms that could form the basis of future state-of-the-art systems.
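The linear modeling paradigm covered in this chapter includes independent component analysis, which assumes sources are mixed linearly and recovered by exploiting their non-Gaussianity. The sketch below implements a kurtosis-based FastICA on a toy instantaneous two-source mixture; the sources, mixing matrix, and all parameters are illustrative assumptions, not taken from the chapter:

```python
import numpy as np

def whiten(X):
    """Center and whiten observations X (channels x samples)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    d, E = np.linalg.eigh(cov)
    return (E @ np.diag(d ** -0.5) @ E.T) @ Xc

def fastica(X, n_iter=200):
    """Deflation FastICA with the cubic (kurtosis) nonlinearity. Sketch only."""
    Z = whiten(X)
    n = Z.shape[0]
    W = np.zeros((n, n))
    rng = np.random.default_rng(0)
    for i in range(n):
        w = rng.normal(size=n)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            # Fixed-point update: E[z (w.z)^3] - 3w for whitened data.
            w_new = (Z * (w @ Z) ** 3).mean(axis=1) - 3 * w
            w_new -= W[:i].T @ (W[:i] @ w_new)  # deflate against found rows
            w_new /= np.linalg.norm(w_new)
            done = abs(abs(w_new @ w) - 1) < 1e-8
            w = w_new
            if done:
                break
        W[i] = w
    return W @ Z

# Toy mixture: sine + sawtooth at different frequencies, mixed instantaneously.
t = np.linspace(0, 1, 2000, endpoint=False)
s = np.vstack([np.sin(10 * np.pi * t), 2 * (4 * t % 1) - 1])
A = np.array([[1.0, 0.6], [0.4, 1.0]])
y = fastica(A @ s)
corr = np.abs(np.corrcoef(np.vstack([s, y]))[:2, 2:])  # |corr| true vs recovered
```

ICA of this kind works only up to permutation and scaling of the sources, which is why the check below matches each true source to its best-correlated output rather than to a fixed row.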
Prediction in polyphony: modelling musical auditory scene analysis
PhD
How do we know that a melody is a melody? In other words, how does the human brain extract
melody from a polyphonic musical context? This thesis begins with a theoretical presentation
of musical auditory scene analysis (ASA) in the context of predictive coding and rule-based
approaches and takes methodological and analytical steps to evaluate selected components of
a proposed integrated framework for musical ASA, unified by prediction. Predictive coding
has been proposed as a grand unifying model of perception, action and cognition and is based
on the idea that brains process error to refine models of the world. Existing models of ASA
tackle distinct subsets of ASA and are currently unable to integrate all the acoustic and
extensive contextual information needed to parse auditory scenes. This thesis proposes a
framework capable of integrating all relevant information contributing to the understanding of
musical auditory scenes, including auditory features, musical features, attention, expectation
and listening experience, and examines a subset of ASA issues – timbre perception in relation
to musical training, modelling temporal expectancies, the relative salience of musical
parameters and melody extraction – using probabilistic approaches. Using behavioural
methods, attention is shown to influence streaming perception based on timbre more than
instrumental experience. Using probabilistic methods, information content (IC) for temporal
aspects of music as generated by IDyOM (information dynamics of music; Pearce, 2005), are
validated and, along with IC for pitch and harmonic aspects of the music, are subsequently
linked to perceived complexity but not to salience. Furthermore, based on the hypotheses that
a melody is internally coherent and the most complex voice in a piece of polyphonic music,
IDyOM has been extended to extract melody from symbolic representations of chorales by J.S.
Bach and a selection of string quartets by W.A. Mozart.
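The information content (IC) that IDyOM assigns to a musical event is the negative log probability of that event given its context, so predictable continuations score low and surprising ones high. The sketch below computes this quantity with a simple add-alpha-smoothed bigram model over MIDI pitches; IDyOM itself uses variable-order Markov models (PPM), so the model class, melody, and vocabulary size here are illustrative assumptions:

```python
import math
from collections import defaultdict

def train_bigram(sequence):
    """Count bigram transitions over a symbol sequence."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in zip(sequence, sequence[1:]):
        counts[a][b] += 1
    return counts

def information_content(counts, context, event, alpha=1.0, vocab_size=128):
    """IC(event | context) = -log2 P(event | context), add-alpha smoothed.
    Bigram stand-in for IDyOM's variable-order models; sketch only."""
    total = sum(counts[context].values()) + alpha * vocab_size
    p = (counts[context][event] + alpha) / total
    return -math.log2(p)

# Toy melody as MIDI pitches: a familiar step pattern vs. a surprising leap.
melody = [60, 62, 64, 62, 60, 62, 64, 62, 60]
model = train_bigram(melody)
ic_expected = information_content(model, 62, 64)  # seen transition: lower IC
ic_surprise = information_content(model, 62, 90)  # unseen transition: higher IC
```

Under the thesis's hypothesis that the melody is the most complex voice, a voice whose events carry higher average IC than the accompanying voices would be selected as the melody.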