1,505 research outputs found
Approximate Message Passing for Underdetermined Audio Source Separation
Approximate message passing (AMP) algorithms have shown great promise in
sparse signal reconstruction due to their low computational requirements and
fast convergence to an exact solution. Moreover, they provide a probabilistic
framework that is often more intuitive than alternatives such as convex
optimisation. In this paper, AMP is used for audio source separation from
underdetermined instantaneous mixtures. In the time-frequency domain, it is
typical to assume a priori that the sources are sparse, so we solve the
corresponding sparse linear inverse problem using AMP. We present a block-based
approach that uses AMP to process multiple time-frequency points
simultaneously. Two algorithms known as AMP and vector AMP (VAMP) are evaluated
in particular. Results show that they are promising in terms of artefact
suppression.Comment: Paper accepted for 3rd International Conference on Intelligent Signal
Processing (ISP 2017
Free-energy and the brain
If one formulates Helmholtz's ideas about perception in terms of modern-day theories one arrives at a model of perceptual inference and learning that can explain a remarkable range of neurobiological facts. Using constructs from statistical physics it can be shown that the problems of inferring what cause our sensory input and learning causal regularities in the sensorium can be resolved using exactly the same principles. Furthermore, inference and learning can proceed in a biologically plausible fashion. The ensuing scheme rests on Empirical Bayes and hierarchical models of how sensory information is generated. The use of hierarchical models enables the brain to construct prior expectations in a dynamic and context-sensitive fashion. This scheme provides a principled way to understand many aspects of the brain's organisation and responses.In this paper, we suggest that these perceptual processes are just one emergent property of systems that conform to a free-energy principle. The free-energy considered here represents a bound on the surprise inherent in any exchange with the environment, under expectations encoded by its state or configuration. A system can minimise free-energy by changing its configuration to change the way it samples the environment, or to change its expectations. These changes correspond to action and perception respectively and lead to an adaptive exchange with the environment that is characteristic of biological systems. This treatment implies that the system's state and structure encode an implicit and probabilistic model of the environment. We will look at models entailed by the brain and how minimisation of free-energy can explain its dynamics and structure
Towards music perception by redundancy reduction and unsupervised learning in probabilistic models
PhDThe study of music perception lies at the intersection of several disciplines: perceptual
psychology and cognitive science, musicology, psychoacoustics, and acoustical
signal processing amongst others. Developments in perceptual theory over the last
fifty years have emphasised an approach based on Shannon’s information theory and
its basis in probabilistic systems, and in particular, the idea that perceptual systems
in animals develop through a process of unsupervised learning in response to natural
sensory stimulation, whereby the emerging computational structures are well adapted
to the statistical structure of natural scenes. In turn, these ideas are being applied to
problems in music perception.
This thesis is an investigation of the principle of redundancy reduction through
unsupervised learning, as applied to representations of sound and music.
In the first part, previous work is reviewed, drawing on literature from some of the
fields mentioned above, and an argument presented in support of the idea that perception
in general and music perception in particular can indeed be accommodated within
a framework of unsupervised learning in probabilistic models.
In the second part, two related methods are applied to two different low-level representations.
Firstly, linear redundancy reduction (Independent Component Analysis)
is applied to acoustic waveforms of speech and music. Secondly, the related method of
sparse coding is applied to a spectral representation of polyphonic music, which proves
to be enough both to recognise that the individual notes are the important structural elements,
and to recover a rough transcription of the music.
Finally, the concepts of distance and similarity are considered, drawing in ideas
about noise, phase invariance, and topological maps. Some ecologically and information
theoretically motivated distance measures are suggested, and put in to practice in
a novel method, using multidimensional scaling (MDS), for visualising geometrically
the dependency structure in a distributed representation.Engineering and Physical Science Research Counci
NamedMask: Distilling Segmenters from Complementary Foundation Models
The goal of this work is to segment and name regions of images without access
to pixel-level labels during training. To tackle this task, we construct
segmenters by distilling the complementary strengths of two foundation models.
The first, CLIP (Radford et al. 2021), exhibits the ability to assign names to
image content but lacks an accessible representation of object structure. The
second, DINO (Caron et al. 2021), captures the spatial extent of objects but
has no knowledge of object names. Our method, termed NamedMask, begins by using
CLIP to construct category-specific archives of images. These images are
pseudo-labelled with a category-agnostic salient object detector bootstrapped
from DINO, then refined by category-specific segmenters using the CLIP archive
labels. Thanks to the high quality of the refined masks, we show that a
standard segmentation architecture trained on these archives with appropriate
data augmentation achieves impressive semantic segmentation abilities for both
single-object and multi-object images. As a result, our proposed NamedMask
performs favourably against a range of prior work on five benchmarks including
the VOC2012, COCO and large-scale ImageNet-S datasets.Comment: Tech report. Code: https://github.com/NoelShin/namedmas
Statistical single channel source separation
PhD ThesisSingle channel source separation (SCSS) principally is one of the challenging fields
in signal processing and has various significant applications. Unlike conventional
SCSS methods which were based on linear instantaneous model, this research sets out
to investigate the separation of single channel in two types of mixture which is
nonlinear instantaneous mixture and linear convolutive mixture. For the nonlinear
SCSS in instantaneous mixture, this research proposes a novel solution based on a
two-stage process that consists of a Gaussianization transform which efficiently
compensates for the nonlinear distortion follow by a maximum likelihood estimator to
perform source separation. For linear SCSS in convolutive mixture, this research
proposes new methods based on nonnegative matrix factorization which decomposes a
mixture into two-dimensional convolution factor matrices that represent the spectral
basis and temporal code. The proposed factorization considers the convolutive mixing
in the decomposition by introducing frequency constrained parameters in the model.
The method aims to separate the mixture into its constituent spectral-temporal source
components while alleviating the effect of convolutive mixing. In addition, family of
Itakura-Saito divergence has been developed as a cost function which brings the
beneficial property of scale-invariant. Two new statistical techniques are proposed,
namely, Expectation-Maximisation (EM) based algorithm framework which
maximizes the log-likelihood of a mixed signals, and the maximum a posteriori
approach which maximises the joint probability of a mixed signal using multiplicative
update rules. To further improve this research work, a novel method that incorporates
adaptive sparseness into the solution has been proposed to resolve the ambiguity and
hence, improve the algorithm performance. The theoretical foundation of the proposed
solutions has been rigorously developed and discussed in details. Results have
concretely shown the effectiveness of all the proposed algorithms presented in this
thesis in separating the mixed signals in single channel and have outperformed others
available methods.Universiti Teknikal Malaysia Melaka(UTeM),
Ministry of Higher Education of Malaysi
Iconic Indexing for Video Search
Submitted for the degree of Doctor of Philosophy, Queen Mary, University of London
- …