Audio Source Separation Using Sparse Representations
This is the author's final version of the article, first published as A. Nesbit, M. G. Jafari, E. Vincent and M. D. Plumbley. Audio Source Separation Using Sparse Representations. In W. Wang (Ed.), Machine Audition: Principles, Algorithms and Systems, Chapter 10, pp. 246-264. IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch010
The authors address the problem of audio source separation, namely, the recovery of audio signals from recordings of mixtures of those signals. The sparse component analysis framework is a powerful method for achieving this. Sparse orthogonal transforms, in which only a few transform coefficients differ significantly from zero, are developed; once the signal has been transformed, energy is apportioned from each transform coefficient to each estimated source, and, finally, the signal is reconstructed using the inverse transform. The overriding aim of this chapter is to demonstrate how this framework, as exemplified here by two different decomposition methods which adapt to the signal to represent it sparsely, can be used to solve different problems in different mixing scenarios. To address the instantaneous (neither delays nor echoes) and underdetermined (more sources than mixtures) mixing model, a lapped orthogonal transform is adapted to the signal by selecting a basis from a library of predetermined bases. This method is closely related to the windowing methods used in the MPEG audio coding framework. In considering the anechoic (delays but no echoes) and determined (equal number of sources and mixtures) mixing case, a greedy adaptive transform is used based on orthogonal basis functions that are learned from the observed data, instead of being selected from a predetermined library of bases.
This learned transform is found to encode the signal characteristics by introducing a feedback system between the bases and the observed data. Experiments on mixtures of speech and music signals demonstrate that these methods give good signal approximations and separation performance, and indicate promising directions for future research.
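The three steps named above (transform, apportion each coefficient to a source, inverse transform) can be sketched for the instantaneous case. This is a minimal illustration under stated assumptions, not the chapter's method: a fixed orthonormal DCT stands in for the adapted lapped orthogonal transform, the mixing directions are assumed known rather than estimated, and energy is apportioned by a binary mask (each coefficient goes wholly to the source whose mixing direction it matches best). The helper names `dct_matrix` and `separate_by_masking` are hypothetical.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix; rows are basis functions, so C @ C.T == I."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def separate_by_masking(x, mixing):
    """Estimate sources from an instantaneous mixture x = mixing @ s.

    x: (n_mix, T) mixture channels.
    mixing: (n_mix, n_src) with unit-norm columns, assumed known in this sketch.
    """
    n_src = mixing.shape[1]
    C = dct_matrix(x.shape[1])
    X = x @ C.T                       # transform every mixture channel
    scores = np.abs(mixing.T @ X)     # match of each coefficient to each direction
    owner = np.argmax(scores, axis=0) # binary mask: one source per coefficient
    S = np.zeros((n_src, x.shape[1]))
    for j in range(n_src):
        mask = owner == j
        # Project the assigned coefficients onto source j's mixing direction.
        S[j, mask] = mixing[:, j] @ X[:, mask]
    return S @ C                      # inverse transform (C is orthonormal)
```

With three sources and only two mixtures, the separation is exact when the sources occupy disjoint transform coefficients, which is why a transform that represents each source sparsely matters:

```python
C = dct_matrix(32)
coef = np.zeros((3, 32))
coef[0, [1, 2]] = [1.0, -0.5]
coef[1, [5, 6]] = [0.8, 0.3]
coef[2, 10] = 1.2
s = coef @ C
theta = np.array([0.0, np.pi / 3, 2 * np.pi / 3])
A = np.vstack([np.cos(theta), np.sin(theta)])
estimate = separate_by_masking(A @ s, A)  # matches s up to rounding
```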
Superposition frames for adaptive time-frequency analysis and fast reconstruction
In this article we introduce a broad family of adaptive, linear
time-frequency representations termed superposition frames, and show that they
admit desirable fast overlap-add reconstruction properties akin to standard
short-time Fourier techniques. This approach stands in contrast to many
adaptive time-frequency representations in the extant literature, which, while
more flexible than standard fixed-resolution approaches, typically fail to
provide efficient reconstruction and often lack the regular structure necessary
for precise frame-theoretic analysis. Our main technical contributions come
through the development of properties which ensure that this construction
provides for a numerically stable, invertible signal representation. Our
primary algorithmic contributions come via the introduction and discussion of
specific signal adaptation criteria in deterministic and stochastic settings,
based respectively on time-frequency concentration and nonstationarity
detection. We conclude with a short speech enhancement example that serves to
highlight potential applications of our approach.
Comment: 16 pages, 6 figures; revised version
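The fast overlap-add reconstruction that the abstract uses as its reference point is the standard short-time Fourier property. The sketch below shows only that fixed-resolution baseline, not the superposition-frame construction itself. It assumes a square-root periodic Hann window used for both analysis and synthesis with 50% overlap: the squared window sums to one across overlapping frames, so overlap-add recovers the signal wherever two frames cover it. The function names and the frame length of 64 are illustrative choices.

```python
import numpy as np


def sqrt_hann(n):
    """Square root of a periodic Hann window.

    The periodic Hann window satisfies w[t] + w[t + n/2] == 1, so using its
    square root at both analysis and synthesis gives unit overlap-add gain.
    """
    t = np.arange(n)
    return np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * t / n))


def stft(x, n=64):
    hop = n // 2
    w = sqrt_hann(n)
    starts = range(0, len(x) - n + 1, hop)
    return np.array([np.fft.rfft(w * x[s:s + n]) for s in starts])


def istft(frames, length, n=64):
    hop = n // 2
    w = sqrt_hann(n)
    y = np.zeros(length)
    for i, spectrum in enumerate(frames):
        # Inverse FFT, re-window, and add into place (overlap-add).
        y[i * hop:i * hop + n] += w * np.fft.irfft(spectrum, n)
    return y
```

For a 640-sample signal this gives 19 frames of 33 bins, and `istft(stft(x), 640)` equals `x` on samples 32 to 607; the first and last half-frames are covered by only one window and would need padding in practice.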
State of the art in 2D content representation and compression
Deliverable D1.3 of the ANR PERSEE project. This report was produced within the framework of the ANR PERSEE project (no. ANR-09-BLAN-0170). Specifically, it corresponds to deliverable D3.1 of the project.
Sparse Approximation and Dictionary Learning with Applications to Audio Signals
PhD thesis. Over-complete transforms have recently become the focus of a wealth of research in
signal processing, machine learning, statistics and related fields. Their great modelling
flexibility makes it possible to find sparse representations and approximations of data that in turn
prove to be very efficient in a wide range of applications. Sparse models express signals as
linear combinations of a few basis functions called atoms taken from a so-called dictionary.
Finding the optimal dictionary from a set of training signals of a given class is the objective
of dictionary learning and the main focus of this thesis. The experimental evidence
presented here focuses on the processing of audio signals, and the role of sparse algorithms
in audio applications is accordingly highlighted.
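Sparse approximation over a dictionary, as described above, can be illustrated with orthogonal matching pursuit (OMP), a standard greedy algorithm. The thesis does not necessarily use OMP, so treat this as an illustrative choice. The dictionary here is an assumed example: the union of the identity (spikes) and an orthonormal DCT (oscillations), giving 2N unit-norm atoms in dimension N. Its mutual coherence is at most sqrt(2/N), and by Tropp's coherence condition OMP recovers any k-sparse representation when k < (1 + 1/mu)/2. For N = 64 that bound permits k = 3, which the usage example relies on.

```python
import numpy as np


def dct_matrix(n):
    """Orthonormal DCT-II matrix; its rows are unit-norm cosine atoms."""
    k = np.arange(n)[:, None]
    t = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def omp(D, y, k):
    """Orthogonal matching pursuit: pick k atoms (columns of D) greedily."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Atom most correlated with the current residual (atoms are unit norm).
        j = int(np.argmax(np.abs(D.T @ residual)))
        support.append(j)
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x
```

A signal built from one spike and two cosines is recovered exactly from the 128-atom over-complete dictionary:

```python
N = 64
D = np.hstack([np.eye(N), dct_matrix(N).T])
x_true = np.zeros(2 * N)
x_true[[3, 70, 100]] = [1.0, -2.0, 0.5]
x_hat = omp(D, D @ x_true, 3)
```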
The first main contribution of this thesis is the development of a pitch-synchronous
transform, in which the frame-by-frame analysis of audio data is adapted so that each frame
analysing a periodic signal contains an integer number of periods. This algorithm provides
a technique for adapting transform parameters to the audio signal to be analysed; it
is shown to improve the sparsity of the representation compared with a non-pitch-synchronous
approach, and it is further evaluated in the context of source separation by binary
masking.
A second main contribution is the development of a novel model and associated algorithm
for dictionary learning of convolved signals, in which the observed variables are sparsely approximated
by the atoms contained in a convolved dictionary. An algorithm is devised to
learn the impulse response applied to the dictionary and experimental results on synthetic
data show the superior approximation performance of the proposed method compared to
a state-of-the-art dictionary learning algorithm.
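The convolved-dictionary model can be written as y = h * (D x), where h is an unknown impulse response applied to the dictionary output. The sketch below shows one plausible sub-step under that assumption: with the dictionary D and sparse code x held fixed, y is linear in h, so h follows from least squares on a convolution matrix. This is a hypothetical simplification for illustration, not the algorithm devised in the thesis. The helpers `convolution_matrix` and `estimate_impulse_response` are invented names.

```python
import numpy as np


def convolution_matrix(s, L):
    """Matrix M such that M @ h == np.convolve(s, h) for any h of length L."""
    M = np.zeros((len(s) + L - 1, L))
    for j in range(L):
        M[j:j + len(s), j] = s   # column j is s delayed by j samples
    return M


def estimate_impulse_response(y, D, x, L):
    """Least-squares estimate of h in y = h * (D @ x), with D and x fixed."""
    s = D @ x                    # sparse approximation before filtering
    h, *_ = np.linalg.lstsq(convolution_matrix(s, L), y, rcond=None)
    return h
```

A full learning loop would presumably alternate this step with sparse coding and dictionary updates. The sketch covers only the impulse-response estimate, which is exact for noiseless synthetic data.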
Finally, a third main contribution is the development of methods for learning dictionaries
that are both well adapted to a training set of data and mutually incoherent. Two
novel algorithms, namely the incoherent K-SVD and the iterative projections and rotations
(IPR) algorithm, are introduced and compared with different techniques published in the
literature in a sparse approximation context. The IPR algorithm in particular is shown
to outperform the benchmark techniques in learning very incoherent dictionaries while
maintaining a good signal-to-noise ratio of the representation.
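"Incoherent" is usually measured by the mutual coherence mu(D), the largest absolute inner product between two distinct normalized atoms. For K unit-norm atoms in dimension N (K > N), the Welch bound sqrt((K - N) / (N (K - 1))) is the smallest coherence any dictionary can reach, so it is the natural target when learning very incoherent dictionaries. The sketch below only computes these two quantities. It does not implement incoherent K-SVD or IPR.

```python
import numpy as np


def mutual_coherence(D):
    """Largest |<d_i, d_j>| over distinct normalized columns of D."""
    Dn = D / np.linalg.norm(D, axis=0)
    G = np.abs(Dn.T @ Dn)
    np.fill_diagonal(G, 0.0)   # ignore each atom's inner product with itself
    return float(G.max())


def welch_bound(n, k):
    """Lower bound on the coherence of any k unit-norm atoms in R^n, k > n."""
    return float(np.sqrt((k - n) / (n * (k - 1))))
```

As an example, the union of the identity and a normalized 8 x 8 Hadamard basis has coherence exactly 1/sqrt(8), about 0.354. The Welch bound for 16 atoms in dimension 8 is sqrt(1/15), about 0.258. A learning algorithm that trades approximation quality for incoherence operates in this gap.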
A Panorama on Multiscale Geometric Representations, Intertwining Spatial, Directional and Frequency Selectivity
The richness of natural images makes the quest for optimal representations in
image processing and computer vision challenging. The latter observation has
not prevented the design of image representations, which trade off between
efficiency and complexity, while achieving accurate rendering of smooth regions
as well as reproducing faithful contours and textures. The most recent ones,
proposed in the past decade, share a hybrid heritage highlighting the
multiscale and oriented nature of edges and patterns in images. This paper
presents a panorama of the aforementioned literature on decompositions in
multiscale, multi-orientation bases or dictionaries. They typically exhibit
redundancy to improve sparsity in the transformed domain and sometimes to gain
invariance with respect to simple geometric deformations (translation,
rotation). Oriented multiscale dictionaries extend traditional wavelet
processing and may offer rotation invariance. Highly redundant dictionaries
require specific algorithms to simplify the search for an efficient (sparse)
representation. We also discuss the extension of multiscale geometric
decompositions to non-Euclidean domains such as the sphere or arbitrary meshed
surfaces. The etymology of panorama suggests an overview, based on a choice of
partially overlapping "pictures". We hope that this paper will contribute to
the appreciation and apprehension of a stream of current research directions in
image understanding.
Comment: 65 pages, 33 figures, 303 references
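The surveyed representations extend traditional wavelet processing. A one-level separable 2D Haar transform shows the baseline they improve on. It is orthonormal and multiscale, but its detail bands respond to only three orientations (vertical edges, horizontal edges, and a mixed diagonal band). This limitation is what motivates oriented multiscale dictionaries. The function names and band labels are illustrative. Band naming conventions (LH/HL) vary across the literature, so descriptive names are used instead.

```python
import numpy as np


def haar2d(img):
    """One level of the orthonormal separable 2D Haar transform (even-sized input)."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    approx = (a + b + c + d) / 2
    vert = (a - b + c - d) / 2    # differences across columns: vertical edges
    horiz = (a + b - c - d) / 2   # differences across rows: horizontal edges
    diag = (a - b - c + d) / 2    # checkerboard: diagonal detail
    return approx, vert, horiz, diag


def ihaar2d(approx, vert, horiz, diag):
    """Exact inverse of haar2d."""
    out = np.empty((2 * approx.shape[0], 2 * approx.shape[1]))
    out[0::2, 0::2] = (approx + vert + horiz + diag) / 2
    out[0::2, 1::2] = (approx - vert + horiz - diag) / 2
    out[1::2, 0::2] = (approx + vert - horiz - diag) / 2
    out[1::2, 1::2] = (approx - vert - horiz + diag) / 2
    return out
```

An image containing a single vertical edge excites only the `vert` band. An edge at any other angle spreads its energy over several bands, which is the lack of directional selectivity that curvelets, contourlets and related dictionaries address.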
Fast dictionary-based compression for inverted indexes
Dictionary-based compression schemes provide fast decoding, typically at the expense of reduced compression effectiveness compared with statistical or probability-based approaches. In this work, we apply dictionary-based techniques to the compression of inverted lists, showing that the high degree of regularity that these integer sequences exhibit is a good match for certain types of dictionary methods, and that an important new trade-off between compression effectiveness and compression efficiency can be achieved. Our observations are supported by experiments using the document-level inverted index data for two large text collections, and a wide range of other index compression implementations as reference points. Those experiments demonstrate that the gap between efficiency and effectiveness can be substantially narrowed.
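The general idea can be sketched with a deliberately simple, hypothetical scheme; it is not the method evaluated in the paper. Posting lists are first converted to d-gaps, which are small and highly repetitive. The most frequent fixed-width gap tuples then form a dictionary, each covered tuple is replaced by a single code, and uncovered gaps are emitted as escaped literals. The width of 4, dictionary size of 256, the `ESCAPE` marker and all function names are assumptions. A real implementation would also pack the codes into bytes or words, which is omitted here.

```python
from collections import Counter

ESCAPE = -1  # hypothetical marker: the next value is a literal gap


def to_gaps(doc_ids):
    """Convert a sorted posting list into d-gaps (first value kept as-is)."""
    return [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]


def build_dictionary(gap_lists, width=4, size=256):
    """Most frequent non-overlapping gap tuples of a fixed width."""
    counts = Counter()
    for gaps in gap_lists:
        for i in range(0, len(gaps) - width + 1, width):
            counts[tuple(gaps[i:i + width])] += 1
    return [t for t, _ in counts.most_common(size)]


def encode(gaps, dictionary, width=4):
    index = {t: i for i, t in enumerate(dictionary)}
    out, i = [], 0
    while i < len(gaps):
        code = index.get(tuple(gaps[i:i + width])) if i + width <= len(gaps) else None
        if code is not None:
            out.append(code)               # one code covers `width` gaps
            i += width
        else:
            out.extend([ESCAPE, gaps[i]])  # fall back to one literal gap
            i += 1
    return out


def decode(codes, dictionary):
    gaps, i = [], 0
    while i < len(codes):
        if codes[i] == ESCAPE:
            gaps.append(codes[i + 1])
            i += 2
        else:
            gaps.extend(dictionary[codes[i]])
            i += 1
    doc_ids, total = [], 0
    for g in gaps:                         # prefix sums undo the d-gap transform
        total += g
        doc_ids.append(total)
    return doc_ids
```

Decoding is a table lookup followed by a running sum, which is why dictionary methods decode quickly. How well they compress depends on how regular the gap sequences are.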
Wavelet methods in speech recognition
In this thesis, novel wavelet techniques are developed to improve parametrization of
speech signals prior to classification. It is shown that non-linear operations carried out
in the wavelet domain improve the performance of a speech classifier and consistently
outperform classical Fourier methods. This is because of the localised nature of the
wavelet, which captures correspondingly well-localised time-frequency features
within the speech signal. Furthermore, by taking advantage of the approximation
ability of wavelets, efficient representation of the non-stationarity inherent in speech
can be achieved in a relatively small number of expansion coefficients. This is an
attractive option when faced with the so-called 'Curse of Dimensionality' problem of
multivariate classifiers such as Linear Discriminant Analysis (LDA) or Artificial
Neural Networks (ANNs). Conventional time-frequency analysis methods such as the
Discrete Fourier Transform either miss irregular signal structures and transients due to
spectral smearing or require a large number of coefficients to represent such
characteristics efficiently. Wavelet theory offers an alternative insight into the
representation of these types of signals.
As an extension to the standard wavelet transform, adaptive libraries of wavelet and
cosine packets are introduced which increase the flexibility of the transform. This
approach is observed to be yet more suitable for the highly variable nature of speech
signals in that it results in a time-frequency sampled grid that is well adapted to
irregularities and transients. These libraries yield a corresponding reduction in the
misclassification rate of the recognition system, although necessarily at the
expense of added computing time.
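Choosing a basis from an adaptive wavelet-packet library is classically done with the Coifman-Wickerhauser best-basis search. A parent node is split into its two children whenever the children's combined cost is lower, evaluated bottom-up. The sketch assumes Haar filters, a signal length that is a power of two, and an l1-norm cost. The thesis may use other filters, cosine packets, or a different criterion. All names are illustrative.

```python
import numpy as np


def haar_split(x):
    """One orthonormal Haar analysis step: (lowpass, highpass) halves."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)


def cost(c):
    """Additive sparsity cost (l1 norm); an assumed choice of criterion."""
    return float(np.sum(np.abs(c)))


def best_basis(x, depth):
    """Bottom-up best-basis search over a Haar wavelet-packet tree.

    Returns (cost, list of coefficient blocks) for the cheapest basis
    found below this node; the blocks together form an orthonormal basis.
    """
    if depth == 0 or len(x) < 2:
        return cost(x), [x]
    lo, hi = haar_split(x)
    c_lo, b_lo = best_basis(lo, depth - 1)
    c_hi, b_hi = best_basis(hi, depth - 1)
    if c_lo + c_hi < cost(x):
        return c_lo + c_hi, b_lo + b_hi    # splitting is cheaper: keep children
    return cost(x), [x]                    # otherwise keep this node whole
```

For a constant 16-sample signal of ones, the search descends the lowpass branch to a single coefficient of 4, giving cost 4 instead of 16. Each split requires another filtering pass over the tree, which reflects the added computing time noted above.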
Finally, a framework based on adaptive time-frequency libraries is developed which
invokes the final classifier to choose the nature of the resolution for a given
classification problem. The classifier then performs dimensionality reduction on the
transformed signal by choosing the top few features based on their discriminant power. This approach is compared and contrasted to an existing discriminant wavelet
feature extractor.
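"Choosing the top few features based on their discriminant power" can be illustrated with the Fisher ratio, which divides between-class scatter by within-class scatter, computed per feature. This is a common measure, but the thesis's exact discriminant criterion is not stated here, so treat it as an assumption. The function names are hypothetical.

```python
import numpy as np


def fisher_scores(features, labels):
    """Per-feature Fisher ratio: between-class over within-class scatter.

    features: (n_samples, n_features); labels: (n_samples,) class ids.
    """
    mean_all = features.mean(axis=0)
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in np.unique(labels):
        fc = features[labels == c]
        between += len(fc) * (fc.mean(axis=0) - mean_all) ** 2
        within += ((fc - fc.mean(axis=0)) ** 2).sum(axis=0)
    return between / np.maximum(within, 1e-12)   # guard against zero spread


def select_top_features(features, labels, k):
    """Indices of the k most discriminant features, best first."""
    return np.argsort(fisher_scores(features, labels))[::-1][:k]
```

In a wavelet pipeline, each column of `features` would be one transform coefficient, and the selected indices define the reduced feature vector passed to the classifier.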
The overall conclusions of the thesis are that wavelets and their relatives are capable
of extracting useful features for speech classification problems. The use of adaptive
wavelet transforms provides the flexibility within which powerful feature extractors
can be designed for these types of application.
Coded Aperture Hyperspectral Image Reconstruction
This article belongs to the Special Issue Computational Spectral Imaging. In this work, we study and analyze the reconstruction of hyperspectral images that are sampled with a coded aperture snapshot spectral imager (CASSI). The sensing procedure was modeled using compressive sensing (CS) theory, which enabled efficient mechanisms for reconstructing the hyperspectral images from their compressive measurements. In particular, we considered and compared four different types of estimation algorithms: OMP, GPSR, LASSO, and IST. Furthermore, the large dimensions of hyperspectral images required the implementation of a practical block CASSI model to reconstruct the images with an acceptable delay and affordable computational cost. To account for the particularities of the block model and the dispersive effects in the CASSI-like sensing procedure, the problem was reformulated, as was the construction of the variables involved. For this practical CASSI setup, we evaluated the performance of the overall system by considering the aforementioned algorithms and the different factors that affected the reconstruction procedure. Finally, the obtained results were analyzed and discussed from a practical perspective. This work was funded by the Xunta de Galicia (by Grant ED431C 2020/15 and Grant ED431G 2019/01 to support the Centro de Investigación de Galicia “CITIC”), the Agencia Estatal de Investigación of Spain (by Grants RED2018-102668-T and PID2019-104958RB-C42), and the ERDF funds of the EU (FEDER Galicia 2014-2020 and AEI/FEDER Programs, UE).
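Of the four estimators compared, IST is the simplest to state. It minimizes the LASSO objective 0.5 ||y - A x||^2 + lam ||x||_1 by alternating a gradient step on the data term with soft thresholding. The sketch below is the textbook iterative shrinkage-thresholding algorithm (ISTA) with step size 1/L, where L is the largest eigenvalue of A^T A. It uses a generic sensing matrix `A` as a stand-in and does not model the paper's block CASSI operator or dispersion. The regularization weight and iteration count are arbitrary assumptions.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise shrinkage toward zero by t (proximal map of t*||.||_1)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ista(A, y, lam, n_iter=200):
    """Iterative shrinkage-thresholding for min_x 0.5*||y - A x||^2 + lam*||x||_1."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L; the 2-norm is the top singular value
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        # Gradient step on the quadratic term, then shrink.
        x = soft_threshold(x + step * A.T @ (y - A @ x), step * lam)
    return x
```

With `A` equal to the identity, the iteration reaches its fixed point `soft_threshold(y, lam)` after one step. With step 1/L, the objective never increases from one iteration to the next, so more iterations trade computation time for accuracy. This is the delay and cost trade-off that motivated the block model.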
- …