586 research outputs found
Object-based Modeling of Audio for Coding and Source Separation
This thesis studies several data decomposition algorithms for obtaining an object-based representation of an audio signal. The estimation of the representation parameters are coupled with audio-specific criteria, such as the spectral redundancy, sparsity, perceptual relevance and spatial position of sounds. The objective is to obtain an audio signal representation that is composed of meaningful entities called audio objects that reflect the properties of real-world sound objects and events. The estimation of the object-based model is based on magnitude spectrogram redundancy using non-negative matrix factorization with extensions to multichannel and complex-valued data. The benefits of working with object-based audio representations over the conventional time-frequency bin-wise processing are studied. The two main applications of the object-based audio representations proposed in this thesis are spatial audio coding and sound source separation from multichannel microphone array recordings.
In the proposed spatial audio coding algorithm, the audio objects are estimated from the multichannel magnitude spectrogram. The audio objects are used for recovering the content of each original channel from a single downmixed signal, using time-frequency filtering. The perceptual relevance of modeling the audio signal is considered in the estimation of the parameters of the object-based model, and the sparsity of the model is utilized in encoding its parameters. Additionally, a quantization of the model parameters is proposed that reflects the perceptual relevance of each quantized element.
The proposed object-based spatial audio coding algorithm is evaluated via listening tests and comparing the overall perceptual quality to conventional time-frequency block-wise methods at the same bitrates. The proposed approach is found to produce comparable coding efficiency while providing additional functionality via the object-based coding domain representation, such as the blind separation of the mixture of sound sources in the encoded channels.
For the sound source separation from multichannel audio recorded by a microphone array, a method combining an object-based magnitude model and spatial covariance matrix estimation is considered. A direction of arrival-based model for the spatial covariance matrices of the sound sources is proposed. Unlike the conventional approaches, the estimation of the parameters of the proposed spatial covariance matrix model ensures a spatially coherent solution for the spatial parameterization of the sound sources. The separation quality is measured with objective criteria and the proposed method is shown to improve over the state-of-the-art sound source separation methods, with recordings done using a small microphone array
Recommended from our members
Non-Negative Tensor Factorization Applied to Music Genre Classification
Music genre classification techniques are typically applied to the data matrix whose columns are the feature vectors extracted from music recordings. In this paper, a feature vector is extracted using a texture window of one sec, which enables the representation of any 30 sec long music recording as a time sequence of feature vectors, thus yielding a feature matrix. Consequently, by stacking the feature matrices associated to any dataset recordings, a tensor is created, a fact which necessitates studying music genre classification using tensors. First, a novel algorithm for non-negative tensor factorization (NTF) is derived that extends the non-negative matrix factorization. Several variants of the NTF algorithm emerge by employing different cost functions from the class of Bregman divergences. Second, a novel supervised NTF classifier is proposed, which trains a basis for each class separately and employs basis orthogonalization. A variety of spectral, temporal, perceptual, energy, and pitch descriptors is extracted from 1000 recordings of the GTZAN dataset, which are distributed across 10 genre classes. The NTF classifier performance is compared against that of the multilayer perceptron and the support vector machines by applying a stratified 10-fold cross validation. A genre classification accuracy of 78.9% is reported for the NTF classifier demonstrating the superiority of the aforementioned multilinear classifier over several data matrix-based state-of-the-art classifiers
Shift & 2D Rotation Invariant Sparse Coding for Multivariate Signals
International audienceClassical dictionary learning algorithms (DLA) allow unicomponent signals to be processed. Due to our interest in two-dimensional (2D) motion signals, we wanted to mix the two components to provide rotation invariance. So, multicomponent frameworks are examined here. In contrast to the well-known multichannel framework, a multivariate framework is first introduced as a tool to easily solve our problem and to preserve the data structure. Within this multivariate framework, we then present sparse coding methods: multivariate orthogonal matching pursuit (M-OMP), which provides sparse approximation for multivariate signals, and multivariate DLA (M-DLA), which empirically learns the characteristic patterns (or features) that are associated to a multivariate signals set, and combines shift-invariance and online learning. Once the multivariate dictionary is learned, any signal of this considered set can be approximated sparsely. This multivariate framework is introduced to simply present the 2D rotation invariant (2DRI) case. By studying 2D motions that are acquired in bivariate real signals, we want the decompositions to be independent of the orientation of the movement execution in the 2D space. The methods are thus specified for the 2DRI case to be robust to any rotation: 2DRI-OMP and 2DRI-DLA. Shift and rotation invariant cases induce a compact learned dictionary and provide robust decomposition. As validation, our methods are applied to 2D handwritten data to extract the elementary features of this signals set, and to provide rotation invariant decomposition
Tensor Decompositions for Signal Processing Applications From Two-way to Multiway Component Analysis
The widespread use of multi-sensor technology and the emergence of big
datasets has highlighted the limitations of standard flat-view matrix models
and the necessity to move towards more versatile data analysis tools. We show
that higher-order tensors (i.e., multiway arrays) enable such a fundamental
paradigm shift towards models that are essentially polynomial and whose
uniqueness, unlike the matrix methods, is guaranteed under verymild and natural
conditions. Benefiting fromthe power ofmultilinear algebra as theirmathematical
backbone, data analysis techniques using tensor decompositions are shown to
have great flexibility in the choice of constraints that match data properties,
and to find more general latent components in the data than matrix-based
methods. A comprehensive introduction to tensor decompositions is provided from
a signal processing perspective, starting from the algebraic foundations, via
basic Canonical Polyadic and Tucker models, through to advanced cause-effect
and multi-view data analysis schemes. We show that tensor decompositions enable
natural generalizations of some commonly used signal processing paradigms, such
as canonical correlation and subspace techniques, signal separation, linear
regression, feature extraction and classification. We also cover computational
aspects, and point out how ideas from compressed sensing and scientific
computing may be used for addressing the otherwise unmanageable storage and
manipulation problems associated with big datasets. The concepts are supported
by illustrative real world case studies illuminating the benefits of the tensor
framework, as efficient and promising tools for modern signal processing, data
analysis and machine learning applications; these benefits also extend to
vector/matrix data through tensorization. Keywords: ICA, NMF, CPD, Tucker
decomposition, HOSVD, tensor networks, Tensor Train
Very Low Bitrate Spatial Audio Coding with Dimensionality Reduction
International audienceIn this paper, we show that tensor compression techniques based on randomization and partial observations are very useful for spatial audio object coding. In this application, we aim at transmitting several audio signals called objects from a coder to a decoder. A common strategy is to transmit only the downmix of the objects along some small information permitting reconstruction at the decoder. In practice , this is done by transmitting compressed versions of the objects spectrograms and separating the mix with Wiener filters. Previous research used nonnegative tensor factorizations in this context, with bitrates as low as 1 kbps per object. Building on recent advances on tensor compression, we show that the computation time for encoding can be extremely reduced. Then, we demonstrate how the mixture can be exploited at the de-coder to avoid the transmission of many parameters, permitting bi-trates as low as 0.1 kbps per object for comparable performance
- …