71 research outputs found

    An MDL framework for sparse coding and dictionary learning

    The power of sparse signal modeling with learned over-complete dictionaries has been demonstrated in a variety of applications and fields, from signal processing to statistical inference and machine learning. However, the statistical properties of these models, such as under-fitting or over-fitting given sets of data, are still not well characterized in the literature. As a result, the success of sparse modeling depends on hand-tuning critical parameters for each data set and application. This work aims to address this by providing a practical and objective characterization of sparse models by means of the Minimum Description Length (MDL) principle, a well-established information-theoretic approach to model selection in statistical inference. The resulting framework yields a family of efficient sparse coding and dictionary learning algorithms which, by virtue of the MDL principle, are completely parameter-free. Furthermore, the framework makes it possible to incorporate additional prior information into existing models, such as Markovian dependencies, or to define completely new problem formulations, including in the matrix analysis area, in a natural way. These virtues will be demonstrated with parameter-free algorithms for the classic image denoising and classification problems, and for low-rank matrix recovery in video applications.

    Sparse and Low-rank Modeling for Automatic Speech Recognition

    This thesis deals with exploiting the low-dimensional multi-subspace structure of speech towards the goal of improving acoustic modeling for automatic speech recognition (ASR). Leveraging the parsimonious hierarchical nature of speech, we hypothesize that whenever a speech signal is measured in a high-dimensional feature space, the true class information is embedded in low-dimensional subspaces whereas noise is scattered as random high-dimensional erroneous estimations in the features. In this context, the contribution of this thesis is twofold: (i) identify sparse and low-rank modeling approaches as excellent tools for extracting the class-specific low-dimensional subspaces in speech features, and (ii) employ these tools under novel ASR frameworks to enrich the acoustic information present in the speech features towards the goal of improving ASR. Techniques developed in this thesis focus on deep neural network (DNN) based posterior features which, under the sparse and low-rank modeling approaches, unveil the underlying class-specific low-dimensional subspaces very elegantly. In this thesis, we tackle ASR tasks of varying difficulty, ranging from isolated word recognition (IWR) and connected digit recognition (CDR) to large-vocabulary continuous speech recognition (LVCSR). For IWR and CDR, we propose a novel Compressive Sensing (CS) perspective towards ASR. Here exemplar-based speech recognition is posed as a problem of recovering sparse high-dimensional word representations from compressed low-dimensional phonetic representations. In the context of LVCSR, this thesis argues that despite their power in representation learning, DNN based acoustic models still have room for improvement in exploiting the union of low-dimensional subspaces structure of speech data. Therefore, this thesis proposes to enhance DNN posteriors by projecting them onto the manifolds of the underlying classes using principal component analysis (PCA) or compressive sensing based dictionaries. Projected posteriors are shown to be more accurate training targets for learning better acoustic models, resulting in improved ASR performance. The proposed approach is evaluated on both close-talk and far-field conditions, confirming the importance of sparse and low-rank modeling of speech in building a robust ASR framework. Finally, the conclusions of this thesis are further consolidated by an information theoretic analysis approach which explicitly quantifies the contribution of proposed techniques in improving ASR.
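
    The PCA-based projection mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that the posterior vectors of one class are stacked as rows of a matrix and projected onto that class's top-r principal subspace. The function name `pca_project`, the rank `r`, and the synthetic data are illustrative assumptions, not taken from the thesis, and the sketch does not renormalize the projected vectors into valid probability distributions.

```python
import numpy as np

def pca_project(X, r):
    """Project the rows of X onto the top-r principal subspace of X."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:r].T                      # (features, r) basis of the subspace
    return mean + Xc @ V @ V.T        # orthogonal projection, mean restored

# Hypothetical data: 200 vectors of dimension 40 lying near a 3-D subspace.
rng = np.random.default_rng(0)
clean = rng.random((200, 3)) @ rng.random((3, 40))
noisy = clean + 0.05 * rng.standard_normal(clean.shape)
projected = pca_project(noisy, r=3)
```

    In this toy setting the projection discards the components of the noise that fall outside the estimated subspace; how `r` should be chosen per class is left open here.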

    Second generation sparse models

    Sparse data models, where data is assumed to be well represented as a linear combination of a few elements from a learned dictionary, have gained considerable attention in recent years, and their use has led to state-of-the-art results in many applications. The success of these models is largely attributed to two critical features: the use of sparsity as a robust mechanism for regularizing the linear coefficients that represent the data, and the flexibility provided by overcomplete dictionaries that are learned from the data. These features are controlled by two critical hyper-parameters: the desired sparsity of the coefficients, and the size of the dictionaries to be learned. However, lacking theoretical guidelines for selecting these critical parameters, applications based on sparse models often require hand-tuning and cross-validation to select them, for each application, and each data set. This can be both inefficient and ineffective. On the other hand, there are multiple scenarios in which imposing additional constraints on the produced representations, including the sparse codes and the dictionary itself, can result in further improvements. This thesis is about improving and/or extending current sparse models by addressing the two issues discussed above, providing the elements for a new generation of more powerful and flexible sparse models. First, we seek to gain a better understanding of sparse models as data modeling tools, so that critical parameters can be selected automatically, efficiently, and in a principled way. Secondly, we explore new sparse modeling formulations for effectively exploiting the prior information present in different scenarios. In order to achieve these goals, we combine ideas and tools from information theory, statistics, machine learning, and optimization theory. The theoretical contributions are complemented with applications in audio, image and video processing.
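
    To make the two hyper-parameters named in the abstract above concrete, here is a minimal sketch of sparse coding against a fixed dictionary using orthogonal matching pursuit (OMP). OMP is a standard greedy method chosen here for illustration; the abstract does not say which solver the thesis uses. The sparsity level `k` and the number of dictionary columns are the two quantities that would otherwise be hand-tuned. The random dictionary and the function name `omp` are assumptions.

```python
import numpy as np

def omp(D, x, k):
    """Greedy sparse coding: approximate x with at most k atoms (columns) of D."""
    support = []                 # indices of the atoms selected so far
    sol = np.zeros(0)            # least-squares coefficients on the support
    residual = x.copy()
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:         # no new atom found; the fit cannot improve
            break
        support.append(j)
        # Re-fit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coef = np.zeros(D.shape[1])
    coef[support] = sol
    return coef

# Hypothetical overcomplete dictionary: 50 unit-norm atoms in dimension 20.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)
true = np.zeros(50)
true[[3, 17, 41]] = [1.5, -2.0, 0.7]
x = D @ true
a = omp(D, x, k=3)               # sparse code with at most 3 non-zeros
```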

    A Unified Framework for Sparse Non-Negative Least Squares using Multiplicative Updates and the Non-Negative Matrix Factorization Problem

    We study the sparse non-negative least squares (S-NNLS) problem. S-NNLS occurs naturally in a wide variety of applications where an unknown, non-negative quantity must be recovered from linear measurements. We present a unified framework for S-NNLS based on a rectified power exponential scale mixture prior on the sparse codes. We show that the proposed framework encompasses a large class of S-NNLS algorithms and provide a computationally efficient inference procedure based on multiplicative update rules. Such update rules are convenient for solving large sets of S-NNLS problems simultaneously, which is required in contexts like sparse non-negative matrix factorization (S-NMF). We provide theoretical justification for the proposed approach by showing that the local minima of the objective function being optimized are sparse and the S-NNLS algorithms presented are guaranteed to converge to a set of stationary points of the objective function. We then extend our framework to S-NMF, showing that our framework leads to many well known S-NMF algorithms under specific choices of prior and providing a guarantee that a popular subclass of the proposed algorithms converges to a set of stationary points of the objective function. Finally, we study the performance of the proposed approaches on synthetic and real-world data. Comment: To appear in Signal Processing
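
    As a rough illustration of what a multiplicative update rule for S-NNLS looks like, the sketch below minimizes 0.5 * ||y - A x||^2 + lam * sum(x) over x >= 0 for a non-negative matrix `A` and vector `y`. The plain l1 penalty is a simplifying assumption swapped in for the paper's prior family, and the names `snnls_mu`, `lam`, and `n_iter` are illustrative. Because every factor in the update is non-negative, the iterate stays non-negative without any explicit projection.

```python
import numpy as np

def snnls_mu(A, y, lam=0.1, n_iter=200, eps=1e-12):
    """Multiplicative updates for min_{x>=0} 0.5*||y - A x||^2 + lam*sum(x).

    Assumes A and y are element-wise non-negative.
    """
    x = np.ones(A.shape[1])           # strictly positive starting point
    Aty = A.T @ y
    AtA = A.T @ A
    for _ in range(n_iter):
        # Element-wise rescaling; eps guards against division by zero.
        x *= Aty / (AtA @ x + lam + eps)
    return x

def objective(A, y, x, lam):
    return 0.5 * np.sum((y - A @ x) ** 2) + lam * np.sum(x)

# Hypothetical problem: 30 measurements of a 10-dimensional sparse vector.
rng = np.random.default_rng(0)
A = rng.random((30, 10))
x_true = np.zeros(10)
x_true[[1, 6]] = [2.0, 1.0]
y = A @ x_true
x_hat = snnls_mu(A, y, lam=0.1)
```

    The same element-wise rule applies unchanged when `y` and `x` are replaced by matrices with one problem per column, which is what makes this style of update convenient for solving many S-NNLS problems at once.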

    A Reverse Hierarchy Model for Predicting Eye Fixations

    A body of psychological and physiological evidence suggests that early visual attention works in a coarse-to-fine way, which lays a basis for the reverse hierarchy theory (RHT). This theory states that attention propagates from the top level of the visual hierarchy, which processes the gist and abstract information of the input, to the bottom level, which processes local details. Inspired by the theory, we develop a computational model for saliency detection in images. First, the original image is downsampled to different scales to constitute a pyramid. Then, saliency on each layer is obtained by image super-resolution reconstruction from the layer above, and is defined as the unpredictability of this coarse-to-fine reconstruction. Finally, saliency on each layer of the pyramid is fused into stochastic fixations through a probabilistic model, where attention initiates from the top layer and propagates downward through the pyramid. Extensive experiments on two standard eye-tracking datasets show that the proposed method can achieve competitive results with state-of-the-art models. Comment: CVPR 2014, 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
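
    The first two steps described above (pyramid construction, then saliency as the error of predicting a finer layer from the coarser one) can be sketched as follows. This is a simplified sketch under stated assumptions: 2x2 block averaging stands in for the downsampling, plain pixel repetition stands in for the paper's super-resolution reconstruction, and the probabilistic fusion into fixations is omitted. All function names are illustrative, and the image sides are assumed divisible by 2**levels.

```python
import numpy as np

def downsample(img):
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    """Double the resolution by pixel repetition (stand-in for super-resolution)."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def saliency_pyramid(img, levels=3):
    """Per-layer saliency: absolute error of predicting a layer from the coarser one."""
    pyramid = [img]
    for _ in range(levels):
        pyramid.append(downsample(pyramid[-1]))
    return [np.abs(fine - upsample(coarse))
            for fine, coarse in zip(pyramid[:-1], pyramid[1:])]

# Hypothetical input: a flat image with a single bright pixel.
img = np.zeros((32, 32))
img[10, 20] = 1.0
maps = saliency_pyramid(img, levels=3)
peak = np.unravel_index(np.argmax(maps[0]), maps[0].shape)
```

    On this toy input the finest map peaks at the bright pixel, since that is the location the coarse layer predicts worst.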

    Modelling of the switching behavior of functional connectivity microstates (FCμstates) as a novel biomarker for mild cognitive impairment

    There is a clear need to design and validate novel biomarkers for the detection of mild cognitive impairment (MCI). MCI patients have a high risk of developing Alzheimer’s disease (AD), and for that reason the introduction of novel and reliable biomarkers is of significant clinical importance. Motivated by recent findings that dynamic functional connectivity graphs (DFCGs) carry rich information about brain (dys)function, we introduced a novel approach for identifying MCI based on magnetoencephalographic (MEG) resting-state recordings. The activity of different brain rhythms {δ, θ, α1, α2, β1, β2, γ1, γ2} was first beamformed with linear constrained minimum norm variance in the MEG data to determine ninety anatomical regions of interest (ROIs). A dynamic functional connectivity graph (DFCG) was then estimated using the imaginary part of phase lag value (iPLV) for both intra-frequency coupling (8) and cross-frequency coupling pairs (28). We analysed DFCG profiles of neuromagnetic resting-state recordings of 18 Mild Cognitive Impairment (MCI) patients and 20 healthy controls. We followed our model of identifying the dominant intrinsic coupling mode (DICM) across MEG sources and temporal segments, which leads to the construction of an integrated DFCG (iDFCG). We then filtered every snapshot of the iDFCG statistically and topologically with data-driven approaches. Estimation of the normalized Laplacian transformation for every temporal segment of the iDFCG and the related eigenvalues created a 2D map based on the network metric time series of the eigenvalues (NMTSeigs). NMTSeigs preserves the non-stationarity of the fluctuating synchronizability of the iDFCG for each subject. Employing the initial set of 20 healthy elders and 20 MCI patients as a training set, we built an overcomplete dictionary set of network microstates (nμstates). Afterward, we tested the whole procedure on an extra blind set of 20 subjects for external validation. We achieved a high classification accuracy on the blind dataset (85%), which further supports the proposed Markovian modelling of the evolution of brain states. The adaptation of appropriate neuroinformatic tools that combine advanced signal processing and network neuroscience could properly handle the non-stationarity of time-resolved FC patterns, revealing a robust biomarker for MCI.
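
    The eigenvalue time series described in this abstract can be sketched as follows, assuming each temporal segment is given as a symmetric, non-negative weight matrix with no isolated nodes. The normalized Laplacian used here is the standard L = I - D^(-1/2) W D^(-1/2); whether the paper uses exactly this variant is an assumption, as are the function names and the random data.

```python
import numpy as np

def normalized_laplacian_eigs(W):
    """Ascending eigenvalues of L = I - D^(-1/2) W D^(-1/2) for symmetric W."""
    d = W.sum(axis=1)                 # node degrees, assumed strictly positive
    s = 1.0 / np.sqrt(d)
    L = np.eye(len(W)) - s[:, None] * W * s[None, :]
    return np.linalg.eigvalsh(L)      # eigvalsh: for symmetric matrices

def eig_time_series(snapshots):
    """Stack per-segment eigenvalues into a 2D map of shape (nodes, segments)."""
    return np.stack([normalized_laplacian_eigs(W) for W in snapshots], axis=1)

# Hypothetical data: 5 temporal segments of a fully connected 6-node graph.
rng = np.random.default_rng(0)
W = rng.random((5, 6, 6))
W = (W + W.transpose(0, 2, 1)) / 2    # symmetrize each snapshot
for t in range(W.shape[0]):
    np.fill_diagonal(W[t], 0.0)       # no self-loops
eig_map = eig_time_series(W)
```

    Each column of `eig_map` is the spectrum of one segment; the eigenvalues of a normalized Laplacian lie in [0, 2], and the smallest is 0 for a connected graph.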

    A sparsity-based framework for resolution enhancement in optical fault analysis of integrated circuits

    The increasing density and smaller length scales in integrated circuits (ICs) create resolution challenges for optical failure analysis techniques. Due to flip-chip bonding and dense metal layers on the front side, optical analysis of ICs is restricted to backside imaging through the silicon substrate, which limits the spatial resolution due to the minimum wavelength of transmission and refraction at the planar interface. The state-of-the-art backside analysis approach is to use aplanatic solid immersion lenses in order to achieve the highest possible numerical aperture of the imaging system. Signal processing algorithms are essential to complement the optical microscopy efforts to increase resolution through hardware modifications in order to meet the resolution requirements of new IC technologies. The focus of this thesis is the development of sparsity-based image reconstruction techniques to improve resolution of static IC images and dynamic optical measurements of device activity. A physics-based observation model is exploited in order to take advantage of polarization diversity in high numerical aperture systems. Multiple-polarization observation data are combined to produce a single enhanced image with higher resolution. In the static IC image case, two sparsity paradigms are considered. The first approach, referred to as analysis-based sparsity, creates enhanced resolution imagery by solving a linear inverse problem while enforcing sparsity through non-quadratic regularization functionals appropriate to IC features. The second approach, termed synthesis-based sparsity, is based on sparse representations with respect to overcomplete dictionaries. The domain of IC imaging is particularly suitable for the application of overcomplete dictionaries because the images are highly structured; they contain predictable building blocks derivable from the corresponding computer-aided design layouts. 
This structure provides a strong and natural a-priori dictionary for image reconstruction. In the dynamic case, an extension of the synthesis-based sparsity paradigm is formulated. Spatial regions of active areas with the same behavior over time or over frequency are coupled by an overcomplete dictionary consisting of space-time or space-frequency blocks. This extended dictionary enables resolution improvement through sparse representation of dynamic measurements. Additionally, extensions to darkfield subsurface microscopy of ICs and focus determination based on image stacks are provided. The resolution improvement ability of the proposed methods has been validated on both simulated and experimental data

    State of the art in 2D content representation and compression

    Deliverable D1.3 of the ANR PERSEE project. This report was produced as part of the ANR PERSEE project (no. ANR-09-BLAN-0170). More precisely, it corresponds to deliverable D3.1 of the project.