    Shaping Level Sets with Submodular Functions

    We consider a class of sparsity-inducing regularization terms based on submodular functions. While previous work has focused on non-decreasing functions, we explore symmetric submodular functions and their Lovász extensions. We show that the Lovász extension may be seen as the convex envelope of a function that depends on level sets (i.e., the set of indices whose corresponding components of the underlying predictor are greater than a given constant): this leads to a class of convex structured regularization terms that impose prior knowledge on the level sets, and not only on the supports of the underlying predictors. We provide a unified set of optimization algorithms, such as proximal operators, and theoretical guarantees (allowed level sets and recovery conditions). By selecting specific submodular functions, we give a new interpretation to known norms, such as the total variation; we also define new norms, in particular ones that are based on order statistics with application to clustering and outlier detection, and on noisy cuts in graphs with application to change point detection in the presence of outliers.

    Theoretical Properties of the Overlapping Groups Lasso

    We present two sets of theoretical results on the grouped lasso with overlap of Jacob, Obozinski and Vert (2009) in the linear regression setting. This method allows for joint selection of predictors in sparse regression, allowing for complex structured sparsity over the predictors encoded as a set of groups. This flexible framework suggests that arbitrarily complex structures can be encoded with an intricate set of groups. Our results show that this strategy results in unexpected theoretical consequences for the procedure. In particular, we give two sets of results: (1) finite sample bounds on prediction and estimation, and (2) asymptotic distribution and selection. Both sets of results give insight into the consequences of choosing an increasingly complex set of groups for the procedure, as well as what happens when the set of groups cannot recover the true sparsity pattern. Additionally, these results demonstrate the differences and similarities between the grouped lasso procedure with and without overlapping groups. Our analysis shows the set of groups must be chosen with caution: an overly complex set of groups will damage the analysis. Comment: 20 pages, submitted to Annals of Statistics.

    An Algorithmic Theory of Dependent Regularizers, Part 1: Submodular Structure

    We present an exploration of the rich theoretical connections between several classes of regularized models, network flows, and recent results in submodular function theory. This work unifies key aspects of these problems under a common theory, leading to novel methods for working with several important models of interest in statistics, machine learning and computer vision. In Part 1, we review the concepts of network flows and submodular function optimization theory foundational to our results. We then examine the connections between network flows and the minimum-norm algorithm from submodular optimization, extending and improving several current results. This leads to a concise representation of the structure of a large class of pairwise regularized models important in machine learning, statistics and computer vision. In Part 2, we describe the full regularization path of a class of penalized regression problems with dependent variables that includes the graph-guided LASSO and total variation constrained models. This description also motivates a practical algorithm, which allows us to efficiently find the regularization path of the discretized version of TV-penalized models. Ultimately, our new algorithms scale up to high-dimensional problems with millions of variables.

    Optimization with Sparsity-Inducing Penalties

    Sparse estimation methods are aimed at using or obtaining parsimonious representations of data or models. They were first dedicated to linear variable selection but numerous extensions have now emerged such as structured sparsity or kernel selection. It turns out that many of the related estimation problems can be cast as convex optimization problems by regularizing the empirical risk with appropriate non-smooth norms. The goal of this paper is to present from a general perspective optimization tools and techniques dedicated to such sparsity-inducing penalties. We cover proximal methods, block-coordinate descent, reweighted ℓ2-penalized techniques, working-set and homotopy methods, as well as non-convex formulations and extensions, and provide an extensive set of experiments to compare various algorithms from a computational point of view.

    Iterative hard clustering of features

    We seek to group features in supervised learning problems by constraining the prediction vector coefficients to take only a small number of values. This problem includes non-convex constraints and is solved using projected gradient descent. We prove exact recovery results using restricted eigenvalue conditions. We then extend these results to combine sparsity and grouping constraints, and develop an efficient projection algorithm on the set of grouped and sparse vectors. Numerical experiments illustrate the performance of our algorithms on both synthetic and real data sets.

    Group-structured and independent subspace based dictionary learning

    Thanks to several successful applications, sparse signal representation has become one of the most actively studied research areas in mathematics. However, in the traditional sparse coding problem the dictionary used for representation is assumed to be known. In spite of the popularity of sparsity and its recently emerged structured sparse extension, interestingly, very few works have focused on the problem of learning dictionaries for these codes. In the first part of the paper, we develop a dictionary learning method which (i) is online, (ii) enables overlapping group structures with (iii) non-convex sparsity-inducing regularization and (iv) handles the partially observable case. To the best of our knowledge, current methods can exhibit two of these four desirable properties at most. We also investigate several interesting special cases of our framework and demonstrate its applicability in inpainting of natural signals, structured sparse non-negative matrix factorization of faces and collaborative filtering. Complementing the sparse direction, we formulate a novel component-wise acting, epsilon-sparse coding scheme in reproducing kernel Hilbert spaces and show its equivalence to a generalized class of support vector machines. Moreover, we embed support vector machines into multilayer perceptrons and show that for this novel kernel based approximation approach the backpropagation procedure of multilayer perceptrons can be generalized. In the second part of the paper, we focus on dictionary learning making use of an independent subspace assumption instead of structured sparsity. The corresponding problem is called independent subspace analysis (ISA), or independent component analysis (ICA) if all the hidden, independent sources are one-dimensional. One of the most fundamental results of this research field is the ISA separation principle, which states that the ISA problem can be solved by traditional ICA up to permutation. This principle (i) forms the basis of the state-of-the-art ISA solvers and (ii) enables one to estimate the unknown number and the dimensions of the sources efficiently. We (i) extend the ISA problem to several new directions including the controlled, the partially observed, the complex valued and the nonparametric case and (ii) derive separation principle based solution techniques for the generalizations. This solution approach (i) makes it possible to apply state-of-the-art algorithms for the obtained subproblems (in the ISA example, ICA and clustering) and (ii) handles the case of sources of unknown dimension. Our extensive numerical experiments demonstrate the robustness and efficiency of our approach.

    Learning with Structured Sparsity: From Discrete to Convex and Back.

    In modern data analysis applications, the abundance of data makes extracting meaningful information from it challenging, in terms of computation, storage, and interpretability. In this setting, exploiting sparsity in data has been essential to the development of scalable methods for problems in machine learning, statistics and signal processing. However, in various applications, the input variables exhibit structure beyond simple sparsity. This motivated the introduction of structured sparsity models, which capture such sophisticated structures, leading to significant performance gains and better interpretability. Structured sparse approaches have been successfully applied in a variety of domains including computer vision, text processing, medical imaging, and bioinformatics. The goal of this thesis is to improve on these methods and expand their success to a wider range of applications. We thus develop novel methods to incorporate general structure a priori in learning problems, which balance computational and statistical efficiency trade-offs. To achieve this, our results bring together tools from the rich areas of discrete and convex optimization. Applying structured sparsity approaches in general is challenging because structures encountered in practice are naturally combinatorial. An effective approach to circumvent this computational challenge is to employ continuous convex relaxations. We thus start by introducing a new class of structured sparsity models, able to capture a large range of structures, which admit tight convex relaxations amenable to efficient optimization. We then present an in-depth study of the geometric and statistical properties of convex relaxations of general combinatorial structures. In particular, we characterize which structure is lost by imposing convexity and which is preserved. We then focus on the optimization of the convex composite problems that result from the convex relaxations of structured sparsity models. We develop efficient algorithmic tools to solve these problems in a non-Euclidean setting, leading to faster convergence in some cases. Finally, to handle structures that do not admit meaningful convex relaxations, we propose to use, as a heuristic, a non-convex proximal gradient method, efficient for several classes of structured sparsity models. We further extend this method to address a probabilistic structured sparsity model that we introduce to model approximately sparse signals.