298 research outputs found

    Structured Sparsity: Discrete and Convex approaches

    Full text link
    Compressive sensing (CS) exploits sparsity to recover sparse or compressible signals from dimensionality reducing, non-adaptive sensing mechanisms. Sparsity is also used to enhance interpretability in machine learning and statistics applications: While the ambient dimension is vast in modern data analysis problems, the relevant information therein typically resides in a much lower dimensional space. However, many solutions proposed nowadays do not leverage the true underlying structure. Recent results in CS extend the simple sparsity idea to more sophisticated {\em structured} sparsity models, which describe the interdependency between the nonzero components of a signal, allowing to increase the interpretability of the results and lead to better recovery performance. In order to better understand the impact of structured sparsity, in this chapter we analyze the connections between the discrete models and their convex relaxations, highlighting their relative advantages. We start with the general group sparse model and then elaborate on two important special cases: the dispersive and the hierarchical models. For each, we present the models in their discrete nature, discuss how to solve the ensuing discrete problems and then describe convex relaxations. We also consider more general structures as defined by set functions and present their convex proxies. Further, we discuss efficient optimization solutions for structured sparsity problems and illustrate structured sparsity in action via three applications.Comment: 30 pages, 18 figure

    From Common to Special: When Multi-Attribute Learning Meets Personalized Opinions

    Full text link
    Visual attributes, which refer to human-labeled semantic annotations, have gained increasing popularity in a wide range of real world applications. Generally, the existing attribute learning methods fall into two categories: one focuses on learning user-specific labels separately for different attributes, while the other one focuses on learning crowd-sourced global labels jointly for multiple attributes. However, both categories ignore the joint effect of the two mentioned factors: the personal diversity with respect to the global consensus; and the intrinsic correlation among multiple attributes. To overcome this challenge, we propose a novel model to learn user-specific predictors across multiple attributes. In our proposed model, the diversity of personalized opinions and the intrinsic relationship among multiple attributes are unified in a common-to-special manner. To this end, we adopt a three-component decomposition. Specifically, our model integrates a common cognition factor, an attribute-specific bias factor and a user-specific bias factor. Meanwhile Lasso and group Lasso penalties are adopted to leverage efficient feature selection. Furthermore, theoretical analysis is conducted to show that our proposed method could reach reasonable performance. Eventually, the empirical study carried out in this paper demonstrates the effectiveness of our proposed method

    An Efficient Primal-Dual Prox Method for Non-Smooth Optimization

    Full text link
    We study the non-smooth optimization problems in machine learning, where both the loss function and the regularizer are non-smooth functions. Previous studies on efficient empirical loss minimization assume either a smooth loss function or a strongly convex regularizer, making them unsuitable for non-smooth optimization. We develop a simple yet efficient method for a family of non-smooth optimization problems where the dual form of the loss function is bilinear in primal and dual variables. We cast a non-smooth optimization problem into a minimax optimization problem, and develop a primal dual prox method that solves the minimax optimization problem at a rate of O(1/T)O(1/T) {assuming that the proximal step can be efficiently solved}, significantly faster than a standard subgradient descent method that has an O(1/T)O(1/\sqrt{T}) convergence rate. Our empirical study verifies the efficiency of the proposed method for various non-smooth optimization problems that arise ubiquitously in machine learning by comparing it to the state-of-the-art first order methods

    Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping

    Full text link
    We consider the problem of estimating a sparse multi-response regression function, with an application to expression quantitative trait locus (eQTL) mapping, where the goal is to discover genetic variations that influence gene-expression levels. In particular, we investigate a shrinkage technique capable of capturing a given hierarchical structure over the responses, such as a hierarchical clustering tree with leaf nodes for responses and internal nodes for clusters of related responses at multiple granularity, and we seek to leverage this structure to recover covariates relevant to each hierarchically-defined cluster of responses. We propose a tree-guided group lasso, or tree lasso, for estimating such structured sparsity under multi-response regression by employing a novel penalty function constructed from the tree. We describe a systematic weighting scheme for the overlapping groups in the tree-penalty such that each regression coefficient is penalized in a balanced manner despite the inhomogeneous multiplicity of group memberships of the regression coefficients due to overlaps among groups. For efficient optimization, we employ a smoothing proximal gradient method that was originally developed for a general class of structured-sparsity-inducing penalties. Using simulated and yeast data sets, we demonstrate that our method shows a superior performance in terms of both prediction errors and recovery of true sparsity patterns, compared to other methods for learning a multivariate-response regression.Comment: Published in at http://dx.doi.org/10.1214/12-AOAS549 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Dual Newton Proximal Point Algorithm for Solution Paths of the L1-Regularized Logistic Regression

    Full text link
    The l1-regularized logistic regression is a widely used statistical model in data classification. This paper proposes a dual Newton method based proximal point algorithm (PPDNA) to solve the l1-regularized logistic regression problem with bias term. The global and local convergence of PPDNA hold under mild conditions. The computational cost of a semismooth Newton (Ssn) algoithm for solving subproblems in the PPDNA can be effectively reduced by fully exploiting the second-order sparsity of the problem. We also design an adaptive sieving (AS) strategy to generate solution paths for the l1-regularized logistic regression problem, where each subproblem in the AS strategy is solved by the PPDNA. This strategy exploits active set constraints to reduce the number of variables in the problem, thereby speeding up the PPDNA for solving a series of problems. Numerical experiments demonstrate the superior performance of the PPDNA in comparison with some state-of-the-art second-order algorithms and the efficiency of the AS strategy combined with the PPDNA for generating solution paths

    FIRE: An Optimization Approach for Fast Interpretable Rule Extraction

    Full text link
    We present FIRE, Fast Interpretable Rule Extraction, an optimization-based framework to extract a small but useful collection of decision rules from tree ensembles. FIRE selects sparse representative subsets of rules from tree ensembles, that are easy for a practitioner to examine. To further enhance the interpretability of the extracted model, FIRE encourages fusing rules during selection, so that many of the selected decision rules share common antecedents. The optimization framework utilizes a fusion regularization penalty to accomplish this, along with a non-convex sparsity-inducing penalty to aggressively select rules. Optimization problems in FIRE pose a challenge to off-the-shelf solvers due to problem scale and the non-convexity of the penalties. To address this, making use of problem-structure, we develop a specialized solver based on block coordinate descent principles; our solver performs up to 40x faster than existing solvers. We show in our experiments that FIRE outperforms state-of-the-art rule ensemble algorithms at building sparse rule sets, and can deliver more interpretable models compared to existing methods

    Convex Relaxation for Combinatorial Penalties

    Get PDF
    In this paper, we propose an unifying view of several recently proposed structured sparsity-inducing norms. We consider the situation of a model simultaneously (a) penalized by a set- function de ned on the support of the unknown parameter vector which represents prior knowledge on supports, and (b) regularized in Lp-norm. We show that the natural combinatorial optimization problems obtained may be relaxed into convex optimization problems and introduce a notion, the lower combinatorial envelope of a set-function, that characterizes the tightness of our relaxations. We moreover establish links with norms based on latent representations including the latent group Lasso and block-coding, and with norms obtained from submodular functions.Comment: 35 pag
    • …
    corecore