Structured Sparsity: Discrete and Convex Approaches
Compressive sensing (CS) exploits sparsity to recover sparse or compressible
signals from dimensionality reducing, non-adaptive sensing mechanisms. Sparsity
is also used to enhance interpretability in machine learning and statistics
applications: while the ambient dimension is vast in modern data analysis
problems, the relevant information therein typically resides in a much lower
dimensional space. However, many solutions proposed nowadays do not leverage
the true underlying structure. Recent results in CS extend the simple sparsity
idea to more sophisticated {\em structured} sparsity models, which describe the
interdependency between the nonzero components of a signal, increasing the
interpretability of the results and leading to better recovery
performance. In order to better understand the impact of structured sparsity,
in this chapter we analyze the connections between the discrete models and
their convex relaxations, highlighting their relative advantages. We start with
the general group sparse model and then elaborate on two important special
cases: the dispersive and the hierarchical models. For each, we present the
models in their discrete nature, discuss how to solve the ensuing discrete
problems and then describe convex relaxations. We also consider more general
structures as defined by set functions and present their convex proxies.
Further, we discuss efficient optimization solutions for structured sparsity
problems and illustrate structured sparsity in action via three applications.
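To make the convex side of the group-sparse model concrete, the following is a minimal sketch (ours, not the chapter's) of the proximal operator of the non-overlapping group lasso penalty lam * sum_g ||w_g||_2, which reduces to block-wise soft-thresholding; the group partition and regularization level are illustrative assumptions.

    import numpy as np

    def prox_group_lasso(w, groups, lam):
        # Proximal operator of lam * sum_g ||w_g||_2 over a disjoint
        # partition of coordinates: block-wise soft-thresholding.
        out = w.copy()
        for g in groups:                       # each g is an index array
            norm = np.linalg.norm(w[g])
            scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
            out[g] = scale * w[g]              # shrink (or zero) the block
        return out

    # Toy usage: the small first block is zeroed out, the large one shrunk.
    w = np.array([0.1, -0.2, 0.15, 2.0, -1.5, 1.0])
    print(prox_group_lasso(w, [np.arange(3), np.arange(3, 6)], lam=0.5))

Overlapping groups, as in the hierarchical model, no longer admit this one-pass closed form and require the more careful treatment the chapter discusses.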
From Common to Special: When Multi-Attribute Learning Meets Personalized Opinions
Visual attributes, which refer to human-labeled semantic annotations, have
gained increasing popularity in a wide range of real world applications.
Generally, the existing attribute learning methods fall into two categories:
one focuses on learning user-specific labels separately for different
attributes, while the other focuses on learning crowd-sourced global labels
jointly for multiple attributes. However, both categories ignore the joint
effect of the two mentioned factors: the personal diversity with respect to the
global consensus; and the intrinsic correlation among multiple attributes. To
overcome this challenge, we propose a novel model to learn user-specific
predictors across multiple attributes. In our proposed model, the diversity of
personalized opinions and the intrinsic relationship among multiple attributes
are unified in a common-to-special manner. To this end, we adopt a
three-component decomposition. Specifically, our model integrates a common
cognition factor, an attribute-specific bias factor and a user-specific bias
factor. Meanwhile, Lasso and group Lasso penalties are adopted to enable
efficient feature selection. Furthermore, a theoretical analysis is conducted
to show that the proposed method attains reasonable performance guarantees.
Finally, an empirical study demonstrates the effectiveness of the proposed
method.
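As a rough illustration of the common-to-special structure, here is a schematic sketch (our reading of the abstract, with hypothetical names and shapes, not the authors' code) of the three-component decomposition and its penalties.

    import numpy as np

    d, A, U = 8, 3, 5             # hypothetical: features, attributes, users
    w_common = np.zeros(d)        # common cognition factor, shared by all
    V = np.zeros((A, d))          # attribute-specific bias factors
    B = np.zeros((U, A, d))       # user-specific bias factors

    def weights(u, a):
        # Predictor weights for user u on attribute a: common + special.
        return w_common + V[a] + B[u, a]

    def regularizer(V, B, lam_lasso, lam_group):
        # Lasso on the bias factors for element-wise feature selection,
        # plus a group Lasso coupling each feature across attributes
        # (one group per column of V) -- an assumed placement.
        lasso = lam_lasso * (np.abs(V).sum() + np.abs(B).sum())
        group = lam_group * np.linalg.norm(V, axis=0).sum()
        return lasso + group

The exact factors on which each penalty acts are a modeling choice the paper resolves; the sketch only fixes the additive common-to-special form.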
An Efficient Primal-Dual Prox Method for Non-Smooth Optimization
We study the non-smooth optimization problems in machine learning, where both
the loss function and the regularizer are non-smooth functions. Previous
studies on efficient empirical loss minimization assume either a smooth loss
function or a strongly convex regularizer, making them unsuitable for
non-smooth optimization. We develop a simple yet efficient method for a family
of non-smooth optimization problems where the dual form of the loss function is
bilinear in primal and dual variables. We cast a non-smooth optimization
problem into a minimax optimization problem, and develop a primal-dual prox
method that solves the minimax optimization problem at a rate of $O(1/T)$
(assuming that the proximal step can be solved efficiently), significantly
faster than a standard subgradient descent method, which has an $O(1/\sqrt{T})$
convergence rate. Our empirical study verifies the efficiency of the proposed
method for various non-smooth optimization problems that arise ubiquitously in
machine learning by comparing it to state-of-the-art first-order methods.
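On one member of the covered family, hinge loss with an l1 regularizer (whose dual form is bilinear in the primal and dual variables), the plain iteration can be sketched as follows; this is our simplified illustration, without the extrapolation and averaging details behind the stated rate.

    import numpy as np

    def soft_threshold(v, t):
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def primal_dual_prox(X, y, lam, tau=0.1, sigma=0.1, iters=500):
        # min_w max_{a in [0,1]^n}  a^T K w + a^T 1 / n + lam * ||w||_1,
        # where K = -diag(y) X / n encodes the hinge loss
        # (1/n) sum_i max(0, 1 - y_i x_i^T w) via its dual variables a.
        n, d = X.shape
        K = -(y[:, None] * X) / n
        w, a = np.zeros(d), np.zeros(n)
        for _ in range(iters):
            a = np.clip(a + sigma * (K @ w + 1.0 / n), 0.0, 1.0)  # dual prox step
            w = soft_threshold(w - tau * (K.T @ a), tau * lam)    # primal prox step
        return w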
Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping
We consider the problem of estimating a sparse multi-response regression
function, with an application to expression quantitative trait locus (eQTL)
mapping, where the goal is to discover genetic variations that influence
gene-expression levels. In particular, we investigate a shrinkage technique
capable of capturing a given hierarchical structure over the responses, such as
a hierarchical clustering tree with leaf nodes for responses and internal nodes
for clusters of related responses at multiple granularities, and we seek to
leverage this structure to recover covariates relevant to each
hierarchically-defined cluster of responses. We propose a tree-guided group
lasso, or tree lasso, for estimating such structured sparsity under
multi-response regression by employing a novel penalty function constructed
from the tree. We describe a systematic weighting scheme for the overlapping
groups in the tree-penalty such that each regression coefficient is penalized
in a balanced manner despite the inhomogeneous multiplicity of group
memberships of the regression coefficients due to overlaps among groups. For
efficient optimization, we employ a smoothing proximal gradient method that was
originally developed for a general class of structured-sparsity-inducing
penalties. Using simulated and yeast data sets, we demonstrate that our method
shows superior performance in terms of both prediction errors and recovery of
true sparsity patterns, compared to other methods for learning a
multivariate-response regression. Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org), DOI: 10.1214/12-AOAS549.
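The penalty itself is easy to state programmatically: each node of the response tree contributes a weighted l2 norm over the coefficients of its responses. Below is a minimal sketch (ours); the node weights shown are illustrative placeholders, not the paper's systematic balancing scheme.

    import numpy as np

    def tree_lasso_penalty(B, tree_groups):
        # B: (covariates x responses) coefficient matrix.
        # tree_groups: list of (weight, response_index_array), one per node.
        total = 0.0
        for weight, idx in tree_groups:
            # l2 norm over the node's responses, summed over covariates.
            total += weight * np.linalg.norm(B[:, idx], axis=1).sum()
        return total

    # Toy tree over 4 responses: four leaves, one internal node {0, 1},
    # and the root {0, 1, 2, 3}; the weights here are arbitrary.
    B = np.random.randn(10, 4)
    groups = [(1.0, np.array([i])) for i in range(4)]
    groups += [(0.5, np.array([0, 1])), (0.3, np.arange(4))]
    print(tree_lasso_penalty(B, groups))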
Dual Newton Proximal Point Algorithm for Solution Paths of the L1-Regularized Logistic Regression
The l1-regularized logistic regression is a widely used statistical model in
data classification. This paper proposes a dual Newton method based proximal
point algorithm (PPDNA) to solve the l1-regularized logistic regression problem
with bias term. The global and local convergence of PPDNA hold under mild
conditions. The computational cost of a semismooth Newton (SSN) algorithm for
solving subproblems in the PPDNA can be effectively reduced by fully exploiting
the second-order sparsity of the problem. We also design an adaptive sieving
(AS) strategy to generate solution paths for the l1-regularized logistic
regression problem, where each subproblem in the AS strategy is solved by the
PPDNA. This strategy exploits active set constraints to reduce the number of
variables in the problem, thereby speeding up the PPDNA for solving a series of
problems. Numerical experiments demonstrate the superior performance of the
PPDNA in comparison with some state-of-the-art second-order algorithms and the
efficiency of the AS strategy combined with the PPDNA for generating solution
paths.
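The adaptive sieving idea can be conveyed schematically: solve on a small active set, check the KKT conditions of the full problem, and grow the set with the violating coordinates. The sketch below is ours; the inner solver is left abstract (the paper uses the PPDNA), and the initial active-set heuristic is a placeholder.

    import numpy as np

    def adaptive_sieving(X, y, lam, solve_reduced, tol=1e-6, max_rounds=20):
        # solve_reduced(X_sub, y, lam) -> (w_sub, bias): any l1-logistic solver.
        n, d = X.shape
        active = np.argsort(-np.abs(X.T @ y))[:min(d, 100)]  # heuristic start
        for _ in range(max_rounds):
            w_sub, b = solve_reduced(X[:, active], y, lam)
            w = np.zeros(d)
            w[active] = w_sub
            # KKT check for the l1 term: |grad_j| <= lam off the active set.
            p = 1.0 / (1.0 + np.exp(-y * (X @ w + b)))   # P(correct label)
            grad = X.T @ (-y * (1.0 - p)) / n            # logistic-loss gradient
            new = np.setdiff1d(np.where(np.abs(grad) > lam + tol)[0], active)
            if new.size == 0:
                return w, b                              # KKT satisfied everywhere
            active = np.union1d(active, new)
        return w, b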
FIRE: An Optimization Approach for Fast Interpretable Rule Extraction
We present FIRE, Fast Interpretable Rule Extraction, an optimization-based
framework to extract a small but useful collection of decision rules from tree
ensembles. FIRE selects sparse representative subsets of rules from tree
ensembles that are easy for a practitioner to examine. To further enhance the
interpretability of the extracted model, FIRE encourages fusing rules during
selection, so that many of the selected decision rules share common
antecedents. The optimization framework utilizes a fusion regularization
penalty to accomplish this, along with a non-convex sparsity-inducing penalty
to aggressively select rules. Optimization problems in FIRE pose a challenge to
off-the-shelf solvers due to problem scale and the non-convexity of the
penalties. To address this, we exploit the problem structure to develop a
specialized solver based on block coordinate descent principles; our solver
performs up to 40x faster than existing solvers. We show in our experiments
that FIRE outperforms state-of-the-art rule ensemble algorithms at building
sparse rule sets, and can deliver more interpretable models compared to
existing methods.
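The preprocessing step that FIRE starts from, enumerating candidate rules from a fitted tree ensemble, can be sketched with scikit-learn's tree internals (our sketch, not the FIRE code; FIRE's contribution is the sparse, fusion-regularized selection over such a pool).

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    def extract_rules(tree):
        # Enumerate root-to-leaf conjunctions (candidate decision rules).
        t, rules = tree.tree_, []

        def walk(node, conds):
            if t.children_left[node] == -1:          # leaf: emit the path
                rules.append(conds)
                return
            f, thr = t.feature[node], t.threshold[node]
            walk(t.children_left[node], conds + [(f, "<=", thr)])
            walk(t.children_right[node], conds + [(f, ">", thr)])

        walk(0, [])
        return rules

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)
    forest = RandomForestClassifier(n_estimators=3, max_depth=3).fit(X, y)
    pool = [r for est in forest.estimators_ for r in extract_rules(est)]
    print(len(pool), "candidate rules, e.g.", pool[0])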
Convex Relaxation for Combinatorial Penalties
In this paper, we propose a unifying view of several recently proposed
structured sparsity-inducing norms. We consider the situation of a model
simultaneously (a) penalized by a set-function defined on the support of the
unknown parameter vector, which represents prior knowledge on supports, and (b)
regularized by an Lp-norm. We show that the natural combinatorial optimization
problems obtained may be relaxed into convex optimization problems and
introduce a notion, the lower combinatorial envelope of a set-function, that
characterizes the tightness of our relaxations. We moreover establish links
with norms based on latent representations including the latent group Lasso and
block-coding, and with norms obtained from submodular functions.
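For orientation, the combined penalty at the heart of this line of work takes, to the best of our recollection (the precise statement and the role of the lower combinatorial envelope are in the paper), the form

    \[
      \Theta(w) \;=\; F(\operatorname{supp}(w))^{1/q}\,\|w\|_p,
      \qquad \frac{1}{p} + \frac{1}{q} = 1,
    \]

with the proposed surrogate $\Omega_p$ obtained as the tightest positively homogeneous convex lower bound (the convex envelope) of $\Theta$.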