90,368 research outputs found
Sparse projections onto the simplex
Most learning methods with rank or sparsity constraints use convex
relaxations, which lead to optimization with the nuclear norm or the
-norm. However, several important learning applications cannot benefit
from this approach as they feature these convex norms as constraints in
addition to the non-convex rank and sparsity constraints. In this setting, we
derive efficient sparse projections onto the simplex and its extension, and
illustrate how to use them to solve high-dimensional learning problems in
quantum tomography, sparse density estimation and portfolio selection with
non-convex constraints.Comment: 9 Page
Proximally Constrained Methods for Weakly Convex Optimization with Weakly Convex Constraints
Optimization models with non-convex constraints arise in many tasks in
machine learning, e.g., learning with fairness constraints or Neyman-Pearson
classification with non-convex loss. Although many efficient methods have been
developed with theoretical convergence guarantees for non-convex unconstrained
problems, it remains a challenge to design provably efficient algorithms for
problems with non-convex functional constraints. This paper proposes a class of
subgradient methods for constrained optimization where the objective function
and the constraint functions are are weakly convex. Our methods solve a
sequence of strongly convex subproblems, where a proximal term is added to both
the objective function and each constraint function. Each subproblem can be
solved by various algorithms for strongly convex optimization. Under a uniform
Slater's condition, we establish the computation complexities of our methods
for finding a nearly stationary point
Decentralized Non-Convex Learning with Linearly Coupled Constraints
Motivated by the need for decentralized learning, this paper aims at
designing a distributed algorithm for solving nonconvex problems with general
linear constraints over a multi-agent network. In the considered problem, each
agent owns some local information and a local variable for jointly minimizing a
cost function, but local variables are coupled by linear constraints. Most of
the existing methods for such problems are only applicable for convex problems
or problems with specific linear constraints. There still lacks a distributed
algorithm for such problems with general linear constraints and under nonconvex
setting. In this paper, to tackle this problem, we propose a new algorithm,
called "proximal dual consensus" (PDC) algorithm, which combines a proximal
technique and a dual consensus method. We build the theoretical convergence
conditions and show that the proposed PDC algorithm can converge to an
-Karush-Kuhn-Tucker solution within
iterations. For computation reduction, the PDC algorithm can choose to perform
cheap gradient descent per iteration while preserving the same order of
iteration complexity. Numerical results are presented
to demonstrate the good performance of the proposed algorithms for solving a
regression problem and a classification problem over a network where agents
have only partial observations of data features
Structured sparsity-inducing norms through submodular functions
Sparse methods for supervised learning aim at finding good linear predictors
from as few variables as possible, i.e., with small cardinality of their
supports. This combinatorial selection problem is often turned into a convex
optimization problem by replacing the cardinality function by its convex
envelope (tightest convex lower bound), in this case the L1-norm. In this
paper, we investigate more general set-functions than the cardinality, that may
incorporate prior knowledge or structural constraints which are common in many
applications: namely, we show that for nondecreasing submodular set-functions,
the corresponding convex envelope can be obtained from its \lova extension, a
common tool in submodular analysis. This defines a family of polyhedral norms,
for which we provide generic algorithmic tools (subgradients and proximal
operators) and theoretical results (conditions for support recovery or
high-dimensional inference). By selecting specific submodular functions, we can
give a new interpretation to known norms, such as those based on
rank-statistics or grouped norms with potentially overlapping groups; we also
define new norms, in particular ones that can be used as non-factorial priors
for supervised learning
- …