Primal-Dual Rates and Certificates
We propose an algorithm-independent framework to equip existing optimization
methods with primal-dual certificates. Such certificates and corresponding rate
of convergence guarantees are important for practitioners to diagnose progress,
in particular in machine learning applications. We obtain new primal-dual
convergence rates, e.g., for the Lasso as well as many L1, Elastic Net, group
Lasso and TV-regularized problems. The theory applies to any norm-regularized
generalized linear model. Our approach provides efficiently computable duality
gaps which are globally defined, without modifying the original problems in the
region of interest.
Comment: appearing at ICML 2016 - Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016. JMLR: W&CP volume 4
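As a concrete illustration of such a certificate, the sketch below computes a globally valid duality gap for the Lasso from any primal iterate. It uses the standard residual-rescaling dual point; this is a common textbook construction and is not claimed to be the paper's exact certificate.

```python
import numpy as np

def lasso_duality_gap(X, y, w, lam):
    """Upper bound on the suboptimality of w for min_w 0.5*||y - Xw||^2 + lam*||w||_1.

    A dual-feasible point is obtained by rescaling the residual so that
    ||X^T u||_inf <= lam (a standard construction, not necessarily the
    certificate of the paper). The returned gap is >= P(w) - P(w*).
    """
    r = y - X @ w                                    # primal residual
    primal = 0.5 * r @ r + lam * np.abs(w).sum()
    scale = min(1.0, lam / max(np.abs(X.T @ r).max(), 1e-16))
    u = scale * r                                    # dual-feasible point
    dual = 0.5 * y @ y - 0.5 * (y - u) @ (y - u)
    return primal - dual

# Example: the gap shrinks as w approaches the Lasso solution.
rng = np.random.default_rng(0)
X, y = rng.standard_normal((50, 20)), rng.standard_normal(50)
print(lasso_duality_gap(X, y, np.zeros(20), lam=0.1))
```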
Decomposition Techniques for Bilinear Saddle Point Problems and Variational Inequalities with Affine Monotone Operators on Domains Given by Linear Minimization Oracles
The majority of first-order methods for large-scale convex-concave saddle point problems and variational inequalities with monotone operators are proximal algorithms, which at every iteration must minimize over the problem's domain X the sum of a linear form and a strongly convex function. To make such an algorithm practical, X should be proximal-friendly, that is, admit a strongly convex function whose linear perturbations are easy to minimize. As a byproduct, X then admits a computationally cheap Linear Minimization Oracle (LMO) capable of minimizing linear forms over X. There are, however, important situations where a cheap LMO is indeed available but X is not proximal-friendly, which motivates the search for algorithms based solely on LMOs. For smooth convex minimization there exists a classical LMO-based algorithm, the Conditional Gradient method. In contrast, the LMO-based techniques known to us for other problems with convex structure (nonsmooth convex minimization, convex-concave saddle point problems, even ones as simple as bilinear, and variational inequalities with monotone operators, even ones as simple as affine) are quite recent and rely on a common approach based on Fenchel-type representations of the associated objectives/vector fields. The goal of this paper is to develop alternative (and seemingly much simpler) LMO-based decomposition techniques for bilinear saddle point problems and for variational inequalities with affine monotone operators.
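For reference, the classical LMO-based method mentioned above, Conditional Gradient (Frank-Wolfe), can be sketched in a few lines. Here the domain is an l1 ball, whose LMO returns a signed, scaled basis vector; the problem and step size are illustrative choices, not taken from the paper.

```python
import numpy as np

def lmo_l1_ball(g, radius=1.0):
    """Linear Minimization Oracle for the l1 ball: argmin_{||s||_1 <= radius} <g, s>."""
    j = np.argmax(np.abs(g))
    s = np.zeros_like(g)
    s[j] = -radius * np.sign(g[j])
    return s

def conditional_gradient(grad, x0, lmo, iters=200):
    """Conditional Gradient (Frank-Wolfe): smooth convex minimization over X
    using only X's LMO and the standard step size 2/(t+2)."""
    x = x0.copy()
    for t in range(iters):
        s = lmo(grad(x))                  # LMO call: best vertex w.r.t. the current gradient
        x += (2.0 / (t + 2.0)) * (s - x)  # move toward that vertex
    return x

# Toy use: least squares over the l1 ball, min_{||x||_1 <= 1} 0.5*||Ax - b||^2.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((30, 10)), rng.standard_normal(30)
x_hat = conditional_gradient(lambda x: A.T @ (A @ x - b), np.zeros(10), lmo_l1_ball)
```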
L1-Regularized Distributed Optimization: A Communication-Efficient Primal-Dual Framework
Despite the importance of sparsity in many large-scale applications, there
are few methods for distributed optimization of sparsity-inducing objectives.
In this paper, we present a communication-efficient framework for
L1-regularized optimization in the distributed environment. By viewing
classical objectives in a more general primal-dual setting, we develop a new
class of methods that can be efficiently distributed and applied to common
sparsity-inducing models, such as Lasso, sparse logistic regression, and
elastic net-regularized problems. We provide theoretical convergence guarantees
for our framework, and demonstrate its efficiency and flexibility with a
thorough experimental comparison on Amazon EC2. Our proposed framework yields speedups of up to 50x over current state-of-the-art methods for distributed L1-regularized optimization.
CoCoA: A General Framework for Communication-Efficient Distributed Optimization
The scale of modern datasets necessitates the development of efficient
distributed optimization methods for machine learning. We present a
general-purpose framework for distributed computing environments, CoCoA, that
has an efficient communication scheme and is applicable to a wide variety of
problems in machine learning and signal processing. We extend the framework to cover general non-strongly-convex regularizers, including L1-regularized problems such as the Lasso, sparse logistic regression, and elastic net regularization, and show how earlier work can be derived as a special case. We provide convergence guarantees for the class of convex regularized loss minimization objectives, leveraging a novel approach to handling non-strongly-convex regularizers and non-smooth loss functions. The resulting framework has markedly improved performance over state-of-the-art methods, as we illustrate with an extensive set of experiments on real distributed datasets.
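The communication pattern underlying CoCoA-style methods can be illustrated with a toy, column-partitioned L1-regularized least-squares round: each worker solves a local subproblem against the shared residual, and only one vector of the sample dimension per worker is communicated and conservatively averaged. This is a rough sketch of the general pattern, not the paper's exact local subproblem or aggregation rule.

```python
import numpy as np

def cocoa_style_round(X_parts, w_parts, r, lam, local_passes=3):
    """One round of a CoCoA-style scheme for 0.5*||y - Xw||^2 + lam*||w||_1
    (toy sketch; the subproblem and aggregation differ from the actual algorithm).

    Columns of X are partitioned across K workers. Each worker runs a few
    coordinate-descent passes on its own coordinates against the shared
    residual r = y - Xw; only the resulting residual change (one n-vector
    per worker) is communicated, and the changes are conservatively averaged.
    """
    K = len(X_parts)
    updates = []
    for Xk, wk in zip(X_parts, w_parts):            # executed in parallel, one worker each
        rk, wk_new = r.copy(), wk.copy()
        for _ in range(local_passes):
            for j in range(Xk.shape[1]):
                Lj = Xk[:, j] @ Xk[:, j] + 1e-12
                z = wk_new[j] + Xk[:, j] @ rk / Lj
                wj = np.sign(z) * max(abs(z) - lam / Lj, 0.0)   # soft-thresholding
                rk -= Xk[:, j] * (wj - wk_new[j])
                wk_new[j] = wj
        updates.append((wk_new - wk, rk - r))
    for k, (dw, dr) in enumerate(updates):          # aggregation on the driver
        w_parts[k] = w_parts[k] + dw / K
        r = r + dr / K
    return w_parts, r
```

Calling this repeatedly, with r initialized to y when w starts at zero, keeps the shared residual consistent while the per-round communication stays independent of how many features each worker holds.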
OSQP: An Operator Splitting Solver for Quadratic Programs
We present a general-purpose solver for convex quadratic programs based on
the alternating direction method of multipliers, employing a novel operator
splitting technique that requires the solution of a quasi-definite linear
system with the same coefficient matrix at almost every iteration. Our
algorithm is very robust, placing no requirements on the problem data such as
positive definiteness of the objective function or linear independence of the
constraint functions. It can be configured to be division-free once an initial
matrix factorization is carried out, making it suitable for real-time
applications in embedded systems. In addition, our technique is the first
operator splitting method for quadratic programs able to reliably detect primal
and dual infeasible problems from the algorithm iterates. The method also
supports factorization caching and warm starting, making it particularly
efficient when solving parametrized problems arising in finance, control, and
machine learning. Our open-source C implementation OSQP has a small footprint,
is library-free, and has been extensively tested on many problem instances from
a wide variety of application areas. It is typically ten times faster than
competing interior-point methods, and sometimes much more when factorization
caching or warm start is used. OSQP has already shown a large impact, with tens of thousands of users both in academia and in large corporations.
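The solver described above ships as a Python package; a minimal usage sketch follows. Parameter and attribute names follow the commonly documented osqp Python interface and may differ slightly across versions.

```python
import numpy as np
import scipy.sparse as sparse
import osqp

# Minimize 0.5*x'Px + q'x  subject to  l <= Ax <= u  (a small illustrative QP).
P = sparse.csc_matrix([[4.0, 1.0], [1.0, 2.0]])
q = np.array([1.0, 1.0])
A = sparse.csc_matrix([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
l = np.array([1.0, 0.0, 0.0])
u = np.array([1.0, 0.7, 0.7])

prob = osqp.OSQP()
prob.setup(P, q, A, l, u, warm_start=True)   # one factorization, then reused
res = prob.solve()
print(res.info.status, res.x)

# Parametrized problems: update only the data that changed and re-solve;
# factorization caching and warm starting make repeated solves cheap.
prob.update(q=np.array([2.0, 3.0]))
res = prob.solve()
```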
Faster Coordinate Descent via Adaptive Importance Sampling
Coordinate descent methods employ random partial updates of decision
variables in order to solve huge-scale convex optimization problems. In this
work, we introduce new adaptive rules for the random selection of their
updates. By adaptive, we mean that our selection rules are based on the dual
residual or the primal-dual gap estimates and can change at each iteration. We
theoretically characterize the performance of our selection rules and
demonstrate improvements over the state-of-the-art, and extend our theory and
algorithms to general convex objectives. Numerical evidence with hinge-loss support vector machines and the Lasso confirms that practice follows the theory.
Comment: appearing at AISTATS 201
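To make the idea of adaptive selection concrete, the sketch below runs coordinate descent for the Lasso and samples coordinates with probability proportional to the size of their prox-gradient step, a simple per-coordinate progress estimate. This is an illustrative proxy for the gap-based rules discussed above, written for clarity rather than efficiency.

```python
import numpy as np

def lasso_cd_adaptive(X, y, lam, iters=500, seed=0):
    """Coordinate descent for 0.5*||y - Xw||^2 + lam*||w||_1 with adaptive sampling.

    Coordinates are drawn with probability proportional to the size of their
    prox-gradient step (an illustrative proxy for the paper's gap-based rules,
    recomputed in full each iteration for clarity, not efficiency).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    L = (X ** 2).sum(axis=0) + 1e-12          # per-coordinate curvature constants
    w, r = np.zeros(d), y.astype(float)       # r tracks the residual y - Xw
    soft = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)
    for _ in range(iters):
        grad = -X.T @ r                        # gradient of the smooth part
        w_prox = soft(w - grad / L, lam / L)   # coordinate-wise prox-gradient targets
        scores = np.abs(w_prox - w) + 1e-12
        j = rng.choice(d, p=scores / scores.sum())
        r -= X[:, j] * (w_prox[j] - w[j])      # keep the residual consistent
        w[j] = w_prox[j]
    return w
```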
Comparing Experiments to the Fault-Tolerance Threshold
Achieving error rates that meet or exceed the fault-tolerance threshold is a
central goal for quantum computing experiments, and measuring these error rates
using randomized benchmarking is now routine. However, direct comparison
between measured error rates and thresholds is complicated by the fact that
benchmarking estimates average error rates while thresholds reflect worst-case
behavior when a gate is used as part of a large computation. These two measures
of error can differ by orders of magnitude in the regime of interest. Here we
facilitate comparison between the experimentally accessible average error rates
and the worst-case quantities that arise in current threshold theorems by
deriving relations between the two for a variety of physical noise sources. Our
results indicate that it is coherent errors that lead to an enormous mismatch
between average and worst case, and we quantify how well these errors must be
controlled to ensure fair comparison between average error probabilities and
fault-tolerance thresholds.
Comment: 5 pages, 2 figures, 13-page appendix
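As a back-of-the-envelope illustration of that mismatch, consider a single-qubit coherent over-rotation: its average (randomized-benchmarking) error rate scales quadratically in the rotation angle, while its worst-case (diamond-norm) error scales linearly, so the worst case behaves like the square root of the average. The closed forms below are standard single-qubit expressions used only for illustration, not results quoted from the paper.

```python
import numpy as np

# Coherent error U = exp(-i*theta*Z/2) on one qubit (d = 2):
#   average gate infidelity:        r = (2/3) * sin^2(theta/2)
#   worst-case (diamond-norm) error:    sin(theta/2) = sqrt(3*r/2)
# so a small average error rate can hide a much larger worst-case error.
for theta in (1e-1, 1e-2, 1e-3):
    r = (2.0 / 3.0) * np.sin(theta / 2.0) ** 2
    worst = np.sin(theta / 2.0)
    print(f"theta={theta:.0e}  average r={r:.2e}  worst-case={worst:.2e}")
```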