Penalty Methods with Stochastic Approximation for Stochastic Nonlinear Programming
In this paper, we propose a class of penalty methods with stochastic
approximation for solving stochastic nonlinear programming problems. We assume
that only noisy gradients or function values of the objective function are
available via calls to a stochastic first-order or zeroth-order oracle. In each
iteration of the proposed methods, we minimize an exact penalty function which
is nonsmooth and nonconvex with only stochastic first-order or zeroth-order
information available. Stochastic approximation algorithms are presented for
solving this particular subproblem. The worst-case number of calls to the
stochastic first-order (or zeroth-order) oracle required by the proposed
penalty methods to obtain an ε-stochastic critical point is analyzed.
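The stochastic zeroth-order oracle assumed above is commonly realized by randomized finite differences of function values. A minimal sketch of one such oracle (a standard Gaussian-smoothing construction; the paper's exact oracle and parameter choices may differ):

```python
import numpy as np

def zo_gradient_estimate(f, x, mu=1e-4, n_samples=5000, rng=None):
    """Gaussian-smoothing zeroth-order gradient estimate via forward
    differences. Illustrative sketch of the kind of oracle assumed in
    the text; mu and n_samples here are arbitrary assumptions."""
    rng = np.random.default_rng(rng)
    g = np.zeros_like(x)
    for _ in range(n_samples):
        u = rng.standard_normal(x.shape)
        # (f(x + mu*u) - f(x)) / mu approximates the directional derivative
        g += (f(x + mu * u) - f(x)) / mu * u
    return g / n_samples

# Sanity check on f(x) = ||x||^2, whose true gradient is 2x.
x = np.array([1.0, -2.0])
g = zo_gradient_estimate(lambda z: float(z @ z), x, rng=0)
```

Averaging over random directions drives the estimate toward the true gradient; the smoothing radius mu trades bias against numerical cancellation.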
Zeroth-Order Stochastic Block Coordinate Type Methods for Nonconvex Optimization
We study (constrained) nonconvex (composite) optimization problems in which the
vector of decision variables can be split into blocks. Random block projection
is a popular technique for handling such problems because it markedly reduces
the computational cost of the projection step. However, this powerful method
has not previously been developed for the setting in which first-order
information is unavailable and only zeroth-order information can be accessed.
In this paper, we develop several classes of zeroth-order stochastic block
coordinate type methods. Zeroth-order stochastic block coordinate descent
(ZS-BCD) is proposed for solving unconstrained nonconvex optimization problems.
For composite optimization, we establish the zeroth-order stochastic block
mirror descent (ZS-BMD) method and its associated two-phase method, which
achieves the complexity bound for finding an (ε, Λ)-solution. Furthermore, we
establish a zeroth-order stochastic block coordinate conditional gradient
(ZS-BCCG) method for nonconvex (composite) optimization. In each iteration of
the ZS-BCCG method, only an (approximate) linear programming subproblem on a
random block needs to be solved, instead of the rather costly projection
subproblem on the whole decision space required by existing stochastic
approximation methods. We then propose an approximate ZS-BCCG method and a
corresponding two-phase ZS-BCCG method. This is also the first time a
two-phase BCCG method has been developed to obtain an (ε, Λ)-solution of
nonconvex composite optimization problems. To the best of our knowledge, the
results in this paper are new to the stochastic nonconvex (composite)
optimization literature.
Comment: 39 pages
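The block-coordinate idea above can be sketched as follows: estimate the partial gradient on one randomly drawn block by finite differences and update only that block. This is an illustrative sketch, not the paper's ZS-BCD; the step size and smoothing constant are assumptions:

```python
import numpy as np

def zs_bcd_step(f, x, blocks, step, mu=1e-6, rng=None):
    """One zeroth-order block coordinate descent step (illustrative).
    A block is drawn uniformly at random and its partial gradient is
    approximated by forward differences, so only len(block)+1 function
    evaluations are needed per step."""
    rng = np.random.default_rng(rng)
    block = blocks[rng.integers(len(blocks))]
    fx = f(x)
    g = np.zeros_like(x)
    for i in block:
        e = np.zeros_like(x)
        e[i] = mu
        g[i] = (f(x + e) - fx) / mu  # forward-difference partial derivative
    x_new = x.copy()
    x_new[block] -= step * g[block]  # update only the sampled block
    return x_new

# Minimize f(x) = ||x||^2 over two coordinate blocks.
f = lambda z: float(z @ z)
x = np.array([2.0, -1.0, 0.5, 3.0])
blocks = [np.array([0, 1]), np.array([2, 3])]
for t in range(200):
    x = zs_bcd_step(f, x, blocks, step=0.1, mu=1e-6, rng=t)
```

Each step touches a single block, which is where the per-iteration cost saving over full-vector updates comes from.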
Zeroth-order Nonconvex Stochastic Optimization: Handling Constraints, High-Dimensionality and Saddle-Points
In this paper, we propose and analyze zeroth-order stochastic approximation
algorithms for nonconvex and convex optimization, with a focus on constrained
optimization, high-dimensional settings, and saddle-point avoidance.
To handle constrained optimization, we first propose generalizations of the
conditional gradient algorithm achieving rates similar to the standard
stochastic gradient algorithm using only zeroth-order information. To
facilitate zeroth-order optimization in high-dimensions, we explore the
advantages of structural sparsity assumptions. Specifically, (i) we highlight
an implicit regularization phenomenon where the standard stochastic gradient
algorithm with zeroth-order information adapts to the sparsity of the problem
at hand by just varying the step-size and (ii) propose a truncated stochastic
gradient algorithm with zeroth-order information, whose rate of convergence
depends only poly-logarithmically on the dimensionality. We next focus on
avoiding saddle points in the nonconvex setting. Towards that end, we interpret
the Gaussian smoothing technique for estimating gradients from zeroth-order
information as an instantiation of the first-order Stein identity. Based on
this, we provide a novel linear-in-dimension time estimator of the Hessian
matrix of a function using only zeroth-order information, based on the
second-order Stein identity. We then provide an algorithm for avoiding saddle
points, based on a zeroth-order cubic-regularized Newton method, and discuss
its convergence rates.
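The second-order Stein identity referenced above lets one recover Hessian information from function values alone: for u ~ N(0, I), E[f(x + μu)(uu^T − I)]/μ² approaches the Hessian of a smooth f as μ → 0. A hedged sketch using a symmetric-difference variant for variance reduction (the paper's estimator may differ in detail):

```python
import numpy as np

def stein_hessian_estimate(f, x, mu=1e-2, n_samples=30000, rng=None):
    """Zeroth-order Hessian estimate via the second-order Stein identity:
        E[(f(x+mu*u) + f(x-mu*u) - 2 f(x)) (u u^T - I)] / (2 mu^2)
    for u ~ N(0, I). Illustrative sketch; sample sizes and the smoothing
    radius are assumptions, not the paper's choices."""
    rng = np.random.default_rng(rng)
    d = x.shape[0]
    H = np.zeros((d, d))
    I = np.eye(d)
    fx = f(x)
    for _ in range(n_samples):
        u = rng.standard_normal(d)
        # symmetric difference cancels the gradient term, reducing variance
        c = (f(x + mu * u) + f(x - mu * u) - 2 * fx) / (2 * mu**2)
        H += c * (np.outer(u, u) - I)
    return H / n_samples

# Quadratic test function x^T A x with known Hessian 2A = diag(2, 6).
A = np.diag([1.0, 3.0])
H = stein_hessian_estimate(lambda z: float(z @ A @ z), np.zeros(2), rng=0)
```

Each sample costs only two extra function evaluations, which is what makes the construction cheap relative to coordinate-wise second differences.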
A Proximal Zeroth-Order Algorithm for Nonconvex Nonsmooth Problems
In this paper, we focus on solving an important class of nonconvex
optimization problems that includes, for example, signal processing over
networked multi-agent systems and distributed learning over networks.
Motivated by the many applications in which the local objective function is
the sum of a smooth but possibly nonconvex part and a nonsmooth but convex
part, subject to a linear equality constraint, this paper proposes a proximal
zeroth-order primal-dual algorithm (PZO-PDA) that accounts for the information
structure of the problem. The algorithm utilizes only zeroth-order information
(i.e., function values) of the smooth functions, which makes it applicable
when only noisy evaluations of the objective are accessible and classical
gradient-based methods cannot be applied. We prove convergence and a rate of
convergence for PZO-PDA. Numerical experiments are provided to validate the
theoretical results.
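To illustrate the flavor of such a scheme, the sketch below combines a two-point zeroth-order estimate for the smooth part with the proximal operator of an ℓ1 term. It omits the dual variables that PZO-PDA maintains for the linear equality constraint, and all parameter choices are assumptions:

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal operator of lam * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def proximal_zo_step(f_smooth, x, step, lam, mu=1e-5, rng=None):
    """One proximal gradient step: the smooth part is handled by a
    two-point zeroth-order estimate, the nonsmooth l1 part by its prox.
    Illustrative only -- not the PZO-PDA update itself."""
    rng = np.random.default_rng(rng)
    u = rng.standard_normal(x.shape)
    g = (f_smooth(x + mu * u) - f_smooth(x - mu * u)) / (2 * mu) * u
    return soft_threshold(x - step * g, step * lam)

# LASSO-type objective 0.5*||x - b||^2 + 0.1*||x||_1; its minimizer is
# the soft-thresholding of b, here (1.9, 0.0, 0.9).
b = np.array([2.0, -0.05, 1.0])
x = np.zeros(3)
for t in range(500):
    x = proximal_zo_step(lambda z: 0.5 * float((z - b) @ (z - b)),
                         x, step=2.0 / (t + 10), lam=0.1, rng=t)
```

The prox step handles the nonsmooth convex part exactly, so only the smooth part ever needs (noisy) function-value queries.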
Mini-batch Stochastic Approximation Methods for Nonconvex Stochastic Composite Optimization
This paper considers a class of constrained stochastic composite optimization
problems whose objective function is given by the summation of a differentiable
(possibly nonconvex) component, together with a certain non-differentiable (but
convex) component. In order to solve these problems, we propose a randomized
stochastic projected gradient (RSPG) algorithm, in which a suitable mini-batch
of samples is taken at each iteration depending on the total budget of
stochastic samples allowed. The RSPG algorithm also employs a general distance
function to allow taking advantage of the geometry of the feasible region. The
complexity of this algorithm is established in a unified setting, which shows
that it is nearly optimal for convex stochastic programming. A
post-optimization phase is also proposed to significantly reduce the variance
of the solutions returned by the algorithm. In addition, based on the RSPG
algorithm, a stochastic gradient-free algorithm, which uses only stochastic
zeroth-order information, is also discussed. Some preliminary numerical
results are provided.
Comment: 32 pages
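The core mini-batch projected gradient update can be sketched as follows for the Euclidean case with a box feasible region; the actual RSPG algorithm additionally randomizes the returned iterate and allows general distance-generating functions:

```python
import numpy as np

def rspg_step(stoch_grad, x, batch_size, step, lo, hi, rng=None):
    """One mini-batch stochastic projected gradient step (illustrative
    sketch of the RSPG update in the Euclidean case). Averages
    `batch_size` stochastic gradients, takes a step, and projects onto
    the box [lo, hi]."""
    rng = np.random.default_rng(rng)
    g = np.mean([stoch_grad(x, rng) for _ in range(batch_size)], axis=0)
    return np.clip(x - step * g, lo, hi)  # Euclidean projection onto the box

# Noisy gradients of f(x) = ||x - c||^2 with c outside the box [-1, 1]^2,
# so the constrained minimizer is the projection of c, i.e. (1, -1).
c = np.array([2.0, -3.0])
def stoch_grad(x, rng):
    return 2 * (x - c) + 0.1 * rng.standard_normal(x.shape)

x = np.zeros(2)
for t in range(100):
    x = rspg_step(stoch_grad, x, batch_size=16, step=0.1,
                  lo=-1.0, hi=1.0, rng=t)
```

Averaging a mini-batch before the projection is what controls the gradient noise per iteration under a fixed total sample budget.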
Adaptive First- and Zeroth-Order Methods for Weakly Convex Stochastic Optimization Problems
In this paper, we design and analyze a new family of adaptive subgradient
methods for solving an important class of weakly convex (possibly nonsmooth)
stochastic optimization problems. Adaptive methods that use exponential moving
averages of past gradients to update search directions and learning rates have
recently attracted a lot of attention for solving optimization problems that
arise in machine learning. Nevertheless, their convergence analysis almost
exclusively requires smoothness and/or convexity of the objective function. In
contrast, we establish non-asymptotic rates of convergence of first- and
zeroth-order adaptive methods and their proximal variants for a reasonably
broad class of nonsmooth and nonconvex optimization problems. Experimental
results indicate that the proposed algorithms empirically outperform
stochastic gradient descent and its zeroth-order variant for solving such
optimization problems.
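As a hedged illustration of an adaptive method driven purely by zeroth-order information, the sketch below couples Adam-style exponential moving averages with a two-point gradient estimate. It is not the paper's algorithm; the step rule and all constants are assumptions:

```python
import numpy as np

def adaptive_zo_minimize(f, x0, steps=300, lr=0.5, beta1=0.9, beta2=0.99,
                         mu=1e-4, eps=1e-8, seed=0):
    """Adam-style adaptive update driven by a two-point zeroth-order
    gradient estimate, with a decaying step size. Illustrative sketch;
    the paper's methods and proximal variants differ in detail."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)  # exponential moving average of gradient estimates
    v = np.zeros_like(x)  # exponential moving average of squared estimates
    for t in range(steps):
        u = rng.standard_normal(x.shape)
        g = (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u  # ZO estimate
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        x -= lr / np.sqrt(t + 1) * m / (np.sqrt(v) + eps)
    return x

# Weakly convex, nonsmooth test problem: f(x) = |x_1| + (x_2 - 1)^2,
# minimized at (0, 1); the starting value is f(3, -2) = 12.
f = lambda z: abs(z[0]) + (z[1] - 1.0) ** 2
x = adaptive_zo_minimize(f, np.array([3.0, -2.0]))
```

The moving averages play the same role as in first-order adaptive methods; only the gradient feeding them is replaced by a function-value estimate.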
Gradient-Free Multi-Agent Nonconvex Nonsmooth Optimization
In this paper, we consider the problem of minimizing the sum of nonconvex and
possibly nonsmooth functions over a connected multi-agent network, where the
agents have partial knowledge about the global cost function and can only
access the zeroth-order information (i.e., the functional values) of their
local cost functions. We propose and analyze a distributed primal-dual
gradient-free algorithm for this challenging problem. We show that by
appropriately choosing the parameters, the proposed algorithm converges to the
set of first-order stationary solutions with a provable global sublinear
convergence rate. Numerical experiments demonstrate the effectiveness of our
proposed method for optimizing nonconvex and nonsmooth problems over a network.
Comment: Long version of CDC paper
Zeroth-Order Stochastic Alternating Direction Method of Multipliers for Nonconvex Nonsmooth Optimization
Alternating direction method of multipliers (ADMM) is a popular optimization
tool for the composite and constrained problems in machine learning. However,
in many machine learning problems such as black-box attacks and bandit
feedback, ADMM could fail because the explicit gradients of these problems are
difficult or infeasible to obtain. Zeroth-order (gradient-free) methods can
effectively solve these problems because only objective function values are
required in the optimization. Although a few zeroth-order ADMM methods have
recently been proposed, they build on convexity of the objective function,
which limits them in many applications. In this paper, we therefore propose a
class of fast zeroth-order stochastic ADMM methods (i.e., ZO-SVRG-ADMM and
ZO-SAGA-ADMM) for solving nonconvex problems with multiple nonsmooth
penalties, based on the coordinate smoothing gradient estimator. Moreover, we
prove that both ZO-SVRG-ADMM and ZO-SAGA-ADMM have a convergence rate of
O(1/T), where T denotes the number of iterations. In particular, our methods
not only reach the best known O(1/T) convergence rate for nonconvex
optimization, but are also able to effectively solve many complex machine
learning problems with multiple regularized penalties and constraints.
Finally, we conduct experiments on black-box binary classification and
structured adversarial attacks on black-box deep neural networks to validate
the efficiency of our algorithms.
Comment: To appear in IJCAI 2019. Supplementary materials are added.
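The coordinate smoothing gradient estimator referred to above replaces each partial derivative with a finite difference along a coordinate direction. A minimal sketch using central differences (the smoothing parameter is an assumption):

```python
import numpy as np

def coordinate_smoothing_gradient(f, x, mu=1e-5):
    """Coordinate-wise smoothing (central difference) gradient estimator
    of the kind the zeroth-order ADMM variants build on; the paper's
    smoothing constant may differ. Costs 2*d function evaluations for a
    d-dimensional point."""
    d = x.shape[0]
    g = np.empty(d)
    for j in range(d):
        e = np.zeros(d)
        e[j] = mu
        g[j] = (f(x + e) - f(x - e)) / (2 * mu)  # central difference
    return g

# Central differences are accurate to O(mu^2) on smooth functions:
# grad of sin(x1) + x2^2 at (0, 3) is (cos 0, 6) = (1, 6).
g = coordinate_smoothing_gradient(lambda z: float(np.sin(z[0]) + z[1] ** 2),
                                  np.array([0.0, 3.0]))
```

Unlike random-direction estimators, this one is deterministic given x, which is convenient inside variance-reduced schemes such as SVRG- and SAGA-style ADMM.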
Zeroth Order Nonconvex Multi-Agent Optimization over Networks
In this paper, we consider distributed optimization problems over a
multi-agent network, where each agent can only partially evaluate the objective
function, and it is allowed to exchange messages with its immediate neighbors.
Unlike existing works on distributed optimization, our focus is on
optimizing a class of nonconvex problems, under the challenging
setting where each agent can only access the zeroth-order information (i.e.,
the functional values) of its local functions. For different types of network
topologies such as undirected connected networks or star networks, we develop
efficient distributed algorithms and rigorously analyze their convergence and
rate of convergence (to the set of stationary solutions). Numerical results are
provided to demonstrate the efficiency of the proposed algorithms.
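A round of a distributed scheme of the kind described above can be sketched as a consensus (mixing) step followed by local zeroth-order updates. The mixing matrix, step rule, and estimator below are illustrative assumptions rather than the paper's algorithms:

```python
import numpy as np

def decentralized_zo_round(fs, X, W, step, mu=1e-4, rng=None):
    """One round of a decentralized zeroth-order scheme (illustrative):
    each agent averages neighbors' iterates through a doubly stochastic
    mixing matrix W, then takes a two-point zeroth-order step on its own
    local function."""
    rng = np.random.default_rng(rng)
    X = W @ X  # consensus (mixing) step over the network
    for i, f in enumerate(fs):
        u = rng.standard_normal(X.shape[1])
        g = (f(X[i] + mu * u) - f(X[i] - mu * u)) / (2 * mu) * u
        X[i] = X[i] - step * g  # local zeroth-order step
    return X

# Three agents jointly minimizing sum_i ||x - c_i||^2; the consensus
# optimum is the mean of the c_i, here the origin.
cs = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([-1.0, -1.0])]
fs = [lambda z, c=c: float((z - c) @ (z - c)) for c in cs]
W = np.full((3, 3), 1.0 / 3.0)  # complete-graph averaging for simplicity
X = np.zeros((3, 2))
for t in range(400):
    X = decentralized_zo_round(fs, X, W, step=1.0 / (t + 20), rng=t)
```

Each agent queries only its own local function values, matching the partial-knowledge setting of the abstract; the choice of W encodes the network topology.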
Semantics, Representations and Grammars for Deep Learning
Deep learning is currently the subject of intensive study. However,
fundamental concepts such as representations are not formally defined --
researchers "know them when they see them" -- and there is no common language
for describing and analyzing algorithms. This essay proposes an abstract
framework that identifies the essential features of current practice and may
provide a foundation for future developments.
The backbone of almost all deep learning algorithms is backpropagation, which
is simply a gradient computation distributed over a neural network. The main
ingredients of the framework are thus, unsurprisingly: (i) game theory, to
formalize distributed optimization; and (ii) communication protocols, to track
the flow of zeroth and first-order information. The framework allows natural
definitions of semantics (as the meaning encoded in functions), representations
(as functions whose semantics is chosen to optimize a criterion) and grammars
(as communication protocols equipped with first-order convergence guarantees).
Much of the essay is spent discussing examples taken from the literature. The
ultimate aim is to develop a graphical language for describing the structure of
deep learning algorithms that backgrounds the details of the optimization
procedure and foregrounds how the components interact. Inspiration is taken
from probabilistic graphical models and factor graphs, which capture the
essential structural features of multivariate distributions.
Comment: 20 pages, many diagrams