Tensor-Compressed Back-Propagation-Free Training for (Physics-Informed) Neural Networks
Backward propagation (BP) is widely used to compute the gradients in neural
network training. However, it is hard to implement BP on edge devices due to
the lack of hardware and software resources to support automatic
differentiation. This has tremendously increased the design complexity and
time-to-market of on-device training accelerators. This paper presents a
completely BP-free framework that only requires forward propagation to train
realistic neural networks. Our technical contributions are three-fold. Firstly,
we present a tensor-compressed variance reduction approach to greatly improve
the scalability of zeroth-order (ZO) optimization, making it feasible to handle
a network size that is beyond the capability of previous ZO approaches.
Secondly, we present a hybrid gradient evaluation approach to improve the
efficiency of ZO training. Finally, we extend our BP-free training framework to
physics-informed neural networks (PINNs) by proposing a sparse-grid approach to
estimate the derivatives in the loss function without using BP. Our BP-free
training loses only a little accuracy on the MNIST dataset compared with standard
first-order training. We also demonstrate successful results in training a PINN
for solving a 20-dim Hamilton-Jacobi-Bellman PDE. This memory-efficient and
BP-free approach may serve as a foundation for near-future on-device
training on many resource-constrained platforms (e.g., FPGA, ASIC,
micro-controllers, and photonic chips).
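A minimal sketch of the forward-only gradient estimation that such BP-free training builds on is given below, assuming a generic multi-direction two-point zeroth-order estimator in Python; the paper's tensor-compressed variance reduction, hybrid gradient evaluation, and sparse-grid derivative estimation are not reproduced, and the name `zo_gradient` and its parameters are illustrative.

```python
import numpy as np

def zo_gradient(f, x, num_dirs=8, mu=1e-3, rng=np.random.default_rng(0)):
    """Two-point randomized zeroth-order estimate of the gradient of f at x.

    Only forward evaluations of f are used -- no automatic differentiation.
    Averaging over several random directions reduces the estimator's variance.
    """
    grad = np.zeros_like(x)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.size)                     # random search direction
        grad += (f(x + mu * u) - f(x - mu * u)) / (2 * mu) * u
    return grad / num_dirs

# Toy usage: minimize a quadratic with plain zeroth-order gradient descent.
f = lambda z: np.sum((z - 1.0) ** 2)
x = np.zeros(5)
for _ in range(200):
    x -= 0.05 * zo_gradient(f, x)
print(x)  # approaches the minimizer at all-ones
```

Averaging over more directions lowers the estimator's variance at the cost of extra forward passes, which is exactly the scalability trade-off that the tensor-compressed variance-reduction approach targets.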
Zeroth-Order Methods for Convex-Concave Minmax Problems: Applications to Decision-Dependent Risk Minimization
Min-max optimization is emerging as a key framework for analyzing problems of
robustness to strategically and adversarially generated data. We propose a
random reshuffling-based gradient-free Optimistic Gradient Descent-Ascent
algorithm for solving convex-concave min-max problems with finite sum
structure. We prove that the algorithm enjoys the same convergence rate as that
of zeroth-order algorithms for convex minimization problems. We further
specialize the algorithm to solve distributionally robust, decision-dependent
learning problems, where gradient information is not readily available. Through
illustrative simulations, we observe that our proposed approach learns models
that are simultaneously robust against adversarial distribution shifts and
strategic decisions from the data sources, and outperforms existing methods
from the strategic classification literature.
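To make the update rule concrete, the sketch below combines a two-evaluation zeroth-order gradient estimate with an optimistic gradient descent-ascent step and random reshuffling over the finite-sum components. It is a hedged illustration: the interface `F(i, x, y)`, the step size `eta`, and the smoothing radius `mu` are hypothetical choices, projections onto constraint sets are omitted, and it should not be read as the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def zo_grad(phi, z, mu=1e-4):
    """Two-evaluation zeroth-order estimate of the gradient of phi at z."""
    u = rng.standard_normal(z.size)
    return (phi(z + mu * u) - phi(z - mu * u)) / (2 * mu) * u

def zo_ogda(F, x, y, n_components, eta=0.05, epochs=50):
    """Gradient-free Optimistic Gradient Descent-Ascent with random reshuffling
    over a finite-sum objective F(i, x, y); a sketch, not the paper's tuned method."""
    gx_prev, gy_prev = np.zeros_like(x), np.zeros_like(y)
    for _ in range(epochs):
        for i in rng.permutation(n_components):        # random reshuffling
            gx = zo_grad(lambda u: F(i, u, y), x)      # ZO gradient in x
            gy = zo_grad(lambda v: F(i, x, v), y)      # ZO gradient in y
            x = x - eta * (2 * gx - gx_prev)           # optimistic descent step
            y = y + eta * (2 * gy - gy_prev)           # optimistic ascent step
            gx_prev, gy_prev = gx, gy
    return x, y

# Toy bilinear saddle problem: components f_i(x, y) = x^T A_i y.
A = rng.standard_normal((4, 3, 3))
F = lambda i, x, y: x @ A[i] @ y
x_sol, y_sol = zo_ogda(F, np.ones(3), np.ones(3), n_components=4)
```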
Decentralized Gradient-Free Methods for Stochastic Non-Smooth Non-Convex Optimization
We consider decentralized gradient-free optimization for minimizing Lipschitz-continuous
functions that satisfy neither a smoothness nor a convexity assumption.
We propose two novel gradient-free algorithms, the Decentralized Gradient-Free
Method (DGFM) and its variant, the Decentralized Gradient-Free
Method$^+$ (DGFM$^+$). Based on the techniques of randomized smoothing and gradient
tracking, DGFM requires the computation of the zeroth-order oracle of a single
sample in each iteration, making it less demanding in terms of computational
resources for individual computing nodes. Theoretically, we establish the
oracle complexity of DGFM for obtaining a $(\delta,\epsilon)$-Goldstein
stationary point. DGFM$^+$, an advanced version of DGFM, incorporates variance
reduction to further improve the convergence behavior. It samples a mini-batch
at each iteration and periodically draws a larger batch of data, which further
improves the complexity. Moreover, experimental
results underscore the empirical advantages of our proposed algorithms when
applied to real-world datasets.
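The single-sample zeroth-order oracle mentioned above can be sketched in a few lines; the version below uses the standard sphere-sampling estimator from randomized smoothing, with the gradient-tracking and network-consensus machinery of DGFM deliberately left out, and the names `zo_oracle`, `xi`, and `delta` chosen only for illustration.

```python
import numpy as np

def zo_oracle(f, x, xi, delta=1e-2, rng=np.random.default_rng(0)):
    """Single-sample zeroth-order oracle via randomized smoothing.

    Returns an unbiased estimate of the gradient of the smoothed surrogate
    f_delta(x) = E_u[f(x + delta * u, xi)], with u uniform in the unit ball,
    using one data sample xi and only two function evaluations.
    The decentralized gradient-tracking and consensus steps are not shown.
    """
    d = x.size
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)                 # uniform direction on the unit sphere
    return d / (2 * delta) * (f(x + delta * w, xi) - f(x - delta * w, xi)) * w
```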
Adaptive Stochastic Optimisation of Nonconvex Composite Objectives
In this paper, we propose and analyse a family of generalised stochastic
composite mirror descent algorithms. With adaptive step sizes, the proposed
algorithms converge without requiring prior knowledge of the problem. Combined
with an entropy-like update-generating function, these algorithms perform
gradient descent in the space equipped with the maximum norm, which allows us
to exploit the low-dimensional structure of the decision sets for
high-dimensional problems. Together with a sampling method based on the
Rademacher distribution and variance reduction techniques, the proposed
algorithms guarantee a logarithmic complexity dependence on dimensionality for
zeroth-order optimisation problems.
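As a small illustration of the Rademacher-based sampling mentioned above, the sketch below builds a two-point zeroth-order gradient estimate from a random sign vector; it is a generic construction with assumed parameter names, and the paper's composite mirror descent update and adaptive step sizes are not reproduced.

```python
import numpy as np

def rademacher_zo_grad(f, x, mu=1e-3, rng=np.random.default_rng(0)):
    """Two-point zeroth-order gradient estimate using a Rademacher perturbation.

    Every coordinate of the perturbation s is +1 or -1, so the perturbation has
    unit maximum norm; this matches the max-norm geometry referred to above.
    """
    s = rng.choice([-1.0, 1.0], size=x.size)    # random sign (Rademacher) direction
    return (f(x + mu * s) - f(x - mu * s)) / (2 * mu) * s
```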
Small Errors in Random Zeroth Order Optimization are Imaginary
The vast majority of zeroth order optimization methods try to imitate first
order methods via some smooth approximation of the gradient. Here, the smaller
the smoothing parameter, the smaller the gradient approximation error. We show,
however, that for the majority of zeroth order methods this smoothing parameter
cannot be chosen arbitrarily small, as numerical cancellation errors will
dominate. As such, theoretical and numerical performance could differ
significantly. Using classical tools from numerical differentiation we will
propose a new smoothed approximation of the gradient that can be integrated
into general zeroth order algorithmic frameworks. Since the proposed smoothed
approximation does not suffer from cancellation errors, the smoothing parameter
(and hence the approximation error) can be made arbitrarily small. Sublinear
convergence rates for algorithms based on our smoothed approximation are
proved. Numerical experiments are also presented to demonstrate the superiority
of algorithms based on the proposed approximation.
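The cancellation phenomenon and its complex-valued remedy are easy to reproduce. The sketch below contrasts a forward finite difference with the classical complex-step derivative from numerical differentiation; this illustrates the general tool the abstract appeals to, not necessarily the exact smoothed approximation proposed in the paper, and the test function is an arbitrary choice.

```python
import numpy as np

def forward_difference(f, x, h):
    """Standard finite difference: suffers subtractive cancellation for tiny h."""
    return (f(x + h) - f(x)) / h

def complex_step(f, x, h):
    """Complex-step derivative: no subtraction of nearly equal numbers, so h can
    be made arbitrarily small without cancellation error (for real-analytic f)."""
    return np.imag(f(x + 1j * h)) / h

# Classic test function from the numerical-differentiation literature.
f = lambda t: np.exp(t) / np.sqrt(np.sin(t) ** 3 + np.cos(t) ** 3)
x0 = 1.5
for h in (1e-8, 1e-12, 1e-20):
    print(h, forward_difference(f, x0, h), complex_step(f, x0, h))
```

At h = 1e-20 the finite difference collapses to zero because f(x0 + h) and f(x0) coincide in double precision, while the complex-step value is unaffected; this is the behaviour that motivates a cancellation-free smoothed gradient approximation.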
Stochastic Zeroth-order Functional Constrained Optimization: Oracle Complexity and Applications
Functionally constrained stochastic optimization problems, where neither the
objective function nor the constraint functions are analytically available,
arise frequently in machine learning applications. In this work, assuming we
only have access to the noisy evaluations of the objective and constraint
functions, we propose and analyze stochastic zeroth-order algorithms for
solving the above class of stochastic optimization problems. When the domain of
the functions is $\mathbb{R}^d$, assuming there are $m$ constraint functions,
we establish oracle complexity bounds in both the convex and the nonconvex
setting, where $\epsilon$ represents the accuracy of the solutions required in
appropriately defined metrics. The established oracle complexities are, to our
knowledge, the first such results in the literature for functionally
constrained stochastic zeroth-order optimization problems. We demonstrate the
applicability of our algorithms by illustrating their superior performance on the
problem of hyperparameter tuning for sampling algorithms and neural network
training.
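As a concrete picture of what working from noisy evaluations alone involves, the sketch below forms Gaussian-smoothing gradient estimates for the objective and each constraint from a single black-box evaluator; the interface `noisy_evals`, the smoothing radius `mu`, and the batch size are assumptions made for illustration, and the paper's actual update rule and step-size choices are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def zo_estimates(noisy_evals, x, mu=1e-2, batch=16):
    """Zeroth-order gradient estimates for the objective and every constraint.

    `noisy_evals(x)` is assumed to return a vector of stochastic values
    [F(x, xi), G_1(x, xi), ..., G_m(x, xi)]; only such noisy evaluations are
    used.  Returns an (m + 1, d) array whose rows estimate the gradients of the
    smoothed objective and constraints.
    """
    d = x.size
    grads = np.zeros((noisy_evals(x).size, d))
    for _ in range(batch):
        u = rng.standard_normal(d)
        diff = (noisy_evals(x + mu * u) - noisy_evals(x - mu * u)) / (2 * mu)
        grads += np.outer(diff, u)              # one row per function
    return grads / batch

# Toy usage: one objective and two constraints on R^3 (noise omitted for brevity).
evals = lambda z: np.array([np.sum(z ** 2), z[0] - 1.0, -z[1]])
G = zo_estimates(evals, np.ones(3))
```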