    Tensor-Compressed Back-Propagation-Free Training for (Physics-Informed) Neural Networks

    Backward propagation (BP) is widely used to compute the gradients in neural network training. However, it is hard to implement BP on edge devices due to the lack of hardware and software resources to support automatic differentiation, which has tremendously increased the design complexity and time-to-market of on-device training accelerators. This paper presents a completely BP-free framework that requires only forward propagation to train realistic neural networks. Our technical contributions are three-fold. First, we present a tensor-compressed variance-reduction approach that greatly improves the scalability of zeroth-order (ZO) optimization, making it feasible to handle network sizes beyond the capability of previous ZO approaches. Second, we present a hybrid gradient-evaluation approach to improve the efficiency of ZO training. Finally, we extend our BP-free training framework to physics-informed neural networks (PINNs) by proposing a sparse-grid approach to estimate the derivatives in the loss function without using BP. Our BP-free training loses only a small amount of accuracy on the MNIST dataset compared with standard first-order training. We also demonstrate successful results in training a PINN to solve a 20-dimensional Hamilton-Jacobi-Bellman PDE. This memory-efficient, BP-free approach may serve as a foundation for near-future on-device training on many resource-constrained platforms (e.g., FPGAs, ASICs, microcontrollers, and photonic chips).
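
    A minimal sketch of the forward-only principle such BP-free training builds on: a two-point zeroth-order gradient estimator that needs nothing but loss evaluations. This is not the paper's tensor-compressed or sparse-grid scheme; the function names and hyperparameters below are illustrative assumptions.

        # Generic two-point zeroth-order (ZO) gradient estimate from forward passes only;
        # a sketch of the idea, not the paper's tensor-compressed variance-reduced estimator.
        import numpy as np

        def zo_gradient(loss, theta, num_dirs=20, mu=1e-3, rng=None):
            """Estimate the gradient of loss at theta using only function evaluations."""
            rng = np.random.default_rng() if rng is None else rng
            grad = np.zeros(theta.size)
            for _ in range(num_dirs):
                u = rng.standard_normal(theta.size)                  # random search direction
                delta = (loss(theta + mu * u) - loss(theta - mu * u)) / (2 * mu)
                grad += delta * u                                    # directional slope times direction
            return grad / num_dirs

        # Toy usage: fit a quadratic without any backward pass.
        target = np.array([1.0, -2.0, 0.5])
        loss = lambda th: float(np.sum((th - target) ** 2))
        theta = np.zeros(3)
        for _ in range(200):
            theta -= 0.05 * zo_gradient(loss, theta)
        print(theta)  # should end up close to `target`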

    Zeroth-Order Methods for Convex-Concave Minmax Problems: Applications to Decision-Dependent Risk Minimization

    Min-max optimization is emerging as a key framework for analyzing problems of robustness to strategically and adversarially generated data. We propose a random-reshuffling-based, gradient-free Optimistic Gradient Descent-Ascent algorithm for solving convex-concave min-max problems with finite-sum structure. We prove that the algorithm enjoys the same convergence rate as zeroth-order algorithms for convex minimization problems. We further specialize the algorithm to solve distributionally robust, decision-dependent learning problems, where gradient information is not readily available. Through illustrative simulations, we observe that our proposed approach learns models that are simultaneously robust against adversarial distribution shifts and strategic decisions from the data sources, and outperforms existing methods from the strategic classification literature. Comment: 32 pages, 5 figures
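
    To make the gradient-free min-max idea concrete, here is a hedged sketch of optimistic gradient descent-ascent driven by two-point zeroth-order estimates on a toy bilinear saddle. The random reshuffling over a finite sum, which is central to the proposed algorithm, is omitted, so this only illustrates the general mechanism, not the paper's method; the step size and direction counts are assumptions.

        # Zeroth-order optimistic gradient descent-ascent (OGDA) on f(x, y) = x^T y.
        import numpy as np

        rng = np.random.default_rng(0)

        def zo_grad(h, z, mu=1e-5, num_dirs=10):
            """Two-point zeroth-order gradient estimate of h at z."""
            g = np.zeros(z.size)
            for _ in range(num_dirs):
                u = rng.standard_normal(z.size)
                g += (h(z + mu * u) - h(z - mu * u)) / (2 * mu) * u
            return g / num_dirs

        f = lambda x, y: float(x @ y)            # bilinear saddle; unique saddle point at (0, 0)
        x, y = np.ones(2), -np.ones(2)
        eta = 0.1
        gx_prev = zo_grad(lambda v: f(v, y), x)
        gy_prev = zo_grad(lambda v: f(x, v), y)
        for _ in range(500):
            gx = zo_grad(lambda v: f(v, y), x)   # estimated partial gradient w.r.t. x
            gy = zo_grad(lambda v: f(x, v), y)   # estimated partial gradient w.r.t. y
            x = x - eta * (2 * gx - gx_prev)     # optimistic (extrapolated) descent step in x
            y = y + eta * (2 * gy - gy_prev)     # optimistic ascent step in y
            gx_prev, gy_prev = gx, gy
        print(x, y)                              # both approach the saddle point at the origin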

    Decentralized Gradient-Free Methods for Stochastic Non-Smooth Non-Convex Optimization

    We consider decentralized gradient-free optimization for minimizing Lipschitz continuous functions that satisfy neither smoothness nor convexity assumptions. We propose two novel gradient-free algorithms, the Decentralized Gradient-Free Method (DGFM) and its variant, the Decentralized Gradient-Free Method$^{+}$ (DGFM$^{+}$). Based on the techniques of randomized smoothing and gradient tracking, DGFM requires only a zeroth-order oracle call on a single sample in each iteration, making it less demanding in terms of computational resources for individual computing nodes. Theoretically, DGFM achieves a complexity of $\mathcal{O}(d^{3/2}\delta^{-1}\varepsilon^{-4})$ for obtaining a $(\delta,\varepsilon)$-Goldstein stationary point. DGFM$^{+}$, an advanced version of DGFM, incorporates variance reduction to further improve the convergence behavior. It samples a mini-batch at each iteration and periodically draws a larger batch of data, which improves the complexity to $\mathcal{O}(d^{3/2}\delta^{-1}\varepsilon^{-3})$. Moreover, experimental results underscore the empirical advantages of our proposed algorithms when applied to real-world datasets.
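
    The abstract combines three ingredients: randomized smoothing, a single-sample zeroth-order oracle, and gradient tracking over a network. The sketch below wires these together on a ring of five nodes, each holding a nonsmooth absolute-deviation loss. The mixing weights, step size, and smoothing radius are assumptions for the toy; it illustrates the ingredients, not the authors' DGFM or its guarantees.

        # Decentralized zeroth-order optimization with randomized smoothing and gradient tracking.
        import numpy as np

        rng = np.random.default_rng(1)
        n, d = 5, 3
        a = rng.standard_normal((n, d))                          # node-local data
        f_i = lambda i, x: float(np.linalg.norm(x - a[i], 1))    # nonsmooth, Lipschitz local loss

        # Doubly stochastic mixing matrix for a ring graph (Metropolis weights).
        W = np.zeros((n, n))
        for i in range(n):
            W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0

        def zo_oracle(i, x, delta=1e-2):
            """Single-sample estimate of the randomized-smoothing gradient of f_i."""
            u = rng.standard_normal(d)
            u /= np.linalg.norm(u)
            return d * (f_i(i, x + delta * u) - f_i(i, x)) / delta * u

        x = np.zeros((n, d))                          # one local iterate per node
        g = np.array([zo_oracle(i, x[i]) for i in range(n)])
        y = g.copy()                                  # gradient-tracking variables
        eta = 0.02
        for _ in range(2000):
            x = W @ x - eta * y                       # consensus mixing plus descent along tracked direction
            g_new = np.array([zo_oracle(i, x[i]) for i in range(n)])
            y = W @ y + g_new - g                     # gradient-tracking update
            g = g_new
        print(x.mean(axis=0))                         # hovers near the coordinatewise median of a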

    Adaptive Stochastic Optimisation of Nonconvex Composite Objectives

    In this paper, we propose and analyse a family of generalised stochastic composite mirror descent algorithms. With adaptive step sizes, the proposed algorithms converge without requiring prior knowledge of the problem. Combined with an entropy-like update-generating function, these algorithms perform gradient descent in the space equipped with the maximum norm, which allows us to exploit the low-dimensional structure of the decision sets for high-dimensional problems. Together with a sampling method based on the Rademacher distribution and variance reduction techniques, the proposed algorithms guarantee a logarithmic complexity dependence on dimensionality for zeroth-order optimisation problems. Comment: arXiv admin note: substantial text overlap with arXiv:2208.0457
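
    A hedged illustration of the entropy-like/maximum-norm idea: zeroth-order mirror descent on the probability simplex with an entropic mirror map (exponentiated gradient), where gradients are estimated from two function values along random Rademacher directions. The objective, step size, and number of directions are assumptions for the toy; this is not the paper's algorithm or its adaptive step-size rule.

        # Zeroth-order entropic mirror descent with Rademacher sampling.
        import numpy as np

        rng = np.random.default_rng(2)
        d = 50
        c = rng.standard_normal(d)
        c[0] = c.min() - 1.0                             # coordinate 0 is the unique minimizer by a clear margin
        f = lambda x: float(c @ x)                       # linear objective over the probability simplex

        def rademacher_zo_grad(h, x, mu=1e-4, num_dirs=8):
            """Two-point gradient estimate along random Rademacher directions."""
            g = np.zeros(x.size)
            for _ in range(num_dirs):
                u = rng.integers(0, 2, size=x.size) * 2 - 1   # u uniform on {-1, +1}^d
                g += (h(x + mu * u) - h(x - mu * u)) / (2 * mu) * u
            return g / num_dirs

        x = np.full(d, 1.0 / d)                          # start at the uniform distribution
        eta = 0.1
        for _ in range(500):
            g = rademacher_zo_grad(f, x)
            x = x * np.exp(-eta * g)                     # entropic mirror-descent (exponentiated-gradient) step
            x /= x.sum()                                 # renormalize onto the simplex
        print(int(np.argmax(x)))                         # mass should concentrate on coordinate 0, the minimizer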

    Small Errors in Random Zeroth Order Optimization are Imaginary

    The vast majority of zeroth-order optimization methods try to imitate first-order methods via some smooth approximation of the gradient. Here, the smaller the smoothing parameter, the smaller the gradient approximation error. We show that for the majority of zeroth-order methods this smoothing parameter cannot, however, be chosen arbitrarily small, as numerical cancellation errors will dominate. As such, theoretical and numerical performance can differ significantly. Using classical tools from numerical differentiation, we propose a new smoothed approximation of the gradient that can be integrated into general zeroth-order algorithmic frameworks. Since the proposed smoothed approximation does not suffer from cancellation errors, the smoothing parameter (and hence the approximation error) can be made arbitrarily small. Sublinear convergence rates for algorithms based on our smoothed approximation are proved. Numerical experiments are also presented to demonstrate the superiority of algorithms based on the proposed approximation. Comment: New: Figure 3.
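
    The classical numerical-differentiation tool that removes subtractive cancellation is the complex-step derivative, which the title appears to allude to. The snippet below only illustrates the phenomenon the abstract describes (central differences degrade as the smoothing parameter shrinks, while an imaginary perturbation does not); whether this is the paper's exact estimator is an assumption.

        # Cancellation in central differences vs. the complex-step derivative.
        import numpy as np

        func = np.exp                                 # smooth test function, func'(1) = e
        x = 1.0
        for h in [1e-4, 1e-8, 1e-12, 1e-16]:
            fd = (func(x + h) - func(x - h)) / (2 * h)   # central difference: cancellation as h -> 0
            cs = np.imag(func(x + 1j * h)) / h           # complex step: no subtraction, no cancellation
            print(f"h={h:.0e}  central-diff error={abs(fd - np.e):.2e}  complex-step error={abs(cs - np.e):.2e}")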

    Stochastic Zeroth-order Functional Constrained Optimization: Oracle Complexity and Applications

    Functionally constrained stochastic optimization problems, where neither the objective function nor the constraint functions are analytically available, arise frequently in machine learning applications. In this work, assuming we only have access to noisy evaluations of the objective and constraint functions, we propose and analyze stochastic zeroth-order algorithms for solving the above class of stochastic optimization problems. When the domain of the functions is $\mathbb{R}^n$, assuming there are $m$ constraint functions, we establish oracle complexities of order $\mathcal{O}((m+1)n/\epsilon^2)$ and $\mathcal{O}((m+1)n/\epsilon^3)$ in the convex and nonconvex settings, respectively, where $\epsilon$ represents the accuracy of the solutions required in appropriately defined metrics. The established oracle complexities are, to our knowledge, the first such results in the literature for functionally constrained stochastic zeroth-order optimization problems. We demonstrate the applicability of our algorithms by illustrating their superior performance on the problem of hyperparameter tuning for sampling algorithms and on neural network training. Comment: To appear in INFORMS Journal on Optimization
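
    A hedged sketch of the oracle-access model in this setting: only noisy zeroth-order evaluations of the objective and a constraint are available, and a simple quadratic-penalty scheme is driven by two-point gradient estimates. The penalty approach, constants, and problem instance are assumptions for illustration; the paper's algorithms and their oracle-complexity guarantees are different and stronger.

        # Noisy zeroth-order oracles for objective and constraint, used through a quadratic penalty.
        import numpy as np

        rng = np.random.default_rng(3)
        d = 5
        f = lambda x: float(np.sum((x - 1.0) ** 2)) + 0.01 * rng.standard_normal()   # noisy objective
        g = lambda x: float(np.sum(x) - 2.0) + 0.01 * rng.standard_normal()          # noisy constraint, want g(x) <= 0

        def zo_grad(h, x, mu=1e-2, num_dirs=10):
            """Two-point zeroth-order gradient estimate of h at x."""
            est = np.zeros(x.size)
            for _ in range(num_dirs):
                u = rng.standard_normal(x.size)
                est += (h(x + mu * u) - h(x - mu * u)) / (2 * mu) * u
            return est / num_dirs

        rho, eta = 5.0, 0.02
        penalty = lambda x: f(x) + rho * max(g(x), 0.0) ** 2   # quadratic penalty for g(x) <= 0
        x = np.zeros(d)
        for _ in range(3000):
            x -= eta * zo_grad(penalty, x)
        # Each coordinate should be near 0.4 (the constrained optimum of this instance),
        # with a small constraint violation left by the finite penalty.
        print(x, g(x))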