967 research outputs found

    On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization

    Conventional wisdom in deep learning states that increasing depth improves expressiveness but complicates optimization. This paper suggests that, sometimes, increasing depth can speed up optimization. The effect of depth on optimization is decoupled from expressiveness by focusing on settings where additional layers amount to overparameterization - linear neural networks, a well-studied model. Theoretical analysis, as well as experiments, show that here depth acts as a preconditioner which may accelerate convergence. Even on simple convex problems such as linear regression with $\ell_p$ loss, $p>2$, gradient descent can benefit from transitioning to a non-convex overparameterized objective, more than it would from some common acceleration schemes. We also prove that it is mathematically impossible to obtain the acceleration effect of overparameterization via gradients of any regularizer. Comment: Published at the International Conference on Machine Learning (ICML) 201
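The setting in this abstract can be illustrated with a small sketch: gradient descent on an $\ell_4$ linear regression loss, run once on the weight vector directly and once on a depth-2 factorization w = w2 * w1. This is not the paper's code. The data, the ground-truth weights, the step size, and the initialization (w1 = 0, w2 = 1) are all assumptions chosen only to keep the example short and stable, and the sketch makes no claim about which run converges faster.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -0.5, 0.25])  # hypothetical ground-truth weights
y = X @ w_true


def loss(w):
    """l_4 regression loss: mean of (x_i . w - y_i)^4, divided by 4."""
    return np.mean((X @ w - y) ** 4) / 4.0


def grad(w):
    """Gradient of the l_4 loss with respect to the end-to-end weights w."""
    return X.T @ ((X @ w - y) ** 3) / n


def plain_gd(steps=500, lr=0.01):
    """Gradient descent directly on w (the convex objective)."""
    w = np.zeros(d)
    for _ in range(steps):
        w = w - lr * grad(w)
    return loss(w)


def overparam_gd(steps=500, lr=0.01):
    """Gradient descent on the factors of w = w2 * w1 (non-convex objective)."""
    w1 = np.zeros(d)  # assumed initialization, not taken from the paper
    w2 = 1.0
    for _ in range(steps):
        g = grad(w2 * w1)      # gradient w.r.t. the end-to-end weights
        g1 = w2 * g            # chain rule: d loss / d w1
        g2 = float(w1 @ g)     # chain rule: d loss / d w2
        w1 = w1 - lr * g1
        w2 = w2 - lr * g2
    return loss(w2 * w1)


initial_loss = loss(np.zeros(d))
plain_loss = plain_gd()
deep_loss = overparam_gd()
print(initial_loss, plain_loss, deep_loss)
```

Both runs start from the same end-to-end weights (w = 0), so the printed losses are directly comparable. In the factored run, the update applied to w is the plain gradient rescaled by terms that depend on w1 and w2, which is the preconditioning effect the abstract refers to.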

    Gaussian Max-Value Entropy Search for Multi-Agent Bayesian Optimization

    We study the multi-agent Bayesian optimization (BO) problem, where multiple agents maximize a black-box function via iterative queries. We focus on Entropy Search (ES), a sample-efficient BO algorithm that selects queries to maximize the mutual information about the maximum of the black-box function. One of the main challenges of ES is that calculating the mutual information requires computationally costly approximation techniques. For multi-agent BO problems, the computational cost of ES is exponential in the number of agents. To address this challenge, we propose the Gaussian Max-value Entropy Search, a multi-agent BO algorithm with favorable sample and computational efficiency. The key to our idea is to use a normal distribution to approximate the function maximum and calculate its mutual information accordingly. The resulting approximation allows queries to be cast as the solution of a closed-form optimization problem which, in turn, can be solved via a modified gradient ascent algorithm and scaled to a large number of agents. We demonstrate the effectiveness of Gaussian Max-value Entropy Search through numerical experiments on standard test functions and real-robot experiments on the source-seeking problem. Results show that the proposed algorithm outperforms the multi-agent BO baselines in the numerical experiments and can stably seek the source with a limited number of noisy observations on real robots. Comment: 10 pages, 9 figure
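The idea of approximating the function maximum with a normal distribution can be sketched as follows. The abstract does not say how the approximation is fitted, so this example assumes simple moment matching on Monte Carlo samples from a Gaussian posterior. The posterior mean `mu`, the covariance `cov`, and the sample count are hypothetical values, and this is not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Gaussian posterior over the function at 5 candidate points.
mu = np.array([0.2, 0.5, 0.1, 0.4, 0.3])
A = rng.normal(size=(5, 5))
cov = 0.05 * (A @ A.T) + 1e-6 * np.eye(5)  # symmetric positive definite

# Draw joint samples of the function values and record each sample's maximum.
samples = rng.multivariate_normal(mu, cov, size=2000)  # shape (2000, 5)
max_values = samples.max(axis=1)

# Moment matching (an assumption): fit a normal distribution to the maxima.
max_mean = max_values.mean()
max_std = max_values.std(ddof=1)
print(max_mean, max_std)
```

The fitted mean is never below the largest per-point sample mean, because the average of a maximum is at least the maximum of the averages.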

    Safety-aware Semi-end-to-end Coordinated Decision Model for Voltage Regulation in Active Distribution Network

    Prediction plays a vital role in the active distribution network voltage regulation under the high penetration of photovoltaics. Current prediction models aim at minimizing individual prediction errors but overlook their collective impacts on downstream decision-making. Hence, this paper proposes a safety-aware semi-end-to-end coordinated decision model to bridge the gap from the downstream voltage regulation to the upstream multiple prediction models in a coordinated differential way. The semi-end-to-end model maps the input features to the optimal var decisions via prediction, decision-making, and decision-evaluating layers. It leverages a neural network and a second-order cone program (SOCP) to formulate the stochastic PV/load predictions and the var decision-making/evaluating, respectively. Then the var decision quality is evaluated via the weighted sum of the power loss for economy and the voltage violation penalty for safety, denoted by regulation loss. Based on the regulation loss and prediction errors, this paper proposes the hybrid loss and hybrid stochastic gradient descent algorithm to back-propagate the gradients of the hybrid loss with respect to multiple predictions for enhancing decision quality. Case studies verify the effectiveness of the proposed model with lower power loss for economy and lower voltage violation rate for safety awareness.
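The two losses named in this abstract can be written down in a minimal sketch. The abstract only states that the regulation loss is a weighted sum of power loss and a voltage violation penalty, and that the hybrid loss builds on the regulation loss and the prediction errors. The voltage limits, the penalty weight, the mean-squared prediction error, and the mixing weight `alpha` below are all assumptions made for illustration, not values or formulas from the paper.

```python
import numpy as np


def regulation_loss(power_loss, voltages, v_min=0.95, v_max=1.05,
                    penalty_weight=10.0):
    """Power loss plus a weighted penalty for voltages outside [v_min, v_max].

    The limits (in per unit) and penalty_weight are hypothetical.
    """
    over = np.maximum(voltages - v_max, 0.0)
    under = np.maximum(v_min - voltages, 0.0)
    return power_loss + penalty_weight * float(np.sum(over + under))


def hybrid_loss(pred, target, power_loss, voltages, alpha=0.5):
    """Assumed form: convex mix of prediction MSE and regulation loss."""
    prediction_error = float(np.mean((pred - target) ** 2))
    return (alpha * prediction_error
            + (1.0 - alpha) * regulation_loss(power_loss, voltages))


# Example with made-up numbers: a perfect forecast and no voltage violation.
forecast = np.array([0.8, 1.1])
actual = np.array([0.8, 1.1])
safe_voltages = np.array([1.00, 1.02])
example = hybrid_loss(forecast, actual, power_loss=0.02, voltages=safe_voltages)
print(example)
```

In the paper's setting, the power loss and voltages would come from solving the SOCP for the predicted inputs. Here they are passed in as plain numbers so the sketch stays self-contained.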