28,258 research outputs found

    A parallelizable augmented Lagrangian method applied to large-scale non-convex-constrained optimization problems

    Get PDF
    We contribute improvements to a Lagrangian dual solution approach applied to large-scale optimization problems whose objective functions are convex, continuously differentiable and possibly nonlinear, while the non-relaxed constraint set is compact but not necessarily convex. Such problems arise, for example, in the split-variable deterministic reformulation of stochastic mixed-integer optimization problems. We adapt the augmented Lagrangian method framework to address the presence of nonconvexity in the non-relaxed constraint set and to enable efficient parallelization. The development of our approach is most naturally compared with the development of proximal bundle methods and especially with their use of serious step conditions. However, deviations from these developments allow for an improvement in efficiency with which parallelization can be utilized. Pivotal in our modification to the augmented Lagrangian method is an integration of the simplicial decomposition method and the nonlinear block Gauss-Seidel method. An adaptation of a serious step condition associated with proximal bundle methods allows for the approximation tolerance to be automatically adjusted. Under mild conditions optimal dual convergence is proven, and we report computational results on test instances from the stochastic optimization literature. We demonstrate improvement in parallel speedup over a baseline parallel approach

    Stochastic Training of Neural Networks via Successive Convex Approximations

    Full text link
    This paper proposes a new family of algorithms for training neural networks (NNs). These are based on recent developments in the field of non-convex optimization, going under the general name of successive convex approximation (SCA) techniques. The basic idea is to iteratively replace the original (non-convex, highly dimensional) learning problem with a sequence of (strongly convex) approximations, which are both accurate and simple to optimize. Differently from similar ideas (e.g., quasi-Newton algorithms), the approximations can be constructed using only first-order information of the neural network function, in a stochastic fashion, while exploiting the overall structure of the learning problem for a faster convergence. We discuss several use cases, based on different choices for the loss function (e.g., squared loss and cross-entropy loss), and for the regularization of the NN's weights. We experiment on several medium-sized benchmark problems, and on a large-scale dataset involving simulated physical data. The results show how the algorithm outperforms state-of-the-art techniques, providing faster convergence to a better minimum. Additionally, we show how the algorithm can be easily parallelized over multiple computational units without hindering its performance. In particular, each computational unit can optimize a tailored surrogate function defined on a randomly assigned subset of the input variables, whose dimension can be selected depending entirely on the available computational power.Comment: Preprint submitted to IEEE Transactions on Neural Networks and Learning System

    CoCoA: A General Framework for Communication-Efficient Distributed Optimization

    Get PDF
    The scale of modern datasets necessitates the development of efficient distributed optimization methods for machine learning. We present a general-purpose framework for distributed computing environments, CoCoA, that has an efficient communication scheme and is applicable to a wide variety of problems in machine learning and signal processing. We extend the framework to cover general non-strongly-convex regularizers, including L1-regularized problems like lasso, sparse logistic regression, and elastic net regularization, and show how earlier work can be derived as a special case. We provide convergence guarantees for the class of convex regularized loss minimization objectives, leveraging a novel approach in handling non-strongly-convex regularizers and non-smooth loss functions. The resulting framework has markedly improved performance over state-of-the-art methods, as we illustrate with an extensive set of experiments on real distributed datasets

    Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization

    Full text link
    We suggest a general oracle-based framework that captures different parallel stochastic optimization settings described by a dependency graph, and derive generic lower bounds in terms of this graph. We then use the framework and derive lower bounds for several specific parallel optimization settings, including delayed updates and parallel processing with intermittent communication. We highlight gaps between lower and upper bounds on the oracle complexity, and cases where the "natural" algorithms are not known to be optimal
    • …
    corecore