1,420 research outputs found

    Stochastic Optimization with Importance Sampling

    Full text link
    Uniform sampling of training data has been commonly used in traditional stochastic optimization algorithms such as Proximal Stochastic Gradient Descent (prox-SGD) and Proximal Stochastic Dual Coordinate Ascent (prox-SDCA). Although uniform sampling can guarantee that the sampled stochastic quantity is an unbiased estimate of the corresponding true quantity, the resulting estimator may have a rather high variance, which negatively affects the convergence of the underlying optimization procedure. In this paper we study stochastic optimization with importance sampling, which improves the convergence rate by reducing the stochastic variance. Specifically, we study prox-SGD (actually, stochastic mirror descent) with importance sampling and prox-SDCA with importance sampling. For prox-SGD, instead of adopting uniform sampling throughout the training process, the proposed algorithm employs importance sampling to minimize the variance of the stochastic gradient. For prox-SDCA, the proposed importance sampling scheme aims to achieve higher expected dual value at each dual coordinate ascent step. We provide extensive theoretical analysis to show that the convergence rates with the proposed importance sampling methods can be significantly improved under suitable conditions both for prox-SGD and for prox-SDCA. Experiments are provided to verify the theoretical analysis.Comment: 29 page

    Distributed Partitioned Big-Data Optimization via Asynchronous Dual Decomposition

    Full text link
    In this paper we consider a novel partitioned framework for distributed optimization in peer-to-peer networks. In several important applications the agents of a network have to solve an optimization problem with two key features: (i) the dimension of the decision variable depends on the network size, and (ii) cost function and constraints have a sparsity structure related to the communication graph. For this class of problems a straightforward application of existing consensus methods would show two inefficiencies: poor scalability and redundancy of shared information. We propose an asynchronous distributed algorithm, based on dual decomposition and coordinate methods, to solve partitioned optimization problems. We show that, by exploiting the problem structure, the solution can be partitioned among the nodes, so that each node just stores a local copy of a portion of the decision variable (rather than a copy of the entire decision vector) and solves a small-scale local problem
    • …
    corecore