5,438 research outputs found

    CoCoA: A General Framework for Communication-Efficient Distributed Optimization

    Get PDF
    The scale of modern datasets necessitates the development of efficient distributed optimization methods for machine learning. We present a general-purpose framework for distributed computing environments, CoCoA, that has an efficient communication scheme and is applicable to a wide variety of problems in machine learning and signal processing. We extend the framework to cover general non-strongly-convex regularizers, including L1-regularized problems like lasso, sparse logistic regression, and elastic net regularization, and show how earlier work can be derived as a special case. We provide convergence guarantees for the class of convex regularized loss minimization objectives, leveraging a novel approach in handling non-strongly-convex regularizers and non-smooth loss functions. The resulting framework has markedly improved performance over state-of-the-art methods, as we illustrate with an extensive set of experiments on real distributed datasets

    Parallel Selective Algorithms for Big Data Optimization

    Full text link
    We propose a decomposition framework for the parallel optimization of the sum of a differentiable (possibly nonconvex) function and a (block) separable nonsmooth, convex one. The latter term is usually employed to enforce structure in the solution, typically sparsity. Our framework is very flexible and includes both fully parallel Jacobi schemes and Gauss- Seidel (i.e., sequential) ones, as well as virtually all possibilities "in between" with only a subset of variables updated at each iteration. Our theoretical convergence results improve on existing ones, and numerical results on LASSO, logistic regression, and some nonconvex quadratic problems show that the new method consistently outperforms existing algorithms.Comment: This work is an extended version of the conference paper that has been presented at IEEE ICASSP'14. The first and the second author contributed equally to the paper. This revised version contains new numerical results on non convex quadratic problem

    D-ADMM: A Communication-Efficient Distributed Algorithm For Separable Optimization

    Full text link
    We propose a distributed algorithm, named Distributed Alternating Direction Method of Multipliers (D-ADMM), for solving separable optimization problems in networks of interconnected nodes or agents. In a separable optimization problem there is a private cost function and a private constraint set at each node. The goal is to minimize the sum of all the cost functions, constraining the solution to be in the intersection of all the constraint sets. D-ADMM is proven to converge when the network is bipartite or when all the functions are strongly convex, although in practice, convergence is observed even when these conditions are not met. We use D-ADMM to solve the following problems from signal processing and control: average consensus, compressed sensing, and support vector machines. Our simulations show that D-ADMM requires less communications than state-of-the-art algorithms to achieve a given accuracy level. Algorithms with low communication requirements are important, for example, in sensor networks, where sensors are typically battery-operated and communicating is the most energy consuming operation.Comment: To appear in IEEE Transactions on Signal Processin

    A Distributed Frank-Wolfe Algorithm for Communication-Efficient Sparse Learning

    Full text link
    Learning sparse combinations is a frequent theme in machine learning. In this paper, we study its associated optimization problem in the distributed setting where the elements to be combined are not centrally located but spread over a network. We address the key challenges of balancing communication costs and optimization errors. To this end, we propose a distributed Frank-Wolfe (dFW) algorithm. We obtain theoretical guarantees on the optimization error ϵ\epsilon and communication cost that do not depend on the total number of combining elements. We further show that the communication cost of dFW is optimal by deriving a lower-bound on the communication cost required to construct an ϵ\epsilon-approximate solution. We validate our theoretical analysis with empirical studies on synthetic and real-world data, which demonstrate that dFW outperforms both baselines and competing methods. We also study the performance of dFW when the conditions of our analysis are relaxed, and show that dFW is fairly robust.Comment: Extended version of the SIAM Data Mining 2015 pape

    Flexible Parallel Algorithms for Big Data Optimization

    Full text link
    We propose a decomposition framework for the parallel optimization of the sum of a differentiable function and a (block) separable nonsmooth, convex one. The latter term is typically used to enforce structure in the solution as, for example, in Lasso problems. Our framework is very flexible and includes both fully parallel Jacobi schemes and Gauss-Seidel (Southwell-type) ones, as well as virtually all possibilities in between (e.g., gradient- or Newton-type methods) with only a subset of variables updated at each iteration. Our theoretical convergence results improve on existing ones, and numerical results show that the new method compares favorably to existing algorithms.Comment: submitted to IEEE ICASSP 201

    Massively-Parallel Feature Selection for Big Data

    Full text link
    We present the Parallel, Forward-Backward with Pruning (PFBP) algorithm for feature selection (FS) in Big Data settings (high dimensionality and/or sample size). To tackle the challenges of Big Data FS PFBP partitions the data matrix both in terms of rows (samples, training examples) as well as columns (features). By employing the concepts of pp-values of conditional independence tests and meta-analysis techniques PFBP manages to rely only on computations local to a partition while minimizing communication costs. Then, it employs powerful and safe (asymptotically sound) heuristics to make early, approximate decisions, such as Early Dropping of features from consideration in subsequent iterations, Early Stopping of consideration of features within the same iteration, or Early Return of the winner in each iteration. PFBP provides asymptotic guarantees of optimality for data distributions faithfully representable by a causal network (Bayesian network or maximal ancestral graph). Our empirical analysis confirms a super-linear speedup of the algorithm with increasing sample size, linear scalability with respect to the number of features and processing cores, while dominating other competitive algorithms in its class
    corecore