
    Distributed Methods for Composite Optimization: Communication Efficiency, Load-Balancing and Local Solvers

    The scale of modern datasets necessitates the development of efficient distributed optimization methods for composite problems, which have numerous applications in the field of machine learning. A critical challenge in realizing this promise of scalability is to develop efficient methods for communicating and coordinating information between distributed machines, taking into account the specific needs of machine learning algorithms. Recent work in this area has been limited by focusing heavily on developing highly specific methods for the distributed environment. These special-purpose methods are often unable to fully leverage the competitive performance of their well-tuned and customized single-machine counterparts. Further, they are unable to easily integrate improvements that continue to be made to single-machine methods. To this end, we present a framework for distributed optimization in Chapter 2, and its accelerated version in Chapter 3, that allow arbitrary solvers to be used on each machine locally, yet maintain competitive performance against other distributed methods. We give strong primal-dual convergence rate guarantees for our framework that hold for arbitrary local solvers. We demonstrate the impact of local solver selection both theoretically and in an extensive experimental comparison. Further, in Chapter 4 we propose algorithmic modifications to an existing distributed inexact damped Newton method, which lead to fewer rounds of communication and better load-balancing.

    In Chapter 5, we introduce the concept of an Underestimate Sequence (UES), which is a natural extension of Nesterov's estimate sequence. Our definition of a UES utilizes three sequences, one of which is a lower bound of the objective function. The question of how to construct an appropriate sequence of lower bounds is also addressed, and we present lower bounds for strongly convex smooth functions and for strongly convex composite functions which adhere to the UES framework. Further, we propose several first-order methods for minimizing strongly convex functions in both the smooth and composite cases. The algorithms, based on efficiently updating lower bounds on the objective functions, have natural stopping conditions, which provide the user with a certificate of optimality. Convergence of all algorithms is guaranteed through the UES framework, and we show that all presented algorithms converge linearly, with the accelerated variants enjoying the optimal linear rate of convergence.
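    The lower-bound construction mentioned in this abstract builds on a standard fact about strongly convex functions. The LaTeX sketch below is illustrative only and is not the dissertation's exact UES definition: it shows the kind of per-iterate quadratic underestimate such a sequence can aggregate, and why the minimum of an aggregated lower bound yields a computable optimality certificate.

    % For a \mu-strongly convex, differentiable f and any iterate x_k,
    % strong convexity gives a global quadratic underestimate:
    \[
      f(y) \;\ge\; f(x_k) + \langle \nabla f(x_k),\, y - x_k \rangle
            + \frac{\mu}{2}\,\lVert y - x_k \rVert^{2}
      \qquad \text{for all } y .
    \]
    % Any convex combination \phi_k of such bounds is itself a global
    % underestimate of f, so \min_y \phi_k(y) \le f(x^\ast) and hence
    \[
      f(x_k) - \min_y \phi_k(y) \;\ge\; f(x_k) - f(x^\ast),
    \]
    % i.e. the gap between the current objective value and the minimum of the
    % lower bound certifies how far x_k can be from optimal in objective value.

    This gap is exactly the kind of quantity that supports the "natural stopping conditions" described in the abstract: once it falls below a tolerance, the iterate is provably near-optimal.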

    Distributed Algorithms in Large-scaled Empirical Risk Minimization: Non-convexity, Adaptive-sampling, and Matrix-free Second-order Methods

    The rising amount of data has significantly changed the classical approaches to statistical modeling. Specialized methods are designed to infer meaningful relationships and hidden patterns from these large datasets, and they form the foundation of the field known as Machine Learning (ML). Such ML techniques have already been applied widely in various areas and achieved compelling success. At the same time, the huge amount of data also requires a deep revision of current techniques, including the availability of advanced data storage, new efficient large-scale algorithms, and their distributed/parallelized implementation.

    A broad class of ML methods can be interpreted as Empirical Risk Minimization (ERM) problems. By choosing suitable loss functions and, where necessary, regularization terms, one can approach specific ML goals by solving ERMs as separable finite-sum optimization problems. In some circumstances a nonconvex component is introduced into the ERM, which usually makes the problem hard to optimize. In particular, neural networks, a popular branch of ML, have drawn considerable attention from the community in recent years. Neural networks are powerful and highly flexible models inspired by the structured functionality of the brain, and they can typically be treated as large-scale, highly nonconvex ERMs. As nonconvex ERMs become more complex and larger in scale, optimization with stochastic gradient descent (SGD) type methods proceeds slowly, both in its convergence rate and in its inability to be distributed efficiently. This motivates researchers to explore more advanced local optimization methods such as approximate-Newton/second-order methods.

    In this dissertation, first-order stochastic optimization for regularized ERMs is studied in Chapter 1. Building on the stochastic dual coordinate ascent (SDCA) method, a dual-free SDCA with a non-uniform mini-batch sampling strategy is investigated [30, 29]. We also introduce several efficient algorithms for training ERMs, including neural networks, using second-order optimization methods in a distributed environment. In Chapter 2, we propose a practical distributed implementation of Newton-CG methods, which makes training neural networks with second-order methods feasible in the distributed environment [28]. In Chapter 3, we take further steps towards training feed-forward neural networks with second-order methods, exploiting negative curvature directions and momentum acceleration; in this chapter we also report numerical experiments comparing second-order and first-order methods for training neural networks. Chapter 4 proposes a distributed accumulative sample-size second-order method for solving large-scale convex ERMs and nonconvex neural networks [35]. In Chapter 5, a Python library named UCLibrary for solving unconstrained optimization problems is briefly introduced. The dissertation is concluded in Chapter 6.
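    To make the "matrix-free second-order" idea in this abstract concrete, the following Python sketch shows one Newton-CG step for a regularized logistic-regression ERM, where the Newton system is solved with conjugate gradients using only Hessian-vector products. The function names and toy data are hypothetical illustrations of the general technique, not the dissertation's implementation.

    # Minimal sketch: one matrix-free Newton-CG step for l2-regularized logistic regression.
    import numpy as np
    from scipy.sparse.linalg import LinearOperator, cg

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def newton_cg_step(w, X, y, lam=1e-3, cg_iters=50):
        """Solve H p = -g with CG, using only Hessian-vector products, and return w + p."""
        n, d = X.shape
        margins = y * (X @ w)
        # Gradient of the mean logistic loss plus (lam/2)||w||^2
        g = -(X.T @ (y * sigmoid(-margins))) / n + lam * w
        # Curvature weights p(1-p) that define the logistic Hessian X^T diag(.) X / n + lam I
        p = sigmoid(margins)
        dweights = p * (1.0 - p)

        def hvp(v):
            # Hessian-vector product without ever forming the d x d Hessian
            return (X.T @ (dweights * (X @ v))) / n + lam * v

        H = LinearOperator((d, d), matvec=hvp, dtype=X.dtype)
        step, _ = cg(H, -g, maxiter=cg_iters)
        return w + step

    # Toy usage on synthetic data
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))
    y = np.sign(X @ rng.standard_normal(10) + 0.1 * rng.standard_normal(200))
    w = np.zeros(10)
    for _ in range(5):
        w = newton_cg_step(w, X, y)

    In a distributed setting, the gradient and each Hessian-vector product reduce to summing per-worker partial terms over their data shards, so each CG iteration costs one aggregation round; this is the general reason Newton-CG type methods lend themselves to distributed implementation.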