13 research outputs found

    Some New Results in Distributed Tracking and Optimization

    The current age of Big Data is built on the foundation of distributed systems, and efficient distributed algorithms to run on these systems. With the rapid increase in the volume of the data being fed into these systems, storing and processing all this data at a central location becomes infeasible. Such a central server requires a gigantic amount of computational and storage resources. Even when it is possible to have central servers, it is not always desirable, due to privacy concerns. Also, sending huge amounts of data to such servers often incurs infeasible bandwidth requirements. In this dissertation, we consider two kinds of distributed architectures: 1) star-shaped topology, where multiple worker nodes are connected to, and communicate with, a server, but the workers do not communicate with each other; and 2) mesh topology or network of interconnected workers, where each worker can communicate with a small number of neighboring workers. In the first half of this dissertation (Chapters 2 and 3), we consider distributed systems with mesh topology. We study two different problems in this context. First, we study the problem of simultaneous localization and multi-target tracking. Multiple mobile agents localize themselves cooperatively, while also tracking multiple mobile targets whose number is unknown, in the presence of measurement-origin uncertainty. In situations with limited GPS signal availability, agents (like self-driving cars in urban canyons, or autonomous vehicles in hazardous environments) need to rely on inter-agent measurements for localization. The agents perform the additional task of tracking multiple targets (pedestrians and road-signs for self-driving cars). We propose a decentralized algorithm for this problem. To be effective in real-time applications, we propose efficient Gaussian and Gaussian-mixture based filters, rather than the computationally expensive particle-based methods in the existing literature. 
Our novel factor-graph based approach gives better performance, in terms of both agent localization errors, and target-location and cardinality errors. Next, we study an online convex optimization problem, where a network of agents cooperates to minimize a global time-varying objective function. Only the local functions are revealed to individual agents. The agents also need to satisfy their individual constraints. We propose a primal-dual update based decentralized algorithm for this problem. Under standard assumptions, we prove that the proposed algorithm achieves sublinear regret and constraint violation across the network. In other words, over a long enough time horizon, the decisions taken by the agents are, on average, as good as if all the information were revealed ahead of time. In addition, the individual constraint violations of the agents, averaged over time, tend to zero. In the next part of the dissertation (Chapter 4), we study distributed systems with a star-shaped topology. The problem we study is distributed nonconvex optimization. With the recent success of deep learning, coupled with the use of distributed systems to solve large-scale problems, this problem has gained prominence over the past decade. The recently proposed paradigm of Federated Learning (which has already been deployed by Google/Apple in Android/iOS phones) has further catalyzed research in this direction. The problem we consider is minimizing the average of local smooth, nonconvex functions. Each node has access only to its own loss function, but can communicate with the server, which aggregates updates from all the nodes, before distributing them to all the nodes. With the advent of more and more complex neural network architectures, these updates can be high dimensional. To save resources, the problem needs to be solved via communication-efficient approaches. 
We propose a novel algorithm, which combines the idea of variance reduction with the paradigm of carrying out multiple local updates at each node before averaging. We prove the convergence of the approach to a first-order stationary point. Our algorithm is optimal in terms of computation, and state-of-the-art in terms of the communication requirements. Lastly, in Chapter 5, we consider the situation when the nodes do not have access to function gradients, and need to minimize the loss function using only function values. This problem lies in the domain of zeroth-order optimization. For simplicity of analysis, we study this problem only in the single-node case. This problem finds application in simulation-based optimization, and adversarial example generation for attacking deep neural networks. We propose a novel function-value-based gradient estimator, which has lower variance and better query efficiency compared to existing estimators. The proposed estimator covers the most commonly used existing estimators as special cases. We conduct a comprehensive convergence analysis under different conditions. We also demonstrate its effectiveness through a real-world application to generating adversarial examples from a black-box deep neural network.
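
The zeroth-order setting described for Chapter 5 can be illustrated with a generic two-point random-direction estimator. This is a minimal sketch of a standard estimator from the literature, not the estimator proposed in the dissertation. The names `two_point_gradient_estimate` and `zeroth_order_descent`, the smoothing radius `mu`, the number of directions, and the quadratic test function are all illustrative assumptions.

```python
import random


def two_point_gradient_estimate(f, x, mu=1e-4, num_directions=20, rng=None):
    """Estimate the gradient of f at x using function values only.

    Averages central finite differences along random Gaussian directions u:
        g ~= mean over u of (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u
    (a generic textbook estimator, assumed here for illustration).
    """
    rng = rng or random.Random(0)
    d = len(x)
    grad = [0.0] * d
    for _ in range(num_directions):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
        x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
        # Directional derivative estimate from two function queries.
        slope = (f(x_plus) - f(x_minus)) / (2.0 * mu)
        for j in range(d):
            grad[j] += slope * u[j] / num_directions
    return grad


def zeroth_order_descent(f, x0, step=0.05, iterations=300, seed=0):
    """Gradient descent in which the gradient is replaced by the estimate."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(iterations):
        g = two_point_gradient_estimate(f, x, rng=rng)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


def quadratic(x):
    """Toy objective (hypothetical) with minimizer at (1, 1, ..., 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)


if __name__ == "__main__":
    print(zeroth_order_descent(quadratic, [0.0, 0.0, 0.0]))
```

Each estimate costs `2 * num_directions` function queries, which is the quantity that query-efficiency comparisons between estimators typically count.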

    Anchor Sampling for Federated Learning with Partial Client Participation

    Compared with full client participation, partial client participation is a more practical scenario in federated learning, but it may amplify some challenges in federated learning, such as data heterogeneity. The lack of inactive clients' updates in partial client participation makes it more likely for the model aggregation to deviate from the aggregation based on full client participation. Training with large batches on individual clients has been proposed to address data heterogeneity in general, but its effectiveness under partial client participation is not clear. Motivated by these challenges, we propose a novel federated learning framework, referred to as FedAMD, for partial client participation. The core idea is anchor sampling, which separates partial participants into anchor and miner groups. Each client in the anchor group aims at the local bullseye with the gradient computation using a large batch. Guided by the bullseyes, clients in the miner group steer multiple near-optimal local updates using small batches and update the global model. By integrating the results of the two groups, FedAMD is able to accelerate the training process and improve the model performance. Measured by ϵ-approximation and compared to the state-of-the-art methods, FedAMD converges in up to O(1/ϵ) fewer communication rounds under non-convex objectives. Empirical studies on real-world datasets validate the effectiveness of FedAMD and demonstrate the superiority of the proposed algorithm: it not only considerably reduces computation and communication costs but also significantly improves test accuracy. Comment: ICML 202

    Momentum Benefits Non-IID Federated Learning Simply and Provably

    Federated learning is a powerful paradigm for large-scale machine learning, but it faces significant challenges due to unreliable network connections, slow communication, and substantial data heterogeneity across clients. FedAvg and SCAFFOLD are two fundamental algorithms to address these challenges. In particular, FedAvg employs multiple local updates before communicating with a central server, while SCAFFOLD maintains a control variable on each client to compensate for "client drift" in its local updates. Various methods have been proposed in the literature to enhance the convergence of these two algorithms, but they either make impractical adjustments to algorithmic structure, or rely on the assumption of bounded data heterogeneity. This paper explores the utilization of momentum to enhance the performance of FedAvg and SCAFFOLD. When all clients participate in the training process, we demonstrate that incorporating momentum allows FedAvg to converge without relying on the assumption of bounded data heterogeneity even using a constant local learning rate. This is a novel result since existing analyses for FedAvg require bounded data heterogeneity even with diminishing local learning rates. In the case of partial client participation, we show that momentum enables SCAFFOLD to converge provably faster without imposing any additional assumptions. Furthermore, we use momentum to develop new variance-reduced extensions of FedAvg and SCAFFOLD, which exhibit state-of-the-art convergence rates. Our experimental results support all theoretical findings.
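
The FedAvg pattern mentioned in this abstract (several local updates on each client, then averaging at the server) can be sketched on a toy problem. This is a minimal illustration of plain FedAvg with full participation only; it does not include the momentum or SCAFFOLD variants studied in the paper. The scalar quadratic local losses and the names `local_sgd` and `fedavg` are assumptions made for the example.

```python
def local_sgd(w, center, lr, local_steps):
    """Run several gradient steps on the local loss 0.5 * (w - center)**2."""
    for _ in range(local_steps):
        grad = w - center  # gradient of the (hypothetical) local loss
        w -= lr * grad
    return w


def fedavg(centers, w0=0.0, lr=0.1, local_steps=5, rounds=50):
    """Plain FedAvg with full client participation on scalar models.

    Each round: every client starts from the current global model, performs
    `local_steps` local updates, and the server averages the resulting models.
    """
    w = w0
    for _ in range(rounds):
        local_models = [local_sgd(w, c, lr, local_steps) for c in centers]
        w = sum(local_models) / len(local_models)
    return w


if __name__ == "__main__":
    # Three clients with different local minimizers (heterogeneous data).
    print(fedavg([-2.0, 0.5, 4.5]))
```

In this toy all clients share the same curvature, so local updates introduce no bias and the global model approaches the average of the local minimizers. With differing curvatures, the fixed point of this scheme generally shifts away from the true global minimizer, which is the client drift that control variates and similar corrections aim to remove.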

    NEW STOCHASTIC AND RANDOMIZED ALGORITHMS FOR NONCONVEX OPTIMIZATION IN MACHINE LEARNING

    The goal of this dissertation is to develop efficient stochastic and randomized first-order methods to solve composite nonconvex problems arising from modern machine learning applications. The content of this dissertation is divided into four main chapters. Firstly, we motivate our research topics by briefly introducing the problems of interest and their challenges. We also review necessary mathematical concepts and tools used throughout this dissertation. Our first contribution is in Chapter 2, where we propose ProxSARAH, a new framework that uses a variance-reduced stochastic gradient estimator called SARAH to develop new algorithms for solving stochastic composite nonconvex problems. Our analysis shows that our methods can achieve the best-known convergence results and even match the lower bound complexity. We also provide extensive numerical experiments to illustrate the advantages of our methods compared to existing ones. Next, we study a policy gradient strategy in reinforcement learning in Chapter 3. We propose a new proximal hybrid stochastic policy gradient algorithm, called ProxHSPGA, using a new policy gradient estimator built from two different estimators. ProxHSPGA makes use of a new hybrid stochastic estimator introduced in Tran-Dinh et al. (2019b), and applies it to reinforcement learning. This new algorithm is able to solve the general composite policy optimization problem which includes regularization or constraints on the policy parameters. It also achieves the best-known sample complexity compared to existing methods. Our experiments on both discrete and continuous control tasks show that our proposed methods are indeed advantageous over existing ones. Then, in Chapter 4, we focus on a new machine learning paradigm, called federated learning (FL), where multiple agents collaboratively train a machine learning model in a distributed fashion. 
We propose two new algorithms, FedDR and asyncFedDR, for solving the nonconvex composite optimization problem which can handle convex regularizers in FL. Our algorithms rely on a novel combination of a nonconvex Douglas-Rachford splitting method, randomized block-coordinate strategies, and asynchronous implementation. Unlike previous primal-dual based methods for FL, our algorithms allow not only partial participation at each communication round but also asynchronous updates between agents, which greatly improves their practicality. Our convergence analyses show that the new algorithms match the communication complexity lower bound up to a constant factor under standard assumptions. Our numerical experiments illustrate the advantages of our methods compared to existing ones on various datasets. Finally, we summarize our contribution, further discuss some notable points of our results, and outline some ongoing and possible future directions. One of our ongoing works is to develop a class of accelerated Douglas-Rachford splitting algorithms for federated learning. Doctor of Philosoph
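
The SARAH estimator named in this abstract updates its gradient estimate recursively, v_t = ∇f_i(w_t) - ∇f_i(w_{t-1}) + v_{t-1}, with a full gradient computed at the start of each outer loop. The following is a minimal sketch of plain SARAH on a scalar least-squares toy, without the proximal and averaging steps that ProxSARAH adds. The function name `sarah`, the data, the step size, and the choice to keep the last inner iterate (rather than a randomly selected one) are simplifying assumptions.

```python
import random


def sarah(a, b, w0=0.0, lr=0.1, outer_loops=20, inner_steps=10, seed=0):
    """Plain SARAH on f(w) = mean_i 0.5 * (a[i] * w - b[i])**2 (scalar w)."""
    rng = random.Random(seed)
    n = len(a)

    def grad_i(i, w):
        # Gradient of the i-th component 0.5 * (a[i] * w - b[i])**2.
        return a[i] * (a[i] * w - b[i])

    w = w0
    for _ in range(outer_loops):
        # Outer loop: reset the estimator with a full gradient.
        v = sum(grad_i(i, w) for i in range(n)) / n
        w_prev, w = w, w - lr * v
        for _ in range(inner_steps):
            # Inner loop: recursive update from one sampled component.
            i = rng.randrange(n)
            v = grad_i(i, w) - grad_i(i, w_prev) + v
            w_prev, w = w, w - lr * v
    return w


if __name__ == "__main__":
    a = [1.0, 1.1, 0.9, 1.2]
    b = [3.0 * ai for ai in a]  # every component is minimized at w = 3
    print(sarah(a, b))
```

Unlike SVRG-style estimators, the correction here is taken relative to the previous iterate rather than a fixed snapshot, so only the outer loop ever touches the full dataset.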

    Using MapReduce Streaming for Distributed Life Simulation on the Cloud

    Distributed software simulations are indispensable in the study of large-scale life models but often require the use of technically complex lower-level distributed computing frameworks, such as MPI. We propose to overcome the complexity challenge by applying the emerging MapReduce (MR) model to distributed life simulations and by running such simulations on the cloud. Technically, we design optimized MR streaming algorithms for discrete and continuous versions of Conway’s life according to a general MR streaming pattern. We chose life because it is simple enough as a testbed for MR’s applicability to a-life simulations and general enough to make our results applicable to various lattice-based a-life models. We implement and empirically evaluate our algorithms’ performance on Amazon’s Elastic MR cloud. Our experiments demonstrate that a single MR optimization technique called strip partitioning can reduce the execution time of continuous life simulations by 64%. To the best of our knowledge, we are the first to propose and evaluate MR streaming algorithms for lattice-based simulations. Our algorithms can serve as prototypes in the development of novel MR simulation algorithms for large-scale lattice-based a-life models.
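
One common way to express a generation of discrete Conway's Life as map and reduce steps is sketched below. This is an in-process illustration of the general pattern only: it does not use Hadoop Streaming or Amazon's Elastic MR, it does not implement the paper's strip partitioning, and the names `life_map`, `life_reduce`, and `life_step` are hypothetical.

```python
from collections import defaultdict

# Offsets of the eight neighbors of a cell on a square lattice.
NEIGHBOR_OFFSETS = [
    (dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)
]


def life_map(cell):
    """Map: a live cell announces itself and notifies each of its neighbors."""
    x, y = cell
    yield cell, "alive"
    for dx, dy in NEIGHBOR_OFFSETS:
        yield (x + dx, y + dy), "neighbor"


def life_reduce(cell, values):
    """Reduce: apply the Life rules to one cell given all records keyed by it."""
    alive = "alive" in values
    neighbors = sum(1 for v in values if v == "neighbor")
    if neighbors == 3 or (alive and neighbors == 2):
        yield cell


def life_step(live_cells):
    """Simulate map, shuffle (group by key), and reduce for one generation."""
    groups = defaultdict(list)
    for cell in live_cells:
        for key, value in life_map(cell):
            groups[key].append(value)
    next_cells = set()
    for key, values in groups.items():
        next_cells.update(life_reduce(key, values))
    return next_cells


if __name__ == "__main__":
    blinker = {(0, 1), (1, 1), (2, 1)}
    print(sorted(life_step(blinker)))
```

In this formulation every live cell emits nine records per generation, so shuffle volume grows with the live population; reducing that kind of per-cell traffic is the sort of cost that partitioning optimizations target.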