Asynchronous Distributed ADMM for Large-Scale Optimization - Part I: Algorithm and Convergence Analysis
Aiming at solving large-scale learning problems, this paper studies
distributed optimization methods based on the alternating direction method of
multipliers (ADMM). By formulating the learning problem as a consensus problem,
the ADMM can be used to solve the consensus problem in a fully parallel fashion
over a computer network with a star topology. However, traditional synchronized
computation does not scale well with the problem size, as the speed of the
algorithm is limited by the slowest workers. This is particularly true in a
heterogeneous network where the computing nodes experience different
computation and communication delays. In this paper, we propose an asynchronous
distributed ADMM (AD-ADMM) which can effectively improve the time efficiency of
distributed optimization. Our main interest lies in analyzing the convergence
conditions of the AD-ADMM under the popular partially asynchronous model,
which is defined based on a maximum tolerable delay of the network.
Specifically, by considering general and possibly non-convex cost functions, we
show that the AD-ADMM is guaranteed to converge to the set of
Karush-Kuhn-Tucker (KKT) points as long as the algorithm parameters are chosen
appropriately according to the network delay. We further illustrate that the
asynchrony of the ADMM has to be handled with care, as slightly modifying the
implementation of the AD-ADMM can jeopardize the algorithm's convergence, even
under a standard convex setting. Comment: 37 pages
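To make the partially asynchronous model concrete, the following is a toy, single-process Python simulation written under stated assumptions rather than taken from the paper. Each worker holds a hypothetical scalar quadratic cost f_i(x) = a_i(x - b_i)²/2, worker i reports to the master only every `periods[i]` iterations (so the largest period stands in for the maximum tolerable delay), and the master adds a proximal term weighted by `gamma` to its update of the consensus variable x0. The function name and all parameters are hypothetical; the paper's exact update order and its conditions on ρ and γ are not reproduced, and the values in the usage line were simply chosen so that this toy run converges.

```python
def async_consensus_admm(a, b, rho, gamma, periods, iters=2000):
    """Toy, single-process simulation of an asynchronous consensus ADMM.

    Hypothetical setup (not from the paper): worker i holds the scalar cost
    f_i(x) = 0.5 * a[i] * (x - b[i])**2 and reports to the master only every
    periods[i] iterations, so it computes from a copy of x0 that may be stale.
    """
    n = len(a)
    x0 = 0.0
    x = list(b)        # each worker starts at its own local minimizer
    lam = [0.0] * n    # dual variables for the consensus constraints x_i = x0
    seen = [x0] * n    # the (possibly stale) copy of x0 each worker last received
    for k in range(iters):
        # Workers whose (simulated) computation finishes in this iteration.
        arrived = [i for i in range(n) if (k + 1) % periods[i] == 0]
        for i in arrived:
            # Closed-form argmin of f_i(x) + lam_i*(x - s) + rho/2*(x - s)**2,
            # where s is the worker's stale copy of x0.
            s = seen[i]
            x[i] = (a[i] * b[i] - lam[i] + rho * s) / (a[i] + rho)
            lam[i] += rho * (x[i] - s)
        # Master: argmin over x0 of sum_i [lam_i*(x_i - x0) + rho/2*(x_i - x0)**2]
        # + gamma/2*(x0 - x0_prev)**2, using the latest values received from
        # every worker, including those that did not report in this iteration.
        x0 = (sum(lam_i + rho * x_i for lam_i, x_i in zip(lam, x)) + gamma * x0) / (n * rho + gamma)
        for i in arrived:
            seen[i] = x0   # only workers that just reported receive the new x0
    return x0


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0]
    b = [1.0, -2.0, 0.5, 3.0]
    # Worker 2 is the slowest: it reports once every 3 iterations.
    print(async_consensus_admm(a, b, rho=5.0, gamma=1.0, periods=[1, 2, 3, 1]))
    # should print a value very close to sum(a_i*b_i)/sum(a_i) = 1.05
```

With these costs the consensus optimum is Σa_i·b_i / Σa_i = 10.5 / 10 = 1.05, and the simulated x0 approaches it even though workers 1 and 2 compute from stale copies of x0. Setting every period to 1 turns the sketch into a synchronous consensus ADMM with a proximal x0 update.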
Learning and Management for Internet-of-Things: Accounting for Adaptivity and Scalability
Internet-of-Things (IoT) envisions an intelligent infrastructure of networked
smart devices offering task-specific monitoring and control services. The
unique features of IoT include extreme heterogeneity, a massive number of
devices, and unpredictable dynamics partially due to human interaction. These
call for foundational innovations in network design and management. Ideally,
such designs should allow efficient adaptation to changing environments and
low-cost implementation scalable to a massive number of devices, subject to
stringent latency constraints. To this end, the overarching goal of this paper is to
outline a unified framework for online learning and management policies in IoT
through joint advances in communication, networking, learning, and
optimization. From the network architecture vantage point, the unified
framework leverages a promising fog architecture that gives smart devices
proximate access to cloud functionalities at the network edge, along the
cloud-to-things continuum. From the algorithmic perspective, key innovations
target online approaches adaptive to different degrees of nonstationarity in
IoT dynamics, and their scalable model-free implementation under limited
feedback that motivates blind or bandit approaches. The proposed framework
aspires to offer a stepping stone that leads to systematic designs and analysis
of task-specific learning and management schemes for IoT, along with a host of
new research directions to build on. Comment: Submitted on June 15 to Proceedings of the IEEE Special Issue on Adaptive
and Scalable Communication Networks
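One way to picture the "blind or bandit" regime is zeroth-order descent, in which the learner observes only cost values and never gradients. The Python sketch below uses a two-point random-direction estimator, a standard technique chosen here for illustration and not necessarily the one the paper develops; the function name `two_point_bandit_descent`, the static cost, and the step parameters `eta` and `delta` are all hypothetical. In a genuinely online IoT setting the cost would change over time; it is held fixed here for simplicity.

```python
import math
import random


def two_point_bandit_descent(cost, x, steps, eta, delta, seed=0):
    """Zeroth-order descent that only ever observes values of `cost`.

    Each step queries the cost at x + delta*u and x - delta*u along a random
    unit direction u and moves against the resulting gradient estimate.
    """
    rng = random.Random(seed)
    d = len(x)
    x = list(x)
    for _ in range(steps):
        # Direction drawn uniformly on the unit sphere in R^d (normalized Gaussian).
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(v * v for v in u))
        u = [v / norm for v in u]
        # Two bandit-feedback queries around the current decision.
        plus = cost([xi + delta * ui for xi, ui in zip(x, u)])
        minus = cost([xi - delta * ui for xi, ui in zip(x, u)])
        # d * (symmetric difference quotient) * u estimates the gradient.
        scale = d * (plus - minus) / (2.0 * delta)
        x = [xi - eta * scale * ui for xi, ui in zip(x, u)]
    return x


if __name__ == "__main__":
    # Hypothetical cost whose minimizer (1, -3) the learner never sees directly.
    def cost(z):
        return 0.5 * ((z[0] - 1.0) ** 2 + 2.0 * (z[1] + 3.0) ** 2)

    print(two_point_bandit_descent(cost, [0.0, 0.0], steps=2000, eta=0.2, delta=1e-3))
```

The factor d compensates for E[uuᵀ] = I/d when u is uniform on the unit sphere, so the estimate is unbiased in the small-δ limit; for a quadratic cost the symmetric difference equals the directional derivative exactly, and the iterate approaches the minimizer (1, -3).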
Asynchronous Distributed ADMM for Large-Scale Optimization - Part II: Linear Convergence Analysis and Numerical Performance
The alternating direction method of multipliers (ADMM) has been recognized as
a versatile approach for solving modern large-scale machine learning and signal
processing problems efficiently. When the data size and/or the problem
dimension is large, a distributed version of ADMM can be used, which is capable
of distributing the computation load and the data set to a network of computing
nodes. Unfortunately, a direct synchronous implementation of such an algorithm
does not scale well with the problem size, as the algorithm speed is limited by
the slowest computing nodes. To address this issue, in a companion paper, we
have proposed an asynchronous distributed ADMM (AD-ADMM) and studied its
worst-case convergence conditions. In this paper, we extend this study by
characterizing the conditions under which the AD-ADMM achieves linear
convergence. Our conditions as well as the resulting linear rates reveal the
impact that various algorithm parameters, network delay and network size have
on the algorithm performance. To demonstrate the superior time efficiency of
the proposed AD-ADMM, we test the AD-ADMM on a high-performance computer
cluster by solving a large-scale logistic regression problem. Comment: submitted for publication, 28 pages
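Since the numerical study uses logistic regression, the following Python sketch shows, on a tiny hypothetical dataset, what each worker's local step looks like when consensus ADMM is applied to a one-dimensional logistic regression. It is a synchronous, single-process toy under stated assumptions (scalar weight, no bias term, three made-up data shards, bisection as the local solver); the helper names are hypothetical, and nothing here reproduces the paper's cluster implementation or its rate constants.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def loss_grad(w, data):
    """d/dw of sum_j log(1 + exp(-y_j * w * x_j)) for a scalar weight w."""
    return sum(-y * x * sigmoid(-y * w * x) for x, y in data)


def bisect_root(fn, lo, hi, iters=100):
    """Root of an increasing function, assuming fn(lo) <= 0 <= fn(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if fn(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def consensus_admm_logistic(shards, rho=1.0, iters=500):
    """Synchronous consensus ADMM; shards[i] is worker i's list of (x, y) pairs."""
    n = len(shards)
    w = [0.0] * n      # local weights
    lam = [0.0] * n    # duals for the constraints w_i = z
    z = 0.0            # consensus weight held by the master
    history = []
    for _ in range(iters):
        # Master: argmin_z sum_i [lam_i*(w_i - z) + rho/2*(w_i - z)**2].
        z = sum(lam_i + rho * w_i for lam_i, w_i in zip(lam, w)) / (n * rho)
        for i, data in enumerate(shards):
            g_max = sum(abs(x) for x, _ in data)  # bound on |loss_grad|

            def phi(v, data=data, lam_i=lam[i]):
                # Derivative of worker i's subproblem; increasing in v.
                return loss_grad(v, data) + lam_i + rho * (v - z)

            # phi is <= 0 at the left end and >= 0 at the right end of this bracket.
            w[i] = bisect_root(phi, z - (g_max + lam[i]) / rho, z + (g_max - lam[i]) / rho)
            lam[i] += rho * (w[i] - z)
        history.append(z)
    return z, history


if __name__ == "__main__":
    # Tiny hypothetical dataset split over three workers; labels are +1/-1.
    shards = [
        [(1.0, 1), (2.0, 1), (-1.0, 1)],
        [(0.5, -1), (1.5, 1), (-2.0, -1)],
        [(1.0, -1), (-0.5, 1), (3.0, 1)],
    ]
    z, history = consensus_admm_logistic(shards)
    all_data = [p for shard in shards for p in shard]
    # Centralized reference solution: root of the full-data gradient.
    w_star = bisect_root(lambda v: loss_grad(v, all_data), -50.0, 50.0)
    errors = [abs(h - w_star) for h in history]
    print(z, w_star)
    print([errors[k + 1] / errors[k] for k in range(5)])
```

The data are deliberately not separable, so the centralized minimizer is finite. Printing a few consecutive ratios `errors[k + 1] / errors[k]` from the early iterations is a quick way to eyeball whether the decay looks geometric, which is what linear convergence means in this context.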
Towards More Scalable and Robust Machine Learning
For many data-intensive real-world applications, such as recognizing objects in images, detecting spam emails, and recommending items on retail websites, the most successful current approaches involve learning rich prediction rules from large datasets. These machine learning tasks pose many challenges. For example, as the size of the datasets and the complexity of these prediction rules increase, it becomes significantly harder to design scalable methods that can effectively exploit the available distributed computing units. As another example, many machine learning applications are exposed to data corruption, communication errors, and even adversarial attacks during training and testing. Therefore, to build reliable machine learning models, we also have to tackle the challenge of robustness.
In this dissertation, we study several topics on scalability and robustness in large-scale learning, with a focus on establishing solid theoretical foundations for these problems, and we demonstrate recent progress towards the ambitious goal of building more scalable and robust machine learning models. We start with the speedup saturation problem in distributed stochastic gradient descent (SGD) algorithms with large mini-batches. We introduce the notion of gradient diversity, a metric of the dissimilarity between concurrent gradient updates, and show its key role in the convergence and generalization performance of mini-batch SGD. We then move on to Byzantine distributed learning, a topic that involves both scalability and robustness. In the Byzantine setting that we consider, a fraction of the distributed worker machines can behave arbitrarily or even adversarially. We design statistically and computationally efficient algorithms to defend against Byzantine failures in distributed optimization with convex and non-convex objectives. Lastly, we discuss the adversarial example phenomenon.
We provide a theoretical analysis of the adversarially robust generalization properties of machine learning models through the lens of Rademacher complexity
- …
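Two of the ideas in the dissertation summary can be sketched in a few lines of Python. As an assumption, gradient diversity is computed with the common definition Σ‖∇f_i(w)‖² / ‖Σ∇f_i(w)‖² over a batch of per-example gradients. The Byzantine-robust aggregator shown is a coordinate-wise median, a standard robust rule used here for illustration rather than a claim about the dissertation's exact algorithms; both function names are hypothetical.

```python
import statistics


def gradient_diversity(grads):
    """Sum of squared gradient norms divided by the squared norm of their sum.

    Equals 1/len(grads) when all gradients are identical and 1 when they are
    mutually orthogonal; it can exceed 1 when gradients partly cancel.
    """
    sum_sq_norms = sum(sum(v * v for v in g) for g in grads)
    total = [sum(col) for col in zip(*grads)]
    denom = sum(v * v for v in total)
    if denom == 0.0:
        raise ValueError("the summed gradient is zero; diversity is undefined")
    return sum_sq_norms / denom


def coordinate_wise_median(grads):
    """Robust aggregate: the median of each coordinate across workers."""
    return [statistics.median(col) for col in zip(*grads)]


if __name__ == "__main__":
    print(gradient_diversity([[1.0, 0.0], [1.0, 0.0]]))   # 0.5
    print(gradient_diversity([[1.0, 0.0], [0.0, 1.0]]))   # 1.0
    honest = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9]]
    byzantine = [[100.0, -100.0]]   # one arbitrary (adversarial) worker
    print(coordinate_wise_median(honest + byzantine))  # roughly [1.05, 1.95]
```

In the usage example the single Byzantine worker barely moves the median, whereas the plain mean of the first coordinate over all four workers would be 103 / 4 = 25.75.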