SCOPE: Scalable Composite Optimization for Learning on Spark
Many machine learning models, such as logistic regression~(LR) and support
vector machine~(SVM), can be formulated as composite optimization problems.
Recently, many distributed stochastic optimization~(DSO) methods have been
proposed to solve large-scale composite optimization problems, and they have
shown better performance than traditional batch methods. However, most of these
DSO methods are not scalable enough. In this paper, we propose a novel DSO
method, called \underline{s}calable \underline{c}omposite
\underline{op}timization for l\underline{e}arning~({SCOPE}), and implement it
on the fault-tolerant distributed platform \mbox{Spark}. SCOPE is both
computation-efficient and communication-efficient. Theoretical analysis shows
that SCOPE converges at a linear rate when the objective
function is convex. Furthermore, empirical results on real datasets show that
SCOPE can outperform other state-of-the-art distributed learning methods on
Spark, including both batch learning methods and DSO methods.
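For context, a composite objective of the kind SCOPE targets is typically written as a finite-sum data term plus a regularizer; the exact formulation used in the paper may differ, but the standard form is
\[
\min_{w \in \mathbb{R}^d} \; \frac{1}{n}\sum_{i=1}^{n} f_i(w) + R(w),
\]
where $f_i$ is the loss on the $i$-th training example (e.g., the logistic loss $\log(1 + e^{-y_i x_i^\top w})$ for LR or the hinge loss for SVM) and $R$ is a regularizer such as $\frac{\lambda}{2}\|w\|_2^2$.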
Distributed Dual Coordinate Ascent with Imbalanced Data on a General Tree Network
In this paper, we investigate the impact of imbalanced data on the
convergence of distributed dual coordinate ascent in a tree network for solving
an empirical loss minimization problem in distributed machine learning. To
address this issue, we propose a method called delayed generalized distributed
dual coordinate ascent that takes into account the information of the
imbalanced data, and provide the analysis of the proposed algorithm. Numerical
experiments confirm the effectiveness of our proposed method in improving the
convergence speed of distributed dual coordinate ascent in a tree network.Comment: To be published in IEEE 2023 Workshop on Machine Learning for Signal
Processing (MLSP
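As background for the abstract above, the sketch below shows a standard single-machine stochastic dual coordinate ascent pass for an L2-regularized hinge-loss SVM; the function and variable names are ours, and it is not the delayed, tree-network variant proposed in the paper:

```python
import numpy as np

def sdca_epoch(X, y, alpha, w, lam):
    """One pass of stochastic dual coordinate ascent (SDCA) for an L2-regularized
    hinge-loss SVM: min_w (1/n) * sum_i max(0, 1 - y_i * x_i.w) + (lam/2) * ||w||^2.
    Maintains the primal-dual relation w = (1 / (lam * n)) * sum_i alpha_i * y_i * x_i,
    with each dual variable alpha_i kept in [0, 1].
    """
    n, _ = X.shape
    for i in np.random.permutation(n):
        x_i, y_i = X[i], y[i]
        # Closed-form coordinate maximization of the dual in alpha_i (hinge loss).
        residual = 1.0 - y_i * x_i.dot(w)
        step = residual / (x_i.dot(x_i) / (lam * n) + 1e-12)
        delta = np.clip(alpha[i] + step, 0.0, 1.0) - alpha[i]
        alpha[i] += delta
        w += delta * y_i * x_i / (lam * n)  # keep w consistent with the dual variables
    return alpha, w
```

Roughly speaking, in the distributed setting each node would run such coordinate updates on its local data shard and communicate the induced change in w up the tree for aggregation; the paper analyzes how imbalanced shard sizes affect the convergence of that scheme.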
Distributed Machine Learning Framework: New Algorithms and Theoretical Foundation
Machine learning is gaining fresh momentum and has helped us enhance not only many industrial and professional processes but also our everyday lives. Its recent success relies heavily on the surge of big data, big models, and big computing. However, inefficient algorithms restrict the application of machine learning to big data mining tasks. In terms of big data, serious concerns such as communication overhead and data privacy must be rigorously addressed when we train models on large amounts of data located on multiple devices. In terms of big models, training a model that is too big to fit on a single device is still an underexplored research area. To address these challenging problems, this thesis focuses on designing new large-scale machine learning models, developing efficient optimization and training methods for big data mining, and presenting new findings in both theory and applications.
For the challenges raised by big data, we proposed several new asynchronous distributed stochastic gradient descent and coordinate descent methods for efficiently solving convex and non-convex problems. We also designed new large-batch training methods for deep learning models that significantly reduce computation time while achieving better generalization performance. For the challenges raised by big models, we scaled up deep learning models by parallelizing the layer-wise computations with a theoretical guarantee; this is the first algorithm to break the lock of backpropagation, allowing large models to be trained dramatically faster.
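To make the asynchronous stochastic gradient methods mentioned above concrete, the following is a minimal, single-machine sketch of a lock-free asynchronous SGD loop for a least-squares objective; it illustrates the unsynchronized update pattern only and is not any specific algorithm from the thesis (all names are ours):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def async_sgd(X, y, n_workers=4, steps_per_worker=1000, lr=0.01):
    """Lock-free asynchronous SGD on a shared parameter vector for a least-squares
    objective. Each worker repeatedly samples one example and applies its gradient
    to the shared vector without any locking, so updates may interleave arbitrarily.
    """
    n, d = X.shape
    w = np.zeros(d)  # shared parameters, updated in place by all workers

    def worker(seed):
        rng = np.random.default_rng(seed)
        for _ in range(steps_per_worker):
            i = rng.integers(n)
            grad = (X[i].dot(w) - y[i]) * X[i]  # gradient of 0.5 * (x_i.w - y_i)^2
            w[:] -= lr * grad                   # unsynchronized in-place update

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        list(pool.map(worker, range(n_workers)))
    return w
```

Each thread reads and writes the shared vector w without coordination, which is the source of both the speedup and the analysis difficulty in asynchronous methods.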
Federated Optimization: Distributed Machine Learning for On-Device Intelligence
We introduce a new and increasingly relevant setting for distributed
optimization in machine learning, where the data defining the optimization are
unevenly distributed over an extremely large number of nodes. The goal is to
train a high-quality centralized model. We refer to this setting as Federated
Optimization. In this setting, communication efficiency is of the utmost
importance and minimizing the number of rounds of communication is the
principal goal.
A motivating example arises when we keep the training data locally on users'
mobile devices instead of logging it to a data center for training. In
federated optimization, the devices are used as compute nodes performing
computation on their local data in order to update a global model. We suppose
that we have an extremely large number of devices in the network, as many as
the number of users of a given service, each of which has only a tiny fraction
of the total data available. In particular, we expect the number of data points
available locally to be much smaller than the number of devices. Additionally,
since different users generate data with different patterns, it is reasonable
to assume that no device has a representative sample of the overall
distribution.
We show that existing algorithms are not suitable for this setting, and
propose a new algorithm which shows encouraging experimental results for sparse
convex problems. This work also sets a path for future research needed in the
context of federated optimization.
Comment: 38 pages.
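To illustrate the setting described above, the sketch below shows one generic communication round in which devices act as compute nodes on their local data and a server aggregates their models; it only mirrors the round structure of federated optimization, not the specific algorithm proposed in the paper, and all names are ours:

```python
import numpy as np

def federated_round(global_w, device_data, local_steps=10, lr=0.1):
    """One communication round in the federated setting: every device refines the
    current global model on its own small, non-representative local dataset, and
    the server averages the results, weighting by local dataset size.
    """
    local_models, sizes = [], []
    for X, y in device_data:            # (features, labels) held locally on each device
        w = global_w.copy()
        for _ in range(local_steps):    # local computation, no communication
            i = np.random.randint(len(y))
            grad = (X[i].dot(w) - y[i]) * X[i]  # least-squares example gradient
            w -= lr * grad
        local_models.append(w)
        sizes.append(len(y))
    weights = np.asarray(sizes, dtype=float) / sum(sizes)
    return sum(m * a for m, a in zip(local_models, weights))  # new global model
```

Communication efficiency comes from performing many local steps per round, so the number of rounds, and hence of synchronizations with the server, stays small.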