Search CORE

13 research outputs found

A Distributed Frank-Wolfe Algorithm for Communication-Efficient Sparse Learning

Author: Balcan Maria-Florina
Bellet Aurélien
Garakani Alireza Bagheri
Liang Yingyu
Sha Fei
Publication date: 12/01/2015

Learning sparse combinations is a frequent theme in machine learning. In this paper, we study its associated optimization problem in the distributed setting where the elements to be combined are not centrally located but spread over a network. We address the key challenges of balancing communication costs and optimization errors. To this end, we propose a distributed Frank-Wolfe (dFW) algorithm. We obtain theoretical guarantees on the optimization error $\epsilon$ and communication cost that do not depend on the total number of combining elements. We further show that the communication cost of dFW is optimal by deriving a lower bound on the communication cost required to construct an $\epsilon$-approximate solution. We validate our theoretical analysis with empirical studies on synthetic and real-world data, which demonstrate that dFW outperforms both baselines and competing methods. We also study the performance of dFW when the conditions of our analysis are relaxed, and show that dFW is fairly robust.

Comment: Extended version of the SIAM Data Mining 2015 paper
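
For orientation, here is a minimal single-machine sketch of the classical Frank-Wolfe iteration that the abstract builds on. It is not the dFW algorithm from the paper: the least-squares objective, the l1-ball constraint, the step size 2/(k+2), and the name `frank_wolfe_l1` are all assumptions chosen for illustration. It shows why the method suits sparse combinations: each iteration moves toward a single coordinate (one "atom"), so after k steps the iterate has at most k nonzero entries.

```python
def frank_wolfe_l1(A, y, radius=1.0, iters=200):
    """Minimize 0.5 * ||A x - y||^2 subject to ||x||_1 <= radius.

    Illustrative, centralized sketch (assumed setup, not the paper's dFW).
    A is a list of rows, y a list of targets.
    """
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    for k in range(iters):
        # Residual r = A x - y
        r = [sum(A[i][j] * x[j] for j in range(n_cols)) - y[i]
             for i in range(n_rows)]
        # Gradient g = A^T r
        g = [sum(A[i][j] * r[i] for i in range(n_rows))
             for j in range(n_cols)]
        # Linear minimization over the l1 ball: the minimizer is a signed
        # vertex at the coordinate with the largest gradient magnitude.
        j_star = max(range(n_cols), key=lambda j: abs(g[j]))
        s = [0.0] * n_cols
        s[j_star] = -radius if g[j_star] > 0 else radius
        # Standard diminishing step size; the update is a convex
        # combination, so x never leaves the l1 ball.
        gamma = 2.0 / (k + 2.0)
        x = [(1.0 - gamma) * xj + gamma * sj for xj, sj in zip(x, s)]
    return x
```

In a distributed variant, one would expect only the selected atom to be communicated per round rather than the full iterate, which is the kind of communication saving the abstract refers to; the exact protocol is described in the paper, not here.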

arXiv.org e-Print Archive

Crossref

COMET: A Recipe for Learning and Using Large Ensembles on Massive Data

Author: Basilico Justin D.
Dixon Kevin R.
Kegelmeyer W. Philip
Kolda Tamara G.
Munson M. Arthur
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2011

COMET is a single-pass MapReduce algorithm for learning on large-scale data. It builds multiple random forest ensembles on distributed blocks of data and merges them into a mega-ensemble. This approach is appropriate when learning from massive-scale data that is too large to fit on a single machine. To get the best accuracy, IVoting should be used instead of bagging to generate the training subset for each decision tree in the random forest. Experiments with two large datasets (5GB and 50GB compressed) show that COMET compares favorably (in both accuracy and training time) to learning on a subsample of data using a serial algorithm. Finally, we propose a new Gaussian approach for lazy ensemble evaluation which dynamically decides how many ensemble members to evaluate per data point; this can reduce evaluation cost by 100X or more
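
The two ideas in this abstract, merging per-block ensembles and evaluating only as many members as needed, can be sketched roughly as follows. This is not COMET's implementation: the abstract's Gaussian stopping rule is not specified here, so the sketch swaps in a simpler exact rule (stop once the leading label can no longer be overtaken by the remaining votes). The names `merge_ensembles` and `lazy_vote` are hypothetical, and each ensemble member is assumed to be a plain callable returning a label.

```python
from collections import Counter


def merge_ensembles(block_ensembles):
    """Concatenate ensembles trained on separate data blocks into one."""
    return [model for ensemble in block_ensembles for model in ensemble]


def lazy_vote(ensemble, x):
    """Majority vote with early stopping.

    Returns (label, members_evaluated). Stops as soon as the current
    leader's margin exceeds the number of members not yet evaluated,
    so the result matches the full vote. This exact rule stands in for
    the Gaussian approach named in the abstract (an assumption).
    """
    if not ensemble:
        raise ValueError("ensemble must contain at least one model")
    votes = Counter()
    total = len(ensemble)
    for used, model in enumerate(ensemble, start=1):
        votes[model(x)] += 1
        ranked = votes.most_common(2)
        lead = ranked[0][1]
        runner_up = ranked[1][1] if len(ranked) > 1 else 0
        if lead - runner_up > total - used:
            return ranked[0][0], used
    return votes.most_common(1)[0][0], total
```

A probabilistic rule such as the one the abstract describes would typically stop earlier than this exact rule, at the cost of a small chance of disagreeing with the full ensemble.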

arXiv.org e-Print Archive

CiteSeerX

Boosting Methods for Federated Learning

Author: Aldinucci Marco
Esposito Roberto
Polato Mirko
Publication venue: CEUR
Publication date: 01/01/2023

Institutional Research Information System University of Turin

Benchmarking FedAvg and FedCurv for Image Classification Tasks

Author: Aldinucci Marco
Casella Bruno
Esposito Roberto
Publication venue: Marco Anisetti, Angela Bonifati, Nicola Bena, Claudio A. Ardagna, Donato Malerba
Publication date: 01/01/2022

Institutional Research Information System University of Turin

A Survey From Distributed Machine Learning to Distributed Deep Learning

Author: Dehghani Mohammad
Yazdanparast Zahra
Publication date: 11/07/2023

Artificial intelligence has achieved significant success in handling complex tasks in recent years. This success is due to advances in machine learning algorithms and hardware acceleration. To obtain more accurate results and solve more complex problems, algorithms must be trained with more data. Processing this huge amount of data can be time-consuming and computationally expensive. One solution is to distribute the data and the algorithm across several machines, which is known as distributed machine learning. Considerable effort has gone into distributed machine learning algorithms, and many methods have been proposed so far. In this article, we present a comprehensive summary of the current state of the art in the field through a review of these algorithms. We divide these algorithms into classification and clustering (traditional machine learning), deep learning, and deep reinforcement learning groups. Distributed deep learning has gained more attention in recent years, and most studies have focused on these algorithms. As a result, most of the articles we discuss here belong to this category. Based on our investigation of the algorithms, we highlight limitations that should be addressed in future research.

arXiv.org e-Print Archive

Classification in P2P Networks by Bagging Cascade RSVMs

Author: ANG Hock Hee
DATTA Anwitaman
GOPALKRISHNAN Vikvekanand
HOI Steven C. H.
NG Wee Keong
Publication venue: VLDB Endowment
Publication date: 01/08/2008

Institutional Knowledge at Singapore Management University

Distributed signal processing and optimization based on in-network subspace projections

Author: Barbarossa Sergio
Di Lorenzo Paolo
Sardellitti Stefania
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2020

We study distributed optimization and processing of subspace-constrained signals in multi-agent networks with sparse connectivity. We introduce the first optimization framework based on distributed subspace projections, aimed at minimizing a network cost function depending on the specific processing task, while imposing subspace constraints on the final solution. The proposed method hinges on (sub)gradient optimization techniques while leveraging distributed projections as a mechanism to enforce subspace constraints in a cooperative and distributed fashion. Asymptotic convergence rates to optimal solutions of the problem are established under different assumptions (e.g., nondifferentiability, nonconvexity, etc.) on the objective function. We also introduce an extension of the framework that works with constant step-sizes, thus enabling faster convergence to optimal solutions of the optimization problem. Our algorithmic framework is very flexible and can be customized to a variety of problems in distributed signal processing. Finally, numerical tests on synthetic and realistic data illustrate how the proposed methods compare favorably to existing distributed algorithms
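
The core pattern described in this abstract, a gradient step followed by a projection that enforces a subspace constraint, can be sketched in a centralized form as below. This is only an illustration under stated assumptions: the paper's distributed, in-network projections are not reproduced, the function names `project` and `projected_gradient` are hypothetical, the basis is assumed orthonormal, and a constant step size is used.

```python
def project(x, basis):
    """Project x onto span(basis); basis vectors are assumed orthonormal."""
    out = [0.0] * len(x)
    for u in basis:
        coeff = sum(ui * xi for ui, xi in zip(u, x))
        out = [oi + coeff * ui for oi, ui in zip(out, u)]
    return out


def projected_gradient(grad, x0, basis, step=0.5, iters=100):
    """Gradient descent with a subspace projection after every step.

    grad:  callable returning the gradient (or a subgradient) at x
    basis: orthonormal vectors spanning the constraint subspace
    """
    x = project(x0, basis)
    for _ in range(iters):
        g = grad(x)
        x = project([xi - step * gi for xi, gi in zip(x, g)], basis)
    return x
```

As a usage example, take the cost 0.5 * ||x - b||^2 with the consensus subspace spanned by the normalized all-ones vector: calling `projected_gradient(lambda x: [xi - bi for xi, bi in zip(x, b)], [0.0] * len(b), [[len(b) ** -0.5] * len(b)])` drives every entry of x toward the average of b. In a network setting, that projection corresponds to averaging across agents, which is the kind of operation a distributed projection would approximate through local exchanges.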

Archivio della ricerca- Università di Roma La Sapienza