202 research outputs found
A Dynamic Task Allocation Algorithm Based on Weighted Velocity
Volunteer computing enlists people around the world who donate free computer resources to participate in scientific computation or data analysis over the Internet. It offers an effective solution to the large scale of basic scientific computing and its growing demand for computing resources. Task allocation is a crucial part of volunteer computing, and an effective algorithm can significantly improve computational efficiency. At present, most existing schemes divide tasks according to a computer's hardware configuration or its initial state. This may have no obvious impact on computational efficiency in the short term, but such allocation becomes inflexible when the pool of idle volunteer resources shrinks or grows. To make full use of idle computer resources, this work proposes a dynamic task allocation algorithm (TAA) based on weighted velocity. The results show that weighted velocity can serve as a parameter both to measure a computer's computing performance and to manage task allocation dynamically. Keywords: volunteer computing, task allocation, weighted average velocity
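The abstract does not spell out how the weighted velocity is computed, so the following is a minimal sketch under one plausible reading: the weighted velocity is an exponentially weighted moving average of a volunteer's recent task throughput, used to size the next chunk of work. All names (WeightedVelocityAllocator, report_completion, next_chunk_size) and parameters (alpha, base_chunk, target_seconds) are hypothetical, not taken from the paper.

```python
class WeightedVelocityAllocator:
    """Toy allocator: tracks an exponentially weighted average of each
    volunteer's task throughput and sizes the next task accordingly.
    This is an illustrative assumption, not the paper's algorithm."""

    def __init__(self, alpha=0.3, base_chunk=1000):
        self.alpha = alpha            # weight given to the most recent measurement
        self.base_chunk = base_chunk  # work units for a volunteer with no history
        self.velocity = {}            # volunteer id -> weighted velocity (units/s)

    def report_completion(self, worker_id, units_done, seconds):
        """Update the worker's weighted velocity from a finished task."""
        v_new = units_done / seconds
        v_old = self.velocity.get(worker_id)
        if v_old is None:
            self.velocity[worker_id] = v_new
        else:
            # exponentially weighted moving average of recent throughput
            self.velocity[worker_id] = self.alpha * v_new + (1 - self.alpha) * v_old

    def next_chunk_size(self, worker_id, target_seconds=60):
        """Size the next task so it should take roughly target_seconds."""
        v = self.velocity.get(worker_id)
        if v is None:
            return self.base_chunk   # no history yet: assign the default chunk
        return max(1, int(v * target_seconds))

alloc = WeightedVelocityAllocator()
alloc.report_completion("volunteer-7", units_done=1200, seconds=40)  # 30 units/s
print(alloc.next_chunk_size("volunteer-7"))  # ~1800 units for a 60 s task
```

Because the velocity estimate is updated after every completed task, a volunteer whose machine becomes busier (or freer) automatically receives smaller (or larger) chunks, which is the dynamic behavior the abstract describes.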
Multi-round Master-Worker Computing: a Repeated Game Approach
We consider a computing system where a master processor assigns tasks for execution to worker processors through the Internet. We model each worker's decision of whether to comply (compute the task) or not (return a bogus result to save the computation cost) as a mixed extension of a strategic game among workers. That is, we assume that workers are rational in a game-theoretic sense and that they randomize their strategic choice. Workers are assigned multiple tasks in subsequent rounds, so we model the system as an infinitely repeated game of the mixed extension of the strategic game. In each round, the master decides stochastically whether to accept the answer of the majority or to verify the answers received, at some cost; incentives and/or penalties are applied to workers accordingly. Under this framework, we study the conditions under which the master can reliably obtain task results, exploiting the fact that the repeated-game model captures the effect of long-term interaction: workers take into account that their behavior in one computation will affect the behavior of other workers in the future. Indeed, should a worker be found to deviate from some agreed strategic choice, the remaining workers would change their own strategy to penalize the deviator; hence, being rational, workers do not deviate. We identify analytically the parameter conditions that induce a desired worker behavior, and we evaluate experimentally the mechanisms derived from those conditions. We also compare the performance of our mechanisms with a previously known multi-round mechanism based on reinforcement learning. Comment: 21 pages, 3 figures
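To make the per-round mechanism concrete, here is an illustrative single-round simulation; it is not the paper's actual model. The master audits with probability p_verify (paying verify_cost), fines detected cheaters, and otherwise accepts and rewards the majority answer. The specific payoff structure and all identifiers are assumptions for illustration, and the repeated-game aspect (workers punishing a deviator in later rounds) is deliberately not modeled here.

```python
import random

def run_round(workers, p_verify, reward, fine, verify_cost):
    """One round: each worker computes honestly with its own probability
    (its mixed strategy); the master audits with probability p_verify.
    Payoff values are illustrative assumptions, not the paper's."""
    honest = {w: random.random() < p for w, p in workers.items()}
    payoffs = {}
    if random.random() < p_verify:
        # Audit: the master pays verify_cost, learns who cheated,
        # rewards honest workers, and fines the cheaters.
        for w, h in honest.items():
            payoffs[w] = reward if h else -fine
        result_correct, master_cost = True, verify_cost
    else:
        # No audit: accept the majority answer and reward its supporters.
        majority = sum(honest.values()) * 2 > len(honest)
        for w, h in honest.items():
            payoffs[w] = reward if h == majority else 0.0
        result_correct, master_cost = majority, 0.0
    return result_correct, master_cost, payoffs

random.seed(0)
workers = {f"w{i}": 0.9 for i in range(5)}  # each worker honest w.p. 0.9
print(run_round(workers, p_verify=0.3, reward=1.0, fine=5.0, verify_cost=2.0))
```

Sweeping p_verify, reward, and fine in such a simulation shows the basic trade-off the paper analyzes: a larger fine or audit probability makes cheating unprofitable, but frequent audits raise the master's expected cost per round.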
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
Many deep learning applications benefit from using large models with billions of parameters. Training these models is notoriously expensive due to the need for specialized HPC clusters. In this work, we consider alternative setups for training large models: using cheap "preemptible" instances or pooling existing resources from multiple regions. We analyze the performance of existing model-parallel algorithms in these conditions and find configurations where training larger models becomes less communication-intensive. Based on these findings, we propose SWARM parallelism, a model-parallel training algorithm designed for poorly connected, heterogeneous, and unreliable devices. SWARM creates temporary randomized pipelines between nodes that are rebalanced in case of failure. We empirically validate our findings and compare SWARM parallelism with existing large-scale training approaches. Finally, we combine our insights with compression strategies to train a large Transformer language model with 1B shared parameters (approximately 13B before sharing) on preemptible T4 GPUs with less than 200 Mb/s of network bandwidth. Comment: Accepted to the International Conference on Machine Learning (ICML) 2023. 25 pages, 8 figures
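As a rough illustration of temporary randomized pipelines, the toy router below picks a random node per stage for each microbatch and, on a node failure, moves a node from the largest stage pool to the smallest. SWARM's actual rebalancing is throughput-aware and involves transferring stage parameters to the reassigned node, so everything here (SwarmRouter and its methods) is a simplified assumption, not the paper's implementation.

```python
import random

class SwarmRouter:
    """Toy model of randomized pipelines: each stage is served by a pool of
    interchangeable nodes, every microbatch traverses one randomly chosen
    node per stage, and pools are rebalanced when a node fails."""

    def __init__(self, stage_pools):
        self.stage_pools = stage_pools  # list of lists: nodes serving each stage

    def route_microbatch(self):
        """Build a temporary pipeline: one random node per stage."""
        return [random.choice(pool) for pool in self.stage_pools]

    def handle_failure(self, failed_node):
        """Drop the failed node, then shift a node from the most
        over-provisioned stage to the thinnest one if they differ by >1.
        (In real SWARM the moved node would download that stage's weights.)"""
        for pool in self.stage_pools:
            if failed_node in pool:
                pool.remove(failed_node)
        donor = max(self.stage_pools, key=len)
        needy = min(self.stage_pools, key=len)
        if len(donor) - len(needy) > 1:
            needy.append(donor.pop())

router = SwarmRouter([["a1", "a2"], ["b1", "b2", "b3"], ["c1", "c2"]])
print(router.route_microbatch())   # e.g. ['a2', 'b1', 'c2']
router.handle_failure("c1")
print(router.stage_pools)          # a node migrates to the depleted stage
```

Routing every microbatch through a fresh random pipeline means no fixed node pairing exists to break, which is why losing a single preemptible instance degrades throughput gracefully instead of stalling the whole pipeline.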