202 research outputs found
A Dynamic Task Allocation Algorithm Based on Weighted Velocity
Volunteer computing enlists people around the world who donate free computer resources to participate in scientific computation or data analysis over the Internet. It offers an effective solution to the large scale of basic scientific computing and its growing demand for computing resources. Task allocation is a crucial part of volunteer computing, and an effective algorithm can significantly improve computational efficiency. At present, most existing schemes divide tasks according to a computer's hardware configuration or its initial state. This may have no obvious impact on computational efficiency in the short term, but such allocation becomes inflexible when the pool of idle volunteer resources shrinks or grows. To make full use of idle computer resources, this work proposes a dynamic task allocation algorithm (TAA) based on weighted velocity. The results show that weighted velocity can serve as a parameter both to measure a computer's computing performance and to manage task allocation dynamically. Keywords: volunteer computing, task allocation, weighted average velocity
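The abstract does not spell out how the weighted velocity is computed, so the following is a minimal sketch under one plausible reading: the weighted velocity is an exponentially weighted moving average of a volunteer's recent task throughput, used to size the next chunk of work. All names (WeightedVelocityAllocator, report_completion, next_chunk_size) and parameters (alpha, base_chunk, target_seconds) are hypothetical, not taken from the paper.

```python
class WeightedVelocityAllocator:
    """Toy allocator: tracks an exponentially weighted average of each
    volunteer's task throughput and sizes the next task accordingly.
    This is an illustrative assumption, not the paper's algorithm."""

    def __init__(self, alpha=0.3, base_chunk=1000):
        self.alpha = alpha            # weight given to the most recent measurement
        self.base_chunk = base_chunk  # work units for a volunteer with no history
        self.velocity = {}            # volunteer id -> weighted velocity (units/s)

    def report_completion(self, worker_id, units_done, seconds):
        """Update the worker's weighted velocity from a finished task."""
        v_new = units_done / seconds
        v_old = self.velocity.get(worker_id)
        if v_old is None:
            self.velocity[worker_id] = v_new
        else:
            # exponentially weighted moving average of recent throughput
            self.velocity[worker_id] = self.alpha * v_new + (1 - self.alpha) * v_old

    def next_chunk_size(self, worker_id, target_seconds=60):
        """Size the next task so it should take roughly target_seconds."""
        v = self.velocity.get(worker_id)
        if v is None:
            return self.base_chunk   # no history yet: assign the default chunk
        return max(1, int(v * target_seconds))

alloc = WeightedVelocityAllocator()
alloc.report_completion("volunteer-7", units_done=1200, seconds=40)  # 30 units/s
print(alloc.next_chunk_size("volunteer-7"))  # ~1800 units for a 60 s task
```

Because the velocity estimate is updated after every completed task, a volunteer whose machine becomes busier (or freer) automatically receives smaller (or larger) chunks, which is the dynamic behavior the abstract describes.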
Multi-round Master-Worker Computing: a Repeated Game Approach
We consider a computing system where a master processor assigns tasks for execution to worker processors through the Internet. We model each worker's decision of whether to comply (compute the task) or not (return a bogus result to save the computation cost) as a mixed extension of a strategic game among workers. That is, we assume that workers are rational in a game-theoretic sense and that they randomize their strategic choice. Workers are assigned multiple tasks in subsequent rounds, so we model the system as an infinitely repeated game of the mixed extension of the strategic game. In each round, the master decides stochastically whether to accept the answer of the majority or to verify the answers received, at some cost; incentives and/or penalties are applied to workers accordingly. Under this framework, we study the conditions under which the master can reliably obtain task results, exploiting the fact that the repeated-game model captures the effect of long-term interaction: workers take into account that their behavior in one computation will affect the behavior of other workers in the future. Indeed, should a worker be found to deviate from some agreed strategic choice, the remaining workers would change their own strategy to penalize the deviator; hence, being rational, workers do not deviate. We identify analytically the parameter conditions that induce a desired worker behavior, and we evaluate experimentally the mechanisms derived from those conditions. We also compare the performance of our mechanisms with a previously known multi-round mechanism based on reinforcement learning. Comment: 21 pages, 3 figures
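To make the per-round mechanism concrete, here is an illustrative single-round simulation; it is not the paper's actual model. The master audits with probability p_verify (paying verify_cost), fines detected cheaters, and otherwise accepts and rewards the majority answer. The specific payoff structure and all identifiers are assumptions for illustration, and the repeated-game aspect (workers punishing a deviator in later rounds) is deliberately not modeled here.

```python
import random

def run_round(workers, p_verify, reward, fine, verify_cost):
    """One round: each worker computes honestly with its own probability
    (its mixed strategy); the master audits with probability p_verify.
    Payoff values are illustrative assumptions, not the paper's."""
    honest = {w: random.random() < p for w, p in workers.items()}
    payoffs = {}
    if random.random() < p_verify:
        # Audit: the master pays verify_cost, learns who cheated,
        # rewards honest workers, and fines the cheaters.
        for w, h in honest.items():
            payoffs[w] = reward if h else -fine
        result_correct, master_cost = True, verify_cost
    else:
        # No audit: accept the majority answer and reward its supporters.
        majority = sum(honest.values()) * 2 > len(honest)
        for w, h in honest.items():
            payoffs[w] = reward if h == majority else 0.0
        result_correct, master_cost = majority, 0.0
    return result_correct, master_cost, payoffs

random.seed(0)
workers = {f"w{i}": 0.9 for i in range(5)}  # each worker honest w.p. 0.9
print(run_round(workers, p_verify=0.3, reward=1.0, fine=5.0, verify_cost=2.0))
```

Sweeping p_verify, reward, and fine in such a simulation shows the basic trade-off the paper analyzes: a larger fine or audit probability makes cheating unprofitable, but frequent audits raise the master's expected cost per round.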
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
Many deep learning applications benefit from using large models with billions of parameters. Training these models is notoriously expensive due to the need for specialized HPC clusters. In this work, we consider alternative setups for training large models: using cheap "preemptible" instances or pooling existing resources from multiple regions. We analyze the performance of existing model-parallel algorithms in these conditions and find configurations where training larger models becomes less communication-intensive. Based on these findings, we propose SWARM parallelism, a model-parallel training algorithm designed for poorly connected, heterogeneous, and unreliable devices. SWARM creates temporary randomized pipelines between nodes that are rebalanced in case of failure. We empirically validate our findings and compare SWARM parallelism with existing large-scale training approaches. Finally, we combine our insights with compression strategies to train a large Transformer language model with 1B shared parameters (approximately 13B before sharing) on preemptible T4 GPUs with less than 200 Mb/s of network bandwidth. Comment: Accepted to the International Conference on Machine Learning (ICML) 2023. 25 pages, 8 figures
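As a rough illustration of temporary randomized pipelines, the toy router below picks a random node per stage for each microbatch and, on a node failure, moves a node from the largest stage pool to the smallest. SWARM's actual rebalancing is throughput-aware and involves transferring stage parameters to the reassigned node, so everything here (SwarmRouter and its methods) is a simplified assumption, not the paper's implementation.

```python
import random

class SwarmRouter:
    """Toy model of randomized pipelines: each stage is served by a pool of
    interchangeable nodes, every microbatch traverses one randomly chosen
    node per stage, and pools are rebalanced when a node fails."""

    def __init__(self, stage_pools):
        self.stage_pools = stage_pools  # list of lists: nodes serving each stage

    def route_microbatch(self):
        """Build a temporary pipeline: one random node per stage."""
        return [random.choice(pool) for pool in self.stage_pools]

    def handle_failure(self, failed_node):
        """Drop the failed node, then shift a node from the most
        over-provisioned stage to the thinnest one if they differ by >1.
        (In real SWARM the moved node would download that stage's weights.)"""
        for pool in self.stage_pools:
            if failed_node in pool:
                pool.remove(failed_node)
        donor = max(self.stage_pools, key=len)
        needy = min(self.stage_pools, key=len)
        if len(donor) - len(needy) > 1:
            needy.append(donor.pop())

router = SwarmRouter([["a1", "a2"], ["b1", "b2", "b3"], ["c1", "c2"]])
print(router.route_microbatch())   # e.g. ['a2', 'b1', 'c2']
router.handle_failure("c1")
print(router.stage_pools)          # a node migrates to the depleted stage
```

Routing every microbatch through a fresh random pipeline means no fixed node pairing exists to break, which is why losing a single preemptible instance degrades throughput gracefully instead of stalling the whole pipeline.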