
    Split-Merge Model of Workunit Replication in Distributed Computing


    A Dynamic Task Allocation Algorithm Based on Weighted Velocity

    Volunteer computing enables people around the world to donate free computer resources and participate in scientific computation or data analysis over the Internet. It offers an effective way to meet the large scale and growing resource demands of basic scientific computing. Task allocation is a crucial part of volunteer computing, and an effective algorithm can significantly improve computational efficiency. At present, most existing approaches divide tasks according to a computer's hardware conditions or its initial state. This may have no obvious impact on computational efficiency in the short term, but such allocation becomes inflexible when the idle resources available to the volunteer computing system shrink or grow. To make full use of idle computer resources, this work proposes a dynamic task allocation algorithm (TAA) based on weighted velocity. The results show that the weighted velocity can serve as a parameter both for assessing a computer's computing performance and for dynamically managing task allocation. Keywords: volunteer computing, task allocation, weighted average velocity
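    The abstract does not give the paper's exact formula, but the general idea of a weighted velocity driving allocation can be illustrated with a minimal Python sketch. The exponential weighting, the Volunteer class, and the tasks-per-hour units below are illustrative assumptions, not the authors' definitions.

```python
# Hypothetical sketch of weighted-velocity task sizing (not the paper's exact
# method): an exponentially weighted average of a volunteer's recent task
# completion speed decides how much work it receives in the next round.

from dataclasses import dataclass, field

@dataclass
class Volunteer:
    weighted_velocity: float = 0.0           # tasks per hour, smoothed
    history: list = field(default_factory=list)

def update_weighted_velocity(v: Volunteer, observed_velocity: float,
                             alpha: float = 0.3) -> None:
    """Blend the latest observed speed into the running weighted average."""
    if v.weighted_velocity == 0.0:
        v.weighted_velocity = observed_velocity
    else:
        v.weighted_velocity = (alpha * observed_velocity
                               + (1.0 - alpha) * v.weighted_velocity)
    v.history.append(observed_velocity)

def tasks_to_assign(v: Volunteer, round_hours: float = 1.0) -> int:
    """Size the next work batch proportionally to the smoothed velocity."""
    return max(1, round(v.weighted_velocity * round_hours))

# Example: a volunteer speeds up as its machine becomes idle.
vol = Volunteer()
for speed in [4.0, 6.0, 10.0]:               # observed tasks/hour per round
    update_weighted_velocity(vol, speed)
print(tasks_to_assign(vol))                  # batch grows with the weighted velocity
```

    Because the average adapts round by round, the assignment tracks how busy the volunteer's machine currently is rather than its initial hardware profile, which is the flexibility the abstract argues for.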

    Multi-round Master-Worker Computing: a Repeated Game Approach

    We consider a computing system where a master processor assigns tasks for execution to worker processors over the Internet. We model each worker's decision of whether to comply (compute the task) or not (return a bogus result to save the computation cost) as a mixed extension of a strategic game among workers. That is, we assume that workers are rational in a game-theoretic sense and that they randomize their strategic choice. Workers are assigned multiple tasks in subsequent rounds, so we model the system as an infinitely repeated game of the mixed extension of the strategic game. In each round, the master decides stochastically whether to accept the answer of the majority or to verify the answers received, at some cost; incentives and/or penalties are applied to workers accordingly. Under this framework, we study the conditions under which the master can reliably obtain task results, exploiting the fact that the repeated-game model captures the effect of long-term interaction: workers take into account that their behavior in one computation affects the behavior of other workers in the future. Indeed, should a worker be found to deviate from some agreed strategic choice, the remaining workers would change their own strategy to penalize the deviator; hence, being rational, workers do not deviate. We identify analytically the parameter conditions that induce a desired worker behavior, and we evaluate experimentally the mechanisms derived from such conditions. We also compare the performance of our mechanisms with a previously known multi-round mechanism based on reinforcement learning. Comment: 21 pages, 3 figures
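    As a rough illustration of the round structure described above, the following Python sketch simulates one round in which the master audits with some probability and otherwise trusts the majority answer. The parameter names (p_verify, reward, penalty, compute_cost) and payoff values are assumptions for illustration, not the paper's notation or its equilibrium conditions.

```python
# Illustrative single-round simulation of a master-worker game with
# stochastic verification (assumed parameterization, not the paper's model).

import random

def play_round(n_workers: int, p_cheat: float, p_verify: float,
               reward: float, penalty: float, compute_cost: float):
    # Each worker randomizes between computing honestly and returning a bogus result.
    honest = [random.random() > p_cheat for _ in range(n_workers)]
    payoffs = [0.0] * n_workers

    if random.random() < p_verify:
        # Master verifies: honest workers are rewarded, cheaters are penalized.
        for i, h in enumerate(honest):
            payoffs[i] = (reward - compute_cost) if h else -penalty
        master_correct = True
    else:
        # Master accepts the majority answer without paying the verification cost.
        majority_honest = sum(honest) > n_workers / 2
        for i, h in enumerate(honest):
            payoffs[i] = reward - (compute_cost if h else 0.0)
        master_correct = majority_honest
    return master_correct, payoffs

random.seed(0)
ok, pay = play_round(n_workers=5, p_cheat=0.2, p_verify=0.5,
                     reward=2.0, penalty=3.0, compute_cost=1.0)
print(ok, pay)
```

    Repeating such rounds, and letting workers condition future choices on observed deviations, is what lets the mechanism sustain honest behavior without verifying every round.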

    SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient

    Many deep learning applications benefit from using large models with billions of parameters. Training these models is notoriously expensive due to the need for specialized HPC clusters. In this work, we consider alternative setups for training large models: using cheap "preemptible" instances or pooling existing resources from multiple regions. We analyze the performance of existing model-parallel algorithms in these conditions and find configurations where training larger models becomes less communication-intensive. Based on these findings, we propose SWARM parallelism, a model-parallel training algorithm designed for poorly connected, heterogeneous, and unreliable devices. SWARM creates temporary randomized pipelines between nodes that are rebalanced in case of failure. We empirically validate our findings and compare SWARM parallelism with existing large-scale training approaches. Finally, we combine our insights with compression strategies to train a large Transformer language model with 1B shared parameters (approximately 13B before sharing) on preemptible T4 GPUs with less than 200 Mb/s of network bandwidth. Comment: Accepted to the International Conference on Machine Learning (ICML) 2023. 25 pages, 8 figures
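    A minimal sketch of the randomized-pipeline idea, assuming each pipeline stage is served by a pool of interchangeable peers: a microbatch is routed through one randomly chosen live peer per stage, and a stage emptied by failures is rebalanced by borrowing a peer from the best-served stage. The stage/peer names and the rebalance heuristic are illustrative assumptions, not the authors' implementation.

```python
# Sketch of temporary randomized pipelines over unreliable peers
# (illustrative only; peer pools and rebalancing rule are assumed).

import random

stages = {                      # hypothetical peer pools, one per pipeline stage
    0: ["gpu-a", "gpu-b"],
    1: ["gpu-c", "gpu-d", "gpu-e"],
    2: ["gpu-f"],
}

def route_microbatch(stages: dict, failed: set) -> list:
    """Pick one live peer per stage, forming a temporary randomized pipeline."""
    path = []
    for stage_id in sorted(stages):
        live = [p for p in stages[stage_id] if p not in failed]
        if not live:
            raise RuntimeError(f"stage {stage_id} has no live peers")
        path.append(random.choice(live))
    return path

def rebalance(stages: dict, failed: set) -> None:
    """Refill a stage left without live peers by moving one from the best-served stage."""
    for stage_id, peers in stages.items():
        if all(p in failed for p in peers):
            donor = max(stages, key=lambda s: sum(p not in failed for p in stages[s]))
            live_donors = [p for p in stages[donor] if p not in failed]
            if donor != stage_id and len(live_donors) > 1:
                moved = live_donors[-1]
                stages[donor].remove(moved)
                peers.append(moved)

random.seed(1)
failed = {"gpu-f"}              # the only peer of stage 2 has gone offline
rebalance(stages, failed)
print(route_microbatch(stages, failed))   # pipeline now routes around the failure
```

    The point of the sketch is that no pipeline assignment is permanent: because any live peer of a stage can serve any microbatch, preempted or slow devices can be dropped and the pipeline re-formed on the fly.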