Search CORE

7,720 research outputs found

Model Parallelism on Distributed Infrastructure: A Literature Review from Theory to LLM Case-Studies

Author: Brakel Felix
Odyurt Uraz
Varbanescu Ana-Lucia
Publication venue: ArXiv.org
Publication date: 06/03/2024

Neural networks have become a cornerstone of machine learning. As these networks grow ever more complex, so does the hardware and software infrastructure used to train and deploy them. In this survey we answer three research questions: "What types of model parallelism exist?", "What are the challenges of model parallelism?", and "What is a modern use-case of model parallelism?" We answer the first question by looking at how neural networks can be parallelised, expressing them as operator graphs and exploring the available dimensions. Neural networks can be parallelised along two dimensions: intra-operator and inter-operator. We answer the second question by collecting and listing both the implementation challenges for each type of parallelism and the problem of optimally partitioning the operator graph. We answer the last question by collecting and listing how parallelism is applied in modern multi-billion-parameter transformer networks, to the extent that this is possible with the limited information shared about these networks.

University of Twente Research Information
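To make the two dimensions named in this survey concrete, here is a minimal pure-Python sketch on a toy two-layer linear network. It is an illustration, not code from the survey: "devices" are simulated by loops, and all function names (`matvec`, `split_columns`, `intra_operator`, `inter_operator`) are hypothetical. Intra-operator parallelism splits the work of one operator (here column-wise, in the style of tensor parallelism); inter-operator parallelism places whole operators on different devices (in the style of pipeline parallelism).

```python
from typing import List

Matrix = List[List[float]]  # row-major; rows = input features, columns = outputs
Vector = List[float]


def matvec(x: Vector, w: Matrix) -> Vector:
    """Compute y = x @ w for a row vector x."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def split_columns(w: Matrix, parts: int) -> List[Matrix]:
    """Split w column-wise into `parts` contiguous shards, one per (simulated) device."""
    n_cols = len(w[0])
    bounds = [n_cols * k // parts for k in range(parts + 1)]
    return [[row[bounds[k]:bounds[k + 1]] for row in w] for k in range(parts)]


def intra_operator(x: Vector, w: Matrix, devices: int) -> Vector:
    """Intra-operator parallelism: a single matmul is split across devices.
    Each device computes a slice of the output; the slices are concatenated."""
    out: Vector = []
    for shard in split_columns(w, devices):  # sequential here; concurrent on real hardware
        out.extend(matvec(x, shard))
    return out


def inter_operator(x: Vector, layers: List[Matrix]) -> Vector:
    """Inter-operator parallelism: whole operators (layers) are placed on different
    devices as pipeline stages; activations flow from one stage to the next."""
    for stage in layers:  # stage k would live on device k
        x = matvec(x, stage)
    return x


w1 = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]  # 2 inputs -> 4 outputs
w2 = [[1.0], [0.0], [0.0], [1.0]]                   # 4 inputs -> 1 output
x = [1.0, 1.0]
print(intra_operator(x, w1, devices=2))  # [6.0, 8.0, 10.0, 12.0]
print(inter_operator(x, [w1, w2]))       # [18.0]
```

Both variants compute the same result as the unsplit network; what differs is which device holds which weights and what has to be communicated (output slices in the first case, activations between stages in the second).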

Limitations of Intra-operator Parallelism Using Heterogeneous Computing Resources

Author: Habich Dirk
Karnagel Tomas
Lehner Wolfgang
Publication venue: Springer Science and Business Media LLC
Publication date: 02/03/2023

The hardware landscape is changing from homogeneous multi-core systems towards wildly heterogeneous systems combining different computing units, like CPUs and GPUs. To utilize these heterogeneous environments, database query execution has to adapt to different architectures and computing behaviors. In this paper, we investigate the simple idea of partitioning an operator’s input data and processing all data partitions in parallel, one partition per computing unit. For heterogeneous systems, data has to be partitioned according to the performance of the computing units. We define a way to calculate the partition sizes, analyze the parallel execution of two example database operators, and present limitations that could hinder significant performance improvements. The findings in this paper can help system developers assess the possibilities and limitations of intra-operator parallelism in heterogeneous environments, leading to more informed decisions about whether this approach is beneficial for a given workload and hardware environment.

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa
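The abstract above states that partitions must be sized according to the performance of each computing unit but does not give the formula. The sketch below assumes the simplest such model: each unit gets a share proportional to its measured throughput, so that all units are expected to finish at the same time. The function names and throughput figures are hypothetical, and this is not claimed to be the paper's method.

```python
from typing import Dict


def partition_sizes(n_rows: int, throughput: Dict[str, float]) -> Dict[str, int]:
    """Assumed model: give each computing unit a share of the input proportional to its
    throughput (rows/s), so the estimated finish times n_i / t_i are roughly equal.
    Largest-remainder rounding makes the sizes sum exactly to n_rows."""
    total = sum(throughput.values())
    exact = {unit: n_rows * t / total for unit, t in throughput.items()}
    sizes = {unit: int(share) for unit, share in exact.items()}
    leftover = n_rows - sum(sizes.values())
    # Give the remaining rows to the units with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda u: exact[u] - sizes[u], reverse=True)
    for unit in by_remainder[:leftover]:
        sizes[unit] += 1
    return sizes


def estimated_runtime(sizes: Dict[str, int], throughput: Dict[str, float]) -> float:
    """The parallel operator finishes when its slowest unit does: max_i n_i / t_i."""
    return max(sizes[u] / throughput[u] for u in sizes)


# Hypothetical throughputs: the GPU processes 3x as many rows per second as the CPU.
tp = {"cpu": 1.0e6, "gpu": 3.0e6}
balanced = partition_sizes(1_000_000, tp)
print(balanced)                                                 # {'cpu': 250000, 'gpu': 750000}
print(estimated_runtime(balanced, tp))                          # 0.25
print(estimated_runtime({"cpu": 500_000, "gpu": 500_000}, tp))  # 0.5
```

In this toy model the throughput-proportional split halves the runtime of a naive even split. The model ignores overheads such as moving partitions to the GPU or merging partial results, which is one reason a predicted speedup may not materialise in practice.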

Speculative execution plan for multiple query execution systems

Author: Brzuszek Marcin
Sasak Anna
Publication venue: Uniwersytetu Marii Curie-Sklodowskiej w Lublinie
Publication date: 01/01/2010

There are different levels at which parallelism can be introduced into a database system, ranging from data partitioning (intra-operator parallelism) to parallelism between operations (inter-operator parallelism), depending on query granularity. The paper presents a parallelisation method based on speculative execution for database systems that are expected to answer complex queries coming from different sources as soon as possible. Taking into consideration W upcoming queries waiting for execution, an execution plan should be developed for the first query. This plan should also give the largest benefit for the W-1 consecutive queries. Thus, in parallel with the first query, some additional computations can be executed that in later steps reduce the execution time of the consecutive queries. The paper presents the possible risks and benefits of using this method and also analyses the possible execution-time reduction for different models of speculative parallelization [1].

Biblioteka Nauki - repozytorium artykułów

University of Maria Curie-Skłodowska (UMCS): Scientific e-Journals / Uniwersytet Marii Curie-Skłodowskiej: e-czasopisma naukowe

Parallel Query Execution in PRISMA/DB

Author: Apers P.M.G.
Flokstra J.
Wilschut A.N.
Publication venue: Springer-Verlag
Publication date: 01/01/1990

University of Twente Research Information

Parallel Evaluation of Multi-join Queries

Author: America R
Annita N. Wilschut
Bitton D.
Carino E
Chen M. S.
Hong W.
Jan Flokstra
P. W. P.
Peter M. G. Apers
Schneider D.
Schneider D. A.
Spiliopoulou M.
Srivastava J.
Stonebraker M.
Wilschut A. N.
Wilschut A.N.
Ziane M.
Publication venue: Springer Verlag
Publication date: 01/01/1995

A number of execution strategies for parallel evaluation of multi-join queries have been proposed in the literature. In this paper we give a comparative performance evaluation of four execution strategies by implementing all of them on the same parallel database system, PRISMA/DB. Experiments were run on up to 80 processors. These strategies, taken from the literature, are named Sequential Parallel, Synchronous Execution, Segmented Right-Deep, and Full Parallel. Based on the experiments, clear guidelines are given on when to use which strategy. This is an extended abstract; the full paper appeared in Proc. ACM SIGMOD'94, Minneapolis, Minnesota, May 24–27, 1994.

CiteSeerX

Crossref

University of Twente Research Information
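Segmented Right-Deep execution, as its name indicates, is organised around right-deep join trees. The minimal single-process sketch below is not PRISMA/DB code, and its relation and function names are hypothetical. It only shows the dataflow that makes right-deep trees attractive for parallel hash joins: all inner relations are hashed first, and then outer tuples stream through a chain of probes. It does not model how the four strategies allocate processors to joins.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Row = Dict[str, object]
HashTable = Dict[object, List[Row]]


def build(relation: List[Row], key: str) -> HashTable:
    """Build phase: hash one inner relation on its join attribute."""
    table: HashTable = defaultdict(list)
    for row in relation:
        table[row[key]].append(row)
    return table


def right_deep_pipeline(outer: List[Row],
                        joins: List[Tuple[List[Row], str, str]]) -> Iterator[Row]:
    """Evaluate a right-deep tree of hash joins.

    joins: (inner_relation, inner_key, probe_key) per join, in pipeline order.
    All hash tables are built first; the builds are independent of each other,
    so a parallel system could run them concurrently. Then every outer tuple
    streams through the chain of probes; on a parallel machine each probe
    stage could run on its own processors, forming a pipeline."""
    stages = [(build(inner, inner_key), probe_key) for inner, inner_key, probe_key in joins]
    for row in outer:
        partial = [row]
        for table, probe_key in stages:
            # .get() avoids inserting empty lists into the defaultdict.
            partial = [{**p, **m} for p in partial for m in table.get(p[probe_key], [])]
            if not partial:
                break  # no match: this tuple leaves the pipeline early
        yield from partial


orders = [{"order_id": 1, "cust_id": 10}, {"order_id": 2, "cust_id": 11},
          {"order_id": 3, "cust_id": 99}]
customers = [{"cust_id": 10, "nation_id": 1}, {"cust_id": 11, "nation_id": 2}]
nations = [{"nation_id": 1, "nation": "NL"}, {"nation_id": 2, "nation": "PL"}]
result = list(right_deep_pipeline(
    orders, [(customers, "cust_id", "cust_id"), (nations, "nation_id", "nation_id")]))
print([(r["order_id"], r["nation"]) for r in result])  # [(1, 'NL'), (2, 'PL')]
```

The trade-off this illustrates is memory for parallelism: every hash table must be resident at once, which is presumably why a segmented variant that splits the tree into smaller pipelines is worth comparing.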

Parallelizing Windowed Stream Joins in a Shared-Nothing Cluster

Author: Chakraborty Abhirup
Singh Ajit
Publication date: 24/07/2013

The availability of a large number of processing nodes in a parallel and distributed computing environment enables sophisticated real-time processing over high-speed data streams, as required by many emerging applications. Sliding-window stream joins are among the most important operators in a stream processing system. In this paper, we consider the issue of parallelizing a sliding-window stream join operator over a shared-nothing cluster. We propose a framework, based on a fixed or predefined communication pattern, to distribute the join processing load over the shared-nothing cluster. We consider various overheads that arise when scaling to a large number of nodes, and propose solution methodologies to cope with them. We implement the algorithm on a cluster using a message passing system, and present experimental results showing the effectiveness of the join processing algorithm.
Comment: 11 pages

arXiv.org e-Print Archive

Crossref
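The framework in this paper relies on a fixed or predefined communication pattern, which the sketch below does not reproduce. As a simpler point of comparison, it assumes an equi-join on a key, a time-based window, and events arriving in timestamp order. It parallelises a symmetric sliding-window join by hash-partitioning both streams on the join key across simulated shared-nothing nodes. All class and function names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    ts: float      # timestamp; events must arrive in non-decreasing ts order
    stream: str    # "L" or "R"
    key: int       # equi-join attribute
    payload: str


class WindowJoinWorker:
    """Symmetric sliding-window equi-join state held by one (simulated) node."""

    def __init__(self, window: float) -> None:
        self.window = window
        self.state: Dict[str, Deque[Event]] = {"L": deque(), "R": deque()}

    def process(self, e: Event) -> List[Tuple[str, str]]:
        other = self.state["R" if e.stream == "L" else "L"]
        # Expire opposite-stream tuples that have slid out of the window.
        while other and other[0].ts < e.ts - self.window:
            other.popleft()
        # Probe the opposite window, then insert the new tuple into its own window.
        matches = [(e.payload, o.payload) if e.stream == "L" else (o.payload, e.payload)
                   for o in other if o.key == e.key]
        self.state[e.stream].append(e)
        return matches


def parallel_window_join(events: List[Event], n_nodes: int,
                         window: float) -> List[Tuple[str, str]]:
    """Route each event to a node by hashing its join key, so that all tuples that
    could match each other meet on the same node; nodes share no state."""
    workers = [WindowJoinWorker(window) for _ in range(n_nodes)]
    results: List[Tuple[str, str]] = []
    for e in events:
        results.extend(workers[hash(e.key) % n_nodes].process(e))
    return results


events = [
    Event(0.0, "L", 1, "l1"), Event(0.5, "R", 1, "r1"), Event(1.0, "R", 2, "r2"),
    Event(2.5, "R", 2, "r3"), Event(3.0, "L", 1, "l2"), Event(3.2, "L", 2, "l3"),
]
print(parallel_window_join(events, n_nodes=2, window=2.0))  # [('l1', 'r1'), ('l3', 'r3')]
```

Hash partitioning needs no communication between nodes once tuples are routed, but it cannot rebalance load when a few keys dominate, and it does not apply to join predicates other than equality. The sketch also prunes a window only when a tuple from the opposite stream arrives on that node.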