Search CORE

5,106 research outputs found

Mapping Divide-and-Conquer Algorithms to Parallel Architectures

Author: Gupta S.
Keldsen D.
Lo V.
Mohamed M.
Rajopadhye S. V.
Telle J.
Publication venue: University of Oregon
Publication date: 19/01/1990
Field of study

24 pagesIn this paper, we identify the binomial tree as an ideal computation structure for parallel divide-and-conquer algorithms. We show its superiority to the classic full binary tree structure with respect to speedup and efficiency. We also present elegant and efficient algorithms for mapping the binomial tree to two interconnection networks commonly used in multicomputers, namely the hypercube and the two-dimensional mesh. Our mappings are optimal with respect to both average dilation and link contention. We discuss the practical implications of these results for message-passing architectures using store-and-forward routing vs. those using wormhole routing

University of Oregon Scholars' Bank

Extending the Nested Parallel Model to the Nested Dataflow Model with Provably Efficient Schedulers

Author: Dinh David
Simhadri Harsha Vardhan
Tang Yuan
Publication venue
Publication date: 14/02/2016
Field of study

The nested parallel (a.k.a. fork-join) model is widely used for writing parallel programs. However, the two composition constructs, i.e. "

\parallel

" (parallel) and "

;

" (serial), are insufficient in expressing "partial dependencies" or "partial parallelism" in a program. We propose a new dataflow composition construct "

\leadsto

" to express partial dependencies in algorithms in a processor- and cache-oblivious way, thus extending the Nested Parallel (NP) model to the \emph{Nested Dataflow} (ND) model. We redesign several divide-and-conquer algorithms ranging from dense linear algebra to dynamic-programming in the ND model and prove that they all have optimal span while retaining optimal cache complexity. We propose the design of runtime schedulers that map ND programs to multicore processors with multiple levels of possibly shared caches (i.e, Parallel Memory Hierarchies) and provide theoretical guarantees on their ability to preserve locality and load balance. For this, we adapt space-bounded (SB) schedulers for the ND model. We show that our algorithms have increased "parallelizability" in the ND model, and that SB schedulers can use the extra parallelizability to achieve asymptotically optimal bounds on cache misses and running time on a greater number of processors than in the NP model. The running time for the algorithms in this paper is

O\left(\frac{\sum_{i=0}^{h-1} Q^{*}({\mathsf t};\sigma\cdot M_i)\cdot C_i}{p}\right)

, where

Q^{*}

is the cache complexity of task

{\mathsf t}

C_i

is the cost of cache miss at level-

i

cache which is of size

M_i

\sigma\in(0,1)

is a constant, and

p

is the number of processors in an

h

-level cache hierarchy

arXiv.org e-Print Archive

Crossref

Parallel language constructs for tensor product computations on loosely coupled architectures

Author: Mehrotra Piyush
Vanrosendale John
Publication venue
Publication date
Field of study

Distributed memory architectures offer high levels of performance and flexibility, but have proven awkard to program. Current languages for nonshared memory architectures provide a relatively low level programming environment, and are poorly suited to modular programming, and to the construction of libraries. A set of language primitives designed to allow the specification of parallel numerical algorithms at a higher level is described. Tensor product array computations are focused on along with a simple but important class of numerical algorithms. The problem of programming 1-D kernal routines is focused on first, such as parallel tridiagonal solvers, and then how such parallel kernels can be combined to form parallel tensor product algorithms is examined

NASA Technical Reports Server

Comparing MapReduce and pipeline implementations for counting triangles

Author: Pasarella Sánchez Ana Edelmira
Vidal Serodio Maria Esther
Zoltan Cristina
Publication venue
Publication date: 01/01/2016
Field of study

A generalized method to define the Divide & Conquer paradigm in order to have processors acting on its own data and scheduled in a parallel fashion. MapReduce is a programming model that follows this paradigm, and allows for the definition of efficient solutions by both decomposing a problem into steps on subsets of the input data and combining the results of each step to produce final results. Albeit used for the implementation of a wide variety of computational problems, MapReduce performance can be negatively affected whenever the replication factor grows or the size of the input is larger than the resources available at each processor. In this paper we show an alternative approach to implement the Divide & Conquer paradigm, named pipeline. The main features of pipeline are illustrated on a parallel implementation of the well-known problem of counting triangles in a graph. This problem is especially interesting either when the input graph does not fit in memory or is dynamically generated. To evaluate the properties of pipeline, a dynamic pipeline of processes and an ad-hoc version of MapReduce are implemented in the language Go, exploiting its ability to deal with channels and spawned processes. An empirical evaluation is conducted on graphs of different sizes and densities. Observed results suggest that pipeline allows for the implementation of an efficient solution of the problem of counting triangles in a graph, particularly, in dense and large graphs, drastically reducing the execution time with respect to the MapReduce implementation.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

Improved Accuracy and Parallelism for MRRR-based Eigensolvers -- A Mixed Precision Approach

Author: Bientinesi Paolo
Petschow Matthias
Quintana-Orti Enrique
Publication venue
Publication date: 01/01/2013
Field of study

The real symmetric tridiagonal eigenproblem is of outstanding importance in numerical computations; it arises frequently as part of eigensolvers for standard and generalized dense Hermitian eigenproblems that are based on a reduction to tridiagonal form. For its solution, the algorithm of Multiple Relatively Robust Representations (MRRR) is among the fastest methods. Although fast, the solvers based on MRRR do not deliver the same accuracy as competing methods like Divide & Conquer or the QR algorithm. In this paper, we demonstrate that the use of mixed precisions leads to improved accuracy of MRRR-based eigensolvers with limited or no performance penalty. As a result, we obtain eigensolvers that are not only equally or more accurate than the best available methods, but also -in most circumstances- faster and more scalable than the competition

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositori Institucional de la Universitat Jaume I

Publikationsserver der RWTH Aachen University

Comparing MapReduce and pipeline implementations for counting triangles

Author: Pasarella Sánchez Ana Edelmira
Vidal Maria-Esther
Zoltan Torres Ana Cristina
Publication venue: 'Open Publishing Association'
Publication date: 01/01/2017
Field of study

A common method to define a parallel solution for a computational problem consists in finding a way to use the Divide and Conquer paradigm in order to have processors acting on its own data and scheduled in a parallel fashion. MapReduce is a programming model that follows this paradigm, and allows for the definition of efficient solutions by both decomposing a problem into steps on subsets of the input data and combining the results of each step to produce final results. Albeit used for the implementation of a wide variety of computational problems, MapReduce performance can be negatively affected whenever the replication factor grows or the size of the input is larger than the resources available at each processor. In this paper we show an alternative approach to implement the Divide and Conquer paradigm, named dynamic pipeline. The main features of dynamic pipelines are illustrated on a parallel implementation of the well-known problem of counting triangles in a graph. This problem is especially interesting either when the input graph does not fit in memory or is dynamically generated. To evaluate the properties of pipeline, a dynamic pipeline of processes and an ad-hoc version of MapReduce are implemented in the language Go, exploiting its ability to deal with channels and spawned processes. An empirical evaluation is conducted on graphs of different topologies, sizes, and densities. Observed results suggest that dynamic pipelines allows for an efficient implementation of the problem of counting triangles in a graph, particularly, in dense and large graphs, drastically reducing the execution time with respect to the MapReduce implementation.Peer ReviewedPostprint (published version

arXiv.org e-Print Archive

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Directory of Open Access Journals