Search CORE

124 research outputs found

Recommended from our members

Galois : a system for parallel execution of irregular algorithms

Author: Nguyen Donald Do
Publication venue
Publication date: 04/09/2015
Field of study

textA programming model which allows users to program with high productivity and which produces high performance executions has been a goal for decades. This dissertation makes progress towards this elusive goal by describing the design and implementation of the Galois system, a parallel programming model for shared-memory, multicore machines. Central to the design is the idea that scheduling of a program can be decoupled from the core computational operator and data structures. However, efficient programs often require application-specific scheduling to achieve best performance. To bridge this gap, an extensible and abstract scheduling policy language is proposed, which allows programmers to focus on selecting high-level scheduling policies while delegating the tedious task of implementing the policy to a scheduler synthesizer and runtime system. Implementations of deterministic and prioritized scheduling also are described. An evaluation of a well-studied benchmark suite reveals that factoring programs into operators, schedulers and data structures can produce significant performance improvements over unfactored approaches. Comparison of the Galois system with existing programming models for graph analytics shows significant performance improvements, often orders of magnitude more, due to (1) better support for the restrictive programming models of existing systems and (2) better support for more sophisticated algorithms and scheduling, which cannot be expressed in other systems.Computer Science

Texas ScholarWorks

Whirlpool: Improving Dynamic Cache Management with Static Data Classification

Author: Agarwal N.
Awasthi M.
Beckmann B. M.
Beckmann N.
Beckmann N.
Beckmann N.
Blumofe R. D.
Chandra D.
Chishti Z.
Cho S.
Dally W. J.
Hilton A.
Jouppi N. P.
Karypis G.
Kim C.
Leverich J.
Pekhimenko G.
Petoumenos P.
Rusu S.
Sanchez D.
Shalf J.
Wang Z.
Xiang X.
Zhuravlev S.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/04/2016
Field of study

Cache hierarchies are increasingly non-uniform and difficult to manage. Several techniques, such as scratchpads or reuse hints, use static information about how programs access data to manage the memory hierarchy. Static techniques are effective on regular programs, but because they set fixed policies, they are vulnerable to changes in program behavior or available cache space. Instead, most systems rely on dynamic caching policies that adapt to observed program behavior. Unfortunately, dynamic policies spend significant resources trying to learn how programs use memory, and yet they often perform worse than a static policy. We present Whirlpool, a novel approach that combines static information with dynamic policies to reap the benefits of each. Whirlpool statically classifies data into pools based on how the program uses memory. Whirlpool then uses dynamic policies to tune the cache to each pool. Hence, rather than setting policies statically, Whirlpool uses static analysis to guide dynamic policies. We present both an API that lets programmers specify pools manually and a profiling tool that discovers pools automatically in unmodified binaries. We evaluate Whirlpool on a state-of-the-art NUCA cache. Whirlpool significantly outperforms prior approaches: on sequential programs, Whirlpool improves performance by up to 38% and reduces data movement energy by up to 53%; on parallel programs, Whirlpool improves performance by up to 67% and reduces data movement energy by up to 2.6x.National Science Foundation (U.S.) (grant CCF-1318384)National Science Foundation (U.S.) (CAREER-1452994)Samsung (Firm) (GRO award

DSpace@MIT

Crossref

New approaches for efficient on-the-fly FE operator assembly in a high-performance mantle convection framework

Author: Bauer Simon
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 18/02/2019
Field of study

HiFlow3 - A Flexible and Hardware- Aware Parallel Finite Element Package

Author: Anzt Hartwig
Augustin Werner
Baumann Martin
Bockelmann Hendryk
Gengenbach Thomas
Hahn Tobias
Heuveline Vincent
Ketelaer Eva
Lukarski Dimitar
Otzen Andrea
Ritterbusch Sebastian
Rocker Björn
Ronnas Staffan
Schick Michael
Subramanian Chandramowli
Weiss Jan-Philipp
Wilhelm Florian
Publication venue: Karlsruher Institut für Technologie
Publication date: 01/01/2010
Field of study

KITopen

hyper.deal: An efficient, matrix-free finite-element library for high-dimensional partial differential equations

Author: Kormann Katharina
Kronbichler Martin
Munch Peter
Publication venue
Publication date: 19/02/2020
Field of study

This work presents the efficient, matrix-free finite-element library hyper.deal for solving partial differential equations in two to six dimensions with high-order discontinuous Galerkin methods. It builds upon the low-dimensional finite-element library deal.II to create complex low-dimensional meshes and to operate on them individually. These meshes are combined via a tensor product on the fly and the library provides new special-purpose highly optimized matrix-free functions exploiting domain decomposition as well as shared memory via MPI-3.0 features. Both node-level performance analyses and strong/weak-scaling studies on up to 147,456 CPU cores confirm the efficiency of the implementation. Results of the library hyper.deal are reported for high-dimensional advection problems and for the solution of the Vlasov--Poisson equation in up to 6D phase space.Comment: 33 pages, 18 figure

arXiv.org e-Print Archive

OPUS Augsburg

MPG.PuRe

Parallel Multiscale Contact Dynamics for Rigid Non-spherical Bodies

Author: KRESTENITIS KONSTANTINOS
Publication venue
Publication date: 01/01/2018
Field of study

The simulation of large numbers of rigid bodies of non-analytical shapes or vastly varying sizes which collide with each other is computationally challenging. The fundamental problem is the identification of all contact points between all particles at every time step. In the Discrete Element Method (DEM), this is particularly difficult for particles of arbitrary geometry that exhibit sharp features (e.g. rock granulates). While most codes avoid non-spherical or non-analytical shapes due to the computational complexity, we introduce an iterative-based contact detection method for triangulated geometries. The new method is an improvement over a naive brute force approach which checks all possible geometric constellations of contact and thus exhibits a lot of execution branching. Our iterative approach has limited branching and high floating point operations per processed byte. It thus is suitable for modern Single Instruction Multiple Data (SIMD) CPU hardware. As only the naive brute force approach is robust and always yields a correct solution, we propose a hybrid solution that combines the best of the two worlds to produce fast and robust contacts. In terms of the DEM workflow, we furthermore propose a multilevel tree-based data structure strategy that holds all particles in the domain on multiple scales in grids. Grids reduce the total computational complexity of the simulation. The data structure is combined with the DEM phases to form a single touch tree-based traversal that identifies both contact points between particle pairs and introduces concurrency to the system during particle comparisons in one multiscale grid sweep. Finally, a reluctant adaptivity variant is introduced which enables us to realise an improved time stepping scheme with larger time steps than standard adaptivity while we still minimise the grid administration overhead. Four different parallelisation strategies that exploit multicore architectures are discussed for the triad of methodological ingredients. Each parallelisation scheme exhibits unique behaviour depending on the grid and particle geometry at hand. The fusion of them into a task-based parallelisation workflow yields promising speedups. Our work shows that new computer architecture can push the boundary of DEM computability but this is only possible if the right data structures and algorithms are chosen

Durham e-Theses

A pattern language for parallelizing irregular algorithms

Author: Monteiro Pedro Miguel Ferreira Costa
Publication venue: Faculdade de Ciências e Tecnologia
Publication date: 01/01/2009
Field of study

Dissertação apresentada na Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa para obtenção do grau de Mestre em Engenharia InformáticaIn irregular algorithms, data set’s dependences and distributions cannot be statically predicted. This class of algorithms tends to organize computations in terms of data locality instead of parallelizing control in multiple threads. Thus, opportunities for exploiting parallelism vary dynamically, according to how the algorithm changes data dependences. As such, effective parallelization of such algorithms requires new approaches that account for that dynamic nature. This dissertation addresses the problem of building efficient parallel implementations of irregular algorithms by proposing to extract, analyze and document patterns of concurrency and parallelism present in the Galois parallelization framework for irregular algorithms. Patterns capture formal representations of a tangible solution to a problem that arises in a well defined context within a specific domain. We document the said patterns in a pattern language, i.e., a set of inter-dependent patterns that compose well-documented template solutions that can be reused whenever a certain problem arises in a well-known context

Repositório da Universidade Nova de Lisboa

Schnelle Löser für partielle Differentialgleichungen

Author
Publication venue: Zürich : EMS Publ. House
Publication date: 01/01/2008
Field of study

[no abstract available

Repositorium für Naturwissenschaften und Technik