Search CORE

573 research outputs found

On-the-fly memory compression for multibody algorithms.

Author: Eckhardt W.
Glas R.
Joubert G.R.
Korzh D.
Leather H.
Parsons M.
Peters F.
Sawyer M.
Wallner S.
Weinzierl T.
Publication venue: 'IOS Press'
Publication date: 01/04/2016
Field of study

Memory and bandwidth demands challenge developers of particle-based codes that have to scale on new architectures, as the growth of concurrency outperforms improvements in memory access facilities, as the memory per core tends to stagnate, and as communication networks cannot increase bandwidth arbitrary. We propose to analyse each particle of such a code to find out whether a hierarchical data representation storing data with reduced precision caps the memory demands without exceeding given error bounds. For admissible candidates, we perform this compression and thus reduce the pressure on the memory subsystem, lower the total memory footprint and reduce the data to be exchanged via MPI. Notably, our analysis and transformation changes the data compression dynamically, i.e. the choice of data format follows the solution characteristics, and it does not require us to alter the core simulation code

Durham Research Online

The Parallelism Motifs of Genomic Data Analysis

Author: Awan Muaaz
Azad Ariful
Brock Benjamin
Buluc Aydin
Egan Rob
Ekanayake Saliya
Ellis Marquita
Georganas Evangelos
Guidi Giulia
Hofmeyr Steven
Oliker Leonid
Selvitopi Oguz
Teodoropol Cristina
Yelick Katherine
Publication venue: 'The Royal Society'
Publication date: 20/01/2020
Field of study

Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing

arXiv.org e-Print Archive

eScholarship - University of California

Communication-Avoiding Algorithms for a High-Performance Hyperbolic PDE Engine

Author: CHARRIER DOMINIC,ETIENNE
Publication venue
Publication date: 01/01/2020
Field of study

The study of waves has always been an important subject of research. Earthquakes, for example, have a direct impact on the daily lives of millions of people while gravitational waves reveal insight into the composition and history of the Universe. These physical phenomena, despite being tackled traditionally by different fields of physics, have in common that they are modelled the same way mathematically: as a system of hyperbolic partial differential equations (PDEs). The ExaHyPE project (“An Exascale Hyperbolic PDE Engine") translates this similarity into a software engine that can be quickly adapted to simulate a wide range of hyperbolic partial differential equations. ExaHyPE’s key idea is that the user only specifies the physics while the engine takes care of the parallelisation and the interplay of the underlying numerical methods. Consequently, a first simulation code for a new hyperbolic PDE can often be realised within a few hours. This is a task that traditionally can take weeks, months, even years for researchers starting from scratch. My main contribution to ExaHyPE is the development of the core infrastructure. This comprises the development and implementation of ExaHyPE’s solvers and adaptive mesh refinement procedures, it’s MPI+X parallelisation as well as high-level aspects of ExaHyPE’s application-tailored code generation, which allows to adapt ExaHyPE to model many different hyperbolic PDE systems. Like any high-performance computing code, ExaHyPE has to tackle the challenges of the coming exascale computing era, notably network communication latencies and the growing memory wall. In this thesis, I propose memory-efficient realisations of ExaHyPE’s solvers that avoid data movement together with a novel task-based MPI+X parallelisation concept that allows to hide network communication behind computation in dynamically adaptive simulations

Durham e-Theses

Compact connectivity representation for triangle meshes

Author: Gurung Topraj
Publication venue: Georgia Institute of Technology
Publication date: 05/04/2013
Field of study

Many digital models used in entertainment, medical visualization, material science, architecture, Geographic Information Systems (GIS), and mechanical Computer Aided Design (CAD) are defined in terms of their boundaries. These boundaries are often approximated using triangle meshes. The complexity of models, which can be measured by triangle count, increases rapidly with the precision of scanning technologies and with the need for higher resolution. An increase in mesh complexity results in an increase of storage requirement, which in turn increases the frequency of disk access or cache misses during mesh processing, and hence decreases performance. For example, in a test application involving a mesh with 55 million triangles in a machine with 4GB of memory versus a machine with 1GB of memory, performance decreases by a factor of about 6000 because of memory thrashing. To help reduce memory thrashing, we focus on decreasing the average storage requirement per triangle measured in 32-bit integer references per triangle (rpt). This thesis covers compact connectivity representation for triangle meshes and discusses four data structures: 1. Sorted Opposite Table (SOT), which uses 3 rpt and has been extended to support tetrahedral meshes. 2. Sorted Quad (SQuad), which uses about 2 rpt and has been extended to support streaming. 3. Laced Ring (LR), which uses about 1 rpt and offers an excellent compromise between storage compactness and performance of mesh traversal operators. 4. Zipper, an extension of LR, which uses about 6 bits per triangle (equivalently 0.19 rpt), therefore is the most compact representation. The triangle mesh data structures proposed in this thesis support the standard set of mesh connectivity operators introduced by the previously proposed Corner Table at an amortized constant time complexity. They can be constructed in linear time and space from the Corner Table or any equivalent representation. If geometry is stored as 16-bit coordinates, using Zipper instead of the Corner Table increases the size of the mesh that can be stored in core memory by a factor of about 8.PhDCommittee Chair: Rossignac, Jarek; Committee Co-Chair: Frost, David; Committee Member: Lindstrom, Peter; Committee Member: Liu, C. Karen; Committee Member: Turk, Gre

Scholarly Materials And Research @ Georgia Tech

Run-time optimization of adaptive irregular applications

Author: Yu Hao
Publication venue: Texas A&M University
Publication date: 15/11/2004
Field of study

Compared to traditional compile-time optimization, run-time optimization could oﬀer signiﬁcant performance improvements when parallelizing and optimizing adaptive irregular applications, because it performs program analysis and adaptive optimizations during program execution. Run-time techniques can succeed where static techniques fail because they exploit the characteristics of input data, programs' dynamic behaviors, and the underneath execution environment. When optimizing adaptive irregular applications for parallel execution, a common observation is that the effectiveness of the optimizing transformations depends on programs' input data and their dynamic phases. This dissertation presents a set of run-time optimization techniques that match the characteristics of programs' dynamic memory access patterns and the appropriate optimization (parallelization) transformations. First, we present a general adaptive algorithm selection framework to automatically and adaptively select at run-time the best performing, functionally equivalent algorithm for each of its execution instances. The selection process is based on off-line automatically generated prediction models and characteristics (collected and analyzed dynamically) of the algorithm's input data, In this dissertation, we specialize this framework for automatic selection of reduction algorithms. In this research, we have identiﬁed a small set of machine independent high-level characterization parameters and then we deployed an off-line, systematic experiment process to generate prediction models. These models, in turn, match the parameters to the best optimization transformations for a given machine. The technique has been evaluated thoroughly in terms of applications, platforms, and programs' dynamic behaviors. Speciﬁcally, for the reduction algorithm selection, the selected performance is within 2% of optimal performance and on average is 60% better than "Replicated Buffer," the default parallel reduction algorithm speciﬁed by OpenMP standard. To reduce the overhead of speculative run-time parallelization, we have developed an adaptive run-time parallelization technique that dynamically chooses effcient shadow structures to record a program's dynamic memory access patterns for parallelization. This technique complements the original speculative run-time parallelization technique, the LRPD test, in parallelizing loops with sparse memory accesses. The techniques presented in this dissertation have been implemented in an optimizing research compiler and can be viewed as effective building blocks for comprehensive run-time optimization systems, e.g., feedback-directed optimization systems and dynamic compilation systems

Texas A&M Repository

On-the-fly memory compression for multibody algorithms

Author: Eckhardt W.
Glas R.
Joubert G.R.
Korzh D.
Leather H.
Parsons M.
Peters F.
Sawyer M.
Wallner S.
Weinzierl T.
Publication venue
Publication date: 01/04/2016
Field of study

Durham Research Online