254 research outputs found

An Improved Parallelism Scheme for Deterministic Discrete Ordinates Transport

Author: Deakin Tom
Gaudin Wayne
Martineau Matt
McIntosh-Smith Simon
Publication venue: 'SAGE Publications'
Publication date: 01/07/2018

Explore Bristol Research

Many-core acceleration of a discrete ordinates transport mini-app at extreme scale

Author: Deakin Tom J
Gaudin Wayne
McIntosh-Smith Simon N
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 15/06/2016

Explore Bristol Research

Reviewing the Computational Performance of Structured and Unstructured Grid Deterministic SN Transport Sweeps on Many-core Architectures

Author: Deakin Tom
Hagues Andrew
Lovegrove Justin
McIntosh-Smith Simon
Smedley-Stevenson Richard
Publication venue: 'Informa UK Limited'
Publication date: 07/06/2020

Explore Bristol Research

Leveraging Many-Core Technology for Deterministic Neutral Particle Transport at Extreme Scale

Author: Deakin Tom
Publication date: 08/05/2018

Explore Bristol Research

An improved parallelism scheme for deterministic discrete ordinates transport

Author: Adams MP
Bailey TS
Baker RS
Deakin T
Evans TM
Hawkins WD
Koch KR
Lewis EE
Matt Martineau
McCalpin JD
Pennycook SJ
Simon McIntosh-Smith
Tom Deakin
Wayne Gaudin
Xiao S
Publication venue: 'SAGE Publications'

Crossref

Parallel Program Composition with Paragraphs in Stapl

Author: Smith Timmie
Publication date: 11/01/2021

Languages and tools currently available for the development of parallel applications are difficult to learn and use. The Standard Template Adaptive Parallel Library (STAPL) is being developed to make it easier for programmers to implement a parallel application. STAPL is a parallel programming library for C++ that adopts the generic programming philosophy of the C++ Standard Template Library. STAPL provides collections of parallel algorithms (pAlgorithms) and containers (pContainers) that allow a developer to write their application without reimplementing the algorithms and data structures commonly used in parallel computing. pViews in STAPL are abstract data types that provide generic data access operations independently of the type of pContainer used to store the data. Algorithms and applications have a formal, high level representation in STAPL. A computation in STAPL is represented as a parallel task graph, which we call a PARAGRAPH. A PARAGRAPH contains a representation of the algorithm's input data, the operations that are used to transform individual data elements, and the ordering between the application of operations that transform the same data element. Just as programs are the result of a composition of algorithms, STAPL programs are the result of a composition of PARAGRAPHs. This dissertation develops the PARAGRAPH program representation and its compositional methods. PARAGRAPHs improve the developer's difficult situation by simplifying what she must specify when writing a parallel algorithm. The performance of the PARAGRAPH is evaluated using parallel generic algorithms, benchmarks from the NAS suite, and a nuclear particle transport application that has been written using STAPL. Our experiments were performed on Cray XT4 and Cray XE6 massively parallel systems and an IBM Power5 cluster, and show that scalable performance beyond 16,000 processors is possible using the PARAGRAPH.

Texas A&M Repository

Goal-based h-adaptivity of the 1-D diamond difference discrete ordinate method.

Author: Eaton MD
Févotte F
Hülsemann F
Jeffers RS
Kópházi J
Ragusa J
Publication venue: Elsevier
Publication date: 18/01/2017

The quantity of interest (QoI) associated with a solution of a partial differential equation (PDE) is not, in general, the solution itself, but a functional of the solution. Dual weighted residual (DWR) error estimators are one way of providing an estimate of the error in the QoI resulting from the discretisation of the PDE. This paper aims to provide an estimate of the error in the QoI due to the spatial discretisation, where the discretisation scheme being used is the diamond difference (DD) method in space and discrete ordinate ($S_N$) method in angle. The QoI are reaction rates in detectors and the value of the eigenvalue ($K_{\mathrm{eff}}$) for 1-D fixed source and eigenvalue ($K_{\mathrm{eff}}$ criticality) neutron transport problems respectively. Local values of the DWR over individual cells are used as error indicators for goal-based mesh refinement, which aims to give an optimal mesh for a given QoI.

Spiral - Imperial College Digital Repository

FigShare

Evaluating the Effectiveness of a Vector-Length-Agnostic Instruction Set

Author: B Zhao
JD McCalpin
M Martineau
N Stephens
P Atkinson
S McIntosh-Smith
T Deakin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 18/08/2020

Crossref

Explore Bristol Research

Exploiting pipelined executions in OpenMP

Author: Ayguadé Parra Eduard
González Tallada Marc
Labarta Mancho Jesús José
Martorell Bofill Xavier
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2003

We propose a set of extensions to the OpenMP programming model to express point-to-point synchronisation schemes. This is accomplished by defining, in the form of directives, precedence relations among the tasks that are originated from OpenMP work-sharing constructs. The proposal is based on the definition of a name space that identifies the work parceled out by these work-sharing constructs. Then the programmer defines the precedence relations using this name space. This relieves the programmer from the burden of defining complex synchronization data structures and the insertion of explicit synchronization actions in the program that make the program difficult to understand and maintain. We briefly describe the main aspects of the runtime implementation required to support precedence relations in OpenMP. We focus on the evaluation of the proposal through its use in two benchmarks: NAS LU and ASCI Sweep3D. This research has been supported by the Ministry of Science and Technology of Spain and the European Union (FEDER funds) under contract TIC2001-0995-C02-01. Peer reviewed. Postprint (author's final draft).

UPCommons. Portal del coneixement obert de la UPC

Algorithm-Level Optimizations for Scalable Parallel Graph Processing

Author: Harshvardhan
Publication date: 17/01/2019

Efficiently processing large graphs is challenging, since parallel graph algorithms suffer from poor scalability and performance due to many factors, including heavy communication and load-imbalance. Furthermore, it is difficult to express graph algorithms, as users need to understand and effectively utilize the underlying execution of the algorithm on the distributed system. The performance of graph algorithms depends not only on the characteristics of the system (such as latency, available RAM, etc.), but also on the characteristics of the input graph (small-world scale-free, mesh, long-diameter, etc.), and characteristics of the algorithm (sparse computation vs. dense communication). The best execution strategy, therefore, often heavily depends on the combination of input graph, system and algorithm. Fine-grained expression exposes maximum parallelism in the algorithm and allows the user to concentrate on a single vertex, making it easier to express parallel graph algorithms. However, this often loses information about the machine, making it difficult to extract performance and scalability from fine-grained algorithms. To address these issues, we present a model for expressing parallel graph algorithms using a fine-grained expression. Our model decouples the algorithm-writer from the underlying details of the system, graph, and execution and tuning of the algorithm. We also present various graph paradigms that optimize the execution of graph algorithms for various types of input graphs and systems. We show our model is general enough to allow graph algorithms to use the various graph paradigms for the best/fastest execution, and demonstrate good performance and scalability for various different graphs, algorithms, and systems to 100,000+ cores.

Texas A&M Repository