    Parallel Computers and Complex Systems

    We present an overview of the state of the art and future trends in high performance parallel and distributed computing, and discuss techniques for using such computers in the simulation of complex problems in computational science. The use of high performance parallel computers can help improve our understanding of complex systems, and the converse is also true --- we can apply techniques used for the study of complex systems to improve our understanding of parallel computing. We consider parallel computing as the mapping of one complex system --- typically a model of the world --- into another complex system --- the parallel computer. We study static, dynamic, spatial and temporal properties of both the complex systems and the map between them. The result is a better understanding of which computer architectures are good for which problems, and of software structure, automatic partitioning of data, and the performance of parallel machines.

    On the benefits of tasking with OpenMP

    Tasking promises a model to program parallel applications that provides intuitive semantics. In the case of tasks with dependences, it also promises better load balancing by removing global synchronizations (barriers), and potential for improved locality. Still, the adoption of tasking in production HPC codes has been slow. Despite OpenMP supporting tasks, most codes rely on worksharing-loop constructs alongside MPI primitives. This paper provides insights on the benefits of tasking over the worksharing-loop model by reporting on the experience of taskifying an adaptive mesh refinement proxy application: miniAMR. The performance evaluation shows the taskified implementation being 15–30% faster than the loop-parallel one for certain thread counts across four systems, three architectures and four compilers, thanks to better load balancing and system utilization. Dynamic scheduling of loops narrows the gap but still falls short of tasking due to serial sections between loops. Locality improvements are incidental due to the lack of locality-aware scheduling. Overall, the introduction of asynchrony with tasking lives up to its promises, provided that programmers parallelize beyond individual loops and across application phases.
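    The contrast can be sketched in a few lines of C++ with OpenMP. The two phase functions below are hypothetical stand-ins for miniAMR's per-block phases, not the paper's code: the worksharing version pays an implicit barrier after each loop, while the task version expresses only the true per-block ordering through dependences, so a fast block can enter its second phase while slower blocks are still in their first.

```cpp
#include <omp.h>
#include <cstdio>

constexpr int NBLOCKS = 64;
static double state[NBLOCKS];

// Hypothetical stand-ins for two consecutive phases over mesh blocks.
static void compute_block(int b)  { state[b] += b; }
static void exchange_block(int b) { state[b] *= 2.0; }

int main() {
    // Worksharing-loop version: an implicit barrier ends each loop,
    // so the slowest block in phase 1 delays every block in phase 2.
    #pragma omp parallel
    {
        #pragma omp for
        for (int b = 0; b < NBLOCKS; ++b) compute_block(b);
        #pragma omp for
        for (int b = 0; b < NBLOCKS; ++b) exchange_block(b);
    }

    // Tasking version: per-block dependences replace the global
    // barriers, exposing parallelism across the two phases.
    #pragma omp parallel
    #pragma omp single
    for (int b = 0; b < NBLOCKS; ++b) {
        #pragma omp task depend(out: state[b])
        compute_block(b);
        #pragma omp task depend(in: state[b])
        exchange_block(b);
    }

    std::printf("state[0] = %g\n", state[0]);
    return 0;
}
```

    The depend clauses encode exactly the ordering the barriers over-enforce: phase 2 of block b waits only for phase 1 of block b, not for every other block.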

    Performance visualizations using XML representations

    The intermediate representation (IR) forms the information exchanged among the different passes of program compilation. The intermediate format proposed for extensibility and persistence is written in XML. In this way, the program transformations that were internal to the compiler become visible. The hierarchical structure of XML makes it a natural representation for the abstract syntax tree (AST). A compiler can parse the program source into an IR, then output it as an XML document. Separated by orthogonal namespaces, other IRs are also presented in the same XML document, gathering program information such as dependence vectors, transformation matrices, iteration space dependence graphs, and cache reuse distances. This XML document can be exchanged between the compiler and program visualizers for parallelism and locality.
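    As an illustration only (the element names and namespace prefixes below are invented, not the paper's actual schema), a short C++ sketch shows how naturally an AST serializes into nested XML elements, leaving room for other analyses under orthogonal namespaces in the same document:

```cpp
#include <iostream>
#include <memory>
#include <string>
#include <vector>

// Minimal AST node; "kind" would be For, Assign, Var, and so on.
struct Node {
    std::string kind;
    std::string name;
    std::vector<std::unique_ptr<Node>> kids;
};

// Each tree node becomes an XML element, each child a nested element
// (namespace declarations omitted for brevity).
void emitXML(const Node& n, int depth = 0) {
    std::string pad(2 * depth, ' ');
    std::cout << pad << "<ast:" << n.kind;
    if (!n.name.empty()) std::cout << " name=\"" << n.name << "\"";
    if (n.kids.empty()) { std::cout << "/>\n"; return; }
    std::cout << ">\n";
    for (const auto& k : n.kids) emitXML(*k, depth + 1);
    std::cout << pad << "</ast:" << n.kind << ">\n";
}

int main() {
    Node loop;
    loop.kind = "For";
    loop.name = "i";
    auto body = std::make_unique<Node>();
    body->kind = "Assign";
    body->name = "a";
    loop.kids.push_back(std::move(body));
    emitXML(loop);
    // Dependence vectors or reuse distances would sit beside the AST
    // in the same document, e.g. under a separate <dep:...> namespace.
    return 0;
}
```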

    Developing a Benchmark for Evaluating the Performance of Parallel Computers

    This paper discusses the development of a portable suite of benchmarking programs for parallel computers. Comparative measurement of the performance of parallel computing systems has been limited because of the great diversity of architectures and of processor interconnection schemes. One solution is to translate benchmark codes into a consistent and portable parallel language. This paper reports on progress in developing such a portable suite of benchmarks. An extensive introduction to parallel computing is included as an appendix, to provide a thorough understanding of the factors complicating development of the performance suite. Key to the development was the use of p4, a library of tools developed at Argonne National Laboratory. The benchmark codes were translated successfully using p4 and were run on a variety of parallel machines. Conclusions and suggestions for future work are given.
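    The paper's codes rest on p4's message-passing primitives, which are not reproduced here; the C++ sketch below shows only the general shape of a kernel such a suite contains: run a fixed, portable workload under a timer and report a rate that can be compared across machines.

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

int main() {
    // A fixed, portable workload: daxpy, 2 floating-point ops per element.
    const std::size_t n = 1 << 24;
    std::vector<double> x(n, 1.0), y(n, 2.0);
    const double a = 3.0;

    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
    auto t1 = std::chrono::steady_clock::now();

    // Report a machine-comparable rate, plus a checksum so the
    // compiler cannot discard the loop.
    double secs = std::chrono::duration<double>(t1 - t0).count();
    std::printf("daxpy: %.1f MFLOP/s (check %.1f)\n",
                2.0 * n / secs / 1e6, y[0]);
    return 0;
}
```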

    Dagstuhl News January - December 1999

    "Dagstuhl News" is a publication edited especially for the members of the Foundation "Informatikzentrum Schloss Dagstuhl" to thank them for their support. The News give a summary of the scientific work being done in Dagstuhl. Each Dagstuhl Seminar is presented by a small abstract describing the contents and scientific highlights of the seminar as well as the perspectives or challenges of the research topic

    Computing and Information Science (CIS)

    Cornell University Courses of Study Vol. 97 2005/200

    Graphical processing unit (GPU) acceleration for numerical solution of population balance models using high resolution finite volume algorithm

    Population balance modeling is a widely used approach to describe crystallization processes. It can be extended to multivariate cases where more internal coordinates, i.e., particle properties such as multiple characteristic sizes, composition, and purity, can be used. The current study presents a highly efficient, fully discretized parallel implementation of the high resolution finite volume technique on graphical processing units (GPUs) for the solution of single- and multi-dimensional population balance models (PBMs). The proposed GPU-PBM is implemented in CUDA C++ for the GPU calculations and provides a generic Matlab interface for easy application in scientific computing. The case studies demonstrate that the code running on the GPU is between 2 and 40 times faster than the compiled C++ code and 50 to 250 times faster than the standard Matlab implementation. This significant improvement in computational time enables the application of model-based control approaches in real time, even in the case of multidimensional population balance models.
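    As a rough sketch of the kind of kernel involved, not the paper's implementation: assuming a one-dimensional population balance with constant growth rate G and a van Leer flux limiter (both choices, like the function names, are assumptions made for brevity), one explicit high-resolution finite volume step parallelizes naturally with one CUDA thread per size cell.

```cuda
#include <cuda_runtime.h>

// High-resolution flux at the right face of cell i for the growth
// equation dn/dt + G * dn/dx = 0: upwind value plus a limited
// correction (van Leer limiter). Cells outside the grid are empty.
__device__ double face_flux(const double* n, int i, int N, double G) {
    if (i < 0) return 0.0;                         // no inflow on the left
    double nm = (i > 0)     ? n[i - 1] : 0.0;
    double nc = n[i];
    double np = (i < N - 1) ? n[i + 1] : 0.0;
    double theta = (nc - nm) / (np - nc + 1e-300); // smoothness ratio
    double phi = (theta + fabs(theta)) / (1.0 + fabs(theta));
    return G * (nc + 0.5 * phi * (np - nc));
}

// One explicit time step of the scheme, one thread per size cell.
__global__ void hrfv_step(const double* n, double* n_new,
                          int N, double G, double dt, double dx) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= N) return;
    double fR = face_flux(n, i, N, G);
    double fL = face_flux(n, i - 1, N, G);
    n_new[i] = n[i] - (dt / dx) * (fR - fL);
}
```

    A host loop would launch this once per time step, e.g. hrfv_step<<<(N + 255) / 256, 256>>>(d_n, d_new, N, G, dt, dx), with dt restricted by the CFL condition G * dt <= dx. The per-cell independence of the update is what lets the method map so well onto a GPU.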