
    Quantitative Performance Analysis of the SPEC OMPM2001 Benchmarks


    Optimizations In Compiler: Vectorization, Reordering, Register Allocation And Verification Of Explicitly Parallel Programs

    Compiler optimizations form a very important part of compiler development, as they make the major difference between an average and a great compiler. A compiler consists of various modules, each of which opens opportunities for optimization in a different sphere. In this thesis, a comparative study of vectorization is carried out, exposing the strengths and weaknesses of various contemporary compilers. Additionally, a study of the impact of vectorization on tiled code is performed. Different strategies for loop nest optimization are explored. An algorithm for statement reordering in loops to enhance performance has been developed. An Integer Linear Program formulation is given to improve loop parallelism, making use of loop unrolling and explicitly parallel directives. Finally, an attempt at optimal loop distribution is made. Following the loop nest optimization chapter, an explanation of interprocedural register allocation (IPRA) for ARM32 and AArch64 is given, together with a brief description of the problems in implementing IPRA for those architectures; the chapter concludes with performance results for IPRA on those platforms. In the last chapter, a description of VoPiL, a static OpenMP verifier in LLVM, is presented, along with a brief description of the analysis and its results.
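    A minimal sketch (not code from the thesis) of the statement-reordering transformation that the loop nest optimization work studies: swapping the two statements turns a lexically backward loop-carried dependence into a forward one, which legalizes loop distribution and hence vectorization. The function and array names are illustrative, and the arrays are assumed not to alias.

        #include <cstddef>

        // Before: S1 reads d[i-1], which S2 wrote in the previous iteration,
        // so the dependence S2 -> S1 is loop-carried and lexically backward;
        // this blocks straightforward loop distribution and vectorization.
        void before(float* a, float* d, const float* c, std::size_t n) {
            for (std::size_t i = 1; i < n; ++i) {
                a[i] = d[i - 1] + 1.0f;  // S1
                d[i] = c[i] * 2.0f;      // S2
            }
        }

        // After reordering S2 above S1, the same dependence is lexically
        // forward, so the loop may be distributed into two independent,
        // trivially vectorizable loops.
        void after(float* a, float* d, const float* c, std::size_t n) {
            for (std::size_t i = 1; i < n; ++i) {
                d[i] = c[i] * 2.0f;      // S2
                a[i] = d[i - 1] + 1.0f;  // S1
            }
        }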

    On the Interoperability of Programming Languages based on the Fork-Join Parallelism Model

    This thesis describes the implementation of MetaFork, a meta-language for concurrency platforms targeting multicore architectures. First of all, MetaFork is a multithreaded language based on the fork-join model of concurrency: it allows the programmer to express parallel algorithms assuming that tasks are dynamically scheduled at run-time. While MetaFork makes no assumption about the run-time system, it formally defines the serial C-elision of a MetaFork program. In addition, MetaFork is a suite of source-to-source compilers permitting the automatic translation of multithreaded programs between programming languages based on the fork-join model. Currently, this compilation framework supports the OpenMP and CilkPlus concurrency platforms. The implementation of those compilers explicitly manages parallelism according to the directives specified in MetaFork, OpenMP and CilkPlus. We evaluate the benefits of MetaFork experimentally. First, we show that this framework can be used to perform comparative implementations of a given multithreaded algorithm so as to narrow down performance bottlenecks in one implementation of that algorithm. Secondly, we show that the translation of hand-written and highly optimized code within MetaFork generally produces code with performance similar to the original.
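    As an illustration of the fork-join equivalence the MetaFork translators exploit (this is not MetaFork's actual syntax), the same recursive computation can be written for both supported back ends; erasing the parallel constructs from either version yields the serial C-elision the thesis defines formally. The CilkPlus version requires a Cilk-enabled compiler.

        #include <cilk/cilk.h>  // CilkPlus keywords

        // OpenMP tasking version: task/taskwait express the fork and join.
        long fib_omp(long n) {
            if (n < 2) return n;
            long x, y;
            #pragma omp task shared(x)
            x = fib_omp(n - 1);   // forked child task
            y = fib_omp(n - 2);   // continuation runs in the parent
            #pragma omp taskwait  // join point
            return x + y;
        }

        // CilkPlus version of the same fork-join dag.
        long fib_cilk(long n) {
            if (n < 2) return n;
            long x = cilk_spawn fib_cilk(n - 1);  // fork
            long y = fib_cilk(n - 2);             // continuation
            cilk_sync;                            // join
            return x + y;
        }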

    Barcelona OpenMP tasks suite: a set of benchmarks targeting the exploitation of task parallelism in OpenMP

    Traditional parallel applications have exploited regular parallelism, based on parallel loops. Only a few applications exploit sections parallelism. With the release of the new OpenMP specification (3.0), this programming model supports tasking. Parallel tasks allow the exploitation of irregular parallelism, but there is a lack of benchmarks exploiting tasks in OpenMP. With current (and projected) multicore architectures offering many more ways to execute parallel applications than traditional SMP machines, this kind of parallelism is increasingly important, and so is the need for a set of benchmarks to evaluate it. In this paper, we motivate the need for such a benchmark suite, for irregular and/or recursive task parallelism. We present our proposal, the Barcelona OpenMP Tasks Suite (BOTS), a set of applications exploiting regular and irregular parallelism based on tasks. We present an overall evaluation of the BOTS benchmarks on an Altix system and discuss some of the different experiments that can be done with the different compilation and runtime alternatives of the benchmarks.
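    The kind of irregular, recursive task parallelism BOTS targets can be sketched as follows (illustrative code, not one of the BOTS benchmarks); the depth cutoff mirrors one of the runtime alternatives the suite lets users experiment with.

        #include <cstdio>

        struct Node { long value; Node* left; Node* right; };

        long tree_sum(const Node* n, int depth) {
            if (!n) return 0;
            long l, r;
            if (depth < 8) {  // spawn tasks only near the root to bound overhead
                #pragma omp task shared(l)
                l = tree_sum(n->left, depth + 1);
                r = tree_sum(n->right, depth + 1);
                #pragma omp taskwait
            } else {          // below the cutoff, recurse serially
                l = tree_sum(n->left, depth + 1);
                r = tree_sum(n->right, depth + 1);
            }
            return n->value + l + r;
        }

        int main() {
            Node leaf{1, nullptr, nullptr};
            Node root{1, &leaf, nullptr};
            long total = 0;
            #pragma omp parallel
            #pragma omp single      // one thread seeds the task tree
            total = tree_sum(&root, 0);
            std::printf("%ld\n", total);  // prints 2
        }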

    Multiprocessing for the particle tracking model MODPATH

    Particle tracking has several important applications for solute transport studies in aquifer systems. Travel time distributions at observation points, particle coordinates in time, and streamlines are some practical results providing information about expected transport patterns and interaction with boundary conditions. However, flow model complexity and the simultaneous displacement of multiple particle groups lead to a rapid increase in computational requirements. MODPATH is a particle tracking engine for MODFLOW models, and its source code displays potential for parallel processing of particles. This article addresses the implementation of this feature with OpenMP. Two synthetic aquifer applications are employed for performance tests on a desktop computer with an increasing number of particles. Speedup analysis shows that dynamic thread scheduling is preferable for highly heterogeneous flows, providing processing adaptivity to the presence of slow particles. In simulations writing particle positions over time, thread-exclusive output files lead to higher speedup factors. Results show that above a threshold number of particles, simulation runtimes become independent of flow model grid complexity and are controlled by the large number of particles; beyond that threshold, parallel processing reduces simulation runtimes for MODPATH.
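    MODPATH itself is written in Fortran; the following C++ sketch only illustrates the two patterns the article reports as decisive, dynamic thread scheduling and thread-exclusive output files. Particle and track() are hypothetical stand-ins for the real pathline integration.

        #include <omp.h>
        #include <cstdio>
        #include <string>
        #include <vector>

        struct Particle { double x, y, z; };

        void track(Particle& p) { p.x += 1.0; }  // placeholder for pathline integration

        int main() {
            std::vector<Particle> particles(100000, Particle{0, 0, 0});
            #pragma omp parallel
            {
                // One output file per thread avoids synchronizing on shared I/O.
                std::string name = "pathlines_" + std::to_string(omp_get_thread_num()) + ".txt";
                std::FILE* out = std::fopen(name.c_str(), "w");

                // Dynamic scheduling adapts to slow particles in heterogeneous flow.
                #pragma omp for schedule(dynamic, 64)
                for (long i = 0; i < (long)particles.size(); ++i) {
                    track(particles[i]);
                    std::fprintf(out, "%ld %f %f %f\n", i,
                                 particles[i].x, particles[i].y, particles[i].z);
                }
                std::fclose(out);
            }
        }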

    Guppie: A Coordination Framework for Parallel Processing Using Shared Memory Featuring A Master-Worker Relationship

    Most programs can be parallelized to some extent. The processing power available in computers today makes parallel computing more desirable and attainable than ever before. Many machines today have multiple processors or multiple processing cores, making parallel computing more available locally as well as over a network. In order for parallel applications to be written, they require a computing language, such as C++, and a coordination language (or library), such as Linda. This research involves the creation and implementation of a coordination framework, Guppie, which is easy to use and similar to Linda, but more efficient when dealing with large volumes of messages and data. Greater efficiency can be achieved in coarse-grained parallel computing through the use of shared memory managed through a master-worker relationship.
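    A minimal C++ sketch of the master-worker, shared-memory coordination style described above; Guppie's actual API is not reproduced here, and the Linda-like put()/get() operations below are simplified stand-ins.

        #include <condition_variable>
        #include <mutex>
        #include <queue>
        #include <thread>
        #include <vector>

        class TupleSpace {                 // toy shared "tuple space"
            std::queue<int> tuples_;
            std::mutex m_;
            std::condition_variable cv_;
        public:
            void put(int t) {              // master deposits a task
                { std::lock_guard<std::mutex> lk(m_); tuples_.push(t); }
                cv_.notify_one();
            }
            int get() {                    // worker blocks until a task exists
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return !tuples_.empty(); });
                int t = tuples_.front();
                tuples_.pop();
                return t;
            }
        };

        int main() {
            TupleSpace space;
            std::vector<std::thread> workers;
            for (int w = 0; w < 4; ++w)
                workers.emplace_back([&space] {
                    for (int t; (t = space.get()) != -1; ) { /* process task t */ }
                });
            for (int t = 0; t < 100; ++t) space.put(t);  // master produces tasks
            for (int w = 0; w < 4; ++w) space.put(-1);   // one poison pill per worker
            for (auto& th : workers) th.join();
        }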

    OpenMP aware MHP Analysis for Improved Static Data-Race Detection

    Data races, a major source of bugs in concurrent programs, can result in loss of manpower and time as well as data loss due to system failures. OpenMP, the de facto shared memory parallelism framework used in the HPC community, also suffers from data races. To detect race conditions in OpenMP programs and improve turnaround time and/or developer productivity, we present a fast, static, data flow analysis based data race checker in the LLVM compiler framework. Our tool can detect races in the presence or absence of explicit barriers, with implicit or explicit synchronization. In addition, our tool works effectively for the OpenMP target offloading constructs and also supports the frequently used OpenMP constructs. We formalize and provide a data flow analysis framework to perform Phase Interval Analysis (PIA) of OpenMP programs. Phase intervals are then used to compute the MHP (and its complement NHP) sets for the programs, which, in turn, are used to detect data races statically. We evaluate our work using multiple OpenMP race detection benchmarks and real-world applications. Our experiments show that the checker is comparable to the state of the art in various performance metrics, with around 90% accuracy, almost perfect recall, and significantly lower runtime and memory footprint. © 2021 IEEE
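    An illustrative input (not taken from the paper) for the property that phase interval analysis decides: two accesses may race only if no barrier separates their phases, i.e. their phase intervals overlap.

        #include <omp.h>

        int a = 0, b = 0;

        void example() {
            #pragma omp parallel num_threads(2)
            {
                int tid = omp_get_thread_num();
                // Phase 0: the two writes may happen in parallel -> data race.
                if (tid == 0) a = 1;
                if (tid == 1) a = 2;

                #pragma omp barrier  // ends phase 0, begins phase 1

                // The barrier gives this read a later phase interval than the
                // writes above, so MHP analysis proves the pair race-free.
                if (tid == 0) b = a;
            }
        }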
