
    Quantitative Performance Analysis of the SPEC OMPM2001 Benchmarks


    Optimizations In Compiler: Vectorization, Reordering, Register Allocation And Verification Of Explicitly Parallel Programs

    Compiler optimizations form a very important part of compiler development, as they make the major difference between an average and a great compiler. A compiler consists of various modules, each of which opens opportunities for optimization in a different sphere. In this thesis, a comparative study of vectorization is carried out, exposing the strengths and weaknesses of various contemporary compilers. Additionally, a study of the impact of vectorization on tiled code is performed. Different strategies for loop nest optimization are explored. An algorithm for statement reordering in loops to enhance performance has been developed. An Integer Linear Program formulation is given to improve loop parallelism, making use of loop unrolling and explicitly parallel directives. Finally, an attempt at optimal loop distribution is made. Following the loop nest optimization chapter, an explanation of interprocedural register allocation (IPRA) for ARM32 and AArch64 is given, together with a brief description of the problems in implementing IPRA for those architectures; the chapter concludes with performance results for IPRA on those platforms. In the last chapter, a description of VoPiL, a static OpenMP verifier in LLVM, is presented, along with a brief description of the analysis and its results.
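    A minimal sketch (not code from the thesis) of the statement-reordering transformation that the loop nest optimization work studies: swapping the two statements turns a lexically backward loop-carried dependence into a forward one, which legalizes loop distribution and hence vectorization. The function and array names are illustrative, and the arrays are assumed not to alias.

        #include <cstddef>

        // Before: S1 reads d[i-1], which S2 wrote in the previous iteration,
        // so the dependence S2 -> S1 is loop-carried and lexically backward;
        // this blocks straightforward loop distribution and vectorization.
        void before(float* a, float* d, const float* c, std::size_t n) {
            for (std::size_t i = 1; i < n; ++i) {
                a[i] = d[i - 1] + 1.0f;  // S1
                d[i] = c[i] * 2.0f;      // S2
            }
        }

        // After reordering S2 above S1, the same dependence is lexically
        // forward, so the loop may be distributed into two independent,
        // trivially vectorizable loops.
        void after(float* a, float* d, const float* c, std::size_t n) {
            for (std::size_t i = 1; i < n; ++i) {
                d[i] = c[i] * 2.0f;      // S2
                a[i] = d[i - 1] + 1.0f;  // S1
            }
        }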

    On the Interoperability of Programming Languages based on the Fork-Join Parallelism Model

    This thesis describes the implementation of MetaFork, a meta-language for concurrency platforms targeting multicore architectures. First of all, MetaFork is a multithreaded language based on the fork-join model of concurrency: it allows the programmer to express parallel algorithms assuming that tasks are dynamically scheduled at run-time. While MetaFork makes no assumption about the run-time system, it formally defines the serial C-elision of a MetaFork program. In addition, MetaFork is a suite of source-to-source compilers permitting the automatic translation of multithreaded programs between programming languages based on the fork-join model. Currently, this compilation framework supports the OpenMP and CilkPlus concurrency platforms. The implementation of those compilers explicitly manages parallelism according to the directives specified in MetaFork, OpenMP and CilkPlus. We evaluate the benefits of MetaFork experimentally. First, we show that this framework can be used to perform comparative implementations of a given multithreaded algorithm so as to narrow down performance bottlenecks in one implementation of that algorithm. Secondly, we show that the translation of hand-written and highly optimized code within MetaFork generally produces code with performance similar to the original.
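    As an illustration of the fork-join equivalence the MetaFork translators exploit (this is not MetaFork's actual syntax), the same recursive computation can be written for both supported back ends; erasing the parallel constructs from either version yields the serial C-elision the thesis defines formally. The CilkPlus version requires a Cilk-enabled compiler.

        #include <cilk/cilk.h>  // CilkPlus keywords

        // OpenMP tasking version: task/taskwait express the fork and join.
        long fib_omp(long n) {
            if (n < 2) return n;
            long x, y;
            #pragma omp task shared(x)
            x = fib_omp(n - 1);   // forked child task
            y = fib_omp(n - 2);   // continuation runs in the parent
            #pragma omp taskwait  // join point
            return x + y;
        }

        // CilkPlus version of the same fork-join dag.
        long fib_cilk(long n) {
            if (n < 2) return n;
            long x = cilk_spawn fib_cilk(n - 1);  // fork
            long y = fib_cilk(n - 2);             // continuation
            cilk_sync;                            // join
            return x + y;
        }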

    Barcelona OpenMP tasks suite: a set of benchmarks targeting the exploitation of task parallelism in OpenMP

    Traditional parallel applications have exploited regular parallelism, based on parallel loops. Only a few applications exploit sections parallelism. With the release of the new OpenMP specification (3.0), this programming model supports tasking. Parallel tasks allow the exploitation of irregular parallelism, but there is a lack of benchmarks exploiting tasks in OpenMP. With current (and projected) multicore architectures offering many more ways to execute parallel applications than traditional SMP machines, this kind of parallelism is increasingly important, and so is the need for a set of benchmarks to evaluate it. In this paper, we motivate the need for such a benchmark suite, for irregular and/or recursive task parallelism. We present our proposal, the Barcelona OpenMP Tasks Suite (BOTS), a set of applications exploiting regular and irregular parallelism based on tasks. We present an overall evaluation of the BOTS benchmarks on an Altix system and discuss some of the different experiments that can be done with the different compilation and runtime alternatives of the benchmarks.
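    The kind of irregular, recursive task parallelism BOTS targets can be sketched as follows (illustrative code, not one of the BOTS benchmarks); the depth cutoff mirrors one of the runtime alternatives the suite lets users experiment with.

        #include <cstdio>

        struct Node { long value; Node* left; Node* right; };

        long tree_sum(const Node* n, int depth) {
            if (!n) return 0;
            long l, r;
            if (depth < 8) {  // spawn tasks only near the root to bound overhead
                #pragma omp task shared(l)
                l = tree_sum(n->left, depth + 1);
                r = tree_sum(n->right, depth + 1);
                #pragma omp taskwait
            } else {          // below the cutoff, recurse serially
                l = tree_sum(n->left, depth + 1);
                r = tree_sum(n->right, depth + 1);
            }
            return n->value + l + r;
        }

        int main() {
            Node leaf{1, nullptr, nullptr};
            Node root{1, &leaf, nullptr};
            long total = 0;
            #pragma omp parallel
            #pragma omp single      // one thread seeds the task tree
            total = tree_sum(&root, 0);
            std::printf("%ld\n", total);  // prints 2
        }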

    Multiprocessing for the particle tracking model MODPATH

    Particle tracking has several important applications for solute transport studies in aquifer systems. Travel time distributions at observation points, particle coordinates in time, and streamlines are some practical results providing information about expected transport patterns and interaction with boundary conditions. However, flow model complexity and the simultaneous displacement of multiple particle groups lead to a rapid increase in computational requirements. MODPATH is a particle tracking engine for MODFLOW models, and its source code displays potential for parallel processing of particles. This article addresses the implementation of this feature with OpenMP. Two synthetic aquifer applications are employed for performance tests on a desktop computer with an increasing number of particles. Speedup analysis shows that dynamic thread scheduling is preferable for highly heterogeneous flows, providing processing adaptivity to the presence of slow particles. In simulations writing particle positions over time, thread-exclusive output files lead to higher speedup factors. Results show that above a threshold number of particles, simulation runtimes become independent of flow model grid complexity and are controlled by the large number of particles; beyond that threshold, parallel processing reduces simulation runtimes for MODPATH.
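    MODPATH itself is written in Fortran; the following C++ sketch only illustrates the two patterns the article reports as decisive, dynamic thread scheduling and thread-exclusive output files. Particle and track() are hypothetical stand-ins for the real pathline integration.

        #include <omp.h>
        #include <cstdio>
        #include <string>
        #include <vector>

        struct Particle { double x, y, z; };

        void track(Particle& p) { p.x += 1.0; }  // placeholder for pathline integration

        int main() {
            std::vector<Particle> particles(100000, Particle{0, 0, 0});
            #pragma omp parallel
            {
                // One output file per thread avoids synchronizing on shared I/O.
                std::string name = "pathlines_" + std::to_string(omp_get_thread_num()) + ".txt";
                std::FILE* out = std::fopen(name.c_str(), "w");

                // Dynamic scheduling adapts to slow particles in heterogeneous flow.
                #pragma omp for schedule(dynamic, 64)
                for (long i = 0; i < (long)particles.size(); ++i) {
                    track(particles[i]);
                    std::fprintf(out, "%ld %f %f %f\n", i,
                                 particles[i].x, particles[i].y, particles[i].z);
                }
                std::fclose(out);
            }
        }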

    Guppie: A Coordination Framework for Parallel Processing Using Shared Memory Featuring A Master-Worker Relationship

    Most programs can be parallelized to some extent. The processing power available in computers today makes parallel computing more desirable and attainable than ever before. Many machines today have multiple processors or multiple processing cores, making parallel computing more available locally as well as over a network. In order for parallel applications to be written, they require a computing language, such as C++, and a coordination language (or library), such as Linda. This research involves the creation and implementation of a coordination framework, Guppie, which is easy to use and similar to Linda, but more efficient when dealing with large volumes of messages and data. Greater efficiency can be achieved in coarse-grained parallel computing through the use of shared memory managed through a master-worker relationship.
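    A minimal C++ sketch of the master-worker, shared-memory coordination style described above; Guppie's actual API is not reproduced here, and the Linda-like put()/get() operations below are simplified stand-ins.

        #include <condition_variable>
        #include <mutex>
        #include <queue>
        #include <thread>
        #include <vector>

        class TupleSpace {                 // toy shared "tuple space"
            std::queue<int> tuples_;
            std::mutex m_;
            std::condition_variable cv_;
        public:
            void put(int t) {              // master deposits a task
                { std::lock_guard<std::mutex> lk(m_); tuples_.push(t); }
                cv_.notify_one();
            }
            int get() {                    // worker blocks until a task exists
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return !tuples_.empty(); });
                int t = tuples_.front();
                tuples_.pop();
                return t;
            }
        };

        int main() {
            TupleSpace space;
            std::vector<std::thread> workers;
            for (int w = 0; w < 4; ++w)
                workers.emplace_back([&space] {
                    for (int t; (t = space.get()) != -1; ) { /* process task t */ }
                });
            for (int t = 0; t < 100; ++t) space.put(t);  // master produces tasks
            for (int w = 0; w < 4; ++w) space.put(-1);   // one poison pill per worker
            for (auto& th : workers) th.join();
        }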

    OpenMP aware MHP Analysis for Improved Static Data-Race Detection

    Data races, a major source of bugs in concurrent programs, can result in loss of manpower and time as well as data loss due to system failures. OpenMP, the de facto shared memory parallelism framework used in the HPC community, also suffers from data races. To detect race conditions in OpenMP programs and improve turnaround time and/or developer productivity, we present a fast, static, data flow analysis based data race checker in the LLVM compiler framework. Our tool can detect races in the presence or absence of explicit barriers, with implicit or explicit synchronization. In addition, our tool works effectively for the OpenMP target offloading constructs and also supports the frequently used OpenMP constructs. We formalize and provide a data flow analysis framework to perform Phase Interval Analysis (PIA) of OpenMP programs. Phase intervals are then used to compute the MHP (and its complement NHP) sets for the programs, which, in turn, are used to detect data races statically. We evaluate our work using multiple OpenMP race detection benchmarks and real-world applications. Our experiments show that the checker is comparable to the state of the art in various performance metrics, with around 90% accuracy, almost perfect recall, and significantly lower runtime and memory footprint. © 2021 IEEE
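    An illustrative input (not taken from the paper) for the property that phase interval analysis decides: two accesses may race only if no barrier separates their phases, i.e. their phase intervals overlap.

        #include <omp.h>

        int a = 0, b = 0;

        void example() {
            #pragma omp parallel num_threads(2)
            {
                int tid = omp_get_thread_num();
                // Phase 0: the two writes may happen in parallel -> data race.
                if (tid == 0) a = 1;
                if (tid == 1) a = 2;

                #pragma omp barrier  // ends phase 0, begins phase 1

                // The barrier gives this read a later phase interval than the
                // writes above, so MHP analysis proves the pair race-free.
                if (tid == 0) b = a;
            }
        }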
