
    Review of Elements of Parallel Computing

    As the title clearly states, this book is about parallel computing. Modern computers are no longer characterized by a single, fully sequential CPU. Instead, they have one or more multicore/manycore processors. The purpose of such parallel architectures is to enable the simultaneous execution of instructions, in order to achieve faster computations. In high performance computing, clusters of parallel processors are used to achieve PFLOPS performance, which is necessary for scientific and Big Data applications. Mastering parallel computing means having deep knowledge of parallel architectures, parallel programming models, parallel algorithms, parallel design patterns, and performance analysis and optimization techniques. The design of parallel programs requires a lot of creativity, because there is no universal recipe that allows one to achieve the best possible efficiency for any problem. The book presents the fundamental concepts of parallel computing from the point of view of the algorithmic and implementation patterns. The idea is that, while the hardware keeps changing, the same principles of parallel computing are reused. The book surveys some key algorithmic structures and programming models, together with an abstract representation of the underlying hardware. Parallel programming patterns are purposely not illustrated using the formal design patterns approach, to keep an informal and friendly presentation that is suited to novices

    High-Level Programming for Medical Imaging on Multi-GPU Systems Using the SkelCL Library

    Application development for modern high-performance systems with Graphics Processing Units (GPUs) relies on low-level programming approaches like CUDA and OpenCL, which leads to complex, lengthy and error-prone programs. In this paper, we present SkelCL – a high-level programming model for systems with multiple GPUs and its implementation as a library on top of OpenCL. SkelCL provides three main enhancements to the OpenCL standard: 1) computations are conveniently expressed using parallel patterns (skeletons); 2) memory management is simplified using parallel container data types; 3) an automatic data (re)distribution mechanism allows for scalability when using multi-GPU systems. We use a real-world example from the field of medical imaging to motivate the design of our programming model and we show how application development using SkelCL is simplified without sacrificing performance: we were able to reduce the code size in our imaging example application by 50% while introducing only a moderate runtime overhead of less than 5%
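    As a rough illustration of what expressing a computation through skeletons can look like, the Python sketch below composes a zip and a reduce skeleton into a dot product. It is not SkelCL's API (SkelCL is a library on top of OpenCL); the names `map_skeleton`, `zip_skeleton`, `reduce_skeleton` and `dot_product` are hypothetical, and a thread pool stands in for the GPU devices.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def map_skeleton(func, data, workers=4):
    """Map pattern: apply func to every element independently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, data))


def zip_skeleton(func, left, right, workers=4):
    """Zip pattern: combine two sequences element by element."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, left, right))


def reduce_skeleton(func, data, initial):
    """Reduce pattern: fold all elements into one value (sequential here)."""
    return reduce(func, data, initial)


def dot_product(a, b):
    """The user supplies only the element-wise operators, not the loops."""
    products = zip_skeleton(operator.mul, a, b)
    return reduce_skeleton(operator.add, products, 0)
```

    The point of the sketch is the division of labour: the application code names the pattern and supplies a small function, while the pattern implementation decides how the work is distributed.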

    The Parallelism Motifs of Genomic Data Analysis

    Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing

    Finding parallel patterns through static analysis in C++ applications

    Since the 'free lunch' of processor performance is over, parallelism has become the new trend in hardware and architecture design. However, parallel resources deployed in data centers are underused in many cases, given that sequential programming is still deeply rooted in current software development. To address this problem, new methodologies and techniques for parallel programming have been progressively developed. For instance, parallel frameworks, offering programming patterns, allow expressing concurrency in applications to better exploit parallel hardware. Nevertheless, a large portion of production software, from a broad range of scientific and industrial areas, is still developed sequentially. Considering that these software modules contain thousands, or even millions, of lines of code, an extremely large amount of effort is needed to identify parallel regions. To pave the way in this area, this paper presents Parallel Pattern Analyzer Tool, a software component that aids the discovery and annotation of parallel patterns in source code. This tool simplifies the transformation of sequential source code to parallel. Specifically, we provide support for identifying map, farm, and pipeline parallel patterns and evaluate the quality of the detection for a set of different C++ applications. This work was partially supported by the EU Projects ICT 644235 “RePhrase: Refactoring Parallel Heterogeneous Resource-Aware Applications” and the FP7 609666 “Repara: Reengineering and Enabling Performance and Power of Application
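    To make two of the patterns named in this abstract concrete, here is a minimal Python sketch of a farm and a pipeline. It does not reproduce the tool or its annotations; `farm`, `pipeline` and the sentinel `_DONE` are hypothetical names chosen for illustration, and threads are assumed as the unit of parallelism. A map has the same shape as the farm, applied to a fixed collection instead of a stream.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # sentinel marking the end of the stream


def farm(worker, items, n_workers=4):
    """Farm: replicate one worker over a stream of independent items."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(worker, items))


def pipeline(stages, items):
    """Pipeline: one thread per stage, neighbouring stages linked by queues."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]

    def run_stage(stage, inbox, outbox):
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)  # pass the end marker downstream
                return
            outbox.put(stage(item))

    threads = [
        threading.Thread(target=run_stage, args=(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for thread in threads:
        thread.start()

    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)

    for thread in threads:
        thread.join()
    return results
```

    Because every stage is a single thread reading from a FIFO queue, the pipeline preserves the order of its input; different items can still be in different stages at the same time.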

    AllScale API

    Effectively implementing scientific algorithms in distributed memory parallel applications is a difficult task for domain scientists, as evidenced by the large number of domain-specific languages and libraries available today that attempt to facilitate the process. However, they usually provide a closed set of parallel patterns and are not open for extension without vast modifications to the underlying system. In this work, we present the AllScale API, a programming interface for developing distributed memory parallel applications with the ease of shared memory programming models. The AllScale API is closed for modification but open for extension, allowing new user-defined parallel patterns and data structures to be implemented based on existing core primitives and therefore fully supported in the AllScale framework. Focusing on high-level functionality directly offered to application developers, we present the advantages of such an API design, detail some of its specifications and evaluate it using three real-world use cases. Our results show that AllScale decreases the complexity of implementing scientific applications for distributed memory while attaining comparable or higher performance compared to MPI reference implementations.

    UPIR: Toward the Design of Unified Parallel Intermediate Representation for Parallel Programming Models

    The complexity of heterogeneous computing architectures, as well as the demand for productive and portable parallel application development, have driven the evolution of parallel programming models to become more comprehensive and complex than before. Enhancing the conventional compilation technologies and software infrastructure to be parallelism-aware has become one of the main goals of recent compiler development. In this paper, we propose the design of unified parallel intermediate representation (UPIR) for multiple parallel programming models and for enabling unified compiler transformation for the models. UPIR specifies three commonly used parallelism patterns (SPMD, data and task parallelism), data attributes and explicit data movement and memory management, and synchronization operations used in parallel programming. We demonstrate UPIR via a prototype implementation in the ROSE compiler for unifying IR for both OpenMP and OpenACC and in both C/C++ and Fortran, for unifying the transformation that lowers both OpenMP and OpenACC code to LLVM runtime, and for exporting UPIR to LLVM MLIR dialect.

    Parallel Programming of General-Purpose Programs Using Task-Based Programming Models

    The prevalence of multicore processors is bound to drive most kinds of software development towards parallel programming. To limit the difficulty and overhead of parallel software design and maintenance, it is crucial that parallel programming models allow an easy-to-understand, concise and dense representation of parallelism. Parallel programming models such as Cilk++ and Intel TBB attempt to offer a better, higher-level abstraction for parallel programming than threads and locking synchronization. It is not straightforward, however, to express all patterns of parallelism in these models. Pipelines are an important parallel construct, but they are difficult to express in Cilk and TBB without a verbose restructuring of the code. In this paper we demonstrate that pipeline parallelism can be easily and concisely expressed in a Cilk-like language, which we extend with input, output and input/output dependency types on procedure arguments, enforced at runtime by the scheduler. We evaluate our implementation on real applications and show that our Cilk-like scheduler, extended to track and enforce these dependencies, has performance comparable to Cilk++.
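    The idea of declaring input, output and input/output dependencies on task arguments and letting the scheduler enforce them can be sketched in a few lines of Python. This is a toy under stated assumptions, not the paper's language or scheduler: the class `Scheduler`, its `spawn` and `finish` methods, the string keys used to name data, and the `run_demo` helper are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, wait


class Scheduler:
    """Toy scheduler: tasks declare the data they read and write by name."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._last_writer = {}  # name -> future of the last task writing it
        self._readers = {}      # name -> futures reading it since that write

    def spawn(self, func, ins=(), outs=(), inouts=()):
        # Must be called from a single thread, in program order.
        reads = set(ins) | set(inouts)
        writes = set(outs) | set(inouts)
        deps = [self._last_writer[n] for n in reads | writes
                if n in self._last_writer]          # read/write after write
        for name in writes:
            deps.extend(self._readers.get(name, []))  # write after read

        def run():
            if deps:
                wait(deps)  # block until every earlier conflicting task is done
            return func()

        future = self._pool.submit(run)
        for name in writes:
            self._last_writer[name] = future
            self._readers[name] = []
        for name in reads - writes:
            self._readers.setdefault(name, []).append(future)
        return future

    def finish(self):
        self._pool.shutdown(wait=True)


def run_demo(n=5):
    """Three-stage pipeline over n items, written as a plain loop of spawns."""
    store, out = {}, []
    sched = Scheduler()
    for i in range(n):
        key = f"item{i}"
        sched.spawn(lambda k=key, v=i: store.__setitem__(k, v), outs=[key])
        sched.spawn(lambda k=key: store.__setitem__(k, store[k] * 10), inouts=[key])
        # The shared "out" dependency serialises the last stage in order.
        sched.spawn(lambda k=key: out.append(store[k]), ins=[key], inouts=["out"])
    sched.finish()
    return out
```

    In `run_demo` the loop body reads like sequential code; the pipeline structure emerges from the declared dependencies, since stages of different items touch different keys and may overlap, while the final stage is kept in order by its input/output dependency on `"out"`.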

    Refining SCJ Mission Specifications into Parallel Handler Designs

    Safety-Critical Java (SCJ) is a recent technology that restricts the execution and memory model of Java in such a way that applications can be statically analysed and certified for their real-time properties and safe use of memory. Our interest is in the development of comprehensive and sound techniques for the formal specification, refinement, design, and implementation of SCJ programs, using a correct-by-construction approach. As part of this work, we present here an account of laws and patterns that are of general use for the refinement of SCJ mission specifications into designs of parallel handlers used in the SCJ programming paradigm. Our notation is a combination of languages from the Circus family, supporting state-rich reactive models with the addition of class objects and real-time properties. Our work is a first step to elicit laws of programming for SCJ and fits into a refinement strategy that we have developed previously to derive SCJ programs. Comment: In Proceedings Refine 2013, arXiv:1305.563