    Programmability of the HPCS Languages: A Case Study with a Quantum Chemistry Kernel (Extended Version)

    Using the High Productivity Language Chapel to Target GPGPU Architectures

    It has been widely shown that GPGPU architectures offer large performance gains compared to their traditional CPU counterparts for many applications. The downside to these architectures is that the current programming models present numerous challenges to the programmer: lower-level languages, explicit data movement, loss of portability, and challenges in performance optimization. In this paper, we present novel methods and compiler transformations that increase productivity by enabling users to easily program GPGPU architectures using the high productivity programming language Chapel. Rather than resorting to different parallel libraries or annotations for a given parallel platform, we leverage a language that has been designed from first principles to address the challenge of programming for parallelism and locality. This also has the advantage of being portable across distinct classes of parallel architectures, including desktop multicores, distributed memory clusters, large-scale shared memory, and now CPU-GPU hybrids. We present experimental results from the Parboil benchmark suite which demonstrate that codes written in Chapel achieve performance comparable to the original versions implemented in CUDA.
    NSF CCF 0702260; Cray Inc. Cray-SRA-2010-01696; 2010-2011 NVIDIA Research Fellowship. Unpublished; not peer reviewed.
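
    As a rough illustration of the programming model the paper targets, consider the hedged Chapel sketch below (not the authors' code; the array names and sizes are made up). The point is that an ordinary data-parallel forall loop carries enough information for a compiler to generate a GPU kernel from it; note that current Chapel releases instead expose GPUs explicitly through here.gpus, a newer mechanism than the one the paper describes.

```chapel
// Data-parallel Chapel loop: the iterations are independent, so the
// compiler may run them across CPU cores or, with the transformations
// the paper describes, lower them to a GPGPU kernel.
config const n = 1000000;

var A, B, C: [1..n] real;

// Whole-array assignments are themselves data-parallel in Chapel.
B = 1.0;
C = 2.0;

// Element-wise vector addition, a typical kernel candidate.
forall i in 1..n do
  A[i] = B[i] + C[i];

writeln("A[1] = ", A[1]);
```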

    A New Parallel Programming Language Fortress: Features and Applications

    Thesis (M.Sc.) -- İstanbul Technical University, Institute of Informatics, 2009. Computer systems are growing very rapidly. DARPA foresaw the need for, and the feasibility of, a peta-scale computer system by the year 2010, and in 2003 it launched a project with several companies. Now, with the project nearing completion and millions of dollars invested in it, its outcome has been three high-performance, high-productivity programming languages. One of these languages is Fortress. Fortress is a strongly typed, block-structured, inherently parallel programming language based on mathematical notation. What makes Fortress interesting is its high-productivity, science-oriented design. In this study, the internal dynamics of Fortress are examined; various tests were carried out to measure its performance, and the results are discussed.

    DART-MPI: An MPI-based Implementation of a PGAS Runtime System

    A Partitioned Global Address Space (PGAS) approach treats a distributed system as if its memory were shared at a global level. Given such a global view of memory, the user can program applications much as they would for shared-memory systems. This greatly simplifies the task of developing parallel applications, because no explicit communication has to be specified in the program for data exchange between different computing nodes. In this paper we present DART, a runtime environment which implements the PGAS paradigm on large-scale high-performance computing clusters. A specific feature of our implementation is the use of the one-sided communication of the Message Passing Interface (MPI) version 3 (i.e., MPI-3) as the underlying communication substrate. We evaluated the performance of the implementation with several low-level kernels in order to determine overheads and limitations in comparison to the underlying MPI-3.
    Comment: 11 pages, International Conference on Partitioned Global Address Space Programming Models (PGAS14).
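
    DART itself exposes a C API layered on MPI-3 one-sided operations; as a hedged sketch of the global-view style of programming that such a runtime enables, the Chapel fragment below reads and writes a block-distributed array with no explicit message passing (illustrative only, using classic dmapped syntax; this is not DART's API).

```chapel
// Global-view PGAS programming: one logical array spans all locales
// (nodes), and remote elements are accessed directly, with the runtime
// issuing the one-sided transfers under the hood.
use BlockDist;

const Space = {1..100};
const D = Space dmapped Block(boundingBox=Space);  // partitioned across locales
var A: [D] int;

// Every element is assigned in parallel, wherever it happens to live.
forall i in D do
  A[i] = i;

// A task on locale 0 may still touch any element; accesses to remote
// partitions turn into implicit one-sided gets and puts.
writeln("A[1] = ", A[1], ", A[100] = ", A[100]);
```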

    An Incremental Parallel PGAS-based Tree Search Algorithm

    In this work, we show that the Chapel high-productivity language is suitable for the design and implementation of all aspects involved in the conception of parallel tree search algorithms for solving combinatorial problems. Initially, it is possible to hand-optimize the data structures involved in the search process in a way equivalent to C. As a consequence, the single-threaded search in Chapel is on average only 7% slower than its counterpart written in C. Whereas programming a multicore tree search in Chapel is equivalent to C-OpenMP in terms of performance and programmability, its productivity-aware features for distributed programming stand out. It is possible to incrementally conceive a distributed tree search algorithm starting from its multicore counterpart by adding a few lines of code. The distributed implementation performs load balancing among different computer nodes and also exploits all CPU cores of the system. Chapel presents an interesting trade-off between programmability and performance despite the high level of its features. The distributed tree search in Chapel is on average 16% slower than, and reaches up to 80% of the scalability achieved by, its C-MPI+OpenMP counterpart.
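
    A hedged sketch of the incremental step the abstract describes (not the authors' code; Node, explore, and the pool are hypothetical placeholders): the multicore loop becomes distributed mainly by wrapping it in a coforall over the locales.

```chapel
// Hypothetical skeleton of an incremental Chapel tree search.
record Node { var depth: int; }

proc explore(n: Node) { /* expand n, prune, enqueue children ... */ }

var pool: [1..1000] Node;   // initial pool of subproblems

// Multicore version: one forall over the pool uses all local cores.
forall n in pool do
  explore(n);

// Distributed version: a few extra lines fan the same work out to every
// locale (compute node), each of which still uses all of its cores. In
// the real algorithm each locale would take its own slice of the pool.
coforall loc in Locales do on loc {
  forall n in pool do
    explore(n);
}
```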

    User-Defined Data Distributions in High-Level Programming Languages

    One of the characteristic features of today’s high performance computing systems is a physically distributed memory. Efficient management of locality is essential for meeting key performance requirements for these architectures. The standard technique for dealing with this issue has involved the extension of traditional sequential programming languages with explicit message passing, in the context of a processor-centric view of parallel computation. This has resulted in complex and error-prone assembly-style codes in which algorithms and communication are inextricably interwoven. This paper presents a high-level approach to the design and implementation of data distributions. Our work is motivated by the need to improve the current parallel programming methodology by introducing a paradigm supporting the development of efficient and reusable parallel code. This approach is currently being implemented in the context of a new programming language called Chapel, which is being designed within the HPCS project Cascade.
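
    In Chapel this idea surfaces as domain maps: the distribution is named once, on the domain, and the loops and array code are untouched. The sketch below uses the standard Block and Cyclic distributions as stand-ins for a user-defined one (classic dmapped syntax; illustrative only, not code from the paper).

```chapel
// The distribution is a property of the domain, not of the loops that
// use it; swapping Block for Cyclic (or for a user-defined domain map)
// leaves the rest of the code unchanged.
use BlockDist, CyclicDist;

const Space = {1..8, 1..8};

// Block-distributed 2D domain; change this one line to redistribute.
const D = Space dmapped Block(boundingBox=Space);
// const D = Space dmapped Cyclic(startIdx=Space.low);  // alternative

var A: [D] real;

forall (i, j) in D do
  A[i, j] = i + j/10.0;   // each iteration runs where its data lives
```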

    Partitioned Global Address Space Languages

    The Partitioned Global Address Space (PGAS) model is a parallel programming model that aims to improve programmer productivity while at the same time aiming for high performance. The main premise of PGAS is that a globally shared address space improves productivity, but that a distinction between local and remote data accesses is required to allow performance optimizations and to support scalability on large-scale parallel architectures. To this end, PGAS preserves the global address space while embracing awareness of non-uniform communication costs. Today, about a dozen languages exist that adhere to the PGAS model. This survey proposes a definition and a taxonomy along four axes: how parallelism is introduced, how the address space is partitioned, how data is distributed among the partitions, and finally how data is accessed across partitions. Our taxonomy reveals that today's PGAS languages focus on distributing regular data and distinguish only between local and remote data-access costs, whereas the distribution of irregular data and the adoption of richer data-access cost models remain open challenges.
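
    The local/remote distinction at the heart of this taxonomy can be made concrete in a few lines of Chapel, one of the surveyed PGAS languages (an illustrative sketch, not drawn from the survey):

```chapel
// PGAS in miniature: a single partitioned global array, plus the
// ability to ask which partition (locale) owns a given element.
use BlockDist;

const Space = {1..16};
const D = Space dmapped Block(boundingBox=Space);
var A: [D] int;

on Locales[0] {
  A[1] += 1;    // owned by locale 0: a local access
  A[16] += 1;   // owned by the last locale: remote on a multi-locale run
  writeln("A[16] lives on locale ", A[16].locale.id);
}
```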

    Survey of Novel Programming Models for Parallelizing Applications at Exascale

    HPCML: A Modeling Language Dedicated to High-Performance Scientific Computing

    Tremendous computational resources are required to compute complex physical simulations. Unfortunately, computers able to provide such computational power are difficult to program, especially since the rise of heterogeneous hardware architectures. This makes it particularly challenging to exploit supercomputer resources efficiently and sustainably. We think that model-driven engineering can help us tame the complexity of high-performance scientific computing software development by separating the different concerns, such as mathematics, parallelism, or validation. The principles of our approach, named MDE4HPC, stem from this idea. In this paper, we describe the High-Performance Computing Modeling Language (HPCML), a domain-specific modeling language at the center of this approach.