
    A domain-specific high-level programming model

    Nowadays, computing hardware continues to move toward more parallelism and more heterogeneity to obtain more computing power. From personal computers to supercomputers, we can find several levels of parallelism expressed by the interconnections of multi-core and many-core accelerators. On the other hand, computing software needs to adapt to this trend, and programmers can use parallel programming models (PPMs) to fulfil this difficult task. There are different PPMs available that are based on tasks, directives, or low-level languages or libraries. These offer higher or lower abstraction levels from the architecture by handling their own syntax. However, to offer an efficient PPM with a greater (additional) high-level abstraction level without sacrificing performance, one idea is to restrict this to a specific domain and to adapt it to a family of applications. In the present study, we propose a high-level PPM specific to digital signal processing applications. It is based on data-flow graph models of computation, and a dynamic runtime model of execution (StarPU). We show how the user can easily express this digital signal processing application, and can take advantage of task, data and graph parallelism in the implementation, to enhance the performance of targeted heterogeneous clusters composed of CPUs and different accelerators (e.g., GPU, Xeon Phi

    A C++-embedded Domain-Specific Language for programming the MORA soft processor array

    MORA is a novel platform for high-level FPGA programming of streaming vector and matrix operations, aimed at multimedia applications. It consists of a soft array of pipelined low-complexity SIMD processors-in-memory (PIM). We present a Domain-Specific Language (DSL) for high-level programming of the MORA soft processor array. The DSL is embedded in C++, providing designers with a familiar language framework and the ability to compile designs using a standard compiler for functional testing before generating the FPGA bitstream using the MORA toolchain. The paper discusses the MORA-C++ DSL and the compilation route into the assembly for the MORA machine, and provides examples to illustrate the programming model and performance.

    Parallelizing Julia with a Non-Invasive DSL

    Computational scientists often prototype software using productivity languages that offer high-level programming abstractions. When higher performance is needed, they are obliged to rewrite their code in a lower-level efficiency language. Different solutions have been proposed to address this trade-off between productivity and efficiency. One promising approach is to create embedded domain-specific languages that sacrifice generality for productivity and performance, but practical experience with DSLs points to some roadblocks preventing widespread adoption. This paper proposes a non-invasive domain-specific language that makes as few visible changes to the host programming model as possible. We present ParallelAccelerator, a library and compiler for high-level, high-performance scientific computing in Julia. ParallelAccelerator's programming model is aligned with existing Julia programming idioms. Our compiler exposes the implicit parallelism in high-level array-style programs and compiles them to fast, parallel native code. Programs can also run in "library-only" mode, letting users benefit from the full Julia environment and libraries. Our results show encouraging performance improvements with very few changes to source code required. In particular, few to no additional type annotations are necessary.

    Improving programmability and performance for scientific applications

    With modern advancements in hardware and software technology scaling towards new limits, our compute machines are reaching new potential to tackle more challenging problems. While the size and complexity of both the problems and solutions increase, the programming methodologies must remain at a level that can be understood by programmers and scientists alike. In our work, this problem is encountered when developing an optimized framework to best exploit the semantic properties of a finite-element solver. In an effort to address this problem, we explore programming and runtime models which decouple algorithmic complexity, parallelism concerns, and hardware mapping. We build upon these frameworks to exploit domain-specific semantics using high-level transformations and modifications to obtain performance through algorithmic and runtime optimizations. We first discuss optimizations performed on a computational mechanics solver using a novel coupling technique for multi-time scale methods for discrete finite element domains. We exploit domain semantics using a high-level dynamic runtime scheme to reorder and balance workloads to greatly improve runtime performance. The framework presented automatically chooses a near-optimal coupling solution and runs a work-stealing parallel executor to run effectively on multi-core systems. In my later work, I focus on the parallel programming model Concurrent Collections (CnC) to seamlessly bridge the gap between performance and programmability. Because challenging problems in various domains, not limited to computational mechanics, require both domain expertise and programming prowess, there is a need for ways to separate those concerns. This thesis describes methods and techniques to obtain scalable performance using CnC programming while limiting the burden of programming. These high-level techniques are presented for two high-performance applications corresponding to hydrodynamics and multi-grid solvers.

    PiCo: A Domain-Specific Language for Data Analytics Pipelines

    In the world of Big Data analytics, there is a series of tools aiming at simplifying the programming of applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models (for which only informal, and often confusing, semantics is generally provided), all share a common underlying model, namely, the Dataflow model. Using this model as a starting point, it is possible to categorize and analyze almost all aspects of Big Data analytics tools from a high-level perspective. This analysis can be considered as a first step toward a formal model to be exploited in the design of a (new) framework for Big Data analytics. By putting clear separations between all levels of abstraction (i.e., from the runtime to the user API), it is easier for a programmer or software designer to avoid mixing low-level with high-level aspects, as is often seen in state-of-the-art Big Data analytics frameworks. From the user-level perspective, we think that a clearer and simpler semantics is preferable, together with a strong separation of concerns. For this reason, we use the Dataflow model as a starting point to build a programming environment with a simplified programming model implemented as a Domain-Specific Language, which sits on top of a stack of layers that build a prototypical framework for Big Data analytics. The contribution of this thesis is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm, Google Dataflow), thus making it easier to understand high-level data-processing applications written in such frameworks. As a result of this analysis, we provide a layered model that can represent tools and applications following the Dataflow paradigm and we show how the analyzed tools fit in each level.
    Second, we propose a programming environment based on such a layered model in the form of a Domain-Specific Language (DSL) for processing data collections, called PiCo (Pipeline Composition). The main entity of this programming model is the Pipeline, basically a DAG-composition of processing elements. This model is intended to give the user a unique interface for both stream and batch processing, completely hiding data management and focusing only on operations, which are represented by Pipeline stages. Our DSL will be built on top of the FastFlow library, exploiting both shared and distributed parallelism, and implemented in C++11/14 with the aim of porting C++ into the Big Data world.
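    The idea of one interface for both batch and stream processing can be illustrated with a minimal sketch. This is not PiCo's API (PiCo is a C++ DSL built on FastFlow); the `Pipeline` class and its `map`, `filter`, `to` and `run` methods below are hypothetical names chosen for illustration, and the sketch composes stages linearly and sequentially rather than as a general DAG.

```python
from typing import Any, Callable, Iterable, Iterator, List

# A stage consumes an iterator of items and yields transformed items.
Stage = Callable[[Iterator[Any]], Iterator[Any]]


class Pipeline:
    """Hypothetical, immutable chain of processing stages (illustration only)."""

    def __init__(self, stages: Iterable[Stage] = ()) -> None:
        self._stages: List[Stage] = list(stages)

    def map(self, fn: Callable[[Any], Any]) -> "Pipeline":
        def stage(items: Iterator[Any]) -> Iterator[Any]:
            for item in items:
                yield fn(item)
        return Pipeline(self._stages + [stage])

    def filter(self, pred: Callable[[Any], bool]) -> "Pipeline":
        def stage(items: Iterator[Any]) -> Iterator[Any]:
            for item in items:
                if pred(item):
                    yield item
        return Pipeline(self._stages + [stage])

    def to(self, other: "Pipeline") -> "Pipeline":
        # Compose two pipelines: the output of self feeds the input of other.
        return Pipeline(self._stages + other._stages)

    def run(self, source: Iterable[Any]) -> Iterator[Any]:
        # The same code path handles a finite collection (batch) and a
        # lazily produced, possibly unbounded iterator (stream).
        items: Iterator[Any] = iter(source)
        for stage in self._stages:
            items = stage(items)
        return items


# Two small pipelines, composed into one.
long_words = Pipeline().map(str.lower).filter(lambda w: len(w) > 3)
lengths = Pipeline().map(len)
word_lengths = long_words.to(lengths)
```

    Because `run` only relies on iteration, `word_lengths.run(["Data", "in", "FLOW"])` and `word_lengths.run(some_generator())` go through the same stages; the user describes operations, and the data source is supplied separately.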

    A Domain-Specific Language and Editor for Parallel Particle Methods

    Domain-specific languages (DSLs) are of increasing importance in scientific high-performance computing to reduce development costs, raise the level of abstraction and, thus, ease scientific programming. However, designing and implementing DSLs is not an easy task, as it requires knowledge of the application domain and experience in language engineering and compilers. Consequently, many DSLs follow a weak approach using macros or text generators, which lack many of the features that make a DSL comfortable for programmers. Some of these features (e.g., syntax highlighting, type inference, error reporting, and code completion) are easily provided by language workbenches, which combine language engineering techniques and tools in a common ecosystem. In this paper, we present the Parallel Particle-Mesh Environment (PPME), a DSL and development environment for numerical simulations based on particle methods and hybrid particle-mesh methods. PPME uses the Meta Programming System (MPS), a projectional language workbench. PPME is the successor of the Parallel Particle-Mesh Language (PPML), a Fortran-based DSL that used conventional implementation strategies. We analyze and compare both languages and demonstrate how the programmer's experience can be improved using static analyses and projectional editing. Furthermore, we present an explicit domain model for particle abstractions and the first formal type system for particle methods. Comment: Submitted to ACM Transactions on Mathematical Software on Dec. 25, 201

    The software-cycle model for re-engineering and reuse

    This paper reports on the progress of a study which will contribute to our ability to perform high-level, component-based programming by describing means to obtain useful components, methods for the configuration and integration of those components, and an underlying economic model of the costs and benefits associated with this approach to reuse. One goal of the study is to develop and demonstrate methods to recover reusable components from domain-specific software through a combination of tools, to perform the identification, extraction, and re-engineering of components, and domain experts, to direct the applications of those tools. A second goal of the study is to enable the reuse of those components by identifying techniques for configuring and recombining the re-engineered software. This component-recovery or software-cycle model addresses not only the selection and re-engineering of components, but also their recombination into new programs. Once a model of reuse activities has been developed, the quantification of the costs and benefits of various reuse options will enable the development of an adaptable economic model of reuse, which is the principal goal of the overall study. This paper reports on the conception of the software-cycle model and on several supporting techniques of software recovery, measurement, and reuse which will lead to the development of the desired economic model

    Generating Fast Sparse Matrix Vector Multiplication From a High Level Generic Functional IR

    Usage of high-level intermediate representations promises the generation of fast code from a high-level description, improving the productivity of developers while achieving the performance traditionally only reached with low-level programming approaches. High-level IRs come in two flavors: 1) domain-specific IRs designed only for a specific application area; or 2) generic high-level IRs that can be used to generate high-performance code across many domains. Developing generic IRs is more challenging but offers the advantage of reusing a common compiler infrastructure across various applications. In this paper, we extend a generic high-level IR to enable efficient computation with sparse data structures. Crucially, we encode sparse representation using reusable dense building blocks already present in the high-level IR. We use a form of dependent types to model sparse matrices in CSR format by explicitly expressing the relationship between multiple dense arrays that separately store the length of rows, the column indices, and the non-zero values of the matrix. We achieve high performance compared to sparse low-level library code using our extended generic high-level code generator. On an Nvidia GPU, we outperform the highly tuned Nvidia cuSparse implementation of SpMV multiplication across 28 sparse matrices of varying sparsity on average by 1.7×.
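    To make the CSR layout described in this abstract concrete, here is a minimal sequential sketch of sparse matrix-vector multiplication over three dense arrays: row lengths, column indices, and non-zero values. It is an illustration only, not the paper's IR or generated GPU code; the function name `csr_spmv` and its parameter names are assumptions. Note that it stores per-row lengths, as the abstract describes, rather than the cumulative row offsets used by many CSR libraries.

```python
from typing import List, Sequence


def csr_spmv(
    row_lengths: Sequence[int],
    col_indices: Sequence[int],
    values: Sequence[float],
    x: Sequence[float],
) -> List[float]:
    """Compute y = A @ x, where A is given by three dense arrays.

    row_lengths[i] is the number of stored non-zeros in row i;
    col_indices and values hold those non-zeros row by row.
    """
    if len(col_indices) != len(values):
        raise ValueError("col_indices and values must have the same length")
    if sum(row_lengths) != len(values):
        raise ValueError("row_lengths must sum to the number of non-zeros")

    y: List[float] = []
    pos = 0  # start of the current row inside col_indices / values
    for length in row_lengths:
        acc = 0.0
        for k in range(pos, pos + length):
            acc += values[k] * x[col_indices[k]]
        y.append(acc)
        pos += length
    return y


# The 3x3 matrix [[1, 0, 2], [0, 0, 0], [0, 3, 4]] in this layout:
row_lengths = [2, 0, 2]
col_indices = [0, 2, 1, 2]
values = [1.0, 2.0, 3.0, 4.0]
y = csr_spmv(row_lengths, col_indices, values, [1.0, 2.0, 3.0])
# y == [7.0, 0.0, 18.0]
```

    The constraint that the row lengths sum to the number of stored values is the kind of relationship between dense arrays that the abstract says is expressed with dependent types; here it is only checked at run time.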