Search CORE

316,841 research outputs found

A domain-specific high-level programming model

Author: Balarin
Bhattacharya
Blumofe
Board
Boulos
Cameron
Chandra
Grotker
Hiram
Houzet
Kirk
Lee
Munshi
Pacheco
Parhi
Polukhin
Reinders
Sanders
Skillicorn
Valiant
Publication venue: 'Wiley'
Publication date: 22/09/2015
Field of study

International audienceNowadays, computing hardware continues to move toward more parallelism and more heterogeneity, to obtain more computing power. From personal computers to supercomputers, we can find several levels of parallelism expressed by the interconnections of multi-core and many-core accelerators. On the other hand, computing software needs to adapt to this trend, and programmers can use parallel programming models (PPM) to fulfil this difficult task. There are different PPMs available that are based on tasks, directives, or low level languages or library. These offer higher or lower abstraction levels from the architecture by handling their own syntax. However, to offer an efficient PPM with a greater (additional) high-levelabstraction level while saving on performance, one idea is to restrict this to a specific domain and to adapt it to a family of applications. In the present study, we propose a high-level PPM specific to digital signal processing applications. It is based on data-flow graph models of computation, and a dynamic runtime model of execution (StarPU). We show how the user can easily express this digital signal processing application, and can take advantage of task, data and graph parallelism in the implementation, to enhance the performances of targeted heterogeneous clusters composed of CPUs and different accelerators (e.g., GPU, Xeon Phi

Crossref

Hal - Université Grenoble Alpes

A C++-embedded Domain-Specific Language for programming the MORA soft processor array

Author: Chalamalasetti S.R.
Margala M.
Purohit S.
Vanderbauwhede W.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010
Field of study

MORA is a novel platform for high-level FPGA programming of streaming vector and matrix operations, aimed at multimedia applications. It consists of soft array of pipelined low-complexity SIMD processors-in-memory (PIM). We present a Domain-Specific Language (DSL) for high-level programming of the MORA soft processor array. The DSL is embedded in C++, providing designers with a familiar language framework and the ability to compile designs using a standard compiler for functional testing before generating the FPGA bitstream using the MORA toolchain. The paper discusses the MORA-C++ DSL and the compilation route into the assembly for the MORA machine and provides examples to illustrate the programming model and performance

Enlighten

Parallelizing Julia with a Non-Invasive DSL

Author: Anderson Todd A.
Kuper Lindsey
Liu Hai
Shpeisman Tatiana
Totoni Ehsan
Vitek Jan
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st European Conference on Object-Oriented Programming (ECOOP 2017)
Publication date: 01/01/2017
Field of study

Computational scientists often prototype software using productivity languages that offer high-level programming abstractions. When higher performance is needed, they are obliged to rewrite their code in a lower-level efficiency language. Different solutions have been proposed to address this trade-off between productivity and efficiency. One promising approach is to create embedded domain-specific languages that sacrifice generality for productivity and performance, but practical experience with DSLs points to some road blocks preventing widespread adoption. This paper proposes a non-invasive domain-specific language that makes as few visible changes to the host programming model as possible. We present ParallelAccelerator, a library and compiler for high-level, high-performance scientific computing in Julia. ParallelAccelerator\u27s programming model is aligned with existing Julia programming idioms. Our compiler exposes the implicit parallelism in high-level array-style programs and compiles them to fast, parallel native code. Programs can also run in "library-only" mode, letting users benefit from the full Julia environment and libraries. Our results show encouraging performance improvements with very few changes to source code required. In particular, few to no additional type annotations are necessary

Dagstuhl Research Online Publication Server

Improving programmability and performance for scientific applications

Author: Liu Chenyang
Publication venue: 'Purdue University (bepress)'
Publication date: 01/01/2016
Field of study

With modern advancements in hardware and software technology scaling towards new limits, our compute machines are reaching new potentials to tackle more challenging problems. While the size and complexity of both the problems and solutions increases, the programming methodologies must remain at a level that can be understood by programmers and scientists alike. In our work, this problem is encountered when developing an optimized framework to best exploit the semantic properties of a finite-element solver. In efforts to address this problem, we explore programming and runtime models which decouple algorithmic complexity, parallelism concerns, and hardware mapping. We build upon these frameworks to exploit domain-specific semantics using high-level transformations and modifications to obtain performance through algorithmic and runtime optimizations. We first discusses optimizations performed on a computational mechanics solver using a novel coupling technique for multi-time scale methods for discrete finite element domains. We exploit domain semantics using a high-level dynamic runtime scheme to reorder and balance workloads to greatly improve runtime performance. The framework presented automatically chooses a near-optimal coupling solution and runs a work-stealing parallel executor to run effectively on multi-core systems. In my latter work, I focus on the parallel programming model, Concurrent Collections (CnC), to seamlessly bridge the gap between performance and programmability. Because challenging problems in various domains, not limited to computation mechanics, requires both domain expertise and programming prowess, there is a need for ways to separate those concerns. This thesis describes methods and techniques to obtain scalable performance using CnC programming while limiting the burden of programming. These high level techniques are presented for two high-performance applications corresponding to hydrodynamics and multi-grid solvers

Purdue E-Pubs

Recommended from our members

Languages and Compilers for Writing Efficient High-Performance Computing Applications

Author: Jangda Abhinav
Publication venue: ScholarWorks@UMass Amherst
Publication date: 26/10/2022
Field of study

Many everyday applications, such as web search, speech recognition, and weather prediction, are executed on high-performance systems containing thousands of Central Processing Units (CPUs) and Graphics Processing Units (GPUs). These applications can be written in either low-level programming languages, such as NVIDIA CUDA, or domain specific languages, like Halide for image processing and PyTorch for machine learning programs. Despite the popularity of these languages, there are several challenges that programmers face when developing efficient high-performance computing applications. First, since every hardware support a different low-level programming model, to utilize new hardware programmers need to rewrite their applications in another programming language. Second, writing efficient code involves restructuring the computation to ensure (i) regular memory access patterns, (ii) non-divergent control flow, and (iii) complete utilization of different programmer managed caches. Furthermore, since these low-level optimizations are known only to hardware experts, it is difficult for a domain expert to write optimized code for new computations. Third, existing domain specific languages suffer from optimization barriers in the language constructs that prevent new optimizations and hence, these languages provide sub-optimal performance. To address these challenges this thesis presents the following novel abstractions and compiler techniques for writing image processing and machine learning applications that can run efficiently on a variety of high-performance systems. First, this thesis presents techniques to optimize image processing programs on GPUs using the features of modern GPUs. These techniques improve the concurrency and register usage of generated code to provide better performance than the state-of-the-art. Second, this thesis presents NextDoor, which is the first system to provide an abstraction for writing graph sampling applications and efficiently executing these applications on GPUs. Third, this thesis presents CoCoNet, which is a domain specific language to co-optimize communication and computation in distributed machine learning workloads. By breaking the optimization barriers in existing domain specific languages, these techniques help programmers write correct and efficient code for diverse high-performance computing workloads

ScholarWorks@UMass Amherst

PiCo: A Domain-Specific Language for Data Analytics Pipelines

Author: Misale Claudia
Publication venue
Publication date: 01/01/2017
Field of study

In the world of Big Data analytics, there is a series of tools aiming at simplifying programming applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models—for which only informal (and often confusing) semantics is generally provided—all share a common under- lying model, namely, the Dataflow model. Using this model as a starting point, it is possible to categorize and analyze almost all aspects about Big Data analytics tools from a high level perspective. This analysis can be considered as a first step toward a formal model to be exploited in the design of a (new) framework for Big Data analytics. By putting clear separations between all levels of abstraction (i.e., from the runtime to the user API), it is easier for a programmer or software designer to avoid mixing low level with high level aspects, as we are often used to see in state-of-the-art Big Data analytics frameworks. From the user-level perspective, we think that a clearer and simple semantics is preferable, together with a strong separation of concerns. For this reason, we use the Dataflow model as a starting point to build a programming environment with a simplified programming model implemented as a Domain-Specific Language, that is on top of a stack of layers that build a prototypical framework for Big Data analytics. The contribution of this thesis is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm, Google Dataflow), thus making it easier to understand high-level data-processing applications written in such frameworks. As result of this analysis, we provide a layered model that can represent tools and applications following the Dataflow paradigm and we show how the analyzed tools fit in each level. Second, we propose a programming environment based on such layered model in the form of a Domain-Specific Language (DSL) for processing data collections, called PiCo (Pipeline Composition). The main entity of this programming model is the Pipeline, basically a DAG-composition of processing elements. This model is intended to give the user an unique interface for both stream and batch processing, hiding completely data management and focusing only on operations, which are represented by Pipeline stages. Our DSL will be built on top of the FastFlow library, exploiting both shared and distributed parallelism, and implemented in C++11/14 with the aim of porting C++ into the Big Data world

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institutional Research Information System University of Turin

A Domain-Specific Language and Editor for Parallel Particle Methods

Author: Castrillon Jeronimo
Karol Sven
Nett Tobias
Sbalzarini Ivo F.
Publication venue
Publication date: 17/09/2017
Field of study

Domain-specific languages (DSLs) are of increasing importance in scientific high-performance computing to reduce development costs, raise the level of abstraction and, thus, ease scientific programming. However, designing and implementing DSLs is not an easy task, as it requires knowledge of the application domain and experience in language engineering and compilers. Consequently, many DSLs follow a weak approach using macros or text generators, which lack many of the features that make a DSL a comfortable for programmers. Some of these features---e.g., syntax highlighting, type inference, error reporting, and code completion---are easily provided by language workbenches, which combine language engineering techniques and tools in a common ecosystem. In this paper, we present the Parallel Particle-Mesh Environment (PPME), a DSL and development environment for numerical simulations based on particle methods and hybrid particle-mesh methods. PPME uses the meta programming system (MPS), a projectional language workbench. PPME is the successor of the Parallel Particle-Mesh Language (PPML), a Fortran-based DSL that used conventional implementation strategies. We analyze and compare both languages and demonstrate how the programmer's experience can be improved using static analyses and projectional editing. Furthermore, we present an explicit domain model for particle abstractions and the first formal type system for particle methods.Comment: Submitted to ACM Transactions on Mathematical Software on Dec. 25, 201

arXiv.org e-Print Archive

MPG.PuRe

The software-cycle model for re-engineering and reuse

Author: Bailey John W.
Basili Victor R.
Publication venue
Publication date
Field of study

This paper reports on the progress of a study which will contribute to our ability to perform high-level, component-based programming by describing means to obtain useful components, methods for the configuration and integration of those components, and an underlying economic model of the costs and benefits associated with this approach to reuse. One goal of the study is to develop and demonstrate methods to recover reusable components from domain-specific software through a combination of tools, to perform the identification, extraction, and re-engineering of components, and domain experts, to direct the applications of those tools. A second goal of the study is to enable the reuse of those components by identifying techniques for configuring and recombining the re-engineered software. This component-recovery or software-cycle model addresses not only the selection and re-engineering of components, but also their recombination into new programs. Once a model of reuse activities has been developed, the quantification of the costs and benefits of various reuse options will enable the development of an adaptable economic model of reuse, which is the principal goal of the overall study. This paper reports on the conception of the software-cycle model and on several supporting techniques of software recovery, measurement, and reuse which will lead to the development of the desired economic model

NASA Technical Reports Server

Generating Fast Sparse Matrix Vector Multiplication From a High Level Generic Functional IR

Author: Dubach Christophe
Pizzuti Federico
Steuwer Michel
Publication venue
Publication date: 01/01/2020
Field of study

Usage of high-level intermediate representations promises the generation of fast code from a high-level description, improving the productivity of developers while achieving the performance traditionally only reached with low-level programming approaches. High-level IRs come in two flavors: 1) domain-specific IRs designed only for a specific application area; or 2) generic high-level IRs that can be used to generate high-performance code across many domains. Developing generic IRs is more challenging but offers the advantage of reusing a common compiler infrastructure across various applications. In this paper, we extend a generic high-level IR to enable efficient computation with sparse data structures. Crucially, we encode sparse representation using reusable dense building blocks already present in the high-level IR. We use a form of dependent types to model sparse matrices in CSR format by expressing the relationship between multiple dense arrays explicitly separately storing the length of rows, the column indices, and the non-zero values of the matrix. We achieve high-performance compared to sparse low-level library code using our extended generic high-level code generator. On an Nvidia GPU, we outperform the highly tuned Nvidia cuSparse implementation of spmv multiplication across 28 sparse matrices of varying sparsity on average by 1.7×

Crossref

Edinburgh Research Explorer

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Enlighten