19 research outputs found
Building-Blocks for Performance Oriented DSLs
Domain-specific languages raise the level of abstraction in software
development. While it is evident that programmers can more easily reason about
very high-level programs, the same holds for compilers only if the compiler has
an accurate model of the application domain and the underlying target platform.
Since mapping high-level, general-purpose languages to modern, heterogeneous
hardware is becoming increasingly difficult, DSLs are an attractive way to
capitalize on improved hardware performance, precisely by making the compiler
reason on a higher level. Implementing efficient DSL compilers is a daunting
task however, and support for building performance-oriented DSLs is urgently
needed. To this end, we present the Delite Framework, an extensible toolkit
that drastically simplifies building embedded DSLs and compiling DSL programs
for execution on heterogeneous hardware. We discuss several building blocks in
some detail and present experimental results for the OptiML machine-learning
DSL implemented on top of Delite.Comment: In Proceedings DSL 2011, arXiv:1109.032
Forge: Generating a High Performance DSL Implementation from a Declarative Specification
Domain-specific languages provide a promising path to automatically compile high-level code to parallel, heterogeneous, and distributed hardware. However, in practice high performance DSLs still require considerable software expertise to develop and force users into tool-chains that hinder prototyping and debugging. To address these problems, we present Forge, a new meta DSL for declaratively specifying high performance embedded DSLs. Forge provides DSL authors with high-level abstractions (e.g., data structures, parallel patterns, effects) for specifying their DSL in a way that permits high performance. From this high-level specification, Forge automatically generates both a naive Scala library implementation of the DSL and a high performance version using the Delite DSL framework. Users of a Forge-generated DSL can prototype their application using the library version, and then switch to the Delite version to run on multicore CPUs, GPUs, and clusters without changing the application code. Forge-generated Delite DSLs perform within 2x of hand-optimized C++ and up to 40x better than Spark, an alternative high-level distributed programming environment. Compared to a manually implemented Delite DSL, Forge provides a factor of 3-6x reduction in lines of code and does not sacrifice any performance. Furthermore, Forge specifications can be generated from existing Scala libraries, are easy to maintain, shield DSL developers from changes in the Delite framework, and enable DSLs to be retargeted to other frameworks transparently
Delite: A Compiler Architecture for Performance-Oriented Embedded Domain-Specific Languages
Developing high-performance software is a difficult task that requires the use of low-level, architecture-specific programming models (e.g., OpenMP for CMPs, CUDA for GPUs, MPI for clusters). It is typically not possible to write a single application that can run efficiently in different environments, leading to multiple versions and increased complexity. Domain-Specific Languages (DSLs) are a promising avenue to enable programmers to use high-level abstractions and still achieve good performance on a variety of hardware. This is possible because DSLs have higher-level semantics and restrictions than general-purpose languages, so DSL compilers can perform higher-level optimization and translation. However, the cost of developing performance-oriented DSLs is a substantial roadblock to their development and adoption. In this article, we present an overview of the Delite compiler framework and the DSLs that have been developed with it. Delite simplifies the process of DSL development by providing common components, like parallel patterns, optimizations, and code generators, that can be reused in DSL implementations. Delite DSLs are embedded in Scala, a general-purpose programming language, but use metaprogramming to construct an Intermediate Representation (IR) of user programs and compile to multiple languages (including C++, CUDA, and OpenCL). DSL programs are automatically parallelized and different parts of the application can run simultaneously on CPUs and GPUs. We present Delite DSLs for machine learning, data querying, graph analysis, and scientific computing and show that they all achieve performance competitive to or exceeding C++ code