8 research outputs found
Parallelizing Julia with a Non-Invasive DSL
Computational scientists often prototype software using productivity
languages that offer high-level programming abstractions. When higher
performance is needed, they are obliged to rewrite their code in a
lower-level efficiency language. Different solutions have been
proposed to address this trade-off between productivity and
efficiency. One promising approach is to create embedded
domain-specific languages that sacrifice generality for productivity
and performance, but practical experience with DSLs points to some
roadblocks preventing widespread adoption. This paper proposes a
non-invasive domain-specific language that makes as few visible
changes to the host programming model as possible. We present ParallelAccelerator,
a library and compiler for high-level, high-performance scientific
computing in Julia. ParallelAccelerator's programming model is aligned with existing
Julia programming idioms. Our compiler exposes the implicit
parallelism in high-level array-style programs and compiles them to
fast, parallel native code. Programs can also run in "library-only"
mode, letting users benefit from the full Julia environment and
libraries. Our results show encouraging performance improvements with very few changes to source code required. In particular, few to no additional type annotations are necessary
Parallelizing Julia with a Non-Invasive DSL (Artifact)
This artifact is based on ParallelAccelerator, an embedded domain-specific language (DSL) and compiler for speeding up compute-intensive Julia programs. In particular, Julia code that makes heavy use of aggregate array operations is a good candidate for speeding up with ParallelAccelerator. ParallelAccelerator is a non-invasive DSL that makes as few changes to the host programming model as possible.
PARALLELIZING TIME-SERIES SESSION DATA ANALYSIS WITH A TYPE-ERASURE BASED DSEL
The Science Information Network (SINET) is a Japanese academic backbone network connecting more than 800 universities and research institutions. Operating such a huge academic backbone network requires more flexible querying technology to cope with massive time-series session data and the analysis of sophisticated cyber-attacks. This paper proposes a parallelizing DSEL (Domain-Specific Embedded Language) for processing huge time-series session data. In our DSEL, function objects are implemented with type erasure to construct an internal DSL for processing time-series data. Type erasure enables our parser to store function pointers and function objects behind the same void* type using class templates. We apply the scatter/gather pattern for concurrent DSEL parsing. Each thread parses the DSEL to extract the tuple (timestamp, source IP, destination IP) in the gather phase. In the scatter phase, we use a concurrent hash map to handle the outputs of multiple threads.
In the experiment, we measured the elapsed time for parsing and inserting IPv4 address and timestamp data ranging from 1,000 to 50,000 lines with 24 items per row. We also measured CPU idle time when processing 100,000,000 lines of session data with 5, 10, and 20 threads. The results show that the proposed method runs in feasible computing time in both cases.
Improving Scientist Productivity, Architecture Portability, and Performance in ParFlow
Legacy scientific applications represent significant investments by universities, engineers, and researchers, and contain valuable implementations of key scientific computations. Over time, hardware architectures have changed. Adapting existing code to new architectures is time-consuming, expensive, and increases code complexity. The increase in complexity negatively affects the scientific impact of the applications. There is an immediate need to reduce complexity. We propose using abstractions to manage and reduce code complexity, improving the scientific impact of applications.
This thesis presents a set of abstractions targeting boundary conditions in iterative solvers. Many scientific applications represent physical phenomena as a set of partial differential equations (PDEs). PDEs are structured around steady state and boundary condition equations, starting from initial conditions.
The proposed abstractions separate architecture-specific implementation details from the primary computation. We use ParFlow to demonstrate the effectiveness of the abstractions. ParFlow is a hydrologic and geoscience application that simulates surface and subsurface water flow. The abstractions have enabled ParFlow developers to successfully add new boundary conditions for the first time in 15 years, and have enabled an experimental OpenMP version of ParFlow that is transparent to computational scientists. This is achieved without requiring expensive rewrites of key computations or major codebase changes, improving developer productivity, enabling hardware portability, and allowing transparent performance optimizations.
Write once, rewrite everywhere: A Unified Framework for Factorized Machine Learning
This thesis describes TRINITY, a framework to optimize linear algebra algorithms operating over relational data in GraalVM. The framework implements a host-language-agnostic version of the optimizations introduced by the Morpheus project, meaning that a single implementation of the Morpheus rewrite rules can be used to optimize linear algebra algorithms written in arbitrary GraalVM languages. We evaluate its performance when hosted within FastR and GraalPython, GraalVM’s R and Python implementations, respectively. In doing so, we also show that TRINITY can optimize across languages, meaning that it can execute and optimize an algorithm written in one language, such as Python, while using data originating from another language, such as R.
Efficient Tree-Traversals: Reconciling Parallelism and Dense Data Representations
Recent work showed that compiling functional programs to use dense,
serialized memory representations for recursive algebraic datatypes can yield
significant constant-factor speedups for sequential programs. But serializing
data in a maximally dense format consequently serializes the processing of that
data, yielding a tension between density and parallelism. This paper shows that
a disciplined, practical compromise is possible. We present Parallel Gibbon, a
compiler that obtains the benefits of dense data formats and parallelism. We
formalize the semantics of the parallel location calculus underpinning this
novel implementation strategy, and show that it is type-safe. Parallel Gibbon
exceeds the parallel performance of existing compilers for purely functional
programs that use recursive algebraic datatypes, including, notably,
abstract-syntax-tree traversals as in compilers.
A Type System for Julia
The Julia programming language was designed to fill the needs of scientific
computing by combining the benefits of productivity and performance languages.
Julia allows users to write untyped scripts easily without needing to worry
about many implementation details, as other productivity languages do. If one
just wants to get the work done, regardless of how efficient or general the
program might be, such a paradigm is ideal. Simultaneously, Julia also allows
library developers to write efficient generic code that can run as fast as
implementations in performance languages such as C or Fortran. This combination
of user-facing ease and library developer-facing performance has proven quite
attractive, and the language has seen increasing adoption.
With adoption come combinatorial challenges to correctness. Multiple
dispatch -- Julia's key mechanism for abstraction -- allows many libraries to
compose "out of the box." However, it creates bugs where one library's
requirements do not match what another provides. Typing could address this at
the cost of Julia's flexibility for scripting.
I developed a "best of both worlds" solution: gradual typing for Julia. My
system forms the core of a gradual type system for Julia, laying the foundation
for improving the correctness of Julia programs while not getting in the way of
script writers. My framework allows methods to be individually typed or
untyped, allowing users to write untyped code that interacts with typed library
code and vice versa. Typed methods then get a soundness guarantee that is
robust in the presence of both dynamically typed code and dynamically generated
definitions. I additionally describe protocols, a mechanism for typing
abstraction over concrete implementation that accommodates one common pattern
in Julia libraries, and describe its implementation in my typed Julia
framework. (PhD thesis)
Code generation of array constructs for distributed memory systems
Programming high-performance systems to fully utilize their computing potential is a complex problem. This is particularly evident when programming distributed-memory clusters containing multiple NUMA chips and GPUs on each node, since achieving high performance would require a complex combination of MPI, OpenMP, CUDA, OpenCL, etc., even for sequentially simple codes. Programs requiring high performance are usually painstakingly written by hand in C/C++ or Fortran using MPI+X to target these machines.
This work presents Vaani, a multi-layer code generation framework that takes a very high-level representation of computations and generates C+MPI code by transforming the input through a series of intermediate representations. The very high-level nature of the language greatly facilitates programming parallel systems. Additionally, the use of multiple representations provides a flexible and transparent venue for the user to interact with and customize the transformation process to generate code suited to the user and the target machine. Experimental evaluation shows that the current implementation of Vaani generates code that is competitive with handwritten codes and hand-optimized libraries.