8 research outputs found
Parallelizing Julia with a Non-Invasive DSL
Computational scientists often prototype software using productivity
languages that offer high-level programming abstractions. When higher
performance is needed, they are obliged to rewrite their code in a
lower-level efficiency language. Different solutions have been
proposed to address this trade-off between productivity and
efficiency. One promising approach is to create embedded
domain-specific languages that sacrifice generality for productivity
and performance, but practical experience with DSLs points to some
roadblocks preventing widespread adoption. This paper proposes a
non-invasive domain-specific language that makes as few visible
changes to the host programming model as possible. We present ParallelAccelerator,
a library and compiler for high-level, high-performance scientific
computing in Julia. ParallelAccelerator's programming model is aligned with existing
Julia programming idioms. Our compiler exposes the implicit
parallelism in high-level array-style programs and compiles them to
fast, parallel native code. Programs can also run in "library-only"
mode, letting users benefit from the full Julia environment and
libraries. Our results show encouraging performance improvements with very few changes to source code required. In particular, few to no additional type annotations are necessary
Parallelizing Julia with a Non-Invasive DSL (Artifact)
This artifact is based on ParallelAccelerator, an embedded domain-specific language (DSL) and compiler for speeding up compute-intensive Julia programs. In particular, Julia code that makes heavy use of aggregate array operations is a good candidate for speeding up with ParallelAccelerator. ParallelAccelerator is a non-invasive DSL that makes as few changes to the host programming model as possible.
PARALLELIZING TIME-SERIES SESSION DATA ANALYSIS WITH A TYPE-ERASURE BASED DSEL
The Science Information Network (SINET) is a Japanese academic backbone network connecting more than 800 universities and research institutions. Operating such a huge academic backbone network requires more flexible querying technology to cope with massive time-series session data and the analysis of sophisticated cyber-attacks. This paper proposes a parallelizing DSEL (Domain-Specific Embedded Language) for processing huge time-series session data. In our DSEL, function objects are implemented with type erasure to construct an internal DSL for processing time-series data. Type erasure enables our parser to store function pointers and function objects behind the same void* type using class templates. We apply the scatter/gather pattern for concurrent DSEL parsing. Each thread parses the DSEL to extract the tuple (timestamp, source IP, destination IP) in the gather phase. In the scatter phase, we use a concurrent hash map to handle the outputs of multiple threads.
In the experiment, we measured the elapsed time for parsing and inserting IPv4 address and timestamp data ranging from 1,000 to 50,000 lines with 24 items per row. We also measured CPU idle time when processing 100,000,000 lines of session data with 5, 10, and 20 threads. The results show that the proposed method runs in feasible computing time in both cases.
Improving Scientist Productivity, Architecture Portability, and Performance in ParFlow
Legacy scientific applications represent significant investments by universities, engineers, and researchers, and contain valuable implementations of key scientific computations. Over time, hardware architectures have changed. Adapting existing code to new architectures is time-consuming, expensive, and increases code complexity. The increase in complexity negatively affects the scientific impact of the applications. There is an immediate need to reduce complexity. We propose using abstractions to manage and reduce code complexity, improving the scientific impact of applications.
This thesis presents a set of abstractions targeting boundary conditions in iterative solvers. Many scientific applications represent physical phenomena as a set of partial differential equations (PDEs). PDEs are structured around steady state and boundary condition equations, starting from initial conditions.
The proposed abstractions separate architecture-specific implementation details from the primary computation. We use ParFlow to demonstrate the effectiveness of the abstractions. ParFlow is a hydrologic and geoscience application that simulates surface and subsurface water flow. The abstractions have enabled ParFlow developers to successfully add new boundary conditions for the first time in 15 years, and have enabled an experimental OpenMP version of ParFlow that is transparent to computational scientists. This is achieved without requiring expensive rewrites of key computations or major codebase changes, improving developer productivity, enabling hardware portability, and allowing transparent performance optimizations.
Write once, rewrite everywhere: A Unified Framework for Factorized Machine Learning
This thesis describes TRINITY, a framework to optimize linear algebra algorithms operating over relational data in GraalVM. The framework implements a host-language-agnostic version of the optimizations introduced by the Morpheus project, meaning that a single implementation of the Morpheus rewrite rules can be used to optimize linear algebra algorithms written in arbitrary GraalVM languages. We evaluate its performance when hosted within FastR and GraalPython, GraalVM’s R and Python implementations, respectively. In doing so, we also show that TRINITY can optimize across languages, meaning that it can execute and optimize an algorithm written in one language, such as Python, while using data originating from another language, such as R.
Efficient Tree-Traversals: Reconciling Parallelism and Dense Data Representations
Recent work showed that compiling functional programs to use dense,
serialized memory representations for recursive algebraic datatypes can yield
significant constant-factor speedups for sequential programs. But serializing
data in a maximally dense format consequently serializes the processing of that
data, yielding a tension between density and parallelism. This paper shows that
a disciplined, practical compromise is possible. We present Parallel Gibbon, a
compiler that obtains the benefits of dense data formats and parallelism. We
formalize the semantics of the parallel location calculus underpinning this
novel implementation strategy, and show that it is type-safe. Parallel Gibbon
exceeds the parallel performance of existing compilers for purely functional
programs that use recursive algebraic datatypes, including, notably,
abstract-syntax-tree traversals as in compilers.
A Type System for Julia
The Julia programming language was designed to fill the needs of scientific
computing by combining the benefits of productivity and performance languages.
Julia allows users to write untyped scripts easily without needing to worry
about many implementation details, as other productivity languages do. If one
just wants to get the work done, regardless of how efficient or general the
program might be, such a paradigm is ideal. Simultaneously, Julia also allows
library developers to write efficient generic code that can run as fast as
implementations in performance languages such as C or Fortran. This combination
of user-facing ease and library developer-facing performance has proven quite
attractive, and the language has seen increasing adoption.
With adoption come combinatorial challenges to correctness. Multiple
dispatch -- Julia's key mechanism for abstraction -- allows many libraries to
compose "out of the box." However, it creates bugs where one library's
requirements do not match what another provides. Typing could address this at
the cost of Julia's flexibility for scripting.
I developed a "best of both worlds" solution: gradual typing for Julia. My
system forms the core of a gradual type system for Julia, laying the foundation
for improving the correctness of Julia programs while not getting in the way of
script writers. My framework allows methods to be individually typed or
untyped, allowing users to write untyped code that interacts with typed library
code and vice versa. Typed methods then get a soundness guarantee that is
robust in the presence of both dynamically typed code and dynamically generated
definitions. I additionally describe protocols, a mechanism for typing
abstraction over concrete implementation that accommodates one common pattern
in Julia libraries, and describe its implementation in my typed Julia
framework. (PhD thesis)
Code generation of array constructs for distributed memory systems
Programming high-performance systems to fully utilize their computing potential is a complex problem. This is particularly evident when programming distributed-memory clusters containing multiple NUMA chips and GPUs on each node, since achieving high performance would require a complex combination of MPI, OpenMP, CUDA, OpenCL, etc., even for sequentially simple codes. Programs requiring high performance are usually painstakingly written by hand in C/C++ or Fortran using MPI+X to target these machines.
This work presents Vaani, a multi-layer code generation framework that takes a very high-level representation of computations and generates C+MPI code by transforming the input through a series of intermediate representations. The very high-level nature of the language greatly facilitates programming parallel systems. Additionally, the use of multiple representations provides a flexible and transparent venue for the user to interact with and customize the transformation process to generate code suited to the user and the target machine. Experimental evaluation shows that the current implementation of Vaani generates code that is competitive with handwritten codes and hand-optimized libraries.