
    Parallelizing Julia with a Non-Invasive DSL

    Computational scientists often prototype software using productivity languages that offer high-level programming abstractions. When higher performance is needed, they are obliged to rewrite their code in a lower-level efficiency language. Different solutions have been proposed to address this trade-off between productivity and efficiency. One promising approach is to create embedded domain-specific languages that sacrifice generality for productivity and performance, but practical experience with DSLs points to some roadblocks preventing widespread adoption. This paper proposes a non-invasive domain-specific language that makes as few visible changes to the host programming model as possible. We present ParallelAccelerator, a library and compiler for high-level, high-performance scientific computing in Julia. ParallelAccelerator's programming model is aligned with existing Julia programming idioms. Our compiler exposes the implicit parallelism in high-level array-style programs and compiles them to fast, parallel native code. Programs can also run in "library-only" mode, letting users benefit from the full Julia environment and libraries. Our results show encouraging performance improvements with very few changes to source code required. In particular, few to no additional type annotations are necessary.
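    To make the idea concrete, here is a minimal C++/OpenMP sketch of the kind of lowering the abstract describes: an element-wise array expression, as one might write with Julia's dot syntax, fused into a single parallel native loop. The function name and loop structure are hypothetical illustrations of the transformation, not ParallelAccelerator's actual generated code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical lowering of the array expression `out .= a .* b .+ c`:
// three element-wise operations fused into one parallel loop, with no
// intermediate temporaries, exposing the implicit data parallelism.
void fused_mul_add(std::vector<double>& out,
                   const std::vector<double>& a,
                   const std::vector<double>& b,
                   const std::vector<double>& c) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(out.size());
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        out[i] = a[i] * b[i] + c[i];
    }
}
```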

    Parallelizing Julia with a Non-Invasive DSL (Artifact)

    This artifact is based on ParallelAccelerator, an embedded domain-specific language (DSL) and compiler for speeding up compute-intensive Julia programs. In particular, Julia code that makes heavy use of aggregate array operations is a good candidate for speeding up with ParallelAccelerator. ParallelAccelerator is a non-invasive DSL that makes as few changes to the host programming model as possible.

    Parallelizing Time-Series Session Data Analysis with a Type-Erasure-Based DSEL

    The Science Information Network (SINET) is a Japanese academic backbone network connecting more than 800 universities and research institutions. Operating such a large academic backbone network requires more flexible querying technology to cope with massive time-series session data and with the analysis of sophisticated cyber-attacks. This paper proposes a parallelizing DSEL (Domain-Specific Embedded Language) for processing huge time-series session data. In our DSEL, the function object is implemented by type erasure to construct an internal DSL for processing time-series data. Type erasure enables our parser to store function pointers and function objects in the same void* type using class templates. We apply the scatter/gather pattern for concurrent DSEL parsing. Each thread parses the DSEL to extract the tuple of timestamp, source IP, and destination IP in the gather phase. In the scatter phase, we use a concurrent hash map to handle the outputs of multiple threads. In our experiments, we measured the elapsed time for parsing and inserting IPv4 address and timestamp data ranging from 1,000 to 50,000 lines with 24-row items. We also measured CPU idle time while processing 100,000,000 lines of session data with 5, 10, and 20 threads. The results show that the proposed method runs in feasible computing time in both cases.
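    The core trick, storing a plain function pointer and a stateful function object behind one erased type, can be sketched in a few lines of C++. The sketch below hand-rolls the technique the abstract names (a void* plus a class-template-generated invoker, much as std::function does internally); all names are illustrative, not the paper's actual DSEL code.

```cpp
#include <iostream>
#include <string>

class ErasedHandler {
    void* obj_;                                  // erased callee (void*)
    void (*invoke_)(void*, const std::string&);  // template-generated thunk
public:
    template <typename F>
    explicit ErasedHandler(F& f)
        : obj_(&f),
          invoke_([](void* o, const std::string& line) {
              (*static_cast<F*>(o))(line);       // recover the real type
          }) {}
    void operator()(const std::string& line) const { invoke_(obj_, line); }
};

void printLine(const std::string& line) { std::cout << line << '\n'; }

int main() {
    auto fp = printLine;                  // plain function pointer
    long count = 0;
    auto counter = [&](const std::string&) { ++count; };  // stateful functor

    ErasedHandler h1(fp), h2(counter);    // same erased type for both
    h1("2024-01-01T00:00:00 10.0.0.1 10.0.0.2");
    h2("2024-01-01T00:00:01 10.0.0.3 10.0.0.4");
    std::cout << "counted " << count << " line(s)\n";
}
```

    Because both callables hide behind the same ErasedHandler type, a parser can keep heterogeneous handlers in one container, which is what makes an internal DSL of composable processing steps possible.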

    Improving Scientist Productivity, Architecture Portability, and Performance in ParFlow

    Legacy scientific applications represent significant investments by universities, engineers, and researchers, and contain valuable implementations of key scientific computations. Over time, hardware architectures have changed. Adapting existing code to new architectures is time-consuming, expensive, and increases code complexity. The increase in complexity negatively affects the scientific impact of the applications. There is an immediate need to reduce complexity. We propose using abstractions to manage and reduce code complexity, improving the scientific impact of applications. This thesis presents a set of abstractions targeting boundary conditions in iterative solvers. Many scientific applications represent physical phenomena as a set of partial differential equations (PDEs). PDEs are structured around steady-state and boundary-condition equations, starting from initial conditions. The proposed abstractions separate architecture-specific implementation details from the primary computation. We use ParFlow to demonstrate the effectiveness of the abstractions. ParFlow is a hydrologic and geoscience application that simulates surface and subsurface water flow. The abstractions have enabled ParFlow developers to successfully add new boundary conditions for the first time in 15 years, and have enabled an experimental OpenMP version of ParFlow that is transparent to computational scientists. This is achieved without requiring expensive rewrites of key computations or major codebase changes, improving developer productivity, enabling hardware portability, and allowing transparent performance optimizations.
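    The flavor of such an abstraction can be sketched in C++: the boundary-condition kernel is written once against an abstract traversal, and the backend (serial, or OpenMP when compiled with it) is selected inside the traversal. This is a hypothetical illustration of the separation of concerns, not ParFlow's actual API.

```cpp
#include <vector>

// Apply `kernel` to every cell on one face of an nx-by-ny grid. The loop
// backend is an implementation detail hidden from the science code.
template <typename Kernel>
void forEachBoundaryCell(int nx, int ny, Kernel kernel) {
#ifdef _OPENMP
    #pragma omp parallel for
#endif
    for (int i = 0; i < nx; ++i) {
        kernel(i, 0);   // bottom face; other faces handled similarly
    }
    (void)ny;
}

int main() {
    const int nx = 64, ny = 64;
    std::vector<double> p(nx * ny, 0.0);
    // Science code: a Dirichlet-style boundary condition, written with no
    // knowledge of whether the traversal runs serially or in parallel.
    forEachBoundaryCell(nx, ny, [&](int i, int j) {
        p[j * nx + i] = 1.0;   // fixed boundary value
    });
}
```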

    A Type System for Julia

    The Julia programming language was designed to fill the needs of scientific computing by combining the benefits of productivity and performance languages. Julia allows users to write untyped scripts easily without needing to worry about many implementation details, as do other productivity languages. If one just wants to get the work done, regardless of how efficient or general the program might be, such a paradigm is ideal. Simultaneously, Julia also allows library developers to write efficient generic code that can run as fast as implementations in performance languages such as C or Fortran. This combination of user-facing ease and library-developer-facing performance has proven quite attractive, and the language has seen increasing adoption. With adoption come combinatorial challenges to correctness. Multiple dispatch, Julia's key mechanism for abstraction, allows many libraries to compose "out of the box." However, it creates bugs where one library's requirements do not match what another provides. Typing could address this, but at the cost of Julia's flexibility for scripting. I developed a "best of both worlds" solution: gradual typing for Julia. My system forms the core of a gradual type system for Julia, laying the foundation for improving the correctness of Julia programs while not getting in the way of script writers. My framework allows methods to be individually typed or untyped, allowing users to write untyped code that interacts with typed library code and vice versa. Typed methods then get a soundness guarantee that is robust in the presence of both dynamically typed code and dynamically generated definitions. I additionally describe protocols, a mechanism for typing abstraction over concrete implementations that accommodates a common pattern in Julia libraries, and describe its implementation in my typed Julia framework.
    Comment: PhD thesis

    Code generation of array constructs for distributed memory systems

    Programming high-performance systems to fully utilize their computing potential is a complex problem. This is particularly evident when programming distributed-memory clusters containing multiple NUMA chips and GPUs on each node, since achieving high performance would require a complex combination of MPI, OpenMP, CUDA, OpenCL, etc., even for sequentially simple codes. Programs requiring high performance are usually painstakingly written by hand in C/C++ or Fortran using MPI+X to target these machines. This work presents a multi-layer code-generation framework, Vaani, that takes a very high-level representation of computations and generates C+MPI code by transforming the input through a series of intermediate representations. The very high-level nature of the language greatly facilitates programming parallel systems. Additionally, the use of multiple representations provides a flexible and transparent venue for the user to interact with and customize the transformation process to generate code suitable to the user and the target machine. Experimental evaluation shows that the current implementation of Vaani generates code that is competitive with handwritten codes and hand-optimized libraries.
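    As an illustration of the target of such a framework, here is a minimal C++/MPI sketch of the kind of code a generator might emit for a high-level reduction such as `s = sum(A)` over a block-distributed array; it is a hypothetical example, not Vaani's actual output.

```cpp
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const long globalN = 1 << 20;            // assumes size divides globalN
    const long localN = globalN / size;      // block distribution of A
    std::vector<double> local(localN, 1.0);  // this rank's slice of A

    double localSum = 0.0;                   // local phase of sum(A)
    for (long i = 0; i < localN; ++i) localSum += local[i];

    double globalSum = 0.0;                  // communication phase
    MPI_Allreduce(&localSum, &globalSum, 1, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);

    if (rank == 0) std::printf("sum(A) = %f\n", globalSum);
    MPI_Finalize();
}
```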

    Efficient Tree-Traversals: Reconciling Parallelism and Dense Data Representations

    Recent work showed that compiling functional programs to use dense, serialized memory representations for recursive algebraic datatypes can yield significant constant-factor speedups for sequential programs. But serializing data in a maximally dense format consequently serializes the processing of that data, yielding a tension between density and parallelism. This paper shows that a disciplined, practical compromise is possible. We present Parallel Gibbon, a compiler that obtains the benefits of both dense data formats and parallelism. We formalize the semantics of the parallel location calculus underpinning this novel implementation strategy, and show that it is type-safe. Parallel Gibbon exceeds the parallel performance of existing compilers for purely functional programs that use recursive algebraic datatypes, including, notably, abstract-syntax-tree traversals such as those in compilers.
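    The density/parallelism tension can be seen in a small C++ sketch of a serialized tree: the preorder tag stream is maximally dense, but a traversal must walk it with a single cursor, which is exactly what makes naive parallelization hard. The layout below is an illustrative assumption, not Gibbon's actual format.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

enum Tag : uint8_t { LEAF = 0, NODE = 1 };

// Serialized preorder form of Node(Leaf 1, Node(Leaf 2, Leaf 3)).
// A NODE tag is followed directly by its two serialized children, so
// traversal is a sequential cursor walk: dense, but inherently serial
// unless extra offsets let a second worker jump ahead to the right child.
static const std::vector<uint8_t> tree = {
    NODE, LEAF, 1, NODE, LEAF, 2, LEAF, 3};

long sumLeaves(const std::vector<uint8_t>& buf, size_t& cur) {
    if (buf[cur++] == LEAF) return buf[cur++];
    long left = sumLeaves(buf, cur);  // children are adjacent in the buffer
    return left + sumLeaves(buf, cur);
}

int main() {
    size_t cur = 0;
    std::printf("sum = %ld\n", sumLeaves(tree, cur));  // prints sum = 6
}
```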

    How To Touch a Running System

    The increasing importance of distributed and decentralized software architectures entails more and more attention to adaptive software. Obtaining adaptiveness, however, is a difficult task, as the software design needs to foresee and cope with a variety of situations. Using reconfiguration of components facilitates this task, as the adaptivity is conducted on the architecture level instead of directly in the code. This results in a separation of concerns: the appropriate reconfiguration can be devised on a coarse level, while the implementation of the components can remain largely unaware of reconfiguration scenarios. We study reconfiguration in component frameworks based on formal theory. We first discuss programming with components, exemplified by the development of the cmc model checker. This highly efficient model checker is made of C++ components and serves as an example of component-based software development practice in general; it also provides insights into the principles of adaptivity. However, its component model focuses on high performance and is not geared towards using the structuring principle of components for controlled reconfiguration. We thus complement this highly optimized model with a message-passing-based component model that takes reconfigurability to be its central principle. Supporting reconfiguration in a framework is about relieving the programmer, as much as possible, of having to care about its peculiarities. We utilize the formal description of the component model to provide an algorithm for reconfiguration that retains as much flexibility as possible while avoiding most problems that arise due to concurrency. This algorithm is embedded in a general four-stage adaptivity model inspired by physical control loops. The reconfiguration is devised to work with stateful components, retaining their data and unprocessed messages. Reconfiguration plans, which are given a formal semantics, form the input of the reconfiguration algorithm. We show that the algorithm achieves perceived atomicity of the reconfiguration process for an important class of plans, i.e., the whole process of reconfiguration is perceived as one atomic step, while minimizing the blocking of components. We illustrate the applicability of our approach to reconfiguration with several examples such as fault tolerance and automated resource control.
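    A toy C++ sketch of the central idea: a message-passing component whose behavior can be swapped at runtime while its state and unprocessed mailbox messages are retained, blocking only the component being reconfigured. This is a hypothetical illustration, not the dissertation's component model or algorithm.

```cpp
#include <deque>
#include <functional>
#include <iostream>
#include <mutex>
#include <string>

class Component {
    std::mutex m_;                     // blocks only this component
    std::deque<std::string> mailbox_;  // unprocessed messages survive swaps
    int state_ = 0;                    // component state survives swaps
    std::function<void(int&, const std::string&)> behavior_;
public:
    explicit Component(std::function<void(int&, const std::string&)> b)
        : behavior_(std::move(b)) {}

    void send(const std::string& msg) {
        std::lock_guard<std::mutex> lk(m_);
        mailbox_.push_back(msg);
    }
    void step() {                      // process one message, if any
        std::lock_guard<std::mutex> lk(m_);
        if (mailbox_.empty()) return;
        behavior_(state_, mailbox_.front());
        mailbox_.pop_front();
    }
    // Reconfigure: atomic as perceived from outside; state and any queued
    // messages are carried over to the new behavior.
    void reconfigure(std::function<void(int&, const std::string&)> b) {
        std::lock_guard<std::mutex> lk(m_);
        behavior_ = std::move(b);
    }
};

int main() {
    Component c([](int& s, const std::string& m) {
        s += 1; std::cout << "v1 handled " << m << "\n";
    });
    c.send("a"); c.send("b");
    c.step();                          // "a" handled by v1
    c.reconfigure([](int& s, const std::string& m) {
        s += 10; std::cout << "v2 handled " << m << "\n";
    });
    c.step();                          // "b", queued before the swap, by v2
}
```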

    Evolutionary genomics : statistical and computational methods

    This open access book addresses the challenge of analyzing and understanding the evolutionary dynamics of complex biological systems at the genomic level, and elaborates on some promising strategies that would bring us closer to uncovering the vital relationships between genotype and phenotype. After a few educational primers, the book continues with sections on sequence homology and alignment, phylogenetic methods for studying genome evolution, methodologies for evaluating selective pressures on genomic sequences, genomic evolution in light of protein domain architecture and transposable elements, population genomics and other omics, and discussions of current bottlenecks in handling and analyzing genomic data. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detail and expert implementation advice that lead to the best results. Authoritative and comprehensive, Evolutionary Genomics: Statistical and Computational Methods, Second Edition aims to serve both novices in biology with strong statistics and computational skills, and molecular biologists with a good grasp of standard mathematical concepts, in moving this important field of study forward.