4,598 research outputs found
Object-oriented implementations of the MPDATA advection equation solver in C++, Python and Fortran
Three object-oriented implementations of a prototype solver of the advection
equation are introduced. The presented programs are based on Blitz++ (C++),
NumPy (Python), and Fortran's built-in array containers. The solvers include an
implementation of the Multidimensional Positive-Definite Advective Transport
Algorithm (MPDATA). The introduced codes exemplify how the application of
object-oriented programming (OOP) techniques allows to reproduce the
mathematical notation used in the literature within the program code. A
discussion on the tradeoffs of the programming language choice is presented.
The main angles of comparison are code brevity and syntax clarity (and hence
maintainability and auditability) as well as performance. In the case of
Python, a significant performance gain is observed when switching from the
standard interpreter (CPython) to the PyPy implementation of Python. Entire
source code of all three implementations is embedded in the text and is
licensed under the terms of the GNU GPL license
Contract-Based General-Purpose GPU Programming
Using GPUs as general-purpose processors has revolutionized parallel
computing by offering, for a large and growing set of algorithms, massive
data-parallelization on desktop machines. An obstacle to widespread adoption,
however, is the difficulty of programming them and the low-level control of the
hardware required to achieve good performance. This paper suggests a
programming library, SafeGPU, that aims at striking a balance between
programmer productivity and performance, by making GPU data-parallel operations
accessible from within a classical object-oriented programming language. The
solution is integrated with the design-by-contract approach, which increases
confidence in functional program correctness by embedding executable program
specifications into the program text. We show that our library leads to modular
and maintainable code that is accessible to GPGPU non-experts, while providing
performance that is comparable with hand-written CUDA code. Furthermore,
runtime contract checking turns out to be feasible, as the contracts can be
executed on the GPU
Reconstructing Rational Functions with
We present the open-source library for the
reconstruction of multivariate rational functions over finite fields. We
discuss the involved algorithms and their implementation. As an application, we
use in the context of integration-by-parts reductions and
compare runtime and memory consumption to a fully algebraic approach with the
program .Comment: 46 pages, 3 figures, 6 tables; v2: matches published versio
AUTOMATING DATA-LAYOUT DECISIONS IN DOMAIN-SPECIFIC LANGUAGES
A long-standing challenge in High-Performance Computing (HPC) is the simultaneous achievement of programmer productivity and hardware computational efficiency. The challenge has been exacerbated by the onset of multi- and many-core CPUs and accelerators. Only a few expert programmers have been able to hand-code domain-specific data transformations and vectorization schemes needed to extract the best possible performance on such architectures. In this research, we examined the possibility of automating these methods by developing a Domain-Specific Language (DSL) framework. Our DSL approach extends C++14 by embedding into it a high-level data-parallel array language, and by using a domain-specific compiler to compile to hybrid-parallel code. We also implemented an array index-space transformation algebra within this high-level array language to manipulate array data-layouts and data-distributions. The compiler introduces a novel method for SIMD auto-vectorization based on array data-layouts. Our new auto-vectorization technique is shown to outperform the default auto-vectorization strategy by up to 40% for stencil computations. The compiler also automates distributed data movement with overlapping of local compute with remote data movement using polyhedral integer set analysis. Along with these main innovations, we developed a new technique using C++ template metaprogramming for developing embedded DSLs using C++. We also proposed a domain-specific compiler intermediate representation that simplifies data flow analysis of abstract DSL constructs. We evaluated our framework by constructing a DSL for the HPC grand-challenge domain of lattice quantum chromodynamics. Our DSL yielded performance gains of up to twice the flop rate over existing production C code for selected kernels. This gain in performance was obtained while using less than one-tenth the lines of code. The performance of this DSL was also competitive with the best hand-optimized and hand-vectorized code, and is an order of magnitude better than existing production DSLs.Doctor of Philosoph
Building Efficient Query Engines in a High-Level Language
Abstraction without regret refers to the vision of using high-level
programming languages for systems development without experiencing a negative
impact on performance. A database system designed according to this vision
offers both increased productivity and high performance, instead of sacrificing
the former for the latter as is the case with existing, monolithic
implementations that are hard to maintain and extend. In this article, we
realize this vision in the domain of analytical query processing. We present
LegoBase, a query engine written in the high-level language Scala. The key
technique to regain efficiency is to apply generative programming: LegoBase
performs source-to-source compilation and optimizes the entire query engine by
converting the high-level Scala code to specialized, low-level C code. We show
how generative programming allows to easily implement a wide spectrum of
optimizations, such as introducing data partitioning or switching from a row to
a column data layout, which are difficult to achieve with existing low-level
query compilers that handle only queries. We demonstrate that sufficiently
powerful abstractions are essential for dealing with the complexity of the
optimization effort, shielding developers from compiler internals and
decoupling individual optimizations from each other. We evaluate our approach
with the TPC-H benchmark and show that: (a) With all optimizations enabled,
LegoBase significantly outperforms a commercial database and an existing query
compiler. (b) Programmers need to provide just a few hundred lines of
high-level code for implementing the optimizations, instead of complicated
low-level code that is required by existing query compilation approaches. (c)
The compilation overhead is low compared to the overall execution time, thus
making our approach usable in practice for compiling query engines
Mapping-based SPARQL access to a MongoDB database
Accessing legacy data as virtual RDF stores is a key issue in the building of the Web of Data. In recent years, the MongoDB database has become a leader in the NoSQL market and the management of very large datasets, making it a significant potential contributor to the Web of Linked Data. Therefore, in this paper we address the research question of how to access arbitrary MongoDB documents with SPARQL.We propose a two-step method to (i) translate a SPARQL query into a pivot abstract query under MongoDB-to-RDF mappings represented in the xR2RML language, then (ii) translate the pivot query into a concrete MongoDB query. We elaborate on the discrepancy between the expressiveness of SPARQL and the MongoDB query language, and we show that we can always come up with a rewriting that shall produce all certain answers
Declassification: transforming java programs to remove intermediate classes
Computer applications are increasingly being written in object-oriented languages like Java and C++ Object-onented programming encourages the use of small methods and classes. However, this style of programming introduces much overhead as each method call results in a dynamic dispatch and each field access becomes a pointer dereference to the heap allocated object. Many of the classes in these programs are included to provide structure rather than to act as reusable code, and can therefore be regarded as intermediate. We have therefore developed an optimisation technique, called declassification, which will transform Java programs into equivalent programs from which these intermediate classes have been removed.
The optimisation technique developed involves two phases, analysis and transformation. The analysis involves the identification of intermediate classes for removal. A suitable class is defined to be a class which is used exactly once within a program. Such classes are identified by this analysis The subsequent transformation involves eliminating these intermediate classes from the program. This involves inlinmg the fields and methods of each intermediate class within the enclosing class which uses it.
In theory, declassification reduces the number of classes which are instantiated and used in a program during its execution. This should reduce the overhead of object creation and maintenance as child objects are no longer created, and it should also reduce the number of field accesses and dynamic dispatches required by a program to execute. An important feature of the declassification technique, as opposed to other similar techniques, is that it guarantees there will be no increase in code size. An empirical study was conducted on a number of reasonable-sized Java programs and it was found that very few suitable classes were identified for miming. The results showed that the declassification technique had a small influence on the memory consumption and a negligible influence on the run-time performance of these programs. It is therefore concluded that the declassification technique was not successful in optimizing the test programs but further extensions to this technique combined with an intrinsically object-onented set of test programs could greatly improve its success
HERMIT: Mechanized Reasoning during Compilation in the Glasgow Haskell Compiler
It is difficult to write programs which are both correct and fast. A promising approach, functional programming, is based on the idea of using pure, mathematical functions to construct programs. With effort, it is possible to establish a connection between a specification written in a functional language, which has been proven correct, and a fast implementation, via program transformation. When practiced in the functional programming community, this style of reasoning is still typically performed by hand, by either modifying the source code or using pen-and-paper. Unfortunately, performing such semi-formal reasoning by directly modifying the source code often obfuscates the program, and pen-and-paper reasoning becomes outdated as the program changes over time. Even so, this semi-formal reasoning prevails because formal reasoning is time-consuming, and requires considerable expertise. Formal reasoning tools often only work for a subset of the target language, or require programs to be implemented in a custom language for reasoning. This dissertation investigates a solution, called HERMIT, which mechanizes reasoning during compilation. HERMIT can be used to prove properties about programs written in the Haskell functional programming language, or transform them to improve their performance. Reasoning in HERMIT proceeds in a style familiar to practitioners of pen-and-paper reasoning, and mechanization allows these techniques to be applied to real-world programs with greater confidence. HERMIT can also re-check recorded reasoning steps on subsequent compilations, enforcing a connection with the program as the program is developed. HERMIT is the first system capable of directly reasoning about the full Haskell language. The design and implementation of HERMIT, motivated both by typical reasoning tasks and HERMIT's place in the Haskell ecosystem, is presented in detail. Three case studies investigate HERMIT's capability to reason in practice. These case studies demonstrate that semi-formal reasoning with HERMIT lowers the barrier to writing programs which are both correct and fast
- …