32,337 research outputs found
Group Communication Patterns for High Performance Computing in Scala
We developed a Functional object-oriented Parallel framework (FooPar) for
high-level high-performance computing in Scala. Central to this framework are
Distributed Memory Parallel Data structures (DPDs), i.e., collections of data
distributed in a shared nothing system together with parallel operations on
these data. In this paper, we first present FooPar's architecture and the idea
of DPDs and group communications. Then, we show how DPDs can be implemented
elegantly and efficiently in Scala based on the Traversable/Builder pattern,
unifying Functional and Object-Oriented Programming. We prove the correctness
and safety of one communication algorithm and show how specification testing
(via ScalaCheck) can be used to bridge the gap between proof and
implementation. Furthermore, we show that the group communication operations of
FooPar outperform those of the MPJ Express open source MPI-bindings for Java,
both asymptotically and empirically. FooPar has already been shown to be
capable of achieving close-to-optimal performance for dense matrix-matrix
multiplication via JNI. In this article, we present results on a parallel
implementation of the Floyd-Warshall algorithm in FooPar, achieving more than
94 % efficiency compared to the serial version on a cluster using 100 cores for
matrices of dimension 38000 x 38000
Expression Templates Revisited: A Performance Analysis of the Current ET Methodology
In the last decade, Expression Templates (ET) have gained a reputation as an
efficient performance optimization tool for C++ codes. This reputation builds
on several ET-based linear algebra frameworks focused on combining both elegant
and high-performance C++ code. However, on closer examination the assumption
that ETs are a performance optimization technique cannot be maintained. In this
paper we demonstrate and explain the inability of current ET-based frameworks
to deliver high performance for dense and sparse linear algebra operations, and
introduce a new "smart" ET implementation that truly allows the combination of
high performance code with the elegance and maintainability of a
domain-specific language.Comment: 16 pages, 7 figure
Sympiler: Transforming Sparse Matrix Codes by Decoupling Symbolic Analysis
Sympiler is a domain-specific code generator that optimizes sparse matrix
computations by decoupling the symbolic analysis phase from the numerical
manipulation stage in sparse codes. The computation patterns in sparse
numerical methods are guided by the input sparsity structure and the sparse
algorithm itself. In many real-world simulations, the sparsity pattern changes
little or not at all. Sympiler takes advantage of these properties to
symbolically analyze sparse codes at compile-time and to apply inspector-guided
transformations that enable applying low-level transformations to sparse codes.
As a result, the Sympiler-generated code outperforms highly-optimized matrix
factorization codes from commonly-used specialized libraries, obtaining average
speedups over Eigen and CHOLMOD of 3.8X and 1.5X respectively.Comment: 12 page
Modular Construction of Shape-Numeric Analyzers
The aim of static analysis is to infer invariants about programs that are
precise enough to establish semantic properties, such as the absence of
run-time errors. Broadly speaking, there are two major branches of static
analysis for imperative programs. Pointer and shape analyses focus on inferring
properties of pointers, dynamically-allocated memory, and recursive data
structures, while numeric analyses seek to derive invariants on numeric values.
Although simultaneous inference of shape-numeric invariants is often needed,
this case is especially challenging and is not particularly well explored.
Notably, simultaneous shape-numeric inference raises complex issues in the
design of the static analyzer itself.
In this paper, we study the construction of such shape-numeric, static
analyzers. We set up an abstract interpretation framework that allows us to
reason about simultaneous shape-numeric properties by combining shape and
numeric abstractions into a modular, expressive abstract domain. Such a modular
structure is highly desirable to make its formalization and implementation
easier to do and get correct. To achieve this, we choose a concrete semantics
that can be abstracted step-by-step, while preserving a high level of
expressiveness. The structure of abstract operations (i.e., transfer, join, and
comparison) follows the structure of this semantics. The advantage of this
construction is to divide the analyzer in modules and functors that implement
abstractions of distinct features.Comment: In Proceedings Festschrift for Dave Schmidt, arXiv:1309.455
- …