Applying an abstract data structure description approach to parallelizing scientific pointer programs
Even though impressive progress has been made in the area of parallelizing scientific programs with arrays, the application of similar techniques to programs with pointer data structures has remained difficult. Unlike arrays, which have a small number of well-defined properties that can be utilized by a parallelizing compiler, pointer data structures are used to implement a wide variety of structures that exhibit a much more diverse set of properties. The complexity and diversity of such properties mean that, in general, scientific programs with pointer data structures cannot be effectively analyzed by an optimizing and parallelizing compiler. In order to provide a system in which the compiler can fully utilize the properties of different types of pointer data structures, we have developed a mechanism for the Abstract Description of Data Structures (ADDS). With our approach, the programmer can explicitly describe important properties such as dimensionality of the pointer data structure, independence of dimensions, and direction of traversal. These abstract descriptions of pointer data structures are then used by the compiler to guide analysis, optimization, and parallelization. In this paper we summarize the ADDS approach through the use of numerous examples of data structures used in scientific computations, we illustrate how such declarations are natural and non-tedious to specify, and we show how the ADDS declarations can be used to improve compile-time analysis. In order to demonstrate the viability of our approach, we show how such techniques can be used to parallelize an important class of scientific codes which naturally use recursive pointer data structures. In particular, we use our approach to develop the parallelization of an N-body simulation that is based on a relatively complicated pointer data structure, and we report the speedup results for a Sequent multiprocessor.
Interactive Simplifier Tracing and Debugging in Isabelle
The Isabelle proof assistant comes equipped with a very powerful tactic for
term simplification. While tremendously useful, the results of simplifying a
term do not always match the user's expectation: sometimes, the resulting term
is not in the form the user expected, or the simplifier fails to apply a rule.
We describe a new, interactive tracing facility which offers insight into the
hierarchical structure of the simplification with user-defined filtering,
memoization and search. The new simplifier trace is integrated into the
Isabelle/jEdit Prover IDE. Comment: Conferences on Intelligent Computer Mathematics, 201
A Comparison of Big Data Frameworks on a Layered Dataflow Model
In the world of Big Data analytics, there is a series of tools aimed at
simplifying the programming of applications to be executed on clusters. Although each
tool claims to provide better programming, data and execution models, for which
only informal (and often confusing) semantics is generally provided, all share
a common underlying model, namely, the Dataflow model. The Dataflow model we
propose shows how various tools share the same expressiveness at different
levels of abstraction. The contribution of this work is twofold: first, we show
that the proposed model is (at least) as general as existing batch and
streaming frameworks (e.g., Spark, Flink, Storm), thus making it easier to
understand high-level data-processing applications written in such frameworks.
Second, we provide a layered model that can represent tools and applications
following the Dataflow paradigm and we show how the analyzed tools fit in each
level. Comment: 19 pages, 6 figures, 2 tables, in Proc. of the 9th Intl Symposium on
High-Level Parallel Programming and Applications (HLPP), July 4-5, 2016,
Muenster, Germany
Auto-Parallelizing Large Models with Rhino: A Systematic Approach on Production AI Platform
We present Rhino, a system for accelerating tensor programs with automatic
parallelization on an AI platform for real production environments. It transforms a
tensor program written for a single device into an equivalent distributed
program that is capable of scaling up to thousands of devices with no user
configuration. Rhino first works on a semantically independent intermediate
representation of tensor programs, which facilitates its generalization to
unprecedented applications. Additionally, it implements a task-oriented
controller and a distributed runtime for optimal performance. Rhino explores
a complete and systematic parallelization strategy space that comprises all the
paradigms commonly employed in deep learning (DL), in addition to strided
partitioning and pipeline parallelism on non-linear models. Aiming to
efficiently search for a near-optimal parallel execution plan, our analysis of
production clusters reveals general heuristics to speed up the strategy search.
On top of this, two optimization levels are designed to offer users flexible
trade-offs between the search time and strategy quality. Our experiments
demonstrate that Rhino can not only re-discover the expert-crafted strategies
of classic, research and production DL models, but also identify novel
parallelization strategies which surpass existing systems for novel models.