3,135 research outputs found
A Comparison of Big Data Frameworks on a Layered Dataflow Model
In the world of Big Data analytics, there is a series of tools aiming at
simplifying programming applications to be executed on clusters. Although each
tool claims to provide better programming, data and execution models, for which
only informal (and often confusing) semantics is generally provided, all share
a common underlying model, namely, the Dataflow model. The Dataflow model we
propose shows how various tools share the same expressiveness at different
levels of abstraction. The contribution of this work is twofold: first, we show
that the proposed model is (at least) as general as existing batch and
streaming frameworks (e.g., Spark, Flink, Storm), thus making it easier to
understand high-level data-processing applications written in such frameworks.
Second, we provide a layered model that can represent tools and applications
following the Dataflow paradigm and we show how the analyzed tools fit in each
level.Comment: 19 pages, 6 figures, 2 tables, In Proc. of the 9th Intl Symposium on
High-Level Parallel Programming and Applications (HLPP), July 4-5 2016,
Muenster, German
Parallel Processing of Large Graphs
More and more large data collections are gathered worldwide in various IT
systems. Many of them possess the networked nature and need to be processed and
analysed as graph structures. Due to their size they require very often usage
of parallel paradigm for efficient computation. Three parallel techniques have
been compared in the paper: MapReduce, its map-side join extension and Bulk
Synchronous Parallel (BSP). They are implemented for two different graph
problems: calculation of single source shortest paths (SSSP) and collective
classification of graph nodes by means of relational influence propagation
(RIP). The methods and algorithms are applied to several network datasets
differing in size and structural profile, originating from three domains:
telecommunication, multimedia and microblog. The results revealed that
iterative graph processing with the BSP implementation always and
significantly, even up to 10 times outperforms MapReduce, especially for
algorithms with many iterations and sparse communication. Also MapReduce
extension based on map-side join usually noticeably presents better efficiency,
although not as much as BSP. Nevertheless, MapReduce still remains the good
alternative for enormous networks, whose data structures do not fit in local
memories.Comment: Preprint submitted to Future Generation Computer System
Design of testbed and emulation tools
The research summarized was concerned with the design of testbed and emulation tools suitable to assist in projecting, with reasonable accuracy, the expected performance of highly concurrent computing systems on large, complete applications. Such testbed and emulation tools are intended for the eventual use of those exploring new concurrent system architectures and organizations, either as users or as designers of such systems. While a range of alternatives was considered, a software based set of hierarchical tools was chosen to provide maximum flexibility, to ease in moving to new computers as technology improves and to take advantage of the inherent reliability and availability of commercially available computing systems
Shape-based cost analysis of skeletal parallel programs
Institute for Computing Systems ArchitectureThis work presents an automatic cost-analysis system for an implicitly parallel skeletal
programming language.
Although deducing interesting dynamic characteristics of parallel programs (and in
particular, run time) is well known to be an intractable problem in the general case, it
can be alleviated by placing restrictions upon the programs which can be expressed.
By combining two research threads, the “skeletal” and “shapely” paradigms which
take this route, we produce a completely automated, computation and communication
sensitive cost analysis system. This builds on earlier work in the area by quantifying
communication as well as computation costs, with the former being derived for the
Bulk Synchronous Parallel (BSP) model.
We present details of our shapely skeletal language and its BSP implementation strategy
together with an account of the analysis mechanism by which program behaviour
information (such as shape and cost) is statically deduced. This information can be
used at compile-time to optimise a BSP implementation and to analyse computation
and communication costs. The analysis has been implemented in Haskell. We consider
different algorithms expressed in our language for some example problems and
illustrate each BSP implementation, contrasting the analysis of their efficiency by traditional,
intuitive methods with that achieved by our cost calculator. The accuracy of
cost predictions by our cost calculator against the run time of real parallel programs is
tested experimentally.
Previous shape-based cost analysis required all elements of a vector (our nestable bulk
data structure) to have the same shape. We partially relax this strict requirement on data
structure regularity by introducing new shape expressions in our analysis framework.
We demonstrate that this allows us to achieve the first automated analysis of a complete
derivation, the well known maximum segment sum algorithm of Skillicorn and Cai
- …