54,895 research outputs found
A Generic Parallel Pattern Interface for Stream and Data Processing
Current parallel programming frameworks aid developers to a great extent in implementing applications that exploit parallel hardware resources. Nevertheless, developers require additional expertise to properly use and tune them to operate efficiently on specific parallel platforms. On the other hand, porting applications between different parallel programming models and platforms is not straightforward and demands considerable effort and specific knowledge. Apart from that, the lack of high-level parallel pattern abstractions in those frameworks further increases the complexity of developing parallel applications. To pave the way in this direction, this paper proposes GRPPI, a generic and reusable parallel pattern interface for both stream processing and data-intensive C++ applications. GRPPI provides a layer between developers and existing parallel programming frameworks targeting multi-core processors, such as C++ threads, OpenMP and Intel TBB, and accelerators, such as CUDA Thrust. Furthermore, thanks to its high-level C++ application programming interface and pattern composability features, GRPPI allows users to easily expose parallelism via standalone patterns or pattern compositions matching sequential applications. We evaluate this interface using an image processing use case and demonstrate its benefits from the usability, flexibility, and performance points of view. Furthermore, we analyze the impact of using stream and data pattern compositions on CPUs, GPUs and heterogeneous configurations. This work has been partially supported by the EU project ICT 644235 “REPHRASE: REfactoring Parallel Heterogeneous Resource-aware Applications” and the Spanish “Ministerio de Economía y Competitividad” under the grant TIN2016-79673-P “Towards Unification of HPC and Big Data Paradigms”.
Exploiting stream parallelism of MRI reconstruction using GrPPI over multiple back-ends
Proceedings of: 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), Larnaca, Cyprus, 14-17 May 2019. In recent years, on-line processing of data streams has been established as a major computing paradigm. This is due mainly to two reasons: first, more and more data that need to be processed are generated in near real-time; second, there is a need for efficient parallel applications. However, these areas pose a tough challenge to traditional data-analysis techniques, which have been forced to evolve to a stream perspective. In this work we present a comparative study of a stream-aware multi-staged application, which has been implemented using GrPPI, a generic and reusable parallel pattern interface for C++ applications. We demonstrate the benefits of using this interface in terms of programmability, performance, and scalability. This work was supported by the EU project “ASPIDE: Exascale Programing Models for Extreme Data Processing” under grant 80109
Paving the way towards high-level parallel pattern interfaces for data stream processing
The emergence of Internet of Things (IoT) data stream applications has posed a number of new challenges to existing infrastructures, processing engines, and programming models. In this sense, high-level interfaces, encapsulating algorithmic aspects in pattern-based constructions, have considerably reduced the development and parallelization efforts of this type of application. An example of a parallel pattern interface is GrPPI, a C++ generic high-level library that acts as a layer between developers and existing parallel programming frameworks, such as C++ threads, OpenMP and Intel TBB. In this paper, we complement the basic patterns supported by GrPPI with the new stream operators Split-Join and Window, and the advanced parallel patterns Stream-Pool, Windowed-Farm and Stream-Iterator for the aforementioned back ends. Thanks to these new stream operators, complex compositions among streaming patterns can be expressed. On the other hand, the collection of advanced patterns allows users to tackle some domain-specific applications, ranging from the evolutionary to the real-time computing areas, where compositions of basic patterns are not capable of fully mimicking the algorithmic behavior of their original sequential codes. The experimental evaluation of the new advanced patterns and the stream operators on a set of domain-specific use-cases, using different back ends and pattern-specific parameters, reports considerable performance gains with respect to the sequential versions. Additionally, we demonstrate the benefits of the GrPPI pattern interface from the usability, flexibility and readability points of view. This work was partially supported by the EU project ICT 644235 “RePhrase: REfactoring Parallel Heterogeneous Resource-Aware Applications” and the project TIN2013-41350-P “Scalable Data Management Techniques for High-End Computing Systems” from the Ministerio de Economía y Competitividad, Spain
Engineering Crowdsourced Stream Processing Systems
A crowdsourced stream processing system (CSP) is a system that incorporates
crowdsourced tasks in the processing of a data stream. This can be seen as
enabling crowdsourcing work to be applied on a sample of large-scale data at
high speed, or equivalently, enabling stream processing to employ human
intelligence. It also leads to a substantial expansion of the capabilities of
data processing systems. Engineering a CSP system requires the combination of
human and machine computation elements. From a general systems theory
perspective, this means taking into account inherited as well as emerging
properties from both these elements. In this paper, we position CSP systems
within a broader taxonomy, outline a series of design principles and evaluation
metrics, present an extensible framework for their design, and describe several
design patterns. We showcase the capabilities of CSP systems by performing a
case study that applies our proposed framework to the design and analysis of a
real system (AIDR) that classifies social media messages during time-critical
crisis events. Results show that compared to a pure stream processing system,
AIDR can achieve a higher data classification accuracy, while compared to a
pure crowdsourcing solution, the system makes better use of human workers by
requiring much less manual work effort.
Stream Fusion, to Completeness
Stream processing is mainstream (again): Widely-used stream libraries are now
available for virtually all modern OO and functional languages, from Java to C#
to Scala to OCaml to Haskell. Yet expressivity and performance are still
lacking. For instance, the popular, well-optimized Java 8 streams do not
support the zip operator and are still an order of magnitude slower than
hand-written loops. We present the first approach that represents the full
generality of stream processing and eliminates overheads, via the use of
staging. It is based on an unusually rich semantic model of stream interaction.
We support any combination of zipping, nesting (or flat-mapping), sub-ranging,
filtering, and mapping of finite or infinite streams. Our model captures
idiosyncrasies that a programmer uses in optimizing stream pipelines, such as
rate differences and the choice of "for" vs. "while" loops. Our approach
delivers hand-written-like code, but automatically. It explicitly avoids the
reliance on black-box optimizers and sufficiently-smart compilers, offering
highest, guaranteed and portable performance. Our approach relies on high-level
concepts that are then readily mapped into an implementation. Accordingly, we
have two distinct implementations: an OCaml stream library, staged via
MetaOCaml, and a Scala library for the JVM, staged via LMS. In both cases, we
derive libraries richer and simultaneously many tens of times faster than past
work. We greatly exceed in performance the standard stream libraries available
in Java, Scala and OCaml, including the well-optimized Java 8 streams.
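To illustrate what "hand-written-like code" means for a stream pipeline, here is a minimal Python sketch. It is not the staging approach of the abstract (which uses MetaOCaml and LMS); it only contrasts a pipeline composed of library operators with an equivalent single loop written by hand. The function names `pipeline_version` and `fused_version` are hypothetical and chosen for this illustration.

```python
from itertools import count, islice


def pipeline_version(xs, n):
    """Compose operators: filter, map, zip with an infinite stream, take n."""
    evens = filter(lambda x: x % 2 == 0, xs)
    squares = map(lambda x: x * x, evens)
    # count(1) is an infinite stream 1, 2, 3, ...; zip stops when squares ends.
    zipped = zip(squares, count(1))
    return list(islice(zipped, n))


def fused_version(xs, n):
    """The same computation as one hand-fused loop with no intermediate streams."""
    out = []
    index = 1  # plays the role of the infinite stream 1, 2, 3, ...
    for x in xs:
        if len(out) == n:  # corresponds to taking only the first n results
            break
        if x % 2 == 0:  # corresponds to the filter step
            out.append((x * x, index))  # map and zip in a single step
            index += 1
    return out


print(pipeline_version(range(10), 3))  # [(0, 1), (4, 2), (16, 3)]
print(fused_version(range(10), 3))     # [(0, 1), (4, 2), (16, 3)]
```

Both functions return the same result; the second is the kind of loop a programmer would write by hand, which a fusion approach aims to produce automatically from the first form.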
Taking advantage of hybrid systems for sparse direct solvers via task-based runtimes
The ongoing hardware evolution exhibits an escalation in the number, as well
as in the heterogeneity, of computing resources. The pressure to maintain
reasonable levels of performance and portability forces application developers
to leave the traditional programming paradigms and explore alternative
solutions. PaStiX is a parallel sparse direct solver, based on a dynamic
scheduler for modern hierarchical manycore architectures. In this paper, we
study the benefits and limits of replacing the highly specialized internal
scheduler of the PaStiX solver with two generic runtime systems: PaRSEC and
StarPU. The task graph of the factorization step is made available to the two
runtimes, providing them the opportunity to process and optimize its traversal
in order to maximize the algorithm efficiency for the targeted hardware
platform. A comparative study of the performance of the PaStiX solver on top of
its native internal scheduler, PaRSEC, and StarPU frameworks, on different
execution environments, is performed. The analysis highlights that these
generic task-based runtimes achieve comparable results to the
application-optimized embedded scheduler on homogeneous platforms. Furthermore,
they are able to significantly speed up the solver on heterogeneous
environments by taking advantage of the accelerators while hiding the
complexity of their efficient manipulation from the programmer. Comment: Heterogeneity in Computing Workshop (2014)