9 research outputs found

The run-time system for parallel programs using Kahn process networks on multi-core machines—a flexible alternative to MapReduce

Publication venue: Springer

The Nornir run-time system for parallel programs using Kahn process networks on multi-core machines – A flexible alternative to MapReduce

Author: Beskow Paul
Espeland Håvard
Griwodz Carsten
Halvorsen Pål
Johansen Dag
Vrba Zeljko
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

Even though shared-memory concurrency is a paradigm frequently used for developing parallel applications on small- and middle-sized machines, experience has shown that it is hard to use. This is largely caused by synchronization primitives which are low-level, inherently non-deterministic, and, consequently, non-intuitive to use. In this paper, we present the Nornir run-time system. Nornir is comparable to well-known frameworks such as MapReduce and Dryad that are recognized for their efficiency and simplicity. Unlike these frameworks, Nornir also supports process structures containing branches and cycles. Nornir is based on the formalism of Kahn process networks, which is a shared-nothing, message-passing model of concurrency. We deem this model a simple and deterministic alternative to shared-memory concurrency. Experiments with real and synthetic benchmarks on up to 8 CPUs show that performance in most cases scales almost linearly with the number of CPUs, when not limited by data dependencies. We also show that the modeling flexibility allows Nornir to outperform its MapReduce counterparts using well-known benchmarks. This article is distributed under the terms of the Creative Commons Attribution Noncommercial License, which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Springer - Publisher Connector

Munin - Open Research Archive

NORA - Norwegian Open Research Archives

The Nornir run-time system for parallel programs using Kahn process networks on multi-core machines—a flexible alternative to MapReduce

Author: A Olson
B Gedik
B He
C Olston
C Ranger
Carsten Griwodz
Dag Johansen
DE Knuth
E Kock de
EA Lee
G Allen
H Chih Yang
Håvard Espeland
J Armstrong
J Dean
J Giacomoni
M Geilen
M Isard
M Thompson
MI Gordon
NS Arora
P Hudak
PA Buhr
Paul Beskow
Pål Halvorsen
R Chaiken
R Lämmel
R Pike
SV Valvåg
U Catalyurek
Ž Vrba
Željko Vrba
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Dynamic adaptation and distribution of binaries to heterogeneous architectures

Author: Kristiansen Espen Angell
Publication date: 01/01/2011

Real-time multimedia workloads require progressively more processing power. Modern many-core architectures provide enough processing power to satisfy the requirements of many real-time multimedia workloads. When even they are unable to satisfy processing power requirements, network distribution can provide many workloads with even more computing power. In this thesis, we present solutions that can be used to make it practical to use the processing power that networks of many-core architectures can provide. The research focuses on solutions that can be included in our Parallel Processing Graphs (P2G) project. We have developed the foundation for network distribution in P2G, and we have suggested a viable solution for execution of workloads on heterogeneous multi-core architectures.

NORA - Norwegian Open Research Archives

StreamDrive: A Dynamic Dataflow Framework for Clustered Embedded Architectures

Author: Arthur Stoutchinin
Luca Benini
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2019

In this paper, we present StreamDrive, a dynamic dataflow framework for programming clustered embedded multicore architectures. StreamDrive simplifies development of dynamic dataflow applications starting from sequential reference C code and allows seamless handling of heterogeneous and application-specific processing elements by applications. We address issues of efficient implementation of the dynamic dataflow runtime system in the context of constrained embedded environments, which have not been sufficiently addressed by previous research. We conducted a detailed performance evaluation of the StreamDrive implementation on our Application Specific MultiProcessor (ASMP) cluster using the Oriented FAST and Rotated BRIEF (ORB) algorithm, which is typical of the image processing domain. We have used the proposed incremental development flow for the transformation of the original ORB reference C code into an optimized dynamic dataflow implementation. Our implementation has less than 10% parallelization overhead, shows near-linear speedup when the number of processors increases from 1 to 8, and achieves 15 VGA frames per second with a small cluster configuration of 4 processing elements and 64KB of shared memory, and 30 VGA frames per second with 8 processors and 128KB of shared memory.

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

A Dataflow Framework For Developing Flexible Embedded Accelerators A Computer Vision Case Study.

Author: Stoutchinin Arthur <1967>
Publication venue: Alma Mater Studiorum - Università di Bologna
Publication date: 08/04/2019

The focus of this dissertation is the design and the implementation of a computing platform which can accelerate data processing in the embedded computation domain. We focus on a heterogeneous computing platform, whose hardware implementation can approach the power and area efficiency of specialized designs, while remaining flexible across the application domain. Multi-core architectures require parallel programming, which is widely regarded as more challenging than sequential programming. Although shared-memory parallel programs may be fairly easy to write (using OpenMP, for example), they are quite hard to optimize; providing embedded application developers with optimizing tools and programming frameworks is a challenge. The heterogeneous specialized elements make the problem even more difficult. Dataflow is a parallel computation model that relies exclusively on message passing, and that has some advantages over parallel programming tools in wide use today: simplicity, graphical representation, and determinism. The dataflow model is also a good match for streaming applications, such as audio, video and image processing, which operate on large sequences of data and are characterized by abundant parallelism and regular memory access patterns. The dataflow model of computation has gained acceptance in the simulation and signal-processing communities. This thesis evaluates the applicability of the dataflow model for implementing domain-specific embedded accelerators for streaming applications.

AMS Tesi di Dottorato

The Nornir run-time system for parallel programs using Kahn process networks on multi-core machines-a flexible alternative to MapReduce

Author: Beskow Paul
Espeland Håvard
Griwodz Carsten
Halvorsen Pål
Johansen Dag
Vrba Zeljko
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Munin - Open Research Archive

NORA - Norwegian Open Research Archives