2,245 research outputs found

Managing Communication Latency-Hiding at Runtime for Parallel Programming Languages and Libraries

Authors: Kristensen Mads Ruben Burgdorff; Vinter Brian
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2012

This work introduces a runtime model for managing communication with support for latency-hiding. The model enables non-computer science researchers to exploit communication latency-hiding techniques seamlessly. For compiled languages, it is often possible to create efficient schedules for communication, but this is not the case for interpreted languages. By maintaining data dependencies between scheduled operations, it is possible to aggressively initiate communication and lazily evaluate tasks to allow maximal time for the communication to finish before entering a wait state. We implement a heuristic of this model in DistNumPy, an auto-parallelizing version of numerical Python that allows sequential NumPy programs to run on distributed memory architectures. Furthermore, we present performance comparisons for eight benchmarks with and without automatic latency-hiding. The results show that our model reduces the time spent waiting for communication by as much as a factor of 27, from a maximum of 54% to only 2% of the total execution time, in a stencil application. Comment: PREPRIN
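
The scheduling idea in this abstract (start communication eagerly, defer computation, and block only when nothing else can run) can be illustrated with a minimal Python sketch. This is not DistNumPy's implementation or API: `fake_transfer`, `run_with_latency_hiding`, and the task tuples are hypothetical names, and a thread pool with `time.sleep` stands in for non-blocking network transfers.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED


def fake_transfer(name, seconds):
    """Hypothetical stand-in for a non-blocking receive; it only sleeps."""
    time.sleep(seconds)
    return name


def run_with_latency_hiding(transfers, tasks):
    """Run tasks in an order that overlaps computation with communication.

    transfers: dict mapping a transfer name to its simulated duration.
    tasks: list of (label, needed_transfer_or_None, callable) tuples.
    Returns the labels in the order the tasks were executed.
    """
    order = []
    with ThreadPoolExecutor(max_workers=max(1, len(transfers))) as pool:
        # Aggressively initiate every communication up front.
        pending = {name: pool.submit(fake_transfer, name, secs)
                   for name, secs in transfers.items()}
        waiting = list(tasks)  # tasks are recorded, not executed yet
        while waiting:
            # A task is ready when it needs no data or its data has arrived.
            ready = [t for t in waiting
                     if t[1] is None or pending[t[1]].done()]
            if not ready:
                # Nothing is runnable: only now enter a wait state.
                wait([pending[t[1]] for t in waiting],
                     return_when=FIRST_COMPLETED)
                continue
            for task in ready:
                label, _, fn = task
                fn()
                order.append(label)
                waiting.remove(task)
    return order


if __name__ == "__main__":
    # The boundary update needs remote halo data; the interior update does not.
    tasks = [("boundary", "halo", lambda: None),
             ("interior", None, lambda: None)]
    print(run_with_latency_hiding({"halo": 0.3}, tasks))
```

In this toy stencil-like setup the interior update is listed second but runs first, because its data dependency is already satisfied while the halo transfer is still in flight.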

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

Flush communication channels: Effective implementation and verification

Author: Camp Tracy Kay
Publication venue: W&M ScholarWorks
Publication date: 01/01/1993

Flush communication channels, or F-channels, generalize more conventional asynchronous communication paradigms. A distributed system which uses an F-channel allows a programmer to define the delivery order of each message in relation to other messages transmitted on the channel. Unreliable datagrams and FIFO (first-in-first-out) communication channels have strictly defined delivery semantics. No restrictions are allowed on message delivery order with unreliable datagrams: message delivery is completely unordered. FIFO channels, on the other hand, insist messages are delivered in the order of their transmission. Flush channels can provide either of these delivery order semantics; in addition, F-channels allow the user to define the delivery of a message to be after the delivery of all messages previously transmitted or before the delivery of all messages subsequently transmitted or both. A system which communicates with a flush channel has a message delivery order that is a partial order. Dynamically specifying a partial message delivery order complicates many aspects of how we implement and reason about the communication channel. From the system's perspective, we develop a feasible implementation protocol and prove its correctness. The protocol effectively handles the partially ordered message delivery. From the user's perspective, we derive an axiomatic verification methodology for flush applications. The added flexibility of defining the delivery order dynamically slightly increases the complexity for the application programmer. Our verification work helps the user effectively deal with the partially ordered message delivery in flush communication.
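
The delivery-order rules described in this abstract can be made concrete with a small checker. The sketch below is an illustration only, not the protocol from the dissertation: the `Kind` names and the functions `must_precede` and `is_valid_delivery` are hypothetical, chosen to mirror the four cases in the abstract (no constraint, after all previous messages, before all subsequent messages, or both).

```python
from enum import Enum
from itertools import combinations


class Kind(Enum):
    """Hypothetical labels for the per-message ordering choices."""
    ORDINARY = "ordinary"                    # no ordering constraint
    AFTER_PREVIOUS = "after_previous"        # after all earlier sends
    BEFORE_SUBSEQUENT = "before_subsequent"  # before all later sends
    BOTH = "both"                            # both constraints at once


def must_precede(kinds, i, j):
    """True if message i (sent before message j) must be delivered first."""
    return (kinds[j] in (Kind.AFTER_PREVIOUS, Kind.BOTH)
            or kinds[i] in (Kind.BEFORE_SUBSEQUENT, Kind.BOTH))


def is_valid_delivery(kinds, delivery):
    """Check a delivery order against the partial order implied by kinds.

    kinds: list of Kind values, indexed by send order.
    delivery: list of message indices in the order they were delivered.
    """
    if sorted(delivery) != list(range(len(kinds))):
        return False  # every message must be delivered exactly once
    position = {msg: pos for pos, msg in enumerate(delivery)}
    return all(position[i] < position[j]
               for i, j in combinations(range(len(kinds)), 2)
               if must_precede(kinds, i, j))


if __name__ == "__main__":
    kinds = [Kind.ORDINARY, Kind.ORDINARY, Kind.BOTH, Kind.ORDINARY]
    print(is_valid_delivery(kinds, [1, 0, 2, 3]))  # reordering before the barrier
    print(is_valid_delivery(kinds, [0, 2, 1, 3]))  # message 1 overtaken by the barrier
```

With every message marked `ORDINARY` the checker accepts any permutation, matching unordered datagrams; with every message marked `AFTER_PREVIOUS` it accepts only the send order, matching a FIFO channel.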

College of William & Mary: W&M Publish

Flush communication channels: Effective implementation and verification

Authors: Horton John; Kraftl Peter
Publication venue: W&M ScholarWorks
Publication date: 01/01/1993

University of Northampton's Research Explorer

College of William & Mary: W&M Publish

Doctor of Philosophy

Author: Tew Kevin B.
Publication venue: University of Utah
Publication date: 01/08/2013

Places and distributed places bring new support for message-passing parallelism to Racket. This dissertation describes the programming model and how Racket's sequential runtime system was modified to support places and distributed places. The freedom to design the places programming model helped make the implementation tractable; specifically, the conventional pain of adding just the right amount of locking to a big, legacy runtime system was avoided. The dissertation presents an evaluation of the places design that includes both real-world applications and standard parallel benchmarks. Distributed places are introduced as a language extension of the places design and architecture. The distributed places extension augments places with the features of remote process launch, remote place invocation, and distributed message passing. Distributed places provide a foundation for constructing higher-level distributed frameworks. Example implementations of RPC, MPI, map reduce, and nested data parallelism demonstrate the extensibility of the distributed places API.

The University of Utah: J. Willard Marriott Digital Library

A Message-Passing, Thread-Migrating Operating System for a Non-Cache-Coherent Many-Core Architecture

Author: Ziwisky Michael W.
Publication venue: e-Publications@Marquette
Publication date: 01/07/2012

The difference between emerging many-core architectures and their multi-core predecessors goes beyond just the number of cores incorporated on a chip. Current technologies for maintaining cache coherency are not scalable beyond a few dozen cores, and a lack of coherency presents a new paradigm for software developers to work with. While shared memory multithreading has been a viable and popular programming technique for multi-cores, the distributed nature of many-cores is more amenable to a model of share-nothing, message-passing threads. This model places different demands on a many-core operating system, and this thesis aims to understand and accommodate those demands. We introduce Xipx, a port of the lightweight Embedded Xinu operating system to the many-core Intel Single-chip Cloud Computer (SCC). The SCC is a 48-core x86 architecture that lacks cache coherency. It features a fast mesh network-on-chip (NoC) and on-die message passing buffers to facilitate message-passing communications between cores. Running as a separate instance per core, Xipx takes advantage of this hardware in its implementation of a message-passing device. The device multiplexes the message passing hardware, thereby allowing multiple concurrent threads to share the hardware without interfering with each other. Xipx also features a limited framework for transparent thread migration. This achievement required fundamental modifications to the kernel, including incorporation of a new type of thread. Additionally, a minimalistic framework for bare-metal development on the SCC has been produced as a pragmatic offshoot of the work on Xipx. This thesis discusses the design and implementation of the many-core extensions described above. 
While Xipx serves as a foundation for continued research on many-core operating systems, test results show good performance from both message passing and thread migration, suggesting that, as it stands, Xipx is also an effective platform for exploring many-core development at the application level.

epublications@Marquette

Distributed dispatchers for partially clairvoyant schedulers

Author: Yellajyosula Kiran S.
Publication venue: The Research Repository @ WVU
Publication date: 01/12/2003

This work focuses on the empirical evaluation of distributed dispatching strategies on shared and distributed memory architectures for hard real-time systems. The dispatching model accommodates process parameter variability and analyzes the effect of variable execution times. Hard real-time systems are modeled in the E-T-C scheduling framework and dispatched if a valid schedule exists. We examine the dispatchability of Partially Clairvoyant schedules of different sizes and varying deadlines under reasonable assumptions. The effect of scaling up the number of processors used by the dispatcher is also studied. The results validate the superiority of the distributed strategies over sequential dispatching, as well as their scalability. Certain system limitations that led to Loss of Dispatchability in the experiments are pointed out. The model finds applications in diverse areas such as safety-critical systems, robotics and machine control, and real-time data management, and this approach is targeted at powering up the controllers.

The Research Repository @ WVU (West Virginia University)

On mapping distributed S-Net to the 48-core Intel SCC processor

Authors: Bakker R.; Grelck C.; Jesshope C.R.; van Tol M.W.; Verstraaten M.
Publication venue: KIT Scientific Publishing
Publication date: 01/01/2011

International Migration, Integration and Social Cohesion online publications
