4,972 research outputs found
Process-Oriented Collective Operations
Distributing process-oriented programs across a cluster of machines requires careful attention to the effects of network latency. The MPI standard, widely used for cluster computation, defines a number of collective operations: efficient, reusable algorithms for performing operations among a group of machines in the cluster. In this paper, we describe our techniques for implementing MPI communication patterns in process-oriented languages, and how we have used them to implement collective operations in PyCSP and occam-pi on top of an asynchronous messaging framework. We show how to make use of collective operations in distributed processoriented applications. We also show how the process-oriented model can be used to increase concurrency in existing collective operation algorithms
Design and Evaluation of a Collective IO Model for Loosely Coupled Petascale Programming
Loosely coupled programming is a powerful paradigm for rapidly creating
higher-level applications from scientific programs on petascale systems,
typically using scripting languages. This paradigm is a form of many-task
computing (MTC) which focuses on the passing of data between programs as
ordinary files rather than messages. While it has the significant benefits of
decoupling producer and consumer and allowing existing application programs to
be executed in parallel with no recoding, its typical implementation using
shared file systems places a high performance burden on the overall system and
on the user who will analyze and consume the downstream data. Previous efforts
have achieved great speedups with loosely coupled programs, but have done so
with careful manual tuning of all shared file system access. In this work, we
evaluate a prototype collective IO model for file-based MTC. The model enables
efficient and easy distribution of input data files to computing nodes and
gathering of output results from them. It eliminates the need for such manual
tuning and makes the programming of large-scale clusters using a loosely
coupled model easier. Our approach, inspired by in-memory approaches to
collective operations for parallel programming, builds on fast local file
systems to provide high-speed local file caches for parallel scripts, uses a
broadcast approach to handle distribution of common input data, and uses
efficient scatter/gather and caching techniques for input and output. We
describe the design of the prototype model, its implementation on the Blue
Gene/P supercomputer, and present preliminary measurements of its performance
on synthetic benchmarks and on a large-scale molecular dynamics application.Comment: IEEE Many-Task Computing on Grids and Supercomputers (MTAGS08) 200
CRAFT: A library for easier application-level Checkpoint/Restart and Automatic Fault Tolerance
In order to efficiently use the future generations of supercomputers, fault
tolerance and power consumption are two of the prime challenges anticipated by
the High Performance Computing (HPC) community. Checkpoint/Restart (CR) has
been and still is the most widely used technique to deal with hard failures.
Application-level CR is the most effective CR technique in terms of overhead
efficiency but it takes a lot of implementation effort. This work presents the
implementation of our C++ based library CRAFT (Checkpoint-Restart and Automatic
Fault Tolerance), which serves two purposes. First, it provides an extendable
library that significantly eases the implementation of application-level
checkpointing. The most basic and frequently used checkpoint data types are
already part of CRAFT and can be directly used out of the box. The library can
be easily extended to add more data types. As means of overhead reduction, the
library offers a build-in asynchronous checkpointing mechanism and also
supports the Scalable Checkpoint/Restart (SCR) library for node level
checkpointing. Second, CRAFT provides an easier interface for User-Level
Failure Mitigation (ULFM) based dynamic process recovery, which significantly
reduces the complexity and effort of failure detection and communication
recovery mechanism. By utilizing both functionalities together, applications
can write application-level checkpoints and recover dynamically from process
failures with very limited programming effort. This work presents the design
and use of our library in detail. The associated overheads are thoroughly
analyzed using several benchmarks
Building real-time embedded applications on QduinoMC: a web-connected 3D printer case study
Single Board Computers (SBCs) are now emerging
with multiple cores, ADCs, GPIOs, PWM channels, integrated
graphics, and several serial bus interfaces. The low power
consumption, small form factor and I/O interface capabilities of
SBCs with sensors and actuators makes them ideal in embedded
and real-time applications. However, most SBCs run non-realtime
operating systems based on Linux and Windows, and do
not provide a user-friendly API for application development. This
paper presents QduinoMC, a multicore extension to the popular
Arduino programming environment, which runs on the Quest
real-time operating system. QduinoMC is an extension of our earlier
single-core, real-time, multithreaded Qduino API. We show
the utility of QduinoMC by applying it to a specific application: a
web-connected 3D printer. This differs from existing 3D printers,
which run relatively simple firmware and lack operating system
support to spool multiple jobs, or interoperate with other devices
(e.g., in a print farm). We show how QduinoMC empowers devices with the capabilities to run new services without impacting their timing guarantees. While it is possible to modify existing operating systems to provide suitable timing guarantees, the effort to do so is cumbersome and does not provide the ease of programming afforded by QduinoMC.http://www.cs.bu.edu/fac/richwest/papers/rtas_2017.pdfAccepted manuscrip
- …