4,320 research outputs found
Revisiting Matrix Product on Master-Worker Platforms
This paper is aimed at designing efficient parallel matrix-product algorithms
for heterogeneous master-worker platforms. While matrix-product is
well-understood for homogeneous 2D-arrays of processors (e.g., Cannon algorithm
and ScaLAPACK outer product algorithm), there are three key hypotheses that
render our work original and innovative:
- Centralized data. We assume that all matrix files originate from, and must
be returned to, the master.
- Heterogeneous star-shaped platforms. We target fully heterogeneous
platforms, where computational resources have different computing powers.
- Limited memory. Because we investigate the parallelization of large
problems, we cannot assume that full matrix panels can be stored in the worker
memories and re-used for subsequent updates (as in ScaLAPACK).
We have devised efficient algorithms for resource selection (deciding which
workers to enroll) and communication ordering (both for input and result
messages), and we report a set of numerical experiments on various platforms at
Ecole Normale Superieure de Lyon and the University of Tennessee. However, we
point out that in this first version of the report, experiments are limited to
homogeneous platforms
From Dataflow Specification to Multiprocessor Partitioned Time-triggered Real-time Implementation *
International audienceOur objective is to facilitate the development of complex time-triggered systems by automating the allocation and scheduling steps. We show that full automation is possible while taking into account the elements of complexity needed by a complex embedded control system. More precisely, we consider deterministic functional specifications provided (as often in an industrial setting) by means of synchronous data-flow models with multiple modes and multiple relative periods. We first extend this functional model with an original real-time characterization that takes advantage of our time-triggered framework to provide a simpler representation of complex end-to-end flow requirements. We also extend our specifications with additional non-functional properties specifying partitioning, allocation , and preemptability constraints. Then, weprovide novel algorithms for the off-line scheduling of these extended specifications onto partitioned time-triggered architectures à la ARINC 653. The main originality of our work is that it takes into account at the same time multiple complexity elements: various types of non-functional properties (real-time, partitioning, allocation, preemptability) and functional specifications with conditional execution and multiple modes. Allocation of time slots/windows to partitions can be fullyor partially provided, or synthesized by our tool. Our algorithms allow the automatic allocation and scheduling onto multi-processor (distributed) sys-tems with a global time base, taking into account communication costs. We demonstrate our technique on a model of space flight software systemwith strong real-time determinism requirements
DeSyRe: on-Demand System Reliability
The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints
Parallel symbolic state-space exploration is difficult, but what is the alternative?
State-space exploration is an essential step in many modeling and analysis
problems. Its goal is to find the states reachable from the initial state of a
discrete-state model described. The state space can used to answer important
questions, e.g., "Is there a dead state?" and "Can N become negative?", or as a
starting point for sophisticated investigations expressed in temporal logic.
Unfortunately, the state space is often so large that ordinary explicit data
structures and sequential algorithms cannot cope, prompting the exploration of
(1) parallel approaches using multiple processors, from simple workstation
networks to shared-memory supercomputers, to satisfy large memory and runtime
requirements and (2) symbolic approaches using decision diagrams to encode the
large structured sets and relations manipulated during state-space generation.
Both approaches have merits and limitations. Parallel explicit state-space
generation is challenging, but almost linear speedup can be achieved; however,
the analysis is ultimately limited by the memory and processors available.
Symbolic methods are a heuristic that can efficiently encode many, but not all,
functions over a structured and exponentially large domain; here the pitfalls
are subtler: their performance varies widely depending on the class of decision
diagram chosen, the state variable order, and obscure algorithmic parameters.
As symbolic approaches are often much more efficient than explicit ones for
many practical models, we argue for the need to parallelize symbolic
state-space generation algorithms, so that we can realize the advantage of both
approaches. This is a challenging endeavor, as the most efficient symbolic
algorithm, Saturation, is inherently sequential. We conclude by discussing
challenges, efforts, and promising directions toward this goal
- …