71 research outputs found
Formal derivation of concurrent assignments from scheduled single assignments
Concurrent assignments are commonly used to describe
synchronous parallel computations. We show how a sequence of concurrent
assignments can be formally derived from the schedule of an acyclic single
assignment task graph and a memory allocation. In order to do this we
develop a formal model of memory allocation in synchronous systems. We
use weakest precondition semantics to show that the sequence of concurrent
assignments computes the same values as the scheduled single assignments.
We give a lower bound on the memory requirements of memory allocations for
a given schedule. This bound is tight: we define a class of memory
allocations whose memory requirements always meet the bound. This class
corresponds to conventional register allocation for DAGs and is suitable
when memory access times are uniform. We furthermore define a class of
simple ``shift register'' memory allocations. These allocations have the
advantage of a minimum of explicit storage control and they yield local or
nearest-neighbour accesses in distributed systems whenever the schedule
allows this. Thus, this class of allocations is suitable when designing
parallel special-purpose hardware, like systolic arrays.
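The lower bound mentioned above can be illustrated concretely. A minimal Python sketch (the task names, times, and the `max_live` helper are our own illustration, not the paper's formalism): for a fixed schedule, no memory allocation can use fewer cells than the maximum number of simultaneously live values.

```python
def max_live(schedule):
    """Lower bound on memory cells for a schedule.

    schedule: dict task -> (t_produced, t_last_use).  A value is live
    from the step it is produced until its last use, inclusive.
    """
    events = []
    for t_prod, t_last in schedule.values():
        events.append((t_prod, 1))       # value becomes live
        events.append((t_last + 1, -1))  # value dies after its last use
    live = peak = 0
    for _, delta in sorted(events):      # deaths sort before births at a tie
        live += delta
        peak = max(peak, live)
    return peak

# A small DAG: a and b feed c; c feeds d.
schedule = {"a": (0, 2), "b": (1, 2), "c": (2, 3), "d": (3, 3)}
print(max_live(schedule))  # three values are live at once at step 2
```

The sweep over start/end events is the standard "maxlive" computation used in register allocation for DAGs.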
Computing transitive closure on systolic arrays of fixed size
Forming the transitive closure of a binary relation (or directed graph) is an important part of many algorithms. When the relation is represented by a bit matrix, the transitive closure can be computed efficiently in parallel on a systolic array. Various such arrays for computing the transitive closure have been proposed. They all have in common, though, that the size of the array must be proportional to the number of nodes. Here we propose two ways of computing the transitive closure of an arbitrarily large graph on a systolic array of fixed size. The first method is a simple partitioning of a well-known systolic algorithm for computing the transitive closure. The second is a block-structured algorithm for computing the transitive closure. This algorithm is suitable for execution on a systolic array that can multiply fixed-size bit matrices and compute the transitive closure of graphs with a fixed number of nodes. The algorithm is, however, not limited to systolic array implementations; it works on any parallel architecture that can efficiently form the transitive closure and product of fixed-size bit matrices. The shortest path problem for directed graphs with weighted edges can also be solved for arbitrarily large graphs on a fixed-size systolic array, in the same manner as the transitive closure is computed above.
Data Cache Locking for Higher Program Predictability
ABSTRACT Caches have become increasingly important with the widening gap between main memory and processor speeds. However, they are a source of unpredictability due to their characteristics, resulting in programs behaving differently than expected. Cache locking mechanisms adapt caches to the needs of real-time systems. Locking the cache is a solution that trades performance for predictability: at the cost of generally lower performance, the time to access memory becomes predictable. This paper combines compile-time cache analysis with data cache locking to estimate the worst-case memory performance (WCMP) in a safe, tight and fast way. To obtain predictable cache behavior, we first lock the cache for those parts of the code where the static analysis fails. To minimize the performance degradation, our method loads the cache, if necessary, with data likely to be accessed. Experimental results show that this scheme is fully predictable, without compromising the performance of the transformed program. When compared to an algorithm that assumes compulsory misses when the state of the cache is unknown, our approach eliminates all overestimation for the set of benchmarks, giving an exact WCMP of the transformed program without any significant decrease in performance.
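The predictability argument can be sketched in a few lines (the latencies, the access trace, and the `wcmp` helper are invented for illustration, not taken from the paper): once the cache is locked, each access is statically known to hit or miss, so the WCMP is an exact sum rather than an over-approximation.

```python
HIT, MISS = 1, 10  # assumed access latencies in cycles (illustrative only)

def wcmp(accesses, locked):
    """Worst-case memory performance with a locked cache.

    Every access is statically a hit (its line is locked in) or a miss,
    so the bound is exact: no cache-state uncertainty remains.
    """
    return sum(HIT if a in locked else MISS for a in accesses)

trace = ["x", "y", "x", "z", "x"]
locked = {"x"}                 # lock the most frequently accessed line
print(wcmp(trace, locked))     # 3 hits + 2 misses
```

Contrast this with an analysis that must assume a miss whenever the cache state is unknown: it would charge MISS for all five accesses.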
Timing Analysis of Parallel Software Using Abstract Execution
Abstract. A major trend in computer architecture is multi-core processors. To fully exploit this type of parallel processor chip, programs running on it will have to be parallel as well. This means that even hard real-time embedded systems will be parallel. Therefore, it is of utmost importance that methods to analyze the timing properties of parallel real-time systems are developed. This paper presents an algorithm that is founded on abstract interpretation and derives safe approximations of the execution times of parallel programs. The algorithm is formulated and proven correct for a simple parallel language with parallel threads, shared memory and synchronization via locks.
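The flavor of such an analysis can be sketched with a toy abstract domain (our own simplification, not the paper's algorithm): execution times are abstracted as intervals, sequential composition adds them, and control-flow joins take the union, so the upper end of the resulting interval is a safe execution-time bound.

```python
def seq(a, b):
    """Sequential composition: interval addition of [lo, hi] times."""
    return (a[0] + b[0], a[1] + b[1])

def join(a, b):
    """Merge abstract states at a control-flow join: interval union."""
    return (min(a[0], b[0]), max(a[1], b[1]))

# stmt taking exactly 2 cycles, then an if-then-else whose branches
# take [1, 3] and [4, 5] cycles respectively
body = seq((2, 2), join((1, 3), (4, 5)))
print(body)  # the hi component is a safe WCET bound for this fragment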
Technical Report: Feedback-Based Generation of Hardware Characteristics
ABSTRACT In large, complex server-like computer systems it is difficult to characterise hardware usage in early stages of system development. Often the applications that will run on the platform are not ready at the time of platform deployment, leading to postponed metrics measurement. In our study we seek answers to the questions: (1) Can we use a feedback-based control system to create a characteristics model of a real production system? (2) Can such a model be sufficiently accurate to detect characteristics changes instead of executing the production application? The model we have created runs a signalling application, similar to the production application, together with a PID-regulator generating L1 and L2 cache misses to the same extent as the production system. Our measurements indicate that we have managed to mimic a similar environment regarding cache characteristics. Additionally, we have applied the model to a software update for a production system and detected characteristics changes using the model. This has later been verified on the complete production system, which in this study is a large-scale telecommunication system with a substantial market share.
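The feedback loop can be sketched as a discrete PI-regulator (the gains, the plant model, and all constants below are invented for illustration; the paper's regulator steers real L1/L2 cache-miss counters): the controller adjusts a load-generation knob until the measured rate tracks the target.

```python
def pi_step(target, measured, integral, kp=0.5, ki=0.1):
    """One step of a discrete PI controller; returns (output, new integral)."""
    error = target - measured
    integral += error
    return kp * error + ki * integral, integral

measured, integral = 0.0, 0.0
for _ in range(50):
    adjust, integral = pi_step(10.0, measured, integral)
    measured += 0.8 * adjust   # toy plant: the load responds to the knob
print(round(measured, 2))      # converges to the target rate
```

In the paper's setting the "plant" is the cache subsystem itself: the regulator's output controls how aggressively the synthetic load touches memory, and the feedback signal is the hardware miss counter.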
The WCET Tool Challenge 2011
Following the successful WCET Tool Challenges in 2006 and 2008, the third event in this series was organized in 2011, again with support from the ARTIST DESIGN Network of Excellence. Following the practice established in the previous Challenges, the WCET Tool Challenge 2011 (WCC'11) defined two kinds of problems to be solved by the Challenge participants with their tools: WCET problems, which ask for bounds on the execution time, and flow-analysis problems, which ask for bounds on the number of times certain parts of the code can be executed. The benchmarks to be used in WCC'11 were debie1, PapaBench, and an industrial-strength application from the automotive domain provided by Daimler AG. Two default execution platforms were suggested to the participants, the ARM7 as "simple target" and the MPC5553/5554 as "complex target", but participants were free to use other platforms as well. Ten tools participated in WCC'11: aiT, Astrée, Bound-T, FORTAS, METAMOC, OTAWA, SWEET, TimeWeaver, TuBound and WCA.
Unfolding of Programs with Nondeterminism
In [13], some results were proved regarding properties of unfolding of purely functional programs. In particular, a theorem was shown that relates the termination of symbolic evaluation of a "less instantiated" term to the termination of a "more instantiated" term. An application is partial evaluation, where unfolding of function definitions is frequently performed to enable further simplifications of the resulting specialized program. The unfolding must then be kept under control to ensure that the partial evaluation terminates. In this paper, we extend the termination result from purely functional programs to programs with nondeterministic operations. We give an operational semantics where the behaviour of operators is defined through rewrite rules: nondeterminism then occurs when the resulting term rewriting system is nonconfluent. For the confluent part, the previous termination results carry over. It is, however, not guaranteed in general that the resulting unfolded program has..
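The role of nonconfluence can be illustrated with a toy rewrite system (the term encoding and the `amb` choice operator are our own, not the paper's formalism): a nondeterministic operator gives a term more than one normal form, which is exactly what makes unrestricted unfolding delicate.

```python
def rewrites(t):
    """All one-step rewrites of a term (tuples are applications, ints values)."""
    if not isinstance(t, tuple):
        return []
    out = []
    if t[0] == "amb":                       # nonconfluent: two root rules
        out += [t[1], t[2]]
    if t[0] == "add" and all(isinstance(a, int) for a in t[1:]):
        out.append(t[1] + t[2])             # confluent arithmetic rule
    for i, a in enumerate(t[1:], 1):        # rewrite inside subterms
        out += [t[:i] + (r,) + t[i + 1:] for r in rewrites(a)]
    return out

def normal_forms(t):
    """The set of normal forms reachable from t."""
    rs = rewrites(t)
    if not rs:
        return {t}
    return set().union(*(normal_forms(r) for r in rs))

# amb(1, 2) + 3 has two normal forms: the system is nonconfluent.
print(normal_forms(("add", ("amb", 1, 2), 3)))
```

Unfolding a definition containing `amb` must therefore preserve the whole set of possible outcomes, not just one of them, which is the kind of obligation the paper's termination result has to respect.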
Computing in Unpredictable Environments: Semantics, Reduction Strategies, and Program Transformations
We study systems where deterministic computations take place in
environments which may behave nondeterministically. We give a simple
formalization by unions of abstract reduction systems, on which various
semantics can be based in a straightforward manner. We then prove that under
a simple condition on the reduction systems, the following holds: reduction
strategies which are cofinal for the deterministic reduction system will
implement the semantics for the combined system, provided the environment
behaves in a "fair" manner, and certain program transformations, such as
folding and unfolding, will preserve the semantics. Applications include
evaluation strategies and program transformations for concurrent languages.
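A toy rendition of the setting (the state encoding and the scheduler are assumptions of ours, not the paper's formalization): a deterministic reduction runs interleaved with nondeterministic environment steps, and under a fair schedule the deterministic component still reaches its normal form.

```python
def det_step(s):
    """Deterministic reduction: count down the program component."""
    n, log = s
    return (n - 1, log) if n > 0 else s

def env_step(s, msg):
    """Environment step: nondeterministically append an event to the log."""
    n, log = s
    return (n, log + (msg,))

def run(s, schedule, msgs):
    """Interleave the two reduction systems according to a schedule."""
    msgs = iter(msgs)
    for who in schedule:
        s = det_step(s) if who == "det" else env_step(s, next(msgs))
    return s

# A fair schedule: the deterministic component keeps getting steps, so it
# reaches its normal form (0) regardless of what the environment does.
final = run((3, ()), ["env", "det", "det", "env", "det"], ["a", "b"])
print(final)
```

The point of the cofinality condition is precisely this: a strategy that is cofinal for the deterministic system cannot be starved by a fair environment, so the combined system implements the intended semantics.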