26,690 research outputs found
Formal and Informal Methods for Multi-Core Design Space Exploration
We propose a tool-supported methodology for design-space exploration for
embedded systems. It provides means to define high-level models of applications
and multi-processor architectures and evaluate the performance of different
deployment (mapping, scheduling) strategies while taking uncertainty into
account. We argue that this extension of the scope of formal verification is
important for the viability of the domain.Comment: In Proceedings QAPL 2014, arXiv:1406.156
Improving early design stage timing modeling in multicore based real-time systems
This paper presents a modelling approach for the timing behavior of real-time embedded systems (RTES) in early design phases. The model focuses on multicore processors - accepted as the next computing platform for RTES - and in particular it predicts the contention tasks suffer in the access to multicore on-chip shared resources. The model
presents the key properties of not requiring the application's source code or binary and having high-accuracy and low overhead. The former is of paramount importance in those common scenarios in which several software suppliers work in parallel implementing different applications for a system integrator, subject to different intellectual property (IP) constraints. Our model helps reducing the risk of exceeding the assigned budgets for each application in late design
stages and its associated costs.This work has received funding from the European Space
Agency under Project Reference AO=17722=13=NL=LvH,
and has also been supported by the Spanish Ministry of
Science and Innovation grant TIN2015-65316-P. Jaume Abella
has been partially supported by the MINECO under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717.Peer ReviewedPostprint (author's final draft
Toward Contention Analysis for Parallel Executing Real-Time Tasks
In measurement-based probabilistic timing analysis, the execution conditions imposed to tasks as measurement scenarios, have a strong impact to the worst-case execution time estimates. The scenarios and their effects on the task execution behavior have to be deeply investigated. The aim has to be to identify and to guarantee the scenarios that lead to the maximum measurements, i.e. the worst-case scenarios, and use them to assure the worst-case execution time estimates.
We propose a contention analysis in order to identify the worst contentions that a task can suffer from concurrent executions. The work focuses on the interferences on shared resources (cache memories and memory buses) from parallel executions in multi-core real-time systems. Our approach consists of searching for possible task contenders for parallel executions, modeling their contentiousness, and classifying the measurement scenarios accordingly. We identify the most contentious ones and their worst-case effects on task execution times. The measurement-based probabilistic timing analysis is then used to verify the analysis proposed, qualify the scenarios with contentiousness, and compare them. A parallel execution simulator for multi-core real-time system is developed and used for validating our framework.
The framework applies heuristics and assumptions that simplify the system behavior. It represents a first step for developing a complete approach which would be able to guarantee the worst-case behavior
Well-Structured Futures and Cache Locality
In fork-join parallelism, a sequential program is split into a directed
acyclic graph of tasks linked by directed dependency edges, and the tasks are
executed, possibly in parallel, in an order consistent with their dependencies.
A popular and effective way to extend fork-join parallelism is to allow threads
to create futures. A thread creates a future to hold the results of a
computation, which may or may not be executed in parallel. That result is
returned when some thread touches that future, blocking if necessary until the
result is ready.
Recent research has shown that while futures can, of course, enhance
parallelism in a structured way, they can have a deleterious effect on cache
locality. In the worst case, futures can incur deviations, which implies
additional cache misses, where is the number of cache lines, is the
number of processors, is the number of touches, and is the
\emph{computation span}. Since cache locality has a large impact on software
performance on modern multicores, this result is troubling.
In this paper, however, we show that if futures are used in a simple,
disciplined way, then the situation is much better: if each future is touched
only once, either by the thread that created it, or by a thread to which the
future has been passed from the thread that created it, then parallel
executions with work stealing can incur at most additional
cache misses, a substantial improvement. This structured use of futures is
characteristic of many (but not all) parallel applications
Towards Efficient Abstractions for Concurrent Consensus
Consensus is an often occurring problem in concurrent and distributed
programming. We present a programming language with simple semantics and
build-in support for consensus in the form of communicating transactions. We
motivate the need for such a construct with a characteristic example of
generalized consensus which can be naturally encoded in our language. We then
focus on the challenges in achieving an implementation that can efficiently run
such programs. We setup an architecture to evaluate different implementation
alternatives and use it to experimentally evaluate runtime heuristics. This is
the basis for a research project on realistic programming language support for
consensus.Comment: 15 pages, 5 figures, symposium: TFP 201
Relating goal scheduling, precedence, and memory management in and-parallel execution of logic programs
The interactions among three important issues involved in the implementation of logic programs in parallel (goal scheduling, precedence, and memory management) are discussed. A simplified, parallel memory management model and an efficient, load-balancing goal scheduling strategy are presented. It is shown how, for systems which support "don't know" non-determinism, special care has to be taken during goal scheduling if the space recovery characteristics
of sequential systems are to be preserved. A solution based on selecting only "newer" goals for execution is described, and an algorithm is proposed for efficiently maintaining and determining precedence relationships and variable ages across parallel goals. It is argued that the proposed schemes and algorithms make it possible to extend the storage performance of sequential systems to parallel execution without the considerable overhead previously associated with it. The results are applicable to a wide class of parallel and coroutining systems, and they represent an efficient alternative to "all heap" or "spaghetti stack" allocation models
Recommended from our members
Fault tolerance in super-scalar and VLIW processors
In this paper, we present a method for utilizing the spare capacity in super-scalar and very long instruction word (VLIW) processors to tolerate functional unit failures. Unlike previous work that was primarily interested in detection of transient faults, we are concerned with more permanent and/or intermittent faults which necessitate processor reconfiguration. Our method utilizes the VLIW compiler or the superscalar scheduler to insert redundant operations whenever idle functional units exist. The results of these redundant operations are used to detect and diagnose functional unit failures. For super-scalar processors, the scheduler can then utilize this information to ensure that operations are performed only on non-faulty units. In VLIW processors, this is equivalent to recompiling the code to run on the remaining non-faulty functional units. Since in certain applications, recompilation may not be possible, we consider two alternative reconfiguration strategies for VLIW processors. These strategies sacrifice storage space and execution time, respectively, in order to reconfigure without recompiling. We present Markov models that describe the behavior of processors using these different approaches and we evaluate their reliabilities. The results show that, while super-scalar and VLIW with recompilation provide the highest reliability, all proposed strategies significantly increase reliability over that of an unprotected processor
Costing JIT Traces
Tracing JIT compilation generates units of compilation that
are easy to analyse and are known to execute frequently. The AJITPar
project aims to investigate whether the information in JIT traces can be
used to make better scheduling decisions or perform code transformations
to adapt the code for a specific parallel architecture. To achieve this goal,
a cost model must be developed to estimate the execution time of an
individual trace.
This paper presents the design and implementation of a system for extracting
JIT trace information from the Pycket JIT compiler. We define
three increasingly parametric cost models for Pycket traces. We perform
a search of the cost model parameter space using genetic algorithms to
identify the best weightings for those parameters. We test the accuracy
of these cost models for predicting the cost of individual traces on a set
of loop-based micro-benchmarks. We also compare the accuracy of the
cost models for predicting whole program execution time over the Pycket
benchmark suite. Our results show that the weighted cost model
using the weightings found from the genetic algorithm search has the
best accuracy
An Algebra of Synchronous Scheduling Interfaces
In this paper we propose an algebra of synchronous scheduling interfaces
which combines the expressiveness of Boolean algebra for logical and functional
behaviour with the min-max-plus arithmetic for quantifying the non-functional
aspects of synchronous interfaces. The interface theory arises from a
realisability interpretation of intuitionistic modal logic (also known as
Curry-Howard-Isomorphism or propositions-as-types principle). The resulting
algebra of interface types aims to provide a general setting for specifying
type-directed and compositional analyses of worst-case scheduling bounds. It
covers synchronous control flow under concurrent, multi-processing or
multi-threading execution and permits precise statements about exactness and
coverage of the analyses supporting a variety of abstractions. The paper
illustrates the expressiveness of the algebra by way of some examples taken
from network flow problems, shortest-path, task scheduling and worst-case
reaction times in synchronous programming.Comment: In Proceedings FIT 2010, arXiv:1101.426
Work Analysis with Resource-Aware Session Types
While there exist several successful techniques for supporting programmers in
deriving static resource bounds for sequential code, analyzing the resource
usage of message-passing concurrent processes poses additional challenges. To
meet these challenges, this article presents an analysis for statically
deriving worst-case bounds on the total work performed by message-passing
processes. To decompose interacting processes into components that can be
analyzed in isolation, the analysis is based on novel resource-aware session
types, which describe protocols and resource contracts for inter-process
communication. A key innovation is that both messages and processes carry
potential to share and amortize cost while communicating. To symbolically
express resource usage in a setting without static data structures and
intrinsic sizes, resource contracts describe bounds that are functions of
interactions between processes. Resource-aware session types combine standard
binary session types and type-based amortized resource analysis in a linear
type system. This type system is formulated for a core session-type calculus of
the language SILL and proved sound with respect to a multiset-based operational
cost semantics that tracks the total number of messages that are exchanged in a
system. The effectiveness of the analysis is demonstrated by analyzing standard
examples from amortized analysis and the literature on session types and by a
comparative performance analysis of different concurrent programs implementing
the same interface.Comment: 25 pages, 2 pages of references, 11 pages of appendix, Accepted at
LICS 201
- …