Search CORE

26,690 research outputs found

Formal and Informal Methods for Multi-Core Design Space Exploration

Author: Kempf Jean-Francois
Lebeltel Olivier
Maler Oded
Publication venue: 'Open Publishing Association'
Publication date: 01/06/2014
Field of study

We propose a tool-supported methodology for design-space exploration for embedded systems. It provides means to define high-level models of applications and multi-processor architectures and evaluate the performance of different deployment (mapping, scheduling) strategies while taking uncertainty into account. We argue that this extension of the scope of formal verification is important for the viability of the domain.Comment: In Proceedings QAPL 2014, arXiv:1406.156

arXiv.org e-Print Archive

Directory of Open Access Journals

Improving early design stage timing modeling in multicore based real-time systems

Author: Abella Ferrer Jaume
Cazorla Almeida Francisco Javier
Fernández Mikel
Jalle Ibarra Javier
Trilla David
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

This paper presents a modelling approach for the timing behavior of real-time embedded systems (RTES) in early design phases. The model focuses on multicore processors - accepted as the next computing platform for RTES - and in particular it predicts the contention tasks suffer in the access to multicore on-chip shared resources. The model presents the key properties of not requiring the application's source code or binary and having high-accuracy and low overhead. The former is of paramount importance in those common scenarios in which several software suppliers work in parallel implementing different applications for a system integrator, subject to different intellectual property (IP) constraints. Our model helps reducing the risk of exceeding the assigned budgets for each application in late design stages and its associated costs.This work has received funding from the European Space Agency under Project Reference AO=17722=13=NL=LvH, and has also been supported by the Spanish Ministry of Science and Innovation grant TIN2015-65316-P. Jaume Abella has been partially supported by the MINECO under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717.Peer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

Toward Contention Analysis for Parallel Executing Real-Time Tasks

Author: Guet Fabrice
Jenn Eric
Phavorin Guillaume
Santinelli Luca
Publication venue: OASIcs - OpenAccess Series in Informatics. 18th International Workshop on Worst-Case Execution Time Analysis (WCET 2018)
Publication date: 01/01/2018
Field of study

In measurement-based probabilistic timing analysis, the execution conditions imposed to tasks as measurement scenarios, have a strong impact to the worst-case execution time estimates. The scenarios and their effects on the task execution behavior have to be deeply investigated. The aim has to be to identify and to guarantee the scenarios that lead to the maximum measurements, i.e. the worst-case scenarios, and use them to assure the worst-case execution time estimates. We propose a contention analysis in order to identify the worst contentions that a task can suffer from concurrent executions. The work focuses on the interferences on shared resources (cache memories and memory buses) from parallel executions in multi-core real-time systems. Our approach consists of searching for possible task contenders for parallel executions, modeling their contentiousness, and classifying the measurement scenarios accordingly. We identify the most contentious ones and their worst-case effects on task execution times. The measurement-based probabilistic timing analysis is then used to verify the analysis proposed, qualify the scenarios with contentiousness, and compare them. A parallel execution simulator for multi-core real-time system is developed and used for validating our framework. The framework applies heuristics and assumptions that simplify the system behavior. It represents a first step for developing a complete approach which would be able to guarantee the worst-case behavior

Dagstuhl Research Online Publication Server

Well-Structured Futures and Cache Locality

Author: Herlihy Maurice
Liu Zhiyu
Publication venue
Publication date: 16/08/2016
Field of study

In fork-join parallelism, a sequential program is split into a directed acyclic graph of tasks linked by directed dependency edges, and the tasks are executed, possibly in parallel, in an order consistent with their dependencies. A popular and effective way to extend fork-join parallelism is to allow threads to create futures. A thread creates a future to hold the results of a computation, which may or may not be executed in parallel. That result is returned when some thread touches that future, blocking if necessary until the result is ready. Recent research has shown that while futures can, of course, enhance parallelism in a structured way, they can have a deleterious effect on cache locality. In the worst case, futures can incur

\Omega(P T_\infty + t T_\infty)

deviations, which implies

\Omega(C P T_\infty + C t T_\infty)

additional cache misses, where

C

is the number of cache lines,

P

is the number of processors,

t

is the number of touches, and

T_\infty

is the \emph{computation span}. Since cache locality has a large impact on software performance on modern multicores, this result is troubling. In this paper, however, we show that if futures are used in a simple, disciplined way, then the situation is much better: if each future is touched only once, either by the thread that created it, or by a thread to which the future has been passed from the thread that created it, then parallel executions with work stealing can incur at most

O(C P T^2_\infty)

additional cache misses, a substantial improvement. This structured use of futures is characteristic of many (but not all) parallel applications

arXiv.org e-Print Archive

CiteSeerX

Towards Efficient Abstractions for Concurrent Consensus

Author: Koutavas Vasileios
Spaccasassi Carlo
Publication venue
Publication date: 07/05/2013
Field of study

Consensus is an often occurring problem in concurrent and distributed programming. We present a programming language with simple semantics and build-in support for consensus in the form of communicating transactions. We motivate the need for such a construct with a characteristic example of generalized consensus which can be naturally encoded in our language. We then focus on the challenges in achieving an implementation that can efficiently run such programs. We setup an architecture to evaluate different implementation alternatives and use it to experimentally evaluate runtime heuristics. This is the basis for a research project on realistic programming language support for consensus.Comment: 15 pages, 5 figures, symposium: TFP 201

arXiv.org e-Print Archive

CiteSeerX

Relating goal scheduling, precedence, and memory management in and-parallel execution of logic programs

Author: Hermenegildo Manuel V.
Publication venue: Facultad de Informática (UPM)
Publication date: 01/05/1987
Field of study

The interactions among three important issues involved in the implementation of logic programs in parallel (goal scheduling, precedence, and memory management) are discussed. A simplified, parallel memory management model and an efficient, load-balancing goal scheduling strategy are presented. It is shown how, for systems which support "don't know" non-determinism, special care has to be taken during goal scheduling if the space recovery characteristics of sequential systems are to be preserved. A solution based on selecting only "newer" goals for execution is described, and an algorithm is proposed for efficiently maintaining and determining precedence relationships and variable ages across parallel goals. It is argued that the proposed schemes and algorithms make it possible to extend the storage performance of sequential systems to parallel execution without the considerable overhead previously associated with it. The results are applicable to a wide class of parallel and coroutining systems, and they represent an efficient alternative to "all heap" or "spaghetti stack" allocation models

Archivo Digital UPM

Recommended from our members

Fault tolerance in super-scalar and VLIW processors

Author: Blough Douglas M.
Nicolau Alexandru
Publication venue: eScholarship, University of California
Publication date: 01/01/1991
Field of study

In this paper, we present a method for utilizing the spare capacity in super-scalar and very long instruction word (VLIW) processors to tolerate functional unit failures. Unlike previous work that was primarily interested in detection of transient faults, we are concerned with more permanent and/or intermittent faults which necessitate processor reconfiguration. Our method utilizes the VLIW compiler or the superscalar scheduler to insert redundant operations whenever idle functional units exist. The results of these redundant operations are used to detect and diagnose functional unit failures. For super-scalar processors, the scheduler can then utilize this information to ensure that operations are performed only on non-faulty units. In VLIW processors, this is equivalent to recompiling the code to run on the remaining non-faulty functional units. Since in certain applications, recompilation may not be possible, we consider two alternative reconfiguration strategies for VLIW processors. These strategies sacrifice storage space and execution time, respectively, in order to reconfigure without recompiling. We present Markov models that describe the behavior of processors using these different approaches and we evaluate their reliabilities. The results show that, while super-scalar and VLIW with recompilation provide the highest reliability, all proposed strategies significantly increase reliability over that of an unprotected processor

eScholarship - University of California

Costing JIT Traces

Author: Maier Patrick
Morton John Magnus
Trinder Philip
Publication venue: School of Computing Science, University of Glasgow
Publication date
Field of study

Tracing JIT compilation generates units of compilation that are easy to analyse and are known to execute frequently. The AJITPar project aims to investigate whether the information in JIT traces can be used to make better scheduling decisions or perform code transformations to adapt the code for a specific parallel architecture. To achieve this goal, a cost model must be developed to estimate the execution time of an individual trace. This paper presents the design and implementation of a system for extracting JIT trace information from the Pycket JIT compiler. We define three increasingly parametric cost models for Pycket traces. We perform a search of the cost model parameter space using genetic algorithms to identify the best weightings for those parameters. We test the accuracy of these cost models for predicting the cost of individual traces on a set of loop-based micro-benchmarks. We also compare the accuracy of the cost models for predicting whole program execution time over the Pycket benchmark suite. Our results show that the weighted cost model using the weightings found from the genetic algorithm search has the best accuracy

Enlighten

An Algebra of Synchronous Scheduling Interfaces

Author: Mendler Michael
Publication venue: 'Open Publishing Association'
Publication date: 01/01/2011
Field of study

In this paper we propose an algebra of synchronous scheduling interfaces which combines the expressiveness of Boolean algebra for logical and functional behaviour with the min-max-plus arithmetic for quantifying the non-functional aspects of synchronous interfaces. The interface theory arises from a realisability interpretation of intuitionistic modal logic (also known as Curry-Howard-Isomorphism or propositions-as-types principle). The resulting algebra of interface types aims to provide a general setting for specifying type-directed and compositional analyses of worst-case scheduling bounds. It covers synchronous control flow under concurrent, multi-processing or multi-threading execution and permits precise statements about exactness and coverage of the analyses supporting a variety of abstractions. The paper illustrates the expressiveness of the algebra by way of some examples taken from network flow problems, shortest-path, task scheduling and worst-case reaction times in synchronous programming.Comment: In Proceedings FIT 2010, arXiv:1101.426

arXiv.org e-Print Archive

Directory of Open Access Journals

Work Analysis with Resource-Aware Session Types

Author: Carbonneaux Quentin
Griffith Dennis
Haemmerlé Rémy
Hofmann Martin
Pfenning Frank
Silva Miguel
Terauchi Tachio
Willsey Max
Çiçek Ezgi
Publication venue
Publication date: 26/04/2018
Field of study

While there exist several successful techniques for supporting programmers in deriving static resource bounds for sequential code, analyzing the resource usage of message-passing concurrent processes poses additional challenges. To meet these challenges, this article presents an analysis for statically deriving worst-case bounds on the total work performed by message-passing processes. To decompose interacting processes into components that can be analyzed in isolation, the analysis is based on novel resource-aware session types, which describe protocols and resource contracts for inter-process communication. A key innovation is that both messages and processes carry potential to share and amortize cost while communicating. To symbolically express resource usage in a setting without static data structures and intrinsic sizes, resource contracts describe bounds that are functions of interactions between processes. Resource-aware session types combine standard binary session types and type-based amortized resource analysis in a linear type system. This type system is formulated for a core session-type calculus of the language SILL and proved sound with respect to a multiset-based operational cost semantics that tracks the total number of messages that are exchanged in a system. The effectiveness of the analysis is demonstrated by analyzing standard examples from amortized analysis and the literature on session types and by a comparative performance analysis of different concurrent programs implementing the same interface.Comment: 25 pages, 2 pages of references, 11 pages of appendix, Accepted at LICS 201

arXiv.org e-Print Archive

Crossref