60 research outputs found
An SMT Architecture for Hard Real-Time
12 pages. Simultaneous multithreading (SMT) processors can be good candidates to meet the ever-increasing performance requirements of embedded applications. However, conventional SMT architectures do not always offer the timing predictability needed for static worst-case execution time (WCET) analysis. In this paper, we analyse the predictability of the various policies used to control shared resources in existing SMT cores. We then propose an SMT architecture designed to run one hard real-time thread in such a way that its worst-case execution time remains analysable even when other (non-critical) threads execute simultaneously. Experimental results show that this architecture remains efficient, both on average and in the worst case.
Automatic WCET Analysis of Real-Time Parallel Applications
National audience. Tomorrow's real-time embedded systems will be built upon multicore architectures. This raises two challenges. First, shared resources should be arbitrated in such a way that the WCET of independent threads running concurrently can be computed: in this paper, we assume that time-predictable multicore architectures are available. The second challenge is to develop software that achieves a high level of performance without impairing timing predictability. We investigate parallel software based on the POSIX threads standard and show how the WCET of a parallel program can be analysed. We report experimental results obtained for typical parallel programs with an extended version of the OTAWA toolset.
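As an intuition for why such an analysis is tractable on fork-join parallel programs, the bound on a parallel section is driven by its slowest thread. The sketch below is a hypothetical composition rule, not the paper's actual algorithm; the `barrier_cost` parameter and all timing values are invented for illustration.

```python
def fork_join_wcet(seq_before, thread_wcets, seq_after, barrier_cost):
    """Hypothetical WCET composition for a fork-join section (illustrative
    only): every thread must reach the join barrier, so the slowest thread's
    WCET dominates the parallel part."""
    return seq_before + max(thread_wcets) + barrier_cost + seq_after

# Invented cycle counts: sequential prologue, three worker threads, epilogue.
bound = fork_join_wcet(100, [250, 300, 275], 50, 20)
```

Under these assumptions, the bound is the sequential parts plus the maximum thread WCET plus the synchronisation cost, rather than the sum over all threads.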
Automatic Generation of Functional Processor Simulators
12 pages. Developing a processor simulator is a long and tedious task. Decoupling the functional part (emulation) from the structural part (timing analysis) makes it easier to reuse existing code (mainly the emulation code, since instruction sets evolve more slowly than hardware architectures). In this context, several teams have proposed solutions for automatically generating the functional part of a simulator from a more or less formal description of the instruction set. While it is relatively easy to generate an emulator for the DLX architecture automatically, building a generator that supports CISC, RISC and VLIW architectures alike while producing efficient code turns out to be more difficult. In this paper, we describe several techniques implemented in GLISS, the tool we have developed, which aims to be as versatile as possible.
Multi-architecture Value Analysis for Machine Code
International audience. Safety verification of critical real-time embedded systems requires worst-case execution time (WCET) information. Among the existing approaches to estimating the WCET, static analysis at the machine-code level has proven able to produce safe results. Many different architectures are used in real-time systems, yet no generic solution supports static analysis of the values handled by machine instructions. Nonetheless, the results of such an analysis are valuable for improving the precision of other analyses, such as data cache analysis, indirect branch resolution, etc. This paper proposes a semantic language aimed at expressing the semantics of machine instructions whatever the underlying instruction set. This ensures the abstraction and portability of the value analysis, or of any analysis based on the semantic expression of the instructions. As a proof of concept, we adapted and refined an existing analysis that represents values as Circular-Linear Progressions (CLP), that is, as sparse integer intervals well suited to modelling pointers. In addition, we show how our semantic instructions make it possible to reconstruct loop conditions in order to refine the CLP values and improve the precision of the analysis. Both contributions have been implemented in our framework, OTAWA, and evaluated on the Mälardalen benchmarks to demonstrate the effectiveness of the approach.
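The CLP domain mentioned above can be sketched concretely: a CLP describes a set of machine words of the form (base + i·delta) mod 2^width, which captures a pointer stepping through an array with one triple instead of an explicit set. The `CLP` class below is our own illustrative simplification, not the paper's implementation.

```python
class CLP:
    """Circular-Linear Progression (illustrative sketch):
    {(base + i*delta) mod 2**width | 0 <= i <= count}."""
    def __init__(self, base, delta, count, width=32):
        self.base, self.delta, self.count, self.width = base, delta, count, width

    def values(self):
        # Concretisation: enumerate the represented set (fine for small counts).
        m = 1 << self.width
        return {(self.base + i * self.delta) % m for i in range(self.count + 1)}

    def add_const(self, k):
        # Adding a constant only shifts the base; stride and count are unchanged.
        return CLP((self.base + k) % (1 << self.width), self.delta,
                   self.count, self.width)

# A pointer stepping through four 4-byte array elements starting at 0x1000.
p = CLP(0x1000, 4, 3)
```

Here `p.values()` is {0x1000, 0x1004, 0x1008, 0x100C}, and `p.add_const(8)` models adding a constant offset to the pointer without losing the stride information.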
Alternative Schemes for High-Bandwidth Instruction Fetching
Future processors combining out-of-order execution with aggressive speculation techniques will need to fetch multiple non-consecutive instruction blocks in a single cycle to achieve high-performance. Several high-bandwidth instruction fetching schemes have been proposed in the past few years. The Two-Block Ahead (TBA) branch predictor predicts two non-consecutive instruction blocks per cycle while relying on a conventional instruction cache. The trace cache (TC) records traces of instructions and delivers multiple non-consecutive instruction blocks to the execution core. The aim of this paper is to investigate the pros and cons of both approaches. Maintaining consistency between memory and TC is not a straightforward issue. We propose a simple hardware scheme to maintain consistency at a reasonable performance loss (1 to 5%). We also introduce a new fill unit heuristic for TC, the mispredict hint, that leads to significantly better performance (up to 20 %). This is mainly due to better prediction accuracy results and TC miss ratios. TBA requires double-ported or bank-interleaved structures to supply two non-consecutive blocks in a single cycle. We show that a 4-way interleaving scheme is cost-effective since it impairs performance by only 3 to 5%. Finally, simulation results show that such an enhanced TC scheme delivers higher performance than TBA when caches are large, due to a lower branch misprediction penalty and a higher instruction bandwidth on mispredictions. When the hardware budget is smaller, TBA outperforms TC because of a higher TC miss ratio and branch misprediction rate
Combining Symbolic Execution and Path Enumeration in Worst-Case Execution Time Analysis
This paper examines the problem of determining bounds on the execution time of real-time programs. Execution time estimation is generally useful in the verification phase of real-time software, but may also be used in other phases of the design and execution of real-time programs (scheduling, automatic parallelisation, etc.). This paper is devoted to worst-case execution time (WCET) analysis. We present a static WCET analysis approach aimed at automatically extracting the flow information used in computing the WCET estimate. The approach combines symbolic execution and path enumeration. The main idea is to avoid the loop unfolding performed by purely symbolic-execution-based approaches while still providing a tight and safe WCET estimate.
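To give a flavour of how symbolic reasoning avoids unfolding: for a counted loop, the iteration count can be derived in closed form from the loop's init, limit, and step, instead of simulating every iteration. The helper below is a hypothetical sketch of this idea, not the paper's algorithm.

```python
import math

def loop_bound(init, limit, step):
    """Iteration count of `for (i = init; i < limit; i += step)`, derived
    symbolically (closed form) rather than by unfolding the loop body."""
    if step <= 0 or init >= limit:
        return 0
    return math.ceil((limit - init) / step)

# The bound is obtained in O(1), however many iterations the loop runs.
b = loop_bound(0, 10, 3)   # iterations at i = 0, 3, 6, 9
```

A path-enumeration-based WCET computation can then use such bounds as flow facts, constraining how often each loop body may contribute to the worst-case path.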
Optimising Instruction Fetch
National audience. Current and future processors, whose execution cores exploit instruction-level parallelism, can only reach their peak performance if they are fed with a sufficient instruction throughput. In this paper, we show that the instruction cache access bandwidth is generally under-exploited. We propose two optimisations of instruction cache accesses: the first combines several accesses to the same cache line; the second reorders accesses to limit the number of bank conflicts in a multi-ported cache. Simulation results show that both optimisations significantly improve the instruction fetch rate. Moreover, they are implemented through fetch control sequences that also act as a multiple branch predictor.
OTAWA: An Open Toolbox for Adaptive WCET Analysis
International audience. The analysis of worst-case execution times has become mandatory in the design of hard real-time systems: it is absolutely necessary to know an upper bound on the execution time of each task in order to determine a task schedule that ensures all deadlines will be met. The OTAWA toolbox presented in this paper has been designed to host algorithms resulting from research in the domain of WCET analysis, so that they can be combined to compute tight WCET estimates. It features an abstraction layer that decouples the analyses from the target hardware and the instruction set architecture, as well as a set of facilities that ease the implementation of new approaches.
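For intuition, WCET computation on an acyclic control-flow graph amounts to a longest-path problem over basic-block execution times; production toolsets such as OTAWA typically solve it as an integer linear program over block execution counts (the IPET approach) rather than by explicit enumeration. The brute-force sketch below, with an invented diamond-shaped CFG and invented block times, is for illustration only.

```python
def wcet_by_path_enumeration(cfg, times, entry, exit):
    """Exhaustive longest-path search on an acyclic CFG (illustrative only;
    real toolsets formulate this as an ILP over block counts instead)."""
    if entry == exit:
        return times[exit]
    return times[entry] + max(
        wcet_by_path_enumeration(cfg, times, succ, exit)
        for succ in cfg[entry]
    )

# Hypothetical diamond CFG: A branches to B or C, both rejoin at D.
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
times = {"A": 5, "B": 10, "C": 20, "D": 5}   # invented per-block cycle counts
bound = wcet_by_path_enumeration(cfg, times, "A", "D")
```

The longest path goes through C, so the bound is 5 + 20 + 5 = 30 cycles; explicit enumeration like this explodes on real programs, which is why count-based ILP formulations are preferred.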
Predictable composition of memory accesses on many-core processors
International audience. The use of many-core COTS processors in safety-critical embedded systems is a challenging research topic. The predictable execution of several applications on such processors is not possible without a precise analysis and mitigation of the possible sources of interference. In this paper, we identify the external DDR-SDRAM and the network-on-chip as the main bottlenecks, for both average performance and predictability, in such platforms. Since DDR-SDRAM memories are intrinsically stateful, naively computing the worst-case execution times (WCETs) of tasks involves significantly pessimistic upper bounds on the memory access latencies. Moreover, the worst-case end-to-end delays of wormhole-switched networks cannot be bounded without strong assumptions on the system model, because of the possibility of deadlock. We analyse each potential source of interference and give recommendations for building viable execution models that enable efficient, composable computation of worst-case end-to-end memory access latencies, in contrast to the naive worst-case-everywhere approach.
Accurate analysis of memory latencies for WCET estimation
International audience. In recent years, many researchers have proposed solutions for estimating the worst-case execution time of a critical application running on modern hardware. Several schemes commonly implemented to improve performance have been considered so far in the context of static WCET analysis: pipelines, instruction caches, dynamic branch predictors, out-of-order execution cores, etc. By comparison, components external to the processor have received less attention. In particular, the latency of memory accesses is generally modelled as a fixed value. However, modern DRAM devices support the open-page policy, which reduces the memory latency when successive accesses target the same memory row. This scheme, also known as the row buffer, induces variable memory latencies, depending on whether an access hits or misses in the row buffer. In this paper, we propose an algorithm that takes the open-page policy into account when estimating WCETs for a processor with an instruction cache. Experimental results show that WCET estimates are refined thanks to tighter memory latencies in place of pessimistic fixed values.
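The effect of the open-page policy can be modelled simply: an access to the currently open row pays a short hit latency, while an access to any other row pays the full precharge-plus-activate miss latency. The sketch below is a minimal illustrative model with invented timing values, not the paper's analysis algorithm.

```python
def access_latencies(rows, t_hit, t_miss):
    """Latency of each access in a sequence under the open-page policy:
    hitting the currently open row costs t_hit, any other row costs t_miss
    (precharge + activate + access). Timings are illustrative, not from a
    device datasheet."""
    open_row, latencies = None, []
    for row in rows:
        latencies.append(t_hit if row == open_row else t_miss)
        open_row = row  # the accessed row stays open for the next access
    return latencies

# Three consecutive accesses to row 7, then one to row 12:
# only the first access to each new row pays the miss latency.
lats = access_latencies([7, 7, 7, 12], t_hit=10, t_miss=40)
```

A fixed-latency model would have to assume `t_miss` for every access; tracking the row buffer lets the analysis use `t_hit` for provably same-row accesses, which is exactly where the WCET refinement comes from.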