Search CORE

3,765 research outputs found

Improving early design stage timing modeling in multicore based real-time systems

Author: Abella Ferrer Jaume
Cazorla Almeida Francisco Javier
Fernández Mikel
Jalle Ibarra Javier
Trilla David
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

This paper presents a modelling approach for the timing behavior of real-time embedded systems (RTES) in early design phases. The model focuses on multicore processors - accepted as the next computing platform for RTES - and in particular it predicts the contention tasks suffer in the access to multicore on-chip shared resources. The model presents the key properties of not requiring the application's source code or binary and having high-accuracy and low overhead. The former is of paramount importance in those common scenarios in which several software suppliers work in parallel implementing different applications for a system integrator, subject to different intellectual property (IP) constraints. Our model helps reducing the risk of exceeding the assigned budgets for each application in late design stages and its associated costs.This work has received funding from the European Space Agency under Project Reference AO=17722=13=NL=LvH, and has also been supported by the Spanish Ministry of Science and Innovation grant TIN2015-65316-P. Jaume Abella has been partially supported by the MINECO under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717.Peer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

Exploring Hybrid SPM-Cache Architectures to Improve Performance and Energy Efficiency for Real-time Computing

Author: Wu Lan
Publication venue: VCU Scholars Compass
Publication date: 04/12/2013
Field of study

Real-time computing is not just fast computing but time-predictable computing. Many tasks in safety-critical embedded real-time systems have hard real-time characteristics. Failure to meet deadlines may result in the loss of life or in large damages. Known of Worst Case Execution Time (WCET) is important for reliability or correct functional behavior of the system. As multi-core processors are increasingly adopted in industry, it has become a great challenge to accurately bound the worst-case execution time (WCET) for real-time systems running on multi-core chips. This is particularly true because of the inter-thread interferences in accessing shared resources on multi-cores, such as shared L2 caches, which can significantly affect the performance but are very difficult to be estimate statically. We propose an approach to analyzing Worst Case Execution Time (WCET) for multi-core processors with shared L2 instruction caches by using a model checking based method. Our experiments indicate that compared to the static analysis technique based on extended ILP (Integer Linear Programming), our approach improves the tightness of WCET estimation more than 31.1% for the benchmarks we studied. However, due to the inherent complexity of multi-core timing analysis and the state explosion problem, the model checking based approach currently can only work with small real-time kernels for dual-core processors. At the same time, improving the average-case performance and energy efficiency has also been important for real-time systems. Recently, Hybrid SPM-Cache (HSC) architectures by combining caches and Scratch-Pad Memories (SPMs) have been increasingly used in commercial processors and research prototypes. Our research explores HSC architectures for real-time systems to reconcile time predictability, performance, and energy consumption. We study the energy dissipation of a number of hybrid on-chip memory architectures by combining both caches and Scratch-Pad Memories (SPM) without increasing the total on-chip memory size. Our experimental results indicate that with the equivalent total on-chip memory size, several hybrid SPM-Cache architectures are more energy-efficient than either pure software controlled SPMs or pure hardware-controlled caches. In particular, using the hybrid SPM-cache to store both instructions and data can achieve the best energy efficiency. However, the SPM allocation for the HSC architecture must be aware of the cache to harness the full potential of the HSC architecture. First, we propose and evaluate four SPM allocation strategies to reduce WCET for hybrid SPM-Caches with different complexities. These algorithms differ by whether or not they can cooperate with the cache or be aware of the WCET. Our evaluation shows that the cache aware and WCET-oriented SPM allocation can maximally reduce the WCET with minimum or even positive impact on the average-case execution time (ACET). Moreover, we explore four SPM allocation algorithms to maximize performance on the HSC architecture, including three heuristic-based algorithms, and an optimal algorithm based on model checking. Our experiments indicate that the Greedy Stack Distance based Allocation (GSDA) can run efficiently while achieving performance either the same as or close to the optimal results got by the Optimal Stack Distance based Allocation (OSDA). Last but not the least, we extend the two stack distance based allocation algorithms to GSDA-E and OSDA-E to minimize the energy consumption of the HSC architecture. Our experimental results show that the GSDA-E can also reduce the energy either the same as or close to the optimal results attained by the OSDA-E, while achieving performance close to the OSDA and GSDA

VCU Scholars Compass

The multi-program performance model: debunking current practice in multi-core simulation

Author: Eeckhout Lieven
Van Craeynest Kenzo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011
Field of study

Composing a representative multi-program multi-core workload is non-trivial. A multi-core processor can execute multiple independent programs concurrently, and hence, any program mix can form a potential multi-program workload. Given the very large number of possible multiprogram workloads and the limited speed of current simulation methods, it is impossible to evaluate all possible multi-program workloads. This paper presents the Multi-Program Performance Model (MPPM), a method for quickly estimating multiprogram multi-core performance based on single-core simulation runs. MPPM employs an iterative method to model the tight performance entanglement between co-executing programs on a multi-core processor with shared caches. Because MPPM involves analytical modeling, it is very fast, and it estimates multi-core performance for a very large number of multi-program workloads in a reasonable amount of time. In addition, it provides confidence bounds on its performance estimates. Using SPEC CPU2006 and up to 16 cores, we report an average performance prediction error of 2.3% and 2.9% for system throughput (STP) and average normalized turnaround time (ANTT), respectively, while being up to five orders of magnitude faster than detailed simulation. Subsequently, we demonstrate that randomly picking a limited number of multi-program workloads, as done in current pactice, can lead to incorrect design decisions in practical design and research studies, which is alleviated using MPPM. In addition, MPPM can be used to quickly identify multi-program workloads that stress multi-core performance through excessive conflict behavior in shared caches; these stress workloads can then be used for driving the design process further

Crossref

Ghent University Academic Bibliography

Modelling Contention in Multicore Hardware Resources during Early Design Stages of Real-Time Systems

Author: Trilla Rodríguez David
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2016
Field of study

This thesis presents a modelling approach for the timing behavior of real-time embedded systems in early design phases. The model focuses on multicore processors and it predicts the contention tasks suffer in the access to multicore on-chip shared resources

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Memory performance of and-parallel prolog on shared-memory architectures

Author: Hermenegildo Manuel V.
Tick Evan
Publication venue: Facultad de Informática (UPM)
Publication date: 01/08/1988
Field of study

The goal of the RAP-WAM AND-parallel Prolog abstract architecture is to provide inference speeds significantly beyond those of sequential systems, while supporting Prolog semantics and preserving sequential performance and storage efficiency. This paper presents simulation results supporting these claims with special emphasis on memory performance on a two-level sharedmemory multiprocessor organization. Several solutions to the cache coherency problem are analyzed. It is shown that RAP-WAM offers good locality and storage efficiency and that it can effectively take advantage of broadcast caches. It is argued that speeds in excess of 2 ML IPS on real applications exhibiting medium parallelism can be attained with current technology

Archivo Digital UPM

Eager Stack Cache Memory Transfers

Author: Brandner Florian
Naji Amine
Publication venue: OASIcs - OpenAccess Series in Informatics. 16th International Workshop on Worst-Case Execution Time Analysis (WCET 2016)
Publication date: 01/01/2016
Field of study

The growing complexity of modern computer architectures increasingly complicates the prediction of the run-time behavior of software. For real-time systems, where a safe estimation of the program\u27s worst-case execution time is needed, time-predictable computer architectures promise to resolve this problem. The stack cache, for instance, allows the compiler to efficiently cache a program\u27s stack, while static analysis of its behavior remains easy. This work introduces an optimization of the stack cache that allows to anticipate memory transfers that might be initiated by future stack cache control instructions. These eager memory transfers thus allow to reduce the average-case latency of those control instructions, very similar to "prefetching" techniques known from conventional caches. However, the mechanism proposed here is guaranteed to have no impact on the worst-case execution time estimates computed by static analysis. Measurements on a dual-core platform using the Patmos processor and imedivision-multiplexing-based memory arbitration, show that our technique can eliminate up to 62% (7%) of the memory transfers from (respectively to) the stack cache on average over all programs of the MiBench benchmark suite

Dagstuhl Research Online Publication Server

ret2spec: Speculative Execution Using Return Stack Buffers

Author: Acıiccmez Onur
Aciiccmez Onur
Evtyushkin Dmitry
Fogh Anders
Gruss Daniel
Kocher Paul
Kohlbrenner David
Lipp Moritz
Maisuradze Giorgi
Maisuradze Giorgi
Maisuradze Giorgi
Song Chengyu
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 23/07/2018
Field of study

Speculative execution is an optimization technique that has been part of CPUs for over a decade. It predicts the outcome and target of branch instructions to avoid stalling the execution pipeline. However, until recently, the security implications of speculative code execution have not been studied. In this paper, we investigate a special type of branch predictor that is responsible for predicting return addresses. To the best of our knowledge, we are the first to study return address predictors and their consequences for the security of modern software. In our work, we show how return stack buffers (RSBs), the core unit of return address predictors, can be used to trigger misspeculations. Based on this knowledge, we propose two new attack variants using RSBs that give attackers similar capabilities as the documented Spectre attacks. We show how local attackers can gain arbitrary speculative code execution across processes, e.g., to leak passwords another user enters on a shared system. Our evaluation showed that the recent Spectre countermeasures deployed in operating systems can also cover such RSB-based cross-process attacks. Yet we then demonstrate that attackers can trigger misspeculation in JIT environments in order to leak arbitrary memory content of browser processes. Reading outside the sandboxed memory region with JIT-compiled code is still possible with 80\% accuracy on average.Comment: Updating to the cam-ready version and adding reference to the original pape

arXiv.org e-Print Archive

CISPA – Helmholtz-Zentrum für Informationssicherheit

Crossref