1,318 research outputs found

    Random Modulo: A new processor cache design for real-time critical systems

    Get PDF
    Cache memories have a huge impact on software's worst-case execution time (WCET). While enabling the seamless use of caches is key to provide the increasing levels of (guaranteed) performance required by automotive software, caches complicate timing analysis. In the context of Measurement-Based Probabilistic Timing Analysis (MBPTA) - a promising technique to ease timing analyis of complex hardware - we propose Random Modulo (RM), a new cache design that provides the probabilistic behavior required by MBPTA and with the following advantages over existing MBPTA-compliant cache designs: (i) an outstanding reduction in WCET estimates, (ii) lower latency and area overhead, and (iii) competitive average performance w.r.t conventional caches.Peer ReviewedPostprint (author's final draft

    Experimental Evaluation of Cache-Related Preemption Delay Aware Timing Analysis

    Get PDF
    In the presence of caches, preemptive scheduling may incur a significant overhead referred to as cache-related preemption delay (CRPD). CRPD is caused by preempting tasks evicting cached memory blocks of preempted tasks, which have to be reloaded when the preempted tasks resume their execution. In this paper we experimentally evaluate state-of-the-art techniques to account for the CRPD during timing analysis. We find that purely synthetically-generated task sets may yield misleading conclusions regarding the relative precision of different CRPD analysis techniques and the impact of CRPD on schedulability in general. Based on task characterizations obtained by static worst-case execution time (WCET) analysis, we shed new light on the state of the art

    A confidence assessment of WCET estimates for software time randomized caches

    Get PDF
    Obtaining Worst-Case Execution Time (WCET) estimates is a required step in real-time embedded systems during software verification. Measurement-Based Probabilistic Timing Analysis (MBPTA) aims at obtaining WCET estimates for industrial-size software running upon hardware platforms comprising high-performance features. MBPTA relies on the randomization of timing behavior (functional behavior is left unchanged) of hard-to-predict events like the location of objects in memory — and hence their associated cache behavior — that significantly impact software's WCET estimates. Software time-randomized caches (sTRc) have been recently proposed to enable MBPTA on top of Commercial off-the-shelf (COTS) caches (e.g. modulo placement). However, some random events may challenge MBPTA reliability on top of sTRc. In this paper, for sTRc and programs with homogeneously accessed addresses, we determine whether the number of observations taken at analysis, as part of the normal MBPTA application process, captures the cache events significantly impacting execution time and WCET. If this is not the case, our techniques provide the user with the number of extra runs to perform to guarantee that cache events are captured for a reliable application of MBPTA. Our techniques are evaluated with synthetic benchmarks and an avionics application.The research leading to these results has received funding from the European Community’s Seventh Framework Programme [FP7/2007-2013] under the PROXIMA Project (www.proxima-project.eu), grant agreement no 611085. This work has also been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316, the HiPEAC Network of Excellence, and COST Action IC1202: Timing Analysis On Code-Level (TACLe). Jaume Abella has been partially supported by the Ministry of Economy and Competitiveness under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717.Peer ReviewedPostprint (author's final draft

    Detecting time-fragmented cache attacks against AES using Performance Monitoring Counters

    Get PDF
    Cache timing attacks use shared caches in multi-core processors as side channels to extract information from victim processes. These attacks are particularly dangerous in cloud infrastructures, in which the deployed countermeasures cause collateral effects in terms of performance loss and increase in energy consumption. We propose to monitor the victim process using an independent monitoring (detector) process, that continuously measures selected Performance Monitoring Counters (PMC) to detect the presence of an attack. Ad-hoc countermeasures can be applied only when such a risky situation arises. In our case, the victim process is the AES encryption algorithm and the attack is performed by means of random encryption requests. We demonstrate that PMCs are a feasible tool to detect the attack and that sampling PMCs at high frequencies is worse than sampling at lower frequencies in terms of detection capabilities, particularly when the attack is fragmented in time to try to be hidden from detection

    Near-Memory Address Translation

    Full text link
    Memory and logic integration on the same chip is becoming increasingly cost effective, creating the opportunity to offload data-intensive functionality to processing units placed inside memory chips. The introduction of memory-side processing units (MPUs) into conventional systems faces virtual memory as the first big showstopper: without efficient hardware support for address translation MPUs have highly limited applicability. Unfortunately, conventional translation mechanisms fall short of providing fast translations as contemporary memories exceed the reach of TLBs, making expensive page walks common. In this paper, we are the first to show that the historically important flexibility to map any virtual page to any page frame is unnecessary in today's servers. We find that while limiting the associativity of the virtual-to-physical mapping incurs no penalty, it can break the translate-then-fetch serialization if combined with careful data placement in the MPU's memory, allowing for translation and data fetch to proceed independently and in parallel. We propose the Distributed Inverted Page Table (DIPTA), a near-memory structure in which the smallest memory partition keeps the translation information for its data share, ensuring that the translation completes together with the data fetch. DIPTA completely eliminates the performance overhead of translation, achieving speedups of up to 3.81x and 2.13x over conventional translation using 4KB and 1GB pages respectively.Comment: 15 pages, 9 figure

    WCET analysis of multi-level set-associative instruction caches

    Get PDF
    With the advent of increasingly complex hardware in real-time embedded systems (processors with performance enhancing features such as pipelines, cache hierarchy, multiple cores), many processors now have a set-associative L2 cache. Thus, there is a need for considering cache hierarchies when validating the temporal behavior of real-time systems, in particular when estimating tasks' worst-case execution times (WCETs). To the best of our knowledge, there is only one approach for WCET estimation for systems with cache hierarchies [Mueller, 1997], which turns out to be unsafe for set-associative caches. In this paper, we highlight the conditions under which the approach described in [Mueller, 1997] is unsafe. A safe static instruction cache analysis method is then presented. Contrary to [Mueller, 1997] our method supports set-associative and fully associative caches. The proposed method is experimented on medium-size and large programs. We show that the method is most of the time tight. We further show that in all cases WCET estimations are much tighter when considering the cache hierarchy than when considering only the L1 cache. An evaluation of the analysis time is conducted, demonstrating that analysing the cache hierarchy has a reasonable computation time
    • …
    corecore