Random Modulo: A new processor cache design for real-time critical systems
Cache memories have a huge impact on software's worst-case execution time (WCET). While enabling the seamless use of caches is key to providing the increasing levels of (guaranteed) performance required by automotive software, caches complicate timing analysis. In the context of Measurement-Based Probabilistic Timing Analysis (MBPTA), a promising technique to ease timing analysis of complex hardware, we propose Random Modulo (RM), a new cache design that provides the probabilistic behavior required by MBPTA, with the following advantages over existing MBPTA-compliant cache designs: (i) an outstanding reduction in WCET estimates, (ii) lower latency and area overhead, and (iii) competitive average performance with respect to conventional caches.
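As a rough illustration of what placement means here, the sketch below contrasts conventional modulo placement with a generic seeded placement whose set index changes from run to run. It is not the RM design, whose hardware the abstract does not describe. The cache geometry (`LINE_SIZE`, `NUM_SETS`), the function names, and the seed-mixing constant are all assumptions made for the example.

```python
import random

LINE_SIZE = 32   # bytes per cache line (assumed for illustration)
NUM_SETS = 128   # number of cache sets (assumed for illustration)

def modulo_set(addr):
    """Conventional modulo placement: the set index is fixed by the address."""
    return (addr // LINE_SIZE) % NUM_SETS

def randomized_set(addr, seed):
    """Generic seeded placement (hypothetical, not RM): the set index
    depends on both the line address and a per-run seed."""
    line = addr // LINE_SIZE
    # Mix line and seed into one integer to seed a private generator.
    return random.Random(line * 1_000_003 + seed).randrange(NUM_SETS)
```

With modulo placement, two addresses that are `LINE_SIZE * NUM_SETS` bytes apart always conflict. With a seeded placement, whether they conflict varies with the seed, which is the kind of run-to-run variation that a measurement-based probabilistic analysis samples.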
Experimental Evaluation of Cache-Related Preemption Delay Aware Timing Analysis
In the presence of caches, preemptive scheduling may incur a significant overhead referred to as cache-related preemption delay (CRPD). CRPD is caused by preempting tasks evicting cached memory blocks of preempted tasks, which have to be reloaded when the preempted tasks resume their execution.
In this paper, we experimentally evaluate state-of-the-art techniques to account for the CRPD during timing analysis. We find that purely synthetically generated task sets may yield misleading conclusions regarding the relative precision of different CRPD analysis techniques and the impact of CRPD on schedulability in general. Based on task characterizations obtained by static worst-case execution time (WCET) analysis, we shed new light on the state of the art.
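The eviction-and-reload mechanism described above can be sketched with one well-known style of bound for a direct-mapped cache: count the cache sets that hold useful blocks of the preempted task and are also touched by the preempting task, then multiply by the reload cost. This is a minimal sketch, not necessarily one of the techniques evaluated in the paper. The function name and parameters are hypothetical.

```python
def crpd_bound(useful_sets, evicting_sets, block_reload_time):
    """Upper-bound the preemption delay for a direct-mapped cache.

    useful_sets:       cache sets holding blocks the preempted task may reuse
    evicting_sets:     cache sets the preempting task may touch
    block_reload_time: cost of reloading one block (same unit as the result)
    """
    # Only blocks that are both useful and possibly evicted must be reloaded.
    affected = set(useful_sets) & set(evicting_sets)
    return len(affected) * block_reload_time
```

For example, `crpd_bound({1, 2, 3, 8}, {2, 3, 4}, 10)` evaluates to 20, because only sets 2 and 3 are both useful and possibly evicted.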
A confidence assessment of WCET estimates for software time randomized caches
Obtaining Worst-Case Execution Time (WCET) estimates is a required step in real-time embedded systems during software verification. Measurement-Based Probabilistic Timing Analysis (MBPTA) aims at obtaining WCET estimates for industrial-size software running on hardware platforms comprising high-performance features. MBPTA relies on randomizing the timing behavior (functional behavior is left unchanged) of hard-to-predict events that significantly impact software's WCET estimates, such as the location of objects in memory and hence their associated cache behavior. Software time-randomized caches (sTRc) have recently been proposed to enable MBPTA on top of commercial off-the-shelf (COTS) caches (e.g. modulo placement). However, some random events may challenge MBPTA reliability on top of sTRc. In this paper, for sTRc and programs with homogeneously accessed addresses, we determine whether the number of observations taken at analysis, as part of the normal MBPTA application process, captures the cache events significantly impacting execution time and WCET. If this is not the case, our techniques provide the user with the number of extra runs to perform to guarantee that cache events are captured for a reliable application of MBPTA. Our techniques are evaluated with synthetic benchmarks and an avionics application.
The research leading to these results has received funding from the European Community's Seventh Framework Programme [FP7/2007-2013] under the PROXIMA Project (www.proxima-project.eu), grant agreement no 611085. This work has also been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316, the HiPEAC Network of Excellence, and COST Action IC1202: Timing Analysis On Code-Level (TACLe). Jaume Abella has been partially supported by the Ministry of Economy and Competitiveness under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717.
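The question of how many runs are needed to observe a rare cache event can be illustrated with a generic calculation. This is a sketch that assumes independent runs and a known per-run event probability, and it is not the technique proposed in the paper. If an event occurs with probability p in each run, it is seen at least once in R runs with probability 1 - (1 - p)^R. The function name and parameters below are hypothetical.

```python
import math

def runs_needed(event_prob, confidence):
    """Smallest R such that 1 - (1 - event_prob)**R >= confidence,
    assuming independent runs (an assumption of this sketch)."""
    if not (0.0 < event_prob < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("event_prob and confidence must lie strictly in (0, 1)")
    # Solve (1 - p)^R <= 1 - confidence for R and round up.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - event_prob))
```

Under these assumptions, an event with a per-run probability of 0.01 needs 688 runs to be observed at least once with 99.9% confidence. If fewer runs were collected, the difference is the number of extra runs to perform.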
Detecting time-fragmented cache attacks against AES using Performance Monitoring Counters
Cache timing attacks use shared caches in multi-core processors as side
channels to extract information from victim processes. These attacks are
particularly dangerous in cloud infrastructures, in which the deployed
countermeasures cause collateral effects in terms of performance loss and
increase in energy consumption. We propose to monitor the victim process using
an independent monitoring (detector) process that continuously measures
selected Performance Monitoring Counters (PMCs) to detect the presence of an
attack. Ad-hoc countermeasures can then be applied only when such a risky situation
arises. In our case, the victim process is the AES encryption algorithm and the
attack is performed by means of random encryption requests. We demonstrate that
PMCs are a feasible tool to detect the attack and that sampling PMCs at high
frequencies is worse than sampling at lower frequencies in terms of detection
capabilities, particularly when the attack is fragmented in time in an attempt
to evade detection.
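A detector of this kind can be sketched as a threshold test over a stream of counter samples. The code below is a minimal illustration and not the detector used in the paper. It assumes the samples (for example, cache-miss counts per sampling interval) have already been read from the counters, and every name and parameter is hypothetical.

```python
def detect_attack(samples, baseline_mean, baseline_std, k=3.0, min_consecutive=2):
    """Return the index of the sample at which an attack is flagged, or None.

    A sample is suspicious when it exceeds baseline_mean + k * baseline_std.
    An attack is flagged after min_consecutive suspicious samples in a row.
    """
    threshold = baseline_mean + k * baseline_std
    run = 0  # length of the current streak of suspicious samples
    for i, value in enumerate(samples):
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return i
    return None
```

With a baseline mean of 100 and standard deviation of 5, the trace `[100, 105, 98, 400, 410, 102]` is flagged at index 4. The trace `[100, 400, 100, 400, 100]`, with the same spikes spread out in time, is not flagged at all, which hints at why time-fragmented attacks are harder for a simple detector to catch.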
Near-Memory Address Translation
Memory and logic integration on the same chip is becoming increasingly cost
effective, creating the opportunity to offload data-intensive functionality to
processing units placed inside memory chips. The introduction of memory-side
processing units (MPUs) into conventional systems faces virtual memory as the
first big showstopper: without efficient hardware support for address
translation, MPUs have highly limited applicability. Unfortunately, conventional
translation mechanisms fall short of providing fast translations as
contemporary memories exceed the reach of TLBs, making expensive page walks
common.
In this paper, we are the first to show that the historically important
flexibility to map any virtual page to any page frame is unnecessary in today's
servers. We find that while limiting the associativity of the
virtual-to-physical mapping incurs no penalty, it can break the
translate-then-fetch serialization if combined with careful data placement in
the MPU's memory, allowing for translation and data fetch to proceed
independently and in parallel. We propose the Distributed Inverted Page Table
(DIPTA), a near-memory structure in which the smallest memory partition keeps
the translation information for its data share, ensuring that the translation
completes together with the data fetch. DIPTA completely eliminates the
performance overhead of translation, achieving speedups of up to 3.81x and
2.13x over conventional translation using 4KB and 1GB pages, respectively.
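Limiting the associativity of the virtual-to-physical mapping can be pictured as follows: the virtual page number alone selects a small set of candidate page frames, so the region of memory holding the data is known before translation finishes. The sketch below is illustrative only. The constants, the contiguous layout of frames within a set, and the function name are assumptions, not DIPTA's actual organization.

```python
PAGE_SIZE = 4096            # bytes per page (assumed)
NUM_FRAMES = 1 << 20        # physical page frames (assumed)
WAYS = 4                    # mapping associativity (assumed)
NUM_SETS = NUM_FRAMES // WAYS

def candidate_frames(vaddr):
    """Frames that may hold the page containing vaddr under a
    WAYS-associative virtual-to-physical mapping."""
    vpn = vaddr // PAGE_SIZE        # virtual page number
    set_index = vpn % NUM_SETS      # fixed by the virtual address alone
    # Assumed layout: the frames of one set are contiguous.
    return [set_index * WAYS + way for way in range(WAYS)]
```

Because `candidate_frames` depends only on the virtual address, a memory-side unit could start fetching from the candidate locations while the lookup that picks the right way proceeds in parallel.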
WCET analysis of multi-level set-associative instruction caches
With the advent of increasingly complex hardware in real-time embedded
systems (processors with performance enhancing features such as pipelines,
cache hierarchy, multiple cores), many processors now have a set-associative L2
cache. Thus, there is a need for considering cache hierarchies when validating
the temporal behavior of real-time systems, in particular when estimating
tasks' worst-case execution times (WCETs). To the best of our knowledge, there
is only one approach for WCET estimation for systems with cache hierarchies
[Mueller, 1997], which turns out to be unsafe for set-associative caches. In
this paper, we highlight the conditions under which the approach described in
[Mueller, 1997] is unsafe. A safe static instruction cache analysis method is
then presented. Contrary to [Mueller, 1997] our method supports set-associative
and fully associative caches. The proposed method is evaluated on
medium-size and large programs. We show that the method is tight most of the
time. We further show that in all cases WCET estimations are much tighter when
considering the cache hierarchy than when considering only the L1 cache. An
evaluation of the analysis time is conducted, demonstrating that analysing the
cache hierarchy has a reasonable computation time.
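To see why the L2 cache matters for timing, the sketch below simulates a two-level hierarchy of set-associative LRU caches in which L2 is consulted only on an L1 miss. It is a concrete simulation, not the static analysis the paper presents. The latencies, the LRU replacement policy, and all names are assumptions made for the example.

```python
from collections import OrderedDict

class LRUCache:
    """Set-associative cache with LRU replacement (illustrative model)."""

    def __init__(self, num_sets, ways, line_size):
        self.num_sets, self.ways, self.line_size = num_sets, ways, line_size
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, addr):
        """Return True on a hit; update the LRU state either way."""
        line = addr // self.line_size
        cache_set = self.sets[line % self.num_sets]
        if line in cache_set:
            cache_set.move_to_end(line)      # mark as most recently used
            return True
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)    # evict the least recently used line
        cache_set[line] = None
        return False

def fetch_latency(addr, l1, l2, lat_l1=1, lat_l2=10, lat_mem=100):
    """Latency of one fetch; L2 is accessed only when L1 misses."""
    if l1.access(addr):
        return lat_l1
    if l2.access(addr):
        return lat_l1 + lat_l2
    return lat_l1 + lat_l2 + lat_mem
```

A fetch that misses in L1 but hits in L2 costs 11 cycles here instead of 111, which is the gap an analysis ignoring the L2 cache would have to treat pessimistically as a memory access.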