15 research outputs found

Software trace cache

Author: Larriba Pey Josep
Ramírez Bellido Alejandro
Valero Cortés Mateo
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005

We explore the use of compiler optimizations that optimize the layout of instructions in memory. The goal is to enable the code to make better use of the underlying hardware resources, regardless of the specific details of the processor or architecture, in order to increase fetch performance. The Software Trace Cache (STC) is a code layout algorithm with a broader target than previous layout optimizations. We target not only an improvement in the instruction cache hit rate, but also an increase in the effective fetch width of the fetch engine. The STC algorithm organizes basic blocks into chains, trying to make sequentially executed basic blocks reside in consecutive memory positions, then maps the basic block chains in memory to minimize conflict misses in the important sections of the program. We evaluate and analyze in detail the impact of the STC, and code layout optimizations in general, on the three main aspects of fetch performance: the instruction cache hit rate, the effective fetch width, and the branch prediction accuracy. Our results show that layout-optimized codes have some special characteristics that make them more amenable to high-performance instruction fetch. They have a very high rate of not-taken branches and execute long chains of sequential instructions; they also make very effective use of instruction cache lines, mapping only useful instructions that will execute close in time, increasing both spatial and temporal locality. Peer Reviewed. Postprint (published version)

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Linux kernel compaction through cold code swapping

Author: A. Milanova
B. Ford
B. Sutter De
C.-T. Lee
D. Citron
D. Ferrari
D. Gay
D.J. Hatfield
J.L. Hennessy
K. Pettis
N. Gloy
S. Bhatia
S. Debray
S.K. Debray
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

There is a growing trend to use general-purpose operating systems like Linux in embedded systems. Previous research focused on using compaction and specialization techniques to adapt a general-purpose OS to the memory-constrained environment presented by most embedded systems. However, there is still room for improvement: it has been shown that even after application of the aforementioned techniques, more than 50% of the kernel code remains unexecuted under normal system operation. We introduce a new technique that reduces the Linux kernel code memory footprint through on-demand loading of infrequently executed code, for systems that support virtual memory. In this paper, we describe our general approach, and we study code placement algorithms to minimize the performance impact of the code loading. A code size reduction of 68% is achieved, with a 2.2% speedup of the system-mode execution time, for a case study based on the MediaBench II benchmark suite.

Crossref

Ghent University Academic Bibliography

Measurement-Based Timing Analysis of the AURIX Caches

Author: Abella Jaume
Cazorla Francisco J.
Compagnin Davide
Kosmidis Leonidas
Mezzetti Enrico
Morales David
Quinones Eduardo
Vardanega Tullio
Publication venue: OASIcs - OpenAccess Series in Informatics. 16th International Workshop on Worst-Case Execution Time Analysis (WCET 2016)
Publication date: 01/01/2016

Cache memories are among the hardware resources with the highest potential to reduce worst-case execution time (WCET) costs for software programs with tight real-time constraints. Yet, the complexity of cache analysis has caused a large fraction of the real-time systems industry to avoid using them, especially in the automotive sector. For measurement-based timing analysis (MBTA), the dominant technique in domains such as automotive, caches make it difficult to define test scenarios stressful enough to produce cache layouts that cause high contention. In this paper, we present our experience in enabling the use of caches for a real automotive application running on an AURIX multiprocessor, using software randomization and measurement-based probabilistic timing analysis (MBPTA). Our results show that software randomization successfully exposes cache-related variability in the experiments performed for timing analysis, in a manner that can be effectively captured by MBPTA.

UPCommons. Portal del coneixement obert de la UPC

Dagstuhl Research Online Publication Server

Archivio istituzionale della ricerca - Università di Padova

Restructuring field layouts for embedded memory systems

Author: Hwansoo Han
Jungeun Kim
Keoncheol Shin
Seonggun Kim
Publication date: 01/01/2006

CiteSeerX

Statement of Work for Studies in BlueGene/L Scalability and Reconfigurability

Author: Henning A
McKee S A
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date: 13/09/2005

As referenced in the subcontract, the work included three major goals: (1) study the performance of an ASCI application, (2) study tradeoffs in using the second CPU in coprocessor mode to optimize use of the L3 scratchpad memory for performing vector-like gather/scatter and streamlining operations, and (3) perform simulator studies of hardware phase detection and identification. We made some modifications to the work contract. Work involving the integration of a cache-conscious data placement algorithm to improve cache utilization on BlueGene/L has been added, and work involving the L3 scratchpad memory has been eliminated. This was explained in the previous milestones. In this milestone, we continue to focus on the last goal by modifying a cycle-accurate simulator, sim-alpha [4]. As a premise to hardware phase detection and identification, we need an infrastructure for testing various cache-conscious data placement methods. For this milestone, we discuss the completed framework that handles cache-conscious placement optimizations, which includes profiling data accesses and handling remapped addresses. We also introduce an algorithm (the ccdp profiling tool) that we implemented for assigning remapped addresses for a given code. Our performance results show that by using our ccdp profiling tool, we achieve reduced miss rates and improved overall simulation performance. For our test cases, we use four applications from the SPEC CPU 2000 suite [2]. In our past milestones, we studied research that involves implementing cache-conscious data placement techniques. By becoming more familiar with previous research, we can make better decisions when designing our cache-conscious profiling tool. It is important to have a firm understanding of the existing techniques that have proven to be efficient at improving memory performance, since our tool will produce trace files as input to our enhanced simulator framework.

UNT Digital Library

Control/Architecture co-design for cyber-physical systems

Author: Chakraborty Samarjit
Chang Wanli
Roy Debayan
Zhang Licong
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2017

White Rose Research Online

Semantics-preserving cosynthesis of cyber-physical systems

Author: Chakraborty Samarjit
Chang Wanli
Mitter Sanjoy
Roy Debayan
Zhang Licong
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

Crossref

White Rose Research Online