Search CORE

28,497 research outputs found

An Interpolative Analytical Cache Model with Application to Performance-Power Design Space Exploration

Author: Peng Bing
Tay Yong Chiang
Wong Weng Fai
Publication venue
Publication date: 01/01/2005
Field of study

Caches are known to consume up to half of all system power in embedded processors. Co-optimizing performance and power of the cache subsystems is therefore an important step in the design of embedded systems, especially those employing application specific instruction processors. In this project, we propose an analytical cache model that succinctly captures the miss performance of an application over the entire cache parameter space. Unlike exhaustive trace driven simulation, our model requires that the program be simulated once so that a few key characteristics can be obtained. Using these application-dependent characteristics, the model can span the entire cache parameter space consisting of cache sizes, associativity and cache block sizes. In our unified model, we are able to cater for direct-mapped, set and fully associative instruction, data and unified caches. Validation against full trace-driven simulations shows that our model has a high degree of fidelity. Finally, we show how the model can be coupled with a power model for caches such that one can very quickly decide on pareto-optimal performance-power design points for rapid design space exploration.Singapore-MIT Alliance (SMA

DSpace@MIT

A Cache Management Strategy to Replace Wear Leveling Techniques for Embedded Flash Memory

Author: Boukhobza Jalil
Olivier Pierre
Rubini Stéphane
Publication venue
Publication date: 27/06/2011
Field of study

Prices of NAND flash memories are falling drastically due to market growth and fabrication process mastering while research efforts from a technological point of view in terms of endurance and density are very active. NAND flash memories are becoming the most important storage media in mobile computing and tend to be less confined to this area. The major constraint of such a technology is the limited number of possible erase operations per block which tend to quickly provoke memory wear out. To cope with this issue, state-of-the-art solutions implement wear leveling policies to level the wear out of the memory and so increase its lifetime. These policies are integrated into the Flash Translation Layer (FTL) and greatly contribute in decreasing the write performance. In this paper, we propose to reduce the flash memory wear out problem and improve its performance by absorbing the erase operations throughout a dual cache system replacing FTL wear leveling and garbage collection services. We justify this idea by proposing a first performance evaluation of an exclusively cache based system for embedded flash memories. Unlike wear leveling schemes, the proposed cache solution reduces the total number of erase operations reported on the media by absorbing them in the cache for workloads expressing a minimal global sequential rate.Comment: Ce papier a obtenu le "Best Paper Award" dans le "Computer System track" nombre de page: 8; International Symposium on Performance Evaluation of Computer & Telecommunication Systems, La Haye : Netherlands (2011

arXiv.org e-Print Archive

HAL-Université de Bretagne Occidentale

LUT saving in embedded FPGAs for cache locking in real-time systems

Author: Martí Campoy Antonio
Ors Carot Rafael
Rodríguez Ballester Francisco
Publication venue
Publication date: 01/01/2013
Field of study

[EN] In recent years, cache locking have appeared as a solution to ease the schedulability analysis of real-time systems using cache memories maintaining, at the same time, similar performance improvements than regular cache memories. New devices for the embedded market couple a processor and a programmable logic device designed to enhance system flexibility and increase the possibilities of customisation in the field. This arrangement may help to improve the use of cache locking in real-time systems. This work proposes the use of this embedded programmable logic device to implement a logic function that provides the cache controller the information it needs in order to determine if a referenced main memory block has to be loaded and locked into the cache; we have called this circuit a Locking State Generator. Experiments show the requirements in terms of number of hardware resources and a way to reduce them and the circuit complexity. This reduction ranges from 50% up to 80% of the number of hardware resources originally needed to build the Locking State Generator circuit.This work has been partially supported by PAID-06-11/2055 of Universitat Politecnica de València and TIN2011-28435- C03-01 of Ministerio de Ciencia e Innovacion.Martí Campoy, A.; Rodríguez Ballester, F.; Ors Carot, R. (2013). LUT saving in embedded FPGAs for cache locking in real-time systems. International Journal On Advances in Systems and Measurements. 6(1-2):190-199. http://hdl.handle.net/10251/46758S19019961-

RiuNet

Revisiting LP-NUCA Energy Consumption: Cache Access Policies and Adaptive Block Dropping

Author: Ferrerón Alexandra
Monreal Arnal Teresa
Montesano del Campo Luis
Suárez Gracia Darío
Viñals Yúfera Víctor
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014
Field of study

Cache working-set adaptation is key as embedded systems move to multiprocessor and Simultaneous Multithreaded Architectures (SMT) because interthread pollution harms system performance and battery life. Light-Power NUCA (LP-NUCA) is a working-set adaptive cache that depends on temporal-locality to save energy. This work identifies the sources of energy waste in LP-NUCAs: parallel access to the tag and data arrays of the tiles and low locality phases with useless block migration. To counteract both issues, we prove that switching to serial access reduces energy without harming performance and propose a machine learning Adaptive Drop Rate (ADR) controller that minimizes the amount of replacement and migration when locality is low. This work demonstrates that these techniques efficiently adapt the cache drop and access policies to save energy. They reduce LP-NUCA consumption 22.7% for 1SMT. With interthread cache contention in 2SMT, the savings rise to 29%. Versus a conventional organization, energy--delay improves 20.8% and 25% for 1- and 2SMT benchmarks, and, in 65% of the 2SMT mixes, gains are larger than 20%

Repositorio Universidad de Zaragoza

Data Cache-Energy and Throughput Models: Design Exploration for Embedded Processors

Author: McDonald-Maier Klaus
Qadri Muhammad Yasir
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2009
Field of study

Most modern 16-bit and 32-bit embedded processors contain cache memories to further increase instruction throughput of the device. Embedded processors that contain cache memories open an opportunity for the low-power research community to model the impact of cache energy consumption and throughput gains. For optimal cache memory configuration mathematical models have been proposed in the past. Most of these models are complex enough to be adapted for modern applications like run-time cache reconfiguration. This paper improves and validates previously proposed energy and throughput models for a data cache, which could be used for overhead analysis for various cache types with relatively small amount of inputs. These models analyze the energy and throughput of a data cache on an application basis, thus providing the hardware and software designer with the feedback vital to tune the cache or application for a given energy budget. The models are suitable for use at design time in the cache optimization process for embedded processors considering time and energy overhead or could be employed at runtime for reconfigurable architectures

University of Essex Research Repository

Springer - Publisher Connector

Directory of Open Access Journals

Linux kernel compaction through cold code swapping

Author: A. Milanova
B. Ford
B. Sutter De
C.-T. Lee
D. Citron
D. Ferrari
D. Gay
D.J. Hatfield
J.L. Hennessy
K. Pettis
N. Gloy
S. Bhatia
S. Debray
S.K. Debray
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009
Field of study

There is a growing trend to use general-purpose operating systems like Linux in embedded systems. Previous research focused on using compaction and specialization techniques to adapt a general-purpose OS to the memory-constrained environment, presented by most, embedded systems. However, there is still room for improvement: it has been shown that even after application of the aforementioned techniques more than 50% of the kernel code remains unexecuted under normal system operation. We introduce a new technique that reduces the Linux kernel code memory footprint, through on-demand code loading of infrequently executed code, for systems that support virtual memory. In this paper, we describe our general approach, and we study code placement algorithms to minimize the performance impact of the code loading. A code, size reduction of 68% is achieved, with a 2.2% execution speedup of the system-mode execution time, for a case study based on the MediaBench II benchmark suite

Crossref

Ghent University Academic Bibliography

Transparent code authentication at the processor level

Author: A.O. Durahim
Aoki
B. Sunar
Bellare
Black
Boneh
Brassard
Carter
Chevallier-Mames
Choukri
Clarke
Clarke
E. Savaş
Gaj
Gassend
Gaubatz
Hodjat
Hopkins
Joye
Kaps
Krawczyk
Lee
Lim
McCune
Ö. Kocabaş
Reyhani-Masoleh
Satoh
Suh
Sunar
T.B. Pedersen
Yan
Yang
Zhang
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 01/01/2009
Field of study

The authors present a lightweight authentication mechanism that verifies the authenticity of code and thereby addresses the virus and malicious code problems at the hardware level eliminating the need for trusted extensions in the operating system. The technique proposed tightly integrates the authentication mechanism into the processor core. The authentication latency is hidden behind the memory access latency, thereby allowing seamless on-the-fly authentication of instructions. In addition, the proposed authentication method supports seamless encryption of code (and static data). Consequently, while providing the software users with assurance for authenticity of programs executing on their hardware, the proposed technique also protects the software manufacturers’ intellectual property through encryption. The performance analysis shows that, under mild assumptions, the presented technique introduces negligible overhead for even moderate cache sizes

Crossref

Sabanci University Research Database