
    Software-Based Self-Test of Set-Associative Cache Memories

    Embedded microprocessor cache memories suffer from limited observability and controllability, which creates problems during in-system test. This paper presents a procedure to transform traditional march tests into software-based self-test programs for set-associative cache memories with LRU replacement. Among all the cache blocks in a microprocessor, testing instruction caches represents a major challenge due to limitations in two areas: 1) test patterns, which must be composed of valid instruction opcodes, and 2) test result observability, since faulty behavior can only be observed through the results of executed instructions. For these reasons, the proposed methodology concentrates on the implementation of test programs for instruction caches. The main contribution of this work lies in the possibility of applying state-of-the-art memory test algorithms to embedded cache memories without introducing any hardware or performance overhead, while guaranteeing the detection of typical faults arising in nanometer CMOS technologies.
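    As a point of reference for the march tests the paper builds on, below is a minimal sketch of the classic March C- element sequence applied to a plain RAM array. The array size and fault reporting are illustrative assumptions; the paper's contribution is transforming such tests into cache-targeted self-test programs, which this plain-memory version does not attempt.

```c
/* Minimal sketch of the March C- test applied to a plain RAM array:
   up(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); up(r0).
   Array size and reporting are illustrative assumptions. */
#include <stdint.h>
#include <stdio.h>

#define N 1024
static volatile uint8_t mem[N];

static int check(uint8_t expected, size_t i) {
    if (mem[i] != expected) { printf("fault at %zu\n", i); return 1; }
    return 0;
}

int march_c_minus(void) {
    int faults = 0;
    size_t i;
    for (i = 0; i < N; i++) mem[i] = 0x00;               /* up: w0       */
    for (i = 0; i < N; i++) { faults += check(0x00, i);  /* up: r0,w1    */
                              mem[i] = 0xFF; }
    for (i = 0; i < N; i++) { faults += check(0xFF, i);  /* up: r1,w0    */
                              mem[i] = 0x00; }
    for (i = N; i-- > 0; )  { faults += check(0x00, i);  /* down: r0,w1  */
                              mem[i] = 0xFF; }
    for (i = N; i-- > 0; )  { faults += check(0xFF, i);  /* down: r1,w0  */
                              mem[i] = 0x00; }
    for (i = 0; i < N; i++) faults += check(0x00, i);    /* up: r0       */
    return faults;
}
```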

    Towards a theory of cache-efficient algorithms

    We describe a model that enables us to analyze the running time of an algorithm on a computer with a memory hierarchy with limited associativity, in terms of various cache parameters. Our model, an extension of Aggarwal and Vitter's I/O model, enables us to establish useful relationships between the cache complexity and the I/O complexity of computations. As a corollary, we obtain cache-optimal algorithms for some fundamental problems like sorting, FFT, and an important subclass of permutations in the single-level cache model. We also show that ignoring associativity concerns could lead to inferior performance, by analyzing the average-case cache behavior of mergesort. We further extend our model to multiple levels of cache with limited associativity and present optimal algorithms for matrix transpose and sorting. Our techniques may be used for systematic exploitation of the memory hierarchy starting from the algorithm design stage, and for dealing with the hitherto unresolved problem of limited associativity.
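    To make the idea of designing for the memory hierarchy concrete, here is a hedged sketch of a cache-blocked matrix transpose, one of the problems for which the paper presents optimal algorithms. The tile size B is a hypothetical tuning parameter, not a value from the paper, and the code ignores the limited-associativity subtleties the model captures.

```c
/* Illustrative cache-blocked matrix transpose: traversing the matrix in
   B x B tiles, so each source/destination tile pair fits in cache, is the
   standard way cache models guide algorithm design. B is an assumed
   tuning parameter, not a value from the paper. */
#include <stddef.h>

#define B 32  /* tile edge; pick so two B x B tiles fit in one cache level */

void transpose_blocked(size_t n, const double *a, double *t) {
    for (size_t ii = 0; ii < n; ii += B)
        for (size_t jj = 0; jj < n; jj += B)
            for (size_t i = ii; i < ii + B && i < n; i++)
                for (size_t j = jj; j < jj + B && j < n; j++)
                    t[j * n + i] = a[i * n + j];  /* one tile at a time */
}
```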

    Acceleration by Inline Cache for Memory-Intensive Algorithms on FPGA via High-Level Synthesis

    Using FPGA-based acceleration of high-performance computing (HPC) applications to reduce energy and power consumption is becoming an interesting option, thanks to the availability of high-level synthesis (HLS) tools that enable fast design cycles. However, obtaining good performance for memory-intensive algorithms, which often exchange large data arrays with external DRAM, still requires time-consuming optimization and good knowledge of hardware design. This article proposes a new design methodology based on dedicated application- and data array-specific caches. These caches provide most of the benefits that can be achieved by hand-coding optimized DMA-like transfer strategies into the HPC application code, but require only limited manual tuning (basically the selection of architecture and size), are neutral to the target HLS tool and technology (FPGA or ASIC), and do not require changes to the application code. We show experimental results obtained on five common memory-intensive algorithms from very diverse domains, namely machine learning, data sorting, and computer vision. We test the cost and performance of our caches against both out-of-the-box code originally optimized for a GPU and manually optimized implementations specifically targeted at FPGAs via HLS. The implementations using our caches achieved an 8X speedup and 2X energy reduction on average with respect to the out-of-the-box versions, using only simple directive-based optimizations (e.g., pipelining). They also achieved comparable performance with much less design effort when compared with the versions that were manually optimized to achieve efficient memory transfers specifically for an FPGA.
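    The abstract describes array-specific inline caches inserted between the algorithm and external DRAM. The sketch below illustrates that general idea with a hypothetical direct-mapped, read-only line buffer wrapping a single array; the names, sizes, and structure are assumptions for illustration and are not the paper's cache architecture.

```c
/* Hypothetical sketch of an inline, array-specific cache: accesses to an
   "external DRAM" array go through a small direct-mapped line buffer, so
   the algorithm's code is unchanged. Sizes and layout are assumptions,
   not the paper's architecture. */
#include <stdint.h>
#include <string.h>

#define LINE_WORDS 16          /* words per cache line (assumed) */
#define NUM_LINES  64          /* number of lines (assumed)      */

typedef struct {
    uint32_t data[NUM_LINES][LINE_WORDS];
    uint32_t tag[NUM_LINES];
    int      valid[NUM_LINES];
    const uint32_t *backing;   /* the external array being cached */
} inline_cache;

uint32_t cache_read(inline_cache *c, uint32_t addr) {
    uint32_t line = (addr / LINE_WORDS) % NUM_LINES;
    uint32_t tag  = addr / (LINE_WORDS * NUM_LINES);
    if (!c->valid[line] || c->tag[line] != tag) {   /* miss: fetch line */
        uint32_t base = (addr / LINE_WORDS) * LINE_WORDS;
        memcpy(c->data[line], &c->backing[base],
               LINE_WORDS * sizeof(uint32_t));
        c->tag[line]   = tag;
        c->valid[line] = 1;
    }
    return c->data[line][addr % LINE_WORDS];        /* hit path */
}
```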

    An Associativity Threshold Phenomenon in Set-Associative Caches

    In an $\alpha$-way set-associative cache, the cache is partitioned into disjoint sets of size $\alpha$, and each item can only be cached in one set, typically selected via a hash function. Set-associative caches are widely used and have many benefits, e.g., in terms of latency or concurrency, over fully associative caches, but they often incur more cache misses. As the set size $\alpha$ decreases, the benefits increase, but the paging costs worsen. In this paper we characterize the performance of an $\alpha$-way set-associative LRU cache of total size $k$, as a function of $\alpha = \alpha(k)$. We prove the following, assuming that sets are selected using a fully random hash function:
    - For $\alpha = \omega(\log k)$, the paging cost of an $\alpha$-way set-associative LRU cache is within additive $O(1)$ of that of a fully-associative LRU cache of size $(1-o(1))k$, with probability $1 - 1/\operatorname{poly}(k)$, for all request sequences of length $\operatorname{poly}(k)$.
    - For $\alpha = o(\log k)$, and for all $c = O(1)$ and $r = O(1)$, the paging cost of an $\alpha$-way set-associative LRU cache is not within a factor $c$ of that of a fully-associative LRU cache of size $k/r$, for some request sequence of length $O(k^{1.01})$.
    - For $\alpha = \omega(\log k)$, if the hash function can be occasionally changed, the paging cost of an $\alpha$-way set-associative LRU cache is within a factor $1 + o(1)$ of that of a fully-associative LRU cache of size $(1-o(1))k$, with probability $1 - 1/\operatorname{poly}(k)$, for request sequences of arbitrary (e.g., super-polynomial) length.
    Some of our results generalize to other paging algorithms besides LRU, such as least-frequently used (LFU).
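    For readers who want the model pinned down, the following toy simulator counts misses for an $\alpha$-way set-associative LRU cache of total size $k$, with the set selected by hashing the requested item. The multiplicative hash and all parameters are placeholders standing in for the paper's fully random hash function; the code only illustrates the cache being analyzed, not the proofs.

```c
/* Toy miss counter for an ALPHA-way set-associative LRU cache of total
   size K. The hash and parameters are placeholders; item ids are assumed
   nonzero so 0 can mark an empty slot. */
#include <stdint.h>

#define ALPHA 4
#define K     256                   /* total capacity */
#define SETS  (K / ALPHA)

static uint64_t slot[SETS][ALPHA];  /* cached item ids, 0 = empty   */
static uint64_t stamp[SETS][ALPHA]; /* last-use time, for LRU order */
static uint64_t now;

static uint32_t set_of(uint64_t item) {   /* stand-in for a fully
                                             random hash function   */
    return (uint32_t)((item * 0x9E3779B97F4A7C15ULL) >> 32) % SETS;
}

int access_item(uint64_t item) {          /* returns 1 on a miss */
    uint32_t s = set_of(item);
    int victim = 0;
    now++;
    for (int w = 0; w < ALPHA; w++) {
        if (slot[s][w] == item) { stamp[s][w] = now; return 0; } /* hit */
        if (stamp[s][w] < stamp[s][victim]) victim = w;  /* track LRU  */
    }
    slot[s][victim]  = item;              /* miss: evict LRU way */
    stamp[s][victim] = now;
    return 1;
}
```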

    Applying measurement-based probabilistic timing analysis to buffer resources

    The use of complex hardware makes it difficult for current timing analysis techniques to compute trustworthy and tight worst-case execution time (WCET) bounds. Those techniques require detailed knowledge of the internal operation and state of the platform, at both the software and hardware level, and obtaining that information for modern hardware platforms is increasingly difficult. Measurement-Based Probabilistic Timing Analysis (MBPTA) reduces the cost of acquiring the knowledge needed for computing trustworthy and tight WCET bounds. MBPTA based on Extreme Value Theory requires the execution times of processor instructions to be independent and identically distributed (i.i.d.), which can be achieved with some hardware support. Previous proposals show how those properties can be achieved for caches. This paper considers, for the first time, the implications on MBPTA of using buffer resources. Buffers in general, and first-come first-served (FCFS) buffers in particular, are of paramount importance as the complexity of hardware increases, since they allow managing contention in those resources where multiple requests may be pending. We show how buffers can be used in the context of MBPTA and provide illustrative examples.
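    To fix terminology, the sketch below shows a minimal FCFS ring buffer: pending requests are served strictly in arrival order, so the contention delay a request experiences depends only on buffer occupancy at its arrival. The depth and interface are assumptions for illustration, not the paper's hardware model.

```c
/* Minimal FCFS (first-come first-served) ring buffer: requests are served
   strictly in arrival order. Depth and interface are illustrative
   assumptions, not the paper's hardware model. */
#include <stdint.h>

#define DEPTH 8                              /* buffer depth (assumed) */

typedef struct {
    uint32_t req[DEPTH];
    unsigned head, tail, count;
} fcfs_buffer;

int fcfs_push(fcfs_buffer *b, uint32_t r) {  /* enqueue; 0 if full */
    if (b->count == DEPTH) return 0;
    b->req[b->tail] = r;
    b->tail = (b->tail + 1) % DEPTH;
    b->count++;
    return 1;
}

int fcfs_pop(fcfs_buffer *b, uint32_t *r) {  /* serve oldest; 0 if empty */
    if (b->count == 0) return 0;
    *r = b->req[b->head];
    b->head = (b->head + 1) % DEPTH;
    b->count--;
    return 1;
}
```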