8 research outputs found
Author retrospective for the dual data cache
In this paper we present a retrospective on our paper published in ICS 1995, which to the best of our knowledge was the first paper to introduce the concept of a cache memory with multiple subcaches, each tuned for a different type of locality. In this retrospective, we summarize the main ideas of the original paper and outline some of the later work that exploited similar ideas and could have been influenced by our original paper, including two actual industrial microprocessors. Peer Reviewed. Postprint (author’s final draft)
A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems
Recent technological advances have greatly improved the performance and
features of embedded systems. With the number of mobile devices alone now
nearly equal to the population of Earth, embedded systems have truly
become ubiquitous. These trends, however, have also made the task of managing
their power consumption extremely challenging. In recent years, several
techniques have been proposed to address this issue. In this paper, we survey
the techniques for managing power consumption of embedded systems. We discuss
the need for power management and provide a classification of the techniques on
several important parameters to highlight their similarities and differences.
This paper is intended to help researchers and application developers
gain insight into the workings of power management techniques and design
even more efficient high-performance embedded systems for tomorrow.
Evaluation of Web Cache Systems Partitioned by Object Size
Web objects are cached in proxies (web caches) by replacing
large objects with many small objects, so hit rates
are affected by the size of the objects being cached. This document
experimentally evaluates how a hierarchy of proxies
can improve hit rates. León Medina, DA. (2008). Evaluación de Sistemas de Cache WEB particionadas en función del tamaño de los objetos. http://hdl.handle.net/10251/14197
Software-assisted cache mechanisms for embedded systems
Thesis (Ph.D.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (leaves 120-135).
Embedded systems are increasingly using on-chip caches as part of their on-chip memory system. This thesis presents cache mechanisms to improve cache performance and provide opportunities to improve data availability that can lead to more predictable cache performance. The first cache mechanism presented is an intelligent cache replacement policy that utilizes information about dead data and data that is very frequently used. This mechanism is analyzed theoretically to show that the number of misses using intelligent cache replacement is guaranteed to be no more than the number of misses using traditional LRU replacement. Hardware and software-assisted mechanisms to implement intelligent cache replacement are presented and evaluated. The second cache mechanism presented is cache partitioning, which exploits disjoint access sequences that do not overlap in the memory space. A theoretical result is proven that shows that modifying an access sequence into a concatenation of disjoint access sequences is guaranteed to improve the cache hit rate. Partitioning mechanisms inspired by the concept of disjoint sequences are designed and evaluated. A profit-based analysis, annotation, and simulation framework has been implemented to evaluate the cache mechanisms. This framework takes a compiled benchmark program and a set of program inputs and evaluates various cache mechanisms to provide a range of possible performance improvement scenarios. The proposed cache mechanisms have been evaluated using this framework by measuring cache miss rates and Instructions Per Clock (IPC) information.
The results indicate that the proposed cache mechanisms show promise in improving cache performance and predictability with a modest increase in silicon area. By Prabhat Jain. Ph.D.
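The comparison between LRU and a replacement policy that uses dead-data information can be illustrated with a minimal sketch. This is not the thesis's mechanism: it assumes a fully associative cache, and the dead-data hint is a hypothetical oracle that marks a block dead after its last access in the trace. The function names are illustrative only.

```python
from collections import OrderedDict


def lru_misses(trace, capacity):
    """Count misses for a fully associative cache with plain LRU replacement."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[addr] = True
    return misses


def lru_with_dead_hints_misses(trace, capacity):
    """Count misses when dead blocks are evicted before falling back to LRU.

    Assumption: a block is 'dead' once its last access in the trace has
    occurred. A real system would obtain such hints from profiling or
    compiler annotations rather than from future knowledge.
    """
    last_use = {addr: i for i, addr in enumerate(trace)}
    cache = OrderedDict()
    dead = set()
    misses = 0
    for i, addr in enumerate(trace):
        if addr in cache:
            cache.move_to_end(addr)
        else:
            misses += 1
            if len(cache) >= capacity:
                # Prefer the least recently used dead block, if there is one.
                victim = next((blk for blk in cache if blk in dead), None)
                if victim is None:
                    cache.popitem(last=False)
                else:
                    del cache[victim]
                    dead.discard(victim)
            cache[addr] = True
        if last_use[addr] == i:
            dead.add(addr)
    return misses


trace = list("abxab")
plain = lru_misses(trace, capacity=2)
hinted = lru_with_dead_hints_misses(trace, capacity=2)
```

On this small trace the one-shot block `x` pushes `a` out under both policies, but the hinted policy then evicts `x` instead of `b`, which saves one miss.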
Datapath and memory co-optimization for FPGA-based computation
With the large resource densities available on modern FPGAs, it is often the available
memory bandwidth that limits the parallelism (and therefore performance) that can be
achieved. For this reason the focus of this thesis is the development of an integrated
scheduling and memory optimisation methodology to allow high levels of parallelism to be
exploited in FPGA-based designs.
A manual translation from C to hardware is first investigated as a case study,
exposing a number of potential optimisation techniques that have not been exploited in
existing work. An existing outer loop pipelining approach, originally developed for VLIW
processors, is extended and adapted for application to FPGAs. The outer loop pipelining
methodology is first developed to use a fixed memory subsystem design and then extended
to automate the optimisation of the memory subsystem. This approach allocates arrays
to physical memories and selects the set of data reuse structures to implement to match
the available and required memory bandwidths as the pipelining search progresses. The
final extension to this work is to include the partitioning of data from a single array across
multiple physical memories, increasing the number of memory ports through which data
may be accessed. The facility for loop unrolling is also added to increase the potential for
parallelism and exploit the additional bandwidth that partitioning can provide.
We describe our approach based on formal methodologies and present the results
achieved when these methods are applied to a number of benchmarks. These results show
the advantages of both extending pipelining to levels above the innermost loop and the
co-optimisation of the datapath and memory subsystem.
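The idea of partitioning a single array across multiple physical memories can be sketched as follows. The abstract does not say which partitioning scheme is used, so this example assumes a simple cyclic (interleaved) scheme; all function names are hypothetical.

```python
def cyclic_partition(data, num_banks):
    """Split one array into num_banks banks: element i goes to bank i % num_banks."""
    return [data[b::num_banks] for b in range(num_banks)]


def locate(index, num_banks):
    """Return (bank, offset) of an original array index under cyclic partitioning."""
    return index % num_banks, index // num_banks


def conflict_free(indices, num_banks):
    """True if every index falls in a different bank, so that all accesses
    could be served in the same cycle with one port per bank."""
    banks = [locate(i, num_banks)[0] for i in indices]
    return len(set(banks)) == len(banks)


array = list(range(8))
banks = cyclic_partition(array, num_banks=4)

# A loop unrolled by 4 reads four consecutive elements per iteration.
unrolled_ok = conflict_free([4, 5, 6, 7], num_banks=4)

# A stride equal to the bank count maps both accesses to the same bank.
strided_ok = conflict_free([0, 4], num_banks=4)
```

Under these assumptions, consecutive accesses from an unrolled loop land in distinct banks, while a stride equal to the number of banks causes a conflict, which is why the partitioning choice has to match the access pattern.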