Search CORE

8 research outputs found

Author retrospective for the dual data cache

Author: Aliagas Castell Carles
González Colás Antonio María
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014
Field of study

In this paper we present a retrospective on our paper published in ICS 1995, which to best of our knowledge was the first paper that introduced the concept of a cache memory with multiple subcaches, each tuned for a different type of locality. In this retrospective, we summarize the main ideas of the original paper and outline some of the later work that exploited similar ideas and could have been influenced by our original paper, including two actual industrial microprocessors.Peer ReviewedPostprint (author’s final draft

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems

Author: Mittal Sparsh
Publication venue
Publication date: 01/01/2014
Field of study

Recent technological advances have greatly improved the performance and features of embedded systems. With the number of just mobile devices now reaching nearly equal to the population of earth, embedded systems have truly become ubiquitous. These trends, however, have also made the task of managing their power consumption extremely challenging. In recent years, several techniques have been proposed to address this issue. In this paper, we survey the techniques for managing power consumption of embedded systems. We discuss the need of power management and provide a classification of the techniques on several important parameters to highlight their similarities and differences. This paper is intended to help the researchers and application-developers in gaining insights into the working of power management techniques and designing even more efficient high-performance embedded systems of tomorrow

arXiv.org e-Print Archive

Crossref

Low-power data memory communication for application-specific embedded processors

Author: Alex Orailoglu
Peter Petrov
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2004
Field of study

Crossref

Evaluación de Sistemas de Cache WEB particionadas en función del tamaño de los objetos

Author: León Medina Diego Alejandro
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 30/12/2011
Field of study

Los objetos WEB son cacheados en los proxies (caches WEB) reemplazando objetos grandes por muchos objetos pequeños, esto hace que las tasas de aciertos se vean afectadas en función del tamaño de los objetos que cachean, en este documento se evalúa experimentalmente como una jerarquía de proxies puede mejorar las tasas de aciertos.León Medina, DA. (2008). Evaluación de Sistemas de Cache WEB particionadas en función del tamaño de los objetos. http://hdl.handle.net/10251/14197Archivo delegad

RiuNet

Performance and power effectiveness in embedded processors - customizable partitioned caches

Author: A. Orailoglu
P. Petrov
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref

Software-assisted cache mechanisms for embedded systems

Author: Jain Prabhat
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2008
Field of study

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Includes bibliographical references (leaves 120-135).Embedded systems are increasingly using on-chip caches as part of their on-chip memory system. This thesis presents cache mechanisms to improve cache performance and provide opportunities to improve data availability that can lead to more predictable cache performance. The first cache mechanism presented is an intelligent cache replacement policy that utilizes information about dead data and data that is very frequently used. This mechanism is analyzed theoretically to show that the number of misses using intelligent cache replacement is guaranteed to be no more than the number of misses using traditional LRU replacement. Hardware and software-assisted mechanisms to implement intelligent cache replacement are presented and evaluated. The second cache mechanism presented is that of cache partitioning which exploits disjoint access sequences that do not overlap in the memory space. A theoretical result is proven that shows that modifying an access sequence into a concatenation of disjoint access sequences is guaranteed to improve the cache hit rate. Partitioning mechanisms inspired by the concept of disjoint sequences are designed and evaluated. A profit-based analysis, annotation, and simulation framework has been implemented to evaluate the cache mechanisms. This framework takes a compiled benchmark program and a set of program inputs and evaluates various cache mechanisms to provide a range of possible performance improvement scenarios. The proposed cache mechanisms have been evaluated using this framework by measuring cache miss rates and Instructions Per Clock (IPC) information. The results show that the proposed cache mechanisms show promise in improving cache performance and predictability with a modest increase in silicon area.by Prabhat Jain.Ph.D

DSpace@MIT

Datapath and memory co-optimization for FPGA-based computation

Author: Turkington Kieron
Turkington Kieron
Publication venue: Electrical and Electronic Engineering, Imperial College London
Publication date: 01/01/2010
Field of study

With the large resource densities available on modern FPGAs it is often the available memory bandwidth that limits the parallelism (and therefore performance) that can be achieved. For this reason the focus of this thesis is the development of an integrated scheduling and memory optimisation methodology to allow high levels of parallelism to be exploited in FPGA based designs. A manual translation from C to hardware is first investigated as a case study, exposing a number of potential optimisation techniques that have not been exploited in existing work. An existing outer loop pipelining approach, originally developed for VLIW processors, is extended and adapted for application to FPGAs. The outer loop pipelining methodology is first developed to use a fixed memory subsystem design and then extended to automate the optimisation of the memory subsystem. This approach allocates arrays to physical memories and selects the set of data reuse structures to implement to match the available and required memory bandwidths as the pipelining search progresses. The final extension to this work is to include the partitioning of data from a single array across multiple physical memories, increasing the number of memory ports through which data my be accessed. The facility for loop unrolling is also added to increase the potential for parallelism and exploit the additional bandwidth that partitioning can provide. We describe our approach based on formal methodologies and present the results achieved when these methods are applied to a number of benchmarks. These results show the advantages of both extending pipelining to levels above the innermost loop and the co-optimisation of the datapath and memory subsystem

Spiral - Imperial College Digital Repository