
    Hardware Architectural Support for Caching Partitioned Reconfigurations in Reconfigurable Systems

    The efficiency of the reconfiguration process in modern FPGAs can improve drastically if an on-chip configuration memory is included in the system, because it can reduce both the reconfiguration latency and its energy consumption. However, FPGA on-chip memory resources are very limited, so it is very important to manage them effectively in order to improve the reconfiguration process as much as possible, even when the on-chip configuration memory is small. This paper presents a hardware implementation of an on-chip configuration-memory controller that efficiently manages run-time reconfigurations. In order to optimize the use of the on-chip memory, this controller includes support for configurations that have been divided into blocks of customizable size. When a reconfiguration must be carried out, our controller provides the blocks stored on-chip and fetches the remaining blocks by accessing the off-chip configuration memory. Moreover, it dynamically decides which blocks must be stored on-chip. To this end, the controller implements a simple but effective technique that maximizes the benefits of the on-chip memories. Experimental results demonstrate that its implementation cost is very affordable and that it introduces negligible run-time management overheads.
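    A minimal software sketch of how such a block-level controller can behave, assuming a simple reuse-count replacement policy (the paper's exact block-selection technique is not reproduced here); the block keys, capacity, and fetch callback are illustrative:

        # Toy model of a block-partitioned configuration cache controller:
        # blocks found on-chip are served directly; missing blocks are
        # fetched off-chip and considered for on-chip storage.
        class ConfigCacheController:
            def __init__(self, capacity_blocks):
                self.capacity = capacity_blocks     # on-chip size, in blocks
                self.on_chip = {}                   # (config, block) -> reuse count

            def reconfigure(self, config, n_blocks, fetch_off_chip):
                for idx in range(n_blocks):
                    key = (config, idx)
                    if key in self.on_chip:
                        self.on_chip[key] += 1      # hit: fast on-chip path
                    else:
                        fetch_off_chip(config, idx) # miss: slow off-chip access
                        self._maybe_cache(key)

            def _maybe_cache(self, key):
                if len(self.on_chip) < self.capacity:
                    self.on_chip[key] = 1
                elif min(self.on_chip.values()) <= 1:
                    # Evict a least-reused block in favour of the new one.
                    victim = min(self.on_chip, key=self.on_chip.get)
                    del self.on_chip[victim]
                    self.on_chip[key] = 1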

    Hardware implementation of a reconfiguration cache-memory controller in VHDL

    This project presents a hardware implementation of a controller that efficiently manages the run-time reconfigurations performed in a system that applies configuration caching. This technique uses an on-chip memory as a cache between the configuration memory of the reconfigurable device and the main memory, which stores each and every reconfiguration that may be loaded onto the device. The efficiency of the technique can be improved by partitioning the configurations into blocks and mapping them onto several cache memories instead of a single one. Thus, given an assignment of task reconfigurations to different on-chip memories, the presented hardware controller manages the reconfiguration of the tasks appropriately and efficiently. The experimental results presented show that our controller performs the necessary operations in a few hundred clock cycles, while its implementation cost in terms of hardware resources is very affordable.
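    A brief sketch of the multi-cache organisation described above, assuming a round-robin task-to-cache assignment (in the project the assignment is a given input, not necessarily computed this way); all names are hypothetical:

        # Hypothetical helpers: map tasks to on-chip caches, then load a
        # task's configuration blocks through its assigned cache.
        def assign_tasks_to_caches(tasks, n_caches):
            return {t: i % n_caches for i, t in enumerate(tasks)}

        def reconfigure(task, blocks, mapping, caches, fetch_main_memory):
            cache = caches[mapping[task]]       # the cache this task maps to
            for block in blocks:
                if block not in cache:
                    fetch_main_memory(block)    # miss: read from main memory
                    cache.add(block)            # keep an on-chip copy

        caches = [set(), set()]
        mapping = assign_tasks_to_caches(["t0", "t1", "t2"], len(caches))
        reconfigure("t2", ["t2_b0", "t2_b1"], mapping, caches,
                    fetch_main_memory=lambda b: None)   # stand-in fetch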

    Towards adaptive balanced computing (ABC) using reconfigurable functional caches (RFCs)

    The general-purpose computing processor performs a wide range of functions. Although the performance of general-purpose processors has been steadily increasing, certain software technologies, such as multimedia and digital signal processing applications, demand ever more computing power. Reconfigurable computing has emerged to combine the versatility of general-purpose processors with the customization ability of ASICs. The basic premise of reconfigurability is to provide better performance and higher computing density than fixed-configuration processors. Most of the research in reconfigurable computing is dedicated to on-chip functional logic. If computing resources are adaptable to the computing requirement, the maximum performance can be achieved. To overcome the gap between processor and memory technology, the size of on-chip cache memory has been consistently increasing. Larger cache memory capacity, though beneficial in general, does not guarantee higher performance for all applications, as they may not utilize all of the cache efficiently. To utilize on-chip resources effectively, and to accelerate the performance of multimedia applications specifically, we propose a new architecture: Adaptive Balanced Computing (ABC). ABC uses dynamic resource configuration of on-chip cache memory by integrating Reconfigurable Functional Caches (RFCs). An RFC can work as a conventional cache or as a specialized computing unit when necessary. In order to convert a cache memory into a computing unit, we include additional logic to embed multi-bit output LUTs into the cache structure. We add the reconfigurability of cache memory to a conventional processor with minimal modification to the load/store microarchitecture and with minimal compiler assistance. The ABC architecture utilizes resources more efficiently by reconfiguring the cache memory into computing units dynamically. The area penalty for this reconfiguration is about 50-60% of the memory-cell cache array-only area, with faster cache access time. In a base array cache (parallel decoding caches), the area penalty is 10-20% of the data array, with a 1-2% increase in cache access time. However, we save 27% in area for FIR and 44% for DCT/IDCT with respect to the memory-cell array cache, and about 80% for both applications with respect to the base array cache, if we were to implement all these units separately (such as ASICs). Simulations with multimedia and DSP applications (DCT/IDCT and FIR/IIR) show that resource configuration with the RFC yields speedups ranging from 1.04X to 3.94X for overall applications and from 2.61X to 27.4X in the core computations. Simulations with various parameters indicate that the impact of reconfiguration can be minimized if an appropriate cache organization is selected.
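    The core mechanism, a cache array reused as a lookup table, can be illustrated with a toy model; the multiply-by-constant FIR tap and the sizes below are assumptions chosen for illustration, not the thesis's exact LUT mapping:

        # A cache bank that either serves loads or, after being filled with
        # precomputed LUT contents, evaluates a function with a single read.
        class RFC:
            def __init__(self):
                self.array = [0] * 256
                self.mode = "cache"

            def configure_fir_tap(self, coeff):
                # LUT entry x holds coeff * x, so a read is a multiply.
                self.array = [(coeff * x) & 0xFFFF for x in range(256)]
                self.mode = "compute"

            def compute(self, x):
                assert self.mode == "compute"
                return self.array[x & 0xFF]

        # Three RFC banks acting as a 3-tap FIR filter.
        taps = [RFC() for _ in range(3)]
        for rfc, c in zip(taps, (2, 5, 2)):
            rfc.configure_fir_tap(c)
        samples = [1, 2, 3, 4, 5]
        out = [sum(t.compute(samples[n - k]) for k, t in enumerate(taps))
               for n in range(2, len(samples))]   # [18, 27, 36]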

    Software instruction caching

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Includes bibliographical references (p. 185-193).

    As microprocessor complexities and costs skyrocket, designers are looking for ways to simplify their designs to reduce costs, improve energy efficiency, or squeeze more computational elements onto each chip. This is particularly true for the embedded domain, where cost and energy consumption are paramount. Software instruction caches have the potential to provide the required performance while using simpler, more efficient hardware. A software cache consists of a simple array memory (such as a scratchpad) and a software system that is capable of automatically managing that memory as a cache. Software caches have several advantages over traditional hardware caches. Without complex cache-management logic, the processor hardware is cheaper and easier to design, verify, and manufacture. The reduced access energy of simple memories can result in a net energy savings if management overhead is kept low. Software caches can also be customized to each individual program's needs, improving performance or eliminating unpredictable timing for real-time embedded applications. The greatest challenge for a software cache is providing good performance using general-purpose instructions for cache management rather than specially-designed hardware. This thesis designs and implements a working system (Flexicache) on an actual embedded processor and uses it to investigate the strengths and weaknesses of software instruction caches. Although both data and instruction caches can be implemented in software, very different techniques are used to optimize performance; this work focuses exclusively on software instruction caches. The Flexicache system consists of two software components: a static off-line preprocessor to add caching to an application and a dynamic runtime system to manage memory during execution. Key interfaces and optimizations are identified and characterized. The system is evaluated in detail from the standpoints of both performance and energy consumption. The results indicate that software instruction caches can perform comparably to hardware caches in embedded processors. On most benchmarks, the overhead relative to a hardware cache is less than 12% and can be as low as 2.4%. At the same time, the software cache uses up to 6% less energy. This is achieved using a simple, directly-addressed memory and without requiring any complex, specialized hardware structures.

    by Jason Eric Miller. Ph.D.
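    A compact sketch of the runtime half of such a system, assuming fixed-size blocks and direct-mapped placement in a scratchpad (illustrative choices; Flexicache's actual organisation and interfaces differ in detail):

        # Software miss handler for a directly-addressed scratchpad used as
        # an instruction cache: on a miss, copy the block in from main
        # memory, then return the resident copy.
        BLOCK = 64                          # bytes per block (assumed)
        N_SLOTS = 16                        # scratchpad capacity in blocks
        scratchpad = [None] * N_SLOTS       # resident block contents
        tags = [None] * N_SLOTS             # which block each slot holds

        def fetch_block(addr, main_memory):
            block_id = addr // BLOCK
            slot = block_id % N_SLOTS       # direct-mapped placement
            if tags[slot] != block_id:      # miss: run the software handler
                base = block_id * BLOCK
                scratchpad[slot] = main_memory[base:base + BLOCK]
                tags[slot] = block_id
            return scratchpad[slot]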

    Clock Synchronisation Assisted Clock and Data Recovery for Sub-Nanosecond Data Centre Optical Switching

    In current 'Cloud' data centres, switching of data between servers is performed using deep hierarchies of interconnected electronic packet switches. Demand for network bandwidth from emerging data centre workloads, combined with the slowing of silicon transistor scaling, is leading to a widening gap between data centre traffic demand and electronically-switched data centre network capacity. All-optical switches could offer a future-proof alternative, with potentially under a third of the power consumption and cost of electronically-switched networks. However, the effective bandwidth of optical switches depends on their overall switching time. This is dominated by the clock and data recovery (CDR) locking time, which takes hundreds of nanoseconds in commercial receivers. Current data centre traffic is dominated by small packets that transmit in tens of nanoseconds, leading to low effective bandwidth, as a high proportion of receiver time is spent performing CDR locking instead of receiving data, removing the benefits of optical switching. High-performance optical switching requires sub-nanosecond CDR locking time to overcome this limitation. This thesis proposes, models, and demonstrates clock synchronisation assisted CDR, which can achieve this. This approach uses clock synchronisation to reduce the complexity of CDR compared with previous asynchronous approaches. An analytical model of the technique is first derived that establishes its potential viability. Following this, two approaches to clock synchronisation assisted CDR are investigated:

    1. Clock phase caching, which uses clock phase storage and regular updates in a 2 km intra-building scale data centre network interconnected by single-mode optical fibre.

    2. Single calibration clock synchronisation assisted CDR, which leverages the 20 times lower thermal sensitivity of hollow core optical fibre versus single-mode fibre to synchronise a 100 m cluster-scale data centre network, with a single initial phase calibration step.

    Using a real-time FPGA-based optical switch testbed, sub-nanosecond CDR locking time was demonstrated for both approaches.
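    At switching time, clock phase caching reduces CDR to a table lookup rather than a feedback locking loop; a schematic sketch, with all names and the calibration callback as assumptions:

        # With all nodes frequency-synchronised, only a per-source phase
        # offset is needed. Background calibration keeps a cached phase per
        # source; on a switch event the receiver applies the cached value
        # immediately instead of running a CDR locking loop.
        phase_cache = {}                    # source id -> sampling phase

        def calibrate(source, measured_phase):
            phase_cache[source] = measured_phase      # regular updates

        def on_switch(source, set_sampling_phase):
            set_sampling_phase(phase_cache[source])   # lookup, not locking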

    Accelerated Profile HMM Searches

    Profile hidden Markov models (profile HMMs) and probabilistic inference methods have made important contributions to the theory of sequence database homology search. However, practical use of profile HMM methods has been hindered by the computational expense of existing software implementations. Here I describe an acceleration heuristic for profile HMMs, the “multiple segment Viterbi” (MSV) algorithm. The MSV algorithm computes an optimal sum of multiple ungapped local alignment segments using a striped vector-parallel approach previously described for fast Smith/Waterman alignment. MSV scores follow the same statistical distribution as gapped optimal local alignment scores, allowing rapid evaluation of the significance of an MSV score and thus facilitating its use as a heuristic filter. I also describe a 20-fold acceleration of the standard profile HMM Forward/Backward algorithms using a method I call “sparse rescaling”. These methods are assembled in a pipeline in which high-scoring MSV hits are passed on for reanalysis with the full HMM Forward/Backward algorithm. This accelerated pipeline is implemented in the freely available HMMER3 software package. Performance benchmarks show that the use of the heuristic MSV filter sacrifices negligible sensitivity compared to unaccelerated profile HMM searches. HMMER3 is substantially more sensitive and 100- to 1000-fold faster than HMMER2. HMMER3 is now about as fast as BLAST for protein searches.
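    The shape of the accelerated pipeline can be sketched as a cheap ungapped filter gating an expensive full pass; the single-segment score below is a simplification of the real multiple-segment MSV computation, and the scoring scheme and threshold are arbitrary assumptions:

        # Best single ungapped local alignment (no gaps), used as a filter.
        def ungapped_score(query, target, score):
            best, prev = 0, [0] * (len(target) + 1)
            for qc in query:
                cur = [0]
                for j, tc in enumerate(target, 1):
                    cur.append(max(0, prev[j - 1] + score(qc, tc)))
                best = max(best, max(cur))
                prev = cur
            return best

        def search(query, database, forward_score, threshold=20):
            match = lambda a, b: 5 if a == b else -4   # toy scoring
            for seq in database:
                if ungapped_score(query, seq, match) >= threshold:
                    # Only filter survivors get the full Forward pass.
                    yield seq, forward_score(query, seq)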