26 research outputs found

Significant papers from the First 25 Years of the FPL Conference

Author: Amano HC
Anderson J
Bertels K
Cardoso JMP
Diessel O
Gogniat G
Hutton M
Lee JK
Leong PHW
Luk W
Lysaght P
Platzner M
Prasanna VK
Rissa T
Silvano C
So HKH
Wang Y
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015

The list of significant papers from the first 25 years of the Field-Programmable Logic and Applications conference (FPL) is presented in this paper. These 27 papers represent those which have most strongly influenced theory and practice in the field.

HKU Scholars Hub

A fuzzy logic based dynamic reconfiguration scheme for optimal energy and throughput in symmetric chip multiprocessors

Author: McDonald-Maier Klaus
Qadri Muhammad Yasir
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/06/2010

Embedded systems architectures have traditionally been investigated and designed to achieve greater throughput combined with minimum energy consumption. With the advent of reconfigurable architectures, it is now possible to support algorithms that find optimal solutions for an improved energy and throughput balance. As a result of ongoing research, several online and offline techniques and algorithms have been proposed for hardware adaptation. This paper presents a novel coarse-grained reconfigurable symmetric chip multiprocessor (SCMP) architecture managed by a fuzzy logic engine that balances performance and energy consumption. The architecture incorporates reconfigurable level 1 (L1) caches, power-gated cores and adaptive on-chip network routers to minimize leakage energy effects for inactive components. A coarse-grained architecture was selected as the focus of this study because it typically allows faster reconfiguration than fine-grained architectures, making it more feasible for runtime adaptation schemes. The presented architecture is analyzed using a set of OpenMP-based parallel benchmarks, and the results show significant improvements in performance while maintaining minimum energy consumption.

University of Essex Research Repository

Crossref

Fault-tolerant sub-lithographic design with rollback recovery

Author: DeHon André
Naeimi Helia
Publication venue: 'AIP Publishing'
Publication date: 19/03/2008

Shrinking feature sizes and energy levels coupled with high clock rates and decreasing node capacitance lead us into a regime where transient errors in logic cannot be ignored. Consequently, several recent studies have focused on feed-forward spatial redundancy techniques to combat these high transient fault rates. To complement these studies, we analyze fine-grained rollback techniques and show that they can offer lower spatial redundancy factors with no significant impact on system performance for fault rates up to one fault per device per ten million cycles of operation (Pf = 10^-7) in systems with 10^12 susceptible devices. Further, we concretely demonstrate these claims on nanowire-based programmable logic arrays. Despite expensive rollback buffers and general-purpose, conservative analysis, we show the area overhead factor of our technique is roughly an order of magnitude lower than a gate-level feed-forward redundancy scheme.
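
To give a sense of the scale in this abstract, the short sketch below multiplies the quoted per-device fault rate by the quoted device count. It is only back-of-the-envelope arithmetic on the two figures stated above; the function and constant names are illustrative and do not come from the paper.

```python
from fractions import Fraction


def expected_faults_per_cycle(p_fault: Fraction, devices: int) -> Fraction:
    """Expected number of device faults in one cycle (linearity of expectation)."""
    return p_fault * devices


# Figures quoted in the abstract: Pf = 10^-7 per device per cycle, 10^12 devices.
P_F = Fraction(1, 10**7)
DEVICES = 10**12

EXPECTED = expected_faults_per_cycle(P_F, DEVICES)
print(EXPECTED)  # 100000
```

At these figures the expected number of faults somewhere in the system is on the order of 10^5 per cycle, which is consistent with the abstract's point that transient errors cannot be ignored.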

Caltech Authors

ScholarlyCommons@Penn

Operating System Interfaces: Bridging the Gap Between CPU and FPGA Accelerators

Author: Burke Dan
Gelado Isaac
Hwang Kuangwei
Hwu Wen-Mei
Kelm John
Lumetta Steve
Navarro Nacho
Ueng Sain-Zee
Publication venue: Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
Publication date: 01/10/2006

Coordinated Science Laboratory was formerly known as Control Systems Laboratory. Gigascale Systems Research Center / C8559_SA4241-79952_UC-Berkele

Illinois Digital Environment for Access to Learning and Scholarship Repository

A study of on-chip FPGA system with 2D mesh network

Author: Keung Ka-ming
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2010

Advances in fabrication technology have hugely increased the number of transistors available on a single chip. This allows the industry to build on a single chip an entire system that in the past was only realizable on a board. An on-chip system not only reduces the physical size of the computer but also increases computation performance, because modules/cores/intellectual properties (IPs) are packed closely together. As wire delay makes it harder to improve performance simply by raising the clock frequency, putting more computation units on a single chip becomes a good alternative. More cores per chip are expected in the future. With many IPs on a chip, a traditional bus can no longer provide enough bandwidth to support the communication between IPs. Providing a high-performance on-chip network infrastructure for IP communication becomes key to high-performance on-chip computation. This thesis focuses on an on-chip network supporting an on-chip system, and is composed of two main parts. In the first part, a high-performance, deadlock-free, dual-coded on-chip router using adaptive multicast routing is built. Compared with the traditional deterministic XY unicast router, this router can reduce both packet latency and energy consumption. In the second part, a co-processor placement algorithm for an on-chip system built from FPGAs with an on-chip network is proposed. The algorithm aims to place communicating modules as close together as possible. In addition, an algorithm for sharing an FPGA among multiple co-processors and an algorithm for supporting polymorphic co-processors are proposed to increase on-chip FPGA system throughput.
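
For readers unfamiliar with the baseline this abstract compares against, the sketch below shows textbook deterministic XY (dimension-order) unicast routing on a 2D mesh: a packet first travels along the X dimension until its column matches the destination, then along Y. It is not the adaptive multicast router proposed in the thesis; the function name and the coordinate convention are assumptions made for illustration.

```python
from typing import List, Tuple

Node = Tuple[int, int]  # (x, y) coordinates of a router in the mesh


def xy_route(src: Node, dst: Node) -> List[Node]:
    """Return the hop-by-hop path from src to dst using XY routing."""
    x, y = src
    dst_x, dst_y = dst
    path = [(x, y)]
    # Phase 1: move along X until the column matches the destination.
    while x != dst_x:
        x += 1 if dst_x > x else -1
        path.append((x, y))
    # Phase 2: move along Y until the row matches the destination.
    while y != dst_y:
        y += 1 if dst_y > y else -1
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because every packet resolves X before Y, the path is fully determined by source and destination, which is why this scheme is described as deterministic.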

Digital Repository @ Iowa State University (ISU)

Design and Implementation of Hardware Accelerators for Neural Processing Applications

Author: Mayannavar Shilpa
Wali Uday
Publication date: 24/01/2024

The primary motivation for this work was the need to implement hardware accelerators for a newly proposed ANN structure called Auto Resonance Network (ARN) for robotic motion planning. ARN is an approximating, feed-forward, hierarchical and explainable network. It can be used in various AI applications, but its application base was small. Therefore, the objective of the research was twofold: to develop a new application using ARN and to implement a hardware accelerator for ARN. Following the suggestions of the Doctoral Committee, an image recognition system using ARN has been implemented. An accuracy of around 94% was achieved with only 2 layers of ARN. The network also required only a small training data set of about 500 images. The publicly available MNIST dataset was used for this experiment. All the coding was done in Python. The massive parallelism seen in ANNs presents several challenges to CPU design. For a given functionality, e.g., multiplication, several copies of serial modules can be realized within the same area as a parallel module. The advantage of using serial modules rather than parallel modules under area constraints is discussed. One module often useful in ANNs is multi-operand addition. One problem in its implementation is estimating the number of carry bits when the number of operands changes. A theorem to calculate the exact number of carry bits required for a multi-operand addition is presented in the thesis, which alleviates this problem. The main advantage of the modular approach to multi-operand addition is the possibility of pipelined addition with low reconfiguration overhead. This results in an overall increase in throughput for large numbers of additions, typically seen in several DNN configurations.
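
The abstract does not state the thesis's theorem, so the sketch below only illustrates the underlying question with the standard worst-case bound for unsigned operands: the largest possible sum of N operands of n bits each is N * (2^n - 1), and the extra bits beyond n are the carry bits the adder must provide. The function names are hypothetical and the result should not be read as the author's formulation.

```python
def sum_width(num_operands: int, operand_bits: int) -> int:
    """Bits needed to hold the worst-case sum of unsigned operands."""
    max_operand = (1 << operand_bits) - 1
    max_sum = num_operands * max_operand
    return max(max_sum.bit_length(), 1)


def carry_bits(num_operands: int, operand_bits: int) -> int:
    """Extra bits required beyond the operand width."""
    return sum_width(num_operands, operand_bits) - operand_bits


print(carry_bits(2, 8))   # 1
print(carry_bits(4, 8))   # 2
print(carry_bits(16, 1))  # 4
```

The examples show how the required carry width grows as operands are added, which is the quantity a reconfigurable multi-operand adder would need to track.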

arXiv.org e-Print Archive