26 research outputs found

    Significant papers from the First 25 Years of the FPL Conference

    The list of significant papers from the first 25 years of the Field-Programmable Logic and Applications conference (FPL) is presented in this paper. These 27 papers represent those which have most strongly influenced theory and practice in the field.

    A fuzzy logic based dynamic reconfiguration scheme for optimal energy and throughput in symmetric chip multiprocessors

    Embedded systems architectures have traditionally been investigated and designed to achieve greater throughput combined with minimum energy consumption. With the advent of reconfigurable architectures, it is now possible to support algorithms that find optimal solutions for an improved balance of energy and throughput. As a result of ongoing research, several online and offline techniques and algorithms have been proposed for hardware adaptation. This paper presents a novel coarse-grained reconfigurable symmetric chip multiprocessor (SCMP) architecture managed by a fuzzy logic engine that balances performance and energy consumption. The architecture incorporates reconfigurable level 1 (L1) caches, power-gated cores and adaptive on-chip network routers to minimize leakage energy in inactive components. A coarse-grained architecture was selected as the focus of this study because it typically allows faster reconfiguration than fine-grained architectures, making it more feasible for runtime adaptation schemes. The presented architecture is analyzed using a set of OpenMP-based parallel benchmarks, and the results show significant improvements in performance while maintaining minimum energy consumption.

    Fault-tolerant sub-lithographic design with rollback recovery

    Shrinking feature sizes and energy levels coupled with high clock rates and decreasing node capacitance lead us into a regime where transient errors in logic cannot be ignored. Consequently, several recent studies have focused on feed-forward spatial redundancy techniques to combat these high transient fault rates. To complement these studies, we analyze fine-grained rollback techniques and show that they can offer lower spatial redundancy factors with no significant impact on system performance for fault rates up to one fault per device per ten million cycles of operation (Pf = 10^-7) in systems with 10^12 susceptible devices. Further, we concretely demonstrate these claims on nanowire-based programmable logic arrays. Despite expensive rollback buffers and general-purpose, conservative analysis, we show that the area overhead factor of our technique is roughly an order of magnitude lower than that of a gate-level feed-forward redundancy scheme.
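To give a sense of scale for the fault rate quoted in this abstract, the following minimal sketch multiplies the per-device, per-cycle fault probability by the device count. The function name is hypothetical and the calculation is plain arithmetic on the two figures in the abstract, not the paper's analysis.

```python
from fractions import Fraction


def expected_faults_per_cycle(devices: int, fault_rate: Fraction) -> Fraction:
    """Expected number of transient faults system-wide in one cycle.

    Assumes every device fails independently with the same per-cycle
    probability (an illustrative assumption, not taken from the paper).
    """
    return devices * fault_rate


# Figures quoted in the abstract: Pf = 10^-7 and 10^12 susceptible devices.
P_F = Fraction(1, 10**7)
DEVICES = 10**12

print(expected_faults_per_cycle(DEVICES, P_F))  # prints 100000
```

At these figures, roughly 10^5 devices are expected to fault in every cycle, which is why the abstract treats transient errors as unavoidable at this scale.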

    Operating System Interfaces: Bridging the Gap Between CPU and FPGA Accelerators

    Coordinated Science Laboratory was formerly known as Control Systems Laboratory. Gigascale Systems Research Center / C8559_SA4241-79952_UC-Berkele

    A study of on-chip FPGA system with 2D mesh network

    Advances in fabrication technology have hugely increased the number of transistors available on a single chip. This allows the industry to build on a single chip an entire system that was previously realizable only on a board. An on-chip system not only reduces the physical size of the computer but also increases computation performance, because modules, cores and intellectual properties (IPs) are packed closely together. As wire delay makes it harder to improve performance simply by raising the clock frequency, putting more computation units on a single chip becomes a good alternative, and more cores per chip are expected in the future. With many IPs on a chip, a traditional bus can no longer provide enough bandwidth to support communication between IPs. Providing a high-performance on-chip network infrastructure for IP communication becomes key to high-performance on-chip computation. This thesis focuses on an on-chip network supporting an on-chip system and is composed of two main parts. In the first part, a high-performance, deadlock-free, dual-coded on-chip router using adaptive multicast routing is built. Compared with the traditional deterministic XY unicast router, this router can reduce both packet latency and energy consumption. In the second part, a co-processor placement algorithm for an on-chip system built from FPGAs with an on-chip network is proposed. The algorithm aims to place communicating modules as close together as possible. In addition, an algorithm for sharing an FPGA among multiple co-processors and an algorithm for supporting polymorphic co-processors are proposed to increase on-chip FPGA system throughput.
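The deterministic XY unicast routing that this thesis uses as its baseline can be sketched as follows. This is a minimal illustration of the textbook scheme on a 2D mesh, assuming integer (x, y) router coordinates; the function name and types are hypothetical, and it does not model the thesis's adaptive multicast router.

```python
from typing import List, Tuple

Coord = Tuple[int, int]


def xy_route(src: Coord, dst: Coord) -> List[Coord]:
    """Return the routers visited by dimension-ordered XY routing.

    The packet first travels along the X dimension until its column
    matches the destination, then along the Y dimension.
    """
    x, y = src
    path = [(x, y)]

    # Phase 1: move along X until the destination column is reached.
    step = 1 if dst[0] > x else -1
    while x != dst[0]:
        x += step
        path.append((x, y))

    # Phase 2: move along Y until the destination row is reached.
    step = 1 if dst[1] > y else -1
    while y != dst[1]:
        y += step
        path.append((x, y))

    return path


print(xy_route((0, 0), (2, 1)))  # prints [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because every packet finishes all of its X hops before making any Y hop, the route between two routers is fixed in advance, which is what makes the scheme deterministic.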

    Design and Implementation of Hardware Accelerators for Neural Processing Applications

    The primary motivation for this work was the need to implement hardware accelerators for a newly proposed ANN structure called Auto Resonance Network (ARN) for robotic motion planning. ARN is an approximating, feed-forward, hierarchical and explainable network. It can be used in various AI applications, but its application base was small. Therefore, the objective of the research was twofold: to develop a new application using ARN and to implement a hardware accelerator for ARN. Following the suggestions of the Doctoral Committee, an image recognition system using ARN was implemented. An accuracy of around 94% was achieved with only 2 layers of ARN. The network also required only a small training data set of about 500 images. The publicly available MNIST dataset was used for this experiment. All the coding was done in Python. The massive parallelism seen in ANNs presents several challenges to CPU design. For a given functionality, e.g., multiplication, several copies of serial modules can be realized within the same area as a parallel module. The advantage of using serial modules rather than parallel modules under area constraints is discussed. One module often useful in ANNs is multi-operand addition. One problem in its implementation is estimating the number of carry bits when the number of operands changes. The thesis presents a theorem for calculating the exact number of carry bits required for a multi-operand addition, which alleviates this problem. The main advantage of the modular approach to multi-operand addition is the possibility of pipelined addition with low reconfiguration overhead. This results in an overall increase in throughput for the large numbers of additions typically seen in several DNN configurations.
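The carry-bit question raised in this abstract can be illustrated with a standard bound. This is a minimal sketch, assuming unsigned operands of equal width; it uses the largest possible sum to count the extra bits needed and is not the theorem from the thesis, whose statement is not given here. The function name is hypothetical.

```python
def carry_bits(num_operands: int, width: int) -> int:
    """Extra bits beyond `width` needed to hold the sum of
    `num_operands` unsigned integers that are each `width` bits wide.

    Uses the worst case: every operand equal to 2**width - 1.
    """
    if num_operands < 1 or width < 1:
        raise ValueError("num_operands and width must be positive")
    max_sum = num_operands * ((1 << width) - 1)
    return max(max_sum.bit_length() - width, 0)


# Adding 4 operands of 8 bits each: the maximum sum is 1020, which needs 10 bits.
print(carry_bits(4, 8))  # prints 2
```

The result depends on the number of operands, so an adder sized for one operand count can be too narrow for a larger one. This is the situation the abstract describes.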