Search CORE

64 research outputs found

Dynamically Reconfigurable Systolic Array Accelerators: A Case Study with Extended Kalman Filter and Discrete Wavelet Transform Algorithms

Author: Barnes Robert C
Publication venue: DigitalCommons@USU
Publication date: 01/05/2009

Field programmable gate arrays (FPGAs) are increasingly being adopted as the primary on-board computing system for autonomous deep space vehicles. There is a need to support several complex applications for navigation and image processing in a rapidly responsive on-board FPGA-based computer. This requires exploring and combining several design concepts such as systolic arrays, hardware-software partitioning, and partial dynamic reconfiguration. A microprocessor/co-processor design that can accelerate two single-precision floating-point algorithms, an extended Kalman filter and a discrete wavelet transform, is presented. This research makes three key contributions. (i) A polymorphic systolic array framework (PolySAF) comprising reconfigurable partial region-based sockets to accelerate algorithms amenable to being mapped onto linear systolic arrays. When implemented on a low-end Xilinx Virtex4 SX35 FPGA, the design provides a speedup of at least 4.18x and 6.61x over a state-of-the-art microprocessor used in spacecraft systems for the extended Kalman filter and discrete wavelet transform algorithms, respectively. (ii) Switchboxes to enable communication between static and partial reconfigurable regions, and a simple protocol to enable schedule changes when a socket's contents are dynamically reconfigured to alter the concurrency of the participating systolic arrays. (iii) A hybrid partial dynamic reconfiguration method that combines Xilinx early access partial reconfiguration, on-chip bitstream decompression, and bitstream relocation to enable fast scaling of systolic arrays on the PolySAF. This technique provided a 2.7x improvement in reconfiguration time compared to an off-chip partial reconfiguration technique that used a Flash card on the FPGA board, and a 44% improvement in BRAM usage compared to not using compression.

DigitalCommons@USU

Accelerated Frame Data Relocation on Xilinx Field Programmable Gate Array

Author: Kallam Ramachandra
Publication venue: DigitalCommons@USU
Publication date: 01/05/2010

Emerging reconfiguration techniques that include partial dynamic reconfiguration and partial bitstream relocation have been addressed in the past in order to expose the flexibility of field programmable gate arrays (FPGAs) at runtime. Partial bitstream relocation is a technique used to target a partial bitstream of a partial reconfigurable region (PRR) onto other identical reconfigurable regions inside an FPGA, while partial dynamic reconfiguration is used to target a single reconfigurable region. Prior works in this domain aim to minimize relocation time with the help of on-chip or on-line processing. In this thesis, a novel PRR-PRR relocation algorithm is proposed and implemented in both software and hardware. A dedicated hardware architecture, called the accelerated relocation circuit (ARC), is designed and presented for fast relocation. An analytical model is also proposed to evaluate the performance of the PRR-PRR relocation algorithm and highlight the speed-up obtained by the proposed hardware implementation. ARC has been tested on two categories of designs: dynamically scalable systolic array designs and fault-tolerant designs. It has been compared against the software implementation of the algorithm, BiRF (a hardware architecture for bitstream relocation), and a software solution for bitstream relocation. An average speed-up of 153x for ARC over BiRF is observed, with the additional advantage of not storing any bitstreams, thus saving valuable block random access memories (BRAMs). The accuracy of the proposed analytical model was found to be more than 95% for all the test cases.

DigitalCommons@USU


Algorithm Based Fault Tolerance in Massively Parallel Systems

Author: Lerner Mark D.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/1988

A complex computer system consists of billions of transistors, miles of wires, and many interactions with an unpredictable environment. Correct results must be produced despite faults that dynamically occur in some of these components. Many techniques have been developed for fault-tolerant computation. General-purpose methods are independent of the application, yet incur an overhead cost which may be unacceptable for massively parallel systems. Algorithm-specific methods, which can operate at lower cost, are a developing alternative [1, 72]. This paper first reviews the general-purpose approach and then focuses on the algorithm-specific method, with an eye toward massively parallel processors. Algorithm-based fault tolerance has the attraction of low overhead; furthermore, it addresses both the detection and the correction problems. The principle is to build low-cost checking and correcting mechanisms based exclusively on the redundancies inherent in the system.

Columbia University Academic Commons
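
As an illustration of the algorithm-based fault tolerance idea summarized in the abstract above, the following minimal sketch uses the classic checksum scheme for matrix multiplication. It is not taken from the paper: the function names are hypothetical, integer matrices are assumed so that checksums compare exactly, and at most one faulty element is assumed.

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def with_column_checksum(a):
    """Append one extra row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]


def with_row_checksum(b):
    """Append one extra column holding the sum of each row of b."""
    return [row + [sum(row)] for row in b]


def locate_and_correct(c):
    """Check a full-checksum product c of size (n+1) x (m+1) in place.

    Returns None if all checksums agree, or the (row, col) of the single
    data element that was repaired. Raises ValueError otherwise.
    """
    n, m = len(c) - 1, len(c[0]) - 1
    bad_rows = [i for i in range(n) if sum(c[i][:m]) != c[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c[i][j] for i in range(n)) != c[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        others = sum(c[i][:m]) - c[i][j]   # sum of the healthy row entries
        c[i][j] = c[i][m] - others         # rebuild from the row checksum
        return (i, j)
    raise ValueError("fault pattern is not a single data-element error")


# Usage: multiply encoded operands, inject one fault, then repair it.
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = matmul(with_column_checksum(A), with_row_checksum(B))
C[1][0] = 99                               # simulated hardware fault
repaired_at = locate_and_correct(C)
```

The redundancy here is one extra row and one extra column, which is the sense in which the checking cost stays low compared with replicating the whole computation.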

Interconnect yield analysis and fault tolerance for field programmable gate arrays

Author: Campregher Nicola
Publication venue
Publication date: 01/01/2007

Imperial Users only

Spiral - Imperial College Digital Repository

Hexarray: A Novel Self-Reconfigurable Hardware System

Author: Hussein Fady
Publication venue: 'IUScholarWorks'
Publication date: 01/05/2017

Evolvable hardware (EHW) is a powerful autonomous system for adapting and finding solutions within a changing environment. EHW consists of two main components: a reconfigurable hardware core and an evolutionary algorithm. The majority of prior research focuses on improving either the reconfigurable hardware or the evolutionary algorithm in place, but not both. Thus, current implementations suffer from being application oriented and having slow reconfiguration times, low efficiencies, and less routing flexibility. In this work, a novel evolvable hardware platform is proposed that combines a novel reconfigurable hardware core and a novel evolutionary algorithm. The proposed reconfigurable hardware core is a systolic array, which is called HexArray. HexArray was constructed using processing elements with a redesigned architecture, called HexCells, which provide routing flexibility and support for hybrid reconfiguration schemes. The improved evolutionary algorithm is a genome-aware genetic algorithm (GAGA) that accelerates evolution. Guided by a fitness function the GAGA utilizes context-aware genetic operators to evolve solutions. The operators are genome-aware constrained (GAC) selection, genome-aware mutation (GAM), and genome-aware crossover (GAX). The GAC selection operator improves parallelism and reduces the redundant evaluations. The GAM operator restricts the mutation to the part of the genome that affects the selected output. The GAX operator cascades, interleaves, or parallel-recombines genomes at the cell level to generate better genomes. These operators improve evolution while not limiting the algorithm from exploring all areas of a solution space. The system was implemented on a SoC that includes a programmable logic (i.e., field-programmable gate array) to realize the HexArray and a processing system to execute the GAGA. 
A computationally intensive application that evolves adaptive filters for image processing was chosen as a case study and used to conduct a set of experiments demonstrating the robustness of the developed system. Through an iterative process using the genetic operators and a fitness function, the EHW system configures and adapts itself to evolve fitter solutions. In a relatively short time (e.g., seconds), HexArray is able to evolve autonomously to the desired filter. By exploiting the routing flexibility in the HexArray architecture, the EHW has a simple yet effective mechanism to detect and tolerate faulty cells, which improves system reliability. Finally, a mechanism that accelerates the evolution process by hiding the reconfiguration time in an “evolve-while-reconfigure” process is presented. In this process, the GAGA utilizes the array's routing flexibility to bypass cells that are being configured and evaluates several genomes in parallel.

Boise State University - ScholarWorks

Fault tolerance issues in nanoelectronics

Author: Spagocci S.
Publication venue: UCL (University College London)
Publication date: 01/11/2008

The astonishing success story of microelectronics cannot go on indefinitely. In fact, once devices reach the few-atom scale (nanoelectronics), transient quantum effects are expected to impair their behaviour. Fault-tolerant techniques will then be required. The aim of this thesis is to investigate the problem of transient errors in nanoelectronic devices. Transient error rates are estimated for a selection of nanoelectronic gates, based upon quantum cellular automata and single electron devices, in which the electrostatic interaction between electrons is used to create Boolean circuits. On the basis of these results, various fault-tolerant solutions are proposed, for both logic and memory nanochips. As for logic chips, traditional techniques are found to be unsuitable. A new technique, in which the voting approach of triple modular redundancy (TMR) is extended by cascading TMR units composed of nanogate clusters, is proposed and generalised to other voting approaches. For memory chips, an error correcting code approach is found to be suitable. Various codes are considered and a lookup table approach is proposed for encoding and decoding. We are then able to give estimations for the redundancy level to be provided on nanochips, so as to make their mean time between failures acceptable. It is found that, for logic chips, space redundancies up to a few tens are required, if mean times between failures have to be of the order of a few years. Space redundancy can also be traded for time redundancy. As for memory chips, mean times between failures of the order of a few years are found to imply both space and time redundancies of the order of ten.

UCL Discovery
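
To make the TMR voting and cascading idea in the abstract above concrete, here is a minimal sketch. It is an illustration only, not the thesis's model: it assumes independent replica errors and ideal (fault-free) voters, and all function names are hypothetical.

```python
def majority(a, b, c):
    """Bitwise majority vote of three replica outputs."""
    return (a & b) | (a & c) | (b & c)


def tmr_error_probability(p):
    """Probability that a majority of three replicas is wrong.

    Assumes each replica errs independently with probability p and that
    the voter itself never fails (a simplifying assumption).
    """
    return 3 * p**2 * (1 - p) + p**3


def cascaded_tmr_error_probability(p, levels):
    """Error probability after nesting TMR `levels` deep.

    Each level triplicates the units of the level below, so the space
    redundancy grows as 3**levels.
    """
    for _ in range(levels):
        p = tmr_error_probability(p)
    return p


# Usage: one masked fault, and the effect of two cascade levels.
voted = majority(0b1011, 0b1011, 0b0011)   # third replica has a flipped bit
single = tmr_error_probability(0.1)
double = cascaded_tmr_error_probability(0.1, 2)
```

Under these assumptions each cascade level reduces the error probability only while the per-unit probability stays below 0.5, and every level triples the hardware, which is one way to picture the trade-off between redundancy and mean time between failures.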

Dynamic partial reconfiguration management for high performance and reliability in FPGAs

Author: Ebrahim Ali
Publication venue: The University of Edinburgh
Publication date: 26/11/2015

Modern Field-Programmable Gate Arrays (FPGAs) are no longer used to implement small “glue logic” circuitries. The high density of reconfigurable logic resources in today’s FPGAs enables the implementation of large systems in a single chip. FPGAs are highly flexible devices; their functionality can be altered by simply loading a new binary file in their configuration memory. While the flexibility of FPGAs is comparable to General-Purpose Processors (GPPs), in the sense that different functions can be performed using the same hardware, the performance gain that can be achieved using FPGAs can be orders of magnitude higher as FPGAs offer the ability for customisation of parallel computational architectures. Dynamic Partial Reconfiguration (DPR) allows for changing the functionality of certain blocks on the chip while the rest of the FPGA is operational. DPR has sparked the interest of researchers to explore new computational platforms where computational tasks are off-loaded from a main CPU to be executed using dedicated reconfigurable hardware accelerators configured on demand at run-time. By having a battery of custom accelerators which can be swapped in and out of the FPGA at runtime, a higher computational density can be achieved compared to static systems where the accelerators are bound to fixed locations within the chip. Furthermore, the ability to relocate these accelerators across several locations on the chip allows for the implementation of adaptive systems which can mitigate emerging faults in the FPGA chip when operating in harsh environments. By porting the appropriate fault mitigation techniques to such computational platforms, the advantages of FPGAs can be harnessed in different applications in space and military electronics where FPGAs are usually seen as unreliable devices due to their sensitivity to radiation and extreme environmental conditions. 
In light of the above, this thesis investigates the deployment of DPR as: 1) a method for enhancing performance by efficient exploitation of the FPGA resources, and 2) a method for enhancing the reliability of systems intended to operate in harsh environments. Achieving optimal performance in such systems requires an efficient internal configuration management system to manage the reconfiguration and execution of the reconfigurable modules in the FPGA. In addition, the system needs to support “fault-resilience” features by integrating parameterisable fault detection and recovery capabilities to meet the reliability standard of fault-tolerant applications. This thesis addresses all the design and implementation aspects of an Internal Configuration Manager (ICM) which supports a novel bitstream relocation model to enable the placement of relocatable accelerators across several locations on the FPGA chip. In addition to supporting all the configuration capabilities required to implement a Reconfigurable Operating System (ROS), the proposed ICM also supports the novel multiple-clone configuration technique, which allows for cloning several instances of the same hardware accelerator at the same time, resulting in much shorter configuration time compared to traditional configuration techniques. A fault-tolerant (FT) version of the proposed ICM which supports a comprehensive fault-recovery scheme is also introduced in this thesis. The proposed FT-ICM is designed with a much smaller area footprint compared to Triple Modular Redundancy (TMR) hardening techniques while keeping a comparable level of fault-resilience. The capabilities of the proposed ICM system are demonstrated with two novel applications. The first application demonstrates a proof-of-concept reliable FPGA server solution used for executing encryption/decryption queries. The proposed server deploys bitstream relocation and modular redundancy to mitigate both permanent and transient faults in the device. 
It also deploys a novel Built-In Self-Test (BIST) diagnosis scheme, specifically designed to detect emerging permanent faults in the system at run-time. The second application is a data mining application where DPR is used to increase the computational density of a system used to implement the Frequent Itemset Mining (FIM) problem.

Edinburgh Research Archive

Fault-Tolerant Computing: An Overview

Author: Banerjee P.
Fuchs W.K.
Horst R.
Iyer R.K.
Patel J.H.
Publication venue: Center for Reliable and High-Performance Computing, Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
Publication date: 01/06/1991

Coordinated Science Laboratory was formerly known as Control Systems Laboratory. NASA / NAG-1-613; Semiconductor Research Corporation / 90-DP-109; Joint Services Electronics Program / N00014-90-J-127

Illinois Digital Environment for Access to Learning and Scholarship Repository

Self test and self repair strategies in VLSI architectures for high speed digital correlation

Author: Blackley William Sinclair
Publication venue: The University of Edinburgh
Publication date: 01/01/1985

Edinburgh Research Archive

Fault tolerance in space-based digital signal processing and switching systems: Protecting up-link processing resources, demultiplexer, demodulator, and decoder

Author: Redinbo Robert

Fault tolerance features in the first three major subsystems appearing in the next generation of communications satellites are described. These satellites will contain extensive but efficient high-speed processing and switching capabilities to support the low signal strengths associated with very small aperture terminals. The terminals' numerous data channels are combined through frequency division multiplexing (FDM) on the up-links and are protected individually by forward error-correcting (FEC) binary convolutional codes. The front-end processing resources, demultiplexer, demodulators, and FEC decoders extract all data channels which are then switched individually, multiplexed, and remodulated before retransmission to earth terminals through narrow beam spot antennas. Algorithm based fault tolerance (ABFT) techniques, which relate real number parity values with data flows and operations, are used to protect the data processing operations. The additional checking features utilize resources that can be substituted for normal processing elements when resource reconfiguration is required to replace a failed unit

NASA Technical Reports Server
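
The real-number parity idea mentioned in the abstract above can be sketched for a generic linear processing step y = A x. This is an assumed, simplified illustration rather than the report's design: the weights, the tolerance, and the function names are hypothetical, and the check only detects an error without locating it.

```python
def matvec(a, x):
    """Linear processing step y = A x on lists of floats."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def parity_row(a, weights):
    """Precompute w^T A, the parity relation tied to the operation A."""
    return [sum(w * a[i][j] for i, w in enumerate(weights))
            for j in range(len(a[0]))]


def parity_check(y, x, p_row, weights, tol=1e-9):
    """Return True when the output parity matches the input parity.

    The parity computed from the output, w^T y, should equal (w^T A) x.
    A relative tolerance absorbs floating-point round-off.
    """
    expected = sum(p * xj for p, xj in zip(p_row, x))
    observed = sum(w * yi for w, yi in zip(weights, y))
    return abs(expected - observed) <= tol * max(1.0, abs(expected))


# Usage: a fault-free result passes, a perturbed result is flagged.
A = [[0.5, -1.0], [2.0, 0.25]]
w = [1.0, 1.0]
x = [1.0, 2.0]
p = parity_row(A, w)
y = matvec(A, x)
ok_before = parity_check(y, x, p, w)
y[0] += 0.1                                # simulated processing fault
ok_after = parity_check(y, x, p, w)
```

Because the data are real numbers, the comparison uses a threshold rather than exact equality; choosing that threshold is a design decision that depends on the arithmetic precision of the processing chain.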