90 research outputs found

    Design and Implementation of Fault Tolerant Adders on Field Programmable Gate Arrays

    Get PDF
    Fault tolerance on various adder architectures implemented on Field Programmable Gate Arrays (FPGAs) is studied in this thesis. This involves developing error detection and correction techniques for the sparse Kogge-Stone adder and comparing it with Triple Modular Redundancy (TMR) techniques. Fault tolerance is implemented on a Kogge-Stone adder by taking advantage of the inherent redundancy in the carry tree. On a sparse Kogge-Stone adder, fault tolerance is realized by introducing additional ripple carry adders into the design. The implementation of this fault tolerance approach on the sparse Kogge-Stone adder is successfully completed and verified by introducing faults either on the ripple carry adder or in the carry tree. Two types of Xilinx FPGAs were used in this study: the Spartan 3E and Virtex 5. The fault tolerant adders were analyzed in terms of their delay and resource utilization as a function of the widths of the adders. The results of this research provide important design guidelines for the implementation of fault tolerant adders on FPGAs. The Triple Modular Redundancy-Ripple Carry Adder (TMR-RCA) is the most efficient approach for fault tolerant design on an FPGA in terms of its resources due to its simplicity and the ability to take advantage of the fast-carry chain. However, for very large bit widths, there are indications that the sparse Kogge-Stone adder offers superior performance over an RCA when implemented on an FPGA. Two fault tolerant approaches were implemented using a sparse Kogge-Stone architecture. First, a fault tolerant sparse Kogge-Stone adder is designed by taking advantage of the existing ripple carry adders in the architecture and adopting a similar approach to the TMR-RCA by inserting two additional ripple carry adders into the design. Second, a graceful degradation approach is implemented with the sparse Kogge-Stone adder. In this approach, a faulty block is permanently replaced with a spare block. As the spare block is initially used for fault checking, the fault tolerant capability of the circuit is degraded in order to continue fault-free operation. The adder delay is smaller for the graceful degradation approach by approximately 1 ns from measured results and 2 ns from the synthesis results independent of the bit widths when compared with the fault tolerant Kogge-Stone adder. However, the resource utilization is similar for both adders

    FPGA ARCHITECTURE AND VERIFICATION OF BUILT IN SELF-TEST (BIST) FOR 32-BIT ADDER/SUBTRACTER USING DE0-NANO FPGA AND ANALOG DISCOVERY 2 HARDWARE

    Get PDF
    The integrated circuit (IC) is an integral part of everyday modern technology, and its application is very attractive to hardware and software design engineers because of its versatility, integration, power consumption, cost, and board area reduction. IC is available in various types such as Field Programming Gate Array (FPGA), Application Specific Integrated Circuit (ASIC), System on Chip (SoC) architecture, Digital Signal Processing (DSP), microcontrollers (μC), and many more. With technology demand focused on faster, low power consumption, efficient IC application, design engineers are facing tremendous challenges in developing and testing integrated circuits that guaranty functionality, high fault coverage, and reliability as the transistor technology is shrinking to the point where manufacturing defects of ICs are affecting yield which associates with the increased cost of the part. The competitive IC market is pressuring manufactures of ICs to develop and market IC in a relatively quick turnaround which in return requires design and verification engineers to develop an integrated self-test structure that would ensure fault-free and the quality product is delivered on the market. 70-80% of IC design is spent on verification and testing to ensure high quality and reliability for the enduser. To test complex and sophisticated IC designs, the verification engineers must produce laborious and costly test fixtures which affect the cost of the part on the competitive market. To avoid increasing the part cost due to yield and test time to the end-user and to keep up with the competitive market many IC design engineers are deviating from complex external test fixture approach and are focusing on integrating Built-in Self-Test (BIST) or Design for Test (DFT) techniques onto IC’s which would reduce time to market but still guarantee high coverage for the product. Understanding the BIST, the architecture, as well as the application of IC, must be understood before developing IC. The architecture of FPGA is elaborated in this paper followed by several BIST techniques and applications of those BIST relative to FPGA, SoC, analog to digital (ADC), or digital to analog converters (DAC) that are integrated on IC. Paper is concluded with verification of BIST for the 32-bit adder/subtracter designed in Quartus II software using the Analog Discovery 2 module as stimulus and DE0-NANO FPGA board for verification

    Techniques for Efficient Implementation of FIR and Particle Filtering

    Full text link

    Comparison of logarithmic and floating-point number systems implemented on Xilinx Virtex-II field-programmable gate arrays

    Get PDF
    The aim of this thesis is to compare the implementation of parameterisable LNS (logarithmic number system) and floating-point high dynamic range number systems on FPGA. The Virtex/Virtex-II range of FPGAs from Xilinx, which are the most popular FPGA technology, are used to implement the designs. The study focuses on using the low level primitives of the technology in an efficient way and so initially the design issues in implementing fixed-point operators are considered. The four basic operations of addition, multiplication, division and square root are considered. Carry- free adders, ripple-carry adders, parallel multipliers and digit recurrence division and square root are discussed. The floating-point operators use the word format and exceptions as described by the IEEE std-754. A dual-path adder implementation is described in detail, as are floating-point multiplier, divider and square root components. Results and comparisons with other works are given. The efficient implementation of function evaluation methods is considered next. An overview of current FPGA methods is given and a new piecewise polynomial implementation using the Taylor series is presented and compared with other designs in the literature. In the next section the LNS word format, accuracy and exceptions are described and two new LNS addition/subtraction function approximations are described. The algorithms for performing multiplication, division and powering in the LNS domain are also described and are compared with other designs in the open literature. Parameterisable conversion algorithms to convert to/from the fixed-point domain from/to the LNS and floating-point domain are described and implementation results given. In the next chapter MATLAB bit-true software models are given that have the exact functionality as the hardware models. The interfaces of the models are given and a serial communication system to perform low speed system tests is described. A comparison of the LNS and floating-point number systems in terms of area and delay is given. Different functions implemented in LNS and floating-point arithmetic are also compared and conclusions are drawn. The results show that when the LNS is implemented with a 6-bit or less characteristic it is superior to floating-point. However, for larger characteristic lengths the floating-point system is more efficient due to the delay and exponential area increase of the LNS addition operator. The LNS is beneficial for larger characteristics than 6-bits only for specialist applications that require a high portion of division, multiplication, square root, powering operations and few additions

    FuncTeller: How Well Does eFPGA Hide Functionality?

    Full text link
    Hardware intellectual property (IP) piracy is an emerging threat to the global supply chain. Correspondingly, various countermeasures aim to protect hardware IPs, such as logic locking, camouflaging, and split manufacturing. However, these countermeasures cannot always guarantee IP security. A malicious attacker can access the layout/netlist of the hardware IP protected by these countermeasures and further retrieve the design. To eliminate/bypass these vulnerabilities, a recent approach redacts the design's IP to an embedded field-programmable gate array (eFPGA), disabling the attacker's access to the layout/netlist. eFPGAs can be programmed with arbitrary functionality. Without the bitstream, the attacker cannot recover the functionality of the protected IP. Consequently, state-of-the-art attacks are inapplicable to pirate the redacted hardware IP. In this paper, we challenge the assumed security of eFPGA-based redaction. We present an attack to retrieve the hardware IP with only black-box access to a programmed eFPGA. We observe the effect of modern electronic design automation (EDA) tools on practical hardware circuits and leverage the observation to guide our attack. Thus, our proposed method FuncTeller selects minterms to query, recovering the circuit function within a reasonable time. We demonstrate the effectiveness and efficiency of FuncTeller on multiple circuits, including academic benchmark circuits, Stanford MIPS processor, IBEX processor, Common Evaluation Platform GPS, and Cybersecurity Awareness Worldwide competition circuits. Our results show that FuncTeller achieves an average accuracy greater than 85% over these tested circuits retrieving the design's functionality.Comment: To be published in the proceedings of the 32st USENIX Security Symposium, 202

    Variable block size motion estimation hardware for video encoders.

    Get PDF
    Li, Man Ho.Thesis submitted in: November 2006.Thesis (M.Phil.)--Chinese University of Hong Kong, 2007.Includes bibliographical references (leaves 137-143).Abstracts in English and Chinese.Abstract --- p.iAcknowledgement --- p.ivChapter 1 --- Introduction --- p.1Chapter 1.1 --- Motivation --- p.3Chapter 1.2 --- The objectives of this thesis --- p.4Chapter 1.3 --- Contributions --- p.5Chapter 1.4 --- Thesis structure --- p.6Chapter 2 --- Digital video compression --- p.8Chapter 2.1 --- Introduction --- p.8Chapter 2.2 --- Fundamentals of lossy video compression --- p.9Chapter 2.2.1 --- Video compression and human visual systems --- p.10Chapter 2.2.2 --- Representation of color --- p.10Chapter 2.2.3 --- Sampling methods - frames and fields --- p.11Chapter 2.2.4 --- Compression methods --- p.11Chapter 2.2.5 --- Motion estimation --- p.12Chapter 2.2.6 --- Motion compensation --- p.13Chapter 2.2.7 --- Transform --- p.13Chapter 2.2.8 --- Quantization --- p.14Chapter 2.2.9 --- Entropy Encoding --- p.14Chapter 2.2.10 --- Intra-prediction unit --- p.14Chapter 2.2.11 --- Deblocking filter --- p.15Chapter 2.2.12 --- Complexity analysis of on different com- pression stages --- p.16Chapter 2.3 --- Motion estimation process --- p.16Chapter 2.3.1 --- Block-based matching method --- p.16Chapter 2.3.2 --- Motion estimation procedure --- p.18Chapter 2.3.3 --- Matching Criteria --- p.19Chapter 2.3.4 --- Motion vectors --- p.21Chapter 2.3.5 --- Quality judgment --- p.22Chapter 2.4 --- Block-based matching algorithms for motion estimation --- p.23Chapter 2.4.1 --- Full search (FS) --- p.23Chapter 2.4.2 --- Three-step search (TSS) --- p.24Chapter 2.4.3 --- Two-dimensional Logarithmic Search Algorithm (2D-log search) --- p.25Chapter 2.4.4 --- Diamond Search (DS) --- p.25Chapter 2.4.5 --- Fast full search (FFS) --- p.26Chapter 2.5 --- Complexity analysis of motion estimation --- p.27Chapter 2.5.1 --- Different searching algorithms --- p.28Chapter 2.5.2 --- Fixed-block size motion estimation --- p.28Chapter 2.5.3 --- Variable block size motion estimation --- p.29Chapter 2.5.4 --- Sub-pixel motion estimation --- p.30Chapter 2.5.5 --- Multi-reference frame motion estimation . --- p.30Chapter 2.6 --- Picture quality analysis --- p.31Chapter 2.7 --- Summary --- p.32Chapter 3 --- Arithmetic for video encoding --- p.33Chapter 3.1 --- Introduction --- p.33Chapter 3.2 --- Number systems --- p.34Chapter 3.2.1 --- Non-redundant Number System --- p.34Chapter 3.2.2 --- Redundant number system --- p.36Chapter 3.3 --- Addition/subtraction algorithm --- p.38Chapter 3.3.1 --- Non-redundant number addition --- p.39Chapter 3.3.2 --- Carry-save number addition --- p.39Chapter 3.3.3 --- Signed-digit number addition --- p.40Chapter 3.4 --- Bit-serial algorithms --- p.42Chapter 3.4.1 --- Least-significant-bit (LSB) first mode --- p.42Chapter 3.4.2 --- Most-significant-bit (MSB) first mode --- p.43Chapter 3.5 --- Absolute difference algorithm --- p.44Chapter 3.5.1 --- Non-redundant algorithm for absolute difference --- p.44Chapter 3.5.2 --- Redundant algorithm for absolute difference --- p.45Chapter 3.6 --- Multi-operand addition algorithm --- p.47Chapter 3.6.1 --- Bit-parallel non-redundant adder tree implementation --- p.47Chapter 3.6.2 --- Bit-parallel carry-save adder tree implementation --- p.49Chapter 3.6.3 --- Bit serial signed digit adder tree implementation --- p.49Chapter 3.7 --- Comparison algorithms --- p.50Chapter 3.7.1 --- Non-redundant comparison algorithm --- p.51Chapter 3.7.2 --- Signed-digit comparison algorithm --- p.52Chapter 3.8 --- Summary --- p.53Chapter 4 --- VLSI architectures for video encoding --- p.54Chapter 4.1 --- Introduction --- p.54Chapter 4.2 --- Implementation platform - (FPGA) --- p.55Chapter 4.2.1 --- Basic FPGA architecture --- p.55Chapter 4.2.2 --- DSP blocks in FPGA device --- p.56Chapter 4.2.3 --- Advantages employing FPGA --- p.57Chapter 4.2.4 --- Commercial FPGA Device --- p.58Chapter 4.3 --- Top level architecture of motion estimation processor --- p.59Chapter 4.4 --- Bit-parallel architectures for motion estimation --- p.60Chapter 4.4.1 --- Systolic arrays --- p.60Chapter 4.4.2 --- Mapping of a motion estimation algorithm onto systolic array --- p.61Chapter 4.4.3 --- 1-D systolic array architecture (LA-ID) --- p.63Chapter 4.4.4 --- 2-D systolic array architecture (LA-2D) --- p.64Chapter 4.4.5 --- 1-D Tree architecture (GA-1D) --- p.64Chapter 4.4.6 --- 2-D Tree architecture (GA-2D) --- p.65Chapter 4.4.7 --- Variable block size support in bit-parallel architectures --- p.66Chapter 4.5 --- Bit-serial motion estimation architecture --- p.68Chapter 4.5.1 --- Data Processing Direction --- p.68Chapter 4.5.2 --- Algorithm mapping and dataflow design . --- p.68Chapter 4.5.3 --- Early termination scheme --- p.69Chapter 4.5.4 --- Top-level architecture --- p.70Chapter 4.5.5 --- Non redundant positive number to signed digit conversion --- p.71Chapter 4.5.6 --- Signed-digit adder tree --- p.73Chapter 4.5.7 --- SAD merger --- p.74Chapter 4.5.8 --- Signed-digit comparator --- p.75Chapter 4.5.9 --- Early termination controller --- p.76Chapter 4.5.10 --- Data scheduling and timeline --- p.80Chapter 4.6 --- Decision metric in different architectural types . . --- p.80Chapter 4.6.1 --- Throughput --- p.81Chapter 4.6.2 --- Memory bandwidth --- p.83Chapter 4.6.3 --- Silicon area occupied and power consump- tion --- p.83Chapter 4.7 --- Architecture selection for different applications . . --- p.84Chapter 4.7.1 --- CIF and QCIF resolution --- p.84Chapter 4.7.2 --- SDTV resolution --- p.85Chapter 4.7.3 --- HDTV resolution --- p.85Chapter 4.8 --- Summary --- p.86Chapter 5 --- Results and comparison --- p.87Chapter 5.1 --- Introduction --- p.87Chapter 5.2 --- Implementation details --- p.87Chapter 5.2.1 --- Bit-parallel 1-D systolic array --- p.88Chapter 5.2.2 --- Bit-parallel 2-D systolic array --- p.89Chapter 5.2.3 --- Bit-parallel Tree architecture --- p.90Chapter 5.2.4 --- MSB-first bit-serial design --- p.91Chapter 5.3 --- Comparison between motion estimation architectures --- p.93Chapter 5.3.1 --- Throughput and latency --- p.93Chapter 5.3.2 --- Occupied resources --- p.94Chapter 5.3.3 --- Memory bandwidth --- p.95Chapter 5.3.4 --- Motion estimation algorithm --- p.95Chapter 5.3.5 --- Power consumption --- p.97Chapter 5.4 --- Comparison to ASIC and FPGA architectures in past literature --- p.99Chapter 5.5 --- Summary --- p.101Chapter 6 --- Conclusion --- p.102Chapter 6.1 --- Summary --- p.102Chapter 6.1.1 --- Algorithmic optimizations --- p.102Chapter 6.1.2 --- Architecture and arithmetic optimizations --- p.103Chapter 6.1.3 --- Implementation on a FPGA platform . . . --- p.104Chapter 6.2 --- Future work --- p.106Chapter A --- VHDL Sources --- p.108Chapter A.1 --- Online Full Adder --- p.108Chapter A.2 --- Online Signed Digit Full Adder --- p.109Chapter A.3 --- Online Pull Adder Tree --- p.110Chapter A.4 --- SAD merger --- p.112Chapter A.5 --- Signed digit adder tree stage (top) --- p.116Chapter A.6 --- Absolute element --- p.118Chapter A.7 --- Absolute stage (top) --- p.119Chapter A.8 --- Online comparator element --- p.120Chapter A.9 --- Comparator stage (top) --- p.122Chapter A.10 --- MSB-first motion estimation processor --- p.134Bibliography --- p.13

    Fault and Defect Tolerant Computer Architectures: Reliable Computing With Unreliable Devices

    Get PDF
    This research addresses design of a reliable computer from unreliable device technologies. A system architecture is developed for a fault and defect tolerant (FDT) computer. Trade-offs between different techniques are studied and yield and hardware cost models are developed. Fault and defect tolerant designs are created for the processor and the cache memory. Simulation results for the content-addressable memory (CAM)-based cache show 90% yield with device failure probabilities of 3 x 10(-6), three orders of magnitude better than non fault tolerant caches of the same size. The entire processor achieves 70% yield with device failure probabilities exceeding 10(-6). The required hardware redundancy is approximately 15 times that of a non-fault tolerant design. While larger than current FT designs, this architecture allows the use of devices much more likely to fail than silicon CMOS. As part of model development, an improved model is derived for NAND Multiplexing. The model is the first accurate model for small and medium amounts of redundancy. Previous models are extended to account for dependence between the inputs and produce more accurate results

    Embedded electronic systems driven by run-time reconfigurable hardware

    Get PDF
    Abstract This doctoral thesis addresses the design of embedded electronic systems based on run-time reconfigurable hardware technology –available through SRAM-based FPGA/SoC devices– aimed at contributing to enhance the life quality of the human beings. This work does research on the conception of the system architecture and the reconfiguration engine that provides to the FPGA the capability of dynamic partial reconfiguration in order to synthesize, by means of hardware/software co-design, a given application partitioned in processing tasks which are multiplexed in time and space, optimizing thus its physical implementation –silicon area, processing time, complexity, flexibility, functional density, cost and power consumption– in comparison with other alternatives based on static hardware (MCU, DSP, GPU, ASSP, ASIC, etc.). The design flow of such technology is evaluated through the prototyping of several engineering applications (control systems, mathematical coprocessors, complex image processors, etc.), showing a high enough level of maturity for its exploitation in the industry.Resumen Esta tesis doctoral abarca el diseño de sistemas electrónicos embebidos basados en tecnología hardware dinámicamente reconfigurable –disponible a través de dispositivos lógicos programables SRAM FPGA/SoC– que contribuyan a la mejora de la calidad de vida de la sociedad. Se investiga la arquitectura del sistema y del motor de reconfiguración que proporcione a la FPGA la capacidad de reconfiguración dinámica parcial de sus recursos programables, con objeto de sintetizar, mediante codiseño hardware/software, una determinada aplicación particionada en tareas multiplexadas en tiempo y en espacio, optimizando así su implementación física –área de silicio, tiempo de procesado, complejidad, flexibilidad, densidad funcional, coste y potencia disipada– comparada con otras alternativas basadas en hardware estático (MCU, DSP, GPU, ASSP, ASIC, etc.). Se evalúa el flujo de diseño de dicha tecnología a través del prototipado de varias aplicaciones de ingeniería (sistemas de control, coprocesadores aritméticos, procesadores de imagen, etc.), evidenciando un nivel de madurez viable ya para su explotación en la industria.Resum Aquesta tesi doctoral està orientada al disseny de sistemes electrònics empotrats basats en tecnologia hardware dinàmicament reconfigurable –disponible mitjançant dispositius lògics programables SRAM FPGA/SoC– que contribueixin a la millora de la qualitat de vida de la societat. S’investiga l’arquitectura del sistema i del motor de reconfiguració que proporcioni a la FPGA la capacitat de reconfiguració dinàmica parcial dels seus recursos programables, amb l’objectiu de sintetitzar, mitjançant codisseny hardware/software, una determinada aplicació particionada en tasques multiplexades en temps i en espai, optimizant així la seva implementació física –àrea de silici, temps de processat, complexitat, flexibilitat, densitat funcional, cost i potència dissipada– comparada amb altres alternatives basades en hardware estàtic (MCU, DSP, GPU, ASSP, ASIC, etc.). S’evalúa el fluxe de disseny d’aquesta tecnologia a través del prototipat de varies aplicacions d’enginyeria (sistemes de control, coprocessadors aritmètics, processadors d’imatge, etc.), demostrant un nivell de maduresa viable ja per a la seva explotació a la indústria
    • …
    corecore