440 research outputs found
Modeling of glitch effects in FPGA based arithmetic circuits
Published versio
A Framework for the Detection of Crosstalk Noise in FPGAs
In recent years, crosstalk noise has emerged a serious problem because more and more devices and wires have been packed on electronic chips. As integrated circuits are migrated to more advanced technologies, it has become clear that crosstalk noise is the important phenomenon that must be taken into account. Despite of being more immune to crosstalk noise than their ASIC (application specific integrated circuit) counterparts, the dense interconnected structures of FPGAs (field programmable gate arrays) invite more vulnerabilities with crosstalk noise. Due to the lack of electrical detail concerning FPGA devices it is quite difficult to test the faults affected by crosstalk noise. This paper proposes a new approach for detecting the effects such as glitches and delays in transition that are due to crosstalk noise in FPGAs. This approach is similar to the BIST (built-in self test) technique in that it incorporates the test pattern generator to generate the test vectors and the analyzer to analyze the crosstalk faults without any overhead for testing
Recommended from our members
Energy Efficient Loop Unrolling for Low-Cost FPGAs
Many embedded applications implement block ciphers and sorting and searching algorithms which use multiple loop iterations for computation. These applications often demand low power operation. The power consumption of designs varies with the implementation choices made by designers. The sequential implementation of loop operations consumes minimal area, but latency and clock power are high. Alternatively, loop unrolling causes high glitch power. In this work, we propose a low area overhead approach for unrolling loop iterations that exhibits reduced glitch power. A latch based glitch filter is introduced that reduces the propagation of glitches from one iteration to next. We explore the optimal number of filters to be inserted for different applications that give a good balance between area and power. We also implement partial unrolling with glitch filters. This approach consumes less area while still giving energy savings comparable to the fully unrolled implementation.
Our approach is targeted to Xilinx and Altera FPGAs. We simulate different implementation choices and compare energy results to evaluate the savings. We demonstrate our approach on SIMON-128 and AES-256 block ciphers and a sorting algorithm. We prototype our design on Xilinx Artix-7 and Altera Cyclone-IV-GX FPGA development boards and measure the actual power savings. Results show up-to 90% dynamic energy reduction in Xilinx designs, and 97% reduction in Altera designs with our glitch filtering approach due to glitch power reduction
Implementation of Static and Semi-Static Versions of a 24+8x8 Quad-rail NULL Convention Multiply and Accumulate Unit
This paper focuses on implementing a 2s complement 8x8 dual-rail bit-wise pipelined multiplier using the asynchronous null convention logic (NCL) paradigm. The design utilizes a Wallace tree for partial product summation, and is implemented and simulated in VHDL, the transistor level, and the physical level, using a 1.8V 0.18mum TSMC CMOS process. The multiplier is realized using both static and semi-static versions of the NCL gates; and these two implementations are compared in terms of area, power, and speed
Pipeline-Based Power Reduction in FPGA Applications
This paper shows how temporal parallelism has an important role in the power dissipation reduction in the FPGA field. Glitches propagation is blocked by the flip-flops or registers in the pipeline. Several multiplication structures are implemented over modern FPGAs, StratixII and Virtex4, comparing their results with and without pipeline and hardware duplication
High-level power optimisation for Digital Signal Processing in Recon gurable Logic
This thesis is concerned with the optimisation of Digital Signal Processing (DSP) algorithm
implementations on recon gurable hardware via the selection of appropriate word-lengths
for the signals in these algorithms, in order to minimise system power consumption. Whilst
existing word-length optimisation work has concentrated on the minimisation of the area of
algorithm implementations, this work introduces the rst set of power consumption models
that can be evaluated quickly enough to be used within the search of the enormous design
space of multiple word-length optimisation problems. These models achieve their speed by
estimating both the power consumed within the arithmetic components of an algorithm
and the power in the routing wires that connect these components, using only a high-level
description of the algorithm itself. Trading o a small reduction in power model accuracy
for a large increase in speed is one of the major contributions of this thesis.
In addition to the work on power consumption modelling, this thesis also develops a
new technique for selecting the appropriate word-lengths for an algorithm implementation
in order to minimise its cost in terms of power (or some other metric for which models
are available). The method developed is able to provide tight lower and upper bounds on
the optimal cost that can be obtained for a particular word-length optimisation problem
and can, as a result, nd provably near-optimal solutions to word-length optimisation
problems without resorting to an NP-hard search of the design space.
Finally the costs of systems optimised via the proposed technique are compared to
those obtainable by word-length optimisation for minimisation of other metrics (such as
logic area) and the results compared, providing greater insight into the nature of wordlength
optimisation problems and the extent of the improvements obtainable by them
Fast self-reconfigurable embedded system on Spartan-3
Many image-processing algorithms require several stages to be processed that cannot
be resolved by embedded microprocessors in a reasonable time, due to their high-computational
cost. A set of dedicated coprocessors can accelerate the resolution of these algorithms, alt
hough
the main drawback is the area needed for their implementation. The main advantage of a
reconfigurable system is that several coprocessors designed to perform different operations can
be mapped on the same area in a time-multiplexed
way. This work presents the architecture of
an embedded system composed of a microprocessor and a run-time reconfigurable coprocessor,
mapped on Spartan-3, the low-cost family of Xilinx FPGAs. Designing reconfigurable systems
on Spartan-3 requires much design effort, since unlike higher cost families of Xilinx FPGAs,
this device does not officially support partial reconfiguration. In order to overcome this
drawback, the paper also describes the main steps used in the design flow to obtain a successful
design. The main goal of the presented architecture is to reduce the coprocessor reconfiguration
time, as well as accelerate image-processing algorithms. The experimental results demonstrate
significant improvement in both objectives. The reconfiguration rate nearly achieves 320 Mb/s
which is far superior to th
e previous related works.Peer ReviewedPostprint (published version
- …