300 research outputs found

    Memory-efficient and fast run-time reconfiguration of regularly structured designs

    Previous work has shown that run-time reconfiguration of FPGAs benefits greatly from the use of Tunable LUT (TLUT) circuits. These can be rapidly transformed into a specialized LUT circuit and are also very memory efficient when representing regularly structured designs, where the same hardware module is instantiated many times. However, the memory requirements and reconfiguration time of a run-time reconfigurable application are also dependent on the reconfiguration mechanism. In this paper, we will show that the memory requirements of conventional ICAP reconfiguration grow very fast with the number of modules, resulting in excessive memory usage. We propose to use Shift-Register-LUT (SRL) reconfiguration which is faster and results in a memory usage that is independent of the number of modules

    Techniques for low-overhead dynamic partial reconfiguration of FPGAs


    Computing moments of a binary horizontally/vertically convex image using run-time reconfiguration

    In this thesis, we present a design for computing moments of a binary horizontally/vertically convex image on an FPGA chip, using run-time reconfiguration. We compute the moments of up to third order for a total of 16 moments. We address how run-time reconfiguration speeds up moment computations without taking up huge hardware resources. Since we are considering a binary horizontally/vertically convex image, we look at an alternative method in moment computations that utilizes constant coefficient multipliers. We divide the image into segments and process one segment at a time. We reconfigure the constant coefficient multipliers before processing the next segment. This thesis also looks at the interactions between different logic units for moment computations. We provide an estimate of the total number of CLBs used to implement this design on an FPGA chip. Finally, we address variations of this particular type of image, such as non-binary and non-convex and determine whether this design is still applicable in those instances
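
    As a purely arithmetic illustration of the quantities involved (not the thesis's hardware design), the sketch below computes the 16 moments m_pq for p, q in 0..3 of a binary horizontally convex image. It assumes each row holds a single run of 1-pixels, described by a hypothetical `runs` mapping from row index to inclusive column bounds. Within a row the factor y**q is fixed, which is one way a per-segment constant can arise; the abstract does not say whether the thesis partitions segments this way.

```python
def power_sum(a, b, p):
    """Sum of x**p for every integer x in the inclusive range [a, b]."""
    return sum(x ** p for x in range(a, b + 1))


def moments_from_runs(runs, max_order=3):
    """Compute m[p][q] = sum over 1-pixels of x**p * y**q.

    runs: dict mapping row index y -> (x_start, x_end), inclusive bounds of
          the single run of 1-pixels in that row (assumed input format).
    Returns a (max_order + 1) x (max_order + 1) table of moments.
    """
    m = [[0] * (max_order + 1) for _ in range(max_order + 1)]
    for y, (a, b) in runs.items():
        for p in range(max_order + 1):
            s = power_sum(a, b, p)          # depends only on the run bounds
            for q in range(max_order + 1):
                m[p][q] += s * y ** q       # y**q is constant for this row
    return m


# A 2 x 2 square of 1-pixels at the origin.
square = {0: (0, 1), 1: (0, 1)}
table = moments_from_runs(square)
```

    Here `table[0][0]` is the pixel count, and `table[1][0]` and `table[0][1]` are the first-order moments along x and y.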

    A committee machine gas identification system based on dynamically reconfigurable FPGA

    This paper proposes a gas identification system based on the committee machine (CM) classifier, which combines various gas identification algorithms, to obtain a unified decision with improved accuracy. The CM combines five different classifiers: K nearest neighbors (KNNs), multilayer perceptron (MLP), radial basis function (RBF), Gaussian mixture model (GMM), and probabilistic principal component analysis (PPCA). Experiments on real sensors' data proved the effectiveness of our system with an improved accuracy over individual classifiers. Due to the computationally intensive nature of CM, its implementation requires significant hardware resources. In order to overcome this problem, we propose a novel time multiplexing hardware implementation using a dynamically reconfigurable field programmable gate array (FPGA) platform. The processing is divided into three stages: sampling and preprocessing, pattern recognition, and decision stage. Dynamically reconfigurable FPGA technique is used to implement the system in a sequential manner, thus using limited hardware resources of the FPGA chip. The system is successfully tested for combustible gas identification application using our in-house tin-oxide gas sensors
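
    The abstract does not state how the five classifiers' outputs are fused into a unified decision. The sketch below assumes a plain majority vote, one common committee rule; the function name, the tie-breaking rule, and the example gas labels are illustrative, not taken from the paper.

```python
from collections import Counter


def committee_decision(votes):
    """Return the label chosen by the most committee members.

    votes: non-empty list of labels, one per classifier.
    Ties are resolved in favour of the label that appears first in `votes`
    (an arbitrary choice made for this sketch).
    """
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    best = max(counts.values())
    for label in votes:
        if counts[label] == best:
            return label


# Hypothetical outputs of five classifiers (e.g. KNN, MLP, RBF, GMM, PPCA).
decision = committee_decision(["CO", "H2", "CO", "CH4", "CO"])
```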

    Adaptive image filtering using run-time reconfiguration

    This thesis implements an adaptive linear smoothing image filtering algorithm on a Virtex™-E FPGA using run-time reconfiguration (RTR). An adaptive filter uses a filtering window that runs over the entire image pixel-by-pixel, generating new (filtered) values of the pixels. As the name suggests, an adaptive filter can adapt to the varying nature of an image by adjusting the coefficients of the filtering window depending upon the local variance in the intensity values of pixels. It filters an image in a non-uniform fashion, providing greater smoothing in largely uniform areas of the image and lesser smoothing when it encounters edges and step changes in the image. These continual changes in the coefficient values of the adaptive filter pose a problem in utilizing run-time reconfiguration (RTR) for its implementation, as benefits of RTR emerge only with considerable computing time between reconfigurations. This thesis provides a solution to this problem and reduces the running time of the algorithm through aggressive use of RTR. This work provides details on the RTR implementation of an adaptive filter, along with an estimate of running time and hardware resource requirements, when synthesized on the Virtex™-E FPGA. We use a 3 × 3 filtering window and a 256 × 256 gray scale image as a specific case, achieving speedups of 31 and 84 over pure software implementations running on Pentium III and Sun Ultra systems respectively
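
    To make the idea of variance-dependent smoothing concrete, here is a minimal software sketch. It assumes a blend between the 3 × 3 window mean and the centre pixel, weighted by local variance (similar in spirit to a Lee filter); the thesis's actual coefficient rule is not given in the abstract, and `noise_var` is a hypothetical tuning parameter.

```python
def adaptive_smooth(img, noise_var):
    """Smooth a grayscale image with a 3 x 3 window whose strength adapts
    to local variance.

    img: list of equal-length rows of pixel intensities.
    noise_var: assumed noise variance; larger values mean more smoothing.
    Border pixels are copied unchanged.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            win = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            mean = sum(win) / 9.0
            var = sum((v - mean) ** 2 for v in win) / 9.0
            # k near 0 in flat regions (output ~ mean, strong smoothing);
            # k near 1 at edges (output ~ original pixel, little smoothing).
            denom = var + noise_var
            k = var / denom if denom > 0 else 0.0
            out[y][x] = mean + k * (img[y][x] - mean)
    return out


flat = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
spike = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
flat_out = adaptive_smooth(flat, noise_var=1.0)
spike_out = adaptive_smooth(spike, noise_var=8.0)
```

    In the flat image the centre pixel is replaced by the window mean, while in the spike image the high local variance keeps the output closer to the original value.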

    FPGA structures for high speed and low overhead dynamic circuit specialization

    A Field Programmable Gate Array (FPGA) is a programmable digital electronic chip. The FPGA does not come with a predefined function from the manufacturer; instead, the developer has to define its function through implementing a digital circuit on the FPGA resources. The functionality of the FPGA can be reprogrammed as desired and hence the name “field programmable”. FPGAs are useful in small volume digital electronic products as the design of a digital custom chip is expensive. Changing the FPGA (also called configuring it) is done by changing the configuration data (in the form of bitstreams) that defines the FPGA functionality. These bitstreams are stored in a memory of the FPGA called configuration memory. The SRAM cells of LookUp Tables (LUTs), Block Random Access Memories (BRAMs) and DSP blocks together form the configuration memory of an FPGA. The configuration data can be modified according to the user’s needs to implement the user-defined hardware. The simplest way to program the configuration memory is to download the bitstreams using a JTAG interface. However, modern techniques such as Partial Reconfiguration (PR) enable us to configure a part in the configuration memory with partial bitstreams during run-time. The reconfiguration is achieved by swapping in partial bitstreams into the configuration memory via a configuration interface called Internal Configuration Access Port (ICAP). The ICAP is a hardware primitive (macro) present in the FPGA used to access the configuration memory internally by an embedded processor. The reconfiguration technique adds flexibility to use specialized circuits that are more compact and more efficient than their bulky counterparts. An example of such an implementation is the use of specialized multipliers instead of big generic multipliers in an FIR implementation with constant coefficients.
To specialize these circuits and reconfigure during the run-time, researchers at the HES group proposed the novel technique called parameterized reconfiguration that can be used to efficiently and automatically implement Dynamic Circuit Specialization (DCS) that is built on top of the Partial Reconfiguration method. It uses the run-time reconfiguration technique that is tailored to implement a parameterized design. An application is said to be parameterized if some of its input values change much less frequently than the rest. These inputs are called parameters. Instead of implementing these parameters as regular inputs, in DCS these inputs are implemented as constants, and the application is optimized for the constants. For every change in parameter values, the design is re-optimized (specialized) during run-time and implemented by reconfiguring the optimized design for a new set of parameters. In DCS, the bitstreams of the parameterized design are expressed as Boolean functions of the parameters. For every infrequent change in parameters, a specialized FPGA configuration is generated by evaluating the corresponding Boolean functions, and the FPGA is reconfigured with the specialized configuration. A detailed study of overheads of DCS and providing suitable solutions with appropriate custom FPGA structures is the primary goal of the dissertation. I also suggest different improvements to the FPGA configuration memory architecture. After offering the custom FPGA structures, I investigated the role of DCS on FPGA overlays and the use of custom FPGA structures that help to reduce the overheads of DCS on FPGA overlays. By doing so, I hope I can convince the developer to use DCS (which now comes with minimal costs) in real-world applications. I start the investigations of overheads of DCS by implementing an adaptive FIR filter (using the DCS technique) on three different Xilinx FPGA platforms: Virtex-II Pro, Virtex-5, and Zynq-SoC. 
The study of how DCS behaves and what is its overhead in the evolution of the three FPGA platforms is the non-trivial basis to discover the costs of DCS. After that, I propose custom FPGA structures (reconfiguration controllers and reconfiguration drivers) to reduce the main overhead (reconfiguration time) of DCS. These structures not only reduce the reconfiguration time but also help curbing the power hungry part of the DCS system. After these chapters, I study the role of DCS on FPGA overlays. I investigate the effect of the proposed FPGA structures on Virtual-Coarse-Grained Reconfigurable Arrays (VCGRAs). I classify the VCGRA implementations into three types: the conventional VCGRA, partially parameterized VCGRA and fully parameterized VCGRA depending upon the level of parameterization. I have designed two variants of VCGRA grids for HPC image processing applications, namely, the MAC grid and Pixie. Finally, I try to tackle the reconfiguration time overhead at the hardware level of the FPGA by customizing the FPGA configuration memory architecture. In this part of my research, I propose to use a parallel memory structure to improve the reconfiguration time of DCS drastically. However, this improvement comes with a significant overhead of hardware resources which will need to be solved in future research on commercial FPGA configuration memory architectures
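
The statement that configuration bits are expressed as Boolean functions of the parameters can be illustrated with a toy model. The sketch below assumes a single LUT whose generic function takes both regular inputs and parameters; fixing the parameters and enumerating the inputs yields the specialized truth table. The function names and the example circuit are hypothetical and do not come from the dissertation's tool flow.

```python
from itertools import product


def specialize_lut(func, n_inputs, params):
    """Evaluate every truth-table bit of a LUT for fixed parameter values.

    func: generic Boolean function taking (inputs, params).
    Returns a list of 2**n_inputs bits; entry i corresponds to the input
    combination whose binary digits (first input as MSB) spell i.
    """
    return [int(bool(func(bits, params)))
            for bits in product((0, 1), repeat=n_inputs)]


def masked_xor(inputs, params):
    """Example generic circuit: each input is gated by a parameter bit."""
    a, b = inputs
    p0, p1 = params
    return (a & p0) ^ (b & p1)


# Each parameter change triggers a re-evaluation, giving a new configuration.
config_pass_a = specialize_lut(masked_xor, 2, (1, 0))   # behaves like f = a
config_xor = specialize_lut(masked_xor, 2, (1, 1))      # behaves like a XOR b
```

In this model the parameters never occupy LUT inputs: a 4-variable generic function fits in a 2-input LUT once it is specialized, which is the resource saving that specialization aims for.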

    An Evolvable Combinational Unit for FPGAs

    A complete hardware implementation of an evolvable combinational unit for FPGAs is presented. The proposed combinational unit consisting of a virtual reconfigurable circuit and evolutionary algorithm was described in VHDL independently of a target platform, i.e. as a soft IP core, and realized in the COMBO6 card. In many cases the unit is able to evolve (i.e. to design) the required function automatically and autonomously, in a few seconds, only on the basis of interactions with an environment. A number of circuits were successfully evolved directly in the FPGA, in particular, 3-bit multipliers, adders, multiplexers and parity encoders. The evolvable unit was also tested in a simulated dynamic environment and used to design various circuits specified by randomly generated truth tables

    An area-optimized N-bit multiplication technique using N/2-bit multiplication algorithm

    A unique design for an optimized N-bit multiplier is proposed and implemented which utilizes a modified divide-and-conquer technique. The conventional technique requires four N/2-bit multipliers to perform N-bit multiplication, whereas the proposed design uses only one multiplier module in hardware to perform the functionality of four modules. It uses the Dadda algorithm in its multiplier module. It has been implemented using Verilog HDL, and a good accuracy of results was observed in simulations which effectively verify its functionality. The design was also synthesized on various FPGAs including Spartan 3E, Virtex-5 and Virtex-7. The performance summary, after place and route, showed that the proposed approach significantly reduces hardware utilization. Furthermore, the proposed design is almost 75% more efficient in terms of resource utilization and operating frequency as compared to the conventional design
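
    The divide-and-conquer decomposition described above can be sketched in software. The example below models the reuse of a single N/2-bit multiplier by calling one function four times in sequence and accumulating the shifted partial products. It assumes unsigned operands and an even N; `half_multiply` is a stand-in for the hardware module and does not model the Dadda algorithm or the paper's scheduling.

```python
def half_multiply(x, y, half_bits):
    """Stand-in for the single N/2-bit hardware multiplier module."""
    limit = 1 << half_bits
    assert 0 <= x < limit and 0 <= y < limit, "operand exceeds N/2 bits"
    return x * y


def multiply_n_bit(a, b, n):
    """Multiply two unsigned n-bit integers using four sequential
    n/2-bit multiplications on one shared multiplier."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    half = n // 2
    mask = (1 << half) - 1
    a_hi, a_lo = a >> half, a & mask
    b_hi, b_lo = b >> half, b & mask
    # a*b = a_hi*b_hi * 2**n + (a_hi*b_lo + a_lo*b_hi) * 2**(n/2) + a_lo*b_lo
    schedule = (
        (a_lo, b_lo, 0),
        (a_lo, b_hi, half),
        (a_hi, b_lo, half),
        (a_hi, b_hi, 2 * half),
    )
    acc = 0
    for x, y, shift in schedule:
        acc += half_multiply(x, y, half) << shift
    return acc


product_8bit = multiply_n_bit(0xAB, 0xCD, 8)
```

    Sharing one multiplier trades latency (four passes instead of one) for area, which matches the area-optimized framing of the abstract.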