    An FPGA Implementation of the Powering Function with Single Precision Floating-Point Arithmetic

    In this work we present an FPGA implementation of a single-precision floating-point powering unit. Our powering unit is based on an indirect method that transforms x^y into a chain of operations involving a logarithm, a multiplication, an exponential function, and dedicated logic for the case of a negative base. This approach allows the full input range to be used for both the base and the exponent, without restricting the exponent range as direct methods do. A tailored hardware implementation increases the accuracy of the unit by reducing the relative errors of the operations, while high performance is obtained by exploiting the FPGA's suitability for parallel architectures. A careful design of the pipeline stages of the operators involved yields a clock frequency of 201.3 MHz on a Xilinx Virtex-4 FPGA.
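
    A minimal software sketch of the indirect method described in this abstract is given below: x^y is rewritten as exp(y * ln x), with dedicated handling of the negative-base case. The function name and the use of libm calls are illustrative assumptions; the actual unit implements the logarithm, multiplication and exponential as dedicated hardware operators.

        /* Sketch of the indirect powering method: x^y = exp(y * ln x),
         * plus a special path when the base is negative (result is real
         * only for integer exponents). Illustrative, not the hardware unit. */
        #include <math.h>
        #include <stdio.h>

        static float powering(float x, float y) {
            if (y == 0.0f)
                return 1.0f;                       /* x^0 = 1 */
            if (x > 0.0f)
                return expf(y * logf(x));          /* main path: exp(y * ln x) */
            if (x == 0.0f)
                return (y > 0.0f) ? 0.0f : INFINITY;
            /* negative base: only defined for integer exponents */
            if (y != truncf(y))
                return NAN;
            float r = expf(y * logf(-x));          /* magnitude via |x| */
            return (fmodf(y, 2.0f) != 0.0f) ? -r : r;  /* restore sign for odd y */
        }

        int main(void) {
            printf("%f\n", powering(-2.0f, 3.0f)); /* -8.0 */
            printf("%f\n", powering(1.5f, 2.5f));
            return 0;
        }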

    Customizing floating-point units for FPGAs: Area-performance-standard trade-offs

    The high integration density of current nanometer technologies allows the implementation of complex floating-point applications in a single FPGA. In this work the intrinsic complexity of floating-point operators is addressed, targeting configurable devices and making design decisions that provide the most suitable trade-offs between performance and standard compliance. A set of floating-point libraries composed of adder/subtracter, multiplier, divider, square root, exponential, logarithm and power function operators is presented. Each library has been designed taking into account the specific characteristics of current FPGAs; to this end we have adapted the (software-oriented) IEEE floating-point standard to a custom FPGA-oriented format. Extensive experimental results validate the design decisions made and prove the usefulness of reducing the format complexity.
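
    The abstract does not spell out the custom format, so the sketch below only illustrates the general idea under stated assumptions: exponent and fraction widths become parameters of the operator generator, and costly standard features such as denormal support may be simplified away.

        /* Illustrative sketch of a parameterized, FPGA-oriented floating-point
         * format of the kind adapted here. The concrete simplifications made
         * by the library (rounding modes, exceptions, denormal support) are
         * assumptions; the point is that exponent and fraction widths become
         * generator parameters instead of being fixed by the IEEE standard. */
        typedef struct {
            unsigned wE;        /* exponent width in bits (8 for IEEE single)     */
            unsigned wF;        /* fraction width in bits (23 for IEEE single)    */
            int      denormals; /* 0: flush to zero, a common FPGA simplification */
        } fp_format;

        /* Total storage width of one operand: sign + exponent + fraction. */
        static inline unsigned fp_width(const fp_format *f) {
            return 1u + f->wE + f->wF;
        }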

    Performance comparison of single-precision SPICE Model-Evaluation on FPGA, GPU, Cell, and multi-core processors

    Automated code generation and performance-tuning techniques for concurrent architectures such as GPUs, Cell and FPGAs can provide integer-factor speedups over multi-core processor organizations for data-parallel, floating-point computation in SPICE model-evaluation. Our Verilog-AMS compiler produces code for parallel evaluation of non-linear circuit models suitable for use in SPICE simulations, where the same model is evaluated many times for all the devices in the circuit. Our compiler uses architecture-specific parallelization strategies (OpenMP for multi-core, PThreads for Cell, CUDA for GPU, statically scheduled VLIW for FPGA) when producing code for these different architectures. We automatically explore different implementation configurations (e.g. unroll factor, vector length) using our performance tuner to identify the best possible configuration for each architecture. We demonstrate speedups of 3–182× for a Xilinx Virtex-5 LX330T, 1.3–33× for an IBM Cell, and 3–131× for an NVIDIA 9600 GT GPU over a 3 GHz Intel Xeon 5160 implementation for a variety of single-precision device models.
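
    As a rough illustration of the data-parallel pattern described here (not the compiler's generated output), the sketch below evaluates a simplified diode model independently for every device instance using the OpenMP strategy mentioned for multi-core targets; the model equation and structure names are assumptions.

        /* The same non-linear device model evaluated for every instance in
         * the circuit, parallelized with OpenMP as for the multi-core target.
         * Compile with -fopenmp; without it the pragma is simply ignored. */
        #include <math.h>
        #include <stddef.h>

        typedef struct { float is, vt, vd; float id, gd; } diode_t;

        void evaluate_diodes(diode_t *d, size_t n) {
            #pragma omp parallel for schedule(static)
            for (size_t i = 0; i < n; i++) {
                float e = expf(d[i].vd / d[i].vt);  /* shared subexpression         */
                d[i].id = d[i].is * (e - 1.0f);     /* device current               */
                d[i].gd = d[i].is * e / d[i].vt;    /* conductance for the Jacobian */
            }
        }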

    Floating-point exponential functions for DSP-enabled FPGAs

    This article presents a floating-point exponential operator generator targeting recent FPGAs with embedded memories and DSP blocks. A single-precision operator consumes just one DSP block, 18 Kbits of dual-port memory, and 392 slices on Virtex-4. For larger precisions, a generic approach based on polynomial approximation is used and proves more resource-efficient than previously published approaches. For instance, a double-precision operator consumes 5 BlockRAMs and 12 DSP48 blocks on Virtex-5, or 10 M9K blocks and 22 18x18 multipliers on Stratix III. This approach is flexible, scales well beyond double precision, and enables frequencies close to the FPGA's nominal frequency. All the proposed architectures are last-bit accurate over the whole floating-point range. They are available in the open-source FloPoCo framework.
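
    A hedged software model of the kind of range reduction such an exponential operator typically relies on is sketched below; the table size, polynomial degree and argument splitting are illustrative assumptions, not the exact generated architecture.

        /* e^x = 2^k * e^r with k = round(x / ln 2) and |r| <= ln(2)/2;
         * e^r is then split into a table lookup on a few high bits of r and
         * a short polynomial on the low bits, joined by one multiplication
         * (the role a single DSP block can play in hardware). */
        #include <math.h>

        float exp_model(float x) {
            const float ln2 = 0.69314718f;
            int   k = (int)lrintf(x / ln2);             /* exponent of the 2^k factor */
            float r = x - (float)k * ln2;               /* reduced argument           */

            float r_hi = ldexpf(rintf(ldexpf(r, 6)), -6);   /* 6 "table-address" bits */
            float r_lo = r - r_hi;
            float e_hi = expf(r_hi);                        /* would be a ROM/BlockRAM */
            float e_lo = 1.0f + r_lo + 0.5f * r_lo * r_lo;  /* short polynomial        */

            return ldexpf(e_hi * e_lo, k);              /* rescale by 2^k */
        }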

    Generating high-performance custom floating-point pipelines

    Custom operators, working at custom precisions, are a key ingredient in fully exploiting the FPGA flexibility advantage for high-performance computing. Unfortunately, such operators are costly to design, and application designers tend to rely on less efficient off-the-shelf operators. To address this issue, an open-source architecture generator framework is introduced. Its salient features are an easy learning curve from VHDL, the ability to embed arbitrary synthesisable VHDL code, portability to mainstream FPGA targets from Xilinx and Altera, automatic management of complex pipelines with support for frequency-directed pipelining, and automatic test-bench generation. This generator is presented around the simple example of a collision detector, which it significantly improves in accuracy, DSP count, logic usage, frequency and latency with respect to an implementation using standard floating-point operators.
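
    For context, the sketch below shows a plain software version of the collision-detector example built from standard single-precision operators; the exact predicate (x^2 + y^2 + z^2 < r^2) is an assumption here. The generator's contribution is to fuse these operations into one custom-precision pipeline rather than chaining discrete operators like these.

        /* Reference collision test with standard single-precision arithmetic:
         * three multipliers and two adders for the squared distance, plus one
         * multiplier and a comparator for the threshold. */
        #include <stdbool.h>

        bool collides(float x, float y, float z, float r) {
            float d2 = x * x + y * y + z * z;   /* squared distance from origin */
            return d2 < r * r;                  /* inside the radius?           */
        }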

    Accelerating SPICE Model-Evaluation using FPGAs

    Single-FPGA spatial implementations can provide an order of magnitude speedup over sequential microprocessor implementations for data-parallel, floating-point computation in SPICE model-evaluation. Model-evaluation is a key component of the SPICE circuit simulator and it is characterized by large irregular floating-point compute graphs. We show how to exploit the parallelism available in these graphs on single-FPGA designs with a low-overhead VLIW-scheduled architecture. Our architecture uses spatial floating-point operators coupled to local high-bandwidth memories and interconnected by a time-shared network. We retime operation inputs in the model-evaluation to allow independent scheduling of computation and communication. With this approach, we demonstrate speedups of 2–18× over a dual-core 3 GHz Intel Xeon 5160 when using a Xilinx Virtex-5 LX330T for a variety of SPICE device models.
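
    The data structures below are a deliberately simplified, assumed encoding of the statically scheduled VLIW organization described above: each cycle, one wide instruction word drives every spatial floating-point operator and one transfer on the time-shared network. Field names and widths are illustrative, not the actual instruction format.

        /* Assumed sketch of one VLIW instruction word: one slot per spatial
         * floating-point operator (reading and writing its local memory) and
         * one slot for the time-shared interconnect. The whole model-evaluation
         * graph compiles to a fixed array of such words, replayed each SPICE
         * iteration. */
        #include <stdint.h>

        typedef struct {
            uint16_t src_a;     /* operand address in the operator's local memory */
            uint16_t src_b;
            uint16_t dst;       /* local address for the result                   */
            uint8_t  valid;     /* operator may idle this cycle                   */
        } op_slot;

        typedef struct {
            op_slot  add, mul, divide, exp;  /* one slot per spatial FP operator  */
            uint16_t net_src, net_dst;       /* one time-shared network transfer  */
        } vliw_word;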

    Floating-point elementary functions for reconfigurable accelerators

    Reconfigurable FPGA circuits now have enough capacity to be used for accelerating floating-point computation. The literature (and, more recently, the vendors) offers operators for the four basic operations. The next step is to provide operators for the most widely used elementary functions. Among these, we propose dedicated architectures for evaluating the exponential, logarithm, sine and cosine functions, and we study the possible trade-offs. For each of these functions, a single such operator outperforms general-purpose processors by a factor of ten in terms of throughput, while occupying only a fraction of the FPGA's hardware resources. All these operators are freely available at http://www.ens-lyon.fr/LIP/Arenaire/

    Automatic generation of polynomial-based hardware architectures for function evaluation

    Many applications require the evaluation of some function through polynomial approximation. This article details an architecture generator for this class of problems that improves upon the literature in two respects. Firstly, it benefits from recent advances related to constrained-coefficient polynomial approximation. Secondly, it refines the error analysis of polynomial evaluation to reduce the size of the multipliers used. As a result, architectures for evaluating arbitrary functions with precisions up to 64 bits, making efficient use of the resources of recent FPGAs, can be obtained in seconds. An open-source implementation is provided in the FloPoCo project.
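
    The sketch below shows the generic polynomial-evaluation scheme such generated architectures implement, a degree-d approximation evaluated with Horner's rule; in hardware, each step would use a truncated multiplier sized to the error budget. The coefficients shown are truncated Taylor coefficients of e^x used as a stand-in, not output of the constrained-coefficient approximation tool.

        /* Horner evaluation of a degree-d polynomial: one multiply-add per
         * coefficient. In the generated hardware each step runs at a tailored
         * fixed-point precision rather than in double precision. */
        double eval_poly(const double *c, int degree, double x) {
            double acc = c[degree];
            for (int i = degree - 1; i >= 0; i--)
                acc = acc * x + c[i];           /* one multiply-add per step */
            return acc;
        }

        /* Stand-in coefficients: truncated Taylor series of e^x on [0, 1). */
        static const double exp_coeffs[4] = { 1.0, 1.0, 0.5, 1.0 / 6.0 };
        /* Usage: double y = eval_poly(exp_coeffs, 3, 0.5); */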