    Generating high-performance custom floating-point pipelines

    (International audience) Custom operators, working at custom precisions, are a key ingredient for fully exploiting the flexibility advantage of FPGAs in high-performance computing. Unfortunately, such operators are costly to design, and application designers tend to rely on less efficient off-the-shelf operators. To address this issue, an open-source architecture generator framework is introduced. Its salient features are an easy learning curve from VHDL, the ability to embed arbitrary synthesisable VHDL code, portability to mainstream FPGA targets from Xilinx and Altera, automatic management of complex pipelines with support for frequency-directed pipelining, and automatic test-bench generation. The generator is presented through the simple example of a collision detector, which it significantly improves in accuracy, DSP count, logic usage, frequency, and latency with respect to an implementation using standard floating-point operators.

    Comparison of Pipelined Floating Point Unit with Unpipelined Floating Point Unit

    Floating-point numbers are widely used in many applications because of their dynamic-range representation capabilities. Floating-point representation retains its resolution and accuracy better than fixed-point representations. Many Digital Signal Processing (DSP) algorithms use floating-point arithmetic, which requires a huge number of computations to be performed every second. Under such stringent requirements, designing fast, accurate, and efficient circuits is the goal of every VLSI designer. This paper presents a comparison of a pipelined floating-point adder/subtractor compliant with the IEEE 754 format against an unpipelined adder, also compliant with the IEEE 754 format. It describes the IEEE 754 floating-point standard. A pipelined floating-point unit based on the IEEE 754 format is designed, the design is compared with that of an unpipelined floating-point unit, and speed, area, and power considerations are analyzed. Pipelining increases speed and is also energy efficient; all of these improvements come at the cost of a slight increase in chip area. The basic methodology and approach used for the VHDL (VHSIC Hardware Description Language) implementation of the floating-point unit are also described. A detailed synthesis report obtained with Xilinx ISE 11 and ModelSim is given.
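As a rough illustration of what "pipelined" means for an IEEE 754 adder, the sketch below splits binary32 addition into four stages (unpack, align, add/normalize, round) and models them cycle by cycle. It is not the paper's VHDL design: the four-stage split, the guard/round/sticky rounding scheme, and all function names are assumptions chosen for illustration, and it only handles positive, normal operands whose sum does not overflow.

```python
import struct

GRS = 3                       # guard, round and sticky bits kept below the significand
HALF = 1 << (GRS - 1)         # pattern 100: exactly half an ulp
LOW_MASK = (1 << GRS) - 1


def f32_bits(x: float) -> int:
    """Bit pattern of x rounded to IEEE 754 binary32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    return struct.unpack("<f", struct.pack("<I", b))[0]


def unpack_stage(op):
    """Stage 1: extract exponents, restore the hidden leading 1 (normal numbers only)."""
    a, b = op
    return ((a >> 23) & 0xFF, (a & 0x7FFFFF) | 0x800000,
            (b >> 23) & 0xFF, (b & 0x7FFFFF) | 0x800000)


def align_stage(op):
    """Stage 2: put the larger exponent first, shift the smaller significand right."""
    ea, ma, eb, mb = op
    if eb > ea:
        ea, ma, eb, mb = eb, mb, ea, ma
    d = ea - eb
    ma, mb = ma << GRS, mb << GRS
    lost = mb & ((1 << d) - 1)              # bits shifted out to the right
    mb = (mb >> d) | (1 if lost else 0)     # OR them into the sticky bit
    return ea, ma, mb


def add_stage(op):
    """Stage 3: add significands; on carry-out shift right once, keeping sticky."""
    e, ma, mb = op
    s = ma + mb
    if s >> (24 + GRS):
        s = (s >> 1) | (s & 1)
        e += 1
    return e, s


def round_stage(op):
    """Stage 4: round to nearest, ties to even, then pack exponent and fraction."""
    e, s = op
    low, m = s & LOW_MASK, s >> GRS
    if low > HALF or (low == HALF and m & 1):
        m += 1
        if m >> 24:                         # rounding carried into a new leading bit
            m >>= 1
            e += 1
    return (e << 23) | (m & 0x7FFFFF)


STAGES = [unpack_stage, align_stage, add_stage, round_stage]


def clock(regs, new_input):
    """One clock edge: every stage consumes its predecessor's register simultaneously."""
    for i in range(len(STAGES) - 1, 0, -1):   # back to front, so old values are read
        prev = regs[i - 1]
        regs[i] = STAGES[i](prev) if prev is not None else None
    regs[0] = STAGES[0](new_input) if new_input is not None else None
    return regs[-1]


def pipelined_add(pairs):
    """Issue one addition per cycle; return the float results and the cycle count."""
    feed = [(f32_bits(a), f32_bits(b)) for a, b in pairs]
    regs = [None] * len(STAGES)
    results, cycles = [], 0
    while len(results) < len(feed):
        out = clock(regs, feed[cycles] if cycles < len(feed) else None)
        cycles += 1
        if out is not None:
            results.append(bits_f32(out))
    return results, cycles


# The last pair is an exact tie (1 + half an ulp) and rounds to even.
results, cycles = pipelined_add([(1.5, 2.25), (3.0, 5.0), (1.0, 2.0**-24)])
print(results, cycles)  # [3.75, 8.0, 1.0] 6
```

Once the four registers are full, one sum leaves the pipeline per clock, so in this model n additions take n + 3 cycles; an unpipelined unit would instead complete one addition per (longer) clock period.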

    Reflections on 10 years of FloPoCo

    (International audience) The FloPoCo open-source arithmetic core generator project started modestly in 2008 [1], with a few parametric floating-point cores. It has since evolved into a framework for research on hardware arithmetic cores at large, including, among others, LNS arithmetic [2], random number generators [3], elementary functions [4]–[9], specialized operators such as constant multiplication and division [10]–[13], various FPGA-specific optimization techniques [14]–[16], and, more recently, signal-processing transforms and filters [17], [18] (more references can be found on the project's web site: http://flopoco.gforge.inria.fr/).

    FPGA acceleration of the phylogenetic likelihood function for Bayesian MCMC inference methods

    Background: Likelihood (ML)-based phylogenetic inference has become a popular method for estimating the evolutionary relationships among species based on genomic sequence data. This method is used in applications such as RAxML, GARLI, MrBayes, PAML, and PAUP. The Phylogenetic Likelihood Function (PLF) is an important kernel computation for this method. The PLF consists of a loop with no conditional behavior or dependencies between iterations, so it offers high potential for exploiting parallelism using micro-architectural techniques. In this paper, we describe a technique for mapping the PLF and supporting logic onto a Field Programmable Gate Array (FPGA)-based co-processor. By leveraging the FPGA's on-chip DSP modules and the high-bandwidth local memory attached to the FPGA, the resulting co-processor can accelerate ML-based methods and outperform state-of-the-art multi-core processors. Results: We use the MrBayes 3 tool as a framework for designing our co-processor. For large datasets, we estimate that our accelerated MrBayes, if run on a current-generation FPGA, achieves a 10× speedup relative to software running on a state-of-the-art server-class microprocessor. The FPGA-based implementation achieves its performance by deeply pipelining the likelihood computations, performing multiple floating-point operations in parallel, and using a natural log approximation chosen specifically to leverage a deeply pipelined custom architecture. Conclusions: Heterogeneous computing, which combines general-purpose processors with special-purpose co-processors such as FPGAs and GPUs, is a promising approach for high-performance phylogeny inference, as shown by the growing body of literature in this field. FPGAs in particular are well suited to this task because of their low power consumption compared to many-core processors and Graphics Processing Units (GPUs).
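The claim that the PLF loop has no dependencies between iterations can be made concrete with the textbook Felsenstein pruning step for one internal node. The Python sketch below assumes a Jukes-Cantor (JC69) substitution model, four nucleotide states, and a single rate category; MrBayes supports richer models, `math.log` merely stands in for the paper's custom natural-log approximation (which is not reproduced here), and all function names are hypothetical.

```python
import math

STATES = 4  # nucleotide states A, C, G, T


def jc69(t):
    """Jukes-Cantor transition-probability matrix for branch length t."""
    same = 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)
    diff = 0.25 - 0.25 * math.exp(-4.0 * t / 3.0)
    return [[same if i == j else diff for j in range(STATES)] for i in range(STATES)]


def plf_node(p_left, p_right, cl_left, cl_right):
    """Conditional likelihoods of a parent node from its two children.

    cl_left / cl_right hold one length-4 vector per alignment site. Each loop
    iteration reads only its own site, so sites are independent and can be
    streamed through a deep pipeline.
    """
    parent = []
    for l_site, r_site in zip(cl_left, cl_right):
        parent.append([
            sum(p_left[s][j] * l_site[j] for j in range(STATES))
            * sum(p_right[s][j] * r_site[j] for j in range(STATES))
            for s in range(STATES)
        ])
    return parent


def log_likelihood(root_cl, freqs=(0.25, 0.25, 0.25, 0.25)):
    """Sum over sites of the log of the frequency-weighted root likelihood."""
    return sum(math.log(sum(f * c for f, c in zip(freqs, site))) for site in root_cl)


# Two leaves given as tip vectors: site 0 is A/A, site 1 is A/C.
tip = {"A": [1.0, 0.0, 0.0, 0.0], "C": [0.0, 1.0, 0.0, 0.0]}
left = [tip["A"], tip["A"]]
right = [tip["A"], tip["C"]]
root = plf_node(jc69(0.1), jc69(0.1), left, right)
print(log_likelihood(root))
```

Because the per-site work is a fixed sequence of multiply-adds followed by one logarithm, it maps naturally onto parallel floating-point units, which is consistent with the deep pipelining the abstract describes.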

    Automatic pipelining of operators in FloPoCo 5.0 (Pipeline automatique d’opérateurs dans FloPoCo 5.0)

    (National audience) This article presents the third evolution of the automatic pipelining feature built into the FloPoCo arithmetic core generator. The combinatorial VHDL description of an arithmetic operator is first wrapped in C++. Then, adding simple primitives to this C++ makes it possible to automatically obtain pipelined versions of this VHDL, at an arbitrary frequency and for the whole range of targets supported by the tool. The algorithms used are outlined, and the quality of the results is evaluated.
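The general idea of frequency-directed pipelining can be sketched as an as-soon-as-possible schedule over a graph of operators. In the Python sketch below, each signal carries the cycle in which it becomes available and the combinational delay accumulated since the last register; an operator whose delay would overflow the clock period is pushed to the next cycle. This is not FloPoCo's actual algorithm or C++ API: the operator names and delays are hypothetical, and each individual operator is assumed to fit within one clock period.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    cycle: int        # pipeline stage in which the signal becomes available
    delay_ns: float   # combinational delay accumulated since that stage's register


def schedule(nodes, freq_mhz):
    """ASAP schedule of a DAG of operators at a target frequency.

    nodes: (name, delay_ns, input_names) tuples in topological order.
    """
    period_ns = 1000.0 / freq_mhz
    timing = {}
    for name, delay, inputs in nodes:
        # The latest-arriving input (later cycle first, then longer delay) decides the start.
        cycle, arrival = max(((timing[i].cycle, timing[i].delay_ns) for i in inputs),
                             default=(0, 0.0))
        if arrival + delay > period_ns:
            # Does not fit in the current cycle: insert a register and start afresh.
            cycle, arrival = cycle + 1, 0.0
        timing[name] = Timing(cycle, arrival + delay)
    return timing


# Hypothetical floating-point adder datapath with made-up delays (ns).
ADDER = [
    ("exp_diff", 0.8, []),
    ("swap", 1.2, []),
    ("align_shift", 2.0, ["exp_diff", "swap"]),
    ("mant_add", 1.5, ["align_shift"]),
    ("lzc", 1.8, ["mant_add"]),
    ("norm_shift", 2.0, ["lzc", "mant_add"]),
    ("round", 1.5, ["norm_shift"]),
]

for f in (200, 400):
    print(f, "MHz ->", schedule(ADDER, f)["round"].cycle, "cycles of latency")
```

With these made-up delays, the sketch yields 2 cycles of latency at 200 MHz and 5 at 400 MHz: raising the target frequency trades latency for throughput without touching the combinational description, which is the property the article attributes to its automatic pipelining.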

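    Floating-point operators and calculation engines and their application to real-time simulation on FPGA (Opérateurs et engins de calcul en virgule flottante et leur application à la simulation en temps réel sur FPGA)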

    The real-time simulation of electrical networks has attracted strong industrial interest in recent years, motivated by the substantial reduction in development cost that such a prototyping approach can offer. Real-time simulation allows real hardware to be progressively included in the simulation loop during its development, so that it can be tested under realistic conditions. However, CPU-based simulations suffer from certain limitations, such as the difficulty of reaching time-steps of a few microseconds, an important requirement for faithfully simulating the fast transients of modern power converters. Hence, industrial practitioners adopted the FPGA as the platform of choice for implementing calculation engines dedicated to the rapid real-time simulation of electrical networks. The reconfigurable technology broke the 5 kHz switching-frequency barrier that is characteristic of CPU-based simulations. Moreover, FPGA-based real-time simulation offers many advantages, including a reduced latency of the simulation loop, obtained thanks to direct access to the sensors and actuators of the device being prototyped. The fixed-point format is the usual paradigm for FPGA-based digital signal processing. However, it lengthens the development process, since the designer has to assess the required precision of every model variable. This has prompted significant research on the use of the floating-point format for simulating electrical networks. One of the main challenges of the floating-point format is the long latency of the elementary arithmetic operators, which is particularly penalizing when an adder is used as an accumulator, an important building block for integration rules such as the trapezoidal and backward Euler methods, where single-cycle accumulation is crucial. Hence, single-cycle floating-point accumulation forms the core of this research work.
    Our results help build operators such as accumulators, multiply-accumulators (MACs), and dot-product (DP) operators. These operators play a key role in the implementation of the proposed calculation engines. This thesis therefore contributes to FPGA-based real-time simulation in several ways. It proposes a new summation algorithm that generalizes the so-called self-alignment technique; the new formulation is broader and simpler, both in its expression and in its hardware implementation. The work also establishes criteria that guarantee good accuracy, on both theoretical and empirical grounds. Moreover, the thesis offers a comprehensive analysis of the redundant high-radix carry-save (HRCS) format, which is used to perform rapid additions of large mantissas. Two new HRCS operators are proposed: an endomorphic adder and an HRCS-to-conventional converter. Once single-cycle accumulation is achieved by combining the self-alignment technique with the HRCS format, the research focuses on the FPGA implementation of SIMD (single instruction, multiple data) calculation engines using parallel floating-point MACs or DP operators. The proposed operators have very low latencies, allowing the engines to reach time-steps of a few hundred nanoseconds when simulating power converters of moderate complexity. The document finally discusses power electronic circuit modelling and concludes with a versatile calculation engine capable of simulating power converters with arbitrary topologies and up to 24 switches, while achieving time-steps below 1 μs and allowing switching frequencies in the range of tens of kilohertz.
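The thesis relies on a high-radix carry-save (HRCS) format to add large mantissas in a single cycle. As a rough illustration of the general carry-save idea (not the thesis's endomorphic adder or converter, whose details are not given here), the Python sketch below splits an accumulator into radix-2^W limbs that each keep their own carry-out instead of propagating it. Within one addition, every limb works independently, and a full carry propagation is needed only when converting back to a conventional binary value. The limb width `W`, the class name, and the restriction to non-negative integers are assumptions made for illustration.

```python
W = 8                  # hypothetical limb width: radix 2**W
MASK = (1 << W) - 1


class HRCSAccumulator:
    """Non-negative integer accumulator kept in a high-radix carry-save form.

    Represented value: sum(sums[i] << W*i) + sum(carries[i] << W*(i+1)),
    taken modulo 2**(W*limbs).
    """

    def __init__(self, limbs):
        self.limbs = limbs
        self.sums = [0] * limbs
        self.carries = [0] * limbs   # one saved carry bit per limb

    def add(self, x):
        """Add x in one 'cycle': limb i sees only its own input chunk and the
        carry saved by limb i-1 on the previous add, so no carry ripples
        across limbs."""
        new_sums, new_carries = [], []
        for i in range(self.limbs):
            chunk = (x >> (W * i)) & MASK
            carry_in = self.carries[i - 1] if i > 0 else 0
            t = self.sums[i] + chunk + carry_in      # at most 2*MASK + 1
            new_sums.append(t & MASK)
            new_carries.append(t >> W)               # always 0 or 1
        self.sums, self.carries = new_sums, new_carries

    def to_int(self):
        """HRCS-to-conventional conversion: the only full carry propagation."""
        value = sum(s << (W * i) for i, s in enumerate(self.sums))
        value += sum(c << (W * (i + 1)) for i, c in enumerate(self.carries))
        return value & ((1 << (W * self.limbs)) - 1)


acc = HRCSAccumulator(limbs=8)          # 64-bit accumulator
for v in (0xFF, 0xFF, 0x1_0000_0001, 0xFFFF_FFFF):
    acc.add(v)
print(hex(acc.to_int()))  # 0x2000001fe
```

The design choice illustrated is the usual carry-save trade-off: the per-addition critical path is bounded by one limb's width rather than the full accumulator width, at the cost of one extra bit per limb and a separate conversion step.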