260 research outputs found

    A systolic architecture for the correlation and accumulation of digital sequences

    Get PDF
    A fully systolic architecture for the implementation of digital sequence correlator/accumulators is described. These devices consist of a two-dimensional array of processing elements that are conceived for efficient fabrication in Very Large Scale Integrated (VLSI) circuits. A custom VLSI chip that was implemented using these concepts is described. The chip, which contains a four-lag three-level sequence correlator and four bits of accumulation with overflow detection, was designed using the Integrated UNIX-Based Computer Aided Design (CAD) System. Applications of such devices include the synchronization of coded telemetry data, alignment of both real time and non-real time Very Large Baseline Interferometry (VLBI) signals, and the implementation of digital filters and processes of many types

    A Scalable Correlator Architecture Based on Modular FPGA Hardware, Reuseable Gateware, and Data Packetization

    Full text link
    A new generation of radio telescopes is achieving unprecedented levels of sensitivity and resolution, as well as increased agility and field-of-view, by employing high-performance digital signal processing hardware to phase and correlate large numbers of antennas. The computational demands of these imaging systems scale in proportion to BMN^2, where B is the signal bandwidth, M is the number of independent beams, and N is the number of antennas. The specifications of many new arrays lead to demands in excess of tens of PetaOps per second. To meet this challenge, we have developed a general purpose correlator architecture using standard 10-Gbit Ethernet switches to pass data between flexible hardware modules containing Field Programmable Gate Array (FPGA) chips. These chips are programmed using open-source signal processing libraries we have developed to be flexible, scalable, and chip-independent. This work reduces the time and cost of implementing a wide range of signal processing systems, with correlators foremost among them,and facilitates upgrading to new generations of processing technology. We present several correlator deployments, including a 16-antenna, 200-MHz bandwidth, 4-bit, full Stokes parameter application deployed on the Precision Array for Probing the Epoch of Reionization.Comment: Accepted to Publications of the Astronomy Society of the Pacific. 31 pages. v2: corrected typo, v3: corrected Fig. 1

    A model‐based design floating‐point accumulator. Case of study: FPGA implementation of a support vector machine kernel function

    Get PDF
    Recent research in wearable sensors have led to the development of an advanced platform capable of embedding complex algorithms such as machine learning algorithms, which are known to usually be resource‐demanding. To address the need for high computational power, one solution is to design custom hardware platforms dedicated to the specific application by exploiting, for example, Field Programmable Gate Array (FPGA). Recently, model‐based techniques and automatic code generation have been introduced in FPGA design. In this paper, a new model‐based floating‐point accumulation circuit is presented. The architecture is based on the state‐of‐the‐art delayed buffering algorithm. This circuit was conceived to be exploited in order to compute the kernel function of a support vector machine. The implementation of the proposed model was carried out in Simulink, and simulation results showed that it had better performance in terms of speed and occupied area when compared to other solutions. To better evaluate its figure, a practical case of a polynomial kernel function was considered. Simulink and VHDL post‐implementation timing simulations and measurements on FPGA confirmed the good results of the stand‐alone accumulator

    Efficient Neural Network Implementations on Parallel Embedded Platforms Applied to Real-Time Torque-Vectoring Optimization Using Predictions for Multi-Motor Electric Vehicles

    Get PDF
    The combination of machine learning and heterogeneous embedded platforms enables new potential for developing sophisticated control concepts which are applicable to the field of vehicle dynamics and ADAS. This interdisciplinary work provides enabler solutions -ultimately implementing fast predictions using neural networks (NNs) on field programmable gate arrays (FPGAs) and graphical processing units (GPUs)- while applying them to a challenging application: Torque Vectoring on a multi-electric-motor vehicle for enhanced vehicle dynamics. The foundation motivating this work is provided by discussing multiple domains of the technological context as well as the constraints related to the automotive field, which contrast with the attractiveness of exploiting the capabilities of new embedded platforms to apply advanced control algorithms for complex control problems. In this particular case we target enhanced vehicle dynamics on a multi-motor electric vehicle benefiting from the greater degrees of freedom and controllability offered by such powertrains. Considering the constraints of the application and the implications of the selected multivariable optimization challenge, we propose a NN to provide batch predictions for real-time optimization. This leads to the major contribution of this work: efficient NN implementations on two intrinsically parallel embedded platforms, a GPU and a FPGA, following an analysis of theoretical and practical implications of their different operating paradigms, in order to efficiently harness their computing potential while gaining insight into their peculiarities. The achieved results exceed the expectations and additionally provide a representative illustration of the strengths and weaknesses of each kind of platform. Consequently, having shown the applicability of the proposed solutions, this work contributes valuable enablers also for further developments following similar fundamental principles.Some of the results presented in this work are related to activities within the 3Ccar project, which has received funding from ECSEL Joint Undertaking under grant agreement No. 662192. This Joint Undertaking received support from the European Union’s Horizon 2020 research and innovation programme and Germany, Austria, Czech Republic, Romania, Belgium, United Kingdom, France, Netherlands, Latvia, Finland, Spain, Italy, Lithuania. This work was also partly supported by the project ENABLES3, which received funding from ECSEL Joint Undertaking under grant agreement No. 692455-2

    Optimisations arithmétiques et synthÚse de haut niveau

    Get PDF
    High-level synthesis (HLS) tools offer increased productivity regarding FPGA programming.However, due to their relatively young nature, they still lack many arithmetic optimizations.This thesis proposes safe arithmetic optimizations that should always be applied.These optimizations are simple operator specializations, following the C semantic.Other require to a lift the semantic embedded in high-level input program languages, which are inherited from software programming, for an improved accuracy/cost/performance ratio.To demonstrate this claim, the sum-of-product of floating-point numbers is used as a case study. The sum is performed on a fixed-point format, which is tailored to the application, according to the context in which the operator is instantiated.In some cases, there is not enough information about the input data to tailor the fixed-point accumulator.The fall-back strategy used in this thesis is to generate an accumulator covering the entire floating-point range.This thesis explores different strategies for implementing such a large accumulator, including new ones.The use of a 2's complement representation instead of a sign+magnitude is demonstrated to save resources and to reduce the accumulation loop delay.Based on a tapered precision scheme and an exact accumulator, the posit number systems claims to be a candidate to replace the IEEE floating-point format.A throughout analysis of posit operators is performed, using the same level of hardware optimization as state-of-the-art floating-point operators.Their cost remains much higher that their floating-point counterparts in terms of resource usage and performance. Finally, this thesis presents a compatibility layer for HLS tools that allows one code to be deployed on multiple tools.This library implements a strongly typed custom size integer type along side a set of optimized custom operators.À cause de la nature relativement jeune des outils de synthĂšse de haut-niveau (HLS), de nombreuses optimisations arithmĂ©tiques n'y sont pas encore implĂ©mentĂ©es. Cette thĂšse propose des optimisations arithmĂ©tiques se servant du contexte spĂ©cifique dans lequel les opĂ©rateurs sont instanciĂ©s.Certaines optimisations sont de simples spĂ©cialisations d'opĂ©rateurs, respectant la sĂ©mantique du C.D'autres nĂ©cĂ©ssitent de s'Ă©loigner de cette sĂ©mantique pour amĂ©liorer le compromis prĂ©cision/coĂ»t/performance.Cette proposition est dĂ©montrĂ© sur des sommes de produits de nombres flottants.La somme est rĂ©alisĂ©e dans un format en virgule-fixe dĂ©fini par son contexte.Quand trop peu d’informations sont disponibles pour dĂ©finir ce format en virgule-fixe, une stratĂ©gie est de gĂ©nĂ©rer un accumulateur couvrant l'intĂ©gralitĂ© du format flottant.Cette thĂšse explore plusieurs implĂ©mentations d'un tel accumulateur.L'utilisation d'une reprĂ©sentation en complĂ©ment Ă  deux permet de rĂ©duire le chemin critique de la boucle d'accumulation, ainsi que la quantitĂ© de ressources utilisĂ©es. Un format alternatif aux nombres flottants, appelĂ© posit, propose d'utiliser un encodage Ă  prĂ©cision variable.De plus, ce format est augmentĂ© par un accumulateur exact.Pour Ă©valuer prĂ©cisĂ©ment le coĂ»t matĂ©riel de ce format, cette thĂšse prĂ©sente des architectures d'opĂ©rateurs posits, implĂ©mentĂ©s avec le mĂȘme degrĂ© d'optimisation que celui de l'Ă©tat de l'art des opĂ©rateurs flottants.Une analyse dĂ©taillĂ©e montre que le coĂ»t des opĂ©rateurs posits est malgrĂ© tout bien plus Ă©levĂ© que celui de leurs Ă©quivalents flottants.Enfin, cette thĂšse prĂ©sente une couche de compatibilitĂ© entre outils de HLS, permettant de viser plusieurs outils avec un seul code. Cette bibliothĂšque implĂ©mente un type d'entiers de taille variable, avec de plus une sĂ©mantique strictement typĂ©e, ainsi qu'un ensemble d'opĂ©rateurs ad-hoc optimisĂ©s

    High-Speed and Unified ECC Processor for Generic Weierstrass Curves over GF(p) on FPGA

    Get PDF
    In this paper, we present a high-speed, unified elliptic curve cryptography (ECC) processor for arbitrary Weierstrass curves over GF(p), which to the best of our knowledge, outperforms other similar works in terms of execution time. Our approach employs the combination of the schoolbook long and Karatsuba multiplication algorithm for the elliptic curve point multiplication (ECPM) to achieve better parallelization while retaining low complexity. In the hardware implementation, the substantial gain in speed is also contributed by our n-bit pipelined Montgomery Modular Multiplier (pMMM), which is constructed from our n-bit pipelined multiplier-accumulators that utilizes digital signal processor (DSP) primitives as digit multipliers. Additionally, we also introduce our unified, pipelined modular adder-subtractor (pMAS) for the underlying field arithmetic, and leverage a more efficient yet compact scheduling of the Montgomery ladder algorithm. The implementation for 256-bit modulus size on the 7-series FPGA: Virtex-7, Kintex-7, and XC7Z020 yields 0.139, 0.138, and 0.206 ms of execution time, respectively. Furthermore, since our pMMM module is generic for any curve in Weierstrass form, we support multi-curve parameters, resulting in a unified ECC architecture. Lastly, our method also works in constant time, making it suitable for applications requiring high speed and SCA-resistant characteristics

    Readout system test benches

    Get PDF
    We propose to develop and exploit versatile multi-purpose Personal Computer-based Test Benches to support the evaluation and design of the basic elements required for digital front-end readout and data transmission systems for an LHC experiment. These test benches will have modular hardware facilities for the operation of new readout system components under realistic conditions, and will implement advanced modern software engineering concepts. They will support components such as fast ADCs, hybrid fibre-optic transceivers, and the prototype VLSI systolic array and data-flow processors currently being developed in national research laboratories and by the emerging European HDTV industry. These efforts would also lay the foundations for projects involving the development of custom-designed VLSI circuits
    • 

    corecore