351 research outputs found

    High Performance Computing via High Level Synthesis

    As more and more powerful integrated circuits appear on the market, more and more applications, with very different requirements and workloads, are making use of the available computing power. This thesis is devoted in particular to High Performance Computing applications, where those trends are carried to the extreme. In this domain, the primary aspects to be taken into consideration are (1) performance (by definition) and (2) energy consumption (since operational costs dominate over procurement costs). These requirements can be satisfied more easily by deploying heterogeneous platforms, which include CPUs, GPUs and FPGAs to provide a broad range of performance and energy-per-operation choices. In particular, as we will see, FPGAs clearly dominate both CPUs and GPUs in terms of energy, and can provide comparable performance. An important aspect of this trend is of course design technology, because these applications were traditionally programmed in high-level languages, while FPGAs required low-level RTL design. OpenCL (Open Computing Language), developed by the Khronos Group, enables developers to program CPUs, GPUs and, more recently, FPGAs using functionally portable (but sadly not performance-portable) source code, which creates new possibilities and challenges for both research and industry. FPGAs have always been used for mid-size designs and ASIC prototyping thanks to their energy-efficient and flexible hardware architecture, but their usage requires hardware design knowledge and laborious design cycles. Several approaches have been developed and deployed to address this issue and narrow the gap between software and hardware in the FPGA design flow, in order to enable FPGAs to capture a larger portion of the hardware acceleration market in data centers.
Moreover, FPGA usage in data centers is already growing, regardless of and in addition to their use as computational accelerators, because they can be used as high-performance, low-power and secure switches inside data centers. High-Level Synthesis (HLS) is the methodology that enables designers to map their applications onto FPGAs (and ASICs). It synthesizes parallel hardware from a model written in C-based programming languages, e.g. C/C++, SystemC and OpenCL. Design space exploration of the variety of implementations that can be obtained from this C model is possible through a wide range of optimization techniques and directives, e.g. to pipeline loops and partition memories into multiple banks, which guide RTL generation toward application-dependent hardware and let designers benefit from the flexible parallel architecture of FPGAs. Model-Based Design (MBD) is a high-level, visual process used to generate implementations that solve mathematical problems through a varied set of IP blocks. MBD enables developers with different expertise, e.g. control theory, embedded software development, and hardware design, to share a common design framework and contribute to a shared design using the same tool. Simulink, developed by MathWorks, is a model-based design tool for simulation and development of complex dynamical systems. Moreover, Simulink embedded code generators can produce verified C/C++ and HDL code from the graphical model. This code can be used to program microcontrollers and FPGAs. This PhD thesis presents a study that uses the automatic code generators of Simulink to target Xilinx FPGAs with both HDL and C/C++ code, to demonstrate the capabilities and challenges of the high-level synthesis process. To do so, first, the digital signal processing unit of a real-time radar application is developed using Simulink blocks.
Second, the generated C-based model is used for the high-level synthesis process, and finally the implementation cost of HLS is compared to traditional HDL synthesis using the Xilinx tool chain. As an alternative to the model-based design approach, this work also presents an analysis of FPGA programming via high-level synthesis techniques for computationally intensive algorithms, and demonstrates the importance of HLS by comparing the performance-per-watt of GPUs (NVIDIA) and FPGAs (Xilinx) manufactured in the same node running standard OpenCL benchmarks. We conclude that generating high-quality RTL from an OpenCL model requires a stronger hardware background than the MBD approach; however, the availability of fast and broad design space exploration and the portability of OpenCL code, e.g. to CPUs and GPUs, motivate FPGA industry leaders to provide users with an OpenCL software development environment that promises FPGA programming in a CPU/GPU-like fashion. Our experiments, through extensive design space exploration (DSE), suggest that FPGAs have higher performance-per-watt than two high-end GPUs manufactured in the same technology (28 nm). Moreover, FPGAs with more available resources and a more modern process (20 nm) can outperform the tested GPUs while consuming much less power, at the cost of more expensive devices.

    FPGAs in Industrial Control Applications

    The aim of this paper is to review the state of the art of Field Programmable Gate Array (FPGA) technologies and their contribution to industrial control applications. The authors start by addressing various research fields that can exploit the advantages of FPGAs. The features of these devices are then presented, followed by their corresponding design tools. To illustrate the benefits of using FPGAs for complex control applications, a sensorless motor controller based on the Extended Kalman Filter is presented. It was developed according to a dedicated design methodology, which is also discussed. The use of FPGAs to implement artificial intelligence-based industrial controllers is then briefly reviewed. The final section presents two short case studies of neural network control system designs targeting FPGAs.

    Field Programmable Gate Arrays and Reconfigurable Computing in Automatic Control

    New combustion engine principles increase the demands on feedback combustion control; at the same time, economic considerations currently enforce the use of low-end control hardware, limiting implementation possibilities. Significant development is simultaneously and continuously carried out within the field of Field Programmable Gate Arrays (FPGAs). In recent years FPGAs have developed from being a device mainly used to implement grids of 'glue logic' into something of a flexible 'dream device' for cost- and performance-sensitive applications. It is not solely the development of FPGA devices that has made the FPGA the promising implementation platform it is; the development of software tool sets and design methodologies is as important as the device itself. This thesis describes the nature of FPGAs, how they work, which programming environments are available and which design methodologies can be used at different levels. The focus is on implementing control and feedback control on FPGAs in general terms. Many practical considerations differ between the FPGA environment and the well-known microcontroller environment, and those are discussed in view of the literature available in the different areas. The potential applications of FPGAs are described and illustrated with examples found in the literature; both general applications and control applications are discussed. The intended application is control of internal combustion engines, and one FPGA implementation of a modeling algorithm commonly used within automotive control is described and discussed. The intention is to illustrate the usefulness of FPGAs in automotive control applications. Finally, a suggestion of a suitable FPGA-based automotive-control development environment is treated.

    Image Processing Using FPGAs

    This book presents a selection of papers representing current research on using field programmable gate arrays (FPGAs) for realising image processing algorithms. These papers are reprints of papers selected for a Special Issue of the Journal of Imaging on image processing using FPGAs. A diverse range of topics is covered, including parallel soft processors, memory management, image filters, segmentation, clustering, image analysis, and image compression. Applications include traffic sign recognition for autonomous driving, cell detection for histopathology, and video compression. Collectively, they represent the current state-of-the-art on image processing using FPGAs

    High-Level Design for Ultra-Fast Software Defined Radio Prototyping on Multi-Processors Heterogeneous Platforms

    The design of Software Defined Radio (SDR) equipment (terminals, base stations, etc.) is still very challenging. We propose here a design methodology for ultra-fast prototyping on heterogeneous platforms made of GPPs (General Purpose Processors), DSPs (Digital Signal Processors) and FPGAs (Field Programmable Gate Arrays). Relying on a component-based approach, the methodology mainly aims at automating as much of the design as possible, from algorithmic validation to a multi-processing heterogeneous implementation. The proposed methodology is based on the SynDEx CAD design approach, which was originally dedicated to multi-GPP networks. We show how it was adapted to an embedded DSP context. The integration of FPGAs is then addressed and incorporated into the design approach with very few restrictions. Apart from manual HW/SW partitioning, all other operations can remain automatic in a heterogeneous processing context. The targeted granularity of the components to be assembled in the design flow is roughly that of an FFT, a filter or a Viterbi decoder, for instance. The re-use of third-party or pre-developed IPs is a basis for this design approach. Thanks to the proposed design methodology, it is possible to port a radio application "ultra" fast across several platforms. In addition, the proposed design methodology is not restricted to SDR equipment design, and can be useful for any real-time embedded heterogeneous design in a prototyping context.

    Rapid prototyping from algorithm to FPGA prototype

    Abstract. Wireless data usage continuously increases in today’s world, setting higher requirements for wireless networks. Ever-increasing requirements result in more complex hardware (HW) implementations; in particular, the performance of telecommunication Systems-on-Chip (SoCs) plays a key role in this development. Complexity increases the design workload and therefore lengthens design flow times. High-Level Synthesis (HLS) tools have been designed to automate and accelerate design by moving manual work to a higher level. This Master’s Thesis studies the use of the MathWorks HLS workflow for rapid prototyping of Wireless Communication SoC Intellectual Property (IP). This thesis introduces the design and FPGA prototyping flow of an Application-Specific Integrated Circuit (ASIC). It presents good design practices targeted at HLS. It also studies the MathWorks Hardware Description Language (HDL) generation flow with HDL Coder, possible problems during the flow, and solutions to overcome them. The HLS flow is examined with an example design that scales and limits the power of IQ data. This work verifies the design in a Field-Programmable Gate Array (FPGA) environment. It concentrates on evaluating the usage and benefits of the MathWorks HLS workflow for rapid prototyping of SoCs. The example IP is a Simulink model containing MATLAB algorithms and System Objects. The design is optimized at the algorithm level and synthesized into VHDL. The generated Register-Transfer Level (RTL) model is verified in co-simulation against the algorithm model. Optimization and verification methods are evaluated. The HDL model is further processed through logic synthesis using a third-party synthesis tool run automatically by a script created by the MathWorks workflow. The generated design is tested on an FPGA in an FPGA-in-the-loop simulation configuration. The benefits of the FPGA prototyping flow for rapid prototyping are evaluated.
Coding styles to generate synthesizable HDL code and simulation methods to improve the simulation speed of the hardware-like algorithm were discussed. The MathWorks HLS workflow was evaluated for rapid prototyping purposes, from algorithm to FPGA. Optimization methods and the capability to produce production-quality RTL for an ASIC target were also discussed. The MathWorks tool flow provided promising results for rapid prototyping. It generated human-readable HDL that was successfully synthesized for the FPGA. The FPGA model was simulated successfully in the FPGA-in-the-loop configuration. It also provided good area and speed results for the ASIC target when the algorithm was written strictly from the hardware perspective. The process was found to be clear and efficient. Rapid prototyping from algorithm to FPGA prototype. Abstract. Wireless data usage is continuously growing in today’s world and sets higher requirements for wireless networks. Growing requirements make hardware implementation more complex; in particular, the performance of the systems-on-chip (SoCs) used in telecommunications plays a key role. This increases the design workload, and consequently the time spent on the design flow grows. High-level synthesis (HLS) was developed to automate and accelerate digital design by moving manual work to a higher level. This Master’s thesis studies the use of the MathWorks HLS flow for rapid prototyping of the proprietary standardized blocks (IP) of SoCs designed for wireless communication. The work introduces the traditional application-specific integrated circuit (ASIC) design flow, the FPGA prototyping flow, and design principles for HLS. It goes through the MathWorks hardware description language (HDL) generation flow with HDL Coder, possible problem points in the flow, and solutions to them. The HLS flow is studied with an example model that scales and limits the power of IQ data. The operation of the example model is verified with a field-programmable gate array (FPGA).
The work focuses on evaluating the use and benefits of the MathWorks HLS flow for rapid prototyping in SoC development. The example is a Simulink model containing MATLAB functions and System Objects. The model, optimized at the algorithm level, is synthesized into VHDL, and the operation of the register-transfer level (RTL) model is verified by co-simulation against the original algorithm model. The functionality and efficiency of the optimization and verification methods are evaluated. The generated HDL model is synthesized with a third-party logic synthesis tool, which is launched by a script generated by the MathWorks tool flow. The resulting model is programmed onto an FPGA and its operation is verified with FPGA simulation. Coding styles required for generating synthesizable HDL code and methods for improving the simulation speed of the algorithm model were studied. The suitability of the MathWorks HLS flow for rapid prototyping from algorithm to FPGA prototype was considered. In addition, optimization methods and the flow's suitability for generating production-quality RTL were evaluated. The MathWorks tool flow showed promising results from the rapid prototyping point of view. It produced readable HDL code that synthesized for the FPGA. The model was run successfully on the FPGA. The flow achieved good area and speed results when the model was optimized for an ASIC. This required describing the model strictly from the hardware perspective. Overall, the process was clear and efficient.