
    Application-Specific Number Representation

    No full text
    Reconfigurable devices, such as Field Programmable Gate Arrays (FPGAs), enable application-specific number representations. Well-known number formats include fixed-point, floating-point, the logarithmic number system (LNS), and the residue number system (RNS). These different number representations lead to different arithmetic designs and error behaviours, thus producing implementations with different performance, accuracy, and cost. To investigate the design options in number representations, the first part of this thesis presents a platform that enables automated exploration of the number representation design space. The second part of the thesis presents case studies that optimise designs for area, latency or throughput from the perspective of number representations. Automated design space exploration in the first part addresses the following two major issues:
    • Automation requires arithmetic unit generation. This thesis provides optimised arithmetic library generators for logarithmic and residue arithmetic units, which support a wide range of bit widths and achieve significant improvements over previous designs.
    • Generation of arithmetic units requires specifying the bit widths for each variable. This thesis describes an automatic bit-width optimisation tool called R-Tool, which combines dynamic and static analysis methods, and supports different number systems (fixed-point, floating-point, and LNS numbers).
    Putting it all together, the second part explores the effects of application-specific number representation on practical benchmarks, such as radiative Monte Carlo simulation and seismic imaging computations. Experimental results show that customising the number representations brings benefits to hardware implementations: by selecting a more appropriate number format, we can reduce the area cost by up to 73.5% and improve the throughput by 14.2% to 34.1%; by performing bit-width optimisation, we can further reduce the area cost by 9.7% to 17.3%. On the performance side, hardware implementations with customised number formats achieve 5 to potentially over 40 times speedup over software implementations.
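    As a rough, self-contained illustration of the LNS trade-off discussed above (this is not the thesis's R-Tool or its library generators; the function names and the fractional bit width are assumed for the example), the sketch below shows how a logarithmic representation turns multiplication into addition and how the chosen bit width bounds the representation error:

```python
import math

FRAC_BITS = 8  # assumed fractional bit width of the LNS exponent

def to_lns(x, frac_bits=FRAC_BITS):
    """Encode a positive value as a fixed-point base-2 logarithm
    (real LNS hardware also carries a sign bit and a zero flag)."""
    assert x > 0
    return round(math.log2(x) * (1 << frac_bits))

def from_lns(e, frac_bits=FRAC_BITS):
    """Decode a fixed-point log back into a real value."""
    return 2.0 ** (e / (1 << frac_bits))

def lns_mul(a, b):
    """LNS multiplication is just integer addition of the encodings."""
    return a + b

x, y = 3.7, 12.25
approx = from_lns(lns_mul(to_lns(x), to_lns(y)))
print(f"exact = {x * y:.4f}, LNS approx = {approx:.4f}, "
      f"relative error = {abs(approx - x * y) / (x * y):.2e}")
```

    Widening FRAC_BITS shrinks the error but enlarges the adders and lookup tables in a hardware design, which is exactly the area/accuracy trade-off that automatic bit-width optimisation has to navigate.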

    Optimising Sparse Matrix Vector multiplication for large scale FEM problems on FPGA

    Get PDF
    Sparse Matrix Vector multiplication (SpMV) is an important kernel in many scientific applications. In this work we propose an architecture and an automated customisation method to detect and optimise the architecture for block diagonal sparse matrices. We evaluate the proposed approach in the context of the spectral/hp Finite Element Method, using the local matrix assembly approach. This problem leads to a large sparse system of linear equations with a block diagonal matrix, which is typically solved using an iterative method such as the Preconditioned Conjugate Gradient. The efficiency of the proposed architecture, combined with the effectiveness of the proposed customisation method, reduces BRAM resource utilisation by as much as 10 times, while achieving throughput identical to existing state-of-the-art designs and requiring minimal development effort from the end user. In the context of the Finite Element Method, our approach enables the solution of larger problems than previously possible, extending the applicability of FPGAs to more interesting HPC problems.
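    As a software analogue of the storage saving described above (not the paper's FPGA architecture; the block size, block count, and variable names are assumed), the sketch below contrasts a generic CSR SpMV with a block-diagonal multiply that stores only the dense diagonal blocks and keeps the indexing implicit:

```python
import numpy as np
from scipy.sparse import block_diag, csr_matrix

rng = np.random.default_rng(0)
BS, NB = 8, 64                                               # assumed block size and block count
blocks = [rng.standard_normal((BS, BS)) for _ in range(NB)]  # e.g. local element matrices
A_csr = csr_matrix(block_diag(blocks))                       # generic sparse storage
x = rng.standard_normal(A_csr.shape[1])

# Generic SpMV: values, column indices and row pointers all need to be stored.
y_csr = A_csr @ x

# Block-diagonal SpMV: only the dense blocks are stored; the index structure is
# implicit, which is what lets a hardware design cut on-chip (BRAM) usage.
y_blk = np.concatenate([B @ x[i * BS:(i + 1) * BS] for i, B in enumerate(blocks)])

assert np.allclose(y_csr, y_blk)
print("CSR storage:", A_csr.nnz, "values +", A_csr.nnz + len(A_csr.indptr), "indices")
print("block-diagonal storage:", sum(B.size for B in blocks), "values, no explicit indices")
```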

    Design and application of reconfigurable circuits and systems

    No full text
    Open Access

    An analysis of FPGA-based custom computers for DSP applications

    Get PDF
    Field programmable gate arrays (FPGAs) can be rapidly reconfigured to provide different digital logic functions. When such FPGA logic circuits are incorporated within a stored-program computer, the result is a machine where the programmer can design both the software and the hardware that will execute that software. This paper first surveys this area of custom computing. It then describes a new custom computing architecture which uses a processing node with three sections: a standard arithmetic chip, static RAM, and reconfigurable logic for operand handling. Finally, an analysis of the suitability of this new approach for the implementation of DSP applications shows it to be worthy of further investigation.

    Digital Low Level RF

    Get PDF
    The demand for high stability and precision of the RF voltage in modern accelerators, as well as for better diagnostics, maintenance and flexibility, is driving the community to develop Digital Low Level RF (DLLRF) systems for both linear accelerators and synchrotrons. The state of the art in digital technologies applied to DLLRF systems is reviewed, and different designs developed or in development at various laboratories are surveyed.

    Mixed-TD: Efficient Neural Network Accelerator with Layer-Specific Tensor Decomposition

    Full text link
    Neural Network designs are quite diverse, from VGG-style to ResNet-style, and from Convolutional Neural Networks to Transformers. Towards the design of efficient accelerators, many works have adopted a dataflow-based, inter-layer pipelined architecture, with customised hardware for each layer, achieving ultra-high throughput and low latency. The deployment of neural networks to such dataflow architecture accelerators is usually hindered by the available on-chip memory, as it is desirable to preload the weights of neural networks on-chip to maximise the system performance. To address this, networks are usually compressed before deployment through methods such as pruning, quantization and tensor decomposition. In this paper, a framework for mapping CNNs onto FPGAs based on a novel tensor decomposition method called Mixed-TD is proposed. The proposed method applies layer-specific Singular Value Decomposition (SVD) and Canonical Polyadic Decomposition (CPD) in a mixed manner, achieving 1.73x to 10.29x throughput per DSP compared to state-of-the-art CNNs. Our work is open-sourced: https://github.com/Yu-Zhewen/Mixed-TD
    Comment: accepted by FPL202
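    To make the decomposition idea concrete, here is a minimal sketch of the SVD half of such a compression (this is not the released Mixed-TD code; its layer-specific rank selection and the CPD path are omitted, and the layer shape and rank below are assumed): a weight matrix is replaced by two low-rank factors, trading some accuracy for far fewer parameters to hold on-chip.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 256))  # assumed weights of one layer (flattened conv / FC)
rank = 32                            # assumed rank; Mixed-TD chooses this per layer

# Truncated SVD: W ~= A @ B, keeping only the top-`rank` singular values.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :rank] * s[:rank]           # 512 x 32
B = Vt[:rank, :]                     # 32 x 256

x = rng.standard_normal(256)
y_full = W @ x                       # original layer
y_low = A @ (B @ x)                  # decomposed layer: two skinny matmuls

print(f"parameters: {W.size} -> {A.size + B.size} ({(A.size + B.size) / W.size:.1%})")
print(f"relative output error: {np.linalg.norm(y_full - y_low) / np.linalg.norm(y_full):.3f}")
# A random matrix compresses poorly; trained weights usually have a faster-decaying
# singular value spectrum, so the same rank gives a much smaller error in practice.
```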

    Comparing the performance of FPGA-based custom computers with general-purpose computers for DSP applications

    Get PDF
    When FPGA logic circuits are incorporated within a stored-program computer, the result is a machine where the programmer can design both the software and the hardware that will execute that software. This paper first describes some of the more important custom computers, and their potential weaknesses as DSP implementation platforms. It then describes a new custom computing architecture which is specifically designed for efficient implementation of DSP algorithms. Finally, it presents a simple performance comparison of a number of DSP implementation alternatives, and concludes that the new custom computing architecture is worthy of further investigation, and that custom computers based only on FPGA execution units show little performance improvement over state-of-the-art workstations.

    Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

    Get PDF
    In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.
    Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201

    Establishing the design knowledge for emerging interaction platforms

    Get PDF
    With a variety of innovative interactive products and services, such as interactive tabletops, interactive TVs, public multi-touch walls, and other embedded appliances, expected to appear in the market in the near future, this paper calls for preparation for the arrival of such interactive platforms based on their interactivity. We advocate studying, understanding and establishing the foundation for the interaction characteristics, affordances and design implications of these platforms, which we know will soon emerge and penetrate our everyday lives. We review some of the archetypal interaction platform categories of the future and highlight the current status of the design knowledge base accumulated to date and its current rate of growth for each of these. We use example designs to illustrate design issues and considerations, based on the authors' 12-year experience in pioneering novel applications in various forms and styles.