2,192 research outputs found

    Empowering parallel computing with field programmable gate arrays

    Get PDF
    After more than 30 years, reconfigurable computing has grown from a concept to a mature field of science and technology. The cornerstone of this evolution is the field programmable gate array, a building block enabling the configuration of a custom hardware architecture. The departure from static von Neumannlike architectures opens the way to eliminate the instruction overhead and to optimize the execution speed and power consumption. FPGAs now live in a growing ecosystem of development tools, enabling software programmers to map algorithms directly onto hardware. Applications abound in many directions, including data centers, IoT, AI, image processing and space exploration. The increasing success of FPGAs is largely due to an improved toolchain with solid high-level synthesis support as well as a better integration with processor and memory systems. On the other hand, long compile times and complex design exploration remain areas for improvement. In this paper we address the evolution of FPGAs towards advanced multi-functional accelerators, discuss different programming models and their HLS language implementations, as well as high-performance tuning of FPGAs integrated into a heterogeneous platform. We pinpoint fallacies and pitfalls, and identify opportunities for language enhancements and architectural refinements

    Fault tolerant architectures for integrated aircraft electronics systems, task 2

    Get PDF
    The architectural basis for an advanced fault tolerant on-board computer to succeed the current generation of fault tolerant computers is examined. The network error tolerant system architecture is studied with particular attention to intercluster configurations and communication protocols, and to refined reliability estimates. The diagnosis of faults, so that appropriate choices for reconfiguration can be made is discussed. The analysis relates particularly to the recognition of transient faults in a system with tasks at many levels of priority. The demand driven data-flow architecture, which appears to have possible application in fault tolerant systems is described and work investigating the feasibility of automatic generation of aircraft flight control programs from abstract specifications is reported

    Graph-based Performance Estimation on Customized MIPS Processors

    Get PDF
    The desire for greater processor performance with shrinking technologies and increasing heterogeneity, leads to a need for improvement in performance estimation. Being able to estimate the performance of an application without needing to implement the application on the available hardware and soft-core choices can decrease development time and help expedite the process of choosing which platform would be the best choice to use for development. This thesis work focuses on using a graph-based description of an application to estimate performance. By using a graph-based approach, the need for a hardware specific implementation is eliminated and the design space is simplified. Breaking down an application into a graph allows a new approach review to be taken as nodes of the graph can be assigned to levels in the pipelined architecture. This research uses pipelined customized Instruction Set Architecture (ISA) processors as the platform choice. The customized ISA soft-core processors allow the user more control over the resources used in the processor and provides a viable hardware/software choice to demonstrate the capabilities of the graph-based approach. The testcase applications used were the Dot Product, the Advanced Encryption Standard (AES) application, and the AES with TBox application. The results of this work show that performance can be accurately estimated on a customized processor using a graph-based approach for the application with accuracy ranging from approximately 75% to 89%

    High-level synthesis optimization for blocked floating-point matrix multiplication

    Get PDF
    In the last decade floating-point matrix multiplication on FPGAs has been studied extensively and efficient architectures as well as detailed performance models have been developed. By design these IP cores take a fixed footprint which not necessarily optimizes the use of all available resources. Moreover, the low-level architectures are not easily amenable to a parameterized synthesis. In this paper high-level synthesis is used to fine-tune the configuration parameters in order to achieve the highest performance with maximal resource utilization. An\ exploration strategy is presented to optimize the use of critical resources (DSPs, memory) for any given FPGA. To account for the limited memory size on the FPGA, a block-oriented matrix multiplication is organized such that the block summation is done on the CPU while the block multiplication occurs on the logic fabric simultaneously. The communication overhead between the CPU and the FPGA is minimized by streaming the blocks in a Gray code ordering scheme which maximizes the data reuse for consecutive block matrix product calculations. Using high-level synthesis optimization, the programmable logic operates at 93% of the theoretical peak performance and the combined CPU-FPGA design achieves 76% of the available hardware processing speed for the floating-point multiplication of 2K by 2K matrices

    Development of Lifting-based VLSI Architectures for Two-Dimensional Discrete Wavelet Transform

    Get PDF
    Two-dimensional discrete wavelet transform (2-D DWT) has evolved as an essential part of a modem compression system. It offers superior compression with good image quality and overcomes disadvantage of the discrete cosine transform, which suffers from blocks artifacts that reduces the quality of the inage. The amount of computations involve in 2-D DWT is enormous and cannot be processed by generalpurpose processors when real-time processing is required. Th·"efore, high speed and low power VLSI architecture that computes 2-D DWT effectively is needed. In this research, several VLSI architectures have been developed that meets real-time requirements for 2-D DWT applications. This research iaitially started off by implementing a software simulation program that decorrelates the original image and reconstructs the original image from the decorrelated image. Then, based on the information gained from implementing the simulation program, a new approach for designing lifting-based VLSI architectures for 2-D forward DWT is introduced. As a result, two high performance VLSI architectures that perform 2-D DWT for 5/3 and 9/7 filters are developed based on overlapped and nonoverlapped scan methods. Then, the intermediate architecture is developed, which aim a·: reducing the power consumption of the overlapped areas without using the expensive line buffer. In order to best meet real-time applications of 2-D DWT with demanding requirements in terms of speed and throughput parallelism is explored. The single pipelined intermediate and overlapped architectures are extended to 2-, 3-, and 4-parallel architectures to achieve speed factors of 2, 3, and 4, respectively. To further demonstrate the effectiveness of the approach single and para.llel VLSI architectures for 2-D inverse discrete wavelet transform (2-D IDWT) are developed. Furthermore, 2-D DWT memory architectures, which have been overlooked in the literature, are also developed. Finally, to show the architectural models developed for 2-D DWT are simple to control, the control algorithms for 4-parallel architecture based on the first scan method is developed. To validate architectures develcped in this work five architectures are implemented and simulated on Altera FPGA. In compliance with the terms of the Copyright Act 1987 and the IP Policy of the university, the copyright of this thesis has been reassigned by the author to the legal entity of the university, Institute of Technology PETRONAS Sdn bhd. Due acknowledgement shall always be made of the use of any material contained in, or derived from, this thesis
    corecore