812 research outputs found

    High performance computing with FPGAs

    Get PDF
    Field-programmable gate arrays represent an army of logical units that can be organized in a highly parallel or pipelined fashion to implement an algorithm in hardware. The flexibility of this new medium creates new challenges in finding the right processing paradigm, one that takes into account the natural constraints of FPGAs: clock frequency, memory footprint, and communication bandwidth. In this paper, the use of FPGAs as a multiprocessor on a chip is first compared with their use as a highly functional coprocessor, and the programming tools for hardware/software codesign are discussed. Next, a number of techniques are presented to maximize the parallelism and optimize the data locality in nested loops, including unimodular transformations, data-locality-improving loop transformations, and the use of smart buffers. Finally, the use of these techniques is demonstrated on a number of examples. The results in the paper and in the literature show that, with the proper programming tool set, FPGAs can speed up computation kernels significantly with respect to traditional processors.
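
    As a concrete (if simplified) illustration of the loop transformations listed above, the following C++ sketch tiles a matrix-multiplication loop nest so that each block of the computation works on a small, reusable working set; on an FPGA the tile buffers would map to on-chip memory such as the smart buffers the abstract mentions. The matrix size, tile size, and function name are illustrative assumptions, not taken from the paper.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t N = 512;    // problem size (illustrative)
constexpr std::size_t TILE = 32;  // tile sized to fit on-chip memory; N % TILE == 0

// Tiled matrix multiplication C += A * B (C is assumed zero-initialized).
// Tiling keeps TILE x TILE sub-blocks of A, B, and C in a small working
// set, and the i-k-j order inside each tile reuses A[i][k] across j.
void matmul_tiled(const std::vector<float>& A,
                  const std::vector<float>& B,
                  std::vector<float>& C) {
    for (std::size_t ii = 0; ii < N; ii += TILE)
        for (std::size_t kk = 0; kk < N; kk += TILE)
            for (std::size_t jj = 0; jj < N; jj += TILE)
                for (std::size_t i = ii; i < ii + TILE; ++i)
                    for (std::size_t k = kk; k < kk + TILE; ++k) {
                        const float a = A[i * N + k];  // reused across the j loop
                        for (std::size_t j = jj; j < jj + TILE; ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
}
```

    The interchange to an i-k-j order is a unimodular transformation; the tiling itself is one of the data-locality-improving transformations the abstract refers to.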

    High Performance Reconfigurable Computing for Linear Algebra: Design and Performance Analysis

    Get PDF
    Field Programmable Gate Arrays (FPGAs) enable powerful performance acceleration for scientific computations because of their intrinsic parallelism, pipelining ability, and flexible architecture. This dissertation explores the computational power of FPGAs for an important scientific application: linear algebra. First, optimized linear algebra subroutines are presented, based on enhancements to both algorithms and hardware architectures; compared to microprocessors, these routines achieve significant speedup. Second, computing with mixed-precision data on FPGAs is proposed for higher performance. Experimental analysis shows that mixed-precision algorithms on FPGAs can achieve the high performance of lower-precision data while keeping higher-precision accuracy when solving linear equations. Third, an execution time model is built for reconfigurable computers (RC), which plays an important role in performance analysis and optimal resource utilization of FPGAs. The accuracy and efficiency of parallel computing performance models often depend on mean maximum computations. Despite significant prior work, sufficient mathematical tools for this important calculation have been lacking. This work presents an Effective Mean Maximum Approximation method, which is more general, accurate, and efficient than previous methods. Together, these research results help address how to make linear algebra applications perform better on high performance reconfigurable computing architectures.
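
    A minimal sketch of the mixed-precision idea, assuming classical iterative refinement (the dissertation's own algorithms are not reproduced here): the bulk of the arithmetic, the correction solves, runs in single precision, while residuals are accumulated in double precision, so the final solution approaches double-precision accuracy. The Jacobi inner solver and the 3x3 test system below are illustrative assumptions.

```cpp
#include <cstdio>
#include <vector>

// Low-precision inner solve: a few Jacobi sweeps in float.
// (Any cheap approximate solver would do; Jacobi is an illustrative choice.)
static std::vector<float> jacobi(const std::vector<float>& A,
                                 const std::vector<float>& b,
                                 int n, int sweeps) {
    std::vector<float> x(n, 0.0f), y(n);
    for (int s = 0; s < sweeps; ++s) {
        for (int i = 0; i < n; ++i) {
            float sum = b[i];
            for (int j = 0; j < n; ++j)
                if (j != i) sum -= A[i * n + j] * x[j];
            y[i] = sum / A[i * n + i];
        }
        x = y;
    }
    return x;
}

int main() {
    const int n = 3;
    // Diagonally dominant system so the Jacobi sweeps converge (made-up data).
    std::vector<double> A = {4, 1, 0,  1, 5, 2,  0, 2, 6};
    std::vector<double> b = {1, 2, 3};
    std::vector<float> Af(A.begin(), A.end());  // low-precision copy of A

    std::vector<double> x(n, 0.0);
    for (int iter = 0; iter < 5; ++iter) {
        // High-precision residual r = b - A*x keeps the accuracy.
        std::vector<double> r(n);
        for (int i = 0; i < n; ++i) {
            r[i] = b[i];
            for (int j = 0; j < n; ++j) r[i] -= A[i * n + j] * x[j];
        }
        // Low-precision correction solve A*d = r does the heavy lifting.
        std::vector<float> rf(r.begin(), r.end());
        std::vector<float> d = jacobi(Af, rf, n, 20);
        for (int i = 0; i < n; ++i) x[i] += d[i];
    }
    std::printf("x = %.10f %.10f %.10f\n", x[0], x[1], x[2]);
}
```

    The trade is especially attractive on FPGAs, where single-precision units consume far fewer resources than double-precision ones, so more of them fit on the chip and operate in parallel.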

    Probabilistic Principal Component Analysis based Feature Extraction of Embedded System Applications with Deep Neural Network based Implementation in FPGA

    Get PDF
    The study of hardware/software systems is increasingly important with the advent of new communication devices and progress in system security, and the fast-paced adoption of mobile and embedded devices in everyday life opens new research areas in data mining. For embedded systems, Principal Component Analysis (PCA) is applied primarily to reduce the dimensionality and heterogeneity of the data; Probabilistic PCA (PPCA) is an updated version of PCA guided by a similarity measure. In this work, experiments are carried out extensively on an FPGA-based lightweight-cryptography benchmark data set to illustrate the viability, efficiency, and flexibility of data mining on reconfigurable embedded systems. FPGAs provide reconfigurable computing architectures for hardware neural networks, yet implementing a multilayer Cascaded Feed-Forward Neural Network (CFFNN) or a Deep Neural Network (DNN) with a large number of neurons on an FPGA is still a challenging task. To assess FPGA capacity for a particular application, two neural networks are therefore implemented and compared: CFFNN and DNN. The results show that, for reconfigurable embedded systems, a PPCA-based data mining and machine learning realization can deliver greater speedup, fewer iterations, and larger space savings compared with the static conventional version.
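
    For reference, the sketch below shows the deterministic core that PPCA extends with a noise model: plain PCA, extracting the leading principal component of a small data set by power iteration on its covariance matrix and projecting each sample onto it. The data and dimensions are made up for illustration; this is not the paper's PPCA implementation.

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const int n = 4, d = 3;  // 4 samples, 3 features (made-up data)
    double X[n][d] = {{2.5, 2.4, 0.5}, {0.5, 0.7, 1.9},
                      {2.2, 2.9, 0.4}, {1.9, 2.2, 0.8}};

    // Center the data and form the d x d sample covariance matrix.
    double mean[d] = {0}, C[d][d] = {{0}};
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < d; ++j) mean[j] += X[i][j] / n;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < d; ++j)
            for (int k = 0; k < d; ++k)
                C[j][k] += (X[i][j] - mean[j]) * (X[i][k] - mean[k]) / (n - 1);

    // Power iteration: repeatedly apply C and renormalize to find the
    // dominant eigenvector, i.e. the leading principal component.
    double v[d] = {1, 1, 1};
    for (int it = 0; it < 100; ++it) {
        double w[d] = {0}, norm = 0;
        for (int j = 0; j < d; ++j)
            for (int k = 0; k < d; ++k) w[j] += C[j][k] * v[k];
        for (int j = 0; j < d; ++j) norm += w[j] * w[j];
        norm = std::sqrt(norm);
        for (int j = 0; j < d; ++j) v[j] = w[j] / norm;
    }

    // Project each centered sample onto the component: a 1-D extracted feature.
    for (int i = 0; i < n; ++i) {
        double f = 0;
        for (int j = 0; j < d; ++j) f += (X[i][j] - mean[j]) * v[j];
        std::printf("sample %d -> feature %+.4f\n", i, f);
    }
}
```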

    Transformations of High-Level Synthesis Codes for High-Performance Computing

    Full text link
    Specialized hardware architectures promise a major step in performance and energy efficiency over the traditional load/store devices currently employed in large scale computing systems. The adoption of high-level synthesis (HLS) from languages such as C/C++ and OpenCL has greatly increased programmer productivity when designing for such platforms. While this has enabled a wider audience to target specialized hardware, the optimization principles known from traditional software design are no longer sufficient to implement high-performance codes. Fast and efficient codes for reconfigurable platforms are thus still challenging to design. To alleviate this, we present a set of optimizing transformations for HLS, targeting scalable and efficient architectures for high-performance computing (HPC) applications. Our work provides a toolbox for developers, where we systematically identify classes of transformations, the characteristics of their effect on the HLS code and the resulting hardware (e.g., increasing data reuse or resource consumption), and the objectives that each transformation can target (e.g., resolving interface contention, or increasing parallelism). We show how these can be used to efficiently exploit pipelining, on-chip distributed fast memory, and on-chip streaming dataflow, allowing for massively parallel architectures. To quantify the effect of our transformations, we use them to optimize a set of throughput-oriented FPGA kernels, demonstrating that our enhancements are sufficient to scale up parallelism within the hardware constraints. With the transformations covered, we hope to establish a common framework for performance engineers, compiler developers, and hardware developers, to tap into the performance potential offered by specialized hardware architectures using HLS.
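
    One representative transformation of this kind, sketched under the assumption of a Vitis HLS-style C++ dialect (the function and the choice of K are illustrative, not taken from the paper): a floating-point reduction carries a loop dependency through the adder, which blocks pipelining at an initiation interval of 1; interleaving partial sums into a partitioned register array removes the dependency.

```cpp
#include <cstddef>

// Naive version: 'acc += in[i]' forms a loop-carried dependency through a
// floating-point adder of latency L, so the loop cannot pipeline at II=1.
// Transformed version below: K interleaved partial sums, so consecutive
// iterations write different registers; K must cover the adder latency
// (K = 8 is an illustrative choice).
float reduce_interleaved(const float* in, std::size_t n) {
    constexpr std::size_t K = 8;
    float partial[K] = {0};
#pragma HLS array_partition variable=partial complete
    for (std::size_t i = 0; i < n; ++i) {
#pragma HLS pipeline II=1
        partial[i % K] += in[i];  // each partial sum is touched every K cycles
    }
    float acc = 0;
    for (std::size_t j = 0; j < K; ++j)  // short final combine, off the hot loop
        acc += partial[j];
    return acc;
}
```

    On a standard compiler the unknown pragmas are simply ignored, so the sketch remains runnable C++ while conveying the hardware intent.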

    Beehive: an FPGA-based multiprocessor architecture

    Get PDF
    In recent years, to keep pace with Moore's law, hardware and software designers have progressively focused their efforts on exploiting instruction-level parallelism. Software simulation has been essential for studying computer architecture because of its flexibility and low cost. However, users of software simulators must choose between high performance and high-fidelity emulation. This project presents an FPGA-based multiprocessor architecture to speed up multiprocessor architecture research and ease parallel software simulation.

    Enabling Independent Communication for FPGAs in High Performance Computing

    Get PDF

    New benchmarking methodology and programming model for big data processing

    Get PDF
    Big data processing is becoming a reality in numerous real-world applications. With the emergence of new data-intensive technologies and increasing amounts of data, new computing concepts are needed. The integration of big data producing technologies, such as wireless sensor networks, the Internet of Things, and cloud computing, into cyber-physical systems is reducing the available time to find appropriate solutions. This paper presents one possible solution for the coming exascale big data processing: a data flow computing concept. The performance of data flow systems that process big data should not be measured with the measures defined for the prevailing control flow systems. A new benchmarking methodology is proposed, which integrates the performance issues of speed, area, and power needed to execute the task. Computer rankings would look different if the new benchmarking methodologies were used; data flow systems would outperform control flow systems. This statement is backed by recent results from implementations of specialized algorithms and applications in data flow systems, which show considerable factors of speedup, space savings, and power reduction relative to implementations of the same in control flow computers. In our view, the next step of data flow computing development should be a move from specialized to more general algorithms and applications.
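
    To make the combined measure concrete, the sketch below ranks systems by one plausible figure of merit, throughput normalized by the area and power needed to execute the task; the formula and the sample numbers are assumptions for illustration, not the methodology defined in the paper.

```cpp
#include <cstdio>

// A candidate combined figure of merit for the speed/area/power triple:
// achieved operations per second divided by (area x power). Both the
// formula and the numbers below are illustrative, not the paper's.
struct System {
    const char* name;
    double ops_per_sec;  // throughput achieved on the task
    double area_mm2;     // silicon area used
    double power_watts;  // power drawn while executing the task
};

double figure_of_merit(const System& s) {
    return s.ops_per_sec / (s.area_mm2 * s.power_watts);
}

int main() {
    const System systems[] = {
        {"control-flow CPU", 2.0e11, 400.0, 150.0},  // made-up sample numbers
        {"dataflow engine",  1.5e11, 600.0,  25.0},
    };
    for (const System& s : systems)
        std::printf("%-17s merit = %.3e ops/(s*mm^2*W)\n",
                    s.name, figure_of_merit(s));
}
```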

    LEGaTO: first steps towards energy-efficient toolset for heterogeneous computing

    Get PDF
    LEGaTO is a three-year EU H2020 project which started in December 2017. The LEGaTO project will leverage task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC.