10 research outputs found

    The Potential for a GPU-Like Overlay Architecture for FPGAs

    We propose a soft processor programming model and architecture inspired by graphics processing units (GPUs) that are well-matched to the strengths of FPGAs, namely, highly parallel and pipelinable computation. In particular, our soft processor architecture exploits multithreading, vector operations, and predication to supply a floating-point pipeline of 64 stages via hardware support for up to 256 concurrent thread contexts. The key new contributions of our architecture are mechanisms for managing threads and register files that maximize data-level and instruction-level parallelism while overcoming the challenges of port limitations of FPGA block memories as well as memory and pipeline latency. Through simulation of a system that (i) is programmable via NVIDIA's high-level Cg language, (ii) supports AMD's CTM r5xx GPU ISA, and (iii) is realizable on an XtremeData XD1000 FPGA-based accelerator system, we demonstrate the potential for such a system to achieve 100% utilization of a deeply pipelined floating-point datapath.

    A Many-Core Overlay for High-Performance Embedded Computing on FPGAs

    In this work, we propose a configurable many-core overlay for high-performance embedded computing. The size of internal memory, the supported operations, and the number of ports can be configured independently for each core of the overlay. The overlay was evaluated with matrix multiplication, LU decomposition, and the fast Fourier transform (FFT) on a ZYNQ-7020 FPGA platform. The results show that a system-level many-core overlay avoids complex hardware design and still provides good performance. Comment: Presented at the First International Workshop on FPGAs for Software Programmers (FSP 2014) (arXiv:1408.4423)

    Embedded System Architecture for Mobile Augmented Reality. Sailor Assistance Case Study

    With upcoming see-through displays, new kinds of Augmented Reality applications are emerging. However, this also raises questions about the design of the associated embedded systems, which must be lightweight and handle object positioning, heterogeneous sensors, and wireless communications as well as graphics computation. This paper studies the specific case of a promising Mobile AR processor, which is different from usual graphics applications. A complete architecture is described, designed, and prototyped on FPGA. It includes hardware/software partitioning based on the analysis of application requirements. The specification of an original and flexible coprocessor is detailed. Algorithm choices and optimizations are also described. Implementation results and performance evaluation show the relevance of the proposed approach and demonstrate a new kind of architecture focused on object processing and optimized for the AR domain.

    A Configurable Shared Scratchpad Memory for GPU-like Processors

    In recent years, Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) have become increasingly important for high-performance computing. In particular, a number of industrial solutions and academic projects propose design frameworks based on FPGA-implemented GPU-like compute units. Existing GPU-like core projects provide limited hardware support for shared scratchpad memory, and particularly for the problem of bank conflicts, a major source of performance loss in many parallel kernels. In this paper, we present a configurable scratchpad memory for GPU-like processors with built-in support for bank remapping. The core is fully synthesizable on FPGA at a contained hardware cost. We also validated the presented architecture with a cycle-accurate event-driven emulator written in C++ as well as an RTL simulator tool. Finally, we demonstrated the impact of bank remapping and the other parameters of the proposed configurable shared scratchpad memory by evaluating the performance of two real-world parallelized kernels.
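To illustrate why bank conflicts matter and how remapping can help, here is a minimal sketch. It does not reproduce the remapping scheme of the paper, which the abstract does not describe; it assumes a hypothetical 16-bank scratchpad and a generic XOR-based swizzle, and the names `bank_linear`, `bank_remapped`, and `conflict_degree` are illustrative only.

```python
from collections import Counter

NUM_BANKS = 16  # assumed bank count (power of two); not taken from the paper


def bank_linear(addr, num_banks=NUM_BANKS):
    """Plain interleaving: consecutive words go to consecutive banks."""
    return addr % num_banks


def bank_remapped(addr, num_banks=NUM_BANKS):
    """Generic XOR swizzle (an assumption): fold the row index into the bank index."""
    row = addr // num_banks
    return (addr ^ row) % num_banks


def conflict_degree(addresses, bank_fn):
    """Largest number of simultaneous accesses that land on the same bank."""
    counts = Counter(bank_fn(a) for a in addresses)
    return max(counts.values())


# 16 lanes read one column of a 16x16 row-major tile (stride of 16 words).
column_access = [lane * NUM_BANKS for lane in range(NUM_BANKS)]
# 16 lanes read one row of the same tile (unit stride).
row_access = [3 * NUM_BANKS + lane for lane in range(NUM_BANKS)]

print(conflict_degree(column_access, bank_linear))    # all lanes hit bank 0
print(conflict_degree(column_access, bank_remapped))  # lanes spread over all banks
print(conflict_degree(row_access, bank_remapped))     # unit stride stays conflict-free
```

With plain interleaving the strided column access serializes on a single bank, while the swizzled mapping spreads the same addresses across all banks without penalizing the unit-stride row access.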

    A Novel Methodology for Calculating Large Numbers of Symmetrical Matrices on a Graphics Processing Unit: Towards Efficient, Real-Time Hyperspectral Image Processing

    Hyperspectral imagery (HSI) is often processed to identify targets of interest. Many of the quantitative analysis techniques developed for this purpose mathematically manipulate the data to derive information about the target of interest based on local spectral covariance matrices. The calculation of a local spectral covariance matrix for every pixel in a given hyperspectral data scene is so computationally intensive that real-time processing with these algorithms is not feasible with today’s general-purpose processing solutions. Specialized solutions are cost prohibitive, inflexible, inaccessible, or not feasible for on-board applications. Advances in graphics processing unit (GPU) capabilities and programmability offer an opportunity for general-purpose computing with access to hundreds of processing cores in a system that is affordable and accessible. The GPU also offers flexibility, accessibility, and feasibility that other specialized solutions do not offer. The architecture of the NVIDIA GPU used in this research is significantly different from the architecture of other parallel computing solutions. With such a substantial change in architecture, it follows that the paradigm for programming graphics hardware is significantly different from traditional serial and parallel software development paradigms. In this research, a methodology for mapping an HSI target detection algorithm to the NVIDIA GPU hardware and the Compute Unified Device Architecture (CUDA) Application Programming Interface (API) is developed. The RX algorithm is chosen as a representative stochastic HSI algorithm that requires the calculation of a spectral covariance matrix. The developed methodology is designed to calculate a local covariance matrix for every pixel in the input HSI data scene. A characterization of the limitations imposed by the chosen GPU is given, and a path forward toward optimization of a GPU-based method for real-time HSI data processing is defined.
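As a rough illustration of the per-pixel workload described above, the following sketch computes one local spectral covariance matrix on the CPU. It is not the CUDA methodology of the work itself; the function name `local_covariance` and the tiny two-band `window` are hypothetical. Because a covariance matrix is symmetric, only the upper triangle is computed and then mirrored.

```python
def local_covariance(spectra):
    """Sample covariance of a window of pixel spectra.

    `spectra` is a list of pixels, each a list of band values.
    Only the upper triangle is computed; the lower triangle is mirrored.
    """
    n = len(spectra)
    bands = len(spectra[0])
    mean = [sum(px[b] for px in spectra) / n for b in range(bands)]
    cov = [[0.0] * bands for _ in range(bands)]
    for i in range(bands):
        for j in range(i, bands):
            s = sum((px[i] - mean[i]) * (px[j] - mean[j]) for px in spectra)
            cov[i][j] = cov[j][i] = s / (n - 1)
    return cov


# Hypothetical window: three neighbouring pixels with two spectral bands each.
window = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
cov = local_covariance(window)
print(cov)  # [[1.0, 2.0], [2.0, 4.0]]
```

Repeating this for every pixel of a scene with hundreds of bands is what makes the computation expensive, and each pixel's matrix is independent of the others, which is why the workload suits a massively parallel device.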

    The Potential for a GPU-Like Overlay Architecture for FPGAs

    We propose a soft processor programming model and architecture inspired by graphics processing units (GPUs) that are well-matched to the strengths of FPGAs, namely, highly parallel and pipelinable computation. In particular, our soft processor architecture exploits multithreading, vector operations, and predication to supply a floating-point pipeline of 64 stages via hardware support for up to 256 concurrent thread contexts. The key new contributions of our architecture are mechanisms for managing threads and register files that maximize data-level and instruction-level parallelism while overcoming the challenges of port limitations of FPGA block memories as well as memory and pipeline latency. Through simulation of a system that (i) is programmable via NVIDIA's high-level Cg language, (ii) supports AMD's CTM r5xx GPU ISA, and (iii) is realizable on an XtremeData XD1000 FPGA-based accelerator system, we demonstrate the potential for such a system to achieve 100% utilization of a deeply pipelined floating-point datapath. Peer Reviewed