475 research outputs found

    FPGA acceleration of structured-mesh-based explicit and implicit numerical solvers using SYCL

    We explore the design and development of structured-mesh-based solvers on current Intel FPGA hardware using the SYCL programming model. Two classes of applications are targeted: (1) stencil applications based on explicit numerical methods and (2) multidimensional tridiagonal solvers based on implicit methods. Both classes of solvers appear as core modules in a wide range of real-world applications ranging from CFD to financial computing. A general, unified workflow is formulated for synthesizing them on Intel FPGAs, together with predictive analytic models to explore the design space and obtain near-optimal performance. The performance of designs synthesized with these techniques is benchmarked for two non-trivial applications on an Intel PAC D5005 FPGA card. Results are compared to the performance of optimized parallel implementations of the same applications on an Nvidia V100 GPU. Observed runtimes indicate that the FPGA matches or outperforms the V100 GPU. More importantly, the FPGA solutions consume 59%-76% less energy for their largest configurations, making them highly attractive for solving workloads based on these applications in production settings. The performance model predicts the runtime of designs with less than 5% error for all cases tested, demonstrating its significant utility for design space exploration. With these tools and techniques, we discuss the determinants for a given structured-mesh code to be amenable to FPGA implementation, providing insights into the feasibility and profitability of a design, how it can be codified using SYCL, and the resulting performance.

    High throughput multidimensional tridiagonal system solvers on FPGAs

    We present a high-performance tridiagonal solver library for Xilinx FPGAs optimized for multiple multi-dimensional systems common in real-world applications. An analytical performance model is developed and used to explore the design space and obtain rapid performance estimates that are over 85% accurate. When solving large batches of systems, this library achieves an order of magnitude better performance than previous FPGA work. A detailed comparison with a current state-of-the-art GPU library for multi-dimensional tridiagonal systems on an Nvidia V100 GPU shows the FPGA achieving competitive or better runtime and significant energy savings of over 30%. Through this design, we learn lessons about the types of applications where FPGAs can challenge the current dominance of GPUs.
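
For orientation, a single tridiagonal system can be solved in linear time with the textbook Thomas algorithm, and a batch is a set of independent solves of this kind. The sketch below is a minimal illustration only: it is not taken from the library described above, whose algorithm the abstract does not specify. The names `thomas_solve` and `solve_batch` are hypothetical, and the code assumes a well-conditioned (for example, diagonally dominant) system, since it does no pivoting.

```python
def thomas_solve(a, b, c, d):
    """Solve one tridiagonal system with the Thomas algorithm.

    a: sub-diagonal (a[0] is ignored), b: main diagonal,
    c: super-diagonal (c[-1] is ignored), d: right-hand side.
    Assumes no pivoting is needed (e.g. diagonal dominance).
    """
    n = len(d)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side

    # Forward elimination.
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom

    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def solve_batch(systems):
    """Solve a batch of independent systems given as (a, b, c, d) tuples."""
    return [thomas_solve(a, b, c, d) for (a, b, c, d) in systems]


if __name__ == "__main__":
    # 3x3 example whose exact solution is x = [1, 2, 3].
    a = [0.0, -1.0, -1.0]
    b = [2.0, 2.0, 2.0]
    c = [-1.0, -1.0, 0.0]
    d = [0.0, 0.0, 4.0]
    print(thomas_solve(a, b, c, d))
```

Because the systems in a batch are independent, they can be processed concurrently or streamed through a pipeline, which is the kind of parallelism a hardware implementation would typically exploit.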

    Rapid solution of problems by nuclear-magnetic-resonance quantum computation

    We offer an improved method for using a nuclear-magnetic-resonance quantum computer (NMRQC) to solve the Deutsch-Jozsa problem. Two known obstacles to the application of the NMRQC are the exponential diminishment of density-matrix elements with the number of bits, which threatens weak signal levels, and the high cost of preparing a suitable starting state. A third obstacle is a heretofore unnoticed restriction on the measurement operators available for use by an NMRQC. Variations on the function classes of the Deutsch-Jozsa problem are introduced, both to extend the range of problems advantageous for quantum computation and to escape all three obstacles to the use of an NMRQC. By adapting it to one such function class, the Deutsch-Jozsa problem is made solvable without exponential loss of signal. The method involves an extra work bit and a polynomially more involved oracle; it uses the thermal-equilibrium density matrix systematically for an arbitrary number of spins, thereby avoiding both the preparation of a pseudopure state and temporal averaging. Comment: 19 pages