467 research outputs found

    Accelerating Reconfigurable Financial Computing

    This thesis proposes novel approaches to the design, optimisation, and management of reconfigurable computer accelerators for financial computing. There are three contributions. First, we propose novel reconfigurable designs for derivative pricing using both Monte-Carlo and quadrature methods. These designs explore techniques such as control variate optimisation for Monte-Carlo, and multi-dimensional analysis for quadrature methods. Our Field-Programmable Gate Array (FPGA) designs achieve significant speedups and energy savings over both Central Processing Unit (CPU) and Graphics Processing Unit (GPU) designs. Second, we propose a framework for distributing computing tasks on multi-accelerator heterogeneous clusters. In this framework, different computational devices, including FPGAs, GPUs and CPUs, work collaboratively on the same financial problem based on a dynamic scheduling policy. The trade-off between speed and energy consumption for different accelerator allocations is investigated. Third, we propose a mixed precision methodology for optimising Monte-Carlo designs, and a reduced precision methodology for optimising quadrature designs. These methodologies enable us to optimise the throughput of reconfigurable designs by using datapaths with minimised precision, while maintaining the same accuracy of the results as in the original designs.
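
    The control variate technique named in this abstract can be illustrated in software. The sketch below is not the thesis's FPGA design: it assumes a European call under Black-Scholes dynamics and uses the discounted terminal stock price (whose expectation is known to equal the spot price) as the control. The function names and parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def bs_call(s0, k, r, sigma, t):
    """Black-Scholes closed-form call price, used only as a reference value."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return s0 * cdf(d1) - k * math.exp(-r * t) * cdf(d2)


def mc_call_control_variate(s0, k, r, sigma, t, n_paths, seed=0):
    """Monte Carlo call price with and without a control variate.

    Assumed control: X = exp(-r t) * S_T, for which E[X] = s0 exactly.
    Returns (plain_estimate, cv_estimate, plain_variance, cv_variance).
    """
    rng = random.Random(seed)
    disc = math.exp(-r * t)
    drift = (r - 0.5 * sigma ** 2) * t
    vol = sigma * math.sqrt(t)

    payoffs, controls = [], []
    for _ in range(n_paths):
        s_t = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        payoffs.append(disc * max(s_t - k, 0.0))
        controls.append(disc * s_t)

    mean_y = sum(payoffs) / n_paths
    mean_x = sum(controls) / n_paths
    cov = sum((y - mean_y) * (x - mean_x) for y, x in zip(payoffs, controls)) / (n_paths - 1)
    var_x = sum((x - mean_x) ** 2 for x in controls) / (n_paths - 1)
    var_y = sum((y - mean_y) ** 2 for y in payoffs) / (n_paths - 1)

    # Sample-optimal coefficient; subtracting beta * (X - E[X]) keeps the mean
    # and removes the part of the payoff variance explained by the control.
    beta = cov / var_x
    adjusted = [y - beta * (x - s0) for y, x in zip(payoffs, controls)]
    mean_adj = sum(adjusted) / n_paths
    var_adj = sum((a - mean_adj) ** 2 for a in adjusted) / (n_paths - 1)
    return mean_y, mean_adj, var_y, var_adj


if __name__ == "__main__":
    plain, cv, var_plain, var_cv = mc_call_control_variate(100.0, 100.0, 0.05, 0.2, 1.0, 20000)
    print("reference:", bs_call(100.0, 100.0, 0.05, 0.2, 1.0))
    print("plain MC :", plain, "per-path variance:", var_plain)
    print("with CV  :", cv, "per-path variance:", var_cv)
```

    The lower per-path variance means fewer paths are needed for a given accuracy, which is the kind of saving a hardware design could exploit.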

    Automatic generation of high-throughput systolic tree-based solvers for modern FPGAs

    Tree-based models are a class of numerical methods widely used in financial option pricing, with a computational complexity that is quadratic with respect to the solution accuracy. Previous research has employed reconfigurable computing with small degrees of parallelism to provide faster hardware solutions than general-purpose software designs. However, because of their vector hardware architectures, these designs cannot scale their compute resources efficiently, leaving them with pricing latencies that are quadratic with respect to the problem size, and hence to the solution accuracy. They are also unproductive, as they require hardware engineering effort and can only solve one type of tree problem, the standard American option. This thesis presents a novel methodology in the form of a high-level design framework which can capture any common tree-based problem, and automatically generates high-throughput field-programmable gate array (FPGA) solvers based on proposed scalable hardware architectures. The thesis makes three main contributions. First, systolic architectures are proposed for solving binomial and trinomial trees; thanks to their custom systolic data-movement mechanisms, they scale their compute resources efficiently to provide linear latency scaling for medium-size trees and improved quadratic latency scaling for large trees. Using the proposed systolic architectures, throughput speed-ups of up to 5.6X and 12X over previous vector designs were achieved on modern FPGAs for medium and large trees, respectively. Second, a productive high-level design framework is proposed that can capture any common binomial and trinomial tree problem, together with a methodology for generating high-throughput systolic solvers with custom data precision that requires no hardware design effort from the end user. Third, a fully-automated tool-chain methodology is proposed that, compared to previous tree-based solvers, improves user productivity by removing the manual engineering effort of applying the design framework to option pricing problems. Using the productive design framework, high-throughput systolic FPGA solvers have been automatically generated from simple end-user C descriptions for several tree problems, such as American, Bermudan, and barrier options.
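
    For readers unfamiliar with tree-based pricing, the quadratic cost mentioned in this abstract can be seen in a plain software version. The sketch below assumes the standard Cox-Ross-Rubinstein binomial tree for an American put; it is a CPU illustration only, not the systolic architecture or the tool-chain from the thesis, and the function names are hypothetical.

```python
import math


def american_put_binomial(s0, k, r, sigma, t, steps):
    """Price an American put on a Cox-Ross-Rubinstein binomial tree (assumed model)."""
    dt = t / steps
    u = math.exp(sigma * math.sqrt(dt))   # up factor
    d = 1.0 / u                           # down factor
    p = (math.exp(r * dt) - d) / (u - d)  # risk-neutral up probability
    disc = math.exp(-r * dt)

    # Option values at maturity; index j counts the number of up moves.
    values = [max(k - s0 * u ** j * d ** (steps - j), 0.0) for j in range(steps + 1)]

    # Backward induction: level i has i + 1 nodes, so the total work is
    # steps * (steps + 1) / 2 node updates, i.e. quadratic in the step count.
    for i in range(steps - 1, -1, -1):
        for j in range(i + 1):
            continuation = disc * (p * values[j + 1] + (1.0 - p) * values[j])
            exercise = k - s0 * u ** j * d ** (i - j)
            values[j] = max(continuation, exercise)  # early-exercise check
    return values[0]


def node_updates(steps):
    """Number of interior node updates performed by the backward induction."""
    return steps * (steps + 1) // 2


if __name__ == "__main__":
    for n in (100, 200, 400):
        print(n, node_updates(n), american_put_binomial(100.0, 100.0, 0.05, 0.2, 1.0, n))
```

    Doubling the number of steps roughly quadruples the node updates, which is why latency scaling matters for the hardware designs discussed above.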

    High Performance and Low Power Monte Carlo Methods to Option Pricing Models via High Level Design and Synthesis

    This article compares the performance and energy consumption of GPUs and FPGAs by implementing financial market models. The case studies used in this comparison are the Black-Scholes model and the Heston model for option pricing problems, which are analyzed numerically by the Monte Carlo method. The algorithms are computationally intensive but not memory-intensive and are thus well suited for FPGA implementation. High-level synthesis was performed starting from parallel models written in OpenCL, and various micro-architectures were then explored and optimized on FPGAs. The final FPGA implementations of both models, applied to several options, achieved the best performance-per-operation and energy-per-operation among the parallel acceleration systems compared, which include not only kernels on advanced GPUs but also RTL implementations found in the literature.

    Accelerating Quadrature Methods for Option Valuation


    Energy-Efficient FPGA Implementation for Binomial Option Pricing Using OpenCL

    Energy efficiency of financial computations is a performance criterion that can no longer be dismissed, and is as crucial as raw acceleration and accuracy of the solution. To reduce the energy consumption of financial accelerators, FPGAs offer a good compromise, combining low power consumption with high parallelism. However, designing and prototyping an application on an FPGA-based platform is typically very time-consuming and requires significant skills in hardware design. This is a major drawback compared with software-centric acceleration platforms and approaches. To address it, a high-level approach has been chosen, using Altera’s implementation of the OpenCL standard. We present two FPGA implementations of the binomial option pricing model on American options. The results obtained on a Terasic DE4 - Stratix IV board form a solid basis for meeting the constraints of a real-world application. The best implementation can evaluate more than 2000 options/s with an average power of less than 20 W.
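
    As a rough reading of the figures quoted in this abstract (throughput above 2000 options/s, average power below 20 W), the implied energy per option can be bounded as follows. The symbols are introduced here for illustration and do not come from the paper.

```latex
E_{\text{option}} = \frac{P_{\text{avg}}}{\text{throughput}}
  < \frac{20\ \text{W}}{2000\ \text{options/s}}
  = 0.01\ \text{J per option} = 10\ \text{mJ per option}
```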

    Low power and high performance heterogeneous computing on FPGAs

    The abstract is in the attachment.

    Optimising runtime reconfigurable designs for high performance applications

    This thesis proposes novel optimisations for high performance runtime reconfigurable designs. For a reconfigurable design, the proposed approach investigates idle resources introduced by static design approaches, and exploits runtime reconfiguration to eliminate the inefficient resources. The approach covers the circuit level, the function level, and the system level. At the circuit level, a method is proposed for tuning reconfigurable designs with two analytical models: a resource model for computational and memory resources and memory bandwidth, and a performance model for estimating execution time. This method is applied to tuning implementations of finite-difference algorithms, optimising arithmetic operators and memory bandwidth based on algorithmic parameters, and eliminating idle resources by runtime reconfiguration. At the function level, a method is proposed to automatically identify and exploit runtime reconfiguration opportunities while optimising resource utilisation. The method is based on the Reconfiguration Data Flow Graph, a new hierarchical graph structure enabling runtime reconfigurable designs to be synthesised in three steps: function analysis, configuration organisation, and runtime solution generation. At the system level, a method is proposed for optimising reconfigurable designs by dynamically adapting the designs to available runtime resources in a reconfigurable system. This method includes two steps: compile-time optimisation and runtime scaling, which enable efficient workload distribution, asynchronous communication scheduling, and domain-specific optimisations. It can be used in developing effective servers for high performance applications.