Accelerating the Fourier split operator method via graphics processing units
Current generations of graphics processing units have turned into highly parallel devices with general computing capabilities. Graphics processing units may therefore be utilized, for example, to solve time-dependent partial differential equations by the Fourier split operator method. In this contribution, we demonstrate that graphics processing units are capable of calculating fast Fourier transforms much more efficiently than traditional central processing units, and thus make efficient implementations of the Fourier split operator method possible. Performance gains of more than an order of magnitude over implementations for traditional central processing units are reached in the solution of the time-dependent Schrödinger equation and the time-dependent Dirac equation.
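For illustration, here is a minimal NumPy sketch of one Strang-split time step of the Fourier split operator method for the 1-D time-dependent Schrödinger equation (units with hbar = m = 1). The harmonic potential, grid, and step size are illustrative assumptions; on a GPU, the two FFT calls per step would be handled by a GPU FFT library (e.g., cupy.fft in place of numpy.fft).

```python
import numpy as np

# One-dimensional Fourier split-operator propagation (hbar = m = 1).
# Grid, potential, and time step are illustrative assumptions.
N = 1024                                   # spatial grid points
L = 20.0                                   # domain length
dx = L / N
x = (np.arange(N) - N // 2) * dx
k = 2.0 * np.pi * np.fft.fftfreq(N, d=dx)  # angular wavenumbers
dt = 1e-3                                  # time step

V = 0.5 * x**2                             # harmonic potential (assumed)
psi = np.exp(-x**2).astype(complex)        # initial Gaussian wave packet
psi /= np.sqrt(np.sum(np.abs(psi)**2) * dx)

half_v = np.exp(-0.5j * V * dt)            # half potential step (x-space)
kin = np.exp(-0.5j * k**2 * dt)            # full kinetic step (k-space)

for _ in range(1000):
    psi = half_v * psi                         # exp(-i V dt / 2)
    psi = np.fft.ifft(kin * np.fft.fft(psi))   # exp(-i T dt) via FFT
    psi = half_v * psi                         # exp(-i V dt / 2)
```

The two transforms per step dominate the cost, which is why fast GPU FFTs translate directly into the reported speed-ups.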
Pricing Early-Exercise and Discrete Barrier Options by Fourier-Cosine Series Expansions
We present a pricing method based on Fourier-cosine expansions for early-exercise and discretely-monitored barrier options. The method works well for exponential LĆ©vy asset price models. The error convergence is exponential for processes characterized by very smooth transitional probability density functions. The computational complexity is $O((M-1)N\log_2 N)$, with $N$ a (small) number of terms from the series expansion and $M$ the number of early-exercise/monitoring dates.
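The core of such a Fourier-cosine (COS) expansion can be sketched in a few lines: recovering a density from its characteristic function. The standard normal characteristic function and the truncation range [a, b] below are assumed, purely for illustration.

```python
import numpy as np

# Fourier-cosine recovery of a density from its characteristic function.
# The normal characteristic function and truncation range are assumptions.
a, b = -10.0, 10.0                 # truncation interval [a, b]
N = 64                             # (small) number of cosine terms
u = np.arange(N) * np.pi / (b - a)

phi = np.exp(-0.5 * u**2)          # characteristic function of N(0, 1)
F = (2.0 / (b - a)) * np.real(phi * np.exp(-1j * u * a))
F[0] *= 0.5                        # the first series term carries weight 1/2

x = np.linspace(-4.0, 4.0, 201)
density = F @ np.cos(np.outer(u, x - a))   # evaluate the cosine series
```

For smooth densities the coefficients decay exponentially in N, which is the source of the exponential error convergence noted above.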
Binomial American Option Pricing on CPU-GPU Heterogeneous System
We present a novel parallel binomial algorithm to compute prices of American options. The algorithm partitions a binomial tree into blocks of multiple levels of nodes and assigns each such block to multiple processors. Each processor, in parallel with the others, computes the option's values at the nodes assigned to it. The computation consists of two phases, where the second phase cannot start until the valuation in the first phase has been completed. The algorithm is implemented and tested on a heterogeneous system consisting of an Intel multicore processor and an NVIDIA GPU. The whole task is split over the CPU and GPU so that the computations are performed on the two processors simultaneously. In the hybrid processing, the GPU is always assigned the last part of a block and makes use of a pair of buffers in the on-chip shared memory to reduce the number of accesses to the off-chip device memory. The performance of the hybrid processing is compared with an optimised CPU serial code, a CPU parallel implementation, and a GPU standalone program. We learned from the experiments that the lack of an explicit mechanism in CUDA for synchronising CPU and GPU executions is a major obstacle for the hybrid processing to achieve high performance.
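The serial recurrence that such parallel designs accelerate is short; a minimal sketch for an American put on a CRR binomial tree follows, with all market parameters as illustrative assumptions.

```python
import numpy as np

# Serial CRR binomial pricing of an American put (backward induction).
# All market parameters below are illustrative assumptions.
S0, K, r, sigma, T, n = 100.0, 100.0, 0.05, 0.2, 1.0, 1000
dt = T / n
u = np.exp(sigma * np.sqrt(dt))            # up factor
d = 1.0 / u                                # down factor
p = (np.exp(r * dt) - d) / (u - d)         # risk-neutral up probability
disc = np.exp(-r * dt)                     # per-step discount factor

# terminal payoffs at the n + 1 leaves
S = S0 * u ** np.arange(n, -1, -1) * d ** np.arange(0, n + 1)
V = np.maximum(K - S, 0.0)

# walk back level by level, checking early exercise at every node
for i in range(n - 1, -1, -1):
    S = S0 * u ** np.arange(i, -1, -1) * d ** np.arange(0, i + 1)
    V = np.maximum(disc * (p * V[:-1] + (1.0 - p) * V[1:]), K - S)

print(V[0])   # American put price at the root
```

Each tree level depends only on the level above it, which is exactly the dependency that the block-partitioned CPU-GPU scheme exploits.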
Improving the management efficiency of GPU workloads in data centers through GPU virtualization
Graphics processing units (GPUs) are currently used in data centers to reduce the execution time of compute-intensive applications. However, the use of GPUs presents several side effects, such as increased acquisition costs and larger space requirements. Furthermore, GPUs require a nonnegligible amount of energy even while idle. Additionally, GPU utilization is usually low for most applications. In a similar way to the use of virtual machines, using virtual GPUs may address the concerns associated with the use of these devices. In this regard, the remote GPU virtualization mechanism could be leveraged to share the GPUs present in the computing facility among the nodes of the cluster. This would increase overall GPU utilization, thus reducing the negative impact of the increased costs mentioned before. Reducing the number of GPUs installed in the cluster could also be possible. However, in the same way as job schedulers map GPU resources to applications, virtual GPUs should also be scheduled before job execution. Nevertheless, current job schedulers are not able to deal with virtual GPUs. In this paper, we analyze the performance attained by a cluster using the remote Compute Unified Device Architecture middleware and a modified version of the Slurm scheduler, which is now able to assign remote GPUs to jobs. Results show that cluster throughput, measured as jobs completed per time unit, is doubled while total energy consumption is reduced by up to 40%. GPU utilization is also increased.
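The scheduling idea can be illustrated with a toy sketch: instead of each node owning its GPUs, jobs draw virtual GPUs from a cluster-wide pool, so fewer devices sit idle. The Job and pool structures below are hypothetical, purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    gpus_needed: int

class RemoteGpuPool:
    """Toy cluster-wide pool: any node may be assigned any remote GPU."""
    def __init__(self, total_gpus: int):
        self.free = total_gpus

    def try_start(self, job: Job) -> bool:
        if job.gpus_needed <= self.free:
            self.free -= job.gpus_needed   # lease remote virtual GPUs
            return True
        return False                       # otherwise the job queues

pool = RemoteGpuPool(total_gpus=8)
for job in (Job("train", 4), Job("render", 2), Job("sim", 3)):
    print(job.name, "started" if pool.try_start(job) else "queued")
```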
Schnelle Lƶser fĆ¼r partielle Differentialgleichungen [Fast Solvers for Partial Differential Equations]
[No abstract available]
Automatic generation of high-throughput systolic tree-based solvers for modern FPGAs
Tree-based models are a class of numerical methods widely used in financial option pricing, with a computational complexity that is quadratic with respect to the solution accuracy. Previous research has employed reconfigurable computing with small degrees of parallelism to provide faster hardware solutions compared with software designs on general-purpose processors. However, due to the nature of their vector hardware architectures, they cannot scale their compute resources efficiently, leaving them with pricing latency figures that are quadratic with respect to the problem size, and hence to the solution accuracy. Moreover, these solutions are not productive, as they require hardware engineering effort and can solve only one type of tree problem, the standard American option. This thesis presents a novel methodology in the form of a high-level design framework which can capture any common tree-based problem and automatically generates high-throughput field-programmable gate array (FPGA) solvers based on proposed scalable hardware architectures.

The thesis makes three main contributions. First, systolic architectures were proposed for solving binomial and trinomial trees, which, due to their custom systolic data-movement mechanisms, can scale their compute resources efficiently to provide linear latency scaling for medium-size trees and improved quadratic latency scaling for large trees. Using the proposed systolic architectures, throughput speed-ups of up to 5.6X and 12X were achieved on modern FPGAs, compared to previous vector designs, for medium and large trees, respectively. Second, a productive high-level design framework was proposed that can capture any common binomial and trinomial tree problem, and a methodology was suggested to generate high-throughput systolic solvers with custom data precision, requiring no hardware design effort from the end user. Third, a fully-automated tool-chain methodology was proposed that, compared to previous tree-based solvers, improves user productivity by removing the manual engineering effort of applying the design framework to option pricing problems. Using the productive design framework, high-throughput systolic FPGA solvers have been automatically generated from simple end-user C descriptions for several tree problems, such as American, Bermudan, and barrier options.
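For reference, the trinomial recurrence that the systolic architectures map into hardware looks as follows in a serial sketch; the Boyle-style parameterization and market inputs are illustrative assumptions.

```python
import numpy as np

# Serial trinomial backward induction for an American put.
# Standard trinomial tree parameterization; all inputs are assumed.
S0, K, r, sigma, T, n = 100.0, 100.0, 0.05, 0.2, 1.0, 500
dt = T / n
u = np.exp(sigma * np.sqrt(2.0 * dt))      # up factor (mid = 1, down = 1/u)
e = np.exp(sigma * np.sqrt(dt / 2.0))
pu = ((np.exp(r * dt / 2.0) - 1.0 / e) / (e - 1.0 / e)) ** 2
pd = ((e - np.exp(r * dt / 2.0)) / (e - 1.0 / e)) ** 2
pm = 1.0 - pu - pd                         # middle-branch probability
disc = np.exp(-r * dt)

# level i has 2*i + 1 nodes; node j holds price S0 * u**(i - j)
V = np.maximum(K - S0 * u ** np.arange(n, -n - 1, -1), 0.0)
for i in range(n - 1, -1, -1):
    S = S0 * u ** np.arange(i, -i - 1, -1)
    cont = disc * (pu * V[:-2] + pm * V[1:-1] + pd * V[2:])
    V = np.maximum(cont, K - S)            # early-exercise check

print(V[0])   # American put price at the root
```

Each level again depends only on its successor, which is what allows a systolic pipeline to stream levels through the FPGA and achieve the linear latency scaling reported for medium-size trees.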
- ā¦