
    CPU-GPU hybrid parallel binomial American option pricing

    We present in this paper a novel parallel binomial algorithm that computes the price of an American option. The algorithm partitions the binomial tree constructed for the pricing into blocks spanning multiple levels of nodes and assigns each such block to multiple processors. Each processor then computes the option's values at its assigned nodes in two phases. The algorithm is implemented and tested on a heterogeneous system consisting of an Intel multi-core processor and an NVIDIA GPU. The whole task is split between the CPU and the GPU so that the computations are performed on the two processors simultaneously. In the hybrid processing, the GPU is always assigned the last part of a block and uses a pair of buffers in the on-chip shared memory to reduce the number of accesses to the off-chip device memory. The performance of the hybrid processing is compared with an optimised serial CPU code, a parallel CPU implementation, and a standalone GPU program.
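
    The computation being parallelised here is the standard backward induction on a Cox-Ross-Rubinstein (CRR) binomial tree. The minimal C sketch below prices an American put that way as a serial baseline; it is not the paper's block-partitioned CPU-GPU algorithm, and the parameter values and the put payoff are illustrative choices (compile as C99 and link with -lm).

        #include <math.h>
        #include <stdio.h>
        #include <stdlib.h>

        /* Serial CRR backward induction for an American put.
         * S0: spot, K: strike, r: risk-free rate, sigma: volatility,
         * T: maturity in years, N: number of time steps.              */
        static double crr_american_put(double S0, double K, double r,
                                       double sigma, double T, int N)
        {
            double dt   = T / N;
            double u    = exp(sigma * sqrt(dt));       /* up factor          */
            double d    = 1.0 / u;                     /* down factor        */
            double disc = exp(-r * dt);                /* per-step discount  */
            double pu   = (exp(r * dt) - d) / (u - d); /* risk-neutral prob. */
            double pd   = 1.0 - pu;

            double *v = malloc((N + 1) * sizeof *v);

            /* Terminal payoffs at the N+1 leaves. */
            for (int i = 0; i <= N; ++i) {
                double S = S0 * pow(u, i) * pow(d, N - i);
                v[i] = fmax(K - S, 0.0);
            }

            /* Walk the tree backwards, taking the early-exercise maximum. */
            for (int t = N - 1; t >= 0; --t) {
                for (int i = 0; i <= t; ++i) {
                    double cont = disc * (pu * v[i + 1] + pd * v[i]);
                    double S    = S0 * pow(u, i) * pow(d, t - i);
                    v[i] = fmax(cont, K - S);
                }
            }

            double price = v[0];
            free(v);
            return price;
        }

        int main(void)
        {
            printf("American put price: %.4f\n",
                   crr_american_put(100.0, 100.0, 0.05, 0.2, 1.0, 1000));
            return 0;
        }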

    Randomized Binomial Tree and Pricing of American-Style Options

    Randomized binomial trees and methods for pricing American options were studied. First, both the completeness and the no-arbitrage conditions of the randomized binomial tree market were proved. Second, a description of the tree's nodes was given, and a cubic polynomial relationship between the number of nodes and the number of time steps was obtained. The characteristics of paths and the storage structure of the randomized binomial tree were then described, and a procedure for pricing American-style options in a randomized binomial tree market was given. Finally, a numerical example of pricing an American option was presented, and a sensitivity analysis of the parameters was carried out. The results show that the occurrence probability of the randomized binomial tree environment has a very significant impact on American option prices. Because it retains the completeness of the traditional binomial market while offering stronger descriptive power and remaining computationally feasible, the randomized binomial tree is a promising method for pricing financial derivatives.
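
    The abstract does not reproduce the randomized tree construction itself, so the sketch below uses a deliberately simplified stand-in: the market is assumed to follow one of two ordinary CRR environments for the whole tree, chosen with occurrence probability q, and the reported price is the q-weighted average of the two single-environment American put prices. This is only meant to illustrate the sensitivity to the occurrence probability noted above, not the paper's randomized binomial tree; the function and parameter values are illustrative.

        #include <math.h>
        #include <stdio.h>
        #include <stdlib.h>

        /* Compact CRR American-put pricer (same recurrence as the sketch above). */
        static double crr_put(double S0, double K, double r, double sigma,
                              double T, int N)
        {
            double dt = T / N, u = exp(sigma * sqrt(dt)), d = 1.0 / u;
            double disc = exp(-r * dt);
            double pu = (exp(r * dt) - d) / (u - d), pd = 1.0 - pu;
            double *v = malloc((N + 1) * sizeof *v);

            for (int i = 0; i <= N; ++i)
                v[i] = fmax(K - S0 * pow(u, i) * pow(d, N - i), 0.0);
            for (int t = N - 1; t >= 0; --t)
                for (int i = 0; i <= t; ++i)
                    v[i] = fmax(disc * (pu * v[i + 1] + pd * v[i]),
                                K - S0 * pow(u, i) * pow(d, t - i));

            double price = v[0];
            free(v);
            return price;
        }

        int main(void)
        {
            /* Two hypothetical binomial "environments" differing only in volatility. */
            double p1 = crr_put(100.0, 100.0, 0.05, 0.15, 1.0, 500);
            double p2 = crr_put(100.0, 100.0, 0.05, 0.35, 1.0, 500);

            /* Sweep the occurrence probability q of the first environment. */
            for (double q = 0.0; q <= 1.0001; q += 0.25)
                printf("q = %.2f  mixed price = %.4f\n", q, q * p1 + (1.0 - q) * p2);
            return 0;
        }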

    Automatic generation of high-throughput systolic tree-based solvers for modern FPGAs

    Tree-based models are a class of numerical methods widely used in financial option pricing, with a computational complexity that is quadratic in the solution accuracy. Previous research has employed reconfigurable computing with small degrees of parallelism to provide faster hardware solutions than general-purpose software designs. However, due to the nature of their vector hardware architectures, they cannot scale their compute resources efficiently, leaving them with pricing latencies that are quadratic in the problem size, and hence in the solution accuracy. Their solutions are also not productive, as they require hardware engineering effort and can only solve one type of tree problem, the standard American option. This thesis presents a novel methodology in the form of a high-level design framework which can capture any common tree-based problem and automatically generate high-throughput field-programmable gate array (FPGA) solvers based on proposed scalable hardware architectures. The thesis has made three main contributions. First, systolic architectures were proposed for solving binomial and trinomial trees which, thanks to their custom systolic data-movement mechanisms, can scale their compute resources efficiently to provide linear latency scaling for medium-size trees and improved quadratic latency scaling for large trees. Using the proposed systolic architectures, throughput speed-ups of up to 5.6X and 12X over previous vector designs were achieved on modern FPGAs for medium and large trees, respectively. Second, a productive high-level design framework was proposed that can capture any common binomial or trinomial tree problem, together with a methodology for generating high-throughput systolic solvers with custom data precision that requires no hardware design effort from the end user. Third, a fully automated tool-chain methodology was proposed that, compared with previous tree-based solvers, improves user productivity by removing the manual engineering effort of applying the design framework to option pricing problems. Using the productive design framework, high-throughput systolic FPGA solvers have been automatically generated from simple end-user C descriptions for several tree problems, such as American, Bermudan, and barrier options.
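
    The latency-scaling claim can be made concrete with a rough cycle-count model. The plain C sketch below (a software model, not the generated FPGA design) assumes a linear array of P processing elements that consumes a tree level of width w in ceil(w/P) passes; under that assumption the total pass count grows linearly in the number of steps N while N <= P, and roughly as N^2/(2P) beyond that. The pass-per-level cost model and the value of P are assumptions made purely for illustration.

        #include <stdio.h>

        /* Rough latency model for backward induction on a linear systolic array:
         * a tree with N steps has levels of width N, N-1, ..., 1, and a level of
         * width w is assumed to take ceil(w / P) passes on P processing elements. */
        static long systolic_passes(int N, int P)
        {
            long passes = 0;
            for (int w = 1; w <= N; ++w)
                passes += (w + P - 1) / P;   /* ceil(w / P) */
            return passes;
        }

        int main(void)
        {
            int P = 1024;                    /* hypothetical number of PEs */
            int sizes[] = { 256, 1024, 4096, 16384 };

            for (int k = 0; k < 4; ++k) {
                int N = sizes[k];
                printf("N = %5d  passes = %8ld  (%s regime)\n",
                       N, systolic_passes(N, P),
                       N <= P ? "linear" : "quadratic / P");
            }
            return 0;
        }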

    Binomial American Option Pricing on CPU-GPU Heterogeneous System

    We present a novel parallel binomial algorithm to compute the prices of American options. The algorithm partitions a binomial tree into blocks spanning multiple levels of nodes and assigns each such block to multiple processors. Each processor, in parallel with the others, computes the option's values at the nodes assigned to it. The computation consists of two phases, where the second phase cannot start until the valuation in the first phase has been completed. The algorithm is implemented and tested on a heterogeneous system consisting of an Intel multi-core processor and an NVIDIA GPU. The whole task is split between the CPU and the GPU so that the computations are performed on the two processors simultaneously. In the hybrid processing, the GPU is always assigned the last part of a block and uses a pair of buffers in the on-chip shared memory to reduce the number of accesses to the off-chip device memory. The performance of the hybrid processing is compared with an optimised serial CPU code, a parallel CPU implementation, and a standalone GPU program. We learned from the experiments that the lack of an explicit mechanism in CUDA for synchronising CPU and GPU executions is a major obstacle to achieving high performance with the hybrid processing.
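
    One simple way to see why a block of several levels can be processed independently by each worker is a halo argument: each backward step consumes one extra node value on the right, so a worker that holds its own segment of a level plus B extra neighbour values can advance B levels without communication. The C sketch below implements that halo variant serially; it is one plausible reading of the block scheme, not the paper's exact two-phase boundary exchange, and all names and parameter values are illustrative.

        #include <math.h>
        #include <stdio.h>
        #include <stdlib.h>

        /* Advance one block of B levels for an American put on a CRR tree.
         *
         * On entry, v[0 .. len+B-1] holds option values at node indices
         * [lo, lo+len+B) of level t.  On return, v[0 .. len-1] holds values
         * at node indices [lo, lo+len) of level t-B: each backward step
         * consumes one extra value on the right, which is why a halo of B
         * values lets a worker advance its segment independently.          */
        static void advance_block(double *v, int lo, int len, int B, int t,
                                  double S0, double K, double u, double d,
                                  double pu, double disc)
        {
            for (int s = 1; s <= B; ++s) {          /* step to level t - s   */
                int level = t - s;
                for (int k = 0; k < len + B - s; ++k) {
                    int i = lo + k;                 /* node index at 'level' */
                    double cont = disc * (pu * v[k + 1] + (1.0 - pu) * v[k]);
                    double S    = S0 * pow(u, i) * pow(d, level - i);
                    v[k] = fmax(cont, K - S);
                }
            }
        }

        int main(void)
        {
            /* Illustrative parameters (not taken from the paper). */
            double S0 = 100.0, K = 100.0, r = 0.05, sigma = 0.2, T = 1.0;
            int    N  = 1000, B = 100;

            double dt = T / N, u = exp(sigma * sqrt(dt)), d = 1.0 / u;
            double disc = exp(-r * dt), pu = (exp(r * dt) - d) / (u - d);

            double *v = malloc((N + 1) * sizeof *v);
            for (int i = 0; i <= N; ++i)
                v[i] = fmax(K - S0 * pow(u, i) * pow(d, N - i), 0.0);

            /* Serial driver: peel off B levels at a time with a single worker.
             * In the hybrid scheme each worker would instead own a segment
             * [lo, lo+len) of the output level plus a halo of B neighbour values. */
            for (int t = N; t > 0; t -= B) {
                int b = t < B ? t : B;              /* last block may be shorter */
                advance_block(v, 0, t - b + 1, b, t, S0, K, u, d, pu, disc);
            }
            printf("American put price: %.4f\n", v[0]);
            free(v);
            return 0;
        }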

    Market-Based Scheduling in Distributed Computing Systems

    In distributed computing systems (for example, in cluster and grid computing), the available resources can become scarce. Market mechanisms have the potential to coordinate resource demand and supply through suitable incentive mechanisms and thereby increase the economic efficiency of the overall system. Based on four specific application scenarios, this thesis investigates how market mechanisms for distributed computing systems should be designed.

    Evaluating Multicore Algorithms on the Unified Memory Model

    One of the challenges to achieving good performance on multicore architectures is the effective utilization of the underlying memory hierarchy. While this is an issue for single-core architectures, it is a critical problem for multicore chips. In this paper, we formulate the unified multicore model (UMM) to help understand the fundamental limits on cache performance on these architectures. The UMM seamlessly handles different types of multi-core processors with varying degrees of cache sharing at different levels. We demonstrate that our model can be used to study a variety of multicore architectures on a variety of applications. In particular, we use it to analyze an option pricing problem under the trinomial model and develop an algorithm for it that has near-optimal memory traffic between cache levels. We have implemented the algorithm on a system with two quad-core Intel Xeon 5310 1.6 GHz processors (8 cores in total). It achieves a peak performance of 19.5 GFLOPs, which is 38% of the theoretical peak of the multicore system, and it outperforms compiler-optimized and auto-parallelized code by a factor of up to 7.5.
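
    The option-pricing kernel referred to above is a trinomial backward induction. The plain C sketch below uses the Kamrad-Ritchken parameterisation (one common choice, assumed here rather than taken from the paper) and no cache blocking; it is the straightforward kernel whose traffic between cache levels the UMM-guided algorithm is designed to reduce, not the authors' optimised implementation.

        #include <math.h>
        #include <stdio.h>
        #include <stdlib.h>

        /* Plain trinomial backward induction for an American put
         * (Kamrad-Ritchken parameterisation; one common choice).   */
        static double trinomial_american_put(double S0, double K, double r,
                                             double sigma, double T, int N)
        {
            double dt     = T / N;
            double lambda = sqrt(3.0);
            double u      = exp(lambda * sigma * sqrt(dt));  /* one up move */
            double disc   = exp(-r * dt);
            double drift  = (r - 0.5 * sigma * sigma) * sqrt(dt)
                            / (2.0 * lambda * sigma);
            double pu     = 1.0 / (2.0 * lambda * lambda) + drift;
            double pd     = 1.0 / (2.0 * lambda * lambda) - drift;
            double pm     = 1.0 - pu - pd;

            /* Level t has 2t+1 nodes; node j has net displacement j - t. */
            double *v = malloc((2 * N + 1) * sizeof *v);
            for (int j = 0; j <= 2 * N; ++j)
                v[j] = fmax(K - S0 * pow(u, j - N), 0.0);

            for (int t = N - 1; t >= 0; --t) {
                for (int j = 0; j <= 2 * t; ++j) {
                    double cont = disc * (pd * v[j] + pm * v[j + 1] + pu * v[j + 2]);
                    double S    = S0 * pow(u, j - t);
                    v[j] = fmax(cont, K - S);
                }
            }

            double price = v[0];
            free(v);
            return price;
        }

        int main(void)
        {
            printf("Trinomial American put: %.4f\n",
                   trinomial_american_put(100.0, 100.0, 0.05, 0.2, 1.0, 1000));
            return 0;
        }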
