Search CORE

8,239 research outputs found

The GPU vs Phi Debate: Risk Analytics Using Many-Core Computing

Author: Varghese Blesson
Publication venue: 'Elsevier BV'
Publication date: 26/01/2015
Field of study

The risk of reinsurance portfolios covering globally occurring natural catastrophes, such as earthquakes and hurricanes, is quantified by employing simulations. These simulations are computationally intensive and require large amounts of data to be processed. The use of many-core hardware accelerators, such as the Intel Xeon Phi and the NVIDIA Graphics Processing Unit (GPU), are desirable for achieving high-performance risk analytics. In this paper, we set out to investigate how accelerators can be employed in risk analytics, focusing on developing parallel algorithms for Aggregate Risk Analysis, a simulation which computes the Probable Maximum Loss of a portfolio taking both primary and secondary uncertainties into account. The key result is that both hardware accelerators are useful in different contexts; without taking data transfer times into account the Phi had lowest execution times when used independently and the GPU along with a host in a hybrid platform yielded best performance.Comment: A modified version of this article is accepted to the Computers and Electrical Engineering Journal under the title - "The Hardware Accelerator Debate: A Financial Risk Case Study Using Many-Core Computing"; Blesson Varghese, "The Hardware Accelerator Debate: A Financial Risk Case Study Using Many-Core Computing," Computers and Electrical Engineering, 201

arXiv.org e-Print Archive

CiteSeerX

University of St. Andrews - Pure

St Andrews Research Repository

Solving the global atmospheric equations through heterogeneous reconfigurable platforms

Author: Fu H
Gan L
Huang X
Luk W
Xue W
Yang C
Yang G
Zhang Y
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/03/2014
Field of study

Spiral - Imperial College Digital Repository

GPU in Physics Computation: Case Geant4 Navigation

Author: Kommeri Jukka
Niemi Tapio
Seiskari Otto
Publication venue
Publication date: 24/09/2012
Field of study

General purpose computing on graphic processing units (GPU) is a potential method of speeding up scientific computation with low cost and high energy efficiency. We experimented with the particle physics simulation toolkit Geant4 used at CERN to benchmark its geometry navigation functionality on a GPU. The goal was to find out whether Geant4 physics simulations could benefit from GPU acceleration and how difficult it is to modify Geant4 code to run in a GPU. We ported selected parts of Geant4 code to C99 & CUDA and implemented a simple gamma physics simulation utilizing this code to measure efficiency. The performance of the program was tested by running it on two different platforms: NVIDIA GeForce 470 GTX GPU and a 12-core AMD CPU system. Our conclusion was that GPUs can be a competitive alternate for multi-core computers but porting existing software in an efficient way is challenging

arXiv.org e-Print Archive

CiteSeerX

CERN Document Server

Performance analysis of direct N-body algorithms for astrophysical simulations on distributed systems

Author: Alessia Gualandris
Alfredo Tirado-Ramos
Cohn
Dorband
Foster
Foster
Giersz
Louis
Makino
Makino
Makino
Makino
Makino
Murphy
Plummer
Simon Portegies Zwart
Publication venue: 'Elsevier BV'
Publication date: 09/12/2004
Field of study

We discuss the performance of direct summation codes used in the simulation of astrophysical stellar systems on highly distributed architectures. These codes compute the gravitational interaction among stars in an exact way and have an O(N^2) scaling with the number of particles. They can be applied to a variety of astrophysical problems, like the evolution of star clusters, the dynamics of black holes, the formation of planetary systems, and cosmological simulations. The simulation of realistic star clusters with sufficiently high accuracy cannot be performed on a single workstation but may be possible on parallel computers or grids. We have implemented two parallel schemes for a direct N-body code and we study their performance on general purpose parallel computers and large computational grids. We present the results of timing analyzes conducted on the different architectures and compare them with the predictions from theoretical models. We conclude that the simulation of star clusters with up to a million particles will be possible on large distributed computers in the next decade. Simulating entire galaxies however will in addition require new hybrid methods to speedup the calculation.Comment: 22 pages, 8 figures, accepted for publication in Parallel Computin

arXiv.org e-Print Archive

Crossref

Surrey Research Insight

International Migration, Integration and Social Cohesion online publications

Parallel Simulations for Analysing Portfolios of Catastrophic Event Risk

Author: Bahl Aman
Baltzer Oliver
Rau-Chaplin Andrew
Varghese Blesson
Publication venue
Publication date: 01/01/2012
Field of study

At the heart of the analytical pipeline of a modern quantitative insurance/reinsurance company is a stochastic simulation technique for portfolio risk analysis and pricing process referred to as Aggregate Analysis. Support for the computation of risk measures including Probable Maximum Loss (PML) and the Tail Value at Risk (TVAR) for a variety of types of complex property catastrophe insurance contracts including Cat eXcess of Loss (XL), or Per-Occurrence XL, and Aggregate XL, and contracts that combine these measures is obtained in Aggregate Analysis. In this paper, we explore parallel methods for aggregate risk analysis. A parallel aggregate risk analysis algorithm and an engine based on the algorithm is proposed. This engine is implemented in C and OpenMP for multi-core CPUs and in C and CUDA for many-core GPUs. Performance analysis of the algorithm indicates that GPUs offer an alternative HPC solution for aggregate risk analysis that is cost effective. The optimised algorithm on the GPU performs a 1 million trial aggregate simulation with 1000 catastrophic events per trial on a typical exposure set and contract structure in just over 20 seconds which is approximately 15x times faster than the sequential counterpart. This can sufficiently support the real-time pricing scenario in which an underwriter analyses different contractual terms and pricing while discussing a deal with a client over the phone.Comment: Proceedings of the Workshop at the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2012, 8 page

arXiv.org e-Print Archive

CiteSeerX

Crossref

A Domain Specific Approach to High Performance Heterogeneous Computing

Author: Inggs Gordon
Luk Wayne
Thomas David B.
Publication venue
Publication date: 14/03/2016
Field of study

Users of heterogeneous computing systems face two problems: firstly, in understanding the trade-off relationships between the observable characteristics of their applications, such as latency and quality of the result, and secondly, how to exploit knowledge of these characteristics to allocate work to distributed computing platforms efficiently. A domain specific approach addresses both of these problems. By considering a subset of operations or functions, models of the observable characteristics or domain metrics may be formulated in advance, and populated at run-time for task instances. These metric models can then be used to express the allocation of work as a constrained integer program, which can be solved using heuristics, machine learning or Mixed Integer Linear Programming (MILP) frameworks. These claims are illustrated using the example domain of derivatives pricing in computational finance, with the domain metrics of workload latency or makespan and pricing accuracy. For a large, varied workload of 128 Black-Scholes and Heston model-based option pricing tasks, running upon a diverse array of 16 Multicore CPUs, GPUs and FPGAs platforms, predictions made by models of both the makespan and accuracy are generally within 10% of the run-time performance. When these models are used as inputs to machine learning and MILP-based workload allocation approaches, a latency improvement of up to 24 and 270 times over the heuristic approach is seen.Comment: 14 pages, preprint draft, minor revisio

arXiv.org e-Print Archive

Southampton (e-Prints Soton)

Spiral - Imperial College Digital Repository