High-speed detection of emergent market clustering via an unsupervised parallel genetic algorithm
We implement a master-slave parallel genetic algorithm (PGA) with a bespoke
log-likelihood fitness function to identify emergent clusters within price
evolutions. We use graphics processing units (GPUs) to implement a PGA and
visualise the results using disjoint minimal spanning trees (MSTs). We
demonstrate that our GPU PGA, implemented on a commercially available
general-purpose GPU, is able to recover stock clusters in sub-second time, based
on a subset of stocks in the South African market. This represents a pragmatic
choice for low-cost, scalable parallel computing and is significantly faster
than a prototype serial implementation in an optimised C-based
fourth-generation programming language, although the results are not directly
comparable due to compiler differences. Combined with fast online intraday
correlation matrix estimation from high frequency data for cluster
identification, the proposed implementation offers cost-effective,
near-real-time risk assessment for financial practitioners.
Comment: 10 pages, 5 figures, 4 tables, More thorough discussion of
implementatio
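The master-slave pattern named in the abstract can be illustrated with a minimal sketch: a master loop handles selection, crossover and mutation, while a pool of workers evaluates fitness in parallel. Everything below is an assumption made for illustration. The toy correlation matrix `CORR`, the stand-in `fitness` score (which is not the paper's log-likelihood function), the naive label crossover, and the use of CPU threads instead of a GPU are all hypothetical choices.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy correlation matrix: assets 0-1 and 2-3 move together.
CORR = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.8],
    [0.0, 0.1, 0.8, 1.0],
]


def fitness(labels):
    """Stand-in score (NOT the paper's log-likelihood): reward pairs that
    share a cluster label when their correlation exceeds 0.5."""
    n = len(labels)
    return sum(
        CORR[i][j] - 0.5
        for i in range(n)
        for j in range(i + 1, n)
        if labels[i] == labels[j]
    )


def evolve(n_assets=4, pop_size=20, generations=30, seed=0, workers=4):
    """Master loop: breed candidates; workers ("slaves") only score them."""
    rng = random.Random(seed)
    # Each individual assigns one cluster label per asset.
    pop = [[rng.randrange(n_assets) for _ in range(n_assets)]
           for _ in range(pop_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(generations):
            # Fitness evaluation is farmed out to the worker pool.
            scores = list(pool.map(fitness, pop))
            ranked = [ind for _, ind in
                      sorted(zip(scores, pop), key=lambda t: t[0], reverse=True)]
            parents = ranked[: pop_size // 2]
            children = [ranked[0][:]]  # elitism: keep the current best
            while len(children) < pop_size:
                a, b = rng.sample(parents, 2)
                cut = rng.randrange(1, n_assets)
                child = a[:cut] + b[cut:]  # one-point crossover
                if rng.random() < 0.2:     # occasional label mutation
                    child[rng.randrange(n_assets)] = rng.randrange(n_assets)
                children.append(child)
            pop = children
        scores = list(pool.map(fitness, pop))
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


if __name__ == "__main__":
    labels, score = evolve()
    print("cluster labels:", labels, "score:", score)
```

Only the fitness evaluation is parallelised here, which is the defining trait of the master-slave layout; on a GPU the same role would be played by a kernel that scores one candidate per thread or block.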
Analysis and Implementation of Room Assignment Problem and Cannon's Algorithm on General Purpose Programmable Graphical Processing Units with CUDA
General-purpose Graphics Processing Units (GP-GPUs) have emerged as a popular platform for high-performance computing over the last few years. The increased interest in GP-GPUs for parallel computing mirrors the trend in general computing, where multi-core processors have risen as an alternative approach to increasing processor performance. Many applications that were previously accelerated on distributed processing platforms with MPI, or with multithreaded techniques such as OpenMP, are now being investigated to assess their performance on GP-GPU platforms. Since the GP-GPU platform is designed to give higher performance for parallel problems, applications already running on other parallel architectures are good candidates for performance studies on GP-GPUs. The first case study in this research is a GP-GPU implementation of a Simulated Annealing-based solution of the Room Assignment problem using CUDA. The Room Assignment problem attempts to arrange N people in N/2 rooms, taking into consideration each person's preference for a roommate. To evaluate the implementation, it was compared against the serial implementation for problem sizes of 5000, 10000, 15000 and 20000 people. The GP-GPU implementation achieved as much as a 78% higher improvement ratio than the serial version in comparable execution time. The second case study is a GP-GPU implementation of Cannon's Algorithm using CUDA. The GP-GPU implementation is compared with a serial implementation of conventional O(n^3) matrix multiplication. The GP-GPU implementation achieved up to a 6.2x speedup over the conventional serial multiplication. The results for both applications with varying problem sizes are presented and discussed.
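To make the second case study concrete, here is a minimal serial sketch of the data movement in Cannon's Algorithm on a q x q grid of blocks. It is not the CUDA implementation described above. The function names (`matmul`, `split`, `cannon`) are hypothetical, and the sketch assumes square matrices whose size is divisible by q. Each grid cell stands in for one processing element that multiplies its local blocks and then passes them to its neighbours.

```python
def matmul(a, b):
    """Conventional O(n^3) product; used for local blocks and as a reference."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def split(m, q):
    """Cut an n x n matrix into a q x q grid of (n/q) x (n/q) blocks."""
    s = len(m) // q
    return [[[row[j * s:(j + 1) * s] for row in m[i * s:(i + 1) * s]]
             for j in range(q)] for i in range(q)]


def cannon(a, b, q):
    """Multiply square matrices a and b by simulating Cannon's block shifts."""
    n = len(a)
    assert n % q == 0, "matrix size must be divisible by the grid size"
    s = n // q
    A, B = split(a, q), split(b, q)
    # Initial skew: block row i of A shifts left by i,
    # block column j of B shifts up by j.
    A = [[A[i][(j + i) % q] for j in range(q)] for i in range(q)]
    B = [[B[(i + j) % q][j] for j in range(q)] for i in range(q)]
    C = [[[[0] * s for _ in range(s)] for _ in range(q)] for _ in range(q)]
    for _ in range(q):
        # Every grid cell multiplies the two blocks it currently holds.
        for i in range(q):
            for j in range(q):
                local = matmul(A[i][j], B[i][j])
                for r in range(s):
                    for c in range(s):
                        C[i][j][r][c] += local[r][c]
        # Then A blocks move one step left and B blocks one step up.
        A = [[A[i][(j + 1) % q] for j in range(q)] for i in range(q)]
        B = [[B[(i + 1) % q][j] for j in range(q)] for i in range(q)]
    # Stitch the result blocks back into a single n x n matrix.
    return [sum((C[i][j][r] for j in range(q)), [])
            for i in range(q) for r in range(s)]


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    y = [[5, 6], [7, 8]]
    print(cannon(x, y, 2) == matmul(x, y))
```

After the initial skew, cell (i, j) holds the A and B blocks that share the inner index (i + j) mod q, so q rounds of multiply-and-shift cover every term of the block product. In a parallel setting the inner double loop over cells would run concurrently.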