
    TensorFlow Enabled Genetic Programming

    Genetic Programming, a form of evolutionary computation and machine learning, is shown to benefit significantly from vectorized data and the TensorFlow numerical computation library on both CPU and GPU architectures. The open-source Python package Karoo GP is employed for a series of 190 tests across 6 platforms, with real-world datasets ranging from 18 to 5.5M data points. These tests demonstrate that datasets measured in tens and hundreds of data points see a 2-15x improvement when moving from the scalar/SymPy configuration to the vector/TensorFlow configuration, with a single core performing on par with or better than multiple CPU cores and GPUs. A dataset composed of 90,000 data points shows a single vector/TensorFlow CPU core performing 875x better than 40 scalar/SymPy CPU cores. And a dataset containing 5.5M data points sees GPU configurations outperforming CPU configurations by 1.3x on average.
    Comment: 8 pages, 5 figures; presented at GECCO 2017, Berlin, Germany
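    The core of the speed-up is easy to sketch: instead of interpreting a candidate expression once per data point, the entire column is evaluated as a single tensor operation. Below is a minimal, hypothetical Python/TensorFlow illustration of that contrast; the dataset, the candidate expression and all names are invented for the example and are not taken from Karoo GP.

        import numpy as np
        import tensorflow as tf

        # Toy dataset: 90,000 rows, 2 features (sizes chosen for illustration).
        rng = np.random.default_rng(0)
        X = rng.normal(size=(90_000, 2)).astype(np.float32)
        y = (X[:, 0] * X[:, 1] + X[:, 0]).astype(np.float32)

        def fitness_vectorized(X, y):
            # Evaluate the candidate expression a*b + a over all rows at once.
            a, b = tf.unstack(tf.constant(X), axis=1)
            pred = a * b + a                    # one tensor op per tree node
            return float(tf.reduce_sum(tf.abs(pred - tf.constant(y))))

        def fitness_scalar(X, y):
            # The same expression evaluated row by row, as a scalar
            # interpreter (e.g. a SymPy-based evaluator) would.
            total = 0.0
            for (a, b), target in zip(X, y):
                total += abs(a * b + a - target)
            return total

        print(fitness_vectorized(X, y))   # same result, far fewer Python-level steps

    On large datasets the vectorized version amortizes interpreter overhead across the whole column, which is the effect the 875x figure above reflects.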

    High-speed detection of emergent market clustering via an unsupervised parallel genetic algorithm

    We implement a master-slave parallel genetic algorithm (PGA) with a bespoke log-likelihood fitness function to identify emergent clusters within price evolutions. We use graphics processing units (GPUs) to implement the PGA and visualise the results using disjoint minimal spanning trees (MSTs). We demonstrate that our GPU PGA, implemented on a commercially available general-purpose GPU, is able to recover stock clusters at sub-second speed, based on a subset of stocks in the South African market. This represents a pragmatic choice for low-cost, scalable parallel computing and is significantly faster than a prototype serial implementation in an optimised C-based fourth-generation programming language, although the results are not directly comparable due to compiler differences. Combined with fast online intraday correlation-matrix estimation from high-frequency data for cluster identification, the proposed implementation offers cost-effective, near-real-time risk assessment for financial practitioners.
    Comment: 10 pages, 5 figures, 4 tables; more thorough discussion of implementation
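    The master-slave pattern itself is straightforward: a master process drives selection and variation while slave workers evaluate fitness in parallel. The Python sketch below shows only that structure; CPU worker processes and a placeholder fitness stand in for the paper's GPU kernels and bespoke log-likelihood function, and every name and parameter here is invented.

        import numpy as np
        from multiprocessing import Pool

        def fitness(individual):
            # Placeholder objective; the paper maximizes a log-likelihood
            # over stock correlation data, which is not reproduced here.
            return -float(np.sum((individual - 0.5) ** 2))

        def evolve(pop_size=64, genome_len=20, generations=50, workers=4):
            rng = np.random.default_rng(0)
            pop = rng.random((pop_size, genome_len))
            with Pool(workers) as pool:                    # the "slaves"
                for _ in range(generations):
                    scores = np.array(pool.map(fitness, list(pop)))
                    order = np.argsort(scores)[::-1]       # master: selection
                    parents = pop[order[: pop_size // 2]]
                    children = parents + rng.normal(0.0, 0.05, parents.shape)
                    pop = np.vstack([parents, children])   # master: variation
            return pop[int(np.argmax([fitness(ind) for ind in pop]))]

        if __name__ == "__main__":
            print(evolve().round(2))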

    A Survey of Parallel Data Mining

    With the fast, continuous increase in the number and size of databases, parallel data mining is a natural and cost-effective approach to the problem of scalability in data mining. There has recently been considerable research on parallel data mining; however, most projects focus on parallelizing a single kind of data mining algorithm or paradigm. This paper surveys parallel data mining from a broader perspective. More precisely, we discuss the parallelization of data mining algorithms across four knowledge discovery paradigms, namely rule induction, instance-based learning, genetic algorithms and neural networks. Using the lessons learned from this discussion, we also derive a set of heuristic principles for designing efficient parallel data mining algorithms.
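    As a concrete instance of the data-parallel style that such principles tend to favour, the hedged Python sketch below partitions a dataset across worker processes, lets each worker count rule coverage on its own partition, and sums the partial counts; the rules and data are invented for illustration.

        import numpy as np
        from multiprocessing import Pool

        # Toy candidate rules as (feature index, threshold) pairs; a rule
        # "fires" on a record when record[feature] > threshold.
        RULES = [(0, 0.5), (1, 0.8), (2, 0.1)]

        def partial_counts(partition):
            # Each worker counts, per rule, how many local records are covered.
            return [int(np.sum(partition[:, f] > t)) for f, t in RULES]

        def parallel_coverage(data, workers=4):
            chunks = np.array_split(data, workers)       # data-parallel split
            with Pool(workers) as pool:
                parts = pool.map(partial_counts, chunks) # independent local work
            return np.sum(parts, axis=0)                 # cheap global reduction

        if __name__ == "__main__":
            data = np.random.default_rng(1).random((100_000, 3))
            print(parallel_coverage(data))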

    The Ariadne's Clew Algorithm

    We present a new approach to path planning, called the "Ariadne's clew algorithm". It is designed to find paths in high-dimensional continuous spaces and applies to robots with many degrees of freedom in both static and dynamic environments, i.e. environments where obstacles may move. The Ariadne's clew algorithm comprises two sub-algorithms, called Search and Explore, applied in an interleaved manner. Explore builds a representation of the accessible space while Search looks for the target. Both are posed as optimization problems. We describe a real implementation of the algorithm to plan paths for a six-degrees-of-freedom arm in a dynamic environment, where another six-degrees-of-freedom arm is used as a moving obstacle. Experimental results show that a path is found in about one second without any pre-processing.
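    The interleaving is simple to sketch. The hypothetical Python toy below alternates an Explore step, which places a new landmark as far as possible from the existing ones, with a Search step that tests whether the target has become reachable. It works in 2D for brevity (the paper's arm has six degrees of freedom), uses random-candidate maximization in place of the paper's optimizers, and all parameters are invented.

        import numpy as np

        rng = np.random.default_rng(2)
        TARGET = np.array([0.9, 0.9])       # toy goal in a 2D unit square

        def explore(landmarks, n_candidates=200):
            # EXPLORE: pick the candidate farthest from all known landmarks,
            # gradually covering the accessible space.
            cands = rng.random((n_candidates, 2))
            dists = np.linalg.norm(cands[:, None] - np.array(landmarks)[None],
                                   axis=2).min(axis=1)
            return cands[int(np.argmax(dists))]

        def search(landmark, eps=0.1):
            # SEARCH: here just a proximity test; the paper poses this as an
            # optimization toward the target from the new landmark.
            return np.linalg.norm(landmark - TARGET) < eps

        landmarks = [np.zeros(2)]           # start configuration
        for step in range(500):             # interleave Explore and Search
            landmarks.append(explore(landmarks))
            if search(landmarks[-1]):
                print(f"path found after placing {step + 1} landmarks")
                break
        else:
            print("no path within the landmark budget")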

    Design and evaluation of a genomics variant analysis pipeline using GATK Spark tools

    Scalable and efficient processing of genome sequence data, e.g. for variant discovery, is key to the mainstream adoption of high-throughput technology for disease prevention and clinical use. Achieving scalability, however, requires significant effort to enable the parallel execution of the analysis tools that make up the pipelines. This is facilitated by the new Spark versions of the well-known GATK toolkit, which offer a black-box approach by transparently exploiting the underlying MapReduce architecture. In this paper we report on our experience implementing a standard variant discovery pipeline using GATK 4.0 with Docker-based deployment over a cluster. We provide a preliminary performance analysis, comparing processing times and cost to those of the new Microsoft Genomics Services.
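    A single Spark-enabled GATK stage is launched through the gatk wrapper, with Spark options passed after a "--" separator. The Python snippet below is a hedged sketch of one such invocation inside a pipeline script; the paths and cluster address are placeholders, and the exact tool names and flags should be checked against the GATK 4 documentation.

        import subprocess

        # One pipeline stage: variant calling with the Spark version of
        # HaplotypeCaller. Everything in the command is a placeholder.
        cmd = [
            "gatk", "HaplotypeCallerSpark",
            "-R", "reference.fasta",        # reference genome
            "-I", "sample.bam",             # aligned reads
            "-O", "sample.vcf.gz",          # output variant calls
            "--",                           # args after "--" go to Spark
            "--spark-runner", "SPARK",
            "--spark-master", "spark://master:7077",
        ]
        subprocess.run(cmd, check=True)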