1,684 research outputs found

    Scaling Monte Carlo Tree Search on Intel Xeon Phi

    Many algorithms have been parallelized successfully on the Intel Xeon Phi coprocessor, especially those with regular, balanced, and predictable data access patterns and instruction flows. Irregular and unbalanced algorithms are harder to parallelize efficiently. They are, for instance, present in artificial intelligence search algorithms such as Monte Carlo Tree Search (MCTS). In this paper we study the scaling behavior of MCTS, on a highly optimized real-world application, on real hardware. The Intel Xeon Phi allows shared memory scaling studies up to 61 cores and 244 hardware threads. We compare work-stealing (Cilk Plus and TBB) and work-sharing (FIFO scheduling) approaches. Interestingly, we find that a straightforward thread pool with a work-sharing FIFO queue shows the best performance. A crucial element for this high performance is control of the grain size, an approach that we call Grain Size Controlled Parallel MCTS. Our subsequent comparison with Xeon CPUs shows an even clearer distinction in performance between the different threading libraries. We achieve, to the best of our knowledge, the fastest implementation of a parallel MCTS on the 61-core Intel Xeon Phi using a real application (a speedup of 47 relative to a sequential run). Comment: 8 pages, 9 figures
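
The work-sharing idea in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `run_playouts` and `grain_size_controlled_search` are hypothetical, the "playout" is a stand-in coin flip rather than a real MCTS simulation, and Python threads are used only to show the scheduling structure (they will not reproduce Xeon Phi scaling). The total work is cut into tasks of `grain_size` playouts each, and a fixed pool of workers pulls tasks from a single shared FIFO queue.

```python
import queue
import random
import threading


def run_playouts(seed, count):
    """Stand-in for a batch of MCTS playouts (hypothetical): count random wins."""
    rng = random.Random(seed)
    return sum(1 for _ in range(count) if rng.random() < 0.5)


def grain_size_controlled_search(total_playouts, grain_size, num_workers):
    """Split the work into tasks of at most grain_size playouts and share
    them among num_workers threads through one FIFO queue."""
    tasks = queue.Queue()  # queue.Queue is first-in, first-out
    remaining = total_playouts
    num_tasks = 0
    while remaining > 0:
        batch = min(grain_size, remaining)
        tasks.put((num_tasks, batch))
        remaining -= batch
        num_tasks += 1

    results = [0] * num_tasks  # one slot per task, so no lock is needed

    def worker():
        while True:
            try:
                task_id, batch = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            results[task_id] = run_playouts(task_id, batch)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return num_tasks, sum(results)


if __name__ == "__main__":
    n_tasks, wins = grain_size_controlled_search(1000, grain_size=64, num_workers=4)
    print(n_tasks, wins)
```

A small `grain_size` yields many short tasks (better load balance, more queue traffic), while a large one yields few long tasks (less overhead, more risk of idle workers); tuning that trade-off is what the abstract refers to as controlling the grain size.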

    Massively Parallel Computing and the Search for Jets and Black Holes at the LHC

    Massively parallel computing at the LHC could be the next leap necessary to reach an era of new discoveries at the LHC after the Higgs discovery. Scientific computing is a critical component of the LHC experiment, including operation, trigger, LHC computing GRID, simulation, and analysis. One way to improve the physics reach of the LHC is to take advantage of the flexibility of the trigger system by integrating coprocessors based on Graphics Processing Units (GPUs) or the Many Integrated Core (MIC) architecture into its server farm. This cutting-edge technology provides not only the means to accelerate existing algorithms, but also the opportunity to develop new algorithms that select events in the trigger that previously would have evaded detection. In this article we describe new algorithms that would allow the trigger to select new topological signatures, including non-prompt jets and black hole-like objects in the silicon tracker. Comment: 15 pages, 11 figures, submitted to NIM

    Janus II: a new generation application-driven computer for spin-system simulations

    This paper describes the architecture, the development and the implementation of Janus II, a new generation application-driven number cruncher optimized for Monte Carlo simulations of spin systems (mainly spin glasses). This domain of computational physics is a recognized grand challenge of high-performance computing: the resources necessary to study in detail theoretical models that can make contact with experimental data are far beyond those available using commodity computer systems. On the other hand, several specific features of the associated algorithms suggest that unconventional computer architectures, which can be implemented with available electronics technologies, may lead to order-of-magnitude increases in performance, reducing to acceptable values on human scales the time needed to carry out simulation campaigns that would take centuries on commercially available machines. Janus II is one such machine, recently developed and commissioned, that builds upon and improves on the successful JANUS machine, which has been used for physics since 2008 and is still in operation today. This paper describes in detail the motivations behind the project, the computational requirements, the architecture and the implementation of this new machine and compares its expected performance with that of currently available commercial systems. Comment: 28 pages, 6 figures

    Exploring Computational Chemistry on Emerging Architectures

    Emerging architectures, such as next generation microprocessors, graphics processing units, and Intel MIC cards, are increasingly popular in high performance computing. Each of these architectures has advantages over previous generations of architectures, including performance, programmability, and power efficiency. With the ever-increasing performance of these architectures, scientific computing applications are able to attack larger, more complicated problems. However, since applications perform differently on each of the architectures, it is difficult to determine the best tool for the job. This dissertation makes the following contributions to computer engineering and computational science. First, this work implements the computational chemistry variational path integral application, QSATS, on various architectures, ranging from microprocessors to GPUs to Intel MICs. Second, this work explores the use of analytical performance modeling to predict the runtime and scalability of the application on the architectures. This allows for a comparison of the architectures when determining which to use for a set of program input parameters. The models presented in this dissertation are accurate to within 6%. This work combines novel approaches to this algorithm and exploration of the various architectural features to develop the application to perform at its peak. In addition, this expands the understanding of computational science applications and their implementation on emerging architectures while providing insight into the performance, scalability, and programmer productivity.

    QCDGPU: open-source package for Monte Carlo lattice simulations on OpenCL-compatible multi-GPU systems

    The multi-GPU open-source package QCDGPU for lattice Monte Carlo simulations of pure SU(N) gluodynamics in an external magnetic field at finite temperature and of the O(N) model is developed. The code is implemented in OpenCL, tested on AMD and NVIDIA GPUs and on AMD and Intel CPUs, and may run on other OpenCL-compatible devices. The package has minimal external library dependencies and is OS platform-independent. It is optimized for heterogeneous computing: the lattice can be divided into non-equivalent parts to hide the difference in performance between the devices used. QCDGPU has a client-server part for distributed simulations. The package is designed to produce lattice gauge configurations as well as to analyze previously generated ones. QCDGPU may be executed in fault-tolerant mode. The Monte Carlo procedure core is based on the PRNGCL library for pseudo-random number generation on OpenCL-compatible devices, which contains several of the most popular pseudo-random number generators. Comment: Presented at the Third International Conference "High Performance Computing" (HPC-UA 2013), Kyiv, Ukraine; 9 pages, 2 figures
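
Dividing a lattice into non-equivalent parts can be illustrated with a short sketch. It is only an assumption about how such a split might be computed and does not reflect QCDGPU's actual partitioning code: the function `split_lattice` and its parameters are hypothetical, and the per-device throughputs are assumed to have been measured beforehand. Each device receives a share of the lattice extent along one axis proportional to its relative throughput, so that faster and slower devices finish a sweep at about the same time.

```python
def split_lattice(extent, throughputs):
    """Split `extent` lattice sites along one axis among devices, giving each
    device a share proportional to its measured throughput (hypothetical helper)."""
    total = sum(throughputs)
    # Floor of each proportional share; this may leave a few sites unassigned.
    sizes = [int(extent * t / total) for t in throughputs]
    leftover = extent - sum(sizes)
    # Hand the leftover sites, one each, to the fastest devices first.
    fastest_first = sorted(range(len(throughputs)),
                           key=lambda i: throughputs[i], reverse=True)
    for i in fastest_first[:leftover]:
        sizes[i] += 1
    return sizes


if __name__ == "__main__":
    # One device three times faster than the other, 32 sites along the split axis.
    print(split_lattice(32, [3, 1]))
```

With these inputs the faster device gets 24 sites and the slower one 8; when the extent does not divide evenly, the remainder goes to the fastest devices.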