    Evaluation of a Simple, Scalable, Parallel Best-First Search Strategy

    Large-scale, parallel clusters composed of commodity processors are increasingly available, enabling the use of vast processing capabilities and distributed RAM to solve hard search problems. We investigate Hash-Distributed A* (HDA*), a simple approach to parallel best-first search that asynchronously distributes and schedules work among processors based on a hash function of the search state. We use this approach to parallelize the A* algorithm in an optimal sequential version of the Fast Downward planner, as well as a 24-puzzle solver. The scaling behavior of HDA* is evaluated experimentally on a shared-memory multicore machine with 8 cores, a cluster of commodity machines using up to 64 cores, and large-scale high-performance clusters using up to 2400 processors. We show that this approach scales well, allowing the effective utilization of large amounts of distributed memory to optimally solve problems that require terabytes of RAM. We also compare HDA* to Transposition-table Driven Scheduling (TDS), a hash-based parallelization of IDA*, and show that, in planning, HDA* significantly outperforms TDS. A simple hybrid that combines HDA* and TDS to exploit the strengths of both algorithms is proposed and evaluated.
    Comment: in press, to appear in Artificial Intelligence
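    The core HDA* idea, assigning every search state to an owning processor by hashing it and sending each newly generated state to its owner, can be illustrated with a small sketch. This is a minimal single-process Python simulation under stated assumptions, not the authors' implementation: the hypothetical function `hda_star_simulated` runs its workers round-robin, plain lists stand in for asynchronous message queues, and the termination test (no messages in flight and no open node whose f-value could beat the best solution found so far) replaces the distributed termination detection a real HDA* needs. The grid problem and the names `grid_neighbors` and `manhattan` are also hypothetical.

```python
import heapq
import itertools
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

State = Hashable


def hda_star_simulated(
    start: State,
    goal: State,
    neighbors: Callable[[State], Iterable[Tuple[State, int]]],
    h: Callable[[State], int],
    num_workers: int = 4,
) -> Optional[int]:
    """Hypothetical single-process simulation of HDA*; returns the optimal cost or None."""

    def owner(s: State) -> int:
        return hash(s) % num_workers  # hash-based ownership of each state

    tie = itertools.count()  # tie-breaker so states themselves are never compared
    opens: List[list] = [[] for _ in range(num_workers)]  # per-worker open lists
    best_g: List[Dict[State, int]] = [{} for _ in range(num_workers)]  # per-worker closed info
    inboxes: List[List[Tuple[State, int]]] = [[] for _ in range(num_workers)]
    inboxes[owner(start)].append((start, 0))
    incumbent: Optional[int] = None  # cost of the best solution found so far

    def can_improve(f: int) -> bool:
        return incumbent is None or f < incumbent

    while True:
        # Stop when no messages are in flight and no open node can beat the incumbent.
        if not any(inboxes) and not any(op and can_improve(op[0][0]) for op in opens):
            return incumbent
        for w in range(num_workers):
            # Receive: keep only states reached more cheaply than before.
            for s, g in inboxes[w]:
                if g < best_g[w].get(s, float("inf")):
                    best_g[w][s] = g
                    heapq.heappush(opens[w], (g + h(s), g, next(tie), s))
            inboxes[w].clear()
            if not opens[w]:
                continue
            # Expand the best node in this worker's local open list.
            f, g, _, s = heapq.heappop(opens[w])
            if g > best_g[w][s] or not can_improve(f):
                continue  # stale entry, or it cannot lead to a cheaper solution
            if s == goal:
                incumbent = g if incumbent is None else min(incumbent, g)
                continue
            # Send each successor to the worker that owns it (asynchronous in real HDA*).
            for t, cost in neighbors(s):
                inboxes[owner(t)].append((t, g + cost))


def grid_neighbors(p: Tuple[int, int]) -> Iterable[Tuple[Tuple[int, int], int]]:
    """Hypothetical 5x5 open grid with unit move costs."""
    x, y = p
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= x + dx < 5 and 0 <= y + dy < 5:
            yield (x + dx, y + dy), 1


def manhattan(p: Tuple[int, int]) -> int:
    return abs(p[0] - 4) + abs(p[1] - 4)  # admissible heuristic toward goal (4, 4)


if __name__ == "__main__":
    print(hda_star_simulated((0, 0), (4, 4), grid_neighbors, manhattan))  # 8
```

    Because ownership depends only on the hash, duplicate copies of a state always arrive at the same worker, so each worker can detect duplicates with a purely local table; the price is that most successors become messages to other workers.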

    Scaling Monte Carlo Tree Search on Intel Xeon Phi

    Many algorithms have been parallelized successfully on the Intel Xeon Phi coprocessor, especially those with regular, balanced, and predictable data access patterns and instruction flows. Irregular and unbalanced algorithms are harder to parallelize efficiently; they appear, for instance, in artificial intelligence search algorithms such as Monte Carlo Tree Search (MCTS). In this paper we study the scaling behavior of MCTS in a highly optimized real-world application on real hardware. The Intel Xeon Phi allows shared-memory scaling studies up to 61 cores and 244 hardware threads. We compare work-stealing (Cilk Plus and TBB) and work-sharing (FIFO scheduling) approaches. Interestingly, we find that a straightforward thread pool with a work-sharing FIFO queue performs best. A crucial element of this performance is controlling the grain size, an approach we call Grain Size Controlled Parallel MCTS. A subsequent comparison with Xeon CPUs shows an even clearer performance difference between the threading libraries. To the best of our knowledge, we achieve the fastest implementation of parallel MCTS on the 61-core Intel Xeon Phi using a real application (a speedup of 47 relative to a sequential run).
    Comment: 8 pages, 9 figures
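    Grain size control can be sketched by splitting the total number of MCTS iterations into tasks of `grain_size` iterations each and submitting those tasks to a thread pool whose work queue is FIFO (Python's `concurrent.futures.ThreadPoolExecutor` hands out submitted tasks in order). This is a minimal sketch under assumptions, not the paper's implementation: the toy problem (choose `DEPTH` bits, reward = fraction of ones), the names `Node`, `one_iteration`, `run_task`, and `parallel_mcts`, and the single coarse lock around the shared tree are hypothetical simplifications. Visits are counted during selection, before the playout finishes, so concurrent threads never divide by a zero visit count; Python's global interpreter lock also means the sketch shows the task structure, not real speedup.

```python
import math
import random
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

DEPTH = 6  # hypothetical toy problem: choose DEPTH bits; reward = fraction of ones


class Node:
    def __init__(self, bits: List[int]) -> None:
        self.bits = bits                        # actions taken from the root
        self.children: Dict[int, "Node"] = {}   # action (0 or 1) -> child node
        self.visits = 0
        self.value = 0.0                        # sum of playout rewards


def one_iteration(root: Node, lock: threading.Lock, rng: random.Random, c: float = 1.4) -> None:
    """Select, expand, play out, and back up once on the shared tree."""
    with lock:  # one coarse lock around tree updates (a simplification)
        node, path = root, [root]
        # Selection: follow UCB1 while the node is fully expanded and non-terminal.
        while len(node.bits) < DEPTH and len(node.children) == 2:
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: ch.value / ch.visits
                       + c * math.sqrt(math.log(parent.visits) / ch.visits))
            path.append(node)
        # Expansion: add one untried action.
        if len(node.bits) < DEPTH:
            action = 0 if 0 not in node.children else 1
            node.children[action] = Node(node.bits + [action])
            node = node.children[action]
            path.append(node)
        for n in path:
            n.visits += 1  # count the visit now so concurrent selections never divide by zero
        bits = list(node.bits)
    # Playout outside the lock: complete the bit sequence at random.
    tail = [rng.randint(0, 1) for _ in range(DEPTH - len(bits))]
    reward = sum(bits + tail) / DEPTH
    with lock:
        for n in path:  # backpropagation of the playout reward
            n.value += reward


def run_task(root: Node, lock: threading.Lock, iterations: int, seed: int) -> None:
    """One unit of work: `iterations` MCTS iterations (the grain size)."""
    rng = random.Random(seed)
    for _ in range(iterations):
        one_iteration(root, lock, rng)


def parallel_mcts(total_iterations: int = 2000, grain_size: int = 50,
                  workers: int = 4) -> Tuple[int, int]:
    """Run tasks of at most `grain_size` iterations on a FIFO thread pool."""
    root, lock = Node([]), threading.Lock()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(run_task, root, lock, min(grain_size, total_iterations - start), start)
            for start in range(0, total_iterations, grain_size)
        ]
        for fut in futures:
            fut.result()  # re-raise any exception from a worker thread
    best_action = max(root.children, key=lambda a: root.children[a].visits)
    return best_action, root.visits


if __name__ == "__main__":
    print(parallel_mcts())
```

    A small `grain_size` gives finer load balancing but more scheduling overhead per iteration; a large one amortizes that overhead but can leave threads idle near the end of the run.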

    Definition of a Method for the Formulation of Problems to be Solved with High Performance Computing

    Computational power made available by current technology has been increasing continuously; however, today's problems are larger and more complex and demand even more computational power. Interest in computational problems has also been growing, and they form an important research area in computer science. These complex problems are solved with computational models built on an underlying mathematical model; the models are implemented as simulations and run on High Performance Computing systems. Parallel computing is employed to achieve high performance in such computations. This thesis identifies families of problems that are best solved using parallel computing modelling and implementation techniques such as message passing and shared memory. A few case studies are considered to show when the shared memory model is suitable and when the message passing model is more appropriate. The models of parallel computing are implemented and evaluated using several algorithms and simulations. The thesis mainly focuses on determining which model of computation is more suitable for attaining high performance in various scenarios.

    The Effect of a Global Transposition Table and the PVS and NegaScout Algorithms on Puzzle Games

    Nine Men’s Morris is a puzzle game in the form of a two-player board game. The board consists of squares with twenty-four intersections, or points. Nine Men’s Morris is fully observable, meaning that the entire state of the board and the pieces can be perceived and evaluated. This study examines how the choice of algorithm affects playing performance in Nine Men’s Morris. Performance is measured by wins, path length, and search time. Using a Global Transposition Table (GTT) for storage has an advantage: it can hold more entries, which gives the search a larger space to explore. This capability stems from the parallel nature of the GTT, and the GTT is therefore expected to find solutions faster. A Global Transposition Table is itself a collection of several transposition tables inside a larger transposition table. A GTT can be likened to a folder with many subfolders, where each subfolder contains files of the same type that share the same filename prefix. By combining search algorithms with a GTT in Nine Men’s Morris, this study aims to determine the performance of the NegaScout algorithm and the additional effect of using a GTT in the puzzle game Nine Men’s Morris.
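    NegaScout (essentially the same algorithm as PVS) searches the first move with a full window and every later move with a null window, re-searching only when a null-window probe fails high; a transposition table lets positions reached through different move orders reuse earlier results, stored as exact values or as upper or lower bounds. The following is a hedged illustration under assumptions, not the study's implementation: a toy subtraction game (remove 1 to 3 stones; a player with no legal move loses) stands in for Nine Men’s Morris, the table is an ordinary Python dict rather than a Global Transposition Table, the re-search uses the PVS-style (alpha, beta) window, and the names `negascout` and `moves` are hypothetical. Because the search always runs to terminal positions, the stone count alone is a valid table key.

```python
from typing import Dict, List, Tuple

INF = 1_000_000                 # larger than any game value
EXACT, LOWER, UPPER = 0, 1, 2   # kinds of transposition-table entries


def moves(stones: int) -> List[int]:
    """Hypothetical toy game: remove 1, 2, or 3 stones; returns the resulting positions."""
    return [stones - k for k in (1, 2, 3) if k <= stones]


def negascout(stones: int, alpha: int, beta: int, tt: Dict[int, Tuple[int, int]]) -> int:
    """Return +1 if the player to move wins, -1 if they lose (fail-soft NegaScout/PVS)."""
    alpha_orig = alpha
    entry = tt.get(stones)
    if entry is not None:
        value, kind = entry
        if kind == EXACT:
            return value
        if kind == LOWER:
            alpha = max(alpha, value)
        else:
            beta = min(beta, value)
        if alpha >= beta:
            return value  # the stored bound already decides this window
    children = moves(stones)
    if not children:
        return -1  # no legal move: the player to move loses
    best = -INF
    for i, child in enumerate(children):
        if i == 0:
            score = -negascout(child, -beta, -alpha, tt)  # full window for the first move
        else:
            score = -negascout(child, -alpha - 1, -alpha, tt)  # null-window probe
            if alpha < score < beta:
                score = -negascout(child, -beta, -alpha, tt)  # probe failed high: re-search
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # beta cutoff
    # Record whether `best` is exact or only a bound for the window that was searched.
    kind = UPPER if best <= alpha_orig else LOWER if best >= beta else EXACT
    tt[stones] = (best, kind)
    return best


if __name__ == "__main__":
    table: Dict[int, Tuple[int, int]] = {}
    print([negascout(n, -INF, INF, table) for n in range(9)])
    # [-1, 1, 1, 1, -1, 1, 1, 1, -1]
```

    Positions whose stone count is a multiple of 4 are losses for the player to move, which the search reproduces. In a real game engine the table key would instead be a position hash, stored together with the depth to which the entry was searched.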