Evaluation of a Simple, Scalable, Parallel Best-First Search Strategy
Large-scale, parallel clusters composed of commodity processors are
increasingly available, enabling the use of vast processing capabilities and
distributed RAM to solve hard search problems. We investigate Hash-Distributed
A* (HDA*), a simple approach to parallel best-first search that asynchronously
distributes and schedules work among processors based on a hash function of the
search state. We use this approach to parallelize the A* algorithm in an
optimal sequential version of the Fast Downward planner, as well as a 24-puzzle
solver. The scaling behavior of HDA* is evaluated experimentally on a shared
memory, multicore machine with 8 cores, a cluster of commodity machines using
up to 64 cores, and large-scale high-performance clusters, using up to 2400
processors. We show that this approach scales well, allowing the effective
utilization of large amounts of distributed memory to optimally solve problems
which require terabytes of RAM. We also compare HDA* to Transposition-table
Driven Scheduling (TDS), a hash-based parallelization of IDA*, and show that,
in planning, HDA* significantly outperforms TDS. A simple hybrid which combines
HDA* and TDS to exploit the strengths of both algorithms is proposed and evaluated.
Comment: in press, to appear in Artificial Intelligence
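The core idea of HDA* described above — asynchronously assigning each search state to a processor via a hash of the state — can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the `owner` and `distribute` names and the tuple-based state representation are assumptions for the sketch.

```python
import hashlib

def owner(state, num_procs):
    """Map a search state to the processor that owns it (HDA*-style).

    `state` is any printable, hashable representation (here a tuple);
    a stable hash of the state decides which processor's open list the
    node joins, so duplicate detection for a state always happens on
    the same processor.
    """
    digest = hashlib.md5(repr(state).encode()).hexdigest()
    return int(digest, 16) % num_procs

def distribute(successors, inboxes, num_procs):
    """Hypothetical expansion step for one processor: each generated
    successor (state, g, h) is sent to the inbox of its owning
    processor, keyed by f = g + h, rather than kept locally."""
    for state, g, h in successors:
        inboxes[owner(state, num_procs)].append((g + h, g, state))
```

In a real HDA* run the inboxes would be asynchronous message channels (e.g. MPI), and each processor would interleave receiving remote nodes with expanding the best node on its local open list.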
Scaling Monte Carlo Tree Search on Intel Xeon Phi
Many algorithms have been parallelized successfully on the Intel Xeon Phi
coprocessor, especially those with regular, balanced, and predictable data
access patterns and instruction flows. Irregular and unbalanced algorithms are
harder to parallelize efficiently. They are, for instance, present in
artificial intelligence search algorithms such as Monte Carlo Tree Search
(MCTS). In this paper we study the scaling behavior of MCTS, on a highly
optimized real-world application, on real hardware. The Intel Xeon Phi allows
shared memory scaling studies up to 61 cores and 244 hardware threads. We
compare work-stealing (Cilk Plus and TBB) and work-sharing (FIFO scheduling)
approaches. Interestingly, we find that a straightforward thread pool with a
work-sharing FIFO queue shows the best performance. A crucial element for this
high performance is controlling the grain size, an approach that we call
Grain Size Controlled Parallel MCTS. A subsequent comparison with Xeon
CPUs shows an even clearer distinction in performance between the
different threading libraries. We achieve, to the best of our knowledge, the
fastest implementation of parallel MCTS on the 61-core Intel Xeon Phi using a
real application (a speedup of 47 relative to a sequential run).
Comment: 8 pages, 9 figures
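The work-sharing approach the abstract favors — a plain thread pool pulling from a FIFO queue, with the grain size (playouts per task) controlled explicitly — can be sketched as follows. This is an illustrative skeleton, not the paper's code: `run_tasks`, `make_tasks`, and the placeholder task bodies are assumptions, and a real implementation would do actual MCTS playouts inside each task.

```python
import queue
import threading

def run_tasks(num_threads, tasks):
    """Work-sharing thread pool: workers repeatedly pull the next task
    from a shared FIFO queue until it is empty."""
    q = queue.Queue()
    for t in tasks:
        q.put(t)
    results, lock = [], threading.Lock()

    def worker():
        while True:
            try:
                fn = q.get_nowait()
            except queue.Empty:
                return
            r = fn()
            with lock:
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

def make_tasks(total_playouts, grain):
    """Grain size control: pack `grain` playouts into one task so the
    queue overhead is amortized over many playouts. Each task here is
    a stand-in that just reports its playout count."""
    def chunk(n):
        return lambda: n  # placeholder for running n MCTS playouts
    return [chunk(min(grain, total_playouts - i))
            for i in range(0, total_playouts, grain)]
```

The design point is that throughput is tuned by the grain size, not the thread count: too small a grain and queue contention dominates, too large and load balance suffers on 244 hardware threads.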
Definition of a Method for the Formulation of Problems to be Solved with High Performance Computing
The computational power made available by current technology has been continuously increasing, yet today's problems are larger and more complex and demand still more computational power. Interest in computational problems has also been growing, and they form an important research area in computer science. Such complex problems are tackled with computational models built on an underlying mathematical model, solved using computer resources and simulation, and run with High Performance Computing. For such computations, parallel computing is employed to achieve high performance. This thesis identifies families of problems that are best solved using the modelling and implementation techniques of parallel computing, such as message passing and shared memory. A few case studies are considered to show when the shared memory model is suitable and when the message passing model would be preferable. Both models of parallel computing are implemented and evaluated using several algorithms and simulations. The thesis mainly focuses on identifying the more suitable model of computation for various scenarios in attaining High Performance Computing.
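The contrast the thesis draws between the two models can be made concrete with a toy reduction. In the shared-memory style, workers update one accumulator under a lock; in the message-passing style, workers own their data and communicate only partial results over a channel. This sketch uses Python threads and a queue as a stand-in for true processes and MPI-style messages; the function names are illustrative.

```python
import queue
import threading

def shared_memory_sum(data, num_workers):
    """Shared-memory style: all workers mutate one shared total,
    synchronized with a lock."""
    total = 0
    lock = threading.Lock()

    def work(chunk):
        nonlocal total
        s = sum(chunk)
        with lock:
            total += s

    chunks = [data[i::num_workers] for i in range(num_workers)]
    threads = [threading.Thread(target=work, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total

def message_passing_sum(data, num_workers):
    """Message-passing style: no shared mutable state; each worker
    sends its partial sum over a channel, and a receiver combines them."""
    channel = queue.Queue()

    def work(chunk):
        channel.put(sum(chunk))

    chunks = [data[i::num_workers] for i in range(num_workers)]
    threads = [threading.Thread(target=work, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(channel.get() for _ in range(num_workers))
```

Both return the same result; the difference that matters at scale is that the message-passing version maps naturally onto distributed machines with no shared address space, while the shared-memory version relies on one.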
Scalable grid resource allocation for scientific workflows using hybrid metaheuristics
Grid infrastructure is a valuable tool for scientific users, but it is characterized by a high level of complexity which makes it difficult for them to quantify their requirements and allocate resources. In this paper, we show that resource trading is a viable and scalable approach for scientific users to consume resources. We propose the use of Grid resource bundles to specify supply and demand, combined with a hybrid metaheuristic method to determine the allocation of resources in a market-based approach. We evaluate this through the application domain of scientific workflow execution on the Grid.
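A common shape for a hybrid metaheuristic in allocation problems like this is a greedy construction pass followed by randomized local search. The sketch below allocates resource bundles (demand, value) under a capacity limit in that style; it is a generic illustration under assumed names (`allocate`, the knapsack-like bundle model), not the paper's actual method.

```python
import random

def allocate(bundles, capacity, iters=200, seed=0):
    """Hybrid-heuristic sketch: greedy construction by value density,
    then randomized single-bundle flips that keep feasible, non-worsening
    allocations. Each bundle is a (demand, value) pair."""
    rng = random.Random(seed)

    def value(sel):
        return sum(bundles[i][1] for i in sel)

    def demand(sel):
        return sum(bundles[i][0] for i in sel)

    # Greedy construction: take bundles in order of value per unit demand.
    order = sorted(range(len(bundles)),
                   key=lambda i: bundles[i][1] / bundles[i][0],
                   reverse=True)
    chosen = set()
    for i in order:
        if demand(chosen | {i}) <= capacity:
            chosen.add(i)

    # Local search: flip one bundle in/out; accept feasible moves that
    # do not lower value, and remember the best allocation seen.
    best = chosen.copy()
    for _ in range(iters):
        i = rng.randrange(len(bundles))
        cand = chosen ^ {i}
        if demand(cand) <= capacity and value(cand) >= value(chosen):
            chosen = cand
            if value(chosen) > value(best):
                best = chosen.copy()
    return best
```

A market-based allocator would extend this with prices derived from supply and demand, but the construct-then-improve structure is the part the term "hybrid metaheuristic" usually refers to.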
The Effect of a Global Transposition Table and the PVS and NegaScout Algorithms on Puzzle Games
Nine Men’s Morris is a two-player board-game puzzle. The board consists of a grid with twenty-four intersections, or points. Nine Men’s Morris is fully observable, meaning that the entire state of the board and its pieces can be perceived and evaluated. This study examines the effect of different algorithms on the performance of a Nine Men’s Morris game, where performance is measured by wins, solution path length, and search time. Using a Global Transposition Table (GTT) as storage has the advantage of greater capacity, which provides a larger search space; this capability stems from the GTT’s parallel nature, and the GTT is therefore expected to find solutions faster. A Global Transposition Table is a collection of transposition tables inside one larger transposition table; it can be likened to a folder with many subfolders, where each subfolder holds files of the same type sharing the same filename prefix. By combining a search algorithm with a GTT in Nine Men’s Morris, this study aims to determine the performance of the NegaScout algorithm and the additional effect of using a GTT in the puzzle game Nine Men’s Morris.
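The "collection of transposition tables inside one larger table" structure described above can be sketched as a hash-partitioned table with one lock per partition, so parallel searchers mostly contend on different sub-tables. This is a generic illustration, not the paper's implementation; the class and method names are assumptions, and real entries would also carry bound types (exact/lower/upper).

```python
import threading

class GlobalTranspositionTable:
    """Sketch of a global transposition table: several sub-tables inside
    one larger table, partitioned by a hash of the position."""

    def __init__(self, num_subtables=16):
        self.tables = [{} for _ in range(num_subtables)]
        self.locks = [threading.Lock() for _ in range(num_subtables)]

    def _index(self, key):
        # The hash of the position picks the sub-table (the "subfolder").
        return hash(key) % len(self.tables)

    def store(self, key, depth, score):
        i = self._index(key)
        with self.locks[i]:
            self.tables[i][key] = (depth, score)

    def probe(self, key, depth):
        """Return a stored score only if the entry was searched at least
        as deeply as the current request; otherwise None."""
        i = self._index(key)
        with self.locks[i]:
            entry = self.tables[i].get(key)
        if entry is not None and entry[0] >= depth:
            return entry[1]
        return None
```

A NegaScout search would probe before expanding a node and store after evaluating it; partitioning the locks is what lets many threads use one logical table without serializing on a single lock.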
The distributed ASCI supercomputer project
The Distributed ASCI Supercomputer (DAS) is a homogeneous wide-area distributed system consisting of four cluster computers at different locations. DAS has been used for research on communication software, parallel languages and programming systems, schedulers, parallel applications, and distributed applications. The paper gives a preview of the most interesting research results obtained so far in the DAS project.