A hybrid CPU-GPU parallelization scheme of variable neighborhood search for inventory optimization problems
In this paper, we study various parallelization schemes for the Variable
Neighborhood Search (VNS) metaheuristic on a CPU-GPU system via OpenMP and
OpenACC. A hybrid parallel VNS method is applied to recent benchmark problem
instances for the multi-product dynamic lot sizing problem with product returns
and recovery, which appears in reverse logistics and is known to be NP-hard. We
report our findings regarding these parallelization approaches and present
promising computational results.
Comment: 8 pages, 1 figure
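As a rough illustration of the metaheuristic named in this abstract, here is a minimal sequential sketch of a basic VNS loop (shake in neighborhood k, run local search, restart at k = 1 on improvement). It is not the paper's hybrid OpenMP/OpenACC scheme and does not model the lot sizing problem: the bit-vector encoding, `toy_cost`, `WEIGHTS`, and `TARGET` are hypothetical stand-ins chosen only to keep the example runnable.

```python
import random

def local_search(cost, x):
    """First-improvement descent over single-bit flips."""
    x = list(x)
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            before = cost(x)
            x[i] = 1 - x[i]              # try flipping bit i
            if cost(x) < before:
                improved = True          # keep the improving flip
            else:
                x[i] = 1 - x[i]          # undo a non-improving flip
    return x

def shake(x, k, rng):
    """Random point in the k-th neighborhood: flip k distinct bits."""
    y = list(x)
    for i in rng.sample(range(len(y)), k):
        y[i] = 1 - y[i]
    return y

def vns(cost, x0, k_max, max_rounds, seed=0):
    """Basic VNS: widen the neighborhood until an improvement is found."""
    rng = random.Random(seed)
    best = local_search(cost, x0)
    best_cost = cost(best)
    for _ in range(max_rounds):
        k = 1
        while k <= k_max:
            candidate = local_search(cost, shake(best, k, rng))
            candidate_cost = cost(candidate)
            if candidate_cost < best_cost:
                best, best_cost = candidate, candidate_cost
                k = 1                    # improvement: back to the smallest neighborhood
            else:
                k += 1                   # no improvement: try a larger neighborhood
    return best, best_cost

# Hypothetical toy objective (an assumption, not from the paper):
# choose items whose weights sum as close as possible to TARGET.
WEIGHTS = [3, 5, 7, 11, 13]
TARGET = 24

def toy_cost(x):
    return abs(sum(w * b for w, b in zip(WEIGHTS, x)) - TARGET)

if __name__ == "__main__":
    print(vns(toy_cost, [0] * len(WEIGHTS), k_max=3, max_rounds=20))
```

In a parallel variant, the independent shake-and-search candidates are the natural unit of work to distribute, although how the paper divides that work between CPU and GPU is not stated in the abstract.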
HyP-DESPOT: A Hybrid Parallel Algorithm for Online Planning under Uncertainty
Planning under uncertainty is critical for robust robot performance in
uncertain, dynamic environments, but it incurs high computational cost.
State-of-the-art online search algorithms, such as DESPOT, have vastly improved
the computational efficiency of planning under uncertainty and made it a
valuable tool for robotics in practice. This work goes one step further by
leveraging both CPU and GPU parallelization in order to achieve near real-time
online planning performance for complex tasks with large state, action, and
observation spaces. Specifically, we propose Hybrid Parallel DESPOT
(HyP-DESPOT), a massively parallel online planning algorithm that integrates
CPU and GPU parallelism in a multi-level scheme. It performs parallel DESPOT
tree search by simultaneously traversing multiple independent paths using
multi-core CPUs and performs parallel Monte-Carlo simulations at the leaf nodes
of the search tree using GPUs. Experimental results show that HyP-DESPOT speeds
up online planning by up to several hundred times, compared with the original
DESPOT algorithm, in several challenging robotic tasks in simulation.
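The leaf-level step described above, running many Monte-Carlo simulations in parallel and averaging them, can be sketched in a few lines. This is only an illustration under loud assumptions: a thread pool stands in for the GPU, the one-dimensional random-walk model and the reward `-abs(state)` are hypothetical, and nothing here reproduces the HyP-DESPOT tree search itself.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def rollout(state, scenario_seed, horizon, gamma=0.95):
    """One simulated trajectory under a fixed random scenario (toy model)."""
    rng = random.Random(scenario_seed)    # the seed fixes the scenario's randomness
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        state += rng.choice((-1, 1))      # hypothetical dynamics: random step
        total += discount * (-abs(state)) # hypothetical reward: stay near zero
        discount *= gamma
    return total

def estimate_leaf_value(state, scenario_seeds, horizon, workers=4):
    """Average the discounted returns of independent rollouts run concurrently."""
    seeds = list(scenario_seeds)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the result is reproducible.
        returns = list(pool.map(lambda s: rollout(state, s, horizon), seeds))
    return sum(returns) / len(returns)

if __name__ == "__main__":
    print(estimate_leaf_value(state=0, scenario_seeds=range(8), horizon=5))
```

The rollouts share no state, which is what makes this step easy to spread across many workers.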
GraphVite: A High-Performance CPU-GPU Hybrid System for Node Embedding
Learning continuous representations of nodes has recently attracted growing
interest in both academia and industry, due to their simplicity and
effectiveness in a variety of applications. Most existing node embedding
algorithms and systems are capable of processing networks with hundreds of
thousands or a few million nodes. However, how to scale them to networks
that have tens of millions or even hundreds of millions of nodes remains a
challenging problem. In this paper, we propose GraphVite, a high-performance
CPU-GPU hybrid system for training node embeddings, by co-optimizing the
algorithm and the system. On the CPU end, augmented edge samples are generated
in parallel by online random walks on the network, and serve as the
training data. On the GPU end, a novel parallel negative sampling is proposed
to leverage multiple GPUs to train node embeddings simultaneously, without much
data transfer and synchronization. Moreover, an efficient collaboration
strategy is proposed to further reduce the synchronization cost between CPUs
and GPUs. Experiments on multiple real-world networks show that GraphVite is
highly efficient. It takes only about one minute for a network with 1 million
nodes and 5 million edges on a single machine with 4 GPUs, and takes around 20
hours for a network with 66 million nodes and 1.8 billion edges. Compared to
the current fastest system, GraphVite is about 50 times faster without any
sacrifice in performance.
Comment: accepted at WWW 201
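To make the CPU-side idea of generating augmented edge samples from random walks more concrete, the sketch below walks a small graph and emits every pair of nodes that lie within a fixed distance along the walk. It is an assumption-laden illustration, not GraphVite's implementation: the function names, the `max_distance` parameter, and the adjacency-list format are all hypothetical, and the GPU-side negative sampling is not shown.

```python
import random

def random_walk(adj, start, length, rng):
    """Uniform random walk of up to `length` steps over an adjacency list."""
    walk = [start]
    for _ in range(length):
        neighbors = adj[walk[-1]]
        if not neighbors:                 # dead end: stop early
            break
        walk.append(rng.choice(neighbors))
    return walk

def augmented_edges(walk, max_distance):
    """Pair each node with the nodes at most `max_distance` steps ahead of it."""
    pairs = []
    for i, u in enumerate(walk):
        last = min(i + max_distance, len(walk) - 1)
        for j in range(i + 1, last + 1):
            pairs.append((u, walk[j]))
    return pairs

if __name__ == "__main__":
    # Hypothetical toy graph: an undirected path 0 - 1 - 2 - 3.
    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    rng = random.Random(0)
    walk = random_walk(adj, start=0, length=5, rng=rng)
    print(walk, augmented_edges(walk, max_distance=2))
```

Pairs at distance greater than one are not edges of the original graph, which is the sense in which the samples are "augmented".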
Mapping parallel programs to heterogeneous CPU/GPU architectures using a Monte Carlo Tree Search
The single-core processor, which dominated for over 30 years, is now obsolete, and recent trends towards parallel systems demand a huge shift in programming techniques and practices. Moreover, we are rapidly moving towards an age where almost all programming will target parallel systems. Parallel hardware is rapidly evolving, with large heterogeneous systems, typically comprising a mixture of CPUs and GPUs, becoming the mainstream. With this increasing heterogeneity comes increasing complexity: not only does the programmer have to worry about where and how to express the parallelism, they must also express an efficient mapping of resources to the available system. This generally requires in-depth expert knowledge that most application programmers do not have. In this paper we describe a new technique that automatically derives optimal mappings for an application onto a heterogeneous architecture, using a Monte Carlo Tree Search (MCTS) algorithm. Our technique exploits high-level design patterns, targeting a set of well-specified parallel skeletons. We demonstrate that, on a convolution example, our MCTS technique obtains speedups within 5% of those achieved by a hand-tuned version of the same application.
Postprint
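The general idea of searching over device mappings with Monte Carlo Tree Search can be sketched as follows. This is a generic UCT loop under stated assumptions, not the paper's technique: the pipeline of three stages, the per-stage timings, the transfer penalty in `toy_runtime`, and the reward `1 / (1 + runtime)` are all hypothetical, and no parallel skeletons are modelled.

```python
import math
import random

class Node:
    def __init__(self, assignment):
        self.assignment = assignment      # devices chosen for the first stages
        self.children = {}                # device -> child Node
        self.visits = 0
        self.total_reward = 0.0

def mcts_mapping(n_stages, devices, runtime, iterations, seed=0, c=1.4):
    """Search for a low-runtime assignment of pipeline stages to devices."""
    rng = random.Random(seed)
    root = Node(())
    best, best_time = None, float("inf")
    for _ in range(iterations):
        node, path = root, [root]
        # Selection and expansion: descend by UCB1 until a new child is added.
        while len(node.assignment) < n_stages:
            untried = [d for d in devices if d not in node.children]
            if untried:
                d = rng.choice(untried)
                child = Node(node.assignment + (d,))
                node.children[d] = child
                node = child
                path.append(node)
                break
            parent = node
            node = max(
                parent.children.values(),
                key=lambda ch: ch.total_reward / ch.visits
                + c * math.sqrt(math.log(parent.visits) / ch.visits),
            )
            path.append(node)
        # Rollout: complete the partial mapping with random device choices.
        mapping = list(node.assignment)
        while len(mapping) < n_stages:
            mapping.append(rng.choice(devices))
        mapping = tuple(mapping)
        t = runtime(mapping)
        if t < best_time:
            best, best_time = mapping, t
        # Backpropagation: shorter runtimes yield larger rewards.
        reward = 1.0 / (1.0 + t)
        for visited in path:
            visited.visits += 1
            visited.total_reward += reward
    return best, best_time

def toy_runtime(mapping):
    """Hypothetical cost model: per-stage times plus a CPU/GPU transfer penalty."""
    cpu, gpu = (4.0, 6.0, 3.0), (5.0, 1.0, 2.0)
    compute = sum(cpu[i] if d == "cpu" else gpu[i] for i, d in enumerate(mapping))
    transfers = sum(1 for a, b in zip(mapping, mapping[1:]) if a != b)
    return compute + 1.5 * transfers

if __name__ == "__main__":
    print(mcts_mapping(3, ("cpu", "gpu"), toy_runtime, iterations=200))
```

In a real setting the `runtime` callback would be a measurement or a performance model of the application, which is where most of the cost of such a search would lie.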
09491 Abstracts Collection -- Graph Search Engineering
From 29 November to 4 December 2009, the Dagstuhl Seminar
09491 "Graph Search Engineering" was held
at Schloss Dagstuhl – Leibniz Center for Informatics.
During the seminar, several participants presented their current
research, and ongoing work and open problems were discussed. Abstracts of
the presentations given during the seminar as well as abstracts of
seminar results and ideas are put together in this paper. The first section
describes the seminar topics and goals in general.
Links to extended abstracts or full papers are provided, if available.