Adaptive Parallel Iterative Deepening Search
Many of the artificial intelligence techniques developed to date rely on
heuristic search through large spaces. Unfortunately, the size of these spaces
and the corresponding computational effort reduce the applicability of
otherwise novel and effective algorithms. A number of parallel and distributed
approaches to search have considerably improved the performance of the search
process. Our goal is to develop an architecture that automatically selects
parallel search strategies for optimal performance on a variety of search
problems. In this paper we describe one such architecture realized in the
Eureka system, which combines the benefits of many different approaches to
parallel heuristic search. Through empirical and theoretical analyses we
observe that features of the problem space directly affect the choice of
optimal parallel search strategy. We then employ machine learning techniques to
select the optimal parallel search strategy for a given problem space. When a
new search task is input to the system, Eureka uses features describing the
search space and the chosen architecture to automatically select the
appropriate search strategy. Eureka has been tested on a MIMD parallel
processor, a distributed network of workstations, and a single workstation
using multithreading. Results generated from fifteen puzzle problems, robot arm
motion problems, artificial search spaces, and planning problems indicate that
Eureka outperforms any single one of the tested strategies applied exclusively
across all problem instances, and is able to greatly reduce the search time for
these applications.
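The iterative-deepening search underlying the strategies Eureka chooses among can be sketched as follows. This is a minimal illustrative example, not code from the Eureka system; the graph, function names, and depth bound are hypothetical:

```python
def depth_limited_search(graph, node, goal, limit, path):
    # Depth-limited DFS: explore until the goal is found or the limit reached.
    if node == goal:
        return path
    if limit == 0:
        return None
    for child in graph.get(node, []):
        found = depth_limited_search(graph, child, goal, limit - 1, path + [child])
        if found is not None:
            return found
    return None

def iterative_deepening_search(graph, start, goal, max_depth=10):
    # Retry with successively larger depth limits; each pass revisits
    # shallow nodes, but the cost is dominated by the deepest level.
    for limit in range(max_depth + 1):
        result = depth_limited_search(graph, start, goal, limit, [start])
        if result is not None:
            return result
    return None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": ["F"], "F": []}
print(iterative_deepening_search(graph, "A", "F"))  # ['A', 'C', 'E', 'F']
```

Parallel variants differ mainly in how they partition the tree and distribute depth limits across processors, which is exactly the choice the abstract describes Eureka learning to make.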
Lock-in Problem for Parallel Rotor-router Walks
The rotor-router model, also called the Propp machine, was introduced as a
deterministic alternative to the random walk. In this model, a group of
identical tokens are initially placed at nodes of the graph. Each node
maintains a cyclic ordering of the outgoing arcs, and during consecutive turns
the tokens are propagated along arcs chosen according to this ordering in
round-robin fashion. The behavior of the model is fully deterministic. Yanovski
et al. (2003) proved that a single rotor-router walk on any graph with m edges
and diameter D stabilizes to a traversal of an Eulerian circuit on the set of
all 2m directed arcs on the edge set of the graph, and that such periodic
behaviour of the system is achieved after an initial transient phase of at most
2mD steps. The case of multiple parallel rotor-routers was studied
experimentally, leading Yanovski et al. to the conjecture that a system of
k > 1 parallel walks also stabilizes with a period of polynomial length. In
this work we disprove this conjecture, showing that the period of parallel
rotor-router walks can, in fact, be superpolynomial in the size of the graph.
On the positive side, we provide a characterization of the
periodic behavior of parallel rotor-router walks in terms of a structural
property of stable states called a subcycle decomposition. This property
provides us with the tools to efficiently detect whether a given system
configuration corresponds to the transient or to the limit behavior of the
system. Moreover, we provide polynomial upper bounds on the
number of steps it takes for the system to stabilize. Thus, we are able to
predict any future behavior of the system using an algorithm that takes
polynomial time and space. In addition, we show that there exists a separation
between the stabilization time of the single-walk and multiple-walk
rotor-router systems, and that for some graphs the latter can be asymptotically
larger even for the case of walks
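The single-walk rotor-router mechanism described above can be simulated in a few lines. This is an illustrative sketch, not the authors' code; the triangle graph and function name are made-up examples:

```python
def rotor_router_walk(adj, start, steps):
    # Each node keeps a rotor: an index into its fixed cyclic list of
    # outgoing arcs. On each visit the token leaves along the arc the
    # rotor points to, and the rotor advances in round-robin order.
    rotor = {v: 0 for v in adj}
    pos, visited_arcs = start, []
    for _ in range(steps):
        nxt = adj[pos][rotor[pos]]
        rotor[pos] = (rotor[pos] + 1) % len(adj[pos])
        visited_arcs.append((pos, nxt))
        pos = nxt
    return visited_arcs

# An undirected triangle: m = 3 edges viewed as 2m = 6 directed arcs.
adj = {0: [1, 2], 1: [2, 0], 2: [0, 1]}
arcs = rotor_router_walk(adj, 0, 12)
# The walk settles into an Eulerian circuit: each of the 6 directed arcs
# appears exactly once per period of length 2m = 6.
print(arcs[6:] == arcs[:6] and len(set(arcs[:6])) == 6)  # True
```

On this tiny graph the transient phase is empty and the walk is periodic from the first step; the paper's results concern how long the transient can be, and how the picture changes with multiple parallel tokens.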
Many-Task Computing and Blue Waters
This report discusses many-task computing (MTC) generically and in the
context of the proposed Blue Waters systems, which is planned to be the largest
NSF-funded supercomputer when it begins production use in 2012. The aim of this
report is to inform the BW project about MTC, including understanding aspects
of MTC applications that can be used to characterize the domain, and
understanding the implications of these aspects for middleware and policies.
Many MTC applications do not neatly fit the stereotypes of high-performance
computing (HPC) or high-throughput computing (HTC) applications. Like HTC
applications, by definition MTC applications are structured as graphs of
discrete tasks, with explicit input and output dependencies forming the graph
edges. However, MTC applications have significant features that distinguish
them from typical HTC applications. In particular, different engineering
constraints for hardware and software must be met in order to support these
applications. HTC applications have traditionally run on platforms such as
grids and clusters, through either workflow systems or parallel programming
systems. MTC applications, in contrast, will often demand a short time to
solution, may be communication intensive or data intensive, and may comprise
very short tasks. Therefore, hardware and software for MTC must be engineered
to support the additional communication and I/O and must minimize task dispatch
overheads. The hardware of large-scale HPC systems, with its high degree of
parallelism and support for intensive communication, is well suited for MTC
applications. However, HPC systems often lack a dynamic resource-provisioning
feature, are not ideal for task communication via the file system, and have an
I/O system that is not optimized for MTC-style applications. Hence, additional
software support is likely to be required to gain full benefit from the HPC
hardware.
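The task-graph structure the report describes, with discrete tasks and explicit dependencies driving a latency-sensitive dispatch path, can be sketched as follows. This is a hypothetical minimal dispatcher, not associated with Blue Waters or any actual MTC middleware:

```python
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def run_task_graph(tasks, deps, workers=4):
    # tasks: name -> callable; deps: name -> set of prerequisite task names.
    # Repeatedly dispatch every task whose prerequisites have completed;
    # MTC workloads with many short tasks stress exactly this dispatch path,
    # so real systems must keep its per-task overhead small.
    done, results, pending = set(), {}, {}  # pending: future -> task name
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while len(done) < len(tasks):
            for name, fn in tasks.items():
                if name not in done and name not in pending.values() \
                        and deps.get(name, set()) <= done:
                    pending[pool.submit(fn)] = name
            finished, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in finished:
                name = pending.pop(fut)
                results[name] = fut.result()
                done.add(name)
    return results

# A three-task graph: "c" depends on the completion of "a" and "b".
tasks = {"a": lambda: 1, "b": lambda: 2, "c": lambda: 3}
deps = {"c": {"a", "b"}}
print(sorted(run_task_graph(tasks, deps).items()))  # [('a', 1), ('b', 2), ('c', 3)]
```

A production dispatcher would add the features the report says HPC systems lack, such as dynamic resource provisioning and data-aware task placement, but the dependency-driven dispatch loop is the structural core.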
Engineering Parallel String Sorting
We discuss how string sorting algorithms can be parallelized on modern
multi-core shared memory machines. As a synthesis of the best sequential string
sorting algorithms and successful parallel sorting algorithms for atomic
objects, we first propose string sample sort. The algorithm makes effective use
of the memory hierarchy, uses additional word level parallelism, and largely
avoids branch mispredictions. Then we focus on NUMA architectures, and develop
parallel multiway LCP-merge and -mergesort to reduce the number of random
memory accesses to remote nodes. Additionally, we parallelize variants of
multikey quicksort and radix sort that are also useful in certain situations.
Comprehensive experiments on five current multi-core platforms are then
reported and discussed. The experiments show that our implementations scale
very well on real-world inputs and modern machines.Comment: 46 pages, extension of "Parallel String Sample Sort" arXiv:1305.115