Space-Efficient Parallel Algorithms for Combinatorial Search Problems
We present space-efficient parallel strategies for two fundamental
combinatorial search problems, namely, backtrack search and branch-and-bound,
both involving the visit of an -node tree of height under the assumption
that a node can be accessed only through its father or its children. For both
problems we propose efficient algorithms that run on a -processor
distributed-memory machine. For backtrack search, we give a deterministic
algorithm running in time, and a Las Vegas algorithm requiring
optimal time, with high probability. Building on the backtrack
search algorithm, we also derive a Las Vegas algorithm for branch-and-bound
which runs in time, with high probability. A
remarkable feature of our algorithms is the use of only constant space per
processor, which constitutes a significant improvement upon previous algorithms
whose space requirements per processor depend on the (possibly huge) tree to be
explored.
Comment: Extended version of the paper in the Proc. of 38th International
Symposium on Mathematical Foundations of Computer Science (MFCS
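The access model in the abstract above (a node reachable only through its father or its children) can be illustrated with a small sequential sketch. This is not the paper's parallel algorithm; it only shows, under stated assumptions, how a tree can be visited while keeping a constant amount of traversal state. The `Node` class, its `index` field (position among the father's children), and the function name are hypothetical and introduced for illustration only.

```python
class Node:
    """Tree node reachable only via its parent or its children (hypothetical helper)."""

    def __init__(self, label):
        self.label = label
        self.parent = None
        self.children = []
        self.index = 0  # position of this node among its parent's children

    def add(self, child):
        child.parent = self
        child.index = len(self.children)
        self.children.append(child)
        return child


def visit_constant_space(root):
    """Yield labels in preorder using only one node reference as traversal state."""
    node = root
    while True:
        yield node.label
        if node.children:
            # Descend to the first child.
            node = node.children[0]
            continue
        # Leaf: climb while the current node is the last child of its parent.
        while node is not root and node.index + 1 == len(node.parent.children):
            node = node.parent
        if node is root:
            return  # every subtree has been visited
        # Move to the next sibling.
        node = node.parent.children[node.index + 1]
```

No explicit stack or visited set is kept: the next node is always derived from the current one by following a father or child link.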
Partially ordered distributed computations on asynchronous point-to-point networks
Asynchronous executions of a distributed algorithm differ from each other due
to the nondeterminism in the order in which the messages exchanged are handled.
In many situations of interest, the asynchronous executions induced by
restricting nondeterminism are more efficient, in an application-specific
sense, than the others. In this work, we define partially ordered executions of
a distributed algorithm as the executions satisfying some restricted orders of
their actions in two different frameworks, those of the so-called event- and
pulse-driven computations. The aim of these restrictions is to characterize
asynchronous executions that are likely to be more efficient for some important
classes of applications. Also, an asynchronous algorithm that ensures the
occurrence of partially ordered executions is given for each case. Two of the
applications that we believe may benefit from the restricted nondeterminism are
backtrack search, in the event-driven case, and iterative algorithms for
systems of linear equations, in the pulse-driven case.
MASSIVELY PARALLEL ALGORITHMS FOR POINT CLOUD BASED OBJECT RECOGNITION ON HETEROGENEOUS ARCHITECTURE
With the advent of new commodity depth sensors, point cloud data processing plays an increasingly important role in object recognition and perception. However, the computational cost of point cloud data processing is extremely high due to the large data size, high dimensionality, and algorithmic complexity. To address the computational challenges of real-time processing, this work investigates the use of modern heterogeneous computing platforms and their supporting ecosystem, such as massively parallel architecture (MPA), computing clusters, compute unified device architecture (CUDA), and multithreaded programming, to accelerate point cloud based object recognition. These computing platforms do not yield high performance unless their specific features are properly utilized; otherwise, performance can actually be inferior. To achieve high-speed image descriptor computing, indexing, and matching in point cloud based object recognition, this work explores both coarse- and fine-grained parallelism, identifies the acceptable levels of algorithmic approximation, and analyzes various factors that affect performance. A set of heterogeneous parallel algorithms is designed and implemented in this work. These algorithms include exact and approximate scalable massively parallel image descriptors for descriptor computing, parallel construction of the k-dimensional tree (KD-tree) and the forest of KD-trees for descriptor indexing, and parallel approximate nearest neighbor search (ANNS) and buffered ANNS (BANNS) on the KD-tree and the forest of KD-trees for descriptor matching. The results show that the proposed massively parallel algorithms on heterogeneous computing platforms can significantly improve the execution time of feature computing, indexing, and matching.
Meanwhile, this work demonstrates that heterogeneous computing architectures, with appropriate architecture-specific algorithm design and optimization, have distinct advantages in improving the performance of multimedia applications.
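For readers unfamiliar with the indexing structure named in the abstract above, the following is a minimal sequential sketch of KD-tree construction and exact nearest neighbor search. It is not the parallel, approximate, or buffered variant the work describes; the function names and the dictionary-based node layout are assumptions made for illustration.

```python
import math


def build_kdtree(points, depth=0):
    """Build a KD-tree from a list of equal-length tuples (illustrative layout)."""
    if not points:
        return None
    axis = depth % len(points[0])  # cycle through the coordinates
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2  # the median point becomes the splitting node
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kdtree(pts[:mid], depth + 1),
        "right": build_kdtree(pts[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Return (distance, point) of the exact nearest neighbor of query."""
    if node is None:
        return best
    d = math.dist(node["point"], query)
    if best is None or d < best[0]:
        best = (d, node["point"])
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (
        (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    )
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best
```

Approximate search schemes typically trade accuracy for speed by limiting how often the far side is explored; that bound is deliberately left out of this sketch.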
Towards better algorithms for parallel backtracking
Many algorithms in operations research and artificial intelligence
are based on depth first search in implicitly defined trees.
For parallelizing these algorithms, a load balancing scheme is
needed which is able to evenly distribute parts of an irregularly
shaped tree over the processors. It should work with minimal
interprocessor communication and without prior knowledge of the
tree's shape.
Previously known load balancing algorithms either require sending a
message for each tree node or they only work efficiently for large
search trees. This paper introduces new randomized dynamic load
balancing algorithms for tree structured computations, a
generalization of backtrack search. These algorithms only need to
communicate when necessary and have an asymptotically optimal
scalability for many important cases.
They work on hypercubes, butterflies, meshes and many other
architectures.
Analysis of randomized load distribution for reproduction trees in linear arrays and rings
High performance computing requires high quality load distribution of processes of a parallel application over processors in a parallel computer at runtime such that both maximum load and dilation are minimized. The performance of a simple randomized load distribution algorithm that dynamically supports tree-structured parallel computations on two simple static networks, namely, linear arrays and rings, is analyzed in this paper. The algorithm spreads newly created tree nodes to neighboring processors, which actually provides randomized dilation-1 tree embedding in a static network. We develop linear systems of equations that characterize expected loads on all processors, and find their closed form solutions under the reproduction tree model, which can generate trees of arbitrary size and shape. The main contribution of the paper is to show that the above simple randomized algorithm is able to generate high-quality dynamic tree embeddings even in very simple and sparse networks such as linear arrays and rings. In particular, we prove that as tree size becomes large, the asymptotic performance ratio of such a randomized dilation-1 tree embedding is N/(N−1) in linear arrays and is optimal in rings.
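The idea of spreading newly created tree nodes to neighboring processors can be explored with a small simulation. The sketch below rests on assumptions not stated in the abstract: each new node goes to the left or right ring neighbor of its parent's processor with equal probability, and the tree is grown by a simple random process (one to `max_children` children per node) rather than the paper's reproduction tree model. The function and parameter names are hypothetical.

```python
import random
from collections import deque


def embed_random_tree_on_ring(num_procs, num_nodes, max_children=2, seed=0):
    """Return per-processor loads after a randomized dilation-1 embedding on a ring."""
    rng = random.Random(seed)
    loads = [0] * num_procs
    loads[0] = 1  # assumption: the root is placed on processor 0
    frontier = deque([0])  # processors hosting nodes that have not been expanded yet
    created = 1
    while frontier and created < num_nodes:
        proc = frontier.popleft()
        for _ in range(rng.randint(1, max_children)):
            if created == num_nodes:
                break
            # A child is placed on a neighbor of its parent, so each tree edge
            # maps to a single ring link (dilation 1).
            target = (proc + rng.choice((-1, 1))) % num_procs
            loads[target] += 1
            frontier.append(target)
            created += 1
    return loads
```

Comparing `max(loads)` with `num_nodes / num_procs` for growing trees gives an empirical feel for the load balance, although it does not reproduce the closed form analysis described above.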
Accelerating backtrack search with a best-first-search strategy
Backtrack-style exhaustive search algorithms for NP-hard problems
tend to have large variance in their runtime. This is because "fortunate"
branching decisions can lead to finding a solution quickly, whereas
"unfortunate" decisions in another run can lead the algorithm to
a region of the search space with no solutions. In the literature,
frequent restarting has been suggested as a means to overcome this
problem.
In this paper, we propose a more sophisticated approach: a best-first-search heuristic
to quickly move between parts of the search space, always concentrating
on the most promising region. We describe how this idea can be efficiently
incorporated into a backtrack search algorithm, without
sacrificing optimality. Moreover, we demonstrate
empirically that, for hard solvable problem instances, the new approach
provides significantly higher speed-up than frequent restarting.