564 research outputs found

    Space-Efficient Parallel Algorithms for Combinatorial Search Problems

    Get PDF
    We present space-efficient parallel strategies for two fundamental combinatorial search problems, namely, backtrack search and branch-and-bound, both involving the visit of an nn-node tree of height hh under the assumption that a node can be accessed only through its father or its children. For both problems we propose efficient algorithms that run on a pp-processor distributed-memory machine. For backtrack search, we give a deterministic algorithm running in O(n/p+hlog⁥p)O(n/p+h\log p) time, and a Las Vegas algorithm requiring optimal O(n/p+h)O(n/p+h) time, with high probability. Building on the backtrack search algorithm, we also derive a Las Vegas algorithm for branch-and-bound which runs in O((n/p+hlog⁥plog⁥n)hlog⁥2n)O((n/p+h\log p \log n)h\log^2 n) time, with high probability. A remarkable feature of our algorithms is the use of only constant space per processor, which constitutes a significant improvement upon previous algorithms whose space requirements per processor depend on the (possibly huge) tree to be explored.Comment: Extended version of the paper in the Proc. of 38th International Symposium on Mathematical Foundations of Computer Science (MFCS

    Partially ordered distributed computations on asynchronous point-to-point networks

    Full text link
    Asynchronous executions of a distributed algorithm differ from each other due to the nondeterminism in the order in which the messages exchanged are handled. In many situations of interest, the asynchronous executions induced by restricting nondeterminism are more efficient, in an application-specific sense, than the others. In this work, we define partially ordered executions of a distributed algorithm as the executions satisfying some restricted orders of their actions in two different frameworks, those of the so-called event- and pulse-driven computations. The aim of these restrictions is to characterize asynchronous executions that are likely to be more efficient for some important classes of applications. Also, an asynchronous algorithm that ensures the occurrence of partially ordered executions is given for each case. Two of the applications that we believe may benefit from the restricted nondeterminism are backtrack search, in the event-driven case, and iterative algorithms for systems of linear equations, in the pulse-driven case

    MASSIVELY PARALLEL ALGORITHMS FOR POINT CLOUD BASED OBJECT RECOGNITION ON HETEROGENEOUS ARCHITECTURE

    Get PDF
    With the advent of new commodity depth sensors, point cloud data processing plays an increasingly important role in object recognition and perception. However, the computational cost of point cloud data processing is extremely high due to the large data size, high dimensionality, and algorithmic complexity. To address the computational challenges of real-time processing, this work investigates the possibilities of using modern heterogeneous computing platforms and its supporting ecosystem such as massively parallel architecture (MPA), computing cluster, compute unified device architecture (CUDA), and multithreaded programming to accelerate the point cloud based object recognition. The aforementioned computing platforms would not yield high performance unless the specific features are properly utilized. Failing that the result actually produces an inferior performance. To achieve the high-speed performance in image descriptor computing, indexing, and matching in point cloud based object recognition, this work explores both coarse and fine grain level parallelism, identifies the acceptable levels of algorithmic approximation, and analyzes various performance impactors. A set of heterogeneous parallel algorithms are designed and implemented in this work. These algorithms include exact and approximate scalable massively parallel image descriptors for descriptor computing, parallel construction of k-dimensional tree (KD-tree) and the forest of KD-trees for descriptor indexing, parallel approximate nearest neighbor search (ANNS) and buffered ANNS (BANNS) on the KD-tree and the forest of KD-trees for descriptor matching. The results show that the proposed massively parallel algorithms on heterogeneous computing platforms can significantly improve the execution time performance of feature computing, indexing, and matching. Meanwhile, this work demonstrates that the heterogeneous computing architectures, with appropriate architecture specific algorithms design and optimization, have the distinct advantages of improving the performance of multimedia applications

    Towards better algorithms for parallel backtracking

    Get PDF
    Many algorithms in operations research and artificial intelligence are based on depth first search in implicitly defined trees. For parallelizing these algorithms, a load balancing scheme is needed which is able to evenly distribute parts of an irregularly shaped tree over the processors. It should work with minimal interprocessor communication and without prior knowledge of the tree\u27s shape. Previously known load balancing algorithms either require sending a message for each tree node or they only work efficiently for large search trees. This paper introduces new randomized dynamic load balancing algorithms for {\em tree structured computations}, a generalization of backtrack search.These algorithms only need to communicate when necessary and have an asymptotically optimal scalability for many important cases. They work work on hypercubes, butterflies, meshes and many other architectures

    Analysis of randomized load distribution for reproduction trees in linear arrays and rings

    Get PDF
    AbstractHigh performance computing requires high quality load distribution of processes of a parallel application over processors in a parallel computer at runtime such that both maximum load and dilation are minimized. The performance of a simple randomized load distribution algorithm that dynamically supports tree-structured parallel computations on two simple static networks, namely, linear arrays and rings, is analyzed in this paper. The algorithm spreads newly created tree nodes to neighboring processors, which actually provides randomized dilation-1 tree embedding in a static network. We develop linear systems of equations that characterize expected loads on all processors, and find their closed form solutions under the reproduction tree model, which can generate trees of arbitrary size and shape. The main contribution of the paper is to show that the above simple randomized algorithm is able to generate high-quality dynamic tree embeddings even in very simple and sparse networks such as linear arrays and rings. In particular, we prove that as tree size becomes large, the asymptotic performance ratio of such a randomized dilation-1 tree embedding is N/(N−1) in linear arrays and is optimal in rings

    Accelerating backtrack search with a best-first-search strategy

    Get PDF
    Backtrack-style exhaustive search algorithms for NP-hard problems tend to have large variance in their runtime. This is because ``fortunate'' branching decisions can lead to finding a solution quickly, whereas ``unfortunate'' decisions in another run can lead the algorithm to a region of the search space with no solutions. In the literature, frequent restarting has been suggested as a means to overcome this problem. In this paper, we propose a more sophisticated approach: a best-first-search heuristic to quickly move between parts of the search space, always concentrating on the most promising region. We describe how this idea can be efficiently incorporated into a backtrack search algorithm, without sacrificing optimality. Moreover, we demonstrate empirically that, for hard solvable problem instances, the new approach provides significantly higher speed-up than frequent restarting
    • 

    corecore