24,197 research outputs found

    Best-first heuristic search for multicore machines

    To harness modern multicore processors, it is imperative to develop parallel versions of fundamental algorithms. In this paper, we compare different approaches to parallel best-first search in a shared-memory setting. We present a new method, PBNF, that uses abstraction to partition the state space and to detect duplicate states without requiring frequent locking. PBNF allows speculative expansions when necessary to keep threads busy. We identify and fix potential livelock conditions in our approach, proving its correctness using temporal logic. Our approach is general, allowing it to extend easily to suboptimal and anytime heuristic search. In an empirical comparison on STRIPS planning, grid pathfinding, and sliding tile puzzle problems using 8-core machines, we show that A*, weighted A* and Anytime weighted A* implemented using PBNF yield faster search than improved versions of previous parallel search proposals

    Non-blocking Priority Queue based on Skiplists with Relaxed Semantics

    Priority queues are data structures that store information in an orderly fashion. They are of tremendous importance because they are an integral part of many applications, such as Dijkstra's shortest path algorithm, MST algorithms, and priority schedulers. Since priority queues by nature have high contention on the delete_min operation, the design of an efficient priority queue should involve an intelligent choice of the data structure as well as relaxation bounds on the data structure. Lock-free data structures provide higher scalability and stronger progress guarantees than lock-based data structures, which is another factor to be considered in the priority queue design. We present a relaxed non-blocking priority queue based on skiplists that addresses all of these design issues. Use of skiplists allows multiple threads to concurrently access different parts of the skiplist quickly, whereas relaxing the priority queue delete_min operation distributes contention over the skiplist instead of concentrating it at the front. Furthermore, a non-blocking implementation guarantees that the system will make progress even when some process fails. Our priority queue is internally composed of several priority queues, one for each thread and one shared priority queue common to all threads. Each thread selects the best value from its local priority queue and the shared priority queue and returns the value. In case a thread is unable to delete an item, it tries to spy items from other threads' local priority queues. We experimentally and theoretically show the correctness of our data structure. We also compare the performance of our data structure with other variations like priority queues based on coarse-grained skiplists for both relaxed and non-relaxed semantics.
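
    The local-plus-shared layout described in this abstract can be illustrated with a small sequential sketch. This is not the authors' implementation: it uses binary heaps instead of lock-free skiplists, involves no real concurrency, and the class name, the `local_capacity` spill threshold, and the spy order are all assumptions made for illustration.

```python
import heapq


class RelaxedPriorityQueue:
    """Sequential sketch: one local heap per thread plus one shared heap.

    Assumption: a key goes to the caller's local heap until that heap holds
    `local_capacity` items, after which new keys go to the shared heap.
    """

    def __init__(self, num_threads, local_capacity=4):
        self.local = [[] for _ in range(num_threads)]
        self.shared = []
        self.local_capacity = local_capacity  # hypothetical spill threshold

    def insert(self, tid, key):
        if len(self.local[tid]) < self.local_capacity:
            heapq.heappush(self.local[tid], key)
        else:
            heapq.heappush(self.shared, key)

    def delete_min(self, tid):
        """Return the smaller of the local and shared minima.

        The result may not be the global minimum, because other threads'
        local heaps are not consulted unless both candidates are empty.
        """
        own, shared = self.local[tid], self.shared
        if own and (not shared or own[0] <= shared[0]):
            return heapq.heappop(own)
        if shared:
            return heapq.heappop(shared)
        # Nothing local or shared: "spy" an item from another thread's heap.
        for other, heap in enumerate(self.local):
            if other != tid and heap:
                return heapq.heappop(heap)
        return None  # every heap is empty


q = RelaxedPriorityQueue(num_threads=2, local_capacity=2)
for key in (5, 1, 9):
    q.insert(0, key)          # 5 and 1 stay local to thread 0, 9 spills to shared
q.insert(1, 3)
first = q.delete_min(1)       # thread 1 sees only its own heap and the shared heap
```

    In this run thread 1 receives 3 even though thread 0 still holds the smaller key 1, which is the kind of relaxation the abstract trades for reduced contention at the front of the queue.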

    Fast Parallel Operations on Search Trees

    Using (a,b)-trees as an example, we show how to perform a parallel split with logarithmic latency and parallel join, bulk updates, intersection, union (or merge), and (symmetric) set difference with logarithmic latency and with information-theoretically optimal work. We present both asymptotically optimal solutions and simplified versions that perform well in practice; the latter are several times faster than previous implementations.

    A parallel edge orientation algorithm for quadrilateral meshes

    One approach to achieving correct finite element assembly is to ensure that the local orientation of facets relative to each cell in the mesh is consistent with the global orientation of that facet. Rognes et al. have shown how to achieve this for any mesh composed of simplex elements, and deal.II contains a serial algorithm to construct a consistent orientation of any quadrilateral mesh of an orientable manifold. The core contribution of this paper is the extension of this algorithm for distributed memory parallel computers, which facilitates its seamless application as part of a parallel simulation system. Furthermore, our analysis establishes a link between the well-known Union-Find algorithm and the construction of a consistent orientation of a quadrilateral mesh. As a result, existing work on the parallelisation of the Union-Find algorithm can be easily adapted to construct further parallel algorithms for mesh orientations.
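
    One way to picture the link to Union-Find mentioned in this abstract is to group the edges of a quadrilateral mesh into classes of opposite edges, since opposite edges of a quad must be oriented together. The serial sketch below is only an illustration under that assumption and is not the paper's algorithm: the function name `parallel_edge_classes`, the vertex-tuple mesh format, and the omission of the actual direction assignment are all choices made here.

```python
class UnionFind:
    """Union-Find with path compression and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: point every node on the path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True


def edge(u, v):
    """Undirected edge stored as a sorted vertex pair."""
    return (u, v) if u < v else (v, u)


def parallel_edge_classes(quads):
    """Group edges so that opposite edges of every quad share a class.

    Each quad is a tuple (v0, v1, v2, v3) of vertex ids in cyclic order
    (an assumed input format).
    """
    uf = UnionFind()
    for v0, v1, v2, v3 in quads:
        uf.union(edge(v0, v1), edge(v3, v2))  # first pair of opposite edges
        uf.union(edge(v1, v2), edge(v0, v3))  # second pair of opposite edges
    groups = {}
    for e in list(uf.parent):
        groups.setdefault(uf.find(e), []).append(e)
    return sorted(sorted(members) for members in groups.values())


# Two quads sharing the edge (1, 2).
classes = parallel_edge_classes([(0, 1, 2, 3), (1, 4, 5, 2)])
```

    Here the shared edge (1, 2) ties edges (0, 3) and (4, 5) into one class, so a direction chosen for any one of them would have to propagate to the other two.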

    The Energy Complexity of Broadcast

    Energy is often the most constrained resource in networks of battery-powered devices, and as devices become smaller, they spend a larger fraction of their energy on communication (transceiver usage) not computation. As an imperfect proxy for true energy usage, we define energy complexity to be the number of time slots a device transmits/listens; idle time and computation are free. In this paper we investigate the energy complexity of fundamental communication primitives such as broadcast in multi-hop radio networks. We consider models with collision detection (CD) and without (No-CD), as well as both randomized and deterministic algorithms. Some take-away messages from this work include: 1. The energy complexity of broadcast in a multi-hop network is intimately connected to the time complexity of leader election in a single-hop (clique) network. Many existing lower bounds on time complexity immediately transfer to energy complexity. For example, in the CD and No-CD models, we need $\Omega(\log n)$ and $\Omega(\log^2 n)$ energy, respectively. 2. The energy lower bounds above can almost be achieved, given sufficient ($\Omega(n)$) time. In the CD and No-CD models we can solve broadcast using $O(\frac{\log n\log\log n}{\log\log\log n})$ energy and $O(\log^3 n)$ energy, respectively. 3. The complexity measures of Energy and Time are in conflict, and it is an open problem whether both can be minimized simultaneously. We give a tradeoff showing it is possible to be nearly optimal in both measures simultaneously. For any constant $\epsilon>0$, broadcast can be solved in $O(D^{1+\epsilon}\log^{O(1/\epsilon)} n)$ time with $O(\log^{O(1/\epsilon)} n)$ energy, where $D$ is the diameter of the network.