12,687 research outputs found

    Parallelizing RRT on distributed-memory architectures

    Get PDF
    This paper addresses the problem of improving the performance of the Rapidly-exploring Random Tree (RRT) algorithm by parallelizing it. For scalability reasons we do so on a distributed-memory architecture, using the message-passing paradigm. We present three parallel versions of RRT along with the technicalities involved in their implementation. We also evaluate the algorithms and study how they behave on different motion planning problems

    Parallelizing RRT on large-scale distributed-memory architectures

    Get PDF
    This paper addresses the problem of parallelizing the Rapidly-exploring Random Tree (RRT) algorithm on large-scale distributed-memory architectures, using the Message Passing Interface. We compare three parallel versions of RRT based on classical parallelization schemes. We evaluate them on different motion planning problems and analyze the various factors influencing their performance

    A Time and Space Efficient Junction Tree Architecture

    Full text link
    The junction tree algorithm is a way of computing marginals of boolean multivariate probability distributions that factorise over sets of random variables. The junction tree algorithm first constructs a tree called a junction tree who's vertices are sets of random variables. The algorithm then performs a generalised version of belief propagation on the junction tree. The Shafer-Shenoy and Hugin architectures are two ways to perform this belief propagation that tradeoff time and space complexities in different ways: Hugin propagation is at least as fast as Shafer-Shenoy propagation and in the cases that we have large vertices of high degree is significantly faster. However, this speed increase comes at the cost of an increased space complexity. This paper first introduces a simple novel architecture, ARCH-1, which has the best of both worlds: the speed of Hugin propagation and the low space requirements of Shafer-Shenoy propagation. A more complicated novel architecture, ARCH-2, is then introduced which has, up to a factor only linear in the maximum cardinality of any vertex, time and space complexities at least as good as ARCH-1 and in the cases that we have large vertices of high degree is significantly faster than ARCH-1

    On the design and implementation of broadcast and global combine operations using the postal model

    Get PDF
    There are a number of models that were proposed in recent years for message passing parallel systems. Examples are the postal model and its generalization the LogP model. In the postal model a parameter λ is used to model the communication latency of the message-passing system. Each node during each round can send a fixed-size message and, simultaneously, receive a message of the same size. Furthermore, a message sent out during round r will incur a latency of hand will arrive at the receiving node at round r + λ - 1. Our goal in this paper is to bridge the gap between the theoretical modeling and the practical implementation. In particular, we investigate a number of practical issues related to the design and implementation of two collective communication operations, namely, the broadcast operation and the global combine operation. Those practical issues include, for example, 1) techniques for measurement of the value of λ on a given machine, 2) creating efficient broadcast algorithms that get the latency hand the number of nodes n as parameters and 3) creating efficient global combine algorithms for parallel machines with λ which is not an integer. We propose solutions that address those practical issues and present results of an experimental study of the new algorithms on the Intel Delta machine. Our main conclusion is that the postal model can help in performance prediction and tuning, for example, a properly tuned broadcast improves the known implementation by more than 20%

    A system for routing arbitrary directed graphs on SIMD architectures

    Get PDF
    There are many problems which can be described in terms of directed graphs that contain a large number of vertices where simple computations occur using data from connecting vertices. A method is given for parallelizing such problems on an SIMD machine model that is bit-serial and uses only nearest neighbor connections for communication. Each vertex of the graph will be assigned to a processor in the machine. Algorithms are given that will be used to implement movement of data along the arcs of the graph. This architecture and algorithms define a system that is relatively simple to build and can do graph processing. All arcs can be transversed in parallel in time O(T), where T is empirically proportional to the diameter of the interconnection network times the average degree of the graph. Modifying or adding a new arc takes the same time as parallel traversal

    CCL: a portable and tunable collective communication library for scalable parallel computers

    Get PDF
    A collective communication library for parallel computers includes frequently used operations such as broadcast, reduce, scatter, gather, concatenate, synchronize, and shift. Such a library provides users with a convenient programming interface, efficient communication operations, and the advantage of portability. A library of this nature, the Collective Communication Library (CCL), intended for the line of scalable parallel computer products by IBM, has been designed. CCL is part of the parallel application programming interface of the recently announced IBM 9076 Scalable POWERparallel System 1 (SP1). In this paper, we examine several issues related to the functionality, correctness, and performance of a portable collective communication library while focusing on three novel aspects in the design and implementation of CCL: 1) the introduction of process groups, 2) the definition of semantics that ensures correctness, and 3) the design of new and tunable algorithms based on a realistic point-to-point communication model

    Investigation of the applicability of a functional programming model to fault-tolerant parallel processing for knowledge-based systems

    Get PDF
    In a fault-tolerant parallel computer, a functional programming model can facilitate distributed checkpointing, error recovery, load balancing, and graceful degradation. Such a model has been implemented on the Draper Fault-Tolerant Parallel Processor (FTPP). When used in conjunction with the FTPP's fault detection and masking capabilities, this implementation results in a graceful degradation of system performance after faults. Three graceful degradation algorithms have been implemented and are presented. A user interface has been implemented which requires minimal cognitive overhead by the application programmer, masking such complexities as the system's redundancy, distributed nature, variable complement of processing resources, load balancing, fault occurrence and recovery. This user interface is described and its use demonstrated. The applicability of the functional programming style to the Activation Framework, a paradigm for intelligent systems, is then briefly described

    An attentive neural architecture for joint segmentation and parsing and its application to real estate ads

    Get PDF
    In processing human produced text using natural language processing (NLP) techniques, two fundamental subtasks that arise are (i) segmentation of the plain text into meaningful subunits (e.g., entities), and (ii) dependency parsing, to establish relations between subunits. In this paper, we develop a relatively simple and effective neural joint model that performs both segmentation and dependency parsing together, instead of one after the other as in most state-of-the-art works. We will focus in particular on the real estate ad setting, aiming to convert an ad to a structured description, which we name property tree, comprising the tasks of (1) identifying important entities of a property (e.g., rooms) from classifieds and (2) structuring them into a tree format. In this work, we propose a new joint model that is able to tackle the two tasks simultaneously and construct the property tree by (i) avoiding the error propagation that would arise from the subtasks one after the other in a pipelined fashion, and (ii) exploiting the interactions between the subtasks. For this purpose, we perform an extensive comparative study of the pipeline methods and the new proposed joint model, reporting an improvement of over three percentage points in the overall edge F1 score of the property tree. Also, we propose attention methods, to encourage our model to focus on salient tokens during the construction of the property tree. Thus we experimentally demonstrate the usefulness of attentive neural architectures for the proposed joint model, showcasing a further improvement of two percentage points in edge F1 score for our application.Comment: Preprint - Accepted for publication in Expert Systems with Application
    corecore