203,169 research outputs found

    Dynamically allocating sets of fine-grained processors to running computations

    Get PDF
    Researchers explore an approach to using general purpose parallel computers which involves mapping hardware resources onto computations instead of mapping computations onto hardware. Problems such as processor allocation, task scheduling and load balancing, which have traditionally proven to be challenging, change significantly under this approach and may become amenable to new attacks. Researchers describe the implementation of this approach used by the FFP Machine whose computation and communication resources are repeatedly partitioned into disjoint groups that match the needs of available tasks from moment to moment. Several consequences of this system are examined

    Inference of termination conditions for numerical loops

    Full text link
    We present a new approach to termination analysis of numerical computations in logic programs. Traditional approaches fail to analyse them due to non well-foundedness of the integers. We present a technique that allows to overcome these difficulties. Our approach is based on transforming a program in way that allows integrating and extending techniques originally developed for analysis of numerical computations in the framework of query-mapping pairs with the well-known framework of acceptability. Such an integration not only contributes to the understanding of termination behaviour of numerical computations, but also allows to perform a correct analysis of such computations automatically, thus, extending previous work on a constraints-based approach to termination. In the last section of the paper we discuss possible extensions of the technique, including incorporating general term orderings.Comment: Presented at WST200

    High throughput spatial convolution filters on FPGAs

    Get PDF
    Digital signal processing (DSP) on field- programmable gate arrays (FPGAs) has long been appealing because of the inherent parallelism in these computations that can be easily exploited to accelerate such algorithms. FPGAs have evolved significantly to further enhance the mapping of these algorithms, included additional hard blocks, such as the DSP blocks found in modern FPGAs. Although these DSP blocks can offer more efficient mapping of DSP computations, they are primarily designed for 1-D filter structures. We present a study on spatial convolutional filter implementations on FPGAs, optimizing around the structure of the DSP blocks to offer high throughput while maintaining the coefficient flexibility that other published architectures usually sacrifice. We show that it is possible to implement large filters for large 4K resolution image frames at frame rates of 30–60 FPS, while maintaining functional flexibility

    Inference of termination conditions for numerical loops in Prolog

    Full text link
    We present a new approach to termination analysis of numerical computations in logic programs. Traditional approaches fail to analyse them due to non well-foundedness of the integers. We present a technique that allows overcoming these difficulties. Our approach is based on transforming a program in a way that allows integrating and extending techniques originally developed for analysis of numerical computations in the framework of query-mapping pairs with the well-known framework of acceptability. Such an integration not only contributes to the understanding of termination behaviour of numerical computations, but also allows us to perform a correct analysis of such computations automatically, by extending previous work on a constraint-based approach to termination. Finally, we discuss possible extensions of the technique, including incorporating general term orderings.Comment: To appear in Theory and Practice of Logic Programming. To appear in Theory and Practice of Logic Programmin

    Efficient Process-to-Node Mapping Algorithms for Stencil Computations

    Full text link
    Good process-to-compute-node mappings can be decisive for well performing HPC applications. A special, important class of process-to-node mapping problems is the problem of mapping processes that communicate in a sparse stencil pattern to Cartesian grids. By thoroughly exploiting the inherently present structure in this type of problem, we devise three novel distributed algorithms that are able to handle arbitrary stencil communication patterns effectively. We analyze the expected performance of our algorithms based on an abstract model of inter- and intra-node communication. An extensive experimental evaluation on several HPC machines shows that our algorithms are up to two orders of magnitude faster in running time than a (sequential) high-quality general graph mapping tool, while obtaining similar results in communication performance. Furthermore, our algorithms also achieve significantly better mapping quality compared to previous state-of-the-art Cartesian grid mapping algorithms. This results in up to a threefold performance improvement of an MPI_Neighbor_alltoall exchange operation. Our new algorithms can be used to implement the MPI_Cart_create functionality.Comment: 18 pages, 9 Figure

    Characterization of robotics parallel algorithms and mapping onto a reconfigurable SIMD machine

    Get PDF
    The kinematics, dynamics, Jacobian, and their corresponding inverse computations are six essential problems in the control of robot manipulators. Efficient parallel algorithms for these computations are discussed and analyzed. Their characteristics are identified and a scheme on the mapping of these algorithms to a reconfigurable parallel architecture is presented. Based on the characteristics including type of parallelism, degree of parallelism, uniformity of the operations, fundamental operations, data dependencies, and communication requirement, it is shown that most of the algorithms for robotic computations possess highly regular properties and some common structures, especially the linear recursive structure. Moreover, they are well-suited to be implemented on a single-instruction-stream multiple-data-stream (SIMD) computer with reconfigurable interconnection network. The model of a reconfigurable dual network SIMD machine with internal direct feedback is introduced. A systematic procedure internal direct feedback is introduced. A systematic procedure to map these computations to the proposed machine is presented. A new scheduling problem for SIMD machines is investigated and a heuristic algorithm, called neighborhood scheduling, that reorders the processing sequence of subtasks to reduce the communication time is described. Mapping results of a benchmark algorithm are illustrated and discussed

    Basins of attraction in nonsmooth models of gear rattle

    Get PDF
    This paper is concerned with the computation of the basins of attraction of a simple one degree-of-freedom backlash oscillator using cell-to-cell mapping techniques. This analysis is motivated by the modeling of order vibration in geared systems. We consider both a piecewise-linear stiffness model and a simpler infinite stiffness impacting limit. The basins reveal rich and delicate dynamics, and we analyze some of the transitions in the system's behavior in terms of smooth and discontinuity-induced bifurcations. The stretching and folding of phase space are illustrated via computations of the grazing curve, and its preimages, and manifold computations of basin boundaries using DsTool (Dynamical Systems Toolkit)
    • …
    corecore