    A taxonomy of parallel sorting

    TR 84-601. In this paper, we propose a taxonomy of parallel sorting that covers a broad range of array and file sorting algorithms. We analyze the evolution of research on parallel sorting, from the earliest sorting networks to shared-memory algorithms and VLSI sorters. In the context of sorting networks, we describe two fundamental parallel merging schemes: the odd-even merge and the bitonic merge. Sorting algorithms based on these merging schemes have been derived for parallel computers whose processors communicate through interconnection networks such as the perfect shuffle, the mesh, and a number of other sparse networks. After describing the network sorting algorithms, we show that, under a shared-memory model of parallel computation, faster algorithms have been derived from parallel enumeration sorting schemes, in which keys are first ranked and then rearranged according to their rank.
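    The bitonic merge and the rank-then-rearrange idea behind enumeration sorting can both be sketched in a few lines. The following is a minimal sequential Python sketch, not code from the report: it assumes a power-of-two input length for the bitonic network and simulates each column of compare-exchange elements one after another instead of in parallel. The odd-even merge is not shown, and the function names are hypothetical.

```python
from typing import List


def bitonic_merge(a: List[int], lo: int, n: int, ascending: bool) -> None:
    """Sort the bitonic run a[lo:lo+n] (n a power of two) in place.

    The loop is one column of the merging network: its n/2
    compare-exchange operations are independent of each other,
    so a parallel machine could perform them simultaneously.
    """
    if n <= 1:
        return
    half = n // 2
    for i in range(lo, lo + half):
        # Compare-exchange element i with its partner half a block away.
        if (a[i] > a[i + half]) == ascending:
            a[i], a[i + half] = a[i + half], a[i]
    bitonic_merge(a, lo, half, ascending)
    bitonic_merge(a, lo + half, half, ascending)


def bitonic_sort(a: List[int], lo: int = 0, n: int = -1, ascending: bool = True) -> None:
    """Sort a[lo:lo+n] in place with Batcher's bitonic sorting network."""
    if n < 0:
        n = len(a)
        assert n & (n - 1) == 0, "this sketch assumes a power-of-two length"
    if n <= 1:
        return
    half = n // 2
    # Sort the halves in opposite directions so together they form a bitonic run.
    bitonic_sort(a, lo, half, True)
    bitonic_sort(a, lo + half, half, False)
    bitonic_merge(a, lo, n, ascending)


def enumeration_sort(keys: List[int]) -> List[int]:
    """Rank every key, then place it at its rank (ties broken by index)."""
    n = len(keys)
    # Each rank depends only on the input, so all n ranks could be
    # computed in parallel (for example, one processor per key).
    ranks = [
        sum(1 for j in range(n) if keys[j] < keys[i] or (keys[j] == keys[i] and j < i))
        for i in range(n)
    ]
    out = [0] * n
    for i, r in enumerate(ranks):
        out[r] = keys[i]
    return out


data = [7, 3, 9, 1, 4, 8, 2, 6]
bitonic_sort(data)
print(data)                            # [1, 2, 3, 4, 6, 7, 8, 9]
print(enumeration_sort([5, 1, 5, 0]))  # [0, 1, 5, 5]
```

    Breaking ties by index makes the ranks a permutation, so every key receives a distinct output slot and the rearrangement step never collides.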

    OutFlank Routing: Increasing Throughput in Toroidal Interconnection Networks

    We present a new deadlock-free routing scheme for toroidal interconnection networks, called OutFlank Routing (OFR). OFR is an adaptive strategy that exploits non-minimal links at both the source and the destination nodes. When minimal links are congested, OFR deroutes packets to carefully chosen intermediate destinations, producing travel paths that are only an additive constant longer than the shortest ones. Because routing performance is very sensitive to changes in the traffic model or in the router parameters, we developed an accurate discrete-event simulator of the toroidal network to validate OFR empirically, comparing it against other relevant routing strategies over a range of typical real-world traffic patterns. On the simulated 16x16x16 (4096-node) network, OFR improves the maximum sustained throughput by between 14% and 114% with respect to Adaptive Bubble Routing. Comment: 9 pages, 5 figures, to be presented at ICPADS 201
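    The claim that derouted paths stay "only an additive constant longer than the shortest ones" can be made concrete with hop-count arithmetic on a k-ary torus. The Python sketch below, with hypothetical helper names, computes minimal hop counts with wraparound and the extra hops a packet pays when it is forced through an intermediate node. It only illustrates the cost model; it does not implement OFR's choice of intermediate destinations, its adaptivity, or its deadlock avoidance.

```python
from typing import Sequence

Coord = Sequence[int]


def ring_distance(a: int, b: int, k: int) -> int:
    """Minimal hop count between positions a and b on a k-node ring."""
    d = abs(a - b) % k
    return min(d, k - d)  # go the short way around, using the wraparound link if needed


def torus_distance(src: Coord, dst: Coord, k: int) -> int:
    """Minimal hop count in a k-ary n-dimensional torus (sum over dimensions)."""
    return sum(ring_distance(s, d, k) for s, d in zip(src, dst))


def detour_overhead(src: Coord, mid: Coord, dst: Coord, k: int) -> int:
    """Extra hops paid by routing minimally src -> mid and then mid -> dst."""
    via = torus_distance(src, mid, k) + torus_distance(mid, dst, k)
    return via - torus_distance(src, dst, k)


k = 16  # a 16x16x16 torus, the size of the simulated network
print(torus_distance((0, 0, 0), (15, 0, 0), k))             # 1 (wraparound link)
# One sideways hop in y at the source, undone later, costs 2 extra hops
print(detour_overhead((0, 0, 0), (0, 1, 0), (3, 0, 0), k))  # 2
```

    A sideways hop in a dimension where source and destination already agree must eventually be undone, so it adds exactly two hops no matter how far away the destination is. This is why a bounded number of non-minimal hops keeps path length within a constant of the minimum.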

    VLSI Design

    This book presents some recent advances in the design of nanometer VLSI chips. The selected topics address open problems and challenges in areas ranging from design tools, new post-silicon devices, GPU-based parallel computing, and emerging 3D integration to antenna design. The book consists of two parts, with chapters such as: VLSI design for multi-sensor smart systems on a chip, Three-dimensional integrated circuits design for thousand-core processors, Parallel symbolic analysis of large analog circuits on GPU platforms, Algorithms for CAD tools VLSI design, and A multilevel memetic algorithm for large SAT-encoded problems.

    New Techniques in Scene Understanding and Parallel Image Processing.

    There has been tremendous research interest in computer and robotic vision. Scene understanding and parallel image processing are important paradigms in computer vision, and this dissertation presents new techniques for solving some of the problems in these paradigms. The first part of the dissertation focuses on the automatic interpretation of features in a natural scene. The proposed interpretation technique combines a context-dependent feature labeling algorithm, based on nonlinear probabilistic relaxation, with an expert system. Traditionally, the output of the labeling is analyzed and then recognized by a high-level interpreter; in this new approach, knowledge about the scene is used to resolve the inconsistencies introduced by the labeling algorithm. A feature labeling system based on this hybrid technique was designed and developed, and it plays a vital role in the development of an automatic image interpretation system for oceanographic satellite images. An extensive study of existing interpretation techniques in related areas such as remote sensing, medical diagnosis, astronomy, and oceanography has shown that our hybrid approach is unique and powerful. The second part of the dissertation presents results in parallel image processing. It introduces a new approach for parallelizing low- and intermediate-level vision tasks, in which the inherent data or computational structure used to solve the problem is embedded into parallel architectures such as hypercubes. The key property of the technique is that adjacent pixels in the image are mapped to nodes that are a constant distance apart in the hypercube. Using this technique, parallel algorithms for neighbor finding and digital distances are developed, and a parallel hypercube sorting algorithm is obtained as an illustration. The research on these embedding algorithms has paved the way for efficient reconfiguration algorithms for hypercube architectures.
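    The property that adjacent pixels land on hypercube nodes a constant distance apart can be illustrated with the classic binary-reflected Gray code embedding of a 2D grid into a hypercube, where that constant is 1. This is a standard technique chosen for illustration; the abstract does not say whether the dissertation uses it. The Python sketch assumes an image of 2^r x 2^c pixels mapped onto an (r + c)-dimensional hypercube, and the function names are hypothetical.

```python
def gray(i: int) -> int:
    """Binary-reflected Gray code: gray(i) and gray(i + 1) differ in one bit."""
    return i ^ (i >> 1)


def pixel_to_node(row: int, col: int, col_bits: int) -> int:
    """Hypercube node label for pixel (row, col).

    The high bits hold the Gray code of the row and the low col_bits
    bits hold the Gray code of the column.
    """
    return (gray(row) << col_bits) | gray(col)


def hamming(u: int, v: int) -> int:
    """Hop distance between hypercube nodes u and v."""
    return bin(u ^ v).count("1")


row_bits, col_bits = 3, 3  # an 8x8 image on a 6-dimensional hypercube
rows, cols = 1 << row_bits, 1 << col_bits
for r in range(rows):
    for c in range(cols):
        here = pixel_to_node(r, c, col_bits)
        if c + 1 < cols:  # right neighbour
            assert hamming(here, pixel_to_node(r, c + 1, col_bits)) == 1
        if r + 1 < rows:  # lower neighbour
            assert hamming(here, pixel_to_node(r + 1, c, col_bits)) == 1
print("every 4-neighbour pair is one hypercube hop apart")
```

    Because gray(i) never exceeds the bit width of i, the row and column codes occupy disjoint bit fields, so the mapping is one-to-one and neighbor finding reduces to flipping a single bit of the node label.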

    Analysis of wormhole routings in Cayley graphs of permutation groups.

    Over the past decade, a new switching technology called wormhole routing has been investigated in the field of multicomputer interconnection networks, and several classes of wormhole routing algorithms have been proposed. Most of these algorithms target the traditional binary hypercube, k-ary n-cube mesh, and torus networks. In the design of a wormhole routing algorithm, deadlock avoidance is the main concern. Recently, a new class of networks, the Cayley graphs of permutation groups, has come to be considered a very promising alternative. Although the proposed Cayley networks have topological properties superior to those of traditional topologies, designing deadlock-free wormhole routing algorithms for them is not simple. In this dissertation, we investigate deadlock-free wormhole routing algorithms for several classes of Cayley networks, such as the complete-transposition and star networks. We evaluate several classes of routing algorithms on these networks and compare their performance through simulation. We also compare the performance of these networks with that of traditional networks. Extensive simulation showed that, in general, adaptive algorithms outperformed deterministic algorithms when more virtual channels were available. In the network comparison, the complete-transposition network performed best among networks of similar size, and the binary hypercube performed better than the star graph.
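    To make the star network concrete: in the star graph S_n, each node is a permutation of n symbols, and a node's n - 1 neighbours are obtained by swapping the first symbol with the symbol in position i. The Python sketch below uses the well-known greedy routing rule for star graphs (move the front symbol to its home position; if it is already home, swap in any misplaced symbol) together with the matching closed-form distance. It routes toward the identity permutation; routing between arbitrary nodes reduces to this case by relabeling symbols. It ignores virtual channels and deadlock avoidance, which are the dissertation's actual subject, and all names are hypothetical.

```python
from typing import List, Tuple

Perm = Tuple[int, ...]


def star_route(p: Perm) -> List[Perm]:
    """Greedy route from permutation p to the identity in the star graph S_n.

    Each step applies one generator: swap position 0 with some position i >= 1.
    """
    cur = list(p)
    path = [tuple(cur)]
    while True:
        front = cur[0]
        if front != 0:
            i = front  # send the front symbol straight to its home position
        else:
            misplaced = [j for j in range(1, len(cur)) if cur[j] != j]
            if not misplaced:
                break  # reached the identity
            i = misplaced[0]  # pull a misplaced symbol to the front
        cur[0], cur[i] = cur[i], cur[0]
        path.append(tuple(cur))
    return path


def star_distance(p: Perm) -> int:
    """Closed-form hop count: m + c, minus 2 if the front symbol is misplaced.

    m = number of misplaced symbols, c = number of cycles of length >= 2.
    """
    n = len(p)
    seen = [False] * n
    m = c = 0
    for start in range(n):
        if seen[start] or p[start] == start:
            continue
        c += 1
        j = start
        while not seen[j]:  # walk the cycle containing start
            seen[j] = True
            m += 1
            j = p[j]
    return m + c - (2 if p[0] != 0 else 0)


route = star_route((2, 0, 1, 3))
print(len(route) - 1, star_distance((2, 0, 1, 3)))  # 2 2
```

    A cycle that already contains the front position costs one hop per symbol beyond the first, while any other cycle costs one extra hop to pull it to the front; summing these costs gives the closed form.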

    Optimizing the DFCN Broadcast Protocol with a Parallel Cooperative Strategy of Multi-Objective Evolutionary Algorithms

    Proceedings of: 5th International Conference, EMO 2009, Nantes, France, April 7-10, 2009. This work presents the application of a parallel cooperative optimization approach to the broadcast operation in mobile ad-hoc networks (MANETs). Optimizing the broadcast operation implies satisfying several objectives simultaneously, so a multi-objective approach has been designed. The optimization consists of searching for the best configurations of the DFCN broadcast protocol for a given MANET scenario. A team of multi-objective evolutionary algorithms cooperates through a novel optimization model: a hybrid parallel algorithm that combines a parallel island-based scheme with a hyperheuristic approach. The results achieved by the algorithms at different stages of the search process are analyzed in order to grant more computational resources to the most suitable algorithms. The results obtained for a MANET scenario representing a mall demonstrate the validity of the proposed approach. This work has been supported by the EC (FEDER) and the Spanish Ministry of Education and Science under the ‘Plan Nacional de I+D+I’ (TIN2005-08818-C04) and (TIN2008-06491-C04-02). The work of Gara Miranda has been developed under grant FPU-AP2004-2290. Publicad
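    Multi-objective optimizers such as the team of evolutionary algorithms described above compare candidate configurations by Pareto dominance rather than by a single score. Below is a minimal Python sketch of dominance and non-dominated filtering. It assumes every objective is to be minimized, and the three-objective vectors are made-up placeholders, not DFCN results. The island model and hyperheuristic resource allocation are not shown.

```python
from typing import List, Sequence

Objectives = Sequence[float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a Pareto-dominates b (all objectives minimized).

    a must be no worse in every objective and strictly better in at least one.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points: List[Objectives]) -> List[Objectives]:
    """Return the points that no other point dominates (the Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical objective vectors for five candidate protocol configurations
candidates = [(1, 5, 3), (2, 2, 2), (3, 1, 4), (2, 3, 3), (4, 4, 4)]
print(non_dominated(candidates))  # [(1, 5, 3), (2, 2, 2), (3, 1, 4)]
```

    An objective that should be maximized can be handled by negating it before the comparison. Since no point dominates itself, the filter does not need to skip the point being tested.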

    Design and Analysis of Optical Interconnection Networks for Parallel Computation.

    In this doctoral research, we propose several novel protocols and topologies for interconnecting massively parallel processors. These new technologies achieve considerable improvements in system performance and structural simplicity. Optical TDM buses currently use synchronous protocols, whose major disadvantage is wasted packet slots. To offset this inherent drawback of synchronous TDM, a pipelined asynchronous TDM optical bus is proposed, and simulation results show that its performance is significantly better than that of known pipelined synchronous TDM optical buses. In practice, the computational power of the plain TDM protocol is limited, and various extensions must be added to the system. In this research, a new pipelined optical TDM bus is proposed for implementing a linear-array parallel computer architecture. The switches on the receiving segment of the bus can be controlled dynamically, which makes the system highly reconfigurable. To build large, scalable systems, we need new network architectures suited to optical interconnections. A new kind of reconfigurable bus, the segmented bus, is introduced to achieve structural simplicity and increased concurrency. We demonstrate the versatility of parallel architectures based on segmented buses by showing that they can simulate the parallel communication patterns supported by a wide variety of networks with small slowdown factors. New kinds of interconnection networks, the hypernetworks, have been proposed recently. Compared with point-to-point networks, they allow increased resource sharing and communication bandwidth utilization, and they are especially suitable for optical interconnects. One way to derive a hypernetwork is to find the dual of a point-to-point network. The hypercube $Q_n$, where $n$ is the dimension, is a very popular point-to-point network.
    It is therefore of interest to construct hypernetworks from the dual $Q_n^*$ of the hypercube $Q_n$. In this research, the properties of $Q_n^*$ are investigated and a set of fundamental data communication algorithms for $Q_n^*$ is presented. The results indicate that the $Q_n^*$ hypernetwork is a useful and promising interconnection structure for high-performance parallel and distributed computing systems.
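    Taking the dual of a point-to-point network can be shown concretely for the hypercube. Assuming $Q_n^*$ is the standard hypergraph dual (each edge of $Q_n$ becomes a node, and each vertex of $Q_n$ becomes a hyperedge joining the $n$ edges incident to it), the Python sketch below builds it and checks the resulting counts: $n 2^{n-1}$ nodes and $2^n$ hyperedges of size $n$, with every node in exactly two hyperedges. The dissertation's exact construction may differ, its communication algorithms are not shown, and the function name is hypothetical.

```python
from collections import Counter
from typing import FrozenSet, List, Tuple

Edge = Tuple[int, int]  # (lower endpoint u, dimension d): joins u and u | (1 << d)


def hypercube_dual(n: int) -> Tuple[List[Edge], List[FrozenSet[Edge]]]:
    """Build the hypergraph dual of the n-dimensional hypercube Q_n."""
    # Nodes of the dual: one per edge of Q_n, keyed by its endpoint with bit d clear.
    nodes = [(u, d) for u in range(1 << n) for d in range(n) if not (u >> d) & 1]
    # Hyperedges of the dual: one per vertex v of Q_n, containing its n incident edges.
    hyperedges = [
        frozenset((v & ~(1 << d), d) for d in range(n)) for v in range(1 << n)
    ]
    return nodes, hyperedges


nodes, hyperedges = hypercube_dual(3)
membership = Counter(e for h in hyperedges for e in h)
print(len(nodes), len(hyperedges))   # 12 8
print({len(h) for h in hyperedges})  # {3}
print(set(membership.values()))      # {2}
```

    Two hyperedges share a node exactly when the corresponding hypercube vertices are adjacent, so the neighborhood structure of $Q_n$ is preserved in the dual while each hyperedge can be served by a single shared optical channel.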