47 research outputs found

    Performance analysis of pyramid mapping algorithms for the hypercube

    Get PDF
    Comparative performance analysis of algorithms that map pyramids and multilevel structures onto the hypercube are presented. The pyramid structure is appropriate for low-level and intermediate-level computer vision algorithms. It is not only efficient for the support of both local and global operations but also capable of supporting the implementation of multilevel solvers. Nevertheless, pyramids lack the capability of efficient implementation of the majority of scientific algorithms and their cost may become unacceptably high. On a different horizon, hypercube machines have widely been used in the field of parallel computing due to their small diameter, high degree of fault tolerance, and rich interconnection that permits fast communication at a reasonable cost. As a result, hypercube machines can efficiently emulate pyramids. Therefore, the characteristics which make hypercube machines useful scientific processors also make them efficient image processors. Two algorithms which have been developed for the efficient mapping of the pyramid onto the hypercube are discussed in this thesis. The algorithm proposed by Stout [4] requires a hypercube with a number of processing elements (PEs) which is equal to the number of nodes in the base of the pyramid. This algorithm can activate only one level of the pyramid at a time. In contrast, the algorithm proposed by Patel and Ziavras [7] requires the same number of PEs as Stout\u27s algorithm but allows the concurren simulation of multiple levels, as long as the base level is not involved in the set of pyramid levels that need to be simulated at the same time. This low-cost algorithm yields higher performance through high utilization of PEs. However it performs slightly worse than Stout\u27s algorithm when only one level is active at a time. Patel and Ziavras\u27 algorithm performs much better than Stout\u27s algorithm when all levels, excluding the leaf level, are active concurrently. The comparative analysis of these two algorithms is based on the incorporation of simulation results for some image processing algorithms which are perimeter counting, image convolution, and segmentation

    New Techniques in Scene Understanding and Parallel Image Processing.

    Get PDF
    There has been tremendous research interest in the areas of computer and robotic vision. Scene understanding and parallel image processing are important paradigms in computer vision. New techniques are presented to solve some of the problems in these paradigms. Automatic interpretation of features in a natural scene is the focus of the first part of the dissertation. The proposed interpretation technique consists of a context dependent feature labeling algorithm using non linear probabilistic relaxation, and an expert system. Traditionally, the output of the labeling is analyzed, and then recognized by a high level interpreter. In this new approach, the knowledge about the scene is utilized to resolve the inconsistencies introduced by the labeling algorithm. A feature labeling system based on this hybrid technique is designed and developed. The labeling system plays a vital role in the development of an automatic image interpretation system for oceanographic satellite images. An extensive study on the existing interpretation techniques has been made in the related areas such as remote sensing, medical diagnosis, astronomy, and oceanography and has shown that our hybrid approach is unique and powerful. The second part of the dissertation presents the results in the area of parallel image processing. A new approach for parallelizing vision tasks in the low and intermediate levels is introduced. The technique utilizes schemes to embed the inherent data or computational structure, used to solve the problem, into parallel architectures such as hypercubes. The important characteristic of the technique is that the adjacent pixels in the image are mapped to nodes that are at a constant distance in the hypercube. Using the technique, parallel algorithms for neighbor-finding and digital distances are developed. A parallel hypercube sorting algorithm is obtained as an illustration of the technique. The research in developing these embedding algorithms has paved the way for efficient reconfiguration algorithms for hypercube architectures

    Fault-Tolerant Ring Embeddings in Hypercubes -- A Reconfigurable Approach

    Get PDF
    We investigate the problem of designing reconfigurable embedding schemes for a fixed hypercube (without redundant processors and links). The fundamental idea for these schemes is to embed a basic network on the hypercube without fully utilizing the nodes on the hypercube. The remaining nodes can be used as spares to reconfigure the embeddings in case of faults. The result of this research shows that by carefully embedding the application graphs, the topological properties of the embedding can be preserved under fault conditions, and reconfiguration can be carried out efficiently. In this dissertation, we choose the ring as the basic network of interest, and propose several schemes for the design of reconfigurable embeddings with the aim of minimizing reconfiguration cost and performance degradation. The cost is measured by the number of node-state changes or reconfiguration steps needed for processing of the reconfiguration, and the performance degradation is characterized as the dilation of the new embedding after reconfiguration. Compared to the existing schemes, our schemes surpass the existing ones in terms of applicability of schemes and reconfiguration cost needed for the resulting embeddings

    Interconnection Networks Embeddings and Efficient Parallel Computations.

    Get PDF
    To obtain a greater performance, many processors are allowed to cooperate to solve a single problem. These processors communicate via an interconnection network or a bus. The most essential function of the underlying interconnection network is the efficient interchanging of messages between processes in different processors. Parallel machines based on the hypercube topology have gained a great respect in parallel computation because of its many attractive properties. Many versions of the hypercube have been introduced by many researchers mainly to enhance communications. The twisted hypercube is one of the most attractive versions of the hypercube. It preserves the important features of the hypercube and reduces its diameter by a factor of two. This dissertation investigates relations and transformations between various interconnection networks and the twisted hypercube and explore its efficiency in parallel computation. The capability of the twisted hypercube to simulate complete binary trees, complete quad trees, and rings is demonstrated and compared with the hypercube. Finally, the fault-tolerance of the twisted hypercube is investigated. We present optimal algorithms to simulate rings in a faulty twisted hypercube environment and compare that with the hypercube

    Hierarchical gate-array routing on a hypercube multiprocessor

    Full text link
    Gate-arrays are the most common design style for semicustom VLSI integrated circuits. An important part of the gate-array design process is the routing of wires between the logic elements, which is an extremely compute-intensive operation. This paper presents an algorithm for routing gate-arrays that uses a hypercube connected parallel processor to provide the necessary computation power. In order to make optimal use of the hypercube, the routing algorithm is organized so that interprocessor communication is kept to minimum. It occurs only during the global routing and crossing placement phases of the algorithm, which constitute less than 15% of the total running time of the algorithm. On the basis of the results of executing the algorithm on two gate-array benchmarks the case is made for using hypercube multiprocessors as accelerators for compute-intensive CAD operations.Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/28650/1/0000466.pd

    Recursive circulants and their embeddings among hypercubes

    Get PDF
    AbstractWe propose an interconnection structure for multicomputer networks, called recursive circulant. Recursive circulant G(N,d) is defined to be a circulant graph with N nodes and jumps of powers of d. G(N,d) is node symmetric, and has some strong hamiltonian properties. G(N,d) has a recursive structure when N=cdm, 1â©œc<d. We develop a shortest-path routing algorithm in G(cdm,d), and analyze various network metrics of G(cdm,d) such as connectivity, diameter, mean internode distance, and visit ratio. G(2m,4), whose degree is m, compares favorably to the hypercube Qm. G(2m,4) has the maximum possible connectivity, and its diameter is ⌈(3m−1)/4⌉. Recursive circulants have interesting relationship with hypercubes in terms of embedding. We present expansion one embeddings among recursive circulants and hypercubes, and analyze the costs associated with each embedding. The earlier version of this paper appeared in Park and Chwa (Proc. Internat. Symp. Parallel Architectures, Algorithms and Networks ISPAN’94, Kanazawa, Japan, December 1994, pp. 73–80)

    Polyvalent Parallelizations for Hierarchical Block Matching Motion Estimation

    Get PDF
    Block matching motion estimation algorithms are widely used in video coding schemes. In this paper,we design an efficient hierarchical block matching motion estimation (HBMME) algorithm on a hypercube multiprocessor. Unlike systolic array designs, this solution is not tied down to specific values of algorithm parameters and thus offers increased flexibility. Moreover, the hypercube network can efficiently handle the non regular data flow of the HBMME algorithm. Our techniques nearly eliminate the occurrence of “difficult” communication patterns, namely many-to-many personalized communication, by replacing them with simple shift operations. These operations have an efficient implementation on most of interconnection networks and thus our techniques can be adapted to other networks as well. With regard to the employed multiprocessor we make no specific assumption about the amount of local memory residing in each processor. Instead, we introduce a free parameter S and assume that each processor has O(S) local memory. By doing so, we handle all the cases of modern multiprocessors, that is fine-grained, medium-grained and coarse-grained multiprocessors and thus our design is quite general

    Properties and algorithms of the (n, k)-star graphs

    Get PDF
    The (n, k)-star interconnection network was proposed in 1995 as an attractive alternative to the n-star topology in parallel computation. The (n, k )-star has significant advantages over the n-star which itself was proposed as an attractive alternative to the popular hypercube. The major advantage of the (n, k )-star network is its scalability, which makes it more flexible than the n-star as an interconnection network. In this thesis, we will focus on finding graph theoretical properties of the (n, k )-star as well as developing parallel algorithms that run on this network. The basic topological properties of the (n, k )-star are first studied. These are useful since they can be used to develop efficient algorithms on this network. We then study the (n, k )-star network from algorithmic point of view. Specifically, we will investigate both fundamental and application algorithms for basic communication, prefix computation, and sorting, etc. A literature review of the state-of-the-art in relation to the (n, k )-star network as well as some open problems in this area are also provided
    corecore