220 research outputs found

    Performance analysis of pyramid mapping algorithms for the hypercube

    A comparative performance analysis of algorithms that map pyramids and multilevel structures onto the hypercube is presented. The pyramid structure is appropriate for low-level and intermediate-level computer vision algorithms. It is not only efficient for the support of both local and global operations but also capable of supporting the implementation of multilevel solvers. Nevertheless, pyramids cannot efficiently implement the majority of scientific algorithms, and their cost may become unacceptably high. On the other hand, hypercube machines have been widely used in the field of parallel computing due to their small diameter, high degree of fault tolerance, and rich interconnection that permits fast communication at a reasonable cost. As a result, hypercube machines can efficiently emulate pyramids. Therefore, the characteristics which make hypercube machines useful scientific processors also make them efficient image processors. Two algorithms which have been developed for the efficient mapping of the pyramid onto the hypercube are discussed in this thesis. The algorithm proposed by Stout [4] requires a hypercube with a number of processing elements (PEs) equal to the number of nodes in the base of the pyramid. This algorithm can activate only one level of the pyramid at a time. In contrast, the algorithm proposed by Patel and Ziavras [7] requires the same number of PEs as Stout's algorithm but allows the concurrent simulation of multiple levels, as long as the base level is not among the pyramid levels that need to be simulated at the same time. This low-cost algorithm yields higher performance through high utilization of PEs. However, it performs slightly worse than Stout's algorithm when only one level is active at a time. Patel and Ziavras' algorithm performs much better than Stout's algorithm when all levels, excluding the leaf level, are active concurrently. The comparative analysis of these two algorithms is based on simulation results for several image processing algorithms, namely perimeter counting, image convolution, and segmentation

    Investigation of reduced hypercube (RH) networks : embedding and routing capabilities

    The choice of a topology for the interconnection of resources in a distributed-memory parallel computing system is a major design decision. The direct binary hypercube has been widely used for this purpose due to its low diameter and its ability to efficiently emulate other important structures. The aforementioned strong properties of the hypercube come at the cost of high VLSI complexity due to the increase in the number of communication ports and channels per node with an increase in the total number of nodes. The reduced hypercube (RH) topology, which is obtained by a uniform reduction in the number of links for each hypercube node, yields lower complexity interconnection networks compared to hypercubes with the same number of nodes, thus permitting the construction of larger parallel systems. Furthermore, it has been shown that the RH at a lower cost achieves performance comparable to that of a regular hypercube with the same number of nodes. A very important issue for the viability of the RH is to investigate the efficiency of embedding frequently used topologies into it. This thesis proposes embedding algorithms for three very important topologies, namely the ring, the torus and the binary tree. The performance of the proposed algorithms is analyzed and compared to that of equivalent embedding algorithms for the regular hypercube. It is shown that these topologies are emulated efficiently on the RH. Additionally, two already proposed routing algorithms for the RH are evaluated through simulation results
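For orientation, the regular-hypercube baseline that ring embeddings are usually compared against can be sketched with the binary-reflected Gray code. This is a minimal illustration of that textbook construction only; it is not the RH embedding algorithm proposed in the thesis, and the function names are hypothetical.

```python
def gray_code(i: int) -> int:
    """Binary-reflected Gray code of i."""
    return i ^ (i >> 1)


def ring_embedding(dim: int) -> list[int]:
    """Map ring position k to node gray_code(k) of a regular dim-cube."""
    return [gray_code(k) for k in range(1 << dim)]


def hamming(a: int, b: int) -> int:
    """Number of differing bits, i.e. hop distance in a regular hypercube."""
    return bin(a ^ b).count("1")


def dilation(nodes: list[int]) -> int:
    """Largest hypercube distance between consecutive ring positions,
    including the wraparound edge from the last position to the first."""
    n = len(nodes)
    return max(hamming(nodes[k], nodes[(k + 1) % n]) for k in range(n))


if __name__ == "__main__":
    ring = ring_embedding(3)
    print(ring)            # node label assigned to each ring position
    print(dilation(ring))  # 1 means every ring edge uses a direct link
```

A dilation of 1 means each ring edge maps onto a single hypercube link. An RH has fewer links per node, so an embedding for it cannot simply reuse this mapping unchanged.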

    Adaptive global optimization algorithms

    Global optimization is concerned with finding the minimum value of a function where many local minima may exist. The development of a global optimization algorithm may involve using information about the target function (e.g., differentiability) and functions based on statistical models to improve on the worst-case time complexity and expected error of similar deterministic algorithms. Recent algorithms are investigated, new ones are proposed, and their performance is analyzed. Minimum, maximum and average case error bounds for the algorithms presented are derived. A software architecture implemented with MATLAB and Java is presented and experimental results for the algorithms are displayed. The graphical capabilities and function-rich MATLAB environment are combined with the object-oriented features of Java, hosted on the computer system described in this paper, to provide a fast, powerful test environment for producing experimental results. To do this, matlabcontrol, a third-party set of procedures that allows a Java program to call MATLAB functions, is used to access a function such as voronoi() or to provide graphical results. Additionally, the Java implementation can be called from, and return values to, the MATLAB environment. The data can then be used as input to MATLAB's graphing or other functions. The software test environment provides algorithm performance information, such as whether more iterations or replications of a proposed algorithm would be expected to provide a better result. It is anticipated that the functionality provided by the framework would be used for initial development and analysis and subsequently removed and replaced with optimized (in the computer efficiency sense) functions for deployment

    Data broadcasting and reduction, prefix computation, and sorting on reduced hypercube (RH) parallel computers

    The binary hypercube parallel computer has been very popular due to its rich interconnection structure and small average internode distance, which allow the efficient embedding of frequently used topologies. Communication patterns of many parallel algorithms also match the hypercube topology. The hypercube has high VLSI complexity, however, due to the logarithmic increase in the number of connections to each node with the increase in the number of dimensions of the hypercube. The reduced hypercube (RH) interconnection network, which is obtained by a uniform reduction in the number of links for each hypercube node, yields lower-complexity interconnection networks when compared to hypercubes with the same number of nodes. It has been shown elsewhere that the RH interconnection network achieves performance comparable to that of the hypercube, at lower hardware cost. The reduced VLSI complexity of the RH also permits the construction of larger systems, thus making the RH suitable for massively parallel processing. This thesis proposes algorithms for data broadcasting and reduction, prefix computation, and sorting on the RH parallel computer. All these operations are fundamental to many parallel algorithms. A worst case analysis of each algorithm is given and compared with equivalent algorithms for the regular hypercube. It is shown that the proposed algorithms for the RH yield performance comparable to that for the regular hypercube
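As a point of reference for the comparison mentioned above, the standard prefix computation on a regular hypercube can be sketched as a sequential simulation. This is the textbook regular-hypercube algorithm under the assumption of one value per node, not the RH algorithm proposed in the thesis; `hypercube_prefix_sum` is a hypothetical name.

```python
def hypercube_prefix_sum(values: list[int]) -> list[int]:
    """Simulate inclusive prefix sums on a regular hypercube.

    Node i starts with values[i]. In step d every node exchanges its
    running subcube total with the neighbor across dimension d, so the
    whole computation takes log2(n) exchange steps.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("number of nodes must be a power of two")
    dim = n.bit_length() - 1

    prefix = list(values)  # prefix[i]: sum over nodes j <= i seen so far
    total = list(values)   # total[i]: sum over the subcube seen so far

    for d in range(dim):
        # All nodes send simultaneously, so snapshot the totals first.
        received = [total[i ^ (1 << d)] for i in range(n)]
        for i in range(n):
            if i & (1 << d):
                # The partner subcube holds lower-numbered nodes.
                prefix[i] += received[i]
            total[i] += received[i]
    return prefix


if __name__ == "__main__":
    print(hypercube_prefix_sum([1, 2, 3, 4]))
```

Each step uses only links between nodes whose labels differ in a single bit. An RH lacks some of those links, which is why the thesis analyzes its algorithms separately.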

    Efficient hypercube communications

    Hypercube algorithms may be developed for a variety of communication-intensive tasks such as sending a message from one node to another, broadcasting a message from one node to all others, broadcasting a message from each node to all others, all-to-all personalized communication, one-to-all personalized communication, and exchanging messages between nodes via fixed permutations. All these communication patterns are special cases of many-to-many personalized communication. The problem of many-to-many personalized communication is investigated here. Two routing algorithms for many-to-many personalized communication are presented here. The algorithms proposed yield very high performance with respect to the number of time steps and packet transmissions. The first algorithm yields high performance through attempts to equibalance the number of messages at intermediate nodes. This technique tries to avoid creating a bottleneck at any node and thus reduces the total communication time. The second algorithm yields high performance through one-step time-lookahead equibalancing. It chooses from the candidate intermediate nodes the one which will probably have the minimum number of messages in the next cycle

    A 2D based Partition Strategy for Solving Ranking under Team Context (RTP)

    In this paper, we propose a 2D-based partition method for solving the problem of Ranking under Team Context (RTC) on datasets without a priori knowledge. We first map the data into 2D space using its minimum and maximum values among all dimensions. Then we construct window queries that take the current team context into account. In addition, during the query mapping procedure, we can pre-prune tuples that are not top-ranked. This pre-classification step defers processing of those tuples and saves cost while still providing solutions to the problem. Experiments show that our algorithm performs well, especially on large datasets, while preserving correctness

    Fast ray tracing by ray classification


    Fault-Tolerant Ring Embeddings in Hypercubes -- A Reconfigurable Approach

    We investigate the problem of designing reconfigurable embedding schemes for a fixed hypercube (without redundant processors and links). The fundamental idea for these schemes is to embed a basic network on the hypercube without fully utilizing the nodes on the hypercube. The remaining nodes can be used as spares to reconfigure the embeddings in case of faults. The result of this research shows that by carefully embedding the application graphs, the topological properties of the embedding can be preserved under fault conditions, and reconfiguration can be carried out efficiently. In this dissertation, we choose the ring as the basic network of interest, and propose several schemes for the design of reconfigurable embeddings with the aim of minimizing reconfiguration cost and performance degradation. The cost is measured by the number of node-state changes or reconfiguration steps needed to carry out the reconfiguration, and the performance degradation is characterized as the dilation of the new embedding after reconfiguration. Our schemes surpass the existing ones in terms of applicability and the reconfiguration cost of the resulting embeddings

    New Techniques in Scene Understanding and Parallel Image Processing.

    There has been tremendous research interest in the areas of computer and robotic vision. Scene understanding and parallel image processing are important paradigms in computer vision. New techniques are presented to solve some of the problems in these paradigms. Automatic interpretation of features in a natural scene is the focus of the first part of the dissertation. The proposed interpretation technique consists of a context-dependent feature labeling algorithm using nonlinear probabilistic relaxation, and an expert system. Traditionally, the output of the labeling is analyzed, and then recognized by a high-level interpreter. In this new approach, the knowledge about the scene is utilized to resolve the inconsistencies introduced by the labeling algorithm. A feature labeling system based on this hybrid technique is designed and developed. The labeling system plays a vital role in the development of an automatic image interpretation system for oceanographic satellite images. An extensive study of the existing interpretation techniques has been made in related areas such as remote sensing, medical diagnosis, astronomy, and oceanography and has shown that our hybrid approach is unique and powerful. The second part of the dissertation presents the results in the area of parallel image processing. A new approach for parallelizing vision tasks in the low and intermediate levels is introduced. The technique utilizes schemes to embed the inherent data or computational structure, used to solve the problem, into parallel architectures such as hypercubes. The important characteristic of the technique is that the adjacent pixels in the image are mapped to nodes that are at a constant distance in the hypercube. Using the technique, parallel algorithms for neighbor-finding and digital distances are developed. A parallel hypercube sorting algorithm is obtained as an illustration of the technique. The research in developing these embedding algorithms has paved the way for efficient reconfiguration algorithms for hypercube architectures
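One well-known way to obtain the "adjacent pixels at a constant distance" property is to Gray-code the row and column indices separately and concatenate them. The sketch below assumes a square image whose side is a power of two; it illustrates that general idea and is not necessarily the embedding scheme developed in the dissertation. All names are hypothetical.

```python
def gray_code(i: int) -> int:
    """Binary-reflected Gray code of i."""
    return i ^ (i >> 1)


def pixel_to_node(row: int, col: int, bits: int) -> int:
    """Map pixel (row, col) of a 2^bits x 2^bits image to a node label
    in a hypercube of dimension 2 * bits."""
    return (gray_code(row) << bits) | gray_code(col)


def neighbor_distances(bits: int) -> set[int]:
    """Collect the hypercube distances between all horizontally and
    vertically adjacent pixel pairs."""
    side = 1 << bits
    distances = set()
    for r in range(side):
        for c in range(side):
            here = pixel_to_node(r, c, bits)
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbors
                rr, cc = r + dr, c + dc
                if rr < side and cc < side:
                    there = pixel_to_node(rr, cc, bits)
                    distances.add(bin(here ^ there).count("1"))
    return distances


if __name__ == "__main__":
    print(neighbor_distances(3))  # distances seen in an 8 x 8 image
```

With this mapping every 4-connected neighbor pair differs in exactly one bit, so neighbor-finding needs a single hop per direction.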