10 research outputs found

    Some Theoretical Results of Hypercube for Parallel Architecture

    This paper surveys some theoretical results on the hypercube for the design of VLSI architectures. Parallel computers, including the hypercube multiprocessor, will become a leading technology supporting efficient computation for large uncertain systems.

    Rectilinear partitioning of irregular data parallel computations

    This work describes new mapping algorithms for domain-oriented data-parallel computations in which the workload is distributed irregularly throughout the domain but exhibits localized communication patterns. The researchers consider the problem of partitioning the domain for parallel processing so that the workload on the most heavily loaded processor is minimized, subject to the constraint that the partition be perfectly rectilinear. Rectilinear partitions are useful on architectures that have a fast local mesh network. Discussed here are an improved algorithm for finding the optimal partition in one dimension, new algorithms for partitioning in two dimensions, and optimal partitioning in three dimensions. The application of these algorithms to real problems is also discussed.
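A minimal sketch of the one-dimensional problem, assuming nonnegative integer workloads: the classic probe-and-binary-search technique for contiguous ("chains-on-chains") partitioning finds the smallest achievable load on the most heavily loaded processor. This is a standard textbook method, not the improved algorithm described in the paper, and `probe` and `optimal_bottleneck` are hypothetical names.

```python
def probe(weights, p, bound):
    """Greedy check: can weights be cut into at most p contiguous parts, each with load <= bound?"""
    parts, load = 1, 0
    for w in weights:
        if w > bound:
            return False  # a single item already exceeds the bound
        if load + w > bound:
            parts += 1  # start a new part with this item
            load = w
        else:
            load += w
    return parts <= p


def optimal_bottleneck(weights, p):
    """Smallest achievable maximum part load for integer weights, via binary search on the bound."""
    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if probe(weights, p, mid):
            hi = mid  # feasible: try a tighter bound
        else:
            lo = mid + 1  # infeasible: the optimum is larger
    return lo
```

For example, `optimal_bottleneck([1, 2, 3, 4, 5, 6, 7, 8, 9], 3)` gives 17 (parts 1..5, 6..7 and 8..9). A rectilinear two-dimensional partition places cuts of this kind along both axes, but the cuts must then be chosen jointly, which is the harder problem the paper addresses.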

    Optimal processor assignment for pipeline computations

    The availability of large-scale multitasked parallel architectures introduces the following processor assignment problem for pipelined computations. Given a set of tasks and their precedence constraints, along with their experimentally determined individual response times for different processor sizes, find an assignment of processors to tasks. Two objectives are of interest: minimal response time given a throughput requirement, and maximal throughput given a response time requirement. These assignment problems differ considerably from the classical mapping problem, in which several tasks share a processor; instead, it is assumed that a large number of processors are to be assigned to a relatively small number of tasks. Efficient assignment algorithms were developed for different classes of task structures. For a $p$-processor system and a series-parallel precedence graph with $n$ constituent tasks, an $O(np^2)$ algorithm is provided that finds the optimal assignment for the response time optimization problem; an assignment optimizing the constrained throughput is found in $O(np^2 \log p)$ time. Special cases of linear, independent, and tree graphs are also considered.
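For the linear (chain) special case, here is a sketch under stated assumptions: a pipeline's response time is the sum of its stage times, its throughput is limited by the slowest stage (so a throughput requirement becomes a bound `max_stage_time` on every stage), and `times[i][k-1]` is the measured response time of task i on k processors. A knapsack-style dynamic program then minimizes response time within that bound in $O(np^2)$ time. The function name and input layout are hypothetical, and this is not the paper's series-parallel algorithm.

```python
import math


def assign_chain(times, p, max_stage_time):
    """
    Minimize total response time of a linear pipeline (hypothetical model).
    times[i][k-1]: response time of task i on k processors, for k = 1..p.
    Constraints: processors used <= p, and every stage time <= max_stage_time.
    Returns (best_total, allocation) or (math.inf, None) if infeasible.
    """
    n = len(times)
    # best[j] = minimal summed response time of the tasks seen so far using exactly j processors
    best = [0.0] + [math.inf] * p
    choices = []  # choices[i][j] = processors given to task i when j processors are used in total
    for i in range(n):
        new = [math.inf] * (p + 1)
        pick = [0] * (p + 1)
        for used in range(p + 1):
            if best[used] == math.inf:
                continue
            for k in range(1, p - used + 1):
                t = times[i][k - 1]
                if t <= max_stage_time and best[used] + t < new[used + k]:
                    new[used + k] = best[used] + t
                    pick[used + k] = k
        best = new
        choices.append(pick)
    total = min(best)
    if total == math.inf:
        return math.inf, None
    # Walk the stored choices backwards to recover each task's processor count.
    j = best.index(total)
    alloc = []
    for i in reversed(range(n)):
        k = choices[i][j]
        alloc.append(k)
        j -= k
    return total, alloc[::-1]
```

With `times = [[10, 6, 5, 4], [8, 5, 4, 3]]` and four processors, a stage bound of 10 yields a total of 11 with two processors per task, while a bound of 5 is infeasible because the tasks would need at least 3 + 2 processors.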

    Embedding Meshes on the Star Graph

    We develop algorithms for mapping n-dimensional meshes onto a star graph of degree n with expansion 1 and dilation 3. We show that a star graph of degree n can efficiently simulate an n-dimensional mesh.
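A small sketch of the objects involved, not the paper's dilation-3 mapping. Here S_k denotes the star graph on permutations of k symbols, where each link swaps the first symbol with another position, so S_k has degree k - 1; a star graph of degree n is therefore S_{n+1}, with (n+1)! nodes. Assuming the mesh has side lengths 2, 3, ..., n+1 (the abstract does not state the shape), it has exactly as many nodes, which is what expansion 1 requires. Function names are hypothetical.

```python
from collections import deque
from math import factorial, prod


def star_neighbors(perm):
    """Star-graph links: swap the first symbol with the symbol at position i (i >= 1)."""
    out = []
    for i in range(1, len(perm)):
        p = list(perm)
        p[0], p[i] = p[i], p[0]
        out.append(tuple(p))
    return out


def star_graph_stats(k):
    """Return (nodes, degree, diameter) of S_k; S_k is a Cayley graph, so BFS from one node suffices."""
    start = tuple(range(k))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in star_neighbors(u):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return len(dist), len(star_neighbors(start)), max(dist.values())


def mesh_nodes(n):
    """Nodes in an n-dimensional mesh of shape 2 x 3 x ... x (n + 1)."""
    return prod(range(2, n + 2))
```

For instance, `star_graph_stats(4)` reports 24 nodes, degree 3 and diameter 4, and `mesh_nodes(3)` is also 24, so a 2 x 3 x 4 mesh can be placed one node per star-graph node.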

    On graphs embeddable in a layer of a hypercube and their extremal numbers

    A graph is cubical if it is a subgraph of a hypercube. For a cubical graph $H$ and a hypercube $Q_n$, $\mathrm{ex}(Q_n, H)$ is the largest number of edges in an $H$-free subgraph of $Q_n$. If $\mathrm{ex}(Q_n, H)$ is equal to a positive proportion of the number of edges in $Q_n$, $H$ is said to have positive Turán density in a hypercube; otherwise it has zero Turán density. Determining $\mathrm{ex}(Q_n, H)$, and even identifying whether $H$ has positive or zero Turán density, remains a widely open question for general $H$. In this paper we focus on layered graphs, i.e., graphs that are contained in an edge-layer of some hypercube. Graphs $H$ that are not layered have positive Turán density because one can form an $H$-free subgraph of $Q_n$ consisting of the edges of every other layer. For example, a $4$-cycle is not layered and has positive Turán density. However, in general it is not obvious what properties layered graphs have. We give a characterisation of layered graphs in terms of edge-colorings and show that any $n$-vertex layered graph has at most $\frac{1}{2} n \log n \, (1+o(1))$ edges. We show that most non-trivial subdivisions have zero Turán density, extending known results on zero Turán density of even cycles of length at least $12$ and of length $8$. However, we prove that there are cubical graphs of girth $8$ that are not layered and thus have positive Turán density. The cycle of length $10$ remains the only cycle for which it is not known whether its Turán density is positive or not. We prove that $\mathrm{ex}(Q_n, C_{10}) = \Omega(n 2^n / \log^a n)$ for a constant $a$, showing that the extremal number for a $10$-cycle behaves differently from any other cycle of zero Turán density.
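The remark that non-layered graphs have positive Turán density can be checked directly for the 4-cycle. In this sketch (function names are hypothetical), an edge's layer is the Hamming weight of its lower endpoint. Keeping only the edges of every other layer of $Q_n$ retains roughly half of its $n 2^{n-1}$ edges (exactly 16 of 32 for $Q_4$) yet leaves no 4-cycle, because every 4-cycle of $Q_n$ is a 2-dimensional subcube whose four edges span two consecutive layers.

```python
from itertools import combinations


def every_other_layer_edges(n, parity=0):
    """Edges (x, y) of Q_n, x the lower endpoint, kept when popcount(x) has the given parity."""
    edges = set()
    for x in range(1 << n):
        if bin(x).count("1") % 2 != parity:
            continue
        for bit in range(n):
            if not x & (1 << bit):  # x is the lower endpoint of this edge
                edges.add((x, x | (1 << bit)))
    return edges


def contains_c4(edges, n):
    """Every 4-cycle of Q_n is a face {x, x+e_a, x+e_b, x+e_a+e_b}; test whether one survives."""
    for x in range(1 << n):
        for a, b in combinations(range(n), 2):
            ma, mb = 1 << a, 1 << b
            if x & ma or x & mb:
                continue  # x must be the bottom corner of the face
            face = [(x, x | ma), (x, x | mb), (x | ma, x | ma | mb), (x | mb, x | ma | mb)]
            if all(e in edges for e in face):
                return True
    return False
```

Taking the union of both parities recovers all 32 edges of $Q_4$, which of course contains 4-cycles.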

    Fault-Tolerant Ring Embeddings in Hypercubes -- A Reconfigurable Approach

    We investigate the problem of designing reconfigurable embedding schemes for a fixed hypercube (one without redundant processors or links). The fundamental idea behind these schemes is to embed a basic network in the hypercube without using all of the hypercube's nodes; the remaining nodes serve as spares for reconfiguring the embedding when faults occur. This research shows that, by embedding the application graphs carefully, the topological properties of the embedding can be preserved under fault conditions and reconfiguration can be carried out efficiently. In this dissertation, we choose the ring as the basic network of interest and propose several schemes for designing reconfigurable embeddings that minimize reconfiguration cost and performance degradation. The cost is measured by the number of node-state changes or reconfiguration steps needed to carry out the reconfiguration, and the performance degradation is characterized by the dilation of the new embedding after reconfiguration. Our schemes surpass existing ones in their range of applicability and in the reconfiguration cost of the resulting embeddings.
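A minimal sketch of the spare-node idea, not any of the dissertation's schemes: a dilation-1 ring of even length is embedded in $Q_n$ using only part of the cube (a Gray-code path in the half with the top bit clear, mirrored into the other half), and a faulty ring node is then replaced greedily by the spare closest to its two ring neighbors. The greedy repair rule and all names are assumptions for illustration; dilation is measured as Hamming distance, which equals shortest-path distance in the hypercube.

```python
def gray(i):
    return i ^ (i >> 1)


def hamming(a, b):
    return bin(a ^ b).count("1")


def embed_ring(n, length):
    """Dilation-1 ring of even length (4 <= length <= 2**n) in Q_n; unused nodes are spares."""
    assert length % 2 == 0 and 4 <= length <= (1 << n)
    top = 1 << (n - 1)
    half = [gray(i) for i in range(length // 2)]  # Gray-code path with the top bit clear
    return half + [v | top for v in reversed(half)]  # mirrored path with the top bit set


def dilation(ring):
    """Largest hypercube distance between consecutive ring nodes (including the wrap-around)."""
    return max(hamming(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring)))


def replace_faulty(ring, faulty, n):
    """Hypothetical greedy repair: swap the faulty node for the spare nearest its ring neighbors."""
    idx = ring.index(faulty)
    prev, nxt = ring[idx - 1], ring[(idx + 1) % len(ring)]
    used = set(ring)
    spares = [v for v in range(1 << n) if v not in used]  # min() fails if no spares remain
    best = min(spares, key=lambda s: max(hamming(s, prev), hamming(s, nxt)))
    repaired = list(ring)
    repaired[idx] = best
    return repaired
```

For `embed_ring(4, 12)` the spares are nodes 4, 5, 12 and 13; replacing ring node 2 selects spare 5 and raises the dilation from 1 to 2, the kind of degradation the abstract uses as its performance measure.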

    Performance effects of node mapping on the IBM BlueGene/L machine

    The IBM BlueGene/L (BG/L) supercomputer is a new machine consisting of up to 65536 relatively modest compute nodes connected by three application-level networks: a high-performance point-to-point 3D torus network, a global combining/broadcast tree network for collective operations, and a global interrupt/barrier network for extremely fast global barriers. The BG/L control system allows the user to assign MPI logical ranks to physical torus coordinates at run time in an arbitrary manner, as long as every node is included exactly once in the mapping. This makes it possible to increase application performance with very little effort. This thesis investigates the performance effects of node mapping on several benchmarks and scientific codes using a variety of existing and new mapping strategies. The benchmarks are the NAS Parallel Benchmarks, the Ames Laboratory Classical Molecular Dynamics code (ALCMD), and the General Atomic and Molecular Electronic Structure System (GAMESS) application. The NAS benchmarks are short, easy to understand, and fairly well known. ALCMD has an interesting communication pattern that should benefit from a good mapping strategy. GAMESS is not necessarily well suited to BlueGene because it requires a large amount of compute power and memory per node; however, it provides an interesting data point on the performance of applications that were not designed for a particular system and on the possible benefits of mapping for such applications. The mappings investigated were the stock permutations (XYZ, XZY, etc.), Gray-code-based mesh mappings, random maps, variations on Gray-code maps for embedding 2D meshes in the 3D torus, and three maps designed for GAMESS. Performance results are presented for node mappings on several BG/L partition sizes.
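To show how a rank placement can be scored, here is a sketch that counts torus hops for a 2D nearest-neighbor exchange. It assumes a hypothetical 8 x 8 x 8 torus partition and row-major ranks on the process grid; the axis-order strings mimic the stock permutation names (the first letter varies fastest), but the code does not use any BG/L tool or mapping-file format.

```python
from itertools import product


def torus_hops(a, b, dims):
    """Shortest hop count between coordinates a and b on a torus with wraparound links."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def rank_to_coord(rank, dims, order):
    """Place a rank by filling axes in the given order; "XYZ" makes X vary fastest."""
    axis = {"X": 0, "Y": 1, "Z": 2}
    coord = [0, 0, 0]
    for name in order:
        i = axis[name]
        coord[i] = rank % dims[i]
        rank //= dims[i]
    return tuple(coord)


def mesh2d_cost(rows, cols, dims, order):
    """Total hops for nearest-neighbor exchange on a rows x cols process grid (row-major ranks)."""
    total = 0
    for r, c in product(range(rows), range(cols)):
        me = rank_to_coord(r * cols + c, dims, order)
        if c + 1 < cols:  # right neighbor
            total += torus_hops(me, rank_to_coord(r * cols + c + 1, dims, order), dims)
        if r + 1 < rows:  # lower neighbor
            total += torus_hops(me, rank_to_coord((r + 1) * cols + c, dims, order), dims)
    return total
```

`mesh2d_cost(64, 8, (8, 8, 8), "XYZ")` evaluates to 1008 hops: 448 one-hop row exchanges plus 504 column exchanges, 56 of which cross from Y = 7 back to Y = 0 in the next Z plane and therefore cost two hops. Comparing such totals across orders or Gray-code maps is one cheap way to rank candidate mappings before timing them.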

    Hypercube-Based Topologies With Incremental Link Redundancy.

    Hypercube structures have received a great deal of attention due to the attractive properties inherent in their topology. Parallel algorithms targeted at this topology can be partitioned into many tasks, each of which runs on one node processor. A high degree of performance is achievable by running every task individually and concurrently on each node processor available in the hypercube. Nevertheless, performance can be greatly degraded if the node processors spend much of their time communicating with one another. The goal in designing hypercubes is, therefore, to achieve a high ratio of computation time to communication time. The dissertation primarily addresses ways to enhance system performance by minimizing the communication time among processors. The need for improving the performance of hypercube networks is explained, and three novel hypercube-related topologies with improved performance are proposed and analyzed. First, the Bridged Hypercube (BHC) is introduced. This design is shown to be markedly more efficient and cost-effective than the standard hypercube due to its low diameter. Basic routing algorithms, such as one-to-one routing and broadcasting, are developed for the BHC and proven optimal. Shortcomings of the BHC, such as its asymmetry and limited applicability, are discussed. Next, the Folded Hypercube (FHC), a symmetric network with low diameter and low node degree, is introduced. This topology is shown to support highly efficient communication among the processors. For the FHC, optimal routing algorithms are developed and shown to be markedly more efficient than those of the conventional hypercube. For both the BHC and the FHC, network parameters such as average distance, message traffic density, and communication delay are derived and comparatively analyzed. Lastly, to enhance the fault tolerance of the hypercube, a new design called the Fault Tolerant Hypercube (FTH) is proposed. The FTH is shown to exhibit graceful degradation in performance in the presence of faults. Probabilistic models based on Markov chains are employed to characterize the fault tolerance of the FTH, and the results are verified by Monte Carlo simulation. The most attractive feature of all the new topologies is the asymptotically zero overhead associated with them. The designs are simple and implementable, and they lend themselves to many parallel processing applications requiring a high degree of performance.
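To make the low-diameter claim concrete, here is a sketch assuming the usual folded-hypercube construction, in which each node of $Q_n$ gets one extra link to its bitwise complement (the abstract does not spell out the construction). A breadth-first search then shows the diameter dropping from $n$ to $\lceil n/2 \rceil$ at the cost of one extra link per node. Function names are hypothetical.

```python
from collections import deque


def neighbors(x, n, folded):
    """Hypercube links flip one bit; the assumed folded link joins x to its bitwise complement."""
    result = [x ^ (1 << i) for i in range(n)]
    if folded:
        result.append(x ^ ((1 << n) - 1))
    return result


def diameter(n, folded=False):
    """BFS from node 0; both networks are vertex-transitive, so this eccentricity is the diameter."""
    dist = {0: 0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in neighbors(u, n, folded):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())
```

For $n = 5$, `diameter(5)` is 5 while `diameter(5, folded=True)` is 3, with node degree rising only from 5 to 6.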

    On embedding rectangular grids in hypercubes.

    The following graph-embedding question is addressed: given a two-dimensional grid and the smallest hypercube with at least as many nodes as grid points, how can one assign grid points to hypercube nodes (with at most one grid point per node) so as to keep grid neighbors near each other in the cube? An embedding scheme for an infinite class of two-dimensional grids is given that keeps grid neighbors at most distance two apart.
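The difficulty is easiest to see against the standard Gray-code product embedding, sketched here with hypothetical helper names: it keeps every pair of grid neighbors at distance 1, but it needs $\lceil \log_2 r \rceil + \lceil \log_2 c \rceil$ dimensions, which can exceed the smallest adequate cube. For a 3 x 5 grid it uses a 5-dimensional cube, while 15 points fit in $Q_4$; the scheme in the abstract targets that smallest cube and accepts distance two between neighbors.

```python
def gray(i):
    return i ^ (i >> 1)


def ceil_log2(k):
    """Smallest d with 2**d >= k, for k >= 1."""
    return (k - 1).bit_length()


def gray_product_embedding(rows, cols):
    """Map grid point (r, c) to gray(r) followed by gray(c); neighbors differ in one bit."""
    cbits = ceil_log2(cols)
    return {(r, c): (gray(r) << cbits) | gray(c) for r in range(rows) for c in range(cols)}


def grid_dilation(emb):
    """Largest hypercube (Hamming) distance between images of horizontally or vertically adjacent points."""
    worst = 0
    for (r, c), v in emb.items():
        for dr, dc in ((0, 1), (1, 0)):
            nb = (r + dr, c + dc)
            if nb in emb:
                worst = max(worst, bin(v ^ emb[nb]).count("1"))
    return worst
```

Here `ceil_log2(3) + ceil_log2(5)` is 5 whereas `ceil_log2(3 * 5)` is 4, so the dilation-1 product embedding wastes half of its cube on this grid.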