
    On the Area of Hypercube Layouts

    This paper precisely analyzes the wire density and required area in standard layout styles for the hypercube. The most natural, regular layout of a hypercube of $N^2$ nodes in the plane, in an $N \times N$ grid arrangement, uses $\lfloor 2N/3 \rfloor + 1$ horizontal wiring tracks for each row of nodes. (The number of tracks per row can be reduced by 1 with a less regular design.) This paper also gives a simple formula for the wire density at any cut position and a full characterization of all places where the wire density is maximized (which does not occur at the bisection). Comment: 8 pages, 4 figures, LaTeX
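    To make "wire density at a cut" concrete, here is a minimal sketch under assumptions the abstract does not spell out: one row is a k-dimensional subcube of N = 2^k nodes placed in natural binary order, every within-row edge (x, x XOR 2^j) is drawn as one horizontal wire spanning the nodes between its endpoints, and a cut sits in the gap between two adjacent nodes. `cut_densities` is a hypothetical helper, not code from the paper.

```python
def cut_densities(k: int) -> list[int]:
    """Return the number of wires crossing each cut in one row of 2**k nodes.

    densities[c] counts wires crossing the gap between node c and node c + 1.
    Assumes natural binary order and one horizontal wire per edge (x, x ^ 2**j).
    """
    n = 1 << k
    densities = [0] * (n - 1)
    for x in range(n):
        for j in range(k):
            y = x ^ (1 << j)
            if x < y:                  # visit each edge once
                for c in range(x, y):  # the wire spans gaps x .. y-1
                    densities[c] += 1
    return densities


for k in range(1, 7):
    n = 1 << k
    d = cut_densities(k)
    print(f"N={n:3d}  max={max(d):3d}  floor(2N/3)={2 * n // 3:3d}  bisection={d[n // 2 - 1]:3d}")
```

    In this simplified model the maximum density comes out to $\lfloor 2N/3 \rfloor$, while the density at the bisection (the gap after node $N/2 - 1$) is only $N/2$, so for $N \ge 8$ the maximum is indeed not at the bisection. The paper's track count $\lfloor 2N/3 \rfloor + 1$ reflects its own layout model, which this sketch does not reproduce.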

    On crossing numbers of hypercubes and cube connected cycles

    Recently, hypercube-like networks have received considerable attention in the field of parallel computing due to their high potential for system availability and parallel execution of algorithms. The crossing number $\mathrm{cr}(G)$ of a graph $G$ is defined as the least number of crossings of its edges when $G$ is drawn in a plane. Crossing numbers arise naturally in the fabrication of VLSI circuits and provide a good area lower-bound argument in VLSI complexity theory. According to the survey paper of Harary et al., all that is known about the exact values of $\mathrm{cr}(Q_n)$ for the $n$-dimensional hypercube is $\mathrm{cr}(Q_3)=0$, $\mathrm{cr}(Q_4)=8$ and $\mathrm{cr}(Q_5)\le 56$. We prove the following tight bounds on $\mathrm{cr}(Q_n)$ and $\mathrm{cr}(CCC_n)$:

    $$\frac{4^n}{20} - (n+1)2^{n-2} < \mathrm{cr}(Q_n) < \frac{4^n}{6} - n^2 2^{n-3},$$
    $$\frac{4^n}{20} - 3(n+1)2^{n-2} < \mathrm{cr}(CCC_n) < \frac{4^n}{6} + 3n^2 2^{n-3}.$$

    Our lower bounds on $\mathrm{cr}(Q_n)$ and $\mathrm{cr}(CCC_n)$ immediately give alternative proofs that the area complexity of hypercube and $CCC$ computers realized on VLSI circuits is $A=\Omega(4^n)$.
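    The bounds above are easy to evaluate exactly for small $n$. The sketch below simply transcribes the two inequalities into exact rational arithmetic; `qn_bounds` and `ccc_bounds` are hypothetical helper names.

```python
from fractions import Fraction


def qn_bounds(n: int) -> tuple[Fraction, Fraction]:
    """Lower and upper bounds on cr(Q_n) as stated in the abstract."""
    lower = Fraction(4**n, 20) - (n + 1) * Fraction(2) ** (n - 2)
    upper = Fraction(4**n, 6) - n**2 * Fraction(2) ** (n - 3)
    return lower, upper


def ccc_bounds(n: int) -> tuple[Fraction, Fraction]:
    """Lower and upper bounds on cr(CCC_n) as stated in the abstract."""
    lower = Fraction(4**n, 20) - 3 * (n + 1) * Fraction(2) ** (n - 2)
    upper = Fraction(4**n, 6) + 3 * n**2 * Fraction(2) ** (n - 3)
    return lower, upper


for n in range(3, 9):
    lo, hi = qn_bounds(n)
    print(n, float(lo), float(hi))
```

    For $n = 4$ the interval is $(-7.2, 10.67)$, which contains the known value $\mathrm{cr}(Q_4) = 8$. The lower-order correction terms dominate for small $n$: the lower bound for $Q_n$ first becomes positive at $n = 5$, and the one for $CCC_n$ at $n = 7$. Asymptotically, only the $4^n$ terms matter.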

    Unifying mesh- and tree-based programmable interconnect

    We examine the traditional, symmetric, Manhattan mesh design for field-programmable gate-array (FPGA) routing along with tree-of-meshes (ToM) and mesh-of-trees (MoT) based designs. All three networks can provide general routing for limited bisection designs (Rent's rule with p<1) and allow locality exploitation. They differ in their detailed topology and use of hierarchy. We show that all three have the same asymptotic wiring requirements. We bound this tightly by providing constructive mappings between routes in one network and routes in another. For example, we show that a (c,p) MoT design can be mapped to a (2c,p) linear population ToM, and we introduce a corner-turn scheme that makes it possible to perform the reverse mapping from any (c,p) linear population ToM to a (2c,p) MoT augmented with a particular set of corner-turn switches. One consequence of this latter mapping is a multilayer layout strategy for N-node, linear population ToM designs that requires only Θ(N) two-dimensional area for any p when given sufficient wiring layers. We further show upper and lower bounds for global mesh routes based on recursive bisection width and show that these are within a constant factor of each other and within a constant factor of MoT and ToM layout area. In the process we identify the parameters and characteristics that make the networks different, making it clear that there is a unified design continuum in which these networks are simply particular regions.
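    The (c,p) notation refers to a Rent's-rule wiring budget. As a rough sketch, assume a binary tree in which a subtree of n nodes sends about c·n^p channels to its parent (our reading of the parameterization; the paper's exact definitions may differ). Summing over all subtrees gives $\sum_l (N/2^l)\, c\, 2^{lp} = cN \sum_l 2^{l(p-1)}$, a geometric series, so total channel count stays linear in N when p<1 but grows as N log N when p=1. This counts wires only; it does not reproduce the paper's Θ(N)-area multilayer layout argument. `tom_channel_widths` and `tom_total_wires` are hypothetical names.

```python
import math


def tom_channel_widths(levels: int, c: float, p: float) -> list[int]:
    """Channels leaving one subtree at each level l (subtree size n = 2**l).

    Assumption: Rent's rule T = c * n**p, rounded up; the small epsilon guards
    against floating-point results such as 2.0000000000000004.
    """
    return [math.ceil(c * (2 ** l) ** p - 1e-9) for l in range(levels + 1)]


def tom_total_wires(levels: int, c: float, p: float) -> int:
    """Total channels over all subtrees of a binary tree with N = 2**levels leaves."""
    n_leaves = 2 ** levels
    widths = tom_channel_widths(levels, c, p)
    # Level l has N / 2**l subtrees, each contributing widths[l] channels.
    return sum((n_leaves >> l) * w for l, w in enumerate(widths))


print(tom_channel_widths(4, 1.0, 0.5), tom_total_wires(4, 1.0, 0.5))  # [1, 2, 2, 3, 4] 50
```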

    Edge separators for graphs of bounded genus with applications

    An $n$-vertex graph of positive genus $g$ and maximal degree $k$ has an $O(\sqrt{gkn})$ edge separator. This bound is best possible to within a constant factor. The separator can be found in $O(g+n)$ time provided that we start with an embedding of the graph in its genus surface. This extends known results on planar graphs and similar results about vertex separators. We apply the edge separator to the isoperimetric problem, to efficient embeddings of graphs of genus $g$ into various classes of graphs including trees, meshes and hypercubes, and to lower bounds on the crossing numbers of $K_n$, $K_{m,n}$ and $Q_n$ drawn on surfaces of genus $g$.

    Embedding Schemes for Interconnection Networks.

    Graph embeddings play an important role in interconnection network and VLSI design. Designing efficient embedding strategies for simulating one network by another and determining the number of layers required to build a VLSI chip are just two of the many areas in which graph embeddings are used. In the area of network simulation, we develop efficient, small-dilation embeddings of a butterfly network into a butterfly network of a different size and/or type. The genus of a graph gives an indication of how many layers are required to build a circuit. We have determined the exact genus of the permutation network called the star graph, and have given a lower bound for the genus of the permutation network called the pancake graph. The star graph has been proposed as an alternative to the binary hypercube and, therefore, we compare the genus of the star graph with that of the binary hypercube. Another type of embedding that is helpful in determining the number of layers is a book embedding. We develop upper and lower bounds on the pagenumber of a book embedding of the k-ary hypercube, along with an upper bound on the cumulative pagewidth.
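    Dilation, the quality measure for the butterfly embeddings mentioned above, is the largest distance in the host network between the images of two adjacent guest nodes. The sketch below computes it by breadth-first search. To keep it short it embeds an 8-node ring into the 3-cube rather than a butterfly, and compares the identity map with a Gray-code map; all helper names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, src):
    """Shortest-path hop counts from src in an unweighted host graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def dilation(guest_edges, host_adj, phi):
    """Max host distance between the images phi[u], phi[v] of guest edges (u, v)."""
    worst = 0
    for u, v in guest_edges:
        worst = max(worst, bfs_distances(host_adj, phi[u])[phi[v]])
    return worst


def hypercube(k):
    """Adjacency lists of the k-dimensional binary hypercube."""
    return {x: [x ^ (1 << j) for j in range(k)] for x in range(1 << k)}


cycle = [(i, (i + 1) % 8) for i in range(8)]
q3 = hypercube(3)
identity = {i: i for i in range(8)}
gray = {i: i ^ (i >> 1) for i in range(8)}  # reflected Gray code
print(dilation(cycle, q3, identity), dilation(cycle, q3, gray))  # 3 1
```

    The identity map stretches the ring edges 3→4 and 7→0 across three cube dimensions, while the Gray-code map places every pair of ring neighbours on adjacent cube nodes.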

    High-performance computing for vision

    Vision is a challenging application for high-performance computing (HPC). Many vision tasks have stringent latency and throughput requirements. Further, the vision process has a heterogeneous computational profile. Low-level vision consists of structured computations with regular data dependencies. The subsequent, higher-level operations consist of symbolic computations with irregular data dependencies. Over the years, many approaches to high-speed vision have been pursued. VLSI hardware solutions such as ASICs and digital signal processors (DSPs) have provided good processing speeds on structured low-level vision tasks. Special-purpose systems for vision have also been designed. Currently, there is growing interest in using general-purpose parallel systems for vision problems. These systems offer the advantages of higher performance, software programmability, generality, and architectural flexibility over the earlier approaches. The choice of low-cost commercial off-the-shelf (COTS) components as building blocks for these systems leads to easy upgradability and increased system life. The main focus of the paper is on effectively using COTS-based general-purpose parallel computing platforms to realize high-speed implementations of vision tasks. Due to the successful use of COTS-based systems in a variety of high-performance applications, it is attractive to consider their use for vision applications as well. However, the irregular data dependencies in vision tasks lead to large communication overheads on HPC systems. At the University of Southern California, our research efforts have been directed toward designing scalable parallel algorithms for vision tasks on HPC systems. In our approach, we use the message-passing programming model to develop portable code. Our algorithms are specified using C and MPI. In this paper, we summarize our efforts and illustrate our approach using several example vision tasks.
    To facilitate the analysis and development of scalable algorithms, a realistic computational model of the parallel system must be used. Several such models have been proposed in the literature. We use the General-purpose Distributed Memory (GDM) model, which is a simple but realistic model of state-of-the-art parallel machines. Using the GDM model, we develop generic algorithmic techniques such as data remapping, overlapping of communication with computation, message packing, asynchronous execution, and communication scheduling. Using these techniques, we have developed scalable algorithms for many vision tasks. For instance, a scalable algorithm for linear approximation has been developed using the asynchronous execution technique. Using this algorithm, linear feature extraction can be performed in 0.065 s on a 64-node SP-2 for a 512 × 512 image. A serial implementation takes 3.45 s for the same task. Similarly, the communication scheduling and decomposition techniques lead to a scalable algorithm for the line grouping task. We believe that such an algorithmic approach can result in the development of scalable and portable solutions for vision tasks. © 1996 IEEE. Publisher Item Identifier S 0018-9219(96)04992-4.
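    As a quick check on the linear feature extraction timings reported above, the usual speedup and parallel-efficiency formulas can be applied to them. This sketch assumes the 3.45 s serial run is a fair baseline for the 0.065 s run on 64 SP-2 nodes; `speedup_and_efficiency` is a hypothetical helper.

```python
def speedup_and_efficiency(t_serial: float, t_parallel: float, p: int) -> tuple[float, float]:
    """Classic speedup S = T_serial / T_parallel and efficiency E = S / p."""
    s = t_serial / t_parallel
    return s, s / p


s, e = speedup_and_efficiency(3.45, 0.065, 64)
print(f"speedup {s:.1f}x, efficiency {e:.0%}")  # speedup 53.1x, efficiency 83%
```

    A speedup of about 53 on 64 nodes corresponds to roughly 83% parallel efficiency.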

    Algebraic approach to hardware description and verification

    Submicron Systems Architecture Project: Semiannual Technical Report

    The Mosaic C is an experimental fine-grain multicomputer based on single-chip nodes. The Mosaic C chip includes 64KB of fast dynamic RAM, a processor, a packet interface, ROM for bootstrap and self-test, and a two-dimensional self-timed router. The chip architecture provides low-overhead and low-latency handling of message packets, and high memory and network bandwidth. Sixty-four Mosaic chips are packaged by tape-automated bonding (TAB) in an 8 × 8 array on circuit boards that can, in turn, be arrayed in two dimensions to build arbitrarily large machines. These 8 × 8 boards are now in prototype production under a subcontract with Hewlett-Packard. We are planning to construct a 16K-node Mosaic C system from 256 of these boards. The suite of Mosaic C hardware also includes host-interface boards and high-speed communication cables. The hardware developments and activities of the past eight months are described in section 2.1. The programming system that we are developing for the Mosaic C is based on the same message-passing, reactive-process computational model that we have used with earlier multicomputers, but the model is implemented for the Mosaic in a way that supports fine-grain concurrency. A process executes only in response to receiving a message, and during execution it may send messages, create new processes, and modify its persistent variables before it either exits or becomes dormant in preparation for receiving another message. These computations are expressed in an object-oriented programming notation, a derivative of C++ called C+-. The computational model and the C+- programming notation are described in section 2.2. The Mosaic C runtime system, which is written in C+-, provides automatic process placement and highly distributed management of system resources. The Mosaic C runtime system is described in section 2.3.
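    The reactive-process model described above can be illustrated with a small single-threaded simulation. This is a hypothetical sketch in Python, not C+- or the Mosaic runtime: `Process`, `Runtime`, `Accumulator` and `Collector` are invented names, and a single FIFO queue stands in for the network and for process placement. Each process runs only when a message is delivered, keeps persistent variables between activations, and either exits or goes dormant.

```python
from collections import deque


class Process:
    """A hypothetical reactive process: it runs only when a message arrives."""

    def __init__(self, pid):
        self.pid = pid

    def on_message(self, msg, rt):
        """Handle one message; return True to exit, False to become dormant."""
        raise NotImplementedError


class Runtime:
    """Delivers queued messages one at a time (stands in for the network)."""

    def __init__(self):
        self.procs = {}
        self.queue = deque()
        self.next_pid = 0

    def spawn(self, cls, *args):
        """Create a new process; handlers may call this too."""
        pid = self.next_pid
        self.next_pid += 1
        self.procs[pid] = cls(pid, *args)
        return pid

    def send(self, pid, msg):
        self.queue.append((pid, msg))

    def run(self):
        while self.queue:
            pid, msg = self.queue.popleft()
            proc = self.procs.get(pid)
            if proc is None:
                continue  # receiver already exited: drop the message
            if proc.on_message(msg, self):
                del self.procs[pid]


class Collector(Process):
    def __init__(self, pid):
        super().__init__(pid)
        self.results = []  # persistent variable

    def on_message(self, msg, rt):
        self.results.append(msg)
        return False  # stay dormant, waiting for more messages


class Accumulator(Process):
    def __init__(self, pid, sink):
        super().__init__(pid)
        self.sink = sink
        self.total = 0  # persistent across activations

    def on_message(self, msg, rt):
        if msg == "done":
            rt.send(self.sink, self.total)
            return True  # exit after reporting
        self.total += msg
        return False


rt = Runtime()
sink = rt.spawn(Collector)
acc = rt.spawn(Accumulator, sink)
for value in (1, 2, 3, "done"):
    rt.send(acc, value)
rt.run()
print(rt.procs[sink].results)  # [6]
```

    The accumulator is activated four times, keeps its running total between activations, and exits after sending the result; the collector stays dormant afterwards, ready for further messages.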