181 research outputs found

    A tight layout of the cube-connected cycles

    Preparata and Vuillemin proposed the cube-connected cycles (CCC) in 1981 [15], and in the same paper, gave an asymptotically optimal layout scheme for the CCC. We give a new layout scheme for the CCC which requires less than half of the area of the Preparata-Vuillemin layout. We also give a non-trivial lower bound on the layout area of the CCC. There is a constant factor of 2 between the new layout and the lower bound. We conjecture that the new layout is optimal (minimal).

    Tighter layouts of the cube-connected cycles

    Preparata and Vuillemin proposed the cube-connected cycles (CCC) and its compact layout in 1981 [17]. We give a new layout of the CCC which uses less than half the area of the Preparata-Vuillemin layout. We also give a lower bound on the layout area of the CCC. The area of the new layout deviates from this bound by a small constant factor. If we 'unfold' the cycles in the CCC, the resulting structure can be laid out in optimal area.
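For readers unfamiliar with the network whose layout these abstracts discuss, the following is a minimal sketch of the standard textbook definition of the cube-connected cycles graph. It is not taken from the papers listed here; the function names `ccc_neighbors` and `ccc_edges` are hypothetical, and the sketch assumes dimension d >= 3 so that every node has three distinct neighbours. It says nothing about the layouts themselves.

```python
def ccc_neighbors(d, pos, word):
    """Neighbours of node (pos, word) in the d-dimensional CCC.

    Assumed standard definition: each hypercube corner `word`
    (a d-bit integer) is replaced by a cycle of d nodes, and the
    node at cycle position `pos` owns the hypercube edge that
    flips bit `pos`.
    """
    return [
        ((pos + 1) % d, word),      # next node on the same cycle
        ((pos - 1) % d, word),      # previous node on the same cycle
        (pos, word ^ (1 << pos)),   # hypercube edge across dimension pos
    ]


def ccc_edges(d):
    """Set of undirected edges of the d-dimensional CCC (d >= 3)."""
    edges = set()
    for word in range(1 << d):
        for pos in range(d):
            for nb in ccc_neighbors(d, pos, word):
                edges.add(frozenset(((pos, word), nb)))
    return edges


if __name__ == "__main__":
    d = 3
    print("nodes:", d * (1 << d))
    print("edges:", len(ccc_edges(d)))
```

Because every node has degree 3, a d-dimensional CCC has d * 2^d nodes and 3 * d * 2^d / 2 edges.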

    Aspects of practical implementations of PRAM algorithms

    The PRAM is a shared memory model of parallel computation which abstracts away from inessential engineering details. It provides a very simple architecture-independent model and a good programming environment. Theoreticians of the computer science community have proved that it is possible to emulate the theoretical PRAM model using current technology. Solutions have been found for effectively interconnecting processing elements, for routing data on these networks and for distributing the data among memory modules without hotspots. This thesis reviews this emulation and the possibilities it provides for large scale general purpose parallel computation. The emulation employs a bridging model which acts as an interface between the actual hardware and the PRAM model. We review the evidence that such a scheme can achieve scalable parallel performance and portable parallel software and that PRAM algorithms can be optimally implemented on such practical models. In the course of this review we present the following new results: 1. Concerning parallel approximation algorithms, we describe an NC algorithm for finding an approximation to a minimum weight perfect matching in a complete weighted graph. The algorithm is conceptually very simple and it is also the first NC-approximation algorithm for the task with a sub-linear performance ratio. 2. Concerning graph embedding, we describe dense edge-disjoint embeddings of the complete binary tree with n leaves in the following n-node communication networks: the hypercube, the de Bruijn and shuffle-exchange networks and the 2-dimensional mesh. In the embeddings the maximum distance from a leaf to the root of the tree is asymptotically optimally short. The embeddings facilitate efficient implementation of many PRAM algorithms on networks employing these graphs as interconnection networks. 3. Concerning bulk synchronous algorithmics, we describe scalable transportable algorithms for the following three commonly required types of computation: balanced tree computations, Fast Fourier Transforms and matrix multiplications.

    Parallel computation on sparse networks of processors

    SIGLE / LD:D48226/84 / BLDSC - British Library Document Supply Centre / GB / United Kingdom

    VLSI-sorting evaluated under the linear model

    Several different models of computation are used as a basis for evaluating VLSI sorting algorithms, and there are different measures of complexity. This paper revises, under the linear model, complexity results that have been obtained under the constant model. This approach is motivated by expected technological development (see Mangir, 1983; Thompson and Raghavan, 1984; Vitanyi, 1984a, 1984b). For the constant model we know that for medium sized keys there are AT² and AP² optimal sorting algorithms with T ranging from ω(log n) to O(√nk) and P ranging from Ω(1) to O(√nk) (Bilardi, 1984). The main results of the asymptotic analysis of sorting algorithms under the linear model are that the lower bounds allow AT² optimal sorting algorithms only for T = Θ(√nk) but allow AP² algorithms in the same range as under the constant model. Furthermore, the sorting algorithms presented in this paper meet these lower bounds. This proves that these bounds cannot be improved for k = Θ(log n). The building block for the realization of these sorting algorithms is a comparison exchange module that compares r × s bit matrices in time TC = Θ(r + s) on an area AC = Θ(r²) (not including the storage area for the keys). For problem sizes that exceed realistic chip capacities, chip-external sorting algorithms can be used. In this paper two different chip-external sorting algorithms (BBB(S) and TWB(S)) are presented. They are designed to be implemented on a single board. They use a sorting chip S to perform the sort-split operation on blocks of data. BBB(S) and TWB(S) are systolic algorithms using local communication only, so their evaluation does not depend on whether the constant or the linear model is used. Furthermore, their design appears technically feasible whenever the sorting chip S is technically feasible. TWB has optimal asymptotic time complexity, so its existence proves that under the linear model external sorting can be done asymptotically as fast as under the constant model. The time complexity of TWB(S) is linearly dependent on the speed gs = nsts. It is shown that the speed, viewed as a function of the chip capacity C, is asymptotically maximal for AT² optimal sorting algorithms. Thus S should be a sorting algorithm similar to the M-M-sorter presented in this paper. A major disadvantage of TWB(S) is that it cannot exploit the maximal throughput ds = ns/ps of a systolic sorting algorithm S. Therefore algorithm BBB(S) is introduced. The time complexity of BBB(S) is linearly dependent on ds. It is shown that the throughput is maximal for AP² optimal algorithms. There is a wide range of such sorting algorithms, including algorithms that can be realized in a way that is independent of the length of the keys. For example, BBB(S) with S being a highly parallel version of odd-even transposition sort has this kind of flexibility. A disadvantage of BBB(S) is that it is asymptotically slower than TWB(S).
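The abstract above mentions odd-even transposition sort as a possible chip algorithm S. As a point of reference only, here is a minimal sequential sketch of that classic algorithm; it is an assumption-laden illustration, not the paper's highly parallel VLSI version, and the function name `odd_even_transposition_sort` is hypothetical. In a systolic realization, all compare-exchange steps within one phase would run simultaneously on neighbouring cells.

```python
def odd_even_transposition_sort(keys):
    """Sort a sequence with odd-even transposition sort.

    Sequential simulation: n phases alternate between comparing
    pairs starting at even indices and pairs starting at odd
    indices. Each phase touches only neighbouring elements, which
    is why the method suits local-communication hardware.
    """
    a = list(keys)          # work on a copy, leave the input intact
    n = len(a)
    for phase in range(n):  # n phases are sufficient for n keys
        start = phase % 2   # even phase: pairs (0,1),(2,3)...; odd: (1,2),(3,4)...
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]  # compare-exchange
    return a


if __name__ == "__main__":
    print(odd_even_transposition_sort([5, 2, 9, 1, 3]))
```

The compare-exchange step only needs an ordering on whole keys, so the control structure does not depend on the key length.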