3,898 research outputs found

    VLSI-sorting evaluated under the linear model

    Get PDF
    AbstractThere are several different models of computation used on which to base evaluations of VLSI sorting algorithms and there are different measures of complexity. This paper revises complexity results under the linear model that have been gained under the constant model. This approach is due to expected technological development (see Mangir, 1983; Thompson and Raghavan, 1984; Vitanyi, 1984a, 1984b).For the constant model we know that for medium sized keys there are AT2and AP2 optimal sorting algorithms with T ranging from ω(log n) to O(√nk) and P ranging from Ω(1) to O(√nk) (Bilardi, 1984). The main results of asymptotic analysis of sorting algorithms under the linear model are that the lower bounds allow AT2 optimal sorting algorithms only for T = Θ(√nk) but allow AP2 algorithms in the same range as under the constant model. Furthermore the sorting algorithms presented in this paper meet these lower bounds. This proves that these bounds cannot be improved for k = Θ (log n). The building block for the realization of these sorting algorithms is a comparison exchange module that compares r × s bit matrices in time TC = Θ(r + s) on an area AC = Θ(r2) (not including the storage area for the keys).For problem sizes that exceed realistic chip capacities, chip-external sorting algorithms can be used. In this paper two different chip-external sorting algorithms (BBB(S) and TWB(S)) are presented. They are designed to be implemented on a single board. They use a sorting chip S to perform the sort-split operation on blocks of data BBB(S) and TWB(S) are systolic algorithms using local communication only so that their evaluation does not depend on whether the constant or the linear model is used. Furthermore it seems obvious that their design is technically feasible whenever the sorting chip S is technically feasible.TWB has optimal asymptotic time complexity, so its existence proves that under the linear model external sorting can be done asymptotically as fast as under the constant model. The time complexity of TWB(S) is linearly dependent on the speed gs = nsts. It is shown that the speed if looked at as a function of the chip capacity C is asymptotically maximal for AT2 optimal sorting algorithms. Thus S should be a sorting algorithm similar to the M-M-sorter presented in this paper. A major disadvantage of TWB(S) is that it cannot exploit the maximal throughput ds = ns/ps of a systolic sorting algorithm S.Therefore algorithm BBB(S) is introduced. The time complexity of BBB(S) is linearly dependent on ds. It is shown that the throughput is maximal for AP2 optimal algorithms. There is a wide range of such sorting algorithms including algorithms that can be realized in a way that is independent of the length of the keys. For example, BBB(S) with S being a highly parallel version of odd-even transposition sort has this kind of flexibility. A disadvantage of BBB(S) is that it is asymptotically slower than TWB(S)

    Evaluating local indirect addressing in SIMD proc essors

    Get PDF
    In the design of parallel computers, there exists a tradeoff between the number and power of individual processors. The single instruction stream, multiple data stream (SIMD) model of parallel computers lies at one extreme of the resulting spectrum. The available hardware resources are devoted to creating the largest possible number of processors, and consequently each individual processor must use the fewest possible resources. Disagreement exists as to whether SIMD processors should be able to generate addresses individually into their local data memory, or all processors should access the same address. The tradeoff is examined between the increased capability and the reduced number of processors that occurs in this single instruction stream, multiple, locally addressed, data (SIMLAD) model. Factors are assembled that affect this design choice, and the SIMLAD model is compared with the bare SIMD and the MIMD models

    A Lower Bound Technique for Communication in BSP

    Get PDF
    Communication is a major factor determining the performance of algorithms on current computing systems; it is therefore valuable to provide tight lower bounds on the communication complexity of computations. This paper presents a lower bound technique for the communication complexity in the bulk-synchronous parallel (BSP) model of a given class of DAG computations. The derived bound is expressed in terms of the switching potential of a DAG, that is, the number of permutations that the DAG can realize when viewed as a switching network. The proposed technique yields tight lower bounds for the fast Fourier transform (FFT), and for any sorting and permutation network. A stronger bound is also derived for the periodic balanced sorting network, by applying this technique to suitable subnetworks. Finally, we demonstrate that the switching potential captures communication requirements even in computational models different from BSP, such as the I/O model and the LPRAM

    VHDL Design of a Scalable VLSI Sorting Device Based on Pipelined Computation

    Get PDF
    This paper describes the VHDL design of a sorting algorithm, aiming at defining an elementary sorting unit as a building block of VLSI devices which require a huge number of sorting units. As such, care was taken to reach a reasonable low value of the area-time parameter. A sorting VLSI device, in fact, can be built as a cascade of elementary sorting units which process the input stream in a pipeline fashion: as the processing goes on, a wave of sorted numbers propagates towards the output ports. The paper describes the design starting from an initial theoretical analysis of the algorithm\u27s complexity to a VHDL behavioural analysis of the proposed architecture to a structural synthesis of a sorting block based on the Alliance tools to, finally, a silicon synthesis which was worked out again using Alliance. Two points in the proposed design are particularly noteworthy. First, the sorting architecture is suitable for treating a continuous stream of input data rather than a block of data as in many other designs. Secondly, the proposed design reaches a reasonable compromise between area and time, as it yields an A T product which compares favourably with the theoretical lower bound

    A taxonomy of parallel sorting

    Get PDF
    TR 84-601In this paper, we propose a taxonomy of parallel sorting that includes a broad range of array and file sorting algorithms. We analyze the evolution of research on parallel sorting, from the earliest sorting networks to the shared memory algorithms and the VLSI sorters. In the context of sorting networks, we describe two fundamental parallel merging schemes - the odd-even and the bitonic merge. Sorting algorithms have been derived from these merging algorithms for parallel computers where processors communicate through interconnection networks such as the perfect shuffle, the mesh and a number of other sparse networks. After describing the network sorting algorithms, we show that, with a shared memory model of parallel computation, faster algorithms have been derived from parallel enumeration sorting schemes, where keys are first ranked and then rearranged according to their rank

    VLSI implementation of a fairness ATM buffer system

    Get PDF

    Design and implementation of a general purpose VLSI median filter unit and its applications

    Get PDF
    A VLSI median filter unit has been designed and implemented in 3-Ό m M2 CMOS, using full-custom VLSI design techniques. The unit consists of two single-chip median filters, one extensible and one real-time. The chips are bit-level pipelined systolic structures based on odd/even transposition sorting. The extensible chip is designed for applications requiring variable window sizes and variable word-lengths, whereas the other one is for real-time applications. Various median filtering techniques are easily realized by using the designed chips together with reasonable external hardware
    • 

    corecore