
    Parallel RAM from Cyclic Circuits

    Known simulations of random access machines (RAMs) or parallel RAMs (PRAMs) by Boolean circuits incur significant polynomial blowup, due to the need to repeatedly simulate accesses to a large main memory. Consider two modifications to Boolean circuits: (1) remove the restriction that circuit graphs are acyclic and (2) enhance AND gates such that they output zero eagerly. If an AND gate has a zero input, it 'short circuits' and outputs zero without waiting for its second input. We call this the cyclic circuit model. Note that circuits in this model remain combinational, as they do not allow wire values to change over time. We simulate a bounded-word-size PRAM via a cyclic circuit, and the blowup from the simulation is only polylogarithmic. Consider a PRAM program $P$ that on a length-$n$ input uses an arbitrary number of processors to manipulate words of size $\Theta(\log n)$ bits and then halts within $W(n)$ work. We construct a size-$O(W(n) \cdot \log^4 n)$ cyclic circuit that simulates $P$. Suppose that on a particular input, $P$ halts in time $T$; our circuit computes the same output within $T \cdot O(\log^3 n)$ gate delay. This implies theoretical feasibility of powerful parallel machines. Cyclic circuits can be implemented in hardware, and our circuit achieves performance within polylog factors of PRAM. Our simulated PRAM synchronizes processors by simply leveraging logical dependencies between wires.
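The eager AND rule described in the abstract can be illustrated with a small sketch. This is not the paper's construction: the gate encoding, the `evaluate` function, and the wire names `x`, `y`, `g`, `h` are all hypothetical, and the fixpoint loop is only one simple way to model "a wire is assigned at most once". It shows how a cycle between two gates can still resolve when an AND gate sees a zero input.

```python
from typing import Dict, Optional, Tuple

# Hypothetical encoding: wire name -> (op, input_a, input_b), op is "AND" or "XOR".
Gate = Tuple[str, str, str]


def evaluate(gates: Dict[str, Gate], inputs: Dict[str, int]) -> Dict[str, Optional[int]]:
    """Assign each wire at most once; None means the wire never resolved."""
    values: Dict[str, Optional[int]] = {wire: None for wire in gates}
    values.update(inputs)
    changed = True
    while changed:
        changed = False
        for wire, (op, a, b) in gates.items():
            if values[wire] is not None:
                continue  # wire values never change once set
            va, vb = values[a], values[b]
            if op == "AND":
                if va == 0 or vb == 0:
                    out = 0  # eager: one zero input is enough
                elif va == 1 and vb == 1:
                    out = 1
                else:
                    out = None  # still waiting on an input
            elif op == "XOR":
                out = None if va is None or vb is None else va ^ vb
            else:
                raise ValueError(f"unknown gate type: {op}")
            if out is not None:
                values[wire] = out
                changed = True
    return values


# A two-gate cycle: g reads h, and h reads g.
cyclic = {"g": ("AND", "x", "h"), "h": ("XOR", "g", "y")}
resolved = evaluate(cyclic, {"x": 0, "y": 1})    # x = 0 short-circuits g
unresolved = evaluate(cyclic, {"x": 1, "y": 1})  # nothing breaks the cycle
```

With `x = 0` the AND gate outputs zero without waiting for `h`, which in turn lets `h` resolve; with `x = 1` both wires stay unresolved in this sketch.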

    Parallel routing algorithms in Benes-Clos networks.

    by Soung-Yue Liew. Thesis (M.Phil.)--Chinese University of Hong Kong, 1996. Includes bibliographical references (leaves 55-57).
    Chapter 1 --- Introduction
    Chapter 2 --- The Basic Principles of Routing Algorithms
        Chapter 2.1 --- The principles of sequential algorithms
            Chapter 2.1.1 --- Edge-coloring of bipartite graph with maximum degree two
            Chapter 2.1.2 --- Edge-coloring of bipartite graph with maximum degree M
        Chapter 2.2 --- Looping algorithm
            Chapter 2.2.1 --- Paull's Matrix
            Chapter 2.2.2 --- Chain to be rearranged in Paull's Matrix
        Chapter 2.3 --- The principles of parallel algorithms
            Chapter 2.3.1 --- Edge-coloring of bipartite graph with maximum degree two
            Chapter 2.3.2 --- Edge-coloring of bipartite graph with maximum degree 2m
    Chapter 3 --- Parallel routing algorithm in Benes-Clos networks
        Chapter 3.1 --- Routing properties of Benes networks
            Chapter 3.1.1 --- Three-stage structure and routing constraints
            Chapter 3.1.2 --- Algebraic interpretation of connection set up problem
            Chapter 3.1.3 --- Equivalent classes
        Chapter 3.2 --- Parallel routing algorithm
            Chapter 3.2.1 --- Basic principles
            Chapter 3.2.2 --- Initialization
            Chapter 3.2.3 --- Algorithm
            Chapter 3.2.4 --- Set up the states and determine π for next stage
            Chapter 3.2.5 --- Simulation results
            Chapter 3.2.6 --- Time complexity
        Chapter 3.3 --- Contention resolution
        Chapter 3.4 --- Algorithms applied to Clos network with 2m central switches
        Chapter 3.5 --- Parallel algorithms in rearrangeability
    Chapter 4 --- Conclusions

    A self-routing permutation network

    A self-routing permutation network is a connector which can set its own switches to realize any one-to-one mapping of its inputs onto its outputs. Many permutation networks have been reported in the literature, but none with the self-routing property, except crossbars and cellular permutation arrays which have excessive cost. This paper describes a self-routing permutation network which has $O(\log^3 n)$ bit-level delay and uses $O(n \log^3 n)$ bit-level hardware, where $n$ is the number of inputs to the network. The network is derived from a complementary Beneš network by replacing each of its two switches in its first stage by what is called a 1-sorter and recursively defining the switches in the third stage as self-routing networks. The use of 1-sorters results in substantial reduction in both propagation delay and hardware cost when contrasted with $O(n)$ delay and $O(n^{1.59})$ hardware of the recursively decomposed version of a complementary Beneš network. Furthermore, these complexities match the propagation delay and hardware cost of Batcher's sorters (the only networks, other than crossbars and cellular permutation arrays, which are known to behave like self-routing permutation networks). More specifically, it is shown that the network of this paper uses about half of the hardware with about four-thirds of the delay of a Batcher's sorter. © 1990