    Quantum Network Coding

    Since quantum information is continuous, its handling is sometimes surprisingly harder than the classical counterpart. A typical example is cloning; making a copy of digital information is straightforward, but exact copying is not possible for quantum information. The question in this paper is whether or not quantum network coding is possible. Its classical counterpart is another good example to show that digital information flow can be done much more efficiently than conventional (say, liquid) flow. Our answer to the question is similar to the case of cloning, namely, it is shown that quantum network coding is possible if approximation is allowed, by using a simple network model called Butterfly. In this network, there are two flow paths, s_1 to t_1 and s_2 to t_2, which share a single bottleneck channel of capacity one. In the classical case, we can send two bits simultaneously, one for each path, in spite of the bottleneck. Our results for quantum network coding include: (i) We can send any quantum state |psi_1> from s_1 to t_1 and |psi_2> from s_2 to t_2 simultaneously with a fidelity strictly greater than 1/2. (ii) If one of |psi_1> and |psi_2> is classical, then the fidelity can be improved to 2/3. (iii) Similar improvement is also possible if |psi_1> and |psi_2> are restricted to only a finite number of (previously known) states. (iv) Several impossibility results including the general upper bound of the fidelity are also given. Comment: 27 pages, 11 figures. The 12-page version will appear in 24th International Symposium on Theoretical Aspects of Computer Science (STACS 2007).
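
    The classical case mentioned in the abstract can be sketched in a few lines. This is a minimal illustration of the standard XOR scheme on the butterfly network, assuming the usual layout in which each source also has a side link to the other pair's sink; the function name and wiring are illustrative and are not taken from the paper.

```python
def butterfly_xor(x1: int, x2: int) -> tuple[int, int]:
    """Send bit x1 from s_1 to t_1 and bit x2 from s_2 to t_2 (classical case).

    Assumed layout: s_1 has a side link to t_2, s_2 has a side link to t_1,
    and both sources feed one bottleneck channel of capacity one bit.
    """
    # The bottleneck carries a single bit: the XOR of the two inputs.
    bottleneck = x1 ^ x2
    # t_1 sees x2 on its side link and the bottleneck bit, so it can undo the XOR.
    received_at_t1 = bottleneck ^ x2
    # t_2 sees x1 on its side link and the bottleneck bit.
    received_at_t2 = bottleneck ^ x1
    return received_at_t1, received_at_t2
```

    Both sinks recover their intended bit although only one bit crosses the bottleneck; the abstract's point is that the same scheme cannot be carried out exactly for quantum states.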

    The efficiency of greedy routing in hypercubes and butterflies

    Includes bibliographical references (p. 24-26). Cover title. "October 1990". Research supported by the ARO, DAAL03-86-K-0171. Research supported by the NSF, ECS-8552419. By George D. Stamoulis and John N. Tsitsiklis.

    Universal Wormhole Routing

    In this paper, we examine the wormhole routing problem in terms of the “congestion” c and “dilation” d for a set of packet paths. We show, with mild restrictions, that there is a simple randomized algorithm for routing any set of P packets in O(cdη + cLη log P) time with high probability, where L is the number of flits in a packet, and η = min{d, L}; only a constant number of flits are stored in each queue at any time. Using this result, we show that a fat-tree network of area Θ(A) can simulate wormhole routing on any network of comparable area with O(log^3 A) slowdown, when all worms have the same length. Variable-length worms are also considered. We run some simulations on the fat-tree which show that not only does wormhole routing tend to perform better than the more heavily studied store-and-forward routing in this context, but that performance superior to our provable bound is attainable in practice.
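
    The two path parameters in the bound can be computed directly from a set of paths. The sketch below assumes the usual definitions (congestion c is the largest number of paths sharing one directed edge, dilation d is the longest path length in edges) and a hypothetical representation of each path as a list of node labels; it does not implement the routing algorithm itself.

```python
from collections import Counter


def congestion_and_dilation(paths):
    """Return (c, d) for a collection of paths given as lists of nodes."""
    edge_use = Counter()
    for path in paths:
        # Count each directed edge at most once per path.
        edge_use.update(set(zip(path, path[1:])))
    c = max(edge_use.values(), default=0)
    d = max((len(path) - 1 for path in paths), default=0)
    return c, d


# Hypothetical example: three paths sharing the edges (a, b) and (b, c).
example_paths = [["a", "b", "c"], ["d", "b", "c", "e"], ["a", "b"]]
c, d = congestion_and_dilation(example_paths)
```

    With a packet length L in flits, η = min(d, L) then follows from the returned dilation.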

    Complexity, action, and black holes

    Our earlier paper "Complexity Equals Action" conjectured that the quantum computational complexity of a holographic state is given by the classical action of a region in the bulk (the "Wheeler-DeWitt" patch). We provide calculations for the results quoted in that paper, explain how it fits into a broader (tensor) network of ideas, and elaborate on the hypothesis that black holes are the fastest computers in nature. Comment: 55+14 pages, many figures. v2: (so many) typos fixed, references added

    Avoiding Braess' Paradox through Collective Intelligence

    In an Ideal Shortest Path Algorithm (ISPA), at each moment each router in a network sends all of its traffic down the path that will incur the lowest cost to that traffic. In the limit of an infinitesimally small amount of traffic for a particular router, its routing that traffic via an ISPA is optimal, as far as cost incurred by that traffic is concerned. We demonstrate, though, that in many cases, due to the side-effects of one router's actions on another router's performance, having routers use ISPAs is suboptimal as far as global aggregate cost is concerned, even when only used to route infinitesimally small amounts of traffic. As a particular example of this we present an instance of Braess' paradox for ISPAs, in which adding new links to a network decreases overall throughput. We also demonstrate that load-balancing, in which the routing decisions are made to optimize the global cost incurred by all traffic currently being routed, is suboptimal as far as global cost averaged across time is concerned. This is also due to "side-effects", in this case of current routing decisions on future traffic. The theory of COllective INtelligence (COIN) is concerned precisely with the issue of avoiding such deleterious side-effects. We present key concepts from that theory and use them to derive an idealized algorithm whose performance is better than that of the ISPA, even in the infinitesimal limit. We present experiments verifying this, and also showing that a machine-learning-based version of this COIN algorithm in which costs are only imprecisely estimated (a version potentially applicable in the real world) also outperforms the ISPA, despite having access to less information than does the ISPA. In particular, this COIN algorithm avoids Braess' paradox. Comment: 28 pages
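
    Braess' paradox itself can be illustrated with the well-known four-node road example, which is not the instance constructed in the paper. The sketch assumes 4000 drivers, two load-dependent links costing (drivers / 100) minutes, two fixed links costing 45 minutes, and an optional zero-cost shortcut; all names and numbers are illustrative assumptions.

```python
def path_costs(f_top, f_bottom, f_cross):
    """Per-driver travel time on each path, given the number of drivers per path.

    top:    S -> A -> E  (load-dependent S->A, fixed 45 for A->E)
    bottom: S -> B -> E  (fixed 45 for S->B, load-dependent B->E)
    cross:  S -> A -> B -> E, using an assumed zero-cost shortcut A->B
    """
    sa = (f_top + f_cross) / 100     # S->A is shared by top and cross
    be = (f_bottom + f_cross) / 100  # B->E is shared by bottom and cross
    return {"top": sa + 45, "bottom": 45 + be, "cross": sa + be}


# Without the shortcut, selfish drivers split evenly between top and bottom.
before = path_costs(2000, 2000, 0)
# With the shortcut, every driver individually prefers the cross path.
after = path_costs(0, 0, 4000)
```

    In this instance each driver pays 65 minutes before the shortcut exists and 80 minutes after it is added, while switching back to either original path would cost 85, so no individual driver benefits from deviating. This is the kind of side-effect of individually optimal routing that the abstract describes.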

    A powerful heuristic for telephone gossiping

    A refined heuristic for computing schedules for gossiping in the telephone model is presented. The heuristic is fast: for a network with n nodes and m edges, requiring R rounds for gossiping, the running time is O(R n log(n) m) for all tested classes of graphs. This moderate time consumption makes it possible to compute gossiping schedules for networks with more than 10,000 PUs and 100,000 connections. The heuristic is good: in practice the computed schedules never exceed the optimum by more than a few rounds. The heuristic is versatile: it can also be used for broadcasting and more general information dispersion patterns. It can handle both the unit-cost and the linear-cost model. In fact, the heuristic is so good that for CCC, shuffle-exchange, butterfly, de Bruijn, star and pancake networks the constructed gossiping schedules are better than the best theoretically derived ones. For example, for gossiping on a shuffle-exchange network with 2^{13} PUs, the former upper bound was 49 rounds, while our heuristic finds a schedule requiring 31 rounds. Also for broadcasting the heuristic improves on many formerly known results. A second heuristic works even better for CCC, butterfly, star and pancake networks. For example, with this heuristic we found that gossiping on a pancake network with 7! PUs can be performed in 15 rounds, 2 fewer than achieved by the best theoretical construction. This second heuristic is less versatile than the first, but by refined search techniques it can tackle even larger problems, the main limitation being the storage capacity. Another advantage is that the constructed schedules can be represented concisely.
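
    To make the telephone model concrete, the sketch below checks whether a given schedule completes gossiping. It assumes the unit-cost model, in which each round is a set of disjoint calls and both endpoints of a call exchange everything they know. The schedule format and both function names are hypothetical, the check that calls follow network edges is omitted, and this is a verifier, not the heuristic described in the abstract.

```python
def run_gossip(n, rounds):
    """Return True if every one of n nodes knows all n items after the schedule."""
    known = [{i} for i in range(n)]
    for calls in rounds:
        busy = set()
        for u, v in calls:
            # Telephone model: a node takes part in at most one call per round.
            if u == v or u in busy or v in busy:
                raise ValueError("calls within a round must be disjoint")
            busy.update((u, v))
            merged = known[u] | known[v]
            known[u] = merged
            known[v] = set(merged)
    return all(len(items) == n for items in known)


def hypercube_schedule(dim):
    """Dimension-order schedule on a hypercube with 2**dim nodes, dim rounds."""
    n = 1 << dim
    return [
        [(u, u ^ (1 << k)) for u in range(n) if u < (u ^ (1 << k))]
        for k in range(dim)
    ]


complete = run_gossip(8, hypercube_schedule(3))
incomplete = run_gossip(8, hypercube_schedule(3)[:2])
```

    On the 8-node hypercube the three-round schedule completes gossiping, while stopping after two rounds leaves every node with only half of the items.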

    On entanglement spreading from holography

    A global quench is an interesting setting where we can study thermalization of subsystems in a pure state. We investigate entanglement entropy (EE) growth in global quenches in holographic field theories and relate some of its aspects to quantities characterizing chaos. More specifically we obtain four key results: 1. We prove holographic bounds on the entanglement velocity v_E and the butterfly effect speed v_B that arises in the study of chaos. 2. We obtain the EE as a function of time for large spherical entangling surfaces analytically. We show that the EE is insensitive to the details of the initial state or quench protocol. 3. In a thermofield double state we determine analytically the two-sided mutual information between two large concentric spheres separated in time. 4. We derive a bound on the rate of growth of EE for arbitrary shapes, and develop an expansion for EE at early times. In a companion paper arXiv:1608.05101, we put these results in the broader context of EE growth in chaotic systems: we relate EE growth to the chaotic spreading of operators, derive bounds on EE at a given time, and compare the holographic results to spin chain numerics and toy models. In this paper, we perform holographic calculations that provide the basis of arguments presented in that paper. Comment: v2: presentation improved, typos fixed, 54 pages, 17 figures; v1: 53 pages, 16 figures

    Efficient Interconnection Schemes for VLSI and Parallel Computation

    This thesis is primarily concerned with two problems of interconnecting components in VLSI technologies. In the first case, the goal is to construct efficient interconnection networks for general-purpose parallel computers. The second problem is a more specialized problem in the design of VLSI chips, namely multilayer channel routing. In addition, a final part of this thesis provides lower bounds on the area required for VLSI implementations of finite-state machines. This thesis shows that networks based on Leiserson's fat-tree architecture are nearly as good as any network built in a comparable amount of physical space. It shows that these universal networks can efficiently simulate competing networks by means of an appropriate correspondence between network components and efficient algorithms for routing messages on the universal network. In particular, a universal network of area A can simulate competing networks with O(lg^3 A) slowdown (in bit-times), using a very simple randomized routing algorithm and simple network components. Alternatively, a packet routing scheme of Leighton, Maggs, and Rao can be used in conjunction with more sophisticated switching components to achieve O(lg^2 A) slowdown. Several other important aspects of universality are also discussed. It is shown that universal networks can be constructed in area linear in the number of processors, so that there is no need to restrict the density of processors in competing networks. Results are also presented for comparisons between networks of different size or with processors of different sizes (as determined by the amount of attached memory). Of particular interest is the fact that a universal network built from sufficiently small processors can simulate (with the slowdown already quoted) any competing network of comparable size regardless of the size of processors in the competing network. In addition, many of the results given do not require the usual assumption of unit wire delay. Finally, though most of the discussion is in the two-dimensional world, the results are shown to apply in three dimensions by way of a simple demonstration of general results on graph layout in three dimensions. The second main problem considered in this thesis is channel routing when many layers of interconnect are available, a scenario that is becoming more and more meaningful as chip fabrication technologies advance. This thesis describes a system, MulCh, for multilayer channel routing which extends the Chameleon system developed at U. C. Berkeley. Like Chameleon, MulCh divides a multilayer problem into essentially independent subproblems of at most three layers, but unlike Chameleon, MulCh considers the possibility of using partitions comprised of a single layer instead of only partitions of two or three layers. Experimental results show that MulCh often performs better than Chameleon in terms of channel width, total net length, and number of vias. In addition to a description of MulCh as implemented, this thesis provides improved algorithms for subtasks performed by MulCh, thereby indicating potential improvements in the speed and performance of multilayer channel routing. In particular, a linear time algorithm is given for determining the minimum width required for a single-layer channel routing problem, and an algorithm is given for maintaining the density of a collection of nets in logarithmic time per net insertion. The last part of this thesis shows that straightforward techniques for implementing finite-state machines are optimal in the worst case. Specifically, for any s and k, there is a deterministic finite-state machine with s states and k symbols such that any layout algorithm requires Ω(ks lg s) area to lay out its realization. For nondeterministic machines, there is an analogous lower bound of Ω(ks^2) area.
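
    The density of a collection of nets can be illustrated with a simple sweep. The sketch assumes the common definition in which each net occupies a closed interval of integer columns and the density is the largest number of nets crossing any single column. It recomputes the density from scratch and is not the logarithmic-time insertion structure described in the thesis; the function name and net representation are hypothetical.

```python
def channel_density(nets):
    """Maximum number of nets covering any column; nets are (left, right) pairs."""
    events = []
    for left, right in nets:
        events.append((left, 1))        # net starts covering column `left`
        events.append((right + 1, -1))  # net stops covering after column `right`
    # Tuples sort by column first; at equal columns, -1 sorts before +1,
    # so a net ending at column r - 1 is released before one starting at r.
    events.sort()
    best = current = 0
    for _, delta in events:
        current += delta
        best = max(best, current)
    return best


# Hypothetical nets given by their leftmost and rightmost columns.
nets = [(1, 4), (2, 6), (5, 8), (7, 9)]
density_before = channel_density(nets)
density_after = channel_density(nets + [(3, 7)])
```

    Inserting the net (3, 7) raises the density of this example from 2 to 3, because columns 3 through 7 are then each crossed by three nets.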