5 research outputs found

    ARBITRATE-AND-MOVE PRIMITIVES FOR HIGH THROUGHPUT ON-CHIP INTERCONNECTION NETWORKS

    An n-leaf pipelined balanced binary tree is used to arbitrate the order and movement of data from n input ports to one output port. A novel arbitrate-and-move primitive circuit for every node of the tree is presented. It is based on a concept of reduced synchrony that benefits from attractive features of both asynchronous and synchronous designs. The design objective of the pipelined binary tree is to provide a key building block in a high-throughput mesh-of-trees interconnection network for the Explicit Multi-Threading (XMT) architecture, a recently introduced parallel computation framework. The proposed reduced synchrony circuit was compared with asynchronous and synchronous designs of arbitrate-and-move primitives. Simulations with 0.18 µm technology show that, compared to an asynchronous design, the proposed reduced synchrony implementation achieves a higher throughput, up to 2 Giga-Requests per second on an 8-leaf binary tree. Our circuit also consumes less power than the synchronous design, and requires less silicon area than both the synchronous and asynchronous designs.
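As a rough illustration of the arbitration structure this abstract describes, the following is a minimal software sketch of an n-leaf pipelined binary tree that moves requests from n input ports to one output. It is not the paper's circuit: the function name `simulate_tree`, the one-register-per-node pipeline, and the alternating-priority rule used when both children hold data are all assumptions made for the example.

```python
from collections import deque


def simulate_tree(inputs, cycles):
    """Cycle-level model of an n-leaf arbitration tree (n a power of two).

    inputs: one list of requests per leaf port (requests must not be None).
    Returns the requests in the order they leave the root.
    """
    n = len(inputs)
    assert n >= 2 and n & (n - 1) == 0, "n must be a power of two"

    # Heap-style numbering: node 1 is the root, nodes n..2n-1 are the leaves.
    queues = {n + i: deque(reqs) for i, reqs in enumerate(inputs)}
    reg = [None] * n           # one pipeline register per internal node
    prefer_left = [True] * n   # alternating priority: an assumed policy

    def peek(k):
        if k < n:
            return reg[k]
        return queues[k][0] if queues[k] else None

    def take(k):
        if k < n:
            value, reg[k] = reg[k], None
            return value
        return queues[k].popleft()

    out = []
    for _ in range(cycles):
        # The root register drains to the single output port.
        if reg[1] is not None:
            out.append(reg[1])
            reg[1] = None
        # Visit nodes root-first so a register freed this cycle can be refilled.
        for k in range(1, n):
            if reg[k] is not None:
                continue
            left, right = 2 * k, 2 * k + 1
            has_left = peek(left) is not None
            has_right = peek(right) is not None
            if has_left and has_right:
                pick = left if prefer_left[k] else right
                prefer_left[k] = not prefer_left[k]
            elif has_left:
                pick = left
            elif has_right:
                pick = right
            else:
                continue
            reg[k] = take(pick)
    return out


if __name__ == "__main__":
    print(simulate_tree([["a0", "a1"], ["b0"], [], ["d0"]], cycles=10))
```

Each internal node makes a purely local decision between its two children, which is why the same primitive can be replicated at every node of the tree.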

    PI-OBS: a Parallel Iterative Optical Burst Scheduler for OBS networks

    This paper presents the PI-OBS algorithm, a parallel-iterative scheduler for OBS nodes. Conventional schemes are greedy in the sense that they process headers one by one. In PI-OBS, all the headers received during a given time window are jointly processed to optimize the delay and output wavelength allocation, applying void filling techniques and allowing traffic differentiation. Results show similar or better performance than the LAUC-VF algorithm, commonly used as a performance bound for OBS schedulers. The PI-OBS scheduler has been designed to allow a parallel electronic implementation similar to those in VOQ schedulers, with a deterministic response time. This research has been partially supported by the MEC projects TEC2007-67966-01/TCM CON-PARTE-1 and TEC2008-02552-E, and it has also been developed in the framework of "Programa de Ayudas a Grupos de Excelencia de la R. de Murcia, F. Séneca".

    Design of switch architecture for the geographical cell transport protocol

    The Internet is divided into multiple layers to reduce and manage complexity. The International Organization for Standardization (ISO) developed a 7-layer network model, which has been revised to a 5-layer TCP/IP-based Internet model. The layers of the Internet can also be divided into the top TCP/IP protocol suite layers and the underlying transport network layers. SONET/SDH, a dominant transport network, was designed initially for circuit-based telephony services. Advances in Internet voice and video services have pushed SONET/SDH to operate with reduced efficiency and increased cost. Hence, redesign and redeployment of the transport network has been and continues to be a subject of research and development. Several projects are underway to explore new transport network ideas such as G.709 and GMPLS. This dissertation presents the Geographical Cell Transport (GCT) protocol as a candidate for a next-generation transport network. The GCT transport protocol and its cell format are described. The benefits provided by the proposed GCT transport protocol as compared to existing transport networks are investigated. Existing switch architectures are explored, and input-queued virtual output queuing is identified as the best architecture to implement in VLSI for the proposed transport network. The objectives of this switch are high performance, guaranteed fairness among all inputs and outputs, robust behavior under different traffic patterns, and support for Quality of Service (QoS) provisioning. An implementation of this switch architecture is carried out using HDL. A novel pseudo-random number generation unit is designed to nullify the bias present in an arbitration unit. The validity of the design is checked by developing a traffic load model. The speedup factor required in the switch to maintain the desired throughput is explored and presented in detail. Various simulation results are shown to study the behavior of the designed switch under uniform and hotspot traffic. The simulation results show that the QoS behavior and the crossing traffic through the switch have not been affected by hotspots.
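To make the remark about arbitration bias concrete, here is a small sketch contrasting a fixed-priority arbiter with one whose search starts at a pseudo-random port. It does not reproduce the dissertation's generator: a textbook 16-bit Fibonacci LFSR (taps 16, 14, 13, 11) stands in for it, and the names `lfsr16`, `arbitrate`, and `grant_counts` are hypothetical.

```python
def lfsr16(state):
    """One step of a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11).

    A generic stand-in for a hardware pseudo-random source, not the
    unit described in the abstract above.
    """
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)


def arbitrate(requests, start=0):
    """Grant the first requesting port found when scanning from `start`."""
    n = len(requests)
    for offset in range(n):
        port = (start + offset) % n
        if requests[port]:
            return port
    return None  # no port is requesting


def grant_counts(n_ports, cycles, randomize):
    """Count grants per port when every port requests on every cycle."""
    counts = [0] * n_ports
    state = 0xACE1  # any non-zero seed works
    for _ in range(cycles):
        state = lfsr16(state)
        start = state % n_ports if randomize else 0
        counts[arbitrate([True] * n_ports, start)] += 1
    return counts


if __name__ == "__main__":
    print("fixed priority:", grant_counts(4, 1000, randomize=False))
    print("random start:  ", grant_counts(4, 1000, randomize=True))
```

With a fixed starting point, port 0 wins every contested cycle; randomizing the starting point spreads grants across all requesting ports, which is the kind of bias removal the abstract refers to.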

    Design and implementation of high-speed arbiter for large scale VOQ crossbar switches

    Crossbars are frequently used as the switching fabric for high-performance packet switches (IP routers, ATM switches, Ethernet switches). The performance, functionality, and scalability (in terms of line rate and/or number of ports) of these switches are directly related to the arbitration/scheduling algorithm, which must retrieve the state information of the input queues, compute a (pseudo-)optimum matching, and configure the crossbar accordingly, all within one packet cycle. In this paper, we give a detailed hardware design and implementation of a novel arbitration scheme, named RDSRR [1], that lends itself well to high-speed implementation while achieving excellent performance under a variety of traffic patterns. We present a novel pipeline technique and the full-custom design of the arbiter circuit using TSMC 0.25 µm CMOS technology, which can support switch sizes of up to 256 x 256 at a line rate of 10 Gbps.

    Mesh-of-Trees Interconnection Network for an Explicitly Multi-Threaded Parallel Computer Architecture

    As the multiple-decade long increase in clock rates starts to slow down, mainstream general-purpose processors evolve towards single-chip parallel processing. On-chip interconnection networks are essential components of such machines, supporting the communication between processors and the memory system. This task is especially challenging for some easy-to-program parallel computers, which are designed with performance-demanding memory systems. This study proposes an interconnection network with a novel implementation of the Mesh-of-Trees (MoT) topology. The MoT network is evaluated relative to metrics such as wire area complexity, total register count, bandwidth, network diameter, single switch delay, maximum throughput per area, trade-offs between throughput and latency, and post-layout performance. It is also compared with some other traditional network topologies, such as mesh, ring, hypercube, butterfly, fat trees, butterfly fat trees, and replicated butterfly networks. Concrete results show that MoT provides higher throughput and lower latency, especially when the input traffic (or the on-chip parallelism) is high, at comparable area cost. The layout of the MoT network is evaluated using standard cell design methodology. A prototype chip with an 8-terminal MoT network was taped out in 90 nm technology and tested. In the context of an easy-to-program single-chip parallel processor, the MoT network is embedded in the eXplicit Multi-Threading (XMT) architecture and evaluated by running parallel applications. In addition to the basic MoT architecture, a novel hybrid extension of MoT is proposed, which allows significant area savings with a small reduction in throughput.