Arbitrate-and-Move Primitives for High-Throughput On-Chip Interconnection Networks
An n-leaf pipelined, balanced binary tree is used to arbitrate the order in which data from n input ports reach one output port, and to move that data. A novel arbitrate-and-move primitive circuit for every node of the tree is presented; it is based on a concept of reduced synchrony that benefits from attractive features of both asynchronous and synchronous designs. The design objective of the pipelined binary tree is to provide a key building block in a high-throughput mesh-of-trees interconnection network for the Explicit Multi-Threading (XMT) architecture, a recently introduced parallel computation framework. The proposed reduced-synchrony circuit was compared with asynchronous and synchronous designs of arbitrate-and-move primitives. Simulations in 0.18 µm technology show that, compared to an asynchronous design, the proposed reduced-synchrony implementation achieves higher throughput, up to 2 Giga-Requests per second on an 8-leaf binary tree. Our circuit also consumes less power than the synchronous design and requires less silicon area than both the synchronous and asynchronous designs.
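To make the arbitrate-and-move idea concrete, here is a minimal cycle-level sketch, assuming a synchronous pipeline with one packet slot per tree node and an alternating-priority rule at each node when both children hold data. It models only how data moves through the tree, not the reduced-synchrony circuit, and the names `TreeArbiter`, `inject`, and `step` are hypothetical.

```python
from collections import deque


class TreeArbiter:
    """Cycle-level model of an n-leaf pipelined binary arbitration tree.

    Heap layout: internal nodes 1..n-1 (node 1 is the root, which feeds the
    single output port); leaf stages n..2n-1 hold the head of each input port.
    Each slot holds at most one packet (a one-entry pipeline stage).
    """

    def __init__(self, n):
        if n < 2 or (n & (n - 1)) != 0:
            raise ValueError("n must be a power of two >= 2")
        self.n = n
        self.slot = [None] * (2 * n)
        self.prefer_left = [True] * n          # assumed alternating-priority state
        self.inputs = [deque() for _ in range(n)]

    def inject(self, port, packet):
        self.inputs[port].append(packet)

    def step(self):
        """Advance one cycle; return the packet delivered to the output, or None."""
        out = self.slot[1]
        self.slot[1] = None
        # Visit nodes top-down (ascending heap index), so a slot emptied this
        # cycle is refilled from its child in the same cycle, like a pipeline shift.
        for node in range(1, self.n):
            if self.slot[node] is not None:
                continue
            left, right = 2 * node, 2 * node + 1
            has_l = self.slot[left] is not None
            has_r = self.slot[right] is not None
            if has_l and has_r:                 # arbitrate: alternate priority
                pick = left if self.prefer_left[node] else right
                self.prefer_left[node] = not self.prefer_left[node]
            elif has_l:
                pick = left
            elif has_r:
                pick = right
            else:
                continue
            self.slot[node] = self.slot[pick]   # move the winner one level up
            self.slot[pick] = None
        # Refill empty leaf stages from their input-port queues.
        for i in range(self.n):
            leaf = self.n + i
            if self.slot[leaf] is None and self.inputs[i]:
                self.slot[leaf] = self.inputs[i].popleft()
        return out
```

With every port backlogged, an 8-leaf tree in this model fills its pipeline in four cycles and then delivers one request per cycle, preserving FIFO order for each input port.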
PI-OBS: a Parallel Iterative Optical Burst Scheduler for OBS networks
This paper presents the PI-OBS algorithm, a parallel-iterative scheduler for OBS nodes. Conventional schemes are greedy in the sense that they process headers one by one. In PI-OBS, all the headers received during a given time window are jointly processed to optimize the delay and output wavelength allocation, applying void-filling techniques and allowing traffic differentiation. Results show performance similar to or better than that of the LAUC-VF algorithm, which is commonly used as a performance bound for OBS schedulers. The PI-OBS scheduler has been designed to allow a parallel electronic implementation, similar to those used in VOQ schedulers, with a deterministic response time.
This research has been partially supported by the MEC projects TEC2007-67966-01/TCM CON-PARTE-1 and TEC2008-02552-E, and it has also been developed in the framework of "Programa de Ayudas a Grupos de Excelencia de la R. de Murcia, F. Séneca".
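The abstract uses LAUC-VF as its reference scheduler. For comparison, the following is a minimal sketch of LAUC-VF's greedy, one-header-at-a-time channel choice; it is not PI-OBS, whose joint windowed processing is not detailed here. It assumes each wavelength's reservations are kept as sorted, non-overlapping `(start, end)` intervals and ignores fiber delay lines; the function name `lauc_vf` is hypothetical.

```python
from bisect import bisect_right


def lauc_vf(channels, start, end):
    """Assign a burst occupying [start, end) to a wavelength using LAUC-VF.

    channels: one list per wavelength of sorted, non-overlapping (s, e)
    reservations. Among all voids that can hold the burst, pick the one whose
    left edge is latest, i.e. the smallest leftover gap before the burst.
    Returns the chosen channel index (and records the reservation), or None
    if no void fits and the burst would be dropped.
    """
    best, best_gap = None, None
    for ch, res in enumerate(channels):
        # i = number of reservations starting at or before the burst start.
        i = bisect_right([s for s, _ in res], start)
        prev_end = res[i - 1][1] if i > 0 else float("-inf")
        next_start = res[i][0] if i < len(res) else float("inf")
        if prev_end <= start and end <= next_start:   # the burst fits this void
            gap = start - prev_end
            if best_gap is None or gap < best_gap:
                best, best_gap = ch, gap
    if best is not None:
        res = channels[best]
        res.insert(bisect_right([s for s, _ in res], start), (start, end))
    return best
```

Because each call commits a reservation before the next header is seen, the result depends on header arrival order; that greediness is what the joint processing in PI-OBS is described as avoiding.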
Design of switch architecture for the geographical cell transport protocol
The Internet is divided into multiple layers to reduce and manage complexity. The International Organization for Standardization (ISO) developed a seven-layer network model, which has since been revised into a five-layer, TCP/IP-based Internet model. The layers of the Internet can also be divided into the upper TCP/IP protocol suite layers and the underlying transport network layers. SONET/SDH, a dominant transport network, was initially designed for circuit-based telephony services. Advances in Internet voice and video services have pushed SONET/SDH to operate with reduced efficiency and increased cost. Hence, redesign and redeployment of the transport network has been, and continues to be, a subject of research and development. Several projects are underway to explore new transport network ideas, such as G.709 and GMPLS.
This dissertation presents the Geographical Cell Transport (GCT) protocol as a candidate for a next-generation transport network. The GCT transport protocol and its cell format are described, and the benefits it provides compared with existing transport networks are investigated. Existing switch architectures are explored, and an input-queued architecture with virtual output queuing (VOQ) is identified as the best architecture to implement in VLSI for the proposed transport network. The objectives of this switch are high performance, guaranteed fairness among all inputs and outputs, robust behavior under different traffic patterns, and support for Quality of Service (QoS) provisioning. The switch architecture is implemented in an HDL.
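Virtual output queuing keeps a separate queue per output at each input, so a cell waiting for a busy output does not block the cells behind it (head-of-line blocking). A minimal sketch of one VOQ input port, contrasted with a single FIFO, follows; it assumes cells are opaque objects tagged with a destination output index and does not model the GCT cell format or QoS classes. `VOQInput` and `fifo_request` are hypothetical names.

```python
from collections import deque


class VOQInput:
    """One input port with a virtual output queue (VOQ) per output port."""

    def __init__(self, n_outputs):
        self.voq = [deque() for _ in range(n_outputs)]

    def enqueue(self, cell, output):
        self.voq[output].append(cell)

    def requests(self):
        """Every output with at least one waiting cell can be requested at once."""
        return [j for j, q in enumerate(self.voq) if q]

    def dequeue(self, output):
        """Remove and return the oldest cell for `output` (after a grant)."""
        return self.voq[output].popleft()


def fifo_request(fifo):
    """A single-FIFO input of (cell, output) pairs can request only its head's output."""
    return [fifo[0][1]] if fifo else []
```

With cells `a` and `b` waiting for output 2 and `c` for output 0, the VOQ input can request outputs 0 and 2 together, whereas the FIFO input requests only output 2 until `a` leaves, so `c` waits even if output 0 is idle.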
A novel pseudo-random number generation unit is designed to nullify the bias present in an arbitration unit. The validity of the design is checked by developing a traffic load model. The speedup factor required in the switch to maintain the desired throughput is explored and presented in detail. Various simulation results show the behavior of the designed switch under uniform and hotspot traffic; they show that QoS behavior and the traffic crossing the switch are not affected by hotspots.
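The abstract does not describe the pseudo-random unit itself. One common hardware technique for randomizing an arbiter is a linear-feedback shift register (LFSR); the sketch below assumes a 16-bit maximal-length Galois LFSR (taps 0xB400, polynomial x^16 + x^14 + x^13 + x^11 + 1) and grants one asserted request by reducing the LFSR output modulo the number of requesters. `LFSR16` and `random_arbiter` are hypothetical names, not the dissertation's design.

```python
class LFSR16:
    """16-bit Galois LFSR with taps 0xB400; cycles through all 65535 nonzero states."""

    def __init__(self, seed=0xACE1):
        if not 0 < seed < 1 << 16:
            raise ValueError("seed must be a nonzero 16-bit value")
        self.state = seed

    def next(self):
        lsb = self.state & 1
        self.state >>= 1
        if lsb:                       # feed back into the tapped bit positions
            self.state ^= 0xB400
        return self.state


def random_arbiter(requests, rng):
    """Grant one asserted request line, chosen pseudo-randomly; None if idle."""
    candidates = [i for i, r in enumerate(requests) if r]
    if not candidates:
        return None
    # Modulo reduction: over one full LFSR period, grant counts differ by at most one.
    return candidates[rng.next() % len(candidates)]
```

Unlike a fixed-priority arbiter, which always favors the lowest-numbered requester, this scheme spreads grants almost evenly; the residual modulo bias is at most one grant per requester per 65535 draws.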
Design and implementation of high-speed arbiter for large scale VOQ crossbar switches
Crossbars are frequently used as the switching fabric for high-performance packet switches (IP routers, ATM switches, Ethernet switches). The performance, functionality, and scalability (in terms of line rate and/or number of ports) of these switches are directly related to the arbitration/scheduling algorithm, which must retrieve the state information of the input queues, compute a (pseudo-)optimum matching, and configure the crossbar accordingly, all within one packet cycle. In this paper, we give a detailed hardware design and implementation of a novel arbitration scheme, named RDSRR [1], that lends itself well to high-speed implementation while achieving excellent performance under a variety of traffic patterns. We present a novel pipeline technique and the full-custom design of the arbiter circuit in TSMC 0.25 µm CMOS technology, which can support switch sizes of up to 256 × 256 at a line rate of 10 Gbps.
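RDSRR itself is not specified in this abstract. To illustrate the request, grant, and accept structure that such a crossbar arbiter computes once per packet cycle, the sketch below implements a single iteration of iSLIP, a different, well-known round-robin matching scheme swapped in here for illustration only. The function name `islip_iteration` and the in-place pointer lists are assumptions.

```python
def islip_iteration(req, grant_ptr, accept_ptr):
    """One iSLIP iteration on an N x N crossbar.

    req[i][j] is True when input i has a cell queued for output j (non-empty VOQ).
    grant_ptr[j] and accept_ptr[i] are round-robin pointers, updated in place.
    Returns {input: output} for the matched pairs.
    """
    n = len(req)
    # Grant: each output grants the first requesting input at or after its pointer.
    grants = {}  # input -> outputs that granted it
    for j in range(n):
        for k in range(n):
            i = (grant_ptr[j] + k) % n
            if req[i][j]:
                grants.setdefault(i, []).append(j)
                break
    # Accept: each input accepts the first granting output at or after its pointer.
    match = {}
    for i, outs in grants.items():
        j = min(outs, key=lambda o: (o - accept_ptr[i]) % n)
        match[i] = j
        # Pointers advance one past the partner only when the grant is accepted.
        accept_ptr[i] = (j + 1) % n
        grant_ptr[j] = (i + 1) % n
    return match
```

Treating each call as one cell time in a 3 × 3 switch with every VOQ occupied and pointers starting at zero, successive calls yield matchings of size 1, 2, and 3 as the pointers desynchronize.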
Mesh-of-Trees Interconnection Network for an Explicitly Multi-Threaded Parallel Computer Architecture
As the decades-long increase in clock rates slows down, mainstream general-purpose processors are evolving toward single-chip parallel processing. On-chip interconnection networks are essential components of such machines, supporting communication between the processors and the memory system. This task is especially challenging for some easy-to-program parallel computers, which are designed with performance-demanding memory systems. This study proposes an interconnection network with a novel implementation of the Mesh-of-Trees (MoT) topology. The MoT network is evaluated against metrics such as wire area complexity, total register count, bandwidth, network diameter, single-switch delay, maximum throughput per area, trade-offs between throughput and latency, and post-layout performance. It is also compared with other traditional network topologies, such as mesh, ring, hypercube, butterfly, fat trees, butterfly fat trees, and replicated butterfly networks. The results show that MoT provides higher throughput and lower latency, especially when the input traffic (or the on-chip parallelism) is high, at comparable area cost. The layout of the MoT network is evaluated using a standard-cell design methodology. A prototype chip with an 8-terminal MoT network was taped out and tested. In the context of an easy-to-program single-chip parallel processor, the MoT network is embedded in the eXplicit Multi-Threading (XMT) architecture and evaluated by running parallel applications. In addition to the basic MoT architecture, a novel hybrid extension of MoT is proposed, which allows significant area savings with a small reduction in throughput.