17 research outputs found

    Architecture design and performance analysis of practical buffered-crossbar packet switches

    Get PDF
    Combined input crosspoint buffered (CICB) packet switches were introduced to relax inputoutput arbitration timing and provide high throughput under admissible traffic. However, the amount of memory required in the crossbar of an N x N switch is N2x k x L, where k is the crosspoint buffer size and needs to be of size RTT in cells, L is the packet size. RTT is the round-trip time which is defined by the distance between line cards and switch fabric. When the switch size is large or RTT is not negligible, the memory amount required makes the implementation costly or infeasible for buffered crossbar switches. To reduce the required memory amount, a family of shared memory combined-input crosspoint-buffered (SMCB) packet switches, where the crosspoint buffers are shared among inputs, are introduced in this thesis. One of the proposed switches uses a memory speedup of in and dynamic memory allocation, and the other switch avoids speedup by arbitrating the access of inputs to the crosspoint buffers. These two switches reduce the required memory of the buffered crossbar by 50% or more and achieve equivalent throughput under independent and identical traffic with uniform distributions when using random selections. The proposed mSMCB switch is extended to support differentiated services and long RTT. To support P traffic classes with different priorities, CICB switches have been reported to use N2x k x L x P amount of memory to avoid blocking of high priority cells.The proposed SMCB switch with support for differentiated services requires 1/mP of the memory amount in the buffered crossbar and achieves similar throughput performance to that of a CICB switch with similar priority management, while using no speedup in the shared memory. The throughput performance of SMCB switch with crosspoint buffers shared by inputs (I-SMCB) is studied under multicast traffic. An output-based shared-memory crosspoint buffered (O-SMCB) packet switch is proposed where the crosspoint buffers are shared by two outputs and use no speedup. The proposed O-SMCB switch provides high performance under admissible uniform and nonuniform multicast traffic models while using 50% of the memory used in CICB switches. Furthermore, the O-SMCB switch provides higher throughput than the I-SMCB switch. As SMCB switches can efficiently support an RTT twice as long as that supported by CICB switches and as the performance of SMCB switches is bounded by a matching between inputs and crosspoint buffers, a new family of CICB switches with flexible access to crosspoint buffers are proposed to support longer RTTs than SMCB switches and to provide higher throughput under a wide variety of admissible traffic models. The CICB switches with flexible access allow an input to use any available crosspoint buffer at a given output. The proposed switches reduce the required crosspoint buffer size by a factor of N , keep the service of cells in sequence, and use no speedup. This new class of switches achieve higher throughput performance than CICB switches under a large variety of traffic models, while supporting long RTTs. Crosspoint buffered switches that are implemented in single chips have limited scalability. To support a large number of ports in crosspoint buffered switches, memory-memory-memory (MMM) Clos-network switches are an alternative. The MMM switches that use minimum memory amount at the central module is studied. Although, this switch can provide a moderate throughput, MMM switch may serve cells out of sequence. As keeping cells in sequence in an MMM switch may require buffers be distributed per flow, an MMM with extended memory in the switch modules is studied. To solve the out of sequence problem in MMM switches, a queuing architecture is proposed for an MMM switch. The service of cells in sequence is analyzed

    On-board B-ISDN fast packet switching architectures. Phase 2: Development. Proof-of-concept architecture definition report

    Get PDF
    For the next-generation packet switched communications satellite system with onboard processing and spot-beam operation, a reliable onboard fast packet switch is essential to route packets from different uplink beams to different downlink beams. The rapid emergence of point-to-point services such as video distribution, and the large demand for video conference, distributed data processing, and network management makes the multicast function essential to a fast packet switch (FPS). The satellite's inherent broadcast features gives the satellite network an advantage over the terrestrial network in providing multicast services. This report evaluates alternate multicast FPS architectures for onboard baseband switching applications and selects a candidate for subsequent breadboard development. Architecture evaluation and selection will be based on the study performed in phase 1, 'Onboard B-ISDN Fast Packet Switching Architectures', and other switch architectures which have become commercially available as large scale integration (LSI) devices

    BOOM: Broadcast Optimizations for On-chip Meshes

    Get PDF
    Future many-core chips will require an on-chip network that can support broadcasts and multicasts at good power-performance. A vanilla on-chip network would send multiple unicast packets for each broadcast packet, resulting in latency, throughput and power overheads. Recent research in on-chip multicast support has proposed forking of broadcast/multicast packets within the network at the router buffers, but these techniques are far from ideal, since they increase buffer occupancy which lowers throughput, and packets incur delay and power penalties at each router. In this work, we analyze an ideal broadcast mesh; show the substantial gaps between state-of-the-art multicast NoCs and the ideal; then propose BOOM, which comprises a WHIRL routing protocol that ideally load balances broadcast traffic, a mXbar multicast crossbar circuit that enables multicast traversal at similar energy-delay as unicasts, and speculative bypassing of buffering for multicast flits. Together, they enable broadcast packets to approach the delay, energy, and throughput of the ideal fabric. Our simulations show BOOM realizing an average network latency that is 5% off ideal, attaining 96% of ideal throughput, with energy consumption that is 9% above ideal. Evaluations using synthetic traffic show BOOM achieving a latency reduction of 61%, throughput improvement of 63%, and buffer power reduction of 80% as compared to a baseline broadcast. Simulations with PARSEC benchmarks show BOOM reducing average request and network latency by 40% and 15% respectively

    Configurable data center switch architectures

    Get PDF
    In this thesis, we explore alternative architectures for implementing con_gurable Data Center Switches along with the advantages that can be provided by such switches. Our first contribution centers around determining switch architectures that can be implemented on Field Programmable Gate Array (FPGA) to provide configurable switching protocols. In the process, we identify a gap in the availability of frameworks to realistically evaluate the performance of switch architectures in data centers and contribute a simulation framework that relies on realistic data center traffic patterns. Our framework is then used to evaluate the performance of currently existing as well as newly proposed FPGA-amenable switch designs. Through collaborative work with Meng and Papaphilippou, we establish that only small-medium range switches can be implemented on today's FPGAs. Our second contribution is a novel switch architecture that integrates a custom in-network hardware accelerator with a generic switch to accelerate Deep Neural Network training applications in data centers. Our proposed accelerator architecture is prototyped on an FPGA, and a scalability study is conducted to demonstrate the trade-offs of an FPGA implementation when compared to an ASIC implementation. In addition to the hardware prototype, we contribute a light weight load-balancing and congestion control protocol that leverages the unique communication patterns of ML data-parallel jobs to enable fair sharing of network resources across different jobs. Our large-scale simulations demonstrate the ability of our novel switch architecture and light weight congestion control protocol to both accelerate the training time of machine learning jobs by up to 1.34x and benefit other latency-sensitive applications by reducing their 99%-tile completion time by up to 4.5x. As for our final contribution, we identify the main requirements of in-network applications and propose a Network-on-Chip (NoC)-based architecture for supporting a heterogeneous set of applications. Observing the lack of tools to support such research, we provide a tool that can be used to evaluate NoC-based switch architectures.Open Acces

    Applications of satellite technology to broadband ISDN networks

    Get PDF
    Two satellite architectures for delivering broadband integrated services digital network (B-ISDN) service are evaluated. The first is assumed integral to an existing terrestrial network, and provides complementary services such as interconnects to remote nodes as well as high-rate multicast and broadcast service. The interconnects are at a 155 Mbs rate and are shown as being met with a nonregenerative multibeam satellite having 10-1.5 degree spots. The second satellite architecture focuses on providing private B-ISDN networks as well as acting as a gateway to the public network. This is conceived as being provided by a regenerative multibeam satellite with on-board ATM (asynchronous transfer mode) processing payload. With up to 800 Mbs offered, higher satellite EIRP is required. This is accomplished with 12-0.4 degree hopping beams, covering a total of 110 dwell positions. It is estimated the space segment capital cost for architecture one would be about 190Mwhereasthesecondarchitecturewouldbeabout190M whereas the second architecture would be about 250M. The net user cost is given for a variety of scenarios, but the cost for 155 Mbs services is shown to be about $15-22/minute for 25 percent system utilization

    Analyzing Traffic and Multicast Switch Issues in an ATM Network.

    Get PDF
    This dissertation attempts to solve two problems related to an ATM network. First, we consider packetized voice and video sources as the incoming traffic to an ATM multiplexer and propose modeling methods for both individual and aggregated traffic sources. These methods are, then, used to analyze performance parameters such as buffer occupancy, cell loss probability, and cell delay. Results, thus obtained, for different buffer sizes and number of voice and video sources are analyzed and compared with those generated from existing techniques. Second, we study the priority handling feature for time critical services in an ATM multicast switch. For this, we propose a non-blocking copy network and priority handling algorithms. We, then, analyze the copy network using an analytical method and simulation. The analysis utilizes both priority and non-priority cells for two different output reservation schemes. The performance parameters, based on cell delay, delay jitter, and cell loss probability, are studied for different buffer sizes and fan-outs under various input traffic loads. Our results show that the proposed copy network provides a better performance for the priority cells while the performance for the non-priority cells is slightly inferior in comparison with the scenario when the network does not consider priority handling. We also study the fault-tolerant behavior of the copy network, specially for the broadcast banyan network subsection, and present a routing scheme considering the non-blocking property under a specific pattern of connection assignments. A fault tolerant characteristic can be quantified using the full access probability. The computation of the full access probability for a general network is known to be NP-hard. We, therefore, provide a new bounding technique utilizing the concept of minimal cuts to compute full access probability of the copy network. Our study for the fault-tolerant multi-stage interconnection network having either an extra stage or chaining shows that the proposed technique provides tighter bounds as compared to those given by existing approaches. We also apply our bounding method to compute full access probability of the fault-tolerant copy network

    On-board B-ISDN fast packet switching architectures. Phase 1: Study

    Get PDF
    The broadband integrate services digital network (B-ISDN) is an emerging telecommunications technology that will meet most of the telecommunications networking needs in the mid-1990's to early next century. The satellite-based system is well positioned for providing B-ISDN service with its inherent capabilities of point-to-multipoint and broadcast transmission, virtually unlimited connectivity between any two points within a beam coverage, short deployment time of communications facility, flexible and dynamic reallocation of space segment capacity, and distance insensitive cost. On-board processing satellites, particularly in a multiple spot beam environment, will provide enhanced connectivity, better performance, optimized access and transmission link design, and lower user service cost. The following are described: the user and network aspects of broadband services; the current development status in broadband services; various satellite network architectures including system design issues; and various fast packet switch architectures and their detail designs

    Distributed call set-up algorithms in BISDN environment.

    Get PDF
    by Shum Kam Hong.Thesis (M.Phil.)--Chinese University of Hong Kong, 1992.Includes bibliographical references (leaves 125-131).Chapter 1 --- Introduction --- p.1Chapter 1.1 --- Background --- p.1Chapter 1.2 --- Outline of the thesis --- p.6Chapter 1.3 --- Current Art in Packet Switching --- p.9Chapter 2 --- Management of Control Information --- p.17Chapter 2.1 --- Inter-node Exchange of Link Congestion Status --- p.21Chapter 2.2 --- Consistency of Control Information --- p.24Chapter 2.3 --- Alternate Format of Control Information --- p.26Chapter 3 --- Traffic Flow Control --- p.29Chapter 3.1 --- Control of Traffic Influx into the Network --- p.29Chapter 3.2 --- Control of Traffic Loading from the Node --- p.30Chapter 3.3 --- Flow Control for Connection Oriented Traffic --- p.32Chapter 3.4 --- Judgement of Link Status --- p.38Chapter 3.5 --- Starvation-free and Deadlock-free --- p.42Chapter 4 --- Call Set-up Algorithm Traffic Modelling --- p.47Chapter 4.1 --- Basic Algorithm --- p.47Chapter 4.2 --- Minimization of Bandwidth Overhead --- p.48Chapter 4.3 --- Two-way Transmission --- p.51Chapter 4.4 --- Traffic Modelling --- p.52Chapter 4.4.1 --- Aggregate Traffic Models --- p.53Chapter 4.4.2 --- Traffic Burstiness --- p.57Chapter 5 --- Parameters Tuning and Analysis --- p.76Chapter 5.1 --- Scheme I : Scout Pumping --- p.76Chapter 5.2 --- Scheme II : Speed-up Scout Pumping --- p.85Chapter 5.3 --- Blocking Probability --- p.90Chapter 5.4 --- Scout Stream Collision --- p.92Chapter 6 --- Simulation Modelling & Performance Evaluation --- p.96Chapter 6.1 --- The Network Simulator --- p.96Chapter 6.1.1 --- Simulation Event Scheduling --- p.97Chapter 6.1.2 --- Input Traffic Regulation --- p.100Chapter 6.1.3 --- Actual Offered Load --- p.101Chapter 6.1.4 --- Static and Dynamic Parameters --- p.103Chapter 6.2 --- Simulation Results --- p.107Chapter 7 --- Conclusions --- p.123Chapter A --- List of Symbols --- p.13

    The TurboLAN project. Phase 1: Protocol choices for high speed local area networks. Phase 2: TurboLAN Intelligent Network Adapter Card, (TINAC) architecture

    Get PDF
    The hardware and the software architecture of the TurboLAN Intelligent Network Adapter Card (TINAC) are described. A high level as well as detailed treatment of the workings of various components of the TINAC are presented. The TINAC is divided into the following four major functional units: (1) the network access unit (NAU); (2) the buffer management unit; (3) the host interface unit; and (4) the node processor unit

    A formalism for describing and simulating systems with interacting components.

    Get PDF
    This thesis addresses the problem of descriptive complexity presented by systems involving a high number of interacting components. It investigates the evaluation measure of performability and its application to such systems. A new description and simulation language, ICE and it's application to performability modelling is presented. ICE (Interacting ComponEnts) is based upon an earlier description language which was first proposed for defining reliability problems. ICE is declarative in style and has a limited number of keywords. The ethos in the development of the language has been to provide an intuitive formalism with a powerful descriptive space. The full syntax of the language is presented with discussion as to its philosophy. The implementation of a discrete event simulator using an ICE interface is described, with use being made of examples to illustrate the functionality of the code and the semantics of the language. Random numbers are used to provide the required stochastic behaviour within the simulator. The behaviour of an industry standard generator within the simulator and different methods of number allocation are shown. A new generator is proposed that is a development of a fast hardware shift register generator and is demonstrated to possess good statistical properties and operational speed. For the purpose of providing a rigorous description of the language and clarification of its semantics, a computational model is developed using the formalism of extended coloured Petri nets. This model also gives an indication of the language's descriptive power relative to that of a recognised and well developed technique. Some recognised temporal and structural problems of system event modelling are identified. and ICE solutions given. The growing research area of ATM communication networks is introduced and a sophisticated top down model of an ATM switch presented. This model is simulated and interesting results are given. A generic ICE framework for performability modelling is developed and demonstrated. This is considered as a positive contribution to the general field of performability research
    corecore