
    Network Coding in a Multicast Switch

    We consider the problem of serving multicast flows in a crossbar switch. We show that linear network coding across packets of a flow can sustain traffic patterns that cannot be served if network coding were not allowed. Thus, network coding leads to a larger rate region in a multicast crossbar switch. We demonstrate a traffic pattern that requires a switch speedup if coding is not allowed, whereas with coding the speedup requirement is eliminated completely. In addition to throughput benefits, coding simplifies the characterization of the rate region. We give a graph-theoretic characterization of the rate region with fanout splitting and intra-flow coding, in terms of the stable set polytope of the 'enhanced conflict graph' of the traffic pattern. Such a formulation is not known for the case of fanout splitting without coding. We show that computing the offline schedule (i.e., using prior knowledge of the flow arrival rates) can be reduced to certain graph coloring problems. Finally, we propose online algorithms (i.e., using only the current queue occupancy information) for multicast scheduling based on our graph-theoretic formulation. In particular, we show that a maximum weighted stable set algorithm stabilizes the queues for all rates within the rate region.
    Comment: 9 pages, submitted to IEEE INFOCOM 200
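As an illustration of the online scheduling idea, the sketch below weights the vertices of a conflict graph by queue occupancy and serves a maximum-weight stable set. This is a minimal brute-force sketch for small instances, not the paper's algorithm; the graph, weights, and flow identifiers are illustrative assumptions.

```python
from itertools import combinations

def max_weight_stable_set(nodes, edges, weight):
    """Brute-force maximum-weight stable (independent) set.

    nodes  : list of flow identifiers (vertices of the conflict graph)
    edges  : set of frozensets {u, v} marking flows that conflict at the switch
    weight : dict mapping each flow to its current queue occupancy
    Suitable only for small instances; the general problem is NP-hard.
    """
    best, best_w = [], 0
    for k in range(len(nodes) + 1):
        for subset in combinations(nodes, k):
            # A stable set contains no conflicting pair of flows.
            if any(frozenset((u, v)) in edges for u, v in combinations(subset, 2)):
                continue
            w = sum(weight[v] for v in subset)
            if w > best_w:
                best, best_w = list(subset), w
    return best, best_w

# Toy example: flows 0 and 1 conflict (shared output); flow 2 is compatible with both.
nodes = [0, 1, 2]
edges = {frozenset((0, 1))}
queues = {0: 5, 1: 3, 2: 4}          # current queue occupancies
print(max_weight_stable_set(nodes, edges, queues))   # -> ([0, 2], 9)
```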

    Fastpass: A Centralized “Zero-Queue” Datacenter Network

    An ideal datacenter network should provide several properties, including low median and tail latency, high utilization (throughput), fair allocation of network resources between users or applications, deadline-aware scheduling, and congestion (loss) avoidance. Current datacenter networks inherit the principles that went into the design of the Internet, where packet transmission and path selection decisions are distributed among the endpoints and routers. Instead, we propose that each sender should delegate control—to a centralized arbiter—of when each packet should be transmitted and what path it should follow. This paper describes Fastpass, a datacenter network architecture built using this principle. Fastpass incorporates two fast algorithms: the first determines the time at which each packet should be transmitted, while the second determines the path to use for that packet. In addition, Fastpass uses an efficient protocol between the endpoints and the arbiter, and an arbiter replication strategy for fault-tolerant failover. We deployed and evaluated Fastpass in a portion of Facebook's datacenter network. Our results show that Fastpass achieves throughput comparable to current networks with a 240× reduction in queue lengths (4.35 Mbytes reducing to 18 Kbytes), achieves much fairer and more consistent flow throughputs than the baseline TCP (a 5200× reduction in the standard deviation of per-flow throughput with five concurrent connections), scales from 1 to 8 cores in the arbiter implementation with the ability to schedule 2.21 Terabits/s of traffic in software on eight cores, and achieves a 2.5× reduction in the number of TCP retransmissions in a latency-sensitive service at Facebook.
    Funding: National Science Foundation (U.S.) (grant IIS-1065219); Irwin Mark Jacobs and Joan Klein Jacobs Presidential Fellowship; Hertz Foundation (Fellowship)
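The sketch below conveys the general idea of a centralized arbiter handing out timeslots so that each source sends and each destination receives at most one packet per slot. It is a simplified greedy illustration with assumed names (`allocate_timeslots`, the demand tuples), not Fastpass's actual pipelined timeslot-allocation or path-selection algorithms.

```python
from collections import deque

def allocate_timeslots(demands, num_timeslots):
    """Greedy sketch of centralized timeslot allocation.

    demands: list of (src, dst, packets) requests sent to the arbiter.
    Returns a schedule: timeslot -> list of (src, dst) transmissions, with
    each source sending at most one packet and each destination receiving
    at most one packet per timeslot (the per-slot matching constraint).
    """
    pending = deque([src, dst, pkts] for src, dst, pkts in demands)
    schedule = {}
    for t in range(num_timeslots):
        busy_src, busy_dst, slot = set(), set(), []
        for req in list(pending):
            src, dst, pkts = req
            if src in busy_src or dst in busy_dst:
                continue                    # endpoint already used this slot
            slot.append((src, dst))
            busy_src.add(src)
            busy_dst.add(dst)
            req[2] -= 1
            if req[2] == 0:
                pending.remove(req)
        schedule[t] = slot
        if not pending:
            break
    return schedule

# Two senders share destination D, so their packets are spread over slots.
print(allocate_timeslots([("A", "D", 2), ("B", "D", 1), ("C", "E", 1)], 4))
```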

    Deadline-ordered burst-based parallel scheduling strategy for IP-over-ATM with QoS support.

    Siu Chun. Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. Includes bibliographical references (leaves 66-68). Abstracts in English and Chinese.
    Contents:
    Chapter 1 --- Introduction
    Chapter 1.1 --- Thesis Overview
    Chapter 2 --- Background and Related work
    Chapter 2.1 --- Emergence of IP-over-ATM
    Chapter 2.2 --- ATM architecture
    Chapter 2.3 --- Scheduling issues in output-queued switch
    Chapter 2.4 --- Scheduling issues in input-queued switch
    Chapter 3 --- The Deadline-ordered Burst-based Parallel Scheduling Strategy
    Chapter 3.1 --- Introduction
    Chapter 3.2 --- Switch and queueing model
    Chapter 3.2.1 --- Switch model
    Chapter 3.2.2 --- Queueing model
    Chapter 3.3 --- The DBPS Strategy
    Chapter 3.3.1 --- Motivation
    Chapter 3.3.2 --- Strategy
    Chapter 3.4 --- The Deadline-ordered Burst-based Parallel Iterative Matching
    Chapter 3.4.1 --- Algorithm
    Chapter 3.4.2 --- An example of DBPIM
    Chapter 3.5 --- Simulation results
    Chapter 3.6 --- Discussions
    Chapter 3.7 --- Future work
    Chapter 4 --- The Quasi-static DBPIM Algorithm
    Chapter 4.1 --- Introduction
    Chapter 4.2 --- Quasi-static path scheduling principle
    Chapter 4.3 --- Quasi-static DBPIM algorithm
    Chapter 4.4 --- An example of Quasi-static DBPIM
    Chapter 5 --- Conclusion
    Bibliography

    Scheduling algorithms for throughput maximization in data networks

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. Includes bibliographical references (p. 215-226).
    This thesis considers the performance implications of throughput-optimal scheduling in physically and computationally constrained data networks. We study optical networks, packet switches, and wireless networks, each of which has an assortment of features and constraints that challenge the design decisions of network architects. In this work, each of these network settings is subsumed under a canonical model and scheduling framework. Tools of queueing analysis are used to evaluate network throughput properties and demonstrate the throughput optimality of scheduling and routing algorithms under stochastic traffic. Techniques of graph theory are used to study network topologies having desirable throughput properties. Combinatorial algorithms are proposed for efficient resource allocation.
    In the optical network setting, the key enabling technology is wavelength division multiplexing (WDM), which allows each optical fiber link to simultaneously carry a large number of independent data streams at high rate. To take advantage of this high data processing potential, engineers and physicists have developed numerous technologies, including wavelength converters, optical switches, and tunable transceivers. While the functionality provided by these devices is of great importance in capitalizing upon the WDM resources, a major challenge exists in determining how to configure these devices to operate efficiently under time-varying data traffic. In the WDM setting, we make two main contributions. First, we develop throughput-optimal joint WDM reconfiguration and electronic-layer routing algorithms based on maxweight scheduling. To mitigate the service disruption associated with WDM reconfiguration, our algorithms make decisions at frame intervals. Second, we develop analytic tools to quantify the maximum throughput achievable in general network settings. Our approach is to characterize several geometric features of the maximum region of arrival rates that can be supported in the network.
    In the packet switch setting, we observe through numerical simulation the attractive throughput properties of a simple maximal weight scheduler. Subsequently, we consider small switches and analytically demonstrate the attractive throughput properties achievable using maximal weight scheduling. We demonstrate that such throughput properties may not be sustained in larger switches.
    In the wireless network setting, mesh networking is a promising technology for achieving connectivity in local and metropolitan area networks. Wireless access points and base stations adhering to the IEEE 802.11 wireless networking standard can be bought off the shelf at little cost and can be configured to access the Internet in minutes. With ubiquitous low-cost Internet access perceived to be of tremendous societal value, such technology is naturally garnering strong interest, and enabling it is thus of great importance. An important challenge in enabling mesh networks, and many other wireless network applications, results from the fact that wireless transmission is achieved by broadcasting signals through the air, which has the potential for interfering with other parts of the network. Furthermore, the scarcity of wireless transmission resources implies that link activation and packet routing should be effected using simple distributed algorithms. We make three main contributions in the wireless setting. First, we determine graph classes under which simple, distributed, maximal weight schedulers achieve throughput optimality. Second, we use this acquired knowledge of graph classes to develop combinatorial algorithms, based on matroids, for allocating channels to wireless links, such that each channel can achieve maximum throughput using simple distributed schedulers. Third, we determine new conditions under which distributed algorithms for joint link activation and routing achieve throughput optimality.
    by Andrew Brzezinski. Ph.D.
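To illustrate the kind of weight-based switch scheduling discussed above, the sketch below enumerates all input-output matchings of a small input-queued switch and picks the one with the largest total VOQ backlog. It is a brute-force toy with assumed function and variable names, meant only to convey the principle, not one of the thesis's algorithms.

```python
from itertools import permutations

def max_weight_matching(queue_lengths):
    """Pick the input/output matching with the largest total queue weight.

    queue_lengths[i][j] is the VOQ backlog at input i destined to output j.
    Brute force over all N! matchings; practical only for small switches.
    """
    n = len(queue_lengths)
    best_perm, best_w = None, -1
    for perm in permutations(range(n)):          # perm[i] = output for input i
        w = sum(queue_lengths[i][perm[i]] for i in range(n))
        if w > best_w:
            best_perm, best_w = perm, w
    return best_perm, best_w

# 3x3 example: the scheduler drains the heaviest compatible set of VOQs.
voq = [[4, 0, 1],
       [2, 3, 0],
       [0, 1, 5]]
print(max_weight_matching(voq))   # -> ((0, 1, 2), 12)
```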

    Design and Implementation of a Multi-Class Network Architecture for Hardware Neural Networks

    This thesis describes the design and implementation of a network architecture that combines techniques from circuit-switched and packet-switched networks to offer two different classes of service: isochronous connections and best-effort packet-based connections. Isochronous connections use reserved network resources to guarantee lossless transmission and low end-to-end latency with bounded variance. The synchronization of all network nodes and the computation of a compact reservation schedule are handled by efficient algorithms. Packet-based transmissions use the remaining bandwidth. The multiplexing of both traffic classes is performed by a novel bypass switch that scales in the number of interfaces as well as in external bandwidth and requires no internal speedup. The network architecture is used in research on large-scale hardware artificial neural networks within the FACETS project, interconnecting a distributed system of VLSI neural networks. Axonal connections between neurons are modeled using isochronous connections, whereas packet-based transmission forms the basis of a system-wide shared memory architecture. The runtime part of the network is implemented in programmable logic and operates at an external transmission rate of 3.125 Gbit/s. The thesis discusses the application-specific requirements on the network, as well as its design and reference implementation in programmable logic and software. Theoretical considerations of its performance are verified by measurements and simulations. Although the network architecture was designed for this specific neural network application, it represents a general solution for any network environment that requires isochronous connections and packet switching with low complexity. The architecture is particularly well suited for use in the next stage of hardware development within the FACETS project, interconnecting artificial neural networks at the wafer scale.
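A minimal sketch of the two-class multiplexing idea follows: isochronous connections own pre-reserved slots of a periodic frame, while best-effort packets fill whatever slots remain. The frame model and all names are illustrative assumptions, not the bypass switch's actual implementation.

```python
from collections import deque

def multiplex_frame(frame_len, reservations, best_effort):
    """Fill one periodic frame of `frame_len` slots.

    reservations : dict slot_index -> isochronous connection id (pre-computed
                   reservation schedule giving loss-free, low-jitter service)
    best_effort  : deque of best-effort packets that use the remaining slots
    """
    frame = []
    for slot in range(frame_len):
        if slot in reservations:
            frame.append(("iso", reservations[slot]))
        elif best_effort:
            frame.append(("pkt", best_effort.popleft()))
        else:
            frame.append(("idle", None))
    return frame

# Slots 0 and 4 are reserved for isochronous connections; packets fill the rest.
print(multiplex_frame(8, {0: "conn-A", 4: "conn-B"},
                      deque(["p1", "p2", "p3"])))
```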

    Multistage Packet-Switching Fabrics for Data Center Networks

    Recent applications have imposed stringent requirements on Data Center Network (DCN) switches in terms of scalability, throughput and latency. In this thesis, the architectural design of packet switches is tackled in different ways to enable expansion in both the number of connected endpoints and the traffic volume. A cost-effective Clos-network switch with partially buffered units is proposed, and two packet scheduling algorithms are described. The first algorithm adopts many simple and distributed arbiters, while the second relies on a central arbiter to guarantee in-order packet delivery. For improved scalability, the Clos switch is built using a Network-on-Chip (NoC) fabric instead of the common crossbar units. The Clos-UDN architecture, made with Input-Queued (IQ) Uni-Directional NoC modules (UDNs), simplifies the input line cards and obviates the need for costly Virtual Output Queues (VOQs). It also avoids complex, synchronized scheduling processes, and offers speedup, load balancing, and good path diversity. Under skewed traffic, reliable micro load-balancing contributes to boosting the overall network performance. Taking advantage of the NoC paradigm, a wrapped-around multistage switch with fully interconnected Central Modules (CMs) is proposed. The architecture operates with a congestion-aware routing algorithm that proactively distributes the traffic load across the switching modules and enhances the switch performance under critical packet arrivals. The implementation of small on-chip buffers has become feasible with current technology. This motivated the implementation of a large switching architecture with an Output-Queued (OQ) NoC fabric. The design merges the assets of output queuing and NoCs to provide high throughput and smooth latency variations. An approximate analytical model of the switch performance is also proposed. To further exploit the potential of NoC fabrics and their modularity, a high-capacity Clos switch with Multi-Directional NoC (MDN) modules is presented. The Clos-MDN switching architecture exhibits a more compact layout than the Clos-UDN switch and scales better and faster in port count and traffic load. The results achieved in this thesis demonstrate the high performance, expandability and programmability of the proposed packet switches, which make them promising candidates for next-generation data center networking infrastructure.
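The congestion-aware routing idea can be sketched as follows: each arriving cell is forwarded through the currently least-loaded central module, proactively spreading load across the middle stage. This is a simplified illustration with assumed names, not the thesis's actual routing algorithm.

```python
def pick_central_module(cm_loads):
    """Choose the least-loaded central module for the next cell (congestion-aware)."""
    return min(range(len(cm_loads)), key=lambda m: cm_loads[m])

def route_cells(num_cms, cells):
    """Distribute a burst of cells across central modules, tracking their load.

    cells: list of (input_module, output_module) pairs; the chosen CM index
    completes a 3-stage Clos path input_module -> CM -> output_module.
    """
    cm_loads = [0] * num_cms
    paths = []
    for im, om in cells:
        cm = pick_central_module(cm_loads)
        cm_loads[cm] += 1                  # proactively spread the load
        paths.append((im, cm, om))
    return paths

# Four cells spread over two central modules in a balanced fashion.
print(route_cells(2, [(0, 1), (0, 1), (1, 0), (2, 3)]))
```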