212 research outputs found

    Does it hurt when others prosper?: Exploring the impact of heterogeneous reordering robustness of TCP

    The congestion control mechanisms in the standardized Transmission Control Protocol (TCP) may misinterpret packet reordering as congestive loss, leading to spurious congestion response and under-utilization of network capacity. Therefore, many TCP enhancements have been proposed to better differentiate between packet reordering and congestive loss, in order to enhance the reordering robustness (RR) of TCP. Since such enhancements are incrementally deployed, it is important to study the interactions of TCP flows with heterogeneous RR. This paper presents the first systematic study of such interactions by exploring how changing the RR of TCP flows influences the bandwidth sharing among these flows. We define the quantified RR (QRR) of a TCP flow as the probability that packet reordering causes congestion response. We analyze the variation of bandwidth sharing as QRR changes. This leads to the discovery of several interesting properties. Most notably, we discover the counter-intuitive result that changing one flow's QRR does not affect its competing flows in certain network topologies. We further characterize the deviation from the ideal case of bandwidth sharing as RR changes. We find that enhancing the RR of a flow may increase, rather than decrease, the deviation in some typical network scenarios. © 2013 IEEE.

    Parallel network protocol stacks using replication

    Computing applications demand good performance from networking systems. This includes high-bandwidth communication using protocols with sophisticated features such as ordering, reliability, and congestion control. Much of this protocol processing occurs in software, both on desktop systems and servers. Multi-processing is a requirement on today's computer architectures because their design does not allow for increased processor frequencies. At the same time, network bandwidths continue to increase. In order to meet application demand for throughput, protocol processing must be parallel to leverage the full capabilities of multi-processor or multi-core systems. Existing parallelization strategies have performance difficulties that limit their scalability and their application to single, high-speed data streams. This dissertation introduces a new approach to parallelizing network protocol processing without the need for locks or for global state. Rather than maintaining global state, each processor maintains its own copy of protocol state. Therefore, updates are local and don't require fine-grained locks or explicit synchronization. State management work is replicated, but logically independent work is parallelized. Along with the approach, this dissertation describes Dominoes, a new framework for implementing replicated processing systems. Dominoes organizes the state information into Domains and the communication into Channels. These two abstractions provide a powerful but flexible model for testing the replication approach. This dissertation uses Dominoes to build a replicated network protocol system. The performance of common protocols, such as TCP/IP, is increased by multiprocessing single connections. On commodity hardware, throughput increases by 15-300% depending on the type of communication. Most gains are possible when communicating with unmodified peer implementations, such as Linux. In addition to quantitative results, protocol behavior is studied as it relates to the replication approach.
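    The replication idea in the abstract above (each processor keeps a private copy of protocol state, state updates are repeated everywhere, and independent work is split up) can be illustrated with a toy sketch. This is not the Dominoes framework or its API: the `Replica` class, the `run` helper, the round-robin ownership rule, and the simple checksum are all assumptions made for illustration, and the replicas are stepped sequentially here rather than on separate cores.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One worker's private copy of connection state (hypothetical model)."""
    index: int
    workers: int
    next_seq: int = 0                              # replicated state
    checksums: dict = field(default_factory=dict)  # work owned by this replica

    def handle(self, seq: int, payload: bytes) -> None:
        # Replicated work: every replica applies the same update to its
        # own copy, so no lock or shared variable is needed.
        if seq == self.next_seq:
            self.next_seq += 1
        # Parallelized work: only the owning replica touches the payload
        # (ownership by seq modulo worker count is an assumed policy).
        if seq % self.workers == self.index:
            self.checksums[seq] = sum(payload) % 65536


def run(packets, workers=2):
    """Deliver every packet to every replica and return the replicas."""
    replicas = [Replica(i, workers) for i in range(workers)]
    for seq, payload in enumerate(packets):
        for replica in replicas:
            replica.handle(seq, payload)
    return replicas


if __name__ == "__main__":
    for r in run([b"a", b"bc", b"def"]):
        print(r.index, r.next_seq, r.checksums)
```

    In this sketch all replicas converge to the same `next_seq` without synchronizing, while each payload is processed by exactly one replica.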

    PABO: Mitigating Congestion via Packet Bounce in Data Center Networks

    In today's data centers, a diverse mix of throughput-sensitive long flows and delay-sensitive short flows is commonly present in shallow-buffered switches. Long flows could potentially block the transmission of delay-sensitive short flows, leading to degraded performance. Congestion can also be caused by the synchronization of multiple TCP connections for short flows, as typically seen in the partition/aggregate traffic pattern. While multiple end-to-end transport-layer solutions have been proposed, none of them has tackled the real challenge: reliable transmission in the network. In this paper, we fill this gap by presenting PABO -- a novel link-layer design that can mitigate congestion by temporarily bouncing packets to upstream switches. PABO's design fulfills the following goals: i) providing per-flow flow control on the link layer, ii) handling transient congestion without the intervention of end devices, and iii) gradually back-propagating the congestion signal to the source when the network is not capable of handling the congestion. Experimental results show that PABO offers a clear advantage in mitigating transient congestion and achieves a significant gain in end-to-end delay.

    Terabit Burst Switching Final Report

    This is the final report for Washington University's Terabit Burst Switching Project, supported by DARPA and Rome Air Force Laboratory. The primary objective of the project has been to demonstrate the feasibility of Burst Switching, a new data communication service, which seeks to more effectively exploit the large bandwidths becoming available in WDM transmission systems. Burst switching systems dynamically assign data bursts to channels in optical datalinks, using routing information carried in parallel control channels.

    Study on the Performance of TCP over 10Gbps High Speed Networks

    Internet traffic is expected to grow phenomenally over the next five to ten years. To cope with such large traffic volumes, high-speed networks are expected to scale to capacities of terabits per second and beyond. Increasing the role of optics for packet forwarding and transmission inside high-speed networks seems to be the most promising way to accomplish this capacity scaling. Unfortunately, unlike electronic memory, it remains a formidable challenge to build even a few dozen packets of integrated all-optical buffers. On the other hand, many high-speed networks depend on the TCP/IP protocol for reliability, which is typically implemented in software and is sensitive to buffer size. For example, TCP requires a buffer equal to the bandwidth-delay product in switches/routers to maintain nearly 100% link utilization. Otherwise, performance degrades considerably. But such large buffers challenge hardware design and power consumption, and generate queuing delay and jitter, which in turn cause problems. Therefore, improving TCP performance over tiny-buffered high-speed networks is a top priority. This dissertation studies TCP performance in 10Gbps high-speed networks. First, a 10Gbps reconfigurable optical networking testbed is developed as a research environment. Second, a 10Gbps traffic sniffing tool is developed for measuring and analyzing TCP performance. New expressions for evaluating TCP loss synchronization are presented by carefully examining the congestion events of TCP. Based on observation, two basic causes of performance problems are studied. We find that minimizing TCP loss synchronization and reducing the impact of flow burstiness are key to improving TCP performance in tiny-buffered networks. Finally, we present a new TCP protocol called Multi-Channel TCP and a new congestion control algorithm called Desynchronized Multi-Channel TCP (DMCTCP). Our algorithm implementation takes advantage of the potential parallelism of Multi-Path TCP in Linux. Over an emulated 10Gbps network governed by routers with only a few dozen packets of buffers, our experimental results confirm that DMCTCP improves bottleneck link utilization much more than many other TCP variants. Our study is a new step towards the deployment of optical packet switching/routing networks.
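    To put the buffer requirement mentioned in the abstract above in perspective, the bandwidth-delay product can be worked out directly. The sketch below is a back-of-the-envelope calculation, not a figure from the dissertation: the 100 ms round-trip time and the 1500-byte packet size are assumed values, and the function names are hypothetical.

```python
def bdp_bytes(link_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes: rate (bit/s) x RTT (s) / 8."""
    return link_bps * rtt_ms // (1000 * 8)


def bdp_packets(link_bps: int, rtt_ms: int, packet_bytes: int = 1500) -> int:
    """Whole packets needed to hold one bandwidth-delay product."""
    # Ceiling division, so a partially filled last packet still counts.
    return -(-bdp_bytes(link_bps, rtt_ms) // packet_bytes)


if __name__ == "__main__":
    LINK_BPS = 10_000_000_000  # 10 Gbps link, as in the abstract
    RTT_MS = 100               # assumed round-trip time
    print(bdp_bytes(LINK_BPS, RTT_MS))    # 125000000 bytes (125 MB)
    print(bdp_packets(LINK_BPS, RTT_MS))  # 83334 packets of 1500 bytes
```

    Under these assumptions a full bandwidth-delay product is tens of thousands of packets, several orders of magnitude more than the "few dozen packets" of buffering the abstract describes.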

    Branch Prediction For Network Processors

    Originally designed to favour flexibility over packet processing performance, the programmable network processor now faces the challenge of meeting increasing line rates while also providing additional processing capabilities. To meet these requirements, networking research has tended to focus on techniques such as offloading computation-intensive tasks to dedicated hardware logic or increasing parallelism. While parallelism retains flexibility, challenges such as load balancing limit its scope. On the other hand, hardware offloading allows complex algorithms to be implemented at high speed but sacrifices flexibility. To this end, the work in this thesis is focused on a more fundamental aspect of a network processor, the data-plane processing engine. By performing both system modelling and analysis of packet processing functions, this thesis aims to identify and extract salient information regarding the performance of multi-processor workloads. Following on from a traditional software-based analysis of programme workloads, we develop a method of modelling and analysing hardware accelerators when applied to network processors. Using this quantitative information, this thesis proposes an architecture which allows deeply pipelined micro-architectures to be implemented on the data plane while reducing the branch penalty associated with these architectures.

    Rationale, Scenarios, and Profiles for the Application of the Internet Protocol Suite (IPS) in Space Operations

    This greenbook captures some of the current, planned and possible future uses of the Internet Protocol (IP) as part of Space Operations. It attempts to describe how the Internet Protocol is used in specific scenarios. The primary focus is low-earth-orbit space operations, referred to here as the design reference mission (DRM), because most of the program experience drawn upon derives from this type of mission. Application profiles are provided. These include the parameter settings that programs have proposed for sending IP datagrams over CCSDS links, the minimal subsets and features of the IP protocol suite and applications expected for interoperability between projects, and the configuration, operations and maintenance of these IP functions. Of special interest is capturing the lessons learned from the Constellation Program in this area, since that program included a fairly ambitious use of the Internet Protocol.

    Automatic synthesis and optimization of chip multiprocessors

    Microprocessor technology has experienced enormous growth during the last decades. Rapid downscaling of CMOS technology has led to higher operating frequencies and performance densities, while running into the fundamental issue of power dissipation. Chip Multiprocessors (CMPs) have become the latest paradigm to improve the power-performance efficiency of computing systems by exploiting the parallelism inherent in applications. Industrial and prototype implementations have already demonstrated the benefits achieved by CMPs with hundreds of cores. CMP architects are challenged to make many complex design decisions. A few of them are: What should be the ratio between the core and cache areas on a chip? Which core architectures should be selected? How many cache levels should the memory subsystem have? Which interconnect topologies provide efficient on-chip communication? These and many other aspects create a complex multidimensional space for architectural exploration. Design automation tools become essential to make the architectural exploration feasible under hard time-to-market constraints. The exploration methods have to be efficient and scalable to handle future-generation on-chip architectures with hundreds or thousands of cores. Furthermore, once a CMP has been fabricated, the need for efficient deployment of the many-core processor arises. Intelligent techniques for task mapping and scheduling onto CMPs are necessary to guarantee full use of the benefits brought by many-core technology. These techniques have to consider the peculiarities of modern architectures, such as the availability of enhanced power-saving techniques and the presence of complex memory hierarchies. This thesis has several objectives. The first objective is to elaborate methods for efficient analytical modeling and architectural design space exploration of CMPs. The efficiency is achieved by using analytical models instead of simulation, and by replacing exhaustive exploration with an intelligent search strategy. Additionally, these methods incorporate high-level models for physical planning. The related contributions are described in Chapters 3, 4 and 5 of the document. The second objective of this work is to propose a scalable algorithm for mapping tasks onto general-purpose CMPs with power management techniques, for efficient deployment of many-core systems. This contribution is explained in Chapter 6 of this document. Finally, the third objective of this thesis is to address the issues of on-chip interconnect design and exploration by developing a model for simultaneous topology customization and deadlock-free routing in Networks-on-Chip. The developed methodology can be applied to various classes of on-chip systems, ranging from general-purpose chip multiprocessors to application-specific solutions. Chapter 7 describes the proposed model. The presented methods have been thoroughly tested experimentally and the results are described in this dissertation. At the end of the document, several possible directions for future research are proposed.