78 research outputs found

    Supercharged PlanetLab Platform Architecture

    This report describes the Supercharged PlanetLab Platform (SPP), a system designed as a prototype of an Internet-scale overlay hosting platform. Overlay networks have become an important vehicle for delivering Internet applications. Overlay network nodes are typically implemented using general-purpose servers or clusters. The SPP offers a more integrated architecture, combining general-purpose servers with high-performance Network Processor (NP) subsystems. SPP nodes have recently been deployed as part of the Global Environment for Network Innovations (GENI) and are available for use by research users.

    Abstractions and Algorithms for Control of Extensible and Heterogeneous Virtualized Network Infrastructures

    Virtualized network infrastructures are currently deployed in both research and commercial contexts. The complexity of the virtualization layer varies greatly in different deployments, ranging from cloud computing environments, to carrier Ethernet applications using stacked VLANs, to networking testbeds. In all of these cases, many users are sharing the resources of one provider and each user expects their resources to be isolated from all other users. There are many challenges associated with the control and management of these systems, including resource allocation and sharing, resource isolation, system security, and usability. Among the different types of virtualized infrastructures, network testbeds are of particular interest due to their widespread use in education and in the networking research community. Networking researchers rely extensively on testbeds when evaluating new protocols and ideas. Indeed, a substantial percentage of top research papers include results gathered from testbeds. Network emulation testbeds in particular are often used to conduct innovative research because they allow users to emulate diverse network topologies in a controlled environment. That is, researchers run experiments with a collection of resources that can be reconfigured to represent many different network scenarios. The user typically has control over most of the resources in their experiment which results in a high level of reproducibility. As such, these types of testbeds provide an excellent bridge between simulation and deployment of new ideas. Unfortunately, most testbeds suffer from a general lack of resource extensibility and diversity. This dissertation extends the current state of the art by designing a new, more general testbed infrastructure that expands and enhances the capabilities of modern testbeds. This includes pertinent abstractions, software design, and related algorithms. 
The design has also been prototyped in the form of the Open Network Laboratory network testbed, which has been used successfully in educational and research pursuits. While the focus is on network testbeds, the results of this research will also be applicable to the broader class of virtualized system infrastructures.

    Router-based algorithms for improving internet quality of service.

    We begin this thesis by generalizing some results related to a recently proposed positive system model of TCP congestion control algorithms. Then, motivated by a mean field analysis of the positive system model, a novel, stateless queue management scheme is designed: Multi-Level Comparisons with index l (MLC(l)). In the limit, MLC(l) enforces max-min fairness in a network of TCP flows. We go further, showing that counting past drops at a congested link provides sufficient information to enforce max-min fairness among long-lived flows and to reduce the flow completion times of short-lived flows. Analytical models are presented, and the accuracy of their predictions is validated by packet-level ns2 simulations. We then move our attention to efficient measurement and monitoring techniques. A small active counter architecture is presented that addresses the problem of accurate approximation of statistics counter values at very high speeds that can be both updated and estimated on a per-packet basis. These algorithms are necessary in the design of router-based flow control algorithms since on-chip Static RAM (SRAM) is currently a scarce resource, and being economical with its usage is an important task. A highly scalable method for heavy-hitter identification that uses our small active counter architecture is developed based on a heuristic argument. Its performance is compared to several state-of-the-art algorithms and shown to outperform them. In the last part of the thesis we discuss the delay-utilization tradeoff on congested Internet links. While several groups of authors have recently analyzed this tradeoff, the lack of realistic assumptions in their models and the extreme complexity of estimating model parameters reduce their applicability to real Internet links. We propose an adaptive scheme that regulates the available queue space to keep utilization at a desired, high level.
As a consequence, in large-number-of-users regimes, sacrificing 1-2% of bandwidth can result in queueing delays that are an order of magnitude smaller than in the standard BDP-buffering case. We go further and introduce an optimization framework for describing the problem of interest and propose an online algorithm for solving it.
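The max-min fairness criterion that this abstract refers to can be illustrated with a small sketch. The code below is not MLC(l) or any algorithm from the thesis; it is the textbook progressive-filling computation for a single shared link, and the function name `max_min_fair`, the capacity, and the demand values are assumptions chosen only for illustration.

```python
def max_min_fair(capacity, demands):
    """Max-min fair shares of one link (textbook progressive filling).

    Illustrative only: this is not the MLC(l) scheme from the abstract.
    capacity: link capacity; demands: per-flow demands in the same unit.
    Returns a list of allocations in the same order as `demands`.
    """
    alloc = [0.0] * len(demands)
    remaining = float(capacity)
    # Visit flows from the smallest demand to the largest.
    active = sorted(range(len(demands)), key=lambda i: demands[i])
    while active:
        share = remaining / len(active)  # equal split of what is left
        smallest = active[0]
        if demands[smallest] <= share:
            # The flow needs no more than the equal share: satisfy it fully
            # and redistribute the leftover among the remaining flows.
            alloc[smallest] = float(demands[smallest])
            remaining -= demands[smallest]
            active.pop(0)
        else:
            # Every remaining flow wants more than the equal share,
            # so each of them is capped at that share.
            for i in active:
                alloc[i] = share
            break
    return alloc


if __name__ == "__main__":
    # Hypothetical link of capacity 10 shared by four flows.
    print(max_min_fair(10, [2, 2.6, 4, 5]))
```

In this sketch, flows that ask for less than an equal share are fully served, and the capacity they leave unused is split evenly among the flows that want more.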

    Reducing Internet Latency : A Survey of Techniques and their Merit

    Bob Briscoe, Anna Brunstrom, Andreas Petlund, David Hayes, David Ros, Ing-Jyh Tsang, Stein Gjessing, Gorry Fairhurst, Carsten Griwodz, Michael Welzl. Peer reviewed. Preprint.

    Memory Management for Emerging Memory Technologies

    The Memory Wall, or the gap between CPU speed and main memory latency, is ever increasing. The latency of Dynamic Random-Access Memory (DRAM) is now on the order of hundreds of CPU cycles. Additionally, DRAM main memory is experiencing power, performance, and capacity constraints that limit process technology scaling. On the other hand, the workloads running on such systems are themselves changing due to virtualization and cloud computing, which demand more performance from data centers. Not only do these workloads have larger working set sizes, but they are also changing the way memory gets used, resulting in higher sharing and increased bandwidth demands. New Non-Volatile Memory (NVM) technologies are emerging as an answer to the current main memory issues. This thesis looks at memory management issues as the emerging memory technologies get integrated into the memory hierarchy. We consider the problems at various levels in the memory hierarchy, including sharing of the CPU last-level cache (LLC), traffic management to future non-volatile memories behind the LLC, and extending main memory through the employment of NVM. The first solution we propose is “Adaptive Replacement and Insertion” (ARI), an adaptive approach to last-level CPU cache management that optimizes the cache miss rate and writeback rate simultaneously. Our specific focus is to reduce writebacks as much as possible while maintaining or improving the miss rate relative to the conventional LRU replacement policy, with minimal hardware overhead. ARI reduces writebacks on benchmarks from the SPEC2006 suite by 32.9% on average while also decreasing misses by 4.7% on average. In a PCM-based memory system, this decreases energy consumption by 23% compared to LRU and provides a 49% lifetime improvement beyond what is possible with randomized wear-leveling. Our second proposal is “Variable-Timeslice Thread Scheduling” (VATS), an OS kernel-level approach to CPU cache sharing.
With modern, large LLCs, the time to fill the LLC is greater than the OS scheduling window. As a result, when a thread aggressively thrashes the LLC by replacing much of the data in it, another thread may not be able to recover its working set before being rescheduled. We isolate the threads in time by increasing their allotted time quanta and allowing larger periods of time between interfering threads. Compared to conventional scheduling, our approach mitigates up to 100% of the performance loss caused by CPU LLC interference. The system throughput is boosted by up to 15%. As an unconventional approach to utilizing emerging memory technologies, we present a Ternary Content-Addressable Memory (TCAM) design with Flash transistors. TCAM is successfully used in network routing but can also be utilized in OS virtual memory applications. Based on our layout and circuit simulation experiments, we conclude that our FTCAM block achieves an area improvement of 7.9× and a power improvement of 1.64× compared to a CMOS approach. To lower the cost of main memory in systems with huge memory demand, it is becoming practical to extend the DRAM in the system with less expensive NVMe Flash, for a much lower system cost. However, given the relatively high access latency of Flash devices, naively using them as main memory leads to serious performance degradation. We propose OSVPP, a software-only, OS swap-based page prefetching scheme for managing such hybrid DRAM + NVM systems. We show that it is possible to regain about 50% of the performance lost due to swapping into the NVM and thus enable the utilization of such hybrid systems for memory-hungry applications, lowering the memory cost while keeping the performance comparable to that of a DRAM-only system.
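The two metrics that ARI is said to optimize, misses and writebacks, can be made concrete with the conventional LRU baseline the abstract compares against. The sketch below is not ARI; it is a minimal, fully associative LRU cache model, and the function name `simulate_lru`, the trace format, and the example trace are assumptions made for illustration.

```python
from collections import OrderedDict


def simulate_lru(capacity, trace):
    """Count misses and writebacks for a fully associative LRU cache.

    Illustrative baseline only: this is plain LRU, not the ARI policy.
    capacity: number of cache lines.
    trace: iterable of (address, is_write) pairs.
    Returns (misses, writebacks).
    """
    cache = OrderedDict()  # address -> dirty flag, least recently used first
    misses = 0
    writebacks = 0
    for addr, is_write in trace:
        if addr in cache:
            # Hit: mark the line most recently used and keep it dirty
            # if it was already dirty or is being written now.
            cache.move_to_end(addr)
            cache[addr] = cache[addr] or is_write
        else:
            misses += 1
            if len(cache) >= capacity:
                # Evict the least recently used line; a dirty victim
                # must be written back to main memory.
                _, dirty = cache.popitem(last=False)
                if dirty:
                    writebacks += 1
            cache[addr] = is_write
    return misses, writebacks


if __name__ == "__main__":
    # Hypothetical trace: (address, is_write)
    trace = [("A", True), ("B", False), ("C", False),
             ("A", False), ("B", True), ("C", False)]
    print(simulate_lru(2, trace))
```

A writeback is charged only when the evicted line is dirty, which is why a replacement policy can lower writebacks to a write-limited memory such as PCM without necessarily changing the miss count.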

    Configurable data center switch architectures

    In this thesis, we explore alternative architectures for implementing configurable Data Center Switches along with the advantages that can be provided by such switches. Our first contribution centers around determining switch architectures that can be implemented on a Field Programmable Gate Array (FPGA) to provide configurable switching protocols. In the process, we identify a gap in the availability of frameworks to realistically evaluate the performance of switch architectures in data centers and contribute a simulation framework that relies on realistic data center traffic patterns. Our framework is then used to evaluate the performance of existing as well as newly proposed FPGA-amenable switch designs. Through collaborative work with Meng and Papaphilippou, we establish that only small-to-medium-range switches can be implemented on today's FPGAs. Our second contribution is a novel switch architecture that integrates a custom in-network hardware accelerator with a generic switch to accelerate Deep Neural Network training applications in data centers. Our proposed accelerator architecture is prototyped on an FPGA, and a scalability study is conducted to demonstrate the trade-offs of an FPGA implementation when compared to an ASIC implementation. In addition to the hardware prototype, we contribute a lightweight load-balancing and congestion control protocol that leverages the unique communication patterns of ML data-parallel jobs to enable fair sharing of network resources across different jobs. Our large-scale simulations demonstrate the ability of our novel switch architecture and lightweight congestion control protocol to both reduce the training time of machine learning jobs by up to 1.34x and benefit other latency-sensitive applications by reducing their 99th-percentile completion time by up to 4.5x.
As for our final contribution, we identify the main requirements of in-network applications and propose a Network-on-Chip (NoC)-based architecture for supporting a heterogeneous set of applications. Observing the lack of tools to support such research, we provide a tool that can be used to evaluate NoC-based switch architectures. Open Access.

    Design of an Embedded Readout System for the ALOFT Gamma-Ray Detector Instrument

    The Birkeland Center for Space Science has proposed a campaign known as the Airborne Lightning Observatory for FEGS & TGFs (ALOFT) to study Terrestrial Gamma-Ray Flashes (TGFs). TGFs are the most energetic natural phenomena occurring in the Earth’s atmosphere, and are important to our knowledge about the relationship between the Earth and space. The ALOFT campaign will use a gamma-ray detector instrument built by the University of Bergen, which will be mounted to the NASA ER-2 High-Altitude Airborne Science Aircraft. This work covers the design and development of the embedded software used to offload and operate the detector readout system of said instrument. A similar instrument was built and flown in 2017. The new instrument differs from it by being implemented on a System on a Chip (SoC) embedded platform, reusing relevant modules from the old instrument. The software has been implemented with the FreeRTOS Real-Time Operating System (RTOS). Design considerations to limit complexity and the impact of the radiation environment in which the instrument is to be operated have been addressed through implementation of a checksum algorithm, cyclic rewriting of registers, and modular design strategies. A verification system has been realized with a prototype hardware setup, in which test systems have been added to process synthetic TGF events in the software and hardware. Tests with emulated data and a Telnet control interface have been successfully implemented. The current implementation focuses on modularity, and thus offers a very good framework for further development of the instrument when campaign specifications are decided. Master's thesis in physics. MAMN-PHYS, PHYS39