9 research outputs found
Incast mitigation in a data center storage cluster through a dynamic fair-share buffer policy
Incast is a phenomenon when multiple devices interact with only one device at a given time. Multiple storage senders overflow either the switch buffer or the single-receiver memory. This pattern causes all concurrent-senders to stop and wait for buffer/memory availability, and leads to a packet loss and retransmissionâresulting in a huge latency. We present a software-defined technique tackling the many-to-one communication patternâIncastâin a data center storage cluster. Our proposed method decouples the default TCP windowing mechanism from all storage servers, and delegates it to the software-defined storage controller. The proposed method removes the TCP saw-tooth behavior, provides a global flow awareness, and implements the dynamic fair-share buffer policy for end-to-end I/O path. It considers all I/O stages (applications, device drivers, NICs, switches/routers, file systems, I/O schedulers, main memory, and physical disks) while achieving the maximum I/O throughput. The policy, which is part of the proposed method, allocates fair-share bandwidth utilization for all storage servers. Priority queues are incorporated to handle the most important data flows. In addition, the proposed method provides better manageability and maintainability compared with traditional storage networks, where data plane and control plane reside in the same device
Datacenter Traffic Control: Understanding Techniques and Trade-offs
Datacenters provide cost-effective and flexible access to scalable compute
and storage resources necessary for today's cloud computing needs. A typical
datacenter is made up of thousands of servers connected with a large network
and usually managed by one operator. To provide quality access to the variety
of applications and services hosted on datacenters and maximize performance, it
deems necessary to use datacenter networks effectively and efficiently.
Datacenter traffic is often a mix of several classes with different priorities
and requirements. This includes user-generated interactive traffic, traffic
with deadlines, and long-running traffic. To this end, custom transport
protocols and traffic management techniques have been developed to improve
datacenter network performance.
In this tutorial paper, we review the general architecture of datacenter
networks, various topologies proposed for them, their traffic properties,
general traffic control challenges in datacenters and general traffic control
objectives. The purpose of this paper is to bring out the important
characteristics of traffic control in datacenters and not to survey all
existing solutions (as it is virtually impossible due to massive body of
existing research). We hope to provide readers with a wide range of options and
factors while considering a variety of traffic control mechanisms. We discuss
various characteristics of datacenter traffic control including management
schemes, transmission control, traffic shaping, prioritization, load balancing,
multipathing, and traffic scheduling. Next, we point to several open challenges
as well as new and interesting networking paradigms. At the end of this paper,
we briefly review inter-datacenter networks that connect geographically
dispersed datacenters which have been receiving increasing attention recently
and pose interesting and novel research problems.Comment: Accepted for Publication in IEEE Communications Surveys and Tutorial
Network and Server Resource Management Strategies for Data Centre Infrastructures: A Survey
The advent of virtualisation and the increasing demand for outsourced, elastic compute charged on a pay-as-you-use basis has stimulated the development of large-scale Cloud Data Centres (DCs) housing tens of thousands of computer
clusters. Of the signi�cant capital outlay required for building and operating such infrastructures, server and network equipment account for 45% and 15% of the total cost, respectively, making resource utilisation e�ciency paramount in order to increase the operators' Return-on-Investment (RoI).
In this paper, we present an extensive survey on the management of server and network resources over virtualised Cloud DC infrastructures, highlighting
key concepts and results, and critically discussing their limitations and implications for future research opportunities. We highlight the need for and bene
�ts of adaptive resource provisioning that alleviates reliance on static utilisation prediction models and exploits direct measurement of resource utilisation
on servers and network nodes. Coupling such distributed measurement with logically-centralised Software De�ned Networking (SDN) principles, we subsequently
discuss the challenges and opportunities for converged resource management over converged ICT environments, through unifying control loops to globally orchestrate adaptive and load-sensitive resource provisioning
Fastpass: A Centralized âZero-Queueâ Datacenter Network
An ideal datacenter network should provide several properties, including low median and tail latency, high utilization (throughput), fair allocation of network resources between users or applications, deadline-aware scheduling, and congestion (loss) avoidance. Current datacenter networks inherit the principles that went into the design of the Internet, where packet transmission and path selection decisions are distributed among the endpoints and routers. Instead, we propose that each sender should delegate controlâto a centralized arbiterâof when each packet should be transmitted and what path it should follow. This paper describes Fastpass, a datacenter network architecture built using this principle. Fastpass incorporates two fast algorithms: the first determines the time at which each packet should be transmitted, while the second determines the path to use for that packet. In addition, Fastpass uses an efficient protocol between the endpoints and the arbiter and an arbiter replication strategy for fault-tolerant failover. We deployed and evaluated Fastpass in a portion of Facebookâs datacenter network. Our results show that Fastpass achieves high throughput comparable to current networks at a 240 reduction is queue lengths (4.35 Mbytes reducing to 18 Kbytes), achieves much fairer and consistent flow throughputs than the baseline TCP (5200 reduction in the standard deviation of per-flow throughput with five concurrent connections), scalability from 1 to 8 cores in the arbiter implementation with the ability to schedule 2.21 Terabits/s of traffic in software on eight cores, and a 2.5 reduction in the number of TCP retransmissions in a latency-sensitive service at Facebook.National Science Foundation (U.S.) (grant IIS-1065219)Irwin Mark Jacobs and Joan Klein Jacobs Presidential FellowshipHertz Foundation (Fellowship
Software-defined datacenter network debugging
Software-defined Networking (SDN) enables flexible network management, but as networks
evolve to a large number of end-points with diverse network policies, higher
speed, and higher utilization, abstraction of networks by SDN makes monitoring and
debugging network problems increasingly harder and challenging. While some problems
impact packet processing in the data plane (e.g., congestion), some cause policy
deployment failures (e.g., hardware bugs); both create inconsistency between operator
intent and actual network behavior. Existing debugging tools are not sufficient to
accurately detect, localize, and understand the root cause of problems observed in a
large-scale networks; either they lack in-network resources (compute, memory, or/and
network bandwidth) or take long time for debugging network problems.
This thesis presents three debugging tools: PathDump, SwitchPointer, and Scout,
and a technique for tracing packet trajectories called CherryPick. We call for a different
approach to network monitoring and debugging: in contrast to implementing
debugging functionality entirely in-network, we should carefully partition the debugging
tasks between end-hosts and network elements. Towards this direction, we present
CherryPick, PathDump, and SwitchPointer. The core of CherryPick is to cherry-pick the
links that are key to representing an end-to-end path of a packet, and to embed picked
linkIDs into its header on its way to destination.
PathDump is an end-host based network debugger based on tracing packet trajectories,
and exploits resources at the end-hosts to implement various monitoring and
debugging functionalities. PathDump currently runs over a real network comprising
only of commodity hardware, and yet, can support surprisingly a large class of network
debugging problems with minimal in-network functionality.
The key contributions of SwitchPointer is to efficiently provide network visibility
to end-host based network debuggers like PathDump by using switch memory as a
"directory service" â each switch, rather than storing telemetry data necessary for
debugging functionalities, stores pointers to end hosts where relevant telemetry data is
stored. The key design choice of thinking about memory as a directory service allows
to solve performance problems that were hard or infeasible with existing designs.
Finally, we present and solve a network policy fault localization problem that arises
in operating policy management frameworks for a production network. We develop
Scout, a fully-automated system that localizes faults in a large scale policy deployment
and further pin-points the physical-level failures which are most likely cause for
observed faults
Online learning on the programmable dataplane
This thesis makes the case for managing computer networks with datadriven methods automated statistical inference and control based on measurement data and runtime observationsâand argues for their tight integration with programmable dataplane hardware to make management decisions faster and from more precise data. Optimisation, defence, and measurement of networked infrastructure are each challenging tasks in their own right, which are currently dominated by the use of hand-crafted heuristic methods. These become harder to reason about and deploy as networks scale in rates and number of forwarding elements, but their design requires expert knowledge and care around unexpected protocol interactions. This makes tailored, per-deployment or -workload solutions infeasible to develop. Recent advances in machine learning offer capable function approximation and closed-loop control which suit many of these tasks. New, programmable dataplane hardware enables more agility in the networkâ runtime reprogrammability, precise traffic measurement, and low latency on-path processing. The synthesis of these two developments allows complex decisions to be made on previously unusable state, and made quicker by offloading inference to the network.
To justify this argument, I advance the state of the art in data-driven defence of networks, novel dataplane-friendly online reinforcement learning algorithms, and in-network data reduction to allow classification of switchscale data. Each requires co-design aware of the network, and of the failure modes of systems and carried traffic. To make online learning possible in the dataplane, I use fixed-point arithmetic and modify classical (non-neural) approaches to take advantage of the SmartNIC compute model and make use of rich device local state. I show that data-driven solutions still require great care to correctly design, but with the right domain expertise they can improve on pathological cases in DDoS defence, such as protecting legitimate UDP traffic. In-network aggregation to histograms is shown to enable accurate classification from fine temporal effects, and allows hosts to scale such classification to far larger flow counts and traffic volume. Moving reinforcement learning to the dataplane is shown to offer substantial benefits to stateaction latency and online learning throughput versus host machines; allowing policies to react faster to fine-grained network events. The dataplane environment is key in making reactive online learning feasibleâto port further algorithms and learnt functions, I collate and analyse the strengths of current and future hardware designs, as well as individual algorithms
The terminator : an AI-based framework to handle dependability threats in large-scale distributed systems
With the advent of resource-hungry applications such as scientific simulations and artificial intelligence (AI), the need for high-performance computing (HPC) infrastructure is becoming more pressing. HPC systems are typically characterised by the scale of the resources they possess, containing a large number of sophisticated HW components that are tightly integrated. This scale and design complexity inherently contribute to sources of uncertainties, i.e., there are dependability threats that perturb the system during application execution. During system execution, these HPC systems generate a massive amount of log messages that capture the health status of the various components. Several previous works have leveraged those systemsâ logs for dependability purposes, such as failure prediction, with varying results. In this work, three novel AI-based techniques are proposed to address two major dependability problems, those of (i) error detection and (ii) failure prediction. The proposed error detection technique leverages the sentiments embedded in log messages in a novel way, making the approach HPC system-independent, i.e., the technique can be used to detect errors in any HPC system. On the other hand, two novel self-supervised transformer neural networks are developed for failure prediction, thereby obviating the need for labels, which are notoriously difficult to obtain in HPC systems. The first transformer technique, called Clairvoyant, accurately predicts the location of the failure, while the second technique, called Time Machine, extends Clairvoyant by also accurately predicting the lead time to failure (LTTF). Time Machine addresses the typical regression problem of LTTF as a novel multi-class classification problem, using a novel oversampling method for online time-based task training. Results from six real-world HPC clustersâ datasets show that our approaches significantly outperform the state-of-the-art methods on various metrics