    Distributed intelligence for pervasive optical network telemetry

    Optical network automation and failure management require measuring the status and performance of the different network devices to anticipate any degradation and ensure the quality of the provided services, i.e., optical connectivity. Such pervasive network telemetry entails collecting large amounts of measurements and events from different sources with very fine granularity. Given the number and variety of telemetry sources and the size of each measurement and event, this imposes requirements that are hard to meet without large investments. In this paper, we analyze the main limitations of telemetry architectures that rely exclusively on centralized systems for data analysis and propose an architecture with distributed intelligence. Data aggregation techniques, conceived specifically for optical network telemetry, are presented with the objective of reducing data dimensionality. Illustrative results from our experimental telemetry system reveal a reduction of 3 orders of magnitude in total data volume without introducing significant error or processing delay and, more importantly, show that the aggregation helps network automation algorithms identify meaningful changes in the network status.
    HORIZON EUROPE Framework Programme [SEASON (101096120)]; Agencia Estatal de Investigación [IBON (PID2020-114135RB-I00)]; Institució Catalana de Recerca i Estudis Avançats.
    Peer Reviewed. Postprint (author's final draft

    Datacenter Traffic Control: Understanding Techniques and Trade-offs

    Datacenters provide cost-effective and flexible access to the scalable compute and storage resources necessary for today's cloud computing needs. A typical datacenter is made up of thousands of servers connected by a large network and usually managed by one operator. To provide quality access to the variety of applications and services hosted on datacenters and to maximize performance, it is necessary to use datacenter networks effectively and efficiently. Datacenter traffic is often a mix of several classes with different priorities and requirements, including user-generated interactive traffic, traffic with deadlines, and long-running traffic. To this end, custom transport protocols and traffic management techniques have been developed to improve datacenter network performance. In this tutorial paper, we review the general architecture of datacenter networks, the various topologies proposed for them, their traffic properties, and the general traffic control challenges and objectives in datacenters. The purpose of this paper is to bring out the important characteristics of traffic control in datacenters, not to survey all existing solutions (which is virtually impossible given the massive body of existing research). We hope to provide readers with a wide range of options and factors to weigh when considering a variety of traffic control mechanisms. We discuss various characteristics of datacenter traffic control, including management schemes, transmission control, traffic shaping, prioritization, load balancing, multipathing, and traffic scheduling. Next, we point to several open challenges as well as new and interesting networking paradigms. At the end of this paper, we briefly review inter-datacenter networks, which connect geographically dispersed datacenters, have been receiving increasing attention recently, and pose interesting and novel research problems.
    Comment: Accepted for Publication in IEEE Communications Surveys and Tutorial

    Optical Layer Failures in a Large Backbone

    ABSTRACT
    We analyze optical layer outages in a large backbone, using data for over a year from thousands of optical channels carrying live IP layer traffic. Our analysis uncovers several findings that can help improve network management and routing. For instance, we find that optical links have a wide range of availabilities, which questions the common assumption in fault-tolerant routing designs that all links have equal failure probabilities. We also find that by monitoring changes in optical signal quality (not visible at IP layer), we can better predict (probabilistically) future outages. Our results suggest that backbone traffic engineering strategies should consider current and past optical layer performance and route computation should be based on the outage-risk profile of the underlying optical links.

    Keywords: Wide-area backbone network; Optical layer; Q-factor; Availability; Outage

    WHY STUDY OPTICAL LINKS?
    Wide-area backbone networks (WANs) of Internet service providers and cloud providers are the workhorses of Internet traffic delivery. Providers spend millions of dollars building access points around the world and interconnecting them through optical links. Improving the availability and efficiency of the WAN is central to their ability to provide services in a reliable, cost-effective manner. Consequently, there has been significant research into measuring and characterizing various aspects of WANs, including topology, routing, traffic, and reliability. However, prior studies tend to focus exclusively on the IP layer, and little is publicly known about the characteristics of the optical layer which forms the physical transmission medium of WANs. There are studies that focus on dispersion and modulation.

    Studying optical layer characteristics of backbone networks is not simply a matter of curiosity. The health of this layer ultimately determines the network's effectiveness at carrying traffic. For instance, poor optical signal quality can lead to corruption and even silent packet drops.

    We uncover several notable characteristics of this backbone. First, the availability (i.e., uptime) of different optical segments and channels differs by over three orders of magnitude. Second, the distribution of time to repair of planned outages is similar for both optical segments and channels, even though a segment outage tends to represent an order of magnitude greater impairment in network capacity. Third, almost four in five optical segment outages are unidirectional; i.e., one direction is functional while the other is down. Finally, outages can be predicted (probabilistically) based on sudden drops in optical signal quality (which is not visible at the IP layer). There is a 50% chance of an outage within an hour of a drop event and a 70% chance of an outage within one day. Our findings motivate smarter IP layer management and routing, one that is aware of optical layer characteristics. For instance, a common assumption in fault-tolerant routing schemes is that each IP layer link is equally likely to fai
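
    The idea of basing route computation on per-link outage risk, rather than assuming equal failure probabilities, can be illustrated with a small sketch. This is not the authors' method: it assumes each directed link has a known availability (fraction of time up) and that link failures are independent, in which case the path with the highest end-to-end availability is a shortest path under weights -log(availability). The function name `most_available_path` and the example topology are hypothetical.

```python
import heapq
import math


def most_available_path(links, src, dst):
    """Return (path, availability) for the most available src -> dst path.

    links maps a directed pair (u, v) to that link's availability in (0, 1].
    Assumes independent link failures, so path availability is the product
    of link availabilities; minimizing the sum of -log(a) maximizes it.
    """
    graph = {}
    for (u, v), avail in links.items():
        graph.setdefault(u, []).append((v, -math.log(avail)))

    best = {src: 0.0}   # lowest known cost to reach each node
    prev = {}           # predecessor on the best known path
    heap = [(0.0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            break
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for nxt, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))

    if dst not in best:
        return None, 0.0  # destination unreachable

    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-best[dst])


# Hypothetical availabilities: the direct link A -> C is less reliable
# than the two-hop detour through B.
example_links = {("A", "B"): 0.999, ("B", "C"): 0.999, ("A", "C"): 0.99}
example_path, example_avail = most_available_path(example_links, "A", "C")
```

    Links are kept directed here because the text reports that most segment outages are unidirectional, so the two directions of a link may warrant different availability values.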