85 research outputs found

    Uncovering the big players of the web

    Get PDF
    In this paper we observe how large Internet organizations deliver web content to end users today. Using week-long data sets collected at three vantage points aggregating more than 30,000 Internet customers, we characterize the offered services, precisely quantifying and comparing the performance of different players. Results show that 65% of web traffic today is handled by the top 10 organizations. We observe that, while all of them serve the same type of content, they have adopted different server architectures in terms of load balancing schemes and the number and location of servers: some organizations operate thousands of servers, with the closest being a few milliseconds away from the end user, while others manage a few data centers. Despite this, the bulk transfer rates offered to end users are typically good, but impairments can arise when content is not readily available at the server and has to be retrieved from the CDN back-end.

    Predictive CDN Selection for Video Delivery Based on LSTM Network Performance Forecasts and Cost-Effective Trade-Offs

    Get PDF
    Owing to increasing consumption of video streams and demand for higher quality content and more advanced displays, future telecommunication networks are expected to outperform current networks in terms of key performance indicators (KPIs). Currently, content delivery networks (CDNs) are used to enhance media availability and delivery performance across the Internet in a cost-effective manner. The proliferation of CDN vendors and business models allows the content provider (CP) to use multiple CDN providers simultaneously. However, extreme concurrency dynamics can affect CDN capacity, causing performance degradation and outages, while overestimated demand inflates costs. 5G standardization communities envision advanced network functions executing video analytics to enhance or boost media services. Network accelerators are required to enforce CDN resilience and efficient utilization of CDN assets. In this regard, this study investigates a cost-effective service that dynamically selects the CDN for each session and video segment at the Media Server, without requiring any modification to the video streaming pipeline. This service performs time series forecasts by employing a Long Short-Term Memory (LSTM) network to process real-time measurements coming from connected video players. It also ensures reliable and cost-effective content delivery through proactive selection of the CDN that fits performance and business constraints. To this end, the proposed service predicts the number of players that can be served by each CDN at any given time; it then switches the required players between CDNs to maintain Quality of Service (QoS) rates or to reduce the CP's operational expenditure (OPEX). The proposed solution is evaluated with a real server, CDNs, and players delivering dynamic adaptive streaming over HTTP (MPEG-DASH), where clients are notified to switch to another CDN through a standard MPEG-DASH media presentation description (MPD) update mechanism. This work was supported in part by the EC projects Fed4Fire+, under Grant 732638 (H2020-ICT-13-2016, Research and Innovation Action), and in part by the Open-VERSO project (Red Cervera Program, Spanish Government's Centre for the Development of Industrial Technology).
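    As an illustration of the forecast-then-select loop this abstract describes, here is a minimal Python/PyTorch sketch: an LSTM forecasts how many players each CDN can serve in the next time slot, and a selector picks the cheapest CDN predicted to cover demand. All names, the cost model, and the toy data are assumptions for illustration, not the paper's implementation.

```python
# Hypothetical sketch: forecast per-CDN capacity with an LSTM, then pick the
# cheapest CDN predicted to satisfy the QoS constraint. Names and the cost
# model are illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn

class CapacityForecaster(nn.Module):
    """Predicts the number of players each CDN can serve in the next slot."""
    def __init__(self, n_cdns: int, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_cdns, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_cdns)

    def forward(self, x):                  # x: (batch, window, n_cdns) player counts
        out, _ = self.lstm(x)
        return self.head(out[:, -1])       # forecast for the next time slot

def select_cdn(forecast, demand, cost_per_session):
    """Pick the cheapest CDN whose predicted capacity covers the demand."""
    feasible = [i for i, cap in enumerate(forecast) if cap >= demand]
    if not feasible:                       # fall back to the largest forecast
        return max(range(len(forecast)), key=lambda i: forecast[i])
    return min(feasible, key=lambda i: cost_per_session[i])

# Toy usage: 2 CDNs, a 12-slot window of measured served-player counts.
model = CapacityForecaster(n_cdns=2)
history = torch.rand(1, 12, 2) * 100       # stand-in for real player reports
forecast = model(history).squeeze(0).tolist()
print("chosen CDN:", select_cdn(forecast, demand=40, cost_per_session=[0.8, 1.0]))
```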

    Go-With-The-Winner: Client-Side Server Selection for Content Delivery

    Full text link
    Content delivery networks deliver much of the world's web and video content by deploying a large distributed network of servers. We model and analyze a simple paradigm for client-side server selection that is commonly used in practice, where each user independently measures the performance of a set of candidate servers and selects the one that performs best. For web (resp., video) delivery, we propose and analyze a simple algorithm where each user randomly chooses two or more candidate servers and selects the server that provided the best hit rate (resp., bit rate). We prove that the algorithm converges quickly, with high probability, to an optimal state where all users receive the best hit rate (resp., bit rate). We also show that if each user chose just one random server instead of two, some users would receive a hit rate (resp., bit rate) that tends to zero. We simulate our algorithm and evaluate its performance with varying choices of parameters, system load, and content popularity.
    Comment: 15 pages, 9 figures, published in IFIP Networking 201
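    A toy Python simulation of the two-choice rule analyzed above may help: each client probes a few randomly chosen servers and sticks with the one offering the best measured hit rate. The load-dependent hit-rate model below is an illustrative assumption, not the paper's model.

```python
# Illustrative sketch of "go with the winner": each client probes d random
# servers and keeps the one with the best measured hit rate. The hit-rate
# model is a toy assumption, not the paper's.
import random

def simulate(n_servers=10, n_clients=1000, d=2, rounds=20):
    choice = [random.randrange(n_servers) for _ in range(n_clients)]
    load = [0] * n_servers                       # clients per server
    for s in choice:
        load[s] += 1

    def hit_rate(s):
        # Toy model: a server's hit rate degrades as its load grows.
        return 1.0 / (1.0 + load[s] / (n_clients / n_servers))

    for _ in range(rounds):
        for c in range(n_clients):
            candidates = random.sample(range(n_servers), d)
            best = max(candidates + [choice[c]], key=hit_rate)
            if best != choice[c]:                # switch to the winner
                load[choice[c]] -= 1
                load[best] += 1
                choice[c] = best
    return load

print(simulate())   # per-server loads drift toward a balanced allocation
```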

    The Need for an Intelligent Measurement Plane: the Example of Time-Variant CDN Policies

    Get PDF
    In this paper we characterize how web-based services are delivered by large organizations in today's Internet. Taking advantage of two week-long data sets separated in time by 10 months and reporting the web activity of more than 10,000 ADSL residential customers, we identify the services offered by large organizations like Google, Akamai and Amazon. We then compare the evolution of both the policies used to serve requests and the infrastructure used to match users' demand. Results depict an overcrowded scenario in constant evolution: big players are responsible for an ever larger share of the volume, alongside a plethora of other organizations offering similar or more specific services through different CDNs and traffic policies. Unfortunately, no standard tools and methodologies are available to capture and expose the hidden properties of this constantly evolving picture. A deeper understanding of such dynamics is, however, fundamental to improve the performance of the current and future Internet. To this end, we claim the need for an Internet-wide, standard, flexible and intelligent measurement plane to be added to the current Internet infrastructure.

    DNS to the rescue: Discerning Content and Services in a Tangled Web

    Get PDF
    A careful perusal of the Internet's evolution reveals two major trends: the explosion of cloud-based services and of video streaming applications. In both cases, the owner of the content (e.g., CNN, YouTube, or Zynga) and the organization serving it (e.g., Akamai, Limelight, or Amazon EC2) are decoupled, making it harder to understand the association between the content, the owner, and the host where the content resides. This has created a tangled world wide web that is very hard to unwind, impairing ISPs' and network administrators' capabilities to control the traffic flowing on the network. In this paper, we present DN-Hunter, a system that leverages the information provided by DNS traffic to discern the tangle. Parsing DNS queries, DN-Hunter tags traffic flows with the associated domain name. This association has several applications and reveals a large amount of useful information: (i) it provides fine-grained traffic visibility even when the traffic is encrypted (i.e., TLS/SSL flows), thus enabling more effective policy controls; (ii) it identifies flows even before they begin, thus providing superior network management capabilities to administrators; (iii) it understands and tracks (over time) the different CDNs and cloud providers that host content for a particular resource; (iv) it discerns all the services/content hosted by a given CDN or cloud provider in a particular geography and time; and (v) it provides insights into all applications/services running on any given layer-4 port number. We conduct extensive experimental analysis on real traffic traces, ranging from FTTH to 4G ISPs, and the results support our hypothesis. Simply put, the information provided by DNS traffic is one of the key components required to unveil the tangled web, and to bring the capability of controlling the traffic back to the network carrier.
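    The core idea, tagging flows with the domain seen in the client's earlier DNS answer, can be sketched in a few lines of hypothetical Python; the data structures and record format below are illustrative assumptions, not DN-Hunter's actual code.

```python
# Minimal sketch of the DNS-tagging idea: remember which domain each DNS
# answer resolved to for a client, then label later flows by (client, server)
# IP pair. Names and structures are illustrative, not the tool's code.
from collections import OrderedDict

class FlowTagger:
    """Maps (client IP, server IP) to the domain the client last resolved."""
    def __init__(self, max_entries=100_000):
        self.max_entries = max_entries
        self.table = OrderedDict()               # (client, server) -> domain

    def on_dns_answer(self, client_ip, domain, resolved_ips):
        for server_ip in resolved_ips:
            key = (client_ip, server_ip)
            self.table[key] = domain             # latest answer wins
            self.table.move_to_end(key)
            if len(self.table) > self.max_entries:
                self.table.popitem(last=False)   # evict the oldest entry

    def tag_flow(self, client_ip, server_ip):
        # Works even for TLS flows: the label comes from DNS, not the payload.
        return self.table.get((client_ip, server_ip), "unknown")

tagger = FlowTagger()
tagger.on_dns_answer("10.0.0.2", "video.cdn.example", ["93.184.216.34"])
print(tagger.tag_flow("10.0.0.2", "93.184.216.34"))   # -> video.cdn.example
```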

    DCCast: Efficient Point to Multipoint Transfers Across Datacenters

    Full text link
    Using multiple datacenters allows for higher availability, load balancing and reduced latency to customers of cloud services. To distribute multiple copies of data, cloud providers depend on inter-datacenter WANs that ought to be used efficiently considering their limited capacity and the ever-increasing data demands. In this paper, we focus on applications that transfer objects from one datacenter to several datacenters over dedicated inter-datacenter networks. We present DCCast, a centralized Point to Multi-Point (P2MP) algorithm that uses forwarding trees to efficiently deliver an object from a source datacenter to required destination datacenters. With low computational overhead, DCCast selects forwarding trees that minimize bandwidth usage and balance load across all links. With simulation experiments on Google's GScale network, we show that DCCast can reduce total bandwidth usage and tail Transfer Completion Times (TCT) by up to 50% compared to delivering the same objects via independent point-to-point (P2P) transfers.
    Comment: 9th USENIX Workshop on Hot Topics in Cloud Computing, https://www.usenix.org/conference/hotcloud17/program/presentation/noormohammadpou
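    To make the forwarding-tree idea concrete, here is a hedged Python sketch: connect the source and all destinations with an approximate Steiner tree over edges weighted by already-scheduled load, then reserve the transfer on the chosen tree. DCCast's actual tree selection and scheduling are more elaborate; the topology, names, and use of networkx here are assumptions for illustration only.

```python
# Hedged sketch of load-aware forwarding-tree selection: pick a tree
# connecting the source to all destinations over lightly loaded edges,
# then add the transfer's load to the chosen edges. Not DCCast's algorithm;
# an illustrative stand-in using networkx's approximate Steiner tree.
import networkx as nx
from networkx.algorithms.approximation import steiner_tree

def pick_forwarding_tree(G, source, destinations, transfer_size):
    # Edge weight = load already scheduled, so lightly used links win.
    tree = steiner_tree(G, [source] + destinations, weight="load")
    for u, v in tree.edges():
        G[u][v]["load"] += transfer_size     # reserve capacity on the tree
    return list(tree.edges())

# Made-up 4-node topology with idle links.
G = nx.Graph()
for u, v in [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("B", "D")]:
    G.add_edge(u, v, load=0.0)

print(pick_forwarding_tree(G, "A", ["C", "D"], transfer_size=10.0))
```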

    Machine Learning and Big Data Methodologies for Network Traffic Monitoring

    Get PDF
    Over the past 20 years, the Internet has seen exponential growth in traffic, users, services and applications. Currently, it is estimated that the Internet is used every day by more than 3.6 billion users, who generate 20 TB of traffic per second. Such a huge amount of data challenges network managers and analysts to understand how the network is performing, how users are accessing resources, how to properly control and manage the infrastructure, and how to detect possible threats. Along with mathematical, statistical, and set theory methodologies, machine learning and big data approaches have emerged to build systems that aim at automatically extracting information from the raw data that network monitoring infrastructures offer.
    In this thesis I will address different network monitoring solutions, evaluating several methodologies and scenarios. I will show how, following a common workflow, it is possible to exploit mathematical, statistical, set theory, and machine learning methodologies to extract meaningful information from the raw data. Particular attention will be given to machine learning and big data methodologies such as DBSCAN and the Apache Spark big data framework. The results show that, despite being able to take advantage of mathematical, statistical, and set theory tools to characterize a problem, machine learning methodologies are very useful for discovering hidden information in the raw data. Using the DBSCAN clustering algorithm, I will show how YouLighter, an unsupervised methodology, groups caches serving YouTube traffic into edge-nodes, and later, using the notion of Pattern Dissimilarity, how to identify changes in their usage over time. By running YouLighter over 10-month-long traces, I will pinpoint sudden changes in YouTube edge-node usage, changes that also impair the end users' Quality of Experience. I will also apply DBSCAN in the deployment of SeLINA, a self-tuning tool implemented in the Apache Spark big data framework to autonomously extract knowledge from network traffic measurements. Using SeLINA, I will show how to automatically detect the changes in the YouTube CDN previously highlighted by YouLighter.
    Along with these machine learning studies, I will show how to use mathematical and set theory methodologies to investigate the browsing habits of Internauts. Using a two-week dataset, I will show how, over this period, Internauts keep discovering new websites, and that a reliable profile is hard to build from DNS information alone. By exploiting mathematical and statistical tools, I will then show how to characterize Anycast-enabled CDNs (A-CDNs). I will show that A-CDNs are widely used for both stateless and stateful services; that A-CDNs are quite popular, as more than 50% of web users contact an A-CDN every day; and that stateful services can benefit from A-CDNs, since their paths are very stable over time, as demonstrated by the presence of only a few anomalies in their Round Trip Times.
    Finally, I will conclude by showing how I used BGPStream, an open-source software framework for the analysis of both historical and real-time Border Gateway Protocol (BGP) measurement data. Using BGPStream in real-time mode, I will show how I detected a Multiple Origin AS (MOAS) event, and how I studied the propagation of the black-holing community, showing its effect on the network. Then, using BGPStream in historical mode with the Apache Spark big data framework over 16 years of data, I will show results such as the continuous growth of IPv4 prefixes and the growth of MOAS events over time. All these studies aim to show that monitoring is a fundamental task in different scenarios, and in particular to highlight the importance of machine learning and big data methodologies.
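    As a toy illustration of the DBSCAN step mentioned above (grouping caches into edge-nodes), the following Python snippet clusters servers from made-up per-server features; the feature choice and parameters are assumptions, not YouLighter's actual pipeline.

```python
# Toy sketch of grouping cache servers into edge-nodes with DBSCAN.
# Features and parameters are illustrative assumptions only.
import numpy as np
from sklearn.cluster import DBSCAN

# One row per cache server, e.g. (min RTT in ms, share of observed traffic).
features = np.array([
    [10.2, 0.30], [10.8, 0.28], [11.0, 0.31],    # likely one edge-node
    [48.5, 0.05], [49.1, 0.04],                  # a farther edge-node
    [95.0, 0.01],                                # isolated cache -> noise
])

labels = DBSCAN(eps=2.0, min_samples=2).fit_predict(features)
print(labels)   # e.g. [0 0 0 1 1 -1]; label -1 marks noise points
```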

    Content and Geographical Locality in User-Generated Content Sharing Systems

    Get PDF
    User Generated Content (UGC), such as YouTube videos, accounts for a substantial fraction of Internet traffic. To optimize their performance, UGC services usually rely on both proactive and reactive approaches that exploit spatial and temporal locality in access patterns. Alternative types of locality are also relevant but are hardly ever considered together. In this paper, we show on a large YouTube dataset (more than 650,000 videos) that content locality (induced by the related-videos feature) and geographic locality are in fact correlated. More specifically, we show how the geographic view distribution of a video can be inferred to a large extent from that of its related videos. We leverage these findings to propose a UGC storage system that proactively places videos close to the expected requests. Compared to a caching-based solution, our system decreases by 16% the number of requests served from a different country than that of the requesting user, and even in those cases, the distance between the user and the server is 29% shorter on average.
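    The inference step reported above can be sketched simply in Python: estimate a video's per-country view distribution as the normalized sum of its related videos' distributions. The averaging rule and the data below are illustrative assumptions, not the paper's exact estimator.

```python
# Hedged sketch: approximate a video's per-country view distribution by
# averaging the distributions of its related videos. Illustrative only.
from collections import defaultdict

def infer_geo_distribution(related_distributions):
    """related_distributions: list of {country: view_fraction} dicts."""
    acc = defaultdict(float)
    for dist in related_distributions:
        for country, frac in dist.items():
            acc[country] += frac
    total = sum(acc.values()) or 1.0
    return {c: v / total for c, v in acc.items()}   # normalized estimate

related = [
    {"FR": 0.6, "US": 0.3, "DE": 0.1},
    {"FR": 0.5, "US": 0.4, "BE": 0.1},
]
print(infer_geo_distribution(related))
# A placement system would push replicas toward the top predicted countries.
```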