76 research outputs found

    Inside Dropbox: Understanding Personal Cloud Storage Services

    Get PDF
    Personal cloud storage services are gaining popularity. With a rush of providers to enter the market and an increasing of- fer of cheap storage space, it is to be expected that cloud storage will soon generate a high amount of Internet traffic. Very little is known about the architecture and the perfor- mance of such systems, and the workload they have to face. This understanding is essential for designing efficient cloud storage systems and predicting their impact on the network. This paper presents a characterization of Dropbox, the leading solution in personal cloud storage in our datasets. By means of passive measurements, we analyze data from four vantage points in Europe, collected during 42 consecu- tive days. Our contributions are threefold: Firstly, we are the first to study Dropbox, which we show to be the most widely-used cloud storage system, already accounting for a volume equivalent to around one third of the YouTube traffic at campus networks on some days. Secondly, we characterize the workload typical users in different environments gener- ate to the system, highlighting how this reflects on network traffic. Lastly, our results show possible performance bot- tlenecks caused by both the current system architecture and the storage protocol. This is exacerbated for users connected far from control and storage data-center

    A Graph Theoretic Perspective on Internet Topology Mapping

    Get PDF
    Understanding the topological characteristics of the Internet is an important research issue as the Internet grows with no central authority. Internet topology mapping studies help better understand the structure and dynamics of the Internet backbone. Knowing the underlying topology, researchers can better develop new protocols and services or fine-tune existing ones. Subnet-level Internet topology measurement studies involve three stages: topology collection, topology construction, and topology analysis. Each of these stages contains challenging tasks, especially when large-scale backbone topologies of millions of nodes are studied. In this dissertation, I first discuss issues in subnet-level Internet topology mapping and review state-of-the-art approaches to handle them. I propose a novel graph data indexing approach to to efficiently process large scale topology data. I then conduct an experimental study to understand how the responsiveness of routers has changed over the last decade and how it differs based on the probing mechanism. I then propose an efficient unresponsive resolution approach by incorporating our structural graph indexing technique. Finally, I introduce Cheleby, an integrated Internet topology mapping system. Cheleby first dynamically probes observed subnetworks using a team of PlanetLab nodes around the world to obtain comprehensive backbone topologies. Then, it utilizes efficient algorithms to resolve subnets, IP aliases, and unresponsive routers in the collected data sets to construct comprehensive subnet-level topologies. Sample topologies are provided at http://cheleby.cse.unr.edu

    Automatic Parallelization of Software Network Functions

    Full text link
    Software network functions (NFs) trade-off flexibility and ease of deployment for an increased challenge of performance. The traditional way to increase NF performance is by distributing traffic to multiple CPU cores, but this poses a significant challenge: how to parallelize an NF without breaking its semantics? We propose Maestro, a tool that analyzes a sequential implementation of an NF and automatically generates an enhanced parallel version that carefully configures the NIC's Receive Side Scaling mechanism to distribute traffic across cores, while preserving semantics. When possible, Maestro orchestrates a shared-nothing architecture, with each core operating independently without shared memory coordination, maximizing performance. Otherwise, Maestro choreographs a fine-grained read-write locking mechanism that optimizes operation for typical Internet traffic. We parallelized 8 software NFs and show that they generally scale-up linearly until bottlenecked by PCIe when using small packets or by 100Gbps line-rate with typical Internet traffic. Maestro further outperforms modern hardware-based transactional memory mechanisms, even for challenging parallel-unfriendly workloads.Comment: 21 pages, 14 figures, to be published in NSDI2

    Machine Learning and Big Data Methodologies for Network Traffic Monitoring

    Get PDF
    Over the past 20 years, the Internet saw an exponential grown of traffic, users, services and applications. Currently, it is estimated that the Internet is used everyday by more than 3.6 billions users, who generate 20 TB of traffic per second. Such a huge amount of data challenge network managers and analysts to understand how the network is performing, how users are accessing resources, how to properly control and manage the infrastructure, and how to detect possible threats. Along with mathematical, statistical, and set theory methodologies machine learning and big data approaches have emerged to build systems that aim at automatically extracting information from the raw data that the network monitoring infrastructures offer. In this thesis I will address different network monitoring solutions, evaluating several methodologies and scenarios. I will show how following a common workflow, it is possible to exploit mathematical, statistical, set theory, and machine learning methodologies to extract meaningful information from the raw data. Particular attention will be given to machine learning and big data methodologies such as DBSCAN, and the Apache Spark big data framework. The results show that despite being able to take advantage of mathematical, statistical, and set theory tools to characterize a problem, machine learning methodologies are very useful to discover hidden information about the raw data. Using DBSCAN clustering algorithm, I will show how to use YouLighter, an unsupervised methodology to group caches serving YouTube traffic into edge-nodes, and latter by using the notion of Pattern Dissimilarity, how to identify changes in their usage over time. By using YouLighter over 10-month long races, I will pinpoint sudden changes in the YouTube edge-nodes usage, changes that also impair the end users’ Quality of Experience. I will also apply DBSCAN in the deployment of SeLINA, a self-tuning tool implemented in the Apache Spark big data framework to autonomously extract knowledge from network traffic measurements. By using SeLINA, I will show how to automatically detect the changes of the YouTube CDN previously highlighted by YouLighter. Along with these machine learning studies, I will show how to use mathematical and set theory methodologies to investigate the browsing habits of Internauts. By using a two weeks dataset, I will show how over this period, the Internauts continue discovering new websites. Moreover, I will show that by using only DNS information to build a profile, it is hard to build a reliable profiler. Instead, by exploiting mathematical and statistical tools, I will show how to characterize Anycast-enabled CDNs (A-CDNs). I will show that A-CDNs are widely used either for stateless and stateful services. That A-CDNs are quite popular, as, more than 50% of web users contact an A-CDN every day. And that, stateful services, can benefit of A-CDNs, since their paths are very stable over time, as demonstrated by the presence of only a few anomalies in their Round Trip Time. Finally, I will conclude by showing how I used BGPStream an open-source software framework for the analysis of both historical and real-time Border Gateway Protocol (BGP) measurement data. By using BGPStream in real-time mode I will show how I detected a Multiple Origin AS (MOAS) event, and how I studies the black-holing community propagation, showing the effect of this community in the network. Then, by using BGPStream in historical mode, and the Apache Spark big data framework over 16 years of data, I will show different results such as the continuous growth of IPv4 prefixes, and the growth of MOAS events over time. All these studies have the aim of showing how monitoring is a fundamental task in different scenarios. In particular, highlighting the importance of machine learning and of big data methodologies

    Big Data for Traffic Monitoring and Management

    Get PDF
    The last two decades witnessed tremendous advances in the Information and Com- munications Technologies. Beside improvements in computational power and storage capacity, communication networks carry nowadays an amount of data which was not envisaged only few years ago. Together with their pervasiveness, network complexity increased at the same pace, leaving operators and researchers with few instruments to understand what happens in the networks, and, on the global scale, on the Internet. Fortunately, recent advances in data science and machine learning come to the res- cue of network analysts, and allow analyses with a level of complexity and spatial/tem- poral scope not possible only 10 years ago. In my thesis, I take the perspective of an In- ternet Service Provider (ISP), and illustrate challenges and possibilities of analyzing the traffic coming from modern operational networks. I make use of big data and machine learning algorithms, and apply them to datasets coming from passive measurements of ISP and University Campus networks. The marriage between data science and network measurements is complicated by the complexity of machine learning algorithms, and by the intrinsic multi-dimensionality and variability of this kind of data. As such, my work proposes and evaluates novel techniques, inspired from popular machine learning approaches, but carefully tailored to operate with network traffic. In this thesis, I first provide a thorough characterization of the Internet traffic from 2013 to 2018. I show the most important trends in the composition of traffic and users’ habits across the last 5 years, and describe how the network infrastructure of Internet big players changed in order to support faster and larger traffic. Then, I show the chal- lenges in classifying network traffic, with particular attention to encryption and to the convergence of Internet around few big players. To overcome the limitations of classical approaches, I propose novel algorithms for traffic classification and management lever- aging machine learning techniques, and, in particular, big data approaches. Exploiting temporal correlation among network events, and benefiting from large datasets of op- erational traffic, my algorithms learn common traffic patterns of web services, and use them for (i) traffic classification and (ii) fine-grained traffic management. My proposals are always validated in experimental environments, and, then, deployed in real opera- tional networks, from which I report the most interesting findings I obtain. I also focus on the Quality of Experience (QoE) of web users, as their satisfaction represents the final objective of computer networks. Again, I show that using big data approaches, the network can achieve visibility on the quality of web browsing of users. In general, the algorithms I propose help ISPs have a detailed view of traffic that flows in their network, allowing fine-grained traffic classification and management, and real-time monitoring of users QoE
    • …
    corecore