118 research outputs found

    Big Data for Traffic Monitoring and Management

    The last two decades have witnessed tremendous advances in Information and Communications Technologies. Besides improvements in computational power and storage capacity, communication networks nowadays carry an amount of data that was not envisaged only a few years ago. Together with their pervasiveness, network complexity has increased at the same pace, leaving operators and researchers with few instruments to understand what happens in their networks and, on a global scale, on the Internet. Fortunately, recent advances in data science and machine learning come to the rescue of network analysts, and allow analyses with a level of complexity and a spatial/temporal scope not possible only 10 years ago. In my thesis, I take the perspective of an Internet Service Provider (ISP) and illustrate the challenges and possibilities of analyzing the traffic coming from modern operational networks. I make use of big data and machine learning algorithms, and apply them to datasets coming from passive measurements of ISP and university campus networks. The marriage between data science and network measurements is complicated by the complexity of machine learning algorithms and by the intrinsic multi-dimensionality and variability of this kind of data. As such, my work proposes and evaluates novel techniques, inspired by popular machine learning approaches, but carefully tailored to operate on network traffic. In this thesis, I first provide a thorough characterization of Internet traffic from 2013 to 2018. I show the most important trends in the composition of traffic and in users' habits across these 5 years, and describe how the network infrastructure of the big Internet players changed to support faster and larger traffic. Then, I show the challenges in classifying network traffic, with particular attention to encryption and to the convergence of the Internet around a few big players. To overcome the limitations of classical approaches, I propose novel algorithms for traffic classification and management leveraging machine learning techniques and, in particular, big data approaches. Exploiting temporal correlation among network events, and benefiting from large datasets of operational traffic, my algorithms learn the common traffic patterns of web services and use them for (i) traffic classification and (ii) fine-grained traffic management. My proposals are always validated in experimental environments and then deployed in real operational networks, from which I report the most interesting findings. I also focus on the Quality of Experience (QoE) of web users, as their satisfaction represents the final objective of computer networks. Again, I show that, using big data approaches, the network can achieve visibility into the quality of users' web browsing. In general, the algorithms I propose help ISPs obtain a detailed view of the traffic that flows in their networks, allowing fine-grained traffic classification and management, and real-time monitoring of users' QoE.
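
    The abstract does not spell out the algorithms, but the idea of exploiting temporal correlation among network events can be illustrated with a minimal, hypothetical sketch (not the thesis' actual method): from passively observed (timestamp, domain) events, learn which domains co-occur within a short window of a per-service "seed" domain, then classify new windows by their overlap with each learned profile. The window size, domain names, and function names below are illustrative assumptions.

        from collections import Counter, defaultdict

        WINDOW = 2.0  # seconds around a seed event (assumed value)

        def learn_profiles(events, seeds):
            """events: list of (ts, domain); seeds: {service: seed_domain}."""
            profiles = defaultdict(Counter)
            for service, seed in seeds.items():
                seed_ts = [ts for ts, d in events if d == seed]
                for t0 in seed_ts:
                    for ts, d in events:
                        if abs(ts - t0) <= WINDOW and d != seed:
                            profiles[service][d] += 1
            return profiles

        def classify_window(window_domains, profiles):
            """Score a set of domains seen in a window against each profile."""
            scores = {}
            for service, counter in profiles.items():
                total = sum(counter.values()) or 1
                scores[service] = sum(counter[d] for d in window_domains) / total
            return max(scores, key=scores.get) if scores else None

        # Toy usage with made-up domains
        events = [(0.1, "video.example.com"), (0.3, "cdn1.example.net"),
                  (0.5, "ads.tracker.test"), (10.2, "mail.example.org")]
        profiles = learn_profiles(events, {"video-service": "video.example.com"})
        print(classify_window({"cdn1.example.net"}, profiles))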

    Robust URL Classification With Generative Adversarial Networks

    Classifying URLs is essential for different applications, such as parental control, URL filtering, and ads/tracking protection. Such systems historically identify URLs by means of regular expressions, even if machine learning alternatives have been proposed to overcome the time-consuming maintenance of classification rules. Classical machine learning algorithms, however, require large samples of URLs to train the models, covering the diverse classes of URLs (i.e., a ground truth), which somewhat limits the applicability of the approach. Here we take a first step towards the use of Generative Adversarial Networks (GANs) to classify URLs. GANs are attractive for this problem for two reasons. First, GANs can produce samples of URLs belonging to specific classes even if exposed to a limited training set, outputting both synthetic traces and a robust discriminator. Second, a GAN can be trained to discriminate a class of URLs without being exposed to all other URL classes, i.e., GANs remain robust even if not exposed to uninteresting URL classes during training. Experiments on real data show not only that the generated synthetic traces are reasonably realistic, but also that URL classification with GANs is accurate. © Copyright is held by the author/owner(s).
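
    The paper reports no code here; purely as an illustration of the setup (one GAN per URL class, whose discriminator later scores unseen URLs), a minimal character-level PyTorch skeleton could look as follows. The charset, sizes, and architecture are assumptions, not the authors' design.

        import torch
        import torch.nn as nn

        VOCAB = list("abcdefghijklmnopqrstuvwxyz0123456789:/.-_?=&")  # assumed charset
        MAX_LEN, NOISE_DIM = 64, 32

        def encode(url):
            """One-hot encode a URL, padded/truncated to MAX_LEN, then flattened."""
            x = torch.zeros(MAX_LEN, len(VOCAB))
            for i, c in enumerate(url[:MAX_LEN]):
                if c in VOCAB:
                    x[i, VOCAB.index(c)] = 1.0
            return x.flatten()

        G = nn.Sequential(  # generator: noise -> "soft" one-hot URL
            nn.Linear(NOISE_DIM, 256), nn.ReLU(),
            nn.Linear(256, MAX_LEN * len(VOCAB)), nn.Sigmoid())

        D = nn.Sequential(  # discriminator: URL encoding -> real/fake score
            nn.Linear(MAX_LEN * len(VOCAB), 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid())

        opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
        opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
        bce = nn.BCELoss()

        def train_step(real_urls):
            real = torch.stack([encode(u) for u in real_urls])
            noise = torch.randn(len(real_urls), NOISE_DIM)
            fake = G(noise)
            # Discriminator step: push real URLs towards 1, generated ones towards 0
            opt_d.zero_grad()
            loss_d = bce(D(real), torch.ones(len(real), 1)) + \
                     bce(D(fake.detach()), torch.zeros(len(real), 1))
            loss_d.backward(); opt_d.step()
            # Generator step: try to fool the discriminator
            opt_g.zero_grad()
            loss_g = bce(D(fake), torch.ones(len(real), 1))
            loss_g.backward(); opt_g.step()
            return loss_d.item(), loss_g.item()

        # After training on URLs of one class, D(encode(url)) acts as a
        # membership score for that class on unseen URLs.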

    Impact of Access Line Capacity on Adaptive Video Streaming Quality - A Passive Perspective

    Adaptive streaming over HTTP is largely used to deliver live and on-demand video. It works by adjusting video quality according to network conditions. While QoE for different streaming services has been studied, it is still unclear how access line capacity impacts the QoE of broadband users in video sessions. We make a first step towards answering this question by characterizing parameters that influence QoE, such as the frequency of video quality adaptations. We take a passive point of view, and analyze a dataset summarizing the video sessions of a large population over one year. We first split customers based on their estimated access line capacity. Then, we quantify how the latter affects QoE metrics by parsing HTTP requests of Microsoft Smooth Streaming (MSS) services. For the selected services, we observe that at least 3 Mbps of downstream capacity is needed to let the player select the best bitrate, while at least 6 Mbps are required to minimize the delay to retrieve initial fragments. Surprisingly, customers with faster access lines obtain limited benefits, hinting at restrictions in the design of the services.
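
    As a rough illustration of the analysis described above (splitting customers by estimated access line capacity and comparing QoE proxies per class), a pandas sketch might look like this; the column names and values are invented and do not reflect the paper's dataset.

        import pandas as pd

        # Hypothetical per-session log; the schema is an assumption for illustration.
        sessions = pd.DataFrame({
            "access_capacity_mbps": [2, 4, 8, 20, 4, 2],
            "max_bitrate_kbps":     [1200, 2500, 3500, 3500, 2200, 900],
            "startup_delay_s":      [4.1, 2.0, 1.1, 1.0, 2.4, 5.0],
        })

        # Bucket customers by estimated downstream capacity
        bins = [0, 3, 6, 10, 1000]
        labels = ["<3 Mbps", "3-6 Mbps", "6-10 Mbps", ">10 Mbps"]
        sessions["capacity_class"] = pd.cut(
            sessions["access_capacity_mbps"], bins=bins, labels=labels)

        # QoE proxies per capacity class
        print(sessions.groupby("capacity_class", observed=True)
                      .agg(median_bitrate=("max_bitrate_kbps", "median"),
                           median_startup=("startup_delay_s", "median")))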

    The stock exchange of influencers: a financial approach for studying fanbase variation trends

    In many online social networks (OSNs), a limited portion of profiles emerges and reaches a large base of followers, i.e., the so-called social influencers. One of their main goals is to grow their fanbase to increase their visibility, engaging users through their content. In this work, we propose a novel parallel between the ecosystem of OSNs and the stock exchange market. Followers act as private investors, and they follow influencers, i.e., buy stocks, based on their individual preferences and on the information they gather through external sources. In this preliminary study, we show how approaches proposed in the context of the stock exchange market can be successfully applied to social networks. Our case study focuses on 60 Italian Instagram influencers and shows how short-term trends in their follower counts, obtained through Bollinger bands, closely match those found in external sources, Google Trends in our case, similarly to phenomena already observed in the financial market. Besides providing a strong correlation between these different trends, our results lay the basis for studying social networks with a new lens, linking them to a different domain.
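
    Bollinger bands are a standard construction: a rolling mean plus or minus k rolling standard deviations. A minimal sketch on a follower-count time series (the window length and k below are common defaults, not necessarily the values used in the paper):

        import pandas as pd

        def bollinger_bands(followers, window=20, k=2.0):
            """followers: pandas Series of daily follower counts."""
            mid = followers.rolling(window).mean()
            std = followers.rolling(window).std()
            return pd.DataFrame({
                "middle": mid,
                "upper": mid + k * std,   # counts above this suggest an upward trend
                "lower": mid - k * std,   # counts below this suggest a downward trend
            })

        # Toy usage on synthetic data
        s = pd.Series(range(100)) + pd.Series(range(100)).apply(lambda i: (i % 7) * 3)
        print(bollinger_bands(s).tail())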

    Disentangling the Information Flood on OSNs: Finding Notable Posts and Topics

    Online Social Networks (OSNs) are an integral part of modern life for sharing thoughts, stories, and news. An ecosystem of influencers generates a flood of content in the form of posts, some of which have an unusually high level of engagement with the influencer's fan base. These posts relate to blossoming topics of discussion that generate particular interest among users: the COVID-19 pandemic is a prominent example. Studying these phenomena provides an understanding of the OSN landscape and requires appropriate methods. This paper presents a methodology to discover notable posts and group them according to their related topic. By combining anomaly detection, graph modelling and community detection techniques, we pinpoint salient events automatically, with the ability to tune their number. We showcase our approach on a large Instagram dataset and extract, from 1.4 million posts, notable weekly topics that gained momentum. We then illustrate some use cases ranging from the COVID-19 outbreak to sporting events.
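
    A toy sketch of the three-stage pipeline named above (anomaly detection, graph modelling, community detection); the thresholds, features, and similarity criterion are simplified assumptions, not the paper's actual design.

        import networkx as nx
        from statistics import mean, stdev
        from networkx.algorithms.community import greedy_modularity_communities

        # Hypothetical posts: (post_id, engagement, hashtags)
        posts = [
            ("p1", 5400, {"covid", "lockdown"}),
            ("p2",  300, {"food"}),
            ("p3", 6100, {"covid", "vaccine"}),
            ("p4", 4900, {"football", "finale"}),
            ("p5",  280, {"travel"}),
        ]

        # 1) Anomaly detection: flag posts whose engagement deviates from the mean
        vals = [e for _, e, _ in posts]
        mu, sigma = mean(vals), stdev(vals)
        notable = [(pid, tags) for pid, e, tags in posts if (e - mu) / sigma > 0.5]

        # 2) Graph modelling: connect notable posts that share hashtags
        G = nx.Graph()
        G.add_nodes_from(pid for pid, _ in notable)
        for i, (pa, ta) in enumerate(notable):
            for pb, tb in notable[i + 1:]:
                if ta & tb:
                    G.add_edge(pa, pb, weight=len(ta & tb))

        # 3) Community detection: each community approximates one emerging topic
        core = G.subgraph(n for n in G if G.degree(n) > 0)
        topics = list(greedy_modularity_communities(core))
        topics += [{n} for n in G if G.degree(n) == 0]  # isolated notable posts
        print([sorted(c) for c in topics])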

    Measuring Web Speed From Passive Traces

    Understanding the Quality of Experience (QoE) of web browsing is key to optimizing services and keeping users' loyalty. This is crucial for both Content Providers and Internet Service Providers (ISPs). Quality is subjective, and the complexity of today's pages challenges its measurement. OnLoad time and SpeedIndex are notable attempts to quantify web performance with objective metrics. However, these metrics can only be computed by instrumenting the browser and, thus, are not available to ISPs. We designed PAIN: PAssive INdicator for ISPs. It is an automatic system to monitor the performance of web pages from passive measurements. It is open source and available for download. It leverages only flow-level and DNS measurements, which are still possible in the network despite the deployment of HTTPS. With unsupervised learning, PAIN automatically creates a machine learning model from the timeline of requests issued by browsers to render web pages, and uses it to measure web performance in real time. We compared PAIN to indicators based on in-browser instrumentation and found strong correlations between the approaches. PAIN correctly highlights worsening network conditions and provides visibility into web performance. We let PAIN run on a real ISP network, and found that it is able to pinpoint performance variations across time and groups of users.
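
    PAIN's model is learned from data and is not reproduced here; the sketch below only illustrates the kind of per-visit timeline a passive monitor can derive from flow-level and DNS logs, with an invented schema and a crude duration proxy.

        import pandas as pd

        # Hypothetical flow-level log (client, server name from DNS, flow start time)
        flows = pd.DataFrame({
            "client": ["c1"] * 5,
            "domain": ["news.example.com", "cdn.example.net", "img.example.net",
                       "ads.tracker.test", "api.example.com"],
            "t_start": [0.00, 0.12, 0.35, 0.40, 1.90],
        })

        WINDOW = 5.0  # seconds after the anchor flow considered part of the visit

        def visit_duration(df, anchor_domain="news.example.com"):
            """Time from the anchor flow to the last supporting flow of the visit."""
            t0 = df.loc[df["domain"] == anchor_domain, "t_start"].min()
            support = df[(df["t_start"] >= t0) & (df["t_start"] <= t0 + WINDOW)]
            return support["t_start"].max() - t0

        print(visit_duration(flows))  # a crude proxy for page-load completion time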

    Realistic testing of RTC applications under mobile networks

    The increasing usage of Real-Time Communication (RTC) applications for leisure and remote working calls for realistic and reproducible techniques to test them. They are used under very different network conditions: from high-speed broadband networks to noisy wireless links. As such, it is of paramount importance to assess the impact of the network on users' Quality of Experience (QoE), especially when it comes to application mechanisms such as video quality adjustment or transmission of redundant data. In this work, we lay the basis for a system in which a target RTC application is tested in an emulated mobile environment. To this end, we leverage ERRANT, a data-driven emulator which includes 32 distinct profiles modeling mobile network performance under different conditions. As a use case, we opt for Cisco Webex, a popular RTC application. We show how variable network conditions impact packet loss and, in turn, trigger video quality adjustments, impairing users' QoE.
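
    ERRANT's profiles and interface are not reproduced here; as a generic illustration of emulating a mobile link for such tests, one could shape an interface with tc/netem as sketched below (the values are illustrative, and the commands require root privileges).

        import subprocess

        # Hypothetical mobile profile: bandwidth, delay, jitter, and loss are
        # placeholder values, not an actual ERRANT profile.
        profile = {"rate": "10mbit", "delay": "60ms", "jitter": "20ms", "loss": "1%"}

        def apply_profile(iface, p):
            """Shape an interface with tc/netem to mimic a mobile link."""
            subprocess.run(["tc", "qdisc", "replace", "dev", iface, "root", "netem",
                            "rate", p["rate"],
                            "delay", p["delay"], p["jitter"],
                            "loss", p["loss"]], check=True)

        def clear_profile(iface):
            subprocess.run(["tc", "qdisc", "del", "dev", iface, "root"], check=True)

        # apply_profile("eth0", profile)   # run the RTC client, collect metrics...
        # clear_profile("eth0")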

    The Internet with Privacy Policies: Measuring The Web Upon Consent

    To protect user privacy, legislators have regulated the use of tracking technologies, mandating the acquisition of users' consent before collecting data. As a result, websites started showing more and more consent management modules, i.e., Consent Banners, that visitors have to interact with to access the website content. Since these banners change the content the browser loads, they challenge web measurement collection, primarily aimed at monitoring the extent of tracking technologies, but also at measuring web performance. If not correctly handled, Consent Banners prevent crawlers from observing the actual content of websites. In this paper, we present a comprehensive measurement campaign focusing on popular websites in Europe and the US, visiting both landing and internal pages from different countries around the world. We engineer \TOOL, a Web crawler able to accept the Consent Banners, as most users would do in practice. It lets us compare how webpages change before and after accepting such policies, when present. Our results show that measurements performed while ignoring Consent Banners offer a biased and partial view of the Web. After accepting the privacy policies, web tracking is far more pervasive, and webpages are larger and slower to load.
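
    The abstract does not detail \TOOL's implementation; a much simplified sketch of a consent-accepting crawler, using Playwright with a handful of guessed button labels, could look as follows (selectors and timings are assumptions).

        from playwright.sync_api import sync_playwright

        # Illustrative accept-button labels; real banners need far richer heuristics.
        ACCEPT_TEXTS = ["Accept", "Accept all", "I agree", "Accetta", "Accepter"]

        def measure(url):
            """Count requests issued before and after accepting a consent banner."""
            with sync_playwright() as p:
                browser = p.chromium.launch()
                page = browser.new_page()
                requests = []
                page.on("request", lambda r: requests.append(r.url))

                page.goto(url, wait_until="load")
                before = len(requests)

                # Try to click a consent button, as a real user would
                for text in ACCEPT_TEXTS:
                    button = page.query_selector(f"button:has-text('{text}')")
                    if button:
                        button.click()
                        page.wait_for_timeout(5000)  # let post-consent content load
                        break

                after = len(requests)
                browser.close()
                return before, after

        # print(measure("https://example.com"))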