163 research outputs found

    悪性Webサイト探索のための効率的な巡回順序の決定法

    Get PDF

    Web content usage behavior: a case study of a university in Sub-Saharan Africa

    Get PDF
    This study characterizes the behavior of students in one of the Sub-Saharan Africa universities with respect to usage of valuable, but expensive Internet resources in a developing economy. Traffic allocation to web domains was compared with known statistical distributions such as Zipf and Stretched exponential distributions. Observed results show that traffic allocation follows stretched exponential distribution and that students use only small fraction of traffic for actual education activities and the rest of the traffic is associated with video services and social networks. The cumulative results from male and female students’ hostels show gender differences in browsing habits. More importantly, actual usage is at variance with the surveys conducted by OFCOM in UK and those conducted among students at the university, thus showing that usage patterns significantly differ from surveys result and between the users in advanced and developing economies

    Network overload avoidance by traffic engineering and content caching

    Get PDF
    The Internet traffic volume continues to grow at a great rate, now driven by video and TV distribution. For network operators it is important to avoid congestion in the network, and to meet service level agreements with their customers. This thesis presents work on two methods operators can use to reduce links loads in their networks: traffic engineering and content caching. This thesis studies access patterns for TV and video and the potential for caching. The investigation is done both using simulation and by analysis of logs from a large TV-on-Demand system over four months. The results show that there is a small set of programs that account for a large fraction of the requests and that a comparatively small local cache can be used to significantly reduce the peak link loads during prime time. The investigation also demonstrates how the popularity of programs changes over time and shows that the access pattern in a TV-on-Demand system very much depends on the content type. For traffic engineering the objective is to avoid congestion in the network and to make better use of available resources by adapting the routing to the current traffic situation. The main challenge for traffic engineering in IP networks is to cope with the dynamics of Internet traffic demands. This thesis proposes L-balanced routings that route the traffic on the shortest paths possible but make sure that no link is utilised to more than a given level L. L-balanced routing gives efficient routing of traffic and controlled spare capacity to handle unpredictable changes in traffic. We present an L-balanced routing algorithm and a heuristic search method for finding L-balanced weight settings for the legacy routing protocols OSPF and IS-IS. We show that the search and the resulting weight settings work well in real network scenarios

    Per-host DDoS mitigation by direct-control reinforcement learning

    Get PDF
    DDoS attacks plague the availability of online services today, yet like many cybersecurity problems are evolving and non-stationary. Normal and attack patterns shift as new protocols and applications are introduced, further compounded by burstiness and seasonal variation. Accordingly, it is difficult to apply machine learning-based techniques and defences in practice. Reinforcement learning (RL) may overcome this detection problem for DDoS attacks by managing and monitoring consequences; an agent’s role is to learn to optimise performance criteria (which are always available) in an online manner. We advance the state-of-the-art in RL-based DDoS mitigation by introducing two agent classes designed to act on a per-flow basis, in a protocol-agnostic manner for any network topology. This is supported by an in-depth investigation of feature suitability and empirical evaluation. Our results show the existence of flow features with high predictive power for different traffic classes, when used as a basis for feedback-loop-like control. We show that the new RL agent models can offer a significant increase in goodput of legitimate TCP traffic for many choices of host density

    On Internet Traffic Classification: A Two-Phased Machine Learning Approach

    Get PDF
    Traffic classification utilizing flow measurement enables operators to perform essential network management. Flow accounting methods such as NetFlow are, however, considered inadequate for classification requiring additional packet-level information, host behaviour analysis, and specialized hardware limiting their practical adoption. This paper aims to overcome these challenges by proposing two-phased machine learning classification mechanism with NetFlow as input. The individual flow classes are derived per application through k-means and are further used to train a C5.0 decision tree classifier. As part of validation, the initial unsupervised phase used flow records of fifteen popular Internet applications that were collected and independently subjected to k-means clustering to determine unique flow classes generated per application. The derived flow classes were afterwards used to train and test a supervised C5.0 based decision tree. The resulting classifier reported an average accuracy of 92.37% on approximately 3.4 million test cases increasing to 96.67% with adaptive boosting. The classifier specificity factor which accounted for differentiating content specific from supplementary flows ranged between 98.37% and 99.57%. Furthermore, the computational performance and accuracy of the proposed methodology in comparison with similar machine learning techniques lead us to recommend its extension to other applications in achieving highly granular real-time traffic classification

    Understanding the performance of Internet video over residential networks

    Get PDF
    Video streaming applications are now commonplace among home Internet users, who typically access the Internet using DSL or Cable technologies. However, the effect of these technologies on video performance, in terms of degradations in video quality, is not well understood. To enable continued deployment of applications with improved quality of experience for home users, it is essential to understand the nature of network impairments and develop means to overcome them. In this dissertation, I demonstrate the type of network conditions experienced by Internet video traffic, by presenting a new dataset of the packet level performance of real-time streaming to residential Internet users. Then, I use these packet level traces to evaluate the performance of commonly used models for packet loss simulation, and finding the models to be insufficient, present a new type of model that more accurately captures the loss behaviour. Finally, to demonstrate how a better understanding of the network can improve video quality in a real application scenario, I evaluate the performance of forward error correction schemes for Internet video using the measurements. I show that performance can be poor, devise a new metric to predict performance of error recovery from the characteristics of the input, and validate that the new packet loss model allows more realistic simulations. For the effective deployment of Internet video systems to users of residential access networks, a firm understanding of these networks is required. This dissertation provides insights into the packet level characteristics that can be expected from such networks, and techniques to realistically simulate their behaviour, promoting development of future video applications

    Improving the accuracy of spoofed traffic inference in inter-domain traffic

    Get PDF
    Ascertaining that a network will forward spoofed traffic usually requires an active probing vantage point in that network, effectively preventing a comprehensive view of this global Internet vulnerability. We argue that broader visibility into the spoofing problem may lie in the capability to infer lack of Source Address Validation (SAV) compliance from large, heavily aggregated Internet traffic data, such as traffic observable at Internet Exchange Points (IXPs). The key idea is to use IXPs as observatories to detect spoofed packets, by leveraging Autonomous System (AS) topology knowledge extracted from Border Gateway Protocol (BGP) data to infer which source addresses should legitimately appear across parts of the IXP switch fabric. In this thesis, we demonstrate that the existing literature does not capture several fundamental challenges to this approach, including noise in BGP data sources, heuristic AS relationship inference, and idiosyncrasies in IXP interconnec- tivity fabrics. We propose Spoofer-IX, a novel methodology to navigate these challenges, leveraging Customer Cone semantics of AS relationships to guide precise classification of inter-domain traffic as In-cone, Out-of-cone ( spoofed ), Unverifiable, Bogon, and Unas- signed. We apply our methodology on extensive data analysis using real traffic data from two distinct IXPs in Brazil, a mid-size and a large-size infrastructure. In the mid-size IXP with more than 200 members, we find an upper bound volume of Out-of-cone traffic to be more than an order of magnitude less than the previous method inferred on the same data, revealing the practical importance of Customer Cone semantics in such analysis. We also found no significant improvement in deployment of SAV in networks using the mid-size IXP between 2017 and 2019. In hopes that our methods and tools generalize to use by other IXPs who want to avoid use of their infrastructure for launching spoofed-source DoS attacks, we explore the feasibility of scaling the system to larger and more diverse IXP infrastructures. To promote this goal, and broad replicability of our results, we make the source code of Spoofer-IX publicly available. This thesis illustrates the subtleties of scientific assessments of operational Internet infrastructure, and the need for a community focus on reproducing and repeating previous methods.A constatação de que uma rede encaminhará tráfego falsificado geralmente requer um ponto de vantagem ativo de medição nessa rede, impedindo efetivamente uma visão abrangente dessa vulnerabilidade global da Internet. Isto posto, argumentamos que uma visibilidade mais ampla do problema de spoofing pode estar na capacidade de inferir a falta de conformidade com as práticas de Source Address Validation (SAV) a partir de dados de tráfego da Internet altamente agregados, como o tráfego observável nos Internet Exchange Points (IXPs). A ideia chave é usar IXPs como observatórios para detectar pacotes falsificados, aproveitando o conhecimento da topologia de sistemas autônomos extraído dos dados do protocolo BGP para inferir quais endereços de origem devem aparecer legitimamente nas comunicações através da infra-estrutura de um IXP. Nesta tese, demonstramos que a literatura existente não captura diversos desafios fundamentais para essa abordagem, incluindo ruído em fontes de dados BGP, inferência heurística de relacionamento de sistemas autônomos e características específicas de interconectividade nas infraestruturas de IXPs. Propomos o Spoofer-IX, uma nova metodologia para superar esses desafios, utilizando a semântica do Customer Cone de relacionamento de sistemas autônomos para guiar com precisão a classificação de tráfego inter-domínio como In-cone, Out-of-cone ( spoofed ), Unverifiable, Bogon, e Unassigned. Aplicamos nossa metodologia em análises extensivas sobre dados reais de tráfego de dois IXPs distintos no Brasil, uma infraestrutura de médio porte e outra de grande porte. No IXP de tamanho médio, com mais de 200 membros, encontramos um limite superior do volume de tráfego Out-of-cone uma ordem de magnitude menor que o método anterior inferiu sob os mesmos dados, revelando a importância prática da semântica do Customer Cone em tal análise. Além disso, não encontramos melhorias significativas na implantação do Source Address Validation (SAV) em redes usando o IXP de tamanho médio entre 2017 e 2019. Na esperança de que nossos métodos e ferramentas sejam aplicáveis para uso por outros IXPs que desejam evitar o uso de sua infraestrutura para iniciar ataques de negação de serviço através de pacotes de origem falsificada, exploramos a viabilidade de escalar o sistema para infraestruturas IXP maiores e mais diversas. Para promover esse objetivo e a ampla replicabilidade de nossos resultados, disponibilizamos publicamente o código fonte do Spoofer-IX. Esta tese ilustra as sutilezas das avaliações científicas da infraestrutura operacional da Internet e a necessidade de um foco da comunidade na reprodução e repetição de métodos anteriores

    Improving Anycast with Measurements

    Get PDF
    Since the first Distributed Denial-of-Service (DDoS) attacks were launched, the strength of such attacks has been steadily increasing, from a few megabits per second to well into the terabit/s range. The damage that these attacks cause, mostly in terms of financial cost, has prompted researchers and operators alike to investigate and implement mitigation strategies. Examples of such strategies include local filtering appliances, Border Gateway Protocol (BGP)-based blackholing and outsourced mitigation in the form of cloud-based DDoS protection providers. Some of these strategies are more suited towards high bandwidth DDoS attacks than others. For example, using a local filtering appliance means that all the attack traffic will still pass through the owner's network. This inherently limits the maximum capacity of such a device to the bandwidth that is available. BGP Blackholing does not have such limitations, but can, as a side-effect, cause service disruptions to end-users. A different strategy, that has not attracted much attention in academia, is based on anycast. Anycast is a technique that allows operators to replicate their service across different physical locations, while keeping that service addressable with just a single IP-address. It relies on the BGP to effectively load balance users. In practice, it is combined with other mitigation strategies to allow those to scale up. Operators can use anycast to scale their mitigation capacity horizontally. Because anycast relies on BGP, and therefore in essence on the Internet itself, it can be difficult for network engineers to fine tune this balancing behavior. In this thesis, we show that that is indeed the case through two different case studies. In the first, we focus on an anycast service during normal operations, namely the Google Public DNS, and show that the routing within this service is far from optimal, for example in terms of distance between the client and the server. In the second case study, we observe the root DNS, while it is under attack, and show that even though in aggregate the bandwidth available to this service exceeds the attack we observed, clients still experienced service degradation. This degradation was caused due to the fact that some sites of the anycast service received a much higher share of traffic than others. In order for operators to improve their anycast networks, and optimize it in terms of resilience against DDoS attacks, a method to assess the actual state of such a network is required. Existing methodologies typically rely on external vantage points, such as those provided by RIPE Atlas, and are therefore limited in scale, and inherently biased in terms of distribution. We propose a new measurement methodology, named Verfploeter, to assess the characteristics of anycast networks in terms of client to Point-of-Presence (PoP) mapping, i.e. the anycast catchment. This method does not rely on external vantage points, is free of bias and offers a much higher resolution than any previous method. We validated this methodology by deploying it on a testbed that was locally developed, as well as on the B root DNS. We showed that the increased \textit{resolution} of this methodology improved our ability to assess the impact of changes in the network configuration, when compared to previous methodologies. As final validation we implement Verfploeter on Cloudflare's global-scale anycast Content Delivery Network (CDN), which has almost 200 global Points-of-Presence and an aggregate bandwidth of 30 Tbit/s. Through three real-world use cases, we demonstrate the benefits of our methodology: Firstly, we show that changes that occur when withdrawing routes from certain PoPs can be accurately mapped, and that in certain cases the effect of taking down a combination of PoPs can be calculated from individual measurements. Secondly, we show that Verfploeter largely reinstates the ping to its former glory, showing how it can be used to troubleshoot network connectivity issues in an anycast context. Thirdly, we demonstrate how accurate anycast catchment maps offer operators a new and highly accurate tool to identify and filter spoofed traffic. Where possible, we make datasets collected over the course of the research in this thesis available as open access data. The two best (open) dataset awards that were awarded for these datasets confirm that they are a valued contribution. In summary, we have investigated two large anycast services and have shown that their deployments are not optimal. We developed a novel measurement methodology, that is free of bias and is able to obtain highly accurate anycast catchment mappings. By implementing this methodology and deploying it on a global-scale anycast network we show that our method adds significant value to the fast-growing anycast CDN industry and enables new ways of detecting, filtering and mitigating DDoS attacks
    corecore