6 research outputs found

    Inferring Complex AS Relationships

    Get PDF
    ABSTRACT The traditional approach of modeling relationships between ASes abstracts relationship types into three broad categories: transit, peering, and sibling. More complicated configurations exist, and understanding them may advance our knowledge of Internet economics and improve models of routing. We use BGP, traceroute, and geolocation data to extend CAIDA's AS relationship inference algorithm to infer two types of complex relationships: hybrid relationships, where two ASes have different relationships at different interconnection points, and partial transit relationships, which restrict the scope of a customer relationship to the provider's peers and customers. Using this new algorithm, we find 4.5% of the 90,272 provider-customer relationships observed in March 2014 were complex, including 1,071 hybrid relationships and 2,955 partial-transit relationships. Because most peering relationships are invisible, we believe these numbers are lower bounds. We used feedback from operators, and relationships encoded in BGP communities and RPSL, to validate 20% and 6.9% of our partial transit and hybrid inferences, respectively, and found our inferences have 92.9% and 97.0% positive predictive values. Hybrid relationships are not only established between large transit providers; in 57% of the inferred hybrid transit/peering relationships the customer had a customer cone of fewer than 5 ASes

    From the edge to the core : towards informed vantage point selection for internet measurement studies

    Get PDF
    Since the early days of the Internet, measurement scientists are trying to keep up with the fast-paced development of the Internet. As the Internet grew organically over time and without build-in measurability, this process requires many workarounds and due diligence. As a result, every measurement study is only as good as the data it relies on. Moreover, data quality is relative to the research question—a data set suitable to analyze one problem may be insufficient for another. This is entirely expected as the Internet is decentralized, i.e., there is no single observation point from which we can assess the complete state of the Internet. Because of that, every measurement study needs specifically selected vantage points, which fit the research question. In this thesis, we present three different vantage points across the Internet topology— from the edge to the Internet core. We discuss their specific features, suitability for different kinds of research questions, and how to work with the corresponding data. The data sets obtained at the presented vantage points allow us to conduct three different measurement studies and shed light on the following aspects: (a) The prevalence of IP source address spoofing at a large European Internet Exchange Point (IXP), (b) the propagation distance of BGP communities, an optional transitive BGP attribute used for traffic engineering, and (c) the impact of the global COVID-19 pandemic on Internet usage behavior at a large Internet Service Provider (ISP) and three IXPs.Seit den frühen Tagen des Internets versuchen Forscher im Bereich Internet Measu- rement, mit der rasanten Entwicklung des des Internets Schritt zu halten. Da das Internet im Laufe der Zeit organisch gewachsen ist und nicht mit Blick auf Messbar- keit entwickelt wurde, erfordert dieser Prozess eine Meg Workarounds und Sorgfalt. Jede Measurement Studie ist nur so gut wie die Daten, auf die sie sich stützt. Und Datenqualität ist relativ zur Forschungsfrage - ein Datensatz, der für die Analyse eines Problems geeiget ist, kann für ein anderes unzureichend sein. Dies ist durchaus zu erwarten, da das Internet dezentralisiert ist, d. h. es gibt keinen einzigen Be- obachtungspunkt, von dem aus wir den gesamten Zustand des Internets beurteilen können. Aus diesem Grund benötigt jede Measurement Studie gezielt ausgewählte Beobachtungspunkte, die zur Forschungsfrage passen. In dieser Arbeit stellen wir drei verschiedene Beobachtungspunkte vor, die sich über die gsamte Internet-Topologie erstrecken— vom Rand bis zum Kern des Internets. Wir diskutieren ihre spezifischen Eigenschaften, ihre Eignung für verschiedene Klas- sen von Forschungsfragen und den Umgang mit den entsprechenden Daten. Die an den vorgestellten Beobachtungspunkten gewonnenen Datensätze ermöglichen uns die Durchführung von drei verschiedenen Measurement Studien und damit die folgenden Aspekte zu beleuchten: (a) Die Prävalenz von IP Source Address Spoofing bei einem großen europäischen Internet Exchange Point (IXP), (b) die Ausbreitungsdistanz von BGP-Communities, ein optionales transitives BGP-Attribut, das Anwendung im Bereich Traffic-Enigneering findet sowie (c) die Auswirkungen der globalen COVID- 19-Pandemie auf das Internet-Nutzungsverhalten an einem großen Internet Service Provider (ISP) und drei IXPs

    Machine Learning and Big Data Methodologies for Network Traffic Monitoring

    Get PDF
    Over the past 20 years, the Internet saw an exponential grown of traffic, users, services and applications. Currently, it is estimated that the Internet is used everyday by more than 3.6 billions users, who generate 20 TB of traffic per second. Such a huge amount of data challenge network managers and analysts to understand how the network is performing, how users are accessing resources, how to properly control and manage the infrastructure, and how to detect possible threats. Along with mathematical, statistical, and set theory methodologies machine learning and big data approaches have emerged to build systems that aim at automatically extracting information from the raw data that the network monitoring infrastructures offer. In this thesis I will address different network monitoring solutions, evaluating several methodologies and scenarios. I will show how following a common workflow, it is possible to exploit mathematical, statistical, set theory, and machine learning methodologies to extract meaningful information from the raw data. Particular attention will be given to machine learning and big data methodologies such as DBSCAN, and the Apache Spark big data framework. The results show that despite being able to take advantage of mathematical, statistical, and set theory tools to characterize a problem, machine learning methodologies are very useful to discover hidden information about the raw data. Using DBSCAN clustering algorithm, I will show how to use YouLighter, an unsupervised methodology to group caches serving YouTube traffic into edge-nodes, and latter by using the notion of Pattern Dissimilarity, how to identify changes in their usage over time. By using YouLighter over 10-month long races, I will pinpoint sudden changes in the YouTube edge-nodes usage, changes that also impair the end users’ Quality of Experience. I will also apply DBSCAN in the deployment of SeLINA, a self-tuning tool implemented in the Apache Spark big data framework to autonomously extract knowledge from network traffic measurements. By using SeLINA, I will show how to automatically detect the changes of the YouTube CDN previously highlighted by YouLighter. Along with these machine learning studies, I will show how to use mathematical and set theory methodologies to investigate the browsing habits of Internauts. By using a two weeks dataset, I will show how over this period, the Internauts continue discovering new websites. Moreover, I will show that by using only DNS information to build a profile, it is hard to build a reliable profiler. Instead, by exploiting mathematical and statistical tools, I will show how to characterize Anycast-enabled CDNs (A-CDNs). I will show that A-CDNs are widely used either for stateless and stateful services. That A-CDNs are quite popular, as, more than 50% of web users contact an A-CDN every day. And that, stateful services, can benefit of A-CDNs, since their paths are very stable over time, as demonstrated by the presence of only a few anomalies in their Round Trip Time. Finally, I will conclude by showing how I used BGPStream an open-source software framework for the analysis of both historical and real-time Border Gateway Protocol (BGP) measurement data. By using BGPStream in real-time mode I will show how I detected a Multiple Origin AS (MOAS) event, and how I studies the black-holing community propagation, showing the effect of this community in the network. Then, by using BGPStream in historical mode, and the Apache Spark big data framework over 16 years of data, I will show different results such as the continuous growth of IPv4 prefixes, and the growth of MOAS events over time. All these studies have the aim of showing how monitoring is a fundamental task in different scenarios. In particular, highlighting the importance of machine learning and of big data methodologies

    Abstracting network policies

    Get PDF
    Almost every human activity in recent years relies either directly or indirectly on the smooth and efficient operation of the Internet. The Internet is an interconnection of multiple autonomous networks that work based on agreed upon policies between various institutions across the world. The network policies guiding an institution’s computer infrastructure both internally (such as firewall relationships) and externally (such as routing relationships) are developed by a diverse group of lawyers, accountants, network administrators, managers amongst others. Network policies developed by this group of individuals are usually done on a white-board in a graph-like format. It is however the responsibility of network administrators to translate and configure the various network policies that have been agreed upon. The configuration of these network policies are generally done on physical devices such as routers, domain name servers, firewalls and other middle boxes. The manual configuration process of such network policies is known to be tedious, time consuming and prone to human error which can lead to various network anomalies in the configuration commands. In recent years, many research projects and corporate organisations have to some level abstracted the network management process with emphasis on network devices (such as Cisco VIRL) or individual network policies (such as Propane). [Continues.]</div

    Improving the Accuracy of the Internet Cartography

    Get PDF
    As the global Internet expands to satisfy the demands of the ever-increasing connected population, profound changes are occurring in its interconnection structure. The pervasive growth of IXPs and CDNs, two initially independent but synergistic infrastructure sectors, have contributed to the gradual flattening of the Internet’s inter-domain hierarchy with primary routing paths shifting from backbone networks to peripheral peering links. At the same time the IPv6 deployment has taken off due to the depletion of unallocated IPv4 addresses. These fundamental changes in Internet dynamics has obvious implications for network engineering and operations, which can be benefited by accurate topology maps to understand the properties of this critical infrastructure. This thesis presents a set of new measurement techniques and inference algorithms to construct a new type of semantically rich Internet map, and improve the state of the art in Internet cartography. The author first develops a methodology to extract large-scale validation data from the Communities BGP attribute, which encodes rich routing meta-data on BGP messages. Based on this better-informed dataset the author proceeds to analyse popular assumptions about inter-domain routing policies and devise a more accurate model to describe inter-AS business relationships. Accordingly, the thesis proposes a new relationship inference algorithm to accurately capture both simple and complex AS relationships across two dimensions: prefix type, and geographic location. Validation against three sources of ground-truth data reveals that the proposed algorithm achieves a near-perfect accuracy. However, any inference approach is constrained by the inability of the existing topology data sources to provide a complete view of the inter-domain topology. To limit the topology incompleteness problem the author augments traditional BGP data with routing policy data obtained directly from IXPs to discover massive peering meshes which have thus far been largely invisible

    Violation of interdomain routing assumptions

    No full text
    International audienceWe challenge a set of assumptions that are frequently used to model interdomain routing in the Internet by confronting them with routing decisions that are actually taken by ASes, as revealed through publicly available BGP feeds. Our results quantify for the first time the extent to which such assumptions are too simple to model real-world Internet routing policies. This should introduce a note of caution into future work that makes these assumptions and should prompt attempts to find more accurate models
    corecore