
    Study of the Structural Properties of the Internet Based on Metagraph Models

    When studying the Internet, its structure is usually divided into levels: the Autonomous System (AS) level, the Point of Presence (PoP) level, the router level, and so on. At each level the global network can be represented as a graph built from data obtained from open sources. Considering the network within a single level simplifies analysis, but it does not allow a systematic assessment of its structural properties when the task is to provide connectivity between several network segments, in particular those belonging to critical information infrastructure. To overcome this contradiction, a mathematical model of the global network was developed in the form of a metagraph spanning the AS level and the PoP level; it takes into account the characteristics of each level and makes it possible to find bottlenecks both in the interdomain routing system and in the internal network topologies of Internet providers. Based on the proposed model, some structural phenomena of the global network are described: stub, multihomed, and transit autonomous systems, as well as content providers. Taking into account the data about the Internet structure available from open sources, a method for constructing the metagraph is proposed. A comparative analysis of tools that automate the analysis of the network model is carried out. Practice-oriented problems of finding a cutting subset in the metagraph are formulated. Directions for further research include a software implementation of the model using the publicly available Python module MGtoolkit and an assessment of the structural phenomena of the Russian segment of the Internet.
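
    To make the metagraph idea concrete, here is a minimal Python sketch of a two-level model in plain data structures (the ASes, PoPs, and links are purely hypothetical, and this does not use the MGtoolkit module mentioned above): PoPs are the elementary vertices, each AS is a metavertex grouping its PoPs, and edges connect PoPs of different providers; a naive check then looks for single links whose removal disconnects a pair of ASes.

        # Elementary vertices: points of presence (PoP level); names are hypothetical.
        pops = {"AS1-pop-a", "AS1-pop-b", "AS2-pop-a", "AS3-pop-a"}

        # Metavertices: each autonomous system (AS level) groups a set of its PoPs.
        autonomous_systems = {
            "AS1": {"AS1-pop-a", "AS1-pop-b"},
            "AS2": {"AS2-pop-a"},
            "AS3": {"AS3-pop-a"},
        }

        # Metagraph edges: inter-provider links between PoPs of different ASes.
        edges = {
            ("AS1-pop-a", "AS2-pop-a"),
            ("AS1-pop-b", "AS3-pop-a"),
        }

        def as_of(pop):
            """Return the AS (metavertex) that contains a given PoP."""
            return next(asn for asn, members in autonomous_systems.items() if pop in members)

        # A crude "cutting subset" check: flag links whose removal breaks the only
        # direct connection between a pair of ASes.
        for edge in edges:
            a, b = as_of(edge[0]), as_of(edge[1])
            others = edges - {edge}
            still_linked = any({as_of(u), as_of(v)} == {a, b} for u, v in others)
            if not still_linked:
                print(f"Link {edge} is a bottleneck between {a} and {b}")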

    Border Gateway Protocol Anomaly Detection Using Machine Learning Techniques

    As the primary protocol used to exchange routing information between network domains, the Border Gateway Protocol (BGP) plays a central role in the functioning of the Internet. BGP is a standardized routing protocol used to initiate and maintain communication between domains, or autonomous systems, on the Internet. The protocol can exhibit anomalous behavior caused by improper provisioning, malicious attacks, traffic or equipment failure, and network operator error. At large Internet service providers, many BGP issues are not immediately seen or explicitly monitored by network operations centers. This blind spot is due to the enormous number of BGP handshakes that occur throughout the network, along with the fact that many such sub-interfaces are associated with a single physical connection. We present machine learning methods for anomaly detection based on unsupervised learning techniques and create a data pipeline to quickly collect these data and trigger alerts when anomalies occur. Clustering techniques including k-means and DBSCAN were successfully implemented and were able to detect known anomalies in historical events. This approach could yield soft savings by triggering early warnings of anomalous BGP events, but human intervention may still be required to address possible false positives.
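
    As a rough illustration of the clustering approach described above, the following sketch applies scikit-learn's DBSCAN to per-minute BGP update features and treats noise points as anomalies; the feature choice, numbers, and parameters are illustrative assumptions, not the authors' pipeline.

        import numpy as np
        from sklearn.preprocessing import StandardScaler
        from sklearn.cluster import DBSCAN

        # Hypothetical per-minute features for one BGP session:
        # [announcements, withdrawals, unique prefixes touched]
        X = np.array([
            [120, 5, 80], [115, 4, 78], [130, 6, 85], [125, 5, 82],   # normal churn
            [4200, 900, 2600],                                         # leak-like burst
            [118, 5, 79], [122, 7, 81],
        ])

        # Scale features so one dimension does not dominate the distance metric.
        X_scaled = StandardScaler().fit_transform(X)

        # DBSCAN labels sparse points as noise (-1); we treat noise as anomalous.
        labels = DBSCAN(eps=1.0, min_samples=3).fit_predict(X_scaled)
        anomalies = np.where(labels == -1)[0]
        print("anomalous intervals:", anomalies)   # expected to flag the burst row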

    Internet routing paths stability model and relation to forwarding paths

    Analysis of real datasets to characterize the local stability properties of Internet routing paths suggests that extending the route selection criteria to account for this property would not increase the routing path length. Nevertheless, even if selecting a more stable routing path could be considered valuable from a routing perspective, it does not necessarily imply that the associated forwarding path would be more stable. Hence, if the dynamics of the Internet routing and forwarding systems show different properties, then one cannot straightforwardly derive the one from the other. If this assumption is verified, then the relationship between the stability of the forwarding path (followed by the traffic) and the corresponding routing path as selected by the path-vector routing algorithm requires further characterization. For this purpose, we locally relate, i.e., at the router level, the stability properties of the routing path with those of the corresponding forwarding path. The proposed stability model and measurement results verify this assumption and show that, although the main cause of instability lies in the forwarding plane, a second-order effect relates forwarding and routing path instability events. This observation provides a first indication that differential stability can safely be taken into account as part of the route selection process.
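
    The following toy Python sketch illustrates, under stated assumptions, the kind of local comparison described above: change events of the routing path and of the corresponding forwarding path for one destination are correlated within a small time window; the event timestamps and the window are invented for illustration and are not the authors' model.

        # Hypothetical change-event timestamps (seconds) for one destination prefix,
        # as seen locally at a single router.
        routing_changes = [100, 640, 1210]                       # BGP best-path changes
        forwarding_changes = [98, 300, 641, 900, 1209, 1500]     # observed path changes

        WINDOW = 5  # seconds within which two events are considered related

        def related(fwd_events, rt_events, window=WINDOW):
            """Count forwarding changes that co-occur with a routing change."""
            return sum(
                any(abs(f - r) <= window for r in rt_events) for f in fwd_events
            )

        coupled = related(forwarding_changes, routing_changes)
        total = len(forwarding_changes)
        print(f"{coupled}/{total} forwarding-path changes co-occur with a routing change")
        # A low ratio would indicate that most instability originates in the
        # forwarding plane rather than in route selection.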

    Machine Learning and Big Data Methodologies for Network Traffic Monitoring

    Over the past 20 years, the Internet has seen exponential growth in traffic, users, services, and applications. Currently, it is estimated that the Internet is used every day by more than 3.6 billion users, who generate 20 TB of traffic per second. Such a huge amount of data challenges network managers and analysts to understand how the network is performing, how users are accessing resources, how to properly control and manage the infrastructure, and how to detect possible threats. Along with mathematical, statistical, and set-theory methodologies, machine learning and big data approaches have emerged to build systems that aim at automatically extracting information from the raw data that network monitoring infrastructures offer. In this thesis I will address different network monitoring solutions, evaluating several methodologies and scenarios. I will show how, following a common workflow, it is possible to exploit mathematical, statistical, set-theory, and machine learning methodologies to extract meaningful information from the raw data. Particular attention will be given to machine learning and big data methodologies such as DBSCAN and the Apache Spark big data framework. The results show that, although mathematical, statistical, and set-theory tools can characterize a problem, machine learning methodologies are very useful for discovering hidden information in the raw data. Using the DBSCAN clustering algorithm, I will show how to use YouLighter, an unsupervised methodology that groups caches serving YouTube traffic into edge nodes, and later, by using the notion of Pattern Dissimilarity, how to identify changes in their usage over time. By using YouLighter over 10-month-long traces, I will pinpoint sudden changes in YouTube edge-node usage, changes that also impair the end users' Quality of Experience. I will also apply DBSCAN in the deployment of SeLINA, a self-tuning tool implemented in the Apache Spark big data framework to autonomously extract knowledge from network traffic measurements. By using SeLINA, I will show how to automatically detect the changes in the YouTube CDN previously highlighted by YouLighter. Along with these machine learning studies, I will show how to use mathematical and set-theory methodologies to investigate the browsing habits of Internet users. Using a two-week dataset, I will show how, over this period, users keep discovering new websites. Moreover, I will show that, using only DNS information, it is hard to build a reliable user profiler. By exploiting mathematical and statistical tools, I will also show how to characterize anycast-enabled CDNs (A-CDNs). A-CDNs are widely used for both stateless and stateful services and are quite popular: more than 50% of web users contact an A-CDN every day. Stateful services can benefit from A-CDNs, since their paths are very stable over time, as demonstrated by the presence of only a few anomalies in their round-trip times. Finally, I will conclude by showing how I used BGPStream, an open-source software framework for the analysis of both historical and real-time Border Gateway Protocol (BGP) measurement data. Using BGPStream in real-time mode, I will show how I detected a Multiple Origin AS (MOAS) event and how I studied the propagation of the blackholing community, showing the effect of this community on the network. Then, using BGPStream in historical mode together with the Apache Spark big data framework over 16 years of data, I will show results such as the continuous growth of the number of IPv4 prefixes and the growth of MOAS events over time. All these studies aim to show that monitoring is a fundamental task in many scenarios, highlighting in particular the importance of machine learning and big data methodologies.
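
    As an illustration of the MOAS detection mentioned above, here is a short sketch using the pybgpstream Python bindings of BGPStream; the time window and collector name are placeholders, and the constructor arguments follow my reading of the documented API, so they may need adjustment for a specific installation.

        from collections import defaultdict

        import pybgpstream

        # Read a short window of BGP updates from one route collector (placeholder values).
        stream = pybgpstream.BGPStream(
            from_time="2019-01-01 00:00:00",
            until_time="2019-01-01 01:00:00",
            collectors=["route-views2"],
            record_type="updates",
        )

        origins = defaultdict(set)  # prefix -> set of origin ASes seen announcing it

        for elem in stream:
            if elem.type != "A":  # keep announcements only, skip withdrawals
                continue
            as_path = elem.fields.get("as-path", "")
            if not as_path:
                continue
            prefix = elem.fields["prefix"]
            origin = as_path.split()[-1]  # last AS on the path is the origin
            origins[prefix].add(origin)

        # A prefix announced by more than one origin AS is a MOAS candidate.
        for prefix, asns in origins.items():
            if len(asns) > 1:
                print(f"MOAS candidate: {prefix} announced by {sorted(asns)}")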

    Prefix Hijacking in Internet Routing: Monitoring, Analysis, and Mitigation

    This thesis examines IT security aspects of Internet routing; it improves established approaches for detecting and classifying routing anomalies and for assessing their consequences, and in addition develops the concept of an effective countermeasure and demonstrates its evaluation. The focus is on the Border Gateway Protocol (BGP), which makes worldwide communication between computer systems over the Internet possible in the first place. Starting from computer networks in the military context and at research institutions in the 1980s, the Internet has developed into a worldwide network of computer networks that civil society can no longer do without. While modern applications connect household appliances with one another and the exchange of personal experiences sets the pace of modern society, the basic mechanisms of this interconnection have remained unchanged in recent years. As a network of computer networks, the Internet is a dynamic federation of so-called autonomous systems (AS), i.e., the computer networks of companies, research institutions, and governmental and non-governmental organizations. To exchange data packets between two end systems in different ASes, the necessary reachability information must be exchanged and regularly updated at the lower layers of the TCP/IP protocol stack. BGP is used to exchange this reachability information in the Internet. The reachability information exchanged via BGP consists of an IP address range (prefix) and the AS path that a packet must traverse through other ASes on the way to its destination. BGP itself provides no validation of the exchanged reachability information. Every AS can therefore, in principle, inject arbitrary information into Internet routing or manipulate existing information when forwarding it. Incorrect reachability information has various causes, such as faults in routing hardware, administrative misconfiguration, or targeted attacks. It results in routing anomalies of varying criticality, up to the unreachability of prefixes or the takeover of prefixes by attackers. This takeover of foreign prefixes by an attacker is called prefix hijacking, the hijacking of an IP address range. There is no global view of Internet routing, so globally detecting prefix hijacking is not readily possible. Rather, each AS has its own view of the Internet, determined by the reachability information exchanged with its neighbors. To obtain an overview, these local views must first be combined into a global view. Since prefix hijacking is easy to carry out with the deployed version of BGP, additional measures are necessary to realize the IT security protection goals in Internet routing. Preventive measures, such as retroactively securing the reachability information through protocol extensions or additional protocols, have not yet been deployed comprehensively and have therefore not been successful. For prefix owners, continuous monitoring of their own prefixes in the Internet remains the available measure for ensuring IT security. This thesis first analyzes the data available for implementing effective monitoring of Internet routing, taking into account the routing archives of different providers used in the literature. By adding further sources, such as Internet exchange points and so-called looking-glass services, the information contained in the routing archives is enriched and the global view is improved. This is followed by a revision of the established method for estimating the prefix-hijacking resilience of an AS and the derivation of an improved formula for impact assessment. Subsequently, an effective countermeasure is presented that, with the support of partner ASes, increases the reach of the legitimate reachability information and thus makes mitigation of prefix hijacking possible, at least in principle. With the presented approaches for broadening the data basis, improving the analysis of prefix-hijacking consequences, and mitigating prefix hijacking by the prefix owners themselves, improved measures for achieving the IT security protection goals can be implemented.
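
    As a minimal illustration of the continuous prefix monitoring advocated above (not the thesis's system), the following Python sketch compares origins observed in a routing feed against the expected origins for one's own prefixes and flags exact-prefix and more-specific announcements; all prefixes and AS numbers are hypothetical.

        import ipaddress

        # Prefixes we own and the origin ASes allowed to announce them (hypothetical).
        expected_origins = {
            ipaddress.ip_network("192.0.2.0/24"): {"64500"},
            ipaddress.ip_network("198.51.100.0/24"): {"64500", "64501"},
        }

        # Announcements observed at route collectors / looking glasses: (prefix, origin AS).
        observed = [
            ("192.0.2.0/24", "64500"),      # legitimate announcement
            ("192.0.2.0/25", "64666"),      # sub-prefix hijack candidate
            ("198.51.100.0/24", "64999"),   # exact-prefix hijack candidate
        ]

        for prefix_str, origin in observed:
            prefix = ipaddress.ip_network(prefix_str)
            for owned, origins in expected_origins.items():
                if prefix == owned and origin not in origins:
                    print(f"ALERT exact-prefix hijack: {prefix} announced by AS{origin}")
                elif prefix != owned and prefix.subnet_of(owned) and origin not in origins:
                    print(f"ALERT sub-prefix hijack: {prefix} (inside {owned}) by AS{origin}")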

    Measuring Effectiveness of Address Schemes for AS-level Graphs

    This dissertation presents measures of efficiency and locality for Internet addressing schemes. Historically speaking, many issues faced by the Internet have been solved just in time, to make the Internet "just work" [justWork]. Consensus, however, has been reached that today's Internet routing and addressing system is facing serious scaling problems: multi-homing, which causes finer granularity of routing policies and finer control to realize various traffic engineering requirements; an increased demand for provider-independent prefix allocations, which injects unaggregatable prefixes into the Default Free Zone (DFZ) routing table; and an ever-increasing Internet user population and number of mobile edge devices. As a result, the DFZ routing table is again growing at an exponential rate. Hierarchical, topology-based addressing has long been considered crucial to routing and forwarding scalability. Recently, however, a number of research efforts have considered alternatives to this traditional approach. With the goal of informing such research, we investigated the efficiency of address assignment in the existing (IPv4) Internet. In particular, we ask the question: "how can we measure the locality of an address scheme given an input AS-level graph?" To do so, we first define a notion of efficiency or locality based on the average number of bit-hops required to advertise all prefixes in the Internet. In order to quantify how far from "optimal" the current Internet is, we assign prefixes to ASes "from scratch" in a manner that preserves observed semantics, using three increasingly strict definitions of equivalence. Next, we propose another metric that in some sense quantifies the "efficiency" of the labeling and is independent of forwarding/routing mechanisms. We validate the effectiveness of the metric by applying it to a series of address schemes with increasing randomness given an input AS-level graph. After that, we apply the metric to the current Internet address scheme across years and compare the results with those of compact routing schemes.
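
    The following toy sketch shows one plausible reading of such a bit-hop measure, under the assumption that advertising a prefix to an AS costs the prefix's encoded length in bits times the AS-hop distance from the origin, summed over all receivers and averaged over prefixes; the tiny graph and prefix table are invented, and the dissertation's formal definition may differ.

        from collections import deque

        # Hypothetical undirected AS-level graph: AS -> set of neighbor ASes.
        as_graph = {
            "AS1": {"AS2"},
            "AS2": {"AS1", "AS3", "AS4"},
            "AS3": {"AS2"},
            "AS4": {"AS2"},
        }

        # Hypothetical prefix table: (origin AS, prefix length in bits).
        prefixes = [("AS1", 24), ("AS3", 22), ("AS4", 24)]

        def hop_counts(graph, source):
            """BFS distances (in AS hops) from the origin to every other AS."""
            dist = {source: 0}
            queue = deque([source])
            while queue:
                node = queue.popleft()
                for neigh in graph[node]:
                    if neigh not in dist:
                        dist[neigh] = dist[node] + 1
                        queue.append(neigh)
            return dist

        total_bit_hops = 0
        for origin, bits in prefixes:
            dists = hop_counts(as_graph, origin)
            # Advertising the prefix to every other AS costs bits * hops per receiver.
            total_bit_hops += sum(bits * d for asn, d in dists.items() if asn != origin)

        print("average bit-hops per prefix:", total_bit_hops / len(prefixes))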

    Internet Interconnection Ecosystem in Finland

    For both fixed and mobile network operators, interconnection is indisputably a key element in providing end users with a variety of services. Internet interconnection is a particularly intriguing subject because of the importance of the Internet in our everyday lives and our genuine curiosity to grasp its underlying structure. This thesis aims to provide a holistic approach to studying Internet interconnections from a nation-centric standpoint. To accomplish this objective, a method that breaks down the key features of interconnection analysis is first introduced. The nation-centric analysis is conducted for Finland by jointly utilizing Internet registry data and collected Internet routing data. Covering the last decade of the Finnish Internet, the longitudinal analysis yields significant findings on Internet address usage statistics and the level of multi-homing, along with the classification and inference of relationships between stakeholders in the interconnection ecosystem. The implications that emerging interconnection models pose for future global service delivery across both fixed and mobile networks are examined from the perspective of existing domestic interconnection practices. The longitudinal interconnectivity study allows us to understand both the technical and the business interfaces between market players by revealing a complete list of customer-provider relationships. Within a national milieu, current Internet market dynamics and the future implications of emerging models can then be assessed in a better-anticipated, more rational manner. Hence, authorities who wish to design new pricing schemes and policies for future network interconnection can be guided more thoroughly.
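
    As a small illustration of how the level of multi-homing mentioned above could be estimated from inferred customer-provider relationships, the sketch below counts the distinct providers per customer AS; the relationship tuples and AS numbers are hypothetical and not drawn from the thesis's Finnish dataset.

        from collections import defaultdict

        # Hypothetical (provider AS, customer AS) relationships for one national market.
        customer_provider = [
            ("64601", "64500"),
            ("64602", "64500"),  # AS64500 has two providers -> multi-homed
            ("64601", "64501"),
            ("64601", "64502"),
            ("64602", "64502"),
            ("64603", "64502"),  # AS64502 has three providers
        ]

        providers_of = defaultdict(set)
        for provider, customer in customer_provider:
            providers_of[customer].add(provider)

        multi_homed = [asn for asn, provs in providers_of.items() if len(provs) > 1]
        share = len(multi_homed) / len(providers_of)
        print(f"{len(multi_homed)}/{len(providers_of)} customer ASes are multi-homed ({share:.0%})")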

    On the Analysis of the Internet from a Geographic and Economic Perspective via BGP Raw Data

    The Internet is nowadays an integral part of everyone's life, and will become even more important for future generations. Proof of this is the exponential growth in the number of people who are introduced to the network through mobile phones and smartphones and are connected 24/7. Most of them rely on the Internet even for everyday services, such as online personal bank accounts, or having a videoconference with a colleague living across the ocean. However, only a few people are aware of what happens to their data once they are sent from their own devices towards the Internet, and an even smaller number, an elite of researchers, has an overview of the infrastructure of the real Internet. In recent years researchers have attempted to discover details about the characteristics of the Internet in order to create a model on which it would be possible to identify and address possible weaknesses of the real network. Despite several efforts in this direction, no model is currently known to represent the Internet effectively, especially due to the lack of data and the excessively coarse granularity applied by the studies done to date. This thesis addresses both issues, considering the Internet as a graph whose nodes are autonomous systems (ASes) and whose edges are logical connections between ASes. First, this thesis provides new algorithms and heuristics for studying the Internet at a level of granularity considerably closer to reality, by introducing economic and geographic elements that actually limit the number of possible paths between ASes that data can take. Based on these heuristics, this thesis also provides an innovative methodology to quantify the completeness of the available data and to identify which ASes should be involved in the BGP data collection process as feeders in order to obtain a complete and realistic view of the core of the Internet. Although the results of this methodology highlight that current BGP route collectors are not able to obtain data on the vast majority of the ASes in the core of the Internet, the situation can still be improved by creating new services and incentives to attract the ASes identified by this methodology and introduce them as feeders of a BGP route collector.
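
    As a toy illustration of the completeness question raised above, the following sketch measures what fraction of a reference set of core AS-level links is visible in the AS paths exported by a given set of feeder ASes; the paths, feeders, and reference links are hypothetical, and the thesis's actual methodology additionally uses economic and geographic constraints not modeled here.

        # Hypothetical AS paths exported by feeder ASes to a route collector.
        paths_by_feeder = {
            "AS64496": [["64496", "64601", "64602", "64500"], ["64496", "64601", "64603"]],
            "AS64497": [["64497", "64602", "64603"]],
        }

        # Hypothetical reference set of core AS-level links we would like to observe.
        core_links = {
            frozenset({"64601", "64602"}),
            frozenset({"64602", "64603"}),
            frozenset({"64601", "64603"}),
            frozenset({"64601", "64604"}),  # never appears in any exported path
        }

        observed_links = set()
        for paths in paths_by_feeder.values():
            for path in paths:
                # Every pair of adjacent ASes on a path is an observed AS-level link.
                observed_links.update(frozenset(pair) for pair in zip(path, path[1:]))

        coverage = len(core_links & observed_links) / len(core_links)
        print(f"core-link coverage from current feeders: {coverage:.0%}")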