80 research outputs found

    From internet architecture research to standards

    Get PDF
    Many Internet architectural research initiatives have been undertaken over last twenty years. None of them actually reached their intended goal: the evolution of the Internet architecture is still driven by its protocols not by genuine architectural evolutions. As this approach becomes the main limiting factor of Internet growth and application deployment, this paper proposes an alternative research path starting from the root causes (the progressive depletion of the design principles of the Internet) and motivates the need for a common architectural foundation. For this purpose, it proposes a practical methodology to incubate architectural research results as part of the standardization process

    On the dynamics of interdomain routing in the Internet

    Full text link
    The routes used in the Internet's interdomain routing system are a rich information source that could be exploited to answer a wide range of questions.  However, analyzing routes is difficult, because the fundamental object of study is a set of paths. In this dissertation, we present new analysis tools -- metrics and methods -- for analyzing paths, and apply them to study interdomain routing in the Internet over long periods of time. Our contributions are threefold. First, we build on an existing metric (Routing State Distance) to define a new metric that allows us to measure the similarity between two prefixes with respect to the state of the global routing system. Applying this metric over time yields a measure of how the set of paths to each prefix varies at a given timescale. Second, we present PathMiner, a system to extract large scale routing events from background noise and identify the AS (Autonomous System) or AS-link most likely responsible for the event. PathMiner is distinguished from previous work in its ability to identify and analyze large-scale events that may re-occur many times over long timescales. We show that it is scalable, being able to extract significant events from multiple years of routing data at a daily granularity. Finally, we equip Routing State Distance with a new set of tools for identifying and characterizing unusually-routed ASes. At the micro level, we use our tools to identify clusters of ASes that have the most unusual routing at each time. We also show that analysis of individual ASes can expose business and engineering strategies of the organizations owning the ASes.  These strategies are often related to content delivery or service replication. At the macro level, we show that the set of ASes with the most unusual routing defines discernible and interpretable phases of the Internet's evolution. Furthermore, we show that our tools can be used to provide a quantitative measure of the "flattening" of the Internet

    Online Aggregation of the Forwarding Information Base: Accounting for Locality and Churn

    Get PDF

    Improving the accuracy of spoofed traffic inference in inter-domain traffic

    Get PDF
    Ascertaining that a network will forward spoofed traffic usually requires an active probing vantage point in that network, effectively preventing a comprehensive view of this global Internet vulnerability. We argue that broader visibility into the spoofing problem may lie in the capability to infer lack of Source Address Validation (SAV) compliance from large, heavily aggregated Internet traffic data, such as traffic observable at Internet Exchange Points (IXPs). The key idea is to use IXPs as observatories to detect spoofed packets, by leveraging Autonomous System (AS) topology knowledge extracted from Border Gateway Protocol (BGP) data to infer which source addresses should legitimately appear across parts of the IXP switch fabric. In this thesis, we demonstrate that the existing literature does not capture several fundamental challenges to this approach, including noise in BGP data sources, heuristic AS relationship inference, and idiosyncrasies in IXP interconnec- tivity fabrics. We propose Spoofer-IX, a novel methodology to navigate these challenges, leveraging Customer Cone semantics of AS relationships to guide precise classification of inter-domain traffic as In-cone, Out-of-cone ( spoofed ), Unverifiable, Bogon, and Unas- signed. We apply our methodology on extensive data analysis using real traffic data from two distinct IXPs in Brazil, a mid-size and a large-size infrastructure. In the mid-size IXP with more than 200 members, we find an upper bound volume of Out-of-cone traffic to be more than an order of magnitude less than the previous method inferred on the same data, revealing the practical importance of Customer Cone semantics in such analysis. We also found no significant improvement in deployment of SAV in networks using the mid-size IXP between 2017 and 2019. In hopes that our methods and tools generalize to use by other IXPs who want to avoid use of their infrastructure for launching spoofed-source DoS attacks, we explore the feasibility of scaling the system to larger and more diverse IXP infrastructures. To promote this goal, and broad replicability of our results, we make the source code of Spoofer-IX publicly available. This thesis illustrates the subtleties of scientific assessments of operational Internet infrastructure, and the need for a community focus on reproducing and repeating previous methods.A constatação de que uma rede encaminhará tráfego falsificado geralmente requer um ponto de vantagem ativo de medição nessa rede, impedindo efetivamente uma visão abrangente dessa vulnerabilidade global da Internet. Isto posto, argumentamos que uma visibilidade mais ampla do problema de spoofing pode estar na capacidade de inferir a falta de conformidade com as práticas de Source Address Validation (SAV) a partir de dados de tráfego da Internet altamente agregados, como o tráfego observável nos Internet Exchange Points (IXPs). A ideia chave é usar IXPs como observatórios para detectar pacotes falsificados, aproveitando o conhecimento da topologia de sistemas autônomos extraído dos dados do protocolo BGP para inferir quais endereços de origem devem aparecer legitimamente nas comunicações através da infra-estrutura de um IXP. Nesta tese, demonstramos que a literatura existente não captura diversos desafios fundamentais para essa abordagem, incluindo ruído em fontes de dados BGP, inferência heurística de relacionamento de sistemas autônomos e características específicas de interconectividade nas infraestruturas de IXPs. Propomos o Spoofer-IX, uma nova metodologia para superar esses desafios, utilizando a semântica do Customer Cone de relacionamento de sistemas autônomos para guiar com precisão a classificação de tráfego inter-domínio como In-cone, Out-of-cone ( spoofed ), Unverifiable, Bogon, e Unassigned. Aplicamos nossa metodologia em análises extensivas sobre dados reais de tráfego de dois IXPs distintos no Brasil, uma infraestrutura de médio porte e outra de grande porte. No IXP de tamanho médio, com mais de 200 membros, encontramos um limite superior do volume de tráfego Out-of-cone uma ordem de magnitude menor que o método anterior inferiu sob os mesmos dados, revelando a importância prática da semântica do Customer Cone em tal análise. Além disso, não encontramos melhorias significativas na implantação do Source Address Validation (SAV) em redes usando o IXP de tamanho médio entre 2017 e 2019. Na esperança de que nossos métodos e ferramentas sejam aplicáveis para uso por outros IXPs que desejam evitar o uso de sua infraestrutura para iniciar ataques de negação de serviço através de pacotes de origem falsificada, exploramos a viabilidade de escalar o sistema para infraestruturas IXP maiores e mais diversas. Para promover esse objetivo e a ampla replicabilidade de nossos resultados, disponibilizamos publicamente o código fonte do Spoofer-IX. Esta tese ilustra as sutilezas das avaliações científicas da infraestrutura operacional da Internet e a necessidade de um foco da comunidade na reprodução e repetição de métodos anteriores

    Inferring hidden features in the Internet (PhD thesis)

    Full text link
    The Internet is a large-scale decentralized system that is composed of thousands of independent networks. In this system, there are two main components, interdomain routing and traffic, that are vital inputs for many tasks such as traffic engineering, security, and business intelligence. However, due to the decentralized structure of the Internet, global knowledge of both interdomain routing and traffic is hard to come by. In this dissertation, we address a set of statistical inference problems with the goal of extending the knowledge of the interdomain-level Internet. In the first part of this dissertation we investigate the relationship between the interdomain topology and an individual network’s inference ability. We first frame the questions through abstract analysis of idealized topologies, and then use actual routing measurements and topologies to study the ability of real networks to infer traffic flows. In the second part, we study the ability of networks to identify which paths flow through their network. We first discuss that answering this question is surprisingly hard due to the design of interdomain routing systems where each network can learn only a limited set of routes. Therefore, network operators have to rely on observed traffic. However, observed traffic can only identify that a particular route passes through its network but not that a route does not pass through its network. In order to solve the routing inference problem, we propose a nonparametric inference technique that works quite accurately. The key idea behind our technique is measuring the distances between destinations. In order to accomplish that, we define a metric called Routing State Distance (RSD) to measure distances in terms of routing similarity. Finally, in the third part, we study our new metric, RSD in detail. Using RSD we address an important and difficult problem of characterizing the set of paths between networks. The collection of the paths across networks is a great source to understand important phenomena in the Internet as path selections are driven by the economic and performance considerations of the networks. We show that RSD has a number of appealing properties that can discover these hidden phenomena

    Machine Learning and Big Data Methodologies for Network Traffic Monitoring

    Get PDF
    Over the past 20 years, the Internet saw an exponential grown of traffic, users, services and applications. Currently, it is estimated that the Internet is used everyday by more than 3.6 billions users, who generate 20 TB of traffic per second. Such a huge amount of data challenge network managers and analysts to understand how the network is performing, how users are accessing resources, how to properly control and manage the infrastructure, and how to detect possible threats. Along with mathematical, statistical, and set theory methodologies machine learning and big data approaches have emerged to build systems that aim at automatically extracting information from the raw data that the network monitoring infrastructures offer. In this thesis I will address different network monitoring solutions, evaluating several methodologies and scenarios. I will show how following a common workflow, it is possible to exploit mathematical, statistical, set theory, and machine learning methodologies to extract meaningful information from the raw data. Particular attention will be given to machine learning and big data methodologies such as DBSCAN, and the Apache Spark big data framework. The results show that despite being able to take advantage of mathematical, statistical, and set theory tools to characterize a problem, machine learning methodologies are very useful to discover hidden information about the raw data. Using DBSCAN clustering algorithm, I will show how to use YouLighter, an unsupervised methodology to group caches serving YouTube traffic into edge-nodes, and latter by using the notion of Pattern Dissimilarity, how to identify changes in their usage over time. By using YouLighter over 10-month long races, I will pinpoint sudden changes in the YouTube edge-nodes usage, changes that also impair the end users’ Quality of Experience. I will also apply DBSCAN in the deployment of SeLINA, a self-tuning tool implemented in the Apache Spark big data framework to autonomously extract knowledge from network traffic measurements. By using SeLINA, I will show how to automatically detect the changes of the YouTube CDN previously highlighted by YouLighter. Along with these machine learning studies, I will show how to use mathematical and set theory methodologies to investigate the browsing habits of Internauts. By using a two weeks dataset, I will show how over this period, the Internauts continue discovering new websites. Moreover, I will show that by using only DNS information to build a profile, it is hard to build a reliable profiler. Instead, by exploiting mathematical and statistical tools, I will show how to characterize Anycast-enabled CDNs (A-CDNs). I will show that A-CDNs are widely used either for stateless and stateful services. That A-CDNs are quite popular, as, more than 50% of web users contact an A-CDN every day. And that, stateful services, can benefit of A-CDNs, since their paths are very stable over time, as demonstrated by the presence of only a few anomalies in their Round Trip Time. Finally, I will conclude by showing how I used BGPStream an open-source software framework for the analysis of both historical and real-time Border Gateway Protocol (BGP) measurement data. By using BGPStream in real-time mode I will show how I detected a Multiple Origin AS (MOAS) event, and how I studies the black-holing community propagation, showing the effect of this community in the network. Then, by using BGPStream in historical mode, and the Apache Spark big data framework over 16 years of data, I will show different results such as the continuous growth of IPv4 prefixes, and the growth of MOAS events over time. All these studies have the aim of showing how monitoring is a fundamental task in different scenarios. In particular, highlighting the importance of machine learning and of big data methodologies

    Foutbestendige toekomstige internetarchitecturen

    Get PDF

    Macro- and microscopic analysis of the internet economy from network measurements

    Get PDF
    Tesi per compendi de publicacions.The growth of the Internet impacts multiple areas of the world economy, and it has become a permanent part of the economic landscape both at the macro- and at microeconomic level. On-line traffic and information are currently assets with large business value. Even though commercial Internet has been a part of our lives for more than two decades, its impact on global, and everyday, economy still holds many unknowns. In this work we analyse important macro- and microeconomic aspects of the Internet. First we investigate the characteristics of the interdomain traffic, which is an important part of the macroscopic economy of the Internet. Finally, we investigate the microeconomic phenomena of price discrimination in the Internet. At the macroscopic level, we describe quantitatively the interdomain traffic matrix (ITM), as seen from the perspective of a large research network. The ITM describes the traffic flowing between autonomous systems (AS) in the Internet. It depicts the traffic between the largest Internet business entities, therefore it has an important impact on the Internet economy. In particular, we analyse the sparsity and statistical distribution of the traffic, and observe that the shape of the statistical distribution of the traffic sourced from an AS might be related to congestion within the network. We also investigate the correlations between rows in the ITM. Finally, we propose a novel method to model the interdomain traffic, that stems from first-principles and recognizes the fact that the traffic is a mixture of different Internet applications, and can have regional artifacts. We present and evaluate a tool to generate such matrices from open and available data. Our results show that our first-principles approach is a promising alternative to the existing solutions in this area, which enables the investigation of what-if scenarios and their impact on the Internet economy. At the microscopic level, we investigate the rising phenomena of price discrimination (PD). We find empirical evidences that Internet users can be subject to price and search discrimination. In particular, we present examples of PD on several ecommerce websites and uncover the information vectors facilitating PD. Later we show that crowd-sourcing is a feasible method to help users to infer if they are subject to PD. We also build and evaluate a system that allows any Internet user to examine if she is subject to PD. The system has been deployed and used by multiple users worldwide, and uncovered more examples of PD. The methods presented in the following papers are backed with thorough data analysis and experiments.Internet es hoy en día un elemento crucial en la economía mundial, su constante crecimiento afecta directamente múltiples aspectos tanto a nivel macro- como a nivel microeconómico. Entre otros aspectos, el tráfico de red y la información que transporta se han convertido en un producto de gran valor comercial para cualquier empresa. Sin embargo, más de dos decadas después de su introducción en nuestras vidas y siendo un elemento de vital importancia, el impacto de Internet en la economía global y diaria es un tema que alberga todavía muchas incógnitas que resolver. En esta disertación analizamos importantes aspectos micro y macroeconómicos de Internet. Primero, investigamos las características del tráfico entre Sistemas Autónomos (AS), que es un parte decisiva de la macroeconomía de Internet. A continuacin, estudiamos el controvertido fenómeno microeconómico de la discriminación de precios en Internet. A nivel macroeconómico, mostramos cuantitatívamente la matriz del tráfico entre AS ("Interdomain Traffic Matrix - ITM"), visto desde la perspectiva de una gran red científica. La ITM obtenida empíricamente muestra la cantidad de tráfico compartido entre diferentes AS, las entidades más grandes en Internet, siendo esto uno de los principales aspectos a evaluar en la economiá de Internet. Esto nos permite por ejemplo, analizar diferentes propiedades estadísticas del tráfico para descubrir si la distribución del tráfico producido por un AS está directamente relacionado con la congestión dentro de la red. Además, este estudio también nos permite investigar las correlaciones entre filas de la ITM, es decir, entre diferentes AS. Por último, basándonos en el estudio empírico, proponemos una innovadora solución para modelar el tráfico en una ITM, teniendo en cuenta que el tráfico modelado es dependiente de las particularidades de cada escenario (e.g., distribución de apliaciones, artefactos). Para obtener resultados representativos, la herramienta propuesta para crear estas matrices es evaluada a partir de conjuntos de datos abiertos, disponibles para toda la comunidad científica. Los resultados obtenidos muestran que el método propuesto es una prometedora alternativa a las soluciones de la literatura. Permitiendo así, la nueva investigación de escenarios desconocidos y su impacto en la economía de Internet. A nivel microeconómico, en esta tesis investigamos el fenómeno de la discriminación de precios en Internet ("price discrimination" - PD). Nuestros estudios permiten mostrar pruebas empíricas de que los usuarios de Internet están expuestos a discriminación de precios y resultados de búsquedas. En particular, presentamos ejemplos de PD en varias páginas de comercio electrónico y descubrimos que informacin usan para llevarlo a cabo. Posteriormente, mostramos como una herramienta crowdsourcing puede ayudar a la comunidad de usuarios a inferir que páginas aplican prácticas de PD. Con el objetivo de mitigar esta cada vez más común práctica, publicamos y evaluamos una herramienta que permite al usuario deducir si está siendo víctima de PD. Esta herramienta, con gran repercusión mediática, ha sido usada por multitud de usuarios alrededor del mundo, descubriendo así más ejemplos de discriminación. Por último remarcar que todos los metodos presentados en esta disertación están respaldados por rigurosos análisis y experimentos.Postprint (published version
    • …
    corecore