8 research outputs found

    Social network behaviour inferred from O-D Pair traffic

    Because traffic is predominantly formed by communication between users, or between users and the servers that communicate with them, network traffic inherently exhibits social networking behaviour; the extent of interaction between entities, as identified by their IP addresses, can be extracted from the data and analysed in a multiplicity of ways. In this paper, Anonymized Internet Trace Datasets obtained from the Center for Applied Internet Data Analysis (CAIDA) are used to identify and estimate characteristics of the underlying social network from the overall traffic. The analysis methods fall into two groups: the first is based on frequency analysis and the second on traffic matrices, with the latter further sub-divided into groups based on the traffic mean, variance and covariance. Frequency analysis shows that origin, destination and O-D pair statistics exhibit heavy-tailed behaviour. Because of the large number of IP addresses contained in the CAIDA datasets, only the most predominant IP addresses are used when estimating all three sub-divided groups of traffic matrices. Principal Component Analysis and related methods are applied to identify key features of each type of traffic matrix. A new system called Antraff has been developed by the authors to carry out all the analysis procedures.
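As a rough illustration of the two analysis groups described above, the following Python sketch counts origin, destination and O-D pair frequencies, then builds a byte-count traffic matrix restricted to the most active addresses. The `(src, dst, bytes)` record layout, the function names and the sample addresses are assumptions made for illustration; this is not Antraff's code, and the PCA step is not shown.

```python
from collections import Counter

def frequency_analysis(records):
    """Count how often each origin, destination and O-D pair appears."""
    origins = Counter(src for src, _dst, _nbytes in records)
    destinations = Counter(dst for _src, dst, _nbytes in records)
    pairs = Counter((src, dst) for src, dst, _nbytes in records)
    return origins, destinations, pairs

def top_k_traffic_matrix(records, k):
    """Build a byte-count matrix over the k most active addresses."""
    activity = Counter()
    for src, dst, _nbytes in records:
        activity[src] += 1
        activity[dst] += 1
    top = [ip for ip, _count in activity.most_common(k)]
    index = {ip: i for i, ip in enumerate(top)}
    matrix = [[0] * len(top) for _ in top]
    for src, dst, nbytes in records:
        # Records touching an address outside the top k are ignored.
        if src in index and dst in index:
            matrix[index[src]][index[dst]] += nbytes
    return top, matrix

# Hypothetical sample records: (origin IP, destination IP, bytes)
records = [
    ("10.0.0.1", "10.0.0.2", 500),
    ("10.0.0.1", "10.0.0.2", 300),
    ("10.0.0.2", "10.0.0.1", 100),
    ("10.0.0.3", "10.0.0.1", 50),
]
origins, destinations, pairs = frequency_analysis(records)
top, matrix = top_k_traffic_matrix(records, k=2)
```

Restricting the matrix to the top-k addresses mirrors the abstract's remark that only the most predominant IP addresses are used; how "predominant" is ranked in the paper is not stated, so counting appearances as origin or destination is a stand-in.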

    The Internet-Wide Impact of P2P Traffic Localization on ISP Profitability

    We conduct a detailed simulation study to examine how localizing P2P traffic within network boundaries impacts the profitability of an ISP. A distinguishing aspect of our work is the focus on Internet-wide implications, i.e., how adoption of localization within an ISP affects both itself and other ISPs. Our simulations are based on detailed models that estimate inter-autonomous-system (AS) P2P traffic and inter-AS routing, localization models that predict the extent to which P2P traffic is reduced, and pricing models that predict the impact of changes in traffic on the profit of an ISP. We evaluate our models by using a large-scale crawl of BitTorrent containing over 138 million users sharing 2.75 million files. Our results show that the benefits of localization must not be taken for granted. Some of our key findings include: 1) residential ISPs can actually lose money when localization is employed, and some of them will not see increased profitability until other ISPs employ localization; 2) the reduction in costs due to localization will be limited for small ISPs and tends to grow only logarithmically with client population; and 3) some ISPs can better increase profitability through strategies other than localization, by taking advantage of the business relationships they have with other ISPs

    Identifiability of flow distributions from link measurements with applications to computer networks

    We study the problem of identifiability of distributions of flows on a graph from aggregate measurements collected on its edges. This is a canonical example of a statistical inverse problem motivated by recent developments in computer networks. In this paper we (i) introduce a number of models for multi-modal data that capture their spatio-temporal correlation, and (ii) provide sufficient conditions for the identifiability of nth-order cumulants and also for a special class of heavy-tailed distributions. Further, we investigate conditions on network routing for the flows that prove sufficient for identifiability of their distributions (up to mean). Finally, we extend our results to directed acyclic graphs and discuss some open problems. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/58107/2/ip7_5_004.pd
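The inverse problem can be made concrete with a small sketch. It assumes the common linear model in which link measurements are the routing-weighted sums of flow volumes, y = A x; the abstract does not state its model, so the matrix `A`, the flow values and the helper `link_loads` are all hypothetical. The example shows two different flow vectors producing identical link loads, which is why aggregate first-order information alone may not pin down the flows.

```python
def link_loads(routing, flows):
    """Aggregate flow volumes onto links: y = A x."""
    return [sum(a * x for a, x in zip(row, flows)) for row in routing]

# Hypothetical routing matrix A: rows are links, columns are flows.
# Flow 0 uses link 0, flow 1 uses both links, flow 2 uses link 1.
A = [
    [1, 1, 0],
    [0, 1, 1],
]

flows_a = [1, 2, 3]
# flows_b is flows_a shifted along (1, -1, 1), a null-space direction of A.
flows_b = [2, 1, 4]

loads_a = link_loads(A, flows_a)
loads_b = link_loads(A, flows_b)
```

With more flows than links, A has a non-trivial null space, so any recovery of flow-level quantities has to lean on additional structure such as the higher-order statistics the abstract mentions.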

    Traffic matrix estimation with enhanced origin destination generator algorithm using simulation of real network

    The rapid growth of the Internet has made ensuring reliability and redundancy a major challenge. These issues have been studied extensively using Traffic Engineering and simulation. In Traffic Matrix Estimation (TME), the Origin–Destination Generator algorithm (ODGen) is limited in the number of hops, and its Expectation Maximization (EM) accuracy is 92%. Most studies have not taken into account real traffic parameters or the integration of TME models with routing protocols in their simulation models. Also, there is no comprehensive model consisting of TME, Border Gateway Protocol (BGP) and Hot Potato (HP) routing in the NS-2 network simulator based on real networks. In this research, an Integrated Simulated Model (ISM) is introduced, consisting of the ODGen-HP algorithm and BGP integrated into the NS-2 network simulator. ISM is then used to simulate the infrastructure of a real production network using actual captured traffic data parameters. Validation is then done against changes in network topology based on packet loss, delay and throughput. The average error between the simulated and production networks was 0% for packets sent and 3.61% for packets received. The network is modelled with a baseline topology in which 5 main nodes are connected together, with redundant links for some nodes. The simulations were repeated for link failures, node addition, and node removal. The TME used in ISM is based on ODGen optimized for an unlimited number of hops; with this optimization the accuracy of EM increases to 97% and Central Processing Unit complexity is reduced. HP helps a node that experiences a link failure to select a shorter route to the egress router. In the case of a link failure, the HP switching time between the links is 0.05 seconds. ISM performance was evaluated by comparing trace files before and after a link failure, the addition of nodes (up to 32), or the removal of nodes.
The parameters used for comparison are packet loss, delay and throughput. The ISM error percentages obtained are 0.025% for packet loss, 0.013% for delay and 0.003% for throughput.

    Inferring hidden features in the Internet (PhD thesis)

    The Internet is a large-scale decentralized system composed of thousands of independent networks. In this system, two main components, interdomain routing and traffic, are vital inputs for many tasks such as traffic engineering, security, and business intelligence. However, due to the decentralized structure of the Internet, global knowledge of both interdomain routing and traffic is hard to come by. In this dissertation, we address a set of statistical inference problems with the goal of extending the knowledge of the interdomain-level Internet. In the first part of this dissertation we investigate the relationship between the interdomain topology and an individual network’s inference ability. We first frame the questions through abstract analysis of idealized topologies, and then use actual routing measurements and topologies to study the ability of real networks to infer traffic flows. In the second part, we study the ability of networks to identify which paths flow through their network. We first show that answering this question is surprisingly hard because of the design of interdomain routing systems, in which each network can learn only a limited set of routes. Network operators therefore have to rely on observed traffic. However, observed traffic can only establish that a particular route passes through a network, not that a route does not pass through it. To solve the routing inference problem, we propose a nonparametric inference technique that works quite accurately. The key idea behind our technique is measuring the distances between destinations. To accomplish that, we define a metric called Routing State Distance (RSD) to measure distances in terms of routing similarity. Finally, in the third part, we study our new metric, RSD, in detail. Using RSD we address the important and difficult problem of characterizing the set of paths between networks.
The collection of paths across networks is a rich source for understanding important phenomena in the Internet, as path selections are driven by the economic and performance considerations of the networks. We show that RSD has a number of appealing properties that can uncover these hidden phenomena.
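The abstract does not give a formula for RSD. One plausible reading of "distances in terms of routing similarity", assumed here purely for illustration, is to count the nodes whose next hop towards one destination differs from their next hop towards another. The function name, the next-hop table layout and the sample data in this Python sketch are all hypothetical.

```python
def routing_state_distance(next_hop, dest_a, dest_b):
    """Count nodes whose next hop differs for the two destinations."""
    return sum(
        1
        for hops in next_hop.values()
        if hops.get(dest_a) != hops.get(dest_b)
    )

# Hypothetical next-hop table: node -> {destination: next hop}
next_hop = {
    "n1": {"d1": "n2", "d2": "n2", "d3": "n3"},
    "n2": {"d1": "n4", "d2": "n4", "d3": "n4"},
    "n3": {"d1": "n2", "d2": "n4", "d3": "n4"},
}

rsd_d1_d2 = routing_state_distance(next_hop, "d1", "d2")
rsd_d1_d3 = routing_state_distance(next_hop, "d1", "d3")
```

Under this reading, two destinations are close when most of the network routes towards them in the same way, which matches the abstract's idea of grouping destinations by routing similarity.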

    Macro- and microscopic analysis of the internet economy from network measurements

    The growth of the Internet impacts multiple areas of the world economy, and it has become a permanent part of the economic landscape at both the macro- and microeconomic level. On-line traffic and information are now assets with large business value. Even though the commercial Internet has been a part of our lives for more than two decades, its impact on the global and everyday economy still holds many unknowns. In this work we analyse important macro- and microeconomic aspects of the Internet. First, we investigate the characteristics of the interdomain traffic, which is an important part of the macroscopic economy of the Internet. Then we investigate the microeconomic phenomenon of price discrimination in the Internet. At the macroscopic level, we describe quantitatively the interdomain traffic matrix (ITM), as seen from the perspective of a large research network. The ITM describes the traffic flowing between autonomous systems (AS) in the Internet. It depicts the traffic between the largest Internet business entities and therefore has an important impact on the Internet economy. In particular, we analyse the sparsity and statistical distribution of the traffic, and observe that the shape of the statistical distribution of the traffic sourced from an AS might be related to congestion within the network. We also investigate the correlations between rows in the ITM. Finally, we propose a novel method to model the interdomain traffic that stems from first principles and recognizes that the traffic is a mixture of different Internet applications and can have regional artifacts. We present and evaluate a tool to generate such matrices from open and available data. Our results show that our first-principles approach is a promising alternative to the existing solutions in this area, one that enables the investigation of what-if scenarios and their impact on the Internet economy.
At the microscopic level, we investigate the rising phenomenon of price discrimination (PD). We find empirical evidence that Internet users can be subject to price and search discrimination. In particular, we present examples of PD on several e-commerce websites and uncover the information vectors facilitating PD. We then show that crowd-sourcing is a feasible method to help users infer whether they are subject to PD. We also build and evaluate a system that allows any Internet user to examine whether she is subject to PD. The system has been deployed and used by multiple users worldwide, and has uncovered more examples of PD. The methods presented in the following papers are backed by thorough data analysis and experiments.
[Spanish abstract, translated] The Internet is today a crucial element of the world economy, and its constant growth directly affects many areas at both the macro- and microeconomic level. Among other things, network traffic and the information it carries have become assets of great commercial value for any company. However, more than two decades after its introduction into our lives, and despite its vital importance, the impact of the Internet on the global and everyday economy still holds many unknowns. In this dissertation we analyse important micro- and macroeconomic aspects of the Internet. First, we investigate the characteristics of the traffic between autonomous systems (AS), which is a decisive part of the macroeconomy of the Internet. Next, we study the controversial microeconomic phenomenon of price discrimination on the Internet. At the macroeconomic level, we describe quantitatively the traffic matrix between AS ("Interdomain Traffic Matrix - ITM"), as seen from the perspective of a large research network.
The empirically obtained ITM shows the amount of traffic exchanged between different AS, the largest entities in the Internet, which is one of the main aspects to evaluate in the Internet economy. This allows us, for example, to analyse different statistical properties of the traffic to find out whether the distribution of the traffic produced by an AS is directly related to congestion within the network. The study also lets us investigate the correlations between rows of the ITM, that is, between different AS. Finally, building on the empirical study, we propose a novel solution for modelling the traffic in an ITM, taking into account that the modelled traffic depends on the particularities of each scenario (e.g., distribution of applications, artifacts). To obtain representative results, the proposed tool for creating these matrices is evaluated on open datasets available to the whole scientific community. The results show that the proposed method is a promising alternative to the solutions in the literature, enabling new investigation of unknown scenarios and their impact on the Internet economy. At the microeconomic level, this thesis investigates the phenomenon of price discrimination ("price discrimination" - PD) on the Internet. Our studies provide empirical evidence that Internet users are exposed to price and search-result discrimination. In particular, we present examples of PD on several e-commerce websites and uncover what information they use to carry it out. We then show how a crowdsourcing tool can help the user community infer which sites apply PD practices. With the aim of mitigating this increasingly common practice, we publish and evaluate a tool that allows users to determine whether they are victims of PD.
This tool, which received wide media coverage, has been used by many users around the world, uncovering more examples of discrimination. Finally, all the methods presented in this dissertation are backed by rigorous analysis and experiments.

    An independent-connection model for traffic matrices

    A common assumption made in traffic matrix (TM) modeling and estimation is independence of a packet's network ingress and egress. We argue that in real IP networks, this assumption should not and does not