Hypersparse Neural Network Analysis of Large-Scale Internet Traffic
The Internet is transforming our society, necessitating a quantitative
understanding of Internet traffic. Our team collects and curates the largest
publicly available Internet traffic data containing 50 billion packets.
Utilizing a novel hypersparse neural network analysis of "video" streams of
this traffic using 10,000 processors in the MIT SuperCloud reveals a new
phenomenon: the importance of otherwise unseen leaf nodes and isolated links in
Internet traffic. Our neural network approach further shows that a
two-parameter modified Zipf-Mandelbrot distribution accurately describes a wide
variety of source/destination statistics on moving sample windows ranging from
100,000 to 100,000,000 packets over collections that span years and continents.
The inferred model parameters distinguish different network streams and the
model leaf parameter strongly correlates with the fraction of the traffic in
different underlying network topologies. The hypersparse neural network
pipeline is highly adaptable and different network statistics and training
models can be incorporated with simple changes to the image filter functions.
Comment: 11 pages, 10 figures, 3 tables, 60 citations; to appear in IEEE High
Performance Extreme Computing (HPEC) 201
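As a rough illustration of the distribution named in the abstract above, the sketch below evaluates a two-parameter modified Zipf-Mandelbrot form, p(d) proportional to 1/(d + delta)^alpha, normalized over d = 1..d_max. The functional form, the parameter names `alpha` and `delta`, and the function name are assumptions made for illustration; the abstract does not give the exact parameterization.

```python
def modified_zipf_mandelbrot(d_max, alpha, delta):
    """Return normalized probabilities p(d) for d = 1..d_max.

    Assumed form (not taken from the abstract): p(d) ~ 1 / (d + delta)**alpha.
    alpha sets the tail slope; delta shifts weight toward or away from d = 1.
    Requires delta > -1 so that every term is positive.
    """
    if delta <= -1:
        raise ValueError("delta must be greater than -1")
    weights = [1.0 / (d + delta) ** alpha for d in range(1, d_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical parameter values, chosen only to exercise the function.
probabilities = modified_zipf_mandelbrot(d_max=1000, alpha=2.0, delta=0.5)
```

With `delta = 0` this reduces to an ordinary truncated power law, which is one way to see what the second parameter adds.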
The Internet's unexploited path diversity
The connectivity of the Internet at the Autonomous System level is influenced
by the network operator policies implemented. These in turn impose a direction
to the announcement of address advertisements and, consequently, to the paths
that can be used to reach those destinations. We propose to use directed
graphs to properly represent how destinations propagate through the Internet
and the number of arc-disjoint paths to quantify this network's path diversity.
Moreover, in order to understand the effects that policies have on the
connectivity of the Internet, numerical analyses of the resulting directed
graphs were conducted. Results demonstrate that, even after policies have been
applied, there is still path diversity which the Border Gateway Protocol cannot
currently exploit.
Comment: Submitted to IEEE Communications Letter
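The path-diversity measure described above, the number of arc-disjoint paths in a directed graph, can be sketched as a unit-capacity maximum flow (by Menger's theorem the two quantities are equal). The code below is a minimal illustration using BFS augmenting paths; the toy graph, node names, and function name are hypothetical and do not come from the paper.

```python
from collections import defaultdict, deque


def count_arc_disjoint_paths(arcs, source, target):
    """Count arc-disjoint directed paths from source to target.

    Treats the input as a simple digraph (duplicate arcs are ignored) and
    runs unit-capacity max flow with BFS augmenting paths.
    """
    if source == target:
        raise ValueError("source and target must differ")
    residual = defaultdict(lambda: defaultdict(int))
    for u, v in set(arcs):
        residual[u][v] += 1
        residual[v][u] += 0  # make sure the reverse residual arc exists
    flow = 0
    while True:
        # BFS for any path with spare residual capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and target not in parent:
            u = queue.popleft()
            for v, capacity in residual[u].items():
                if capacity > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if target not in parent:
            return flow
        # Push one unit of flow back along the path that was found.
        v = target
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= 1
            residual[v][u] += 1
            v = u
        flow += 1


# Toy directed graph: direction matters, as it does with routing policies.
toy_arcs = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("b", "c")]
forward = count_arc_disjoint_paths(toy_arcs, "a", "d")
backward = count_arc_disjoint_paths(toy_arcs, "d", "a")
```

In this toy graph there are two arc-disjoint paths from `a` to `d` and none in the reverse direction, which is the kind of asymmetry a directed representation is meant to capture.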
Exploring networks with traceroute-like probes: theory and simulations
Mapping the Internet generally consists in sampling the network from a
limited set of sources by using traceroute-like probes. This methodology, akin
to the merging of different spanning trees to a set of destinations, has been
argued to introduce uncontrolled sampling biases that might produce statistical
properties of the sampled graph which sharply differ from the original ones. In
this paper we explore these biases and provide a statistical analysis of their
origin. We derive an analytical approximation for the probability of edge and
vertex detection that exploits the role of the number of sources and targets
and allows us to relate the global topological properties of the underlying
network with the statistical accuracy of the sampled graph. In particular, we
find that the edge and vertex detection probability depends on the betweenness
centrality of each element. This allows us to show that shortest path routed
sampling provides a better characterization of underlying graphs with broad
distributions of connectivity. We complement the analytical discussion with a
thorough numerical investigation of simulated mapping strategies in network
models with different topologies. We show that sampled graphs provide a fair
qualitative characterization of the statistical properties of the original
networks in a fair range of different strategies and exploration parameters.
Moreover, we characterize the level of redundancy and completeness of the
exploration process as a function of the topological properties of the network.
Finally, we study numerically how the fraction of vertices and edges discovered
in the sampled graph depends on the particular deployments of probing sources.
The results might hint at the steps toward more efficient mapping strategies.
Comment: This paper is related to cond-mat/0406404, with explorations of
different networks and complementary discussion
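The sampling process described above, merging shortest paths from a few sources to a set of targets, can be sketched in a few lines. This is a simplified illustration that assumes one BFS shortest path per source-target pair with deterministic tie-breaking; the function names and the toy graph are hypothetical, and the paper's own simulations may differ.

```python
from collections import deque


def shortest_path(adjacency, source, target):
    """Return one BFS shortest path from source to target, or [] if none."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            break
        for v in adjacency[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    if target not in parent:
        return []
    path = [target]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def sample_graph(adjacency, sources, targets):
    """Union of the vertices and edges seen on all source-to-target probes."""
    seen_vertices, seen_edges = set(), set()
    for s in sources:
        for t in targets:
            path = shortest_path(adjacency, s, t)
            seen_vertices.update(path)
            seen_edges.update(frozenset(e) for e in zip(path, path[1:]))
    return seen_vertices, seen_edges


# Toy 4-cycle: a single probe from 0 to 2 sees only one side of the ring.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
vertices, edges = sample_graph(ring, sources=[0], targets=[2])
edge_fraction = len(edges) / 4  # the ring has 4 edges in total
```

Adding vertex 1 as a second source would reveal a third edge of the ring, a small-scale version of the dependence on source deployment that the abstract discusses.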
Measured impact of crooked traceroute
Data collected using traceroute-based algorithms underpins research into the Internet’s router-level topology, though it is possible to infer false links from this data. One source of false inference is the combination of per-flow load-balancing, in which more than one path is active from a given source to destination, and classic traceroute, which varies the UDP destination port number or ICMP checksum of successive probe packets, which can cause per-flow load-balancers to treat successive packets as distinct flows and forward them along different paths. Consequently, successive probe packets can solicit responses from unconnected routers, leading to the inference of false links. This paper examines the inaccuracies induced by such false inferences, on both macroscopic and ISP topology mapping. We collected macroscopic topology data to 365k destinations, with techniques that both do and do not try to capture load balancing phenomena. We then use alias resolution techniques to infer if a measurement artifact of classic traceroute induces a false router-level link. This technique detected that 2.71% and 0.76% of the links in our UDP and ICMP graphs were falsely inferred due to the presence of load-balancing. We conclude that most per-flow load-balancing does not induce false links when macroscopic topology is inferred using classic traceroute. The effect of false links on ISP topology mapping is possibly much worse, because the degrees of a tier-1 ISP’s routers derived from classic traceroute were inflated by a median factor of 2.9 as compared to those inferred with Paris traceroute
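The mechanism described above can be reproduced with a toy simulation. The sketch below assumes a load balancer that picks one of two equal-length paths from the probe's flow identifier using a modulo hash; the topology, router names, hash rule, and function names are all hypothetical and serve only to show how varying the flow identifier per probe can stitch together hops from different paths.

```python
# Two equal-length paths behind a per-flow load balancer "L" (toy topology).
PATHS = [["L", "A", "C", "D"], ["L", "B", "E", "D"]]
TRUE_LINKS = {(p[i], p[i + 1]) for p in PATHS for i in range(len(p) - 1)}


def trace(vary_flow_id, base_port=33434):
    """Return the router that answers at each TTL.

    vary_flow_id=True mimics classic traceroute (destination port changes
    with every probe); False mimics a Paris-style probe that keeps the flow
    identifier constant. The modulo hash is an assumption, not a real
    router's hashing algorithm.
    """
    hops = []
    for ttl in range(1, len(PATHS[0]) + 1):
        flow_id = base_port + ttl if vary_flow_id else base_port
        path = PATHS[flow_id % len(PATHS)]
        hops.append(path[ttl - 1])
    return hops


def false_links(hops):
    """Links inferred from consecutive hops that do not exist in the topology."""
    inferred = set(zip(hops, hops[1:]))
    return inferred - TRUE_LINKS


classic_false = false_links(trace(vary_flow_id=True))
paris_false = false_links(trace(vary_flow_id=False))
```

Here the classic-style trace alternates between the two paths and infers a link from `A` to `E` that does not exist, while the constant-flow trace follows a single path and infers none.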
A critical look at power law modelling of the Internet
This paper takes a critical look at the usefulness of power law models of the
Internet. The twin focuses of the paper are Internet traffic and topology
generation. The aim of the paper is twofold. Firstly it summarises the state of
the art in power law modelling particularly giving attention to existing open
research questions. Secondly it provides insight into the failings of such
models and where progress needs to be made for power law research to feed
through to actual improvements in network performance.
Comment: To appear Computer Communication
The Internet AS-Level Topology: Three Data Sources and One Definitive Metric
We calculate an extensive set of characteristics for Internet AS topologies
extracted from the three data sources most frequently used by the research
community: traceroutes, BGP, and WHOIS. We discover that traceroute and BGP
topologies are similar to one another but differ substantially from the WHOIS
topology. Among the widely considered metrics, we find that the joint degree
distribution appears to fundamentally characterize Internet AS topologies as
well as narrowly define values for other important metrics. We discuss the
interplay between the specifics of the three data collection mechanisms and the
resulting topology views. In particular, we show how the data collection
peculiarities explain differences in the resulting joint degree distributions
of the respective topologies. Finally, we release to the community the input
topology datasets, along with the scripts and output of our calculations. This
supplement should enable researchers to validate their models against real data
and to make more informed selection of topology data sources for their specific
needs.
Comment: This paper is a revised journal version of cs.NI/050803
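For readers unfamiliar with the joint degree distribution highlighted above, the sketch below computes it for a small undirected graph as the fraction of edges joining a node of degree k1 to a node of degree k2. The normalization, the function name, and the toy graphs are assumptions for illustration; the paper's released scripts may use a different convention.

```python
from collections import Counter


def joint_degree_distribution(edges):
    """Fraction of edges whose endpoints have degrees (k1, k2), with k1 <= k2.

    Assumes a simple undirected graph given as a list of distinct (u, v) pairs.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter()
    for u, v in edges:
        key = tuple(sorted((degree[u], degree[v])))
        counts[key] += 1
    total = len(edges)
    return {key: count / total for key, count in counts.items()}


# Two toy graphs: a 3-leaf star and a 4-node path.
star = joint_degree_distribution([(0, 1), (0, 2), (0, 3)])
path = joint_degree_distribution([(0, 1), (1, 2), (2, 3)])
```

In the star every edge joins a degree-1 node to the degree-3 hub, while in the path two thirds of the edges join degrees 1 and 2, so the two graphs are already distinguished at this level.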