Hypersparse Neural Network Analysis of Large-Scale Internet Traffic
The Internet is transforming our society, necessitating a quantitative
understanding of Internet traffic. Our team collects and curates the largest
publicly available Internet traffic data containing 50 billion packets.
Utilizing a novel hypersparse neural network analysis of "video" streams of
this traffic using 10,000 processors in the MIT SuperCloud reveals a new
phenomenon: the importance of otherwise unseen leaf nodes and isolated links in
Internet traffic. Our neural network approach further shows that a
two-parameter modified Zipf-Mandelbrot distribution accurately describes a wide
variety of source/destination statistics on moving sample windows ranging from
100,000 to 100,000,000 packets over collections that span years and continents.
The inferred model parameters distinguish different network streams and the
model leaf parameter strongly correlates with the fraction of the traffic in
different underlying network topologies. The hypersparse neural network
pipeline is highly adaptable and different network statistics and training
models can be incorporated with simple changes to the image filter functions.
Comment: 11 pages, 10 figures, 3 tables, 60 citations; to appear in IEEE High
Performance Extreme Computing (HPEC) 201
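The abstract names a two-parameter modified Zipf-Mandelbrot distribution but does not spell out its form. The sketch below assumes the commonly used shape p(d) proportional to 1/(d + delta)^alpha over degrees d = 1..d_max; the function name and the parameter names alpha, delta, and d_max are illustrative and not taken from the paper.

```python
def zipf_mandelbrot_pmf(alpha, delta, d_max):
    """Return normalized probabilities p(d) for d = 1..d_max.

    Assumed form: p(d) is proportional to 1 / (d + delta) ** alpha.
    alpha controls the tail slope; delta shifts the weight on small d.
    """
    if d_max < 1:
        raise ValueError("d_max must be at least 1")
    if delta <= -1:
        raise ValueError("delta must be greater than -1 so that d + delta > 0")
    weights = [1.0 / (d + delta) ** alpha for d in range(1, d_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Example: probability of the smallest degrees for one parameter choice.
pmf = zipf_mandelbrot_pmf(alpha=2.0, delta=0.5, d_max=1000)
print(pmf[:3])
```

Fitting alpha and delta to observed source/destination counts in each sample window would then give the per-stream parameters that the abstract describes; the fitting procedure itself is not shown here.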
Going beyond diffServ in IP traffic classification
Quality of Service (QoS) management in IP networks today relies on static configuration of class-of-service definitions and related forwarding priorities. Packets are classified according to the DiffServ architecture, following RFC 4594, typically through static configuration or filters that match packet features at network access equipment. In this paper, we propose a dynamic classification procedure, referred to as Learning-powered DiffServ (L-DiffServ), that detects the distinctive characteristics of traffic and dynamically assigns service classes to IP packets. The idea is to apply semi-unsupervised Machine Learning techniques, such as Linear Discriminant Analysis (LDA) and K-Means, customized to account for the issues of packet-level analysis, i.e., the unbalanced distribution of traffic among classes and the selection of suitable IP header features. The performance evaluation shows that L-DiffServ can dynamically change the classification outcome and provides a higher number of classes than DiffServ. This last result represents the first step toward a more granular differentiation of IP traffic.
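As a rough illustration of clustering packets by header features, the sketch below runs plain K-Means on two made-up, pre-normalized features per packet. It is not the L-DiffServ procedure: the LDA step and the authors' customizations are omitted, and the feature choice, the deterministic initialization, and all names are assumptions.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain K-Means on tuples of floats; returns (labels, centroids).

    Initialization is deterministic (evenly spaced input points) to keep
    the example reproducible; real use would normally randomize it.
    """
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


# Hypothetical features: (normalized packet length, normalized destination port).
packets = [
    (0.05, 0.10), (0.06, 0.12), (0.04, 0.09),   # small packets, low ports
    (0.90, 0.80), (0.95, 0.85), (0.92, 0.82),   # large packets, high ports
]
labels, centroids = kmeans(packets, k=2)
print(labels)
```

Each resulting cluster could then be mapped to a service class; how that mapping is done is outside this sketch.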
Observations of IPv6 Addresses
IPv6 addresses are longer than IPv4 addresses, and so are capable of greater expression. Given an IPv6 address, conventions and standards allow us to draw conclusions about how IPv6 is being used on the node with that address.
We show a technique for analysing IPv6 addresses and apply it to a number of datasets. The datasets include addresses seen at a busy mirror server, at an IPv6-enabled TLD DNS server and when running traceroute across the production IPv6 network. The technique quantifies differences in these datasets that we intuitively expect, and shows that IPv6 is being used in different ways by different groups.
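The abstract does not list the conventions it checks, so the sketch below is only one plausible way to read usage hints from an address, using Python's standard ipaddress module. The category names and the choice of tests (Teredo, 6to4, EUI-64-style interface identifier, low-numbered interface identifier) are assumptions rather than the paper's method.

```python
import ipaddress


def classify_ipv6(text):
    """Assign a coarse, illustrative usage category to an IPv6 address."""
    addr = ipaddress.IPv6Address(text)
    if addr.teredo is not None:          # 2001:0000::/32 tunnelling prefix
        return "teredo"
    if addr.sixtofour is not None:       # 2002::/16 tunnelling prefix
        return "6to4"
    interface_id = int(addr) & 0xFFFFFFFFFFFFFFFF   # low 64 bits
    # EUI-64 derived identifiers carry ff:fe in their middle two bytes.
    if (interface_id >> 24) & 0xFFFF == 0xFFFE:
        return "eui64"
    # Very small identifiers suggest manual numbering (e.g. ::1, ::53).
    if interface_id < 0x10000:
        return "low"
    return "other"


for sample in ["2001:db8::1", "2001:db8::211:22ff:fe33:4455", "2002:c000:204::1"]:
    print(sample, classify_ipv6(sample))
```

Tallying these categories per dataset would give the kind of comparison the abstract describes between a mirror server, a DNS server, and traceroute results.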
Multi-Temporal Analysis and Scaling Relations of 100,000,000,000 Network Packets
Our society has never been more dependent on computer networks. Effective
utilization of networks requires a detailed understanding of the normal
background behaviors of network traffic. Large-scale measurements of networks
are computationally challenging. Building on prior work in interactive
supercomputing and GraphBLAS hypersparse hierarchical traffic matrices, we have
developed an efficient method for computing a wide variety of streaming network
quantities on diverse time scales. Applying these methods to 100,000,000,000
anonymized source-destination pairs collected at a network gateway reveals many
previously unobserved scaling relationships. These observations provide new
insights into normal network background traffic that could be used for anomaly
detection, AI feature engineering, and testing theoretical models of streaming
networks.
Comment: 6 pages, 6 figures, 3 tables, 49 references, accepted to IEEE HPEC
202
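To make the idea of streaming network quantities concrete, the sketch below builds a sparse traffic matrix for one window of source-destination pairs and derives a few aggregate quantities from it. It uses a plain Python Counter in place of GraphBLAS hypersparse matrices, and the particular quantities and their names are illustrative assumptions.

```python
from collections import Counter


def window_statistics(packets):
    """Compute simple quantities from one window of (source, destination) pairs."""
    # Sparse traffic matrix: only pairs that actually occur are stored.
    matrix = Counter(packets)
    source_packets = Counter()
    source_fanout = Counter()
    for (src, _dst), count in matrix.items():
        source_packets[src] += count   # packets sent by each source
        source_fanout[src] += 1        # distinct destinations per source
    return {
        "valid_packets": sum(matrix.values()),
        "unique_links": len(matrix),
        "unique_sources": len(source_packets),
        "unique_destinations": len({dst for _src, dst in matrix}),
        "max_link_packets": max(matrix.values(), default=0),
        "max_source_fanout": max(source_fanout.values(), default=0),
    }


# The same stream summarized at two window sizes (a toy multi-temporal view).
stream = [("a", "b"), ("a", "b"), ("a", "c"), ("d", "b")]
for size in (2, 4):
    for start in range(0, len(stream), size):
        print(size, window_statistics(stream[start:start + size]))
```

Repeating the computation over windows of different sizes is what lets the same quantities be compared across time scales.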
GNOSIS: Global Network Operations Status Information System
Monitoring the global state of a network is a continuing challenge for network operators and users. It has become still harder with increases in scale and heterogeneity. Monitoring requires collecting status information for each node and constructing the global picture at a monitoring point. GNOSIS, the Global Network Operations Status Information System, achieves a global view by careful extraction and presentation of locally available node data. The GNOSIS model improves on the traditional polling model of monitoring schemes by (1) collecting accurate data, (2) decreasing the granularity with which network applications can detect change in the network, and (3) displaying status information in near real-time.
We define the Network Snapshot as the basic unit of information capture and display in GNOSIS. A Network Snapshot is a visualization of locally available state collected during a common time interval. A sequence of these Network Snapshots over time represents the evolution of network state.
In this paper, we motivate the need for a network monitoring system that can detect global problems in spite of both scale and heterogeneity. We present three design criteria for a global monitoring system: Accuracy, Continuity, and Timeliness. Finally, we present the GNOSIS architecture and demonstrate how it better detects network problems that are currently of concern. The goal of GNOSIS is to present a stream of consistent, accurate local data in a timely manner.
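The Network Snapshot idea, local state grouped by a common time interval, can be sketched as a simple bucketing step. The report format (timestamp, node, status), the function name, and the rule that the latest report in an interval wins are all assumptions made for illustration; the abstract does not describe the actual GNOSIS data model.

```python
from collections import defaultdict


def build_snapshots(reports, interval_seconds):
    """Group (timestamp, node, status) reports into per-interval snapshots.

    Returns a list of dicts ordered by time, each mapping node -> status
    for one interval. Intervals with no reports are omitted.
    """
    snapshots = defaultdict(dict)
    for timestamp, node, status in reports:
        index = int(timestamp // interval_seconds)
        # A later report for the same node in the same interval overwrites
        # the earlier one (assumed policy).
        snapshots[index][node] = status
    return [snapshots[i] for i in sorted(snapshots)]


reports = [(0.5, "r1", "up"), (1.2, "r2", "up"), (10.1, "r1", "down")]
for snapshot in build_snapshots(reports, interval_seconds=10):
    print(snapshot)
```

Reading the resulting list in order gives the sequence of snapshots that represents how network state evolves.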