Hypersparse Neural Network Analysis of Large-Scale Internet Traffic
The Internet is transforming our society, necessitating a quantitative
understanding of Internet traffic. Our team collects and curates the largest
publicly available Internet traffic data containing 50 billion packets.
A novel hypersparse neural network analysis of "video" streams of
this traffic, run on 10,000 processors in the MIT SuperCloud, reveals a new
phenomenon: the importance of otherwise unseen leaf nodes and isolated links in
Internet traffic. Our neural network approach further shows that a
two-parameter modified Zipf-Mandelbrot distribution accurately describes a wide
variety of source/destination statistics on moving sample windows ranging from
100,000 to 100,000,000 packets over collections that span years and continents.
The inferred model parameters distinguish different network streams and the
model leaf parameter strongly correlates with the fraction of the traffic in
different underlying network topologies. The hypersparse neural network
pipeline is highly adaptable and different network statistics and training
models can be incorporated with simple changes to the image filter functions.
Comment: 11 pages, 10 figures, 3 tables, 60 citations; to appear in IEEE High
Performance Extreme Computing (HPEC) 201
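The abstract above names a two-parameter modified Zipf-Mandelbrot distribution but does not state its formula. The following is a minimal sketch that assumes the common form p(d) proportional to 1/(d + delta)^alpha over d = 1..d_max; the function name, the parameter names `alpha` and `delta`, and the example values are illustrative assumptions, not the authors' code.

```python
def zipf_mandelbrot_pmf(alpha, delta, d_max):
    """Normalized probabilities p(d) proportional to 1 / (d + delta)**alpha.

    Assumed form for illustration only; d runs over 1..d_max and
    delta must be greater than -1 so every term is positive.
    """
    weights = [1.0 / (d + delta) ** alpha for d in range(1, d_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical parameter values, chosen only to exercise the function.
pmf = zipf_mandelbrot_pmf(alpha=2.0, delta=0.5, d_max=1000)
```

Under this assumed form, `alpha` sets how steeply the tail falls off, while `delta` shifts the weight carried by the smallest values of d.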
Graph Summarization
The continuous and rapid growth of highly interconnected datasets, which are
both voluminous and complex, calls for the development of adequate processing
and analytical techniques. One method for condensing and simplifying such
datasets is graph summarization. It denotes a series of application-specific
algorithms designed to transform graphs into more compact representations while
preserving structural patterns, query answers, or specific property
distributions. As this problem is common to several areas studying graph
topologies, different approaches, such as clustering, compression, sampling, or
influence detection, have been proposed, primarily based on statistical and
optimization methods. This chapter identifies the main graph
summarization methods, with particular attention to the most recent approaches
and novel research trends not yet covered by previous surveys.
Comment: To appear in the Encyclopedia of Big Data Technologies
Metrics for Graph Comparison: A Practitioner's Guide
Comparison of graph structure is a ubiquitous task in data analysis and
machine learning, with diverse applications in fields such as neuroscience,
cyber security, social network analysis, and bioinformatics, among others.
Discovery and comparison of structures such as modular communities, rich clubs,
hubs, and trees in data in these fields yields insight into the generative
mechanisms and functional properties of the graph.
Often, two graphs are compared via a pairwise distance measure, with a small
distance indicating structural similarity and vice versa. Common choices
include spectral distances (also known as λ distances) and distances
based on node affinities. However, there has as yet been no comparative study
of the efficacy of these distance measures in discerning between common graph
topologies and different structural scales.
In this work, we compare commonly used graph metrics and distance measures,
and demonstrate their ability to discern between common topological features
found in both random graph models and empirical datasets. We put forward a
multi-scale picture of graph structure, in which the effect of global and local
structure upon the distance measures is considered. We make recommendations on
the applicability of different distance measures to empirical graph data
problems based on this multi-scale view. Finally, we introduce the Python
library NetComp, which implements the graph distances used in this work.
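As an illustration of the spectral distances mentioned in the abstract above, the following is a minimal sketch that compares two equal-sized undirected graphs by the Euclidean distance between their sorted adjacency eigenvalues. It uses NumPy directly and is not NetComp's API; the function name and the choice of the adjacency matrix (rather than a Laplacian) are assumptions made for this example.

```python
import numpy as np


def adjacency_spectral_distance(a1, a2):
    """Euclidean distance between the sorted adjacency spectra of two graphs.

    Both inputs are symmetric adjacency matrices of the same size.
    """
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    ev1 = np.linalg.eigvalsh(np.asarray(a1, dtype=float))
    ev2 = np.linalg.eigvalsh(np.asarray(a2, dtype=float))
    return float(np.linalg.norm(ev1 - ev2))


# Two small example graphs on three nodes: a path and a triangle.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

d_same = adjacency_spectral_distance(path, path)
d_diff = adjacency_spectral_distance(path, triangle)
```

Here `d_same` is zero up to rounding and `d_diff` is positive, consistent with a small distance indicating structural similarity.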
Mathematics and the Internet: A Source of Enormous Confusion and Great Potential
Graph theory models the Internet mathematically, and a number of plausible mathematically intersecting network models for the Internet have been developed and studied. Simultaneously, Internet researchers have developed methodology to use real data to validate, or invalidate, proposed Internet models. The authors look at these parallel developments, particularly as they apply to scale-free network models of the preferential attachment type.
Statistical Network Analysis for Functional MRI: Summary Networks and Group Comparisons
Comparing weighted networks in neuroscience is hard, because the topological
properties of a given network are necessarily dependent on the number of edges
of that network. This problem arises in the analysis of both weighted and
unweighted networks. The term density is often used in this context, in order
to refer to the mean edge weight of a weighted network, or to the number of
edges in an unweighted one. Comparing families of networks is therefore
statistically difficult because differences in topology are necessarily
associated with differences in density. In this review paper, we consider this
problem from two different perspectives, which include (i) the construction of
summary networks, such as how to compute and visualize the mean network from a
sample of network-valued data points; and (ii) how to test for topological
differences, when two families of networks also exhibit significant differences
in density. In the first instance, we show that the problem of summarizing a
family of networks can be addressed by adopting a mass-univariate approach,
which produces a statistical parametric network (SPN). In the second part of
this review, we then highlight the inherent problems associated with the
comparison of topological functions of families of networks that differ in
density. In particular, we show that a wide range of topological summaries,
such as global efficiency and network modularity, are highly sensitive to
differences in density. Moreover, these problems are not restricted to
unweighted metrics, as we demonstrate that the same issues remain present when
considering the weighted versions of these metrics. We conclude by encouraging
caution when reporting such statistical comparisons, and by emphasizing the
importance of constructing summary networks.
Comment: 16 pages, 5 figures
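The density sensitivity described in the abstract above can be seen on a toy example. The following minimal sketch, written for unweighted undirected graphs only, computes edge density and global efficiency (the mean inverse shortest-path length over ordered node pairs) and compares a 4-node path with the 4-node cycle obtained by adding one edge. The function names and example graphs are illustrative assumptions, not taken from the paper.

```python
from collections import deque


def density(n, edges):
    """Fraction of the n*(n-1)/2 possible undirected edges that are present."""
    return len(edges) / (n * (n - 1) / 2)


def global_efficiency(n, edges):
    """Mean of 1/d(s, t) over ordered pairs of distinct nodes.

    Unreachable pairs contribute zero. Distances come from breadth-first search.
    """
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    total = 0.0
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(1.0 / d for t, d in dist.items() if t != s)
    return total / (n * (n - 1))


# A 4-node path, and the cycle formed by adding one extra edge.
path_edges = [(0, 1), (1, 2), (2, 3)]
cycle_edges = path_edges + [(3, 0)]

path_stats = (density(4, path_edges), global_efficiency(4, path_edges))
cycle_stats = (density(4, cycle_edges), global_efficiency(4, cycle_edges))
```

Adding the single edge raises both numbers at once, which is the confound the abstract warns about: a difference in efficiency between two groups may reflect nothing more than a difference in density.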