42,596 research outputs found
A survey of statistical network models
Networks are ubiquitous in science and have become a focal point for
discussion in everyday life. Formal statistical models for the analysis of
network data have emerged as a major topic of interest in diverse areas of
study, and most of these involve a form of graphical representation.
Probability models on graphs date back to 1959. Along with empirical studies in
social psychology and sociology from the 1960s, these early works generated an
active network community and a substantial literature in the 1970s. This effort
moved into the statistical literature in the late 1970s and 1980s, and the past
decade has seen a burgeoning network literature in statistical physics and
computer science. The growth of the World Wide Web and the emergence of online
networking communities such as Facebook, MySpace, and LinkedIn, and a host of
more specialized professional network communities has intensified interest in
the study of networks and network data. Our goal in this review is to provide
the reader with an entry point to this burgeoning literature. We begin with an
overview of the historical development of statistical network modeling and then
we introduce a number of examples that have been studied in the network
literature. Our subsequent discussion focuses on a number of prominent static
and dynamic network models and their interconnections. We emphasize formal
model descriptions, and pay special attention to the interpretation of
parameters and their estimation. We end with a description of some open
problems and challenges for machine learning and statistics.Comment: 96 pages, 14 figures, 333 reference
DeepWalk: Online Learning of Social Representations
We present DeepWalk, a novel approach for learning latent representations of
vertices in a network. These latent representations encode social relations in
a continuous vector space, which is easily exploited by statistical models.
DeepWalk generalizes recent advancements in language modeling and unsupervised
feature learning (or deep learning) from sequences of words to graphs. DeepWalk
uses local information obtained from truncated random walks to learn latent
representations by treating walks as the equivalent of sentences. We
demonstrate DeepWalk's latent representations on several multi-label network
classification tasks for social networks such as BlogCatalog, Flickr, and
YouTube. Our results show that DeepWalk outperforms challenging baselines which
are allowed a global view of the network, especially in the presence of missing
information. DeepWalk's representations can provide scores up to 10%
higher than competing methods when labeled data is sparse. In some experiments,
DeepWalk's representations are able to outperform all baseline methods while
using 60% less training data. DeepWalk is also scalable. It is an online
learning algorithm which builds useful incremental results, and is trivially
parallelizable. These qualities make it suitable for a broad class of real
world applications such as network classification, and anomaly detection.Comment: 10 pages, 5 figures, 4 table
Networking - A Statistical Physics Perspective
Efficient networking has a substantial economic and societal impact in a
broad range of areas including transportation systems, wired and wireless
communications and a range of Internet applications. As transportation and
communication networks become increasingly more complex, the ever increasing
demand for congestion control, higher traffic capacity, quality of service,
robustness and reduced energy consumption require new tools and methods to meet
these conflicting requirements. The new methodology should serve for gaining
better understanding of the properties of networking systems at the macroscopic
level, as well as for the development of new principled optimization and
management algorithms at the microscopic level. Methods of statistical physics
seem best placed to provide new approaches as they have been developed
specifically to deal with non-linear large scale systems. This paper aims at
presenting an overview of tools and methods that have been developed within the
statistical physics community and that can be readily applied to address the
emerging problems in networking. These include diffusion processes, methods
from disordered systems and polymer physics, probabilistic inference, which
have direct relevance to network routing, file and frequency distribution, the
exploration of network structures and vulnerability, and various other
practical networking applications.Comment: (Review article) 71 pages, 14 figure
Uncovering the Temporal Dynamics of Diffusion Networks
Time plays an essential role in the diffusion of information, influence and
disease over networks. In many cases we only observe when a node copies
information, makes a decision or becomes infected -- but the connectivity,
transmission rates between nodes and transmission sources are unknown.
Inferring the underlying dynamics is of outstanding interest since it enables
forecasting, influencing and retarding infections, broadly construed. To this
end, we model diffusion processes as discrete networks of continuous temporal
processes occurring at different rates. Given cascade data -- observed
infection times of nodes -- we infer the edges of the global diffusion network
and estimate the transmission rates of each edge that best explain the observed
data. The optimization problem is convex. The model naturally (without
heuristics) imposes sparse solutions and requires no parameter tuning. The
problem decouples into a collection of independent smaller problems, thus
scaling easily to networks on the order of hundreds of thousands of nodes.
Experiments on real and synthetic data show that our algorithm both recovers
the edges of diffusion networks and accurately estimates their transmission
rates from cascade data.Comment: To appear in the 28th International Conference on Machine Learning
(ICML), 2011. Website: http://www.stanford.edu/~manuelgr/netrate
Evolutionary constraints on the complexity of genetic regulatory networks allow predictions of the total number of genetic interactions
Genetic regulatory networks (GRNs) have been widely studied, yet there is a
lack of understanding with regards to the final size and properties of these
networks, mainly due to no network currently being complete. In this study, we
analyzed the distribution of GRN structural properties across a large set of
distinct prokaryotic organisms and found a set of constrained characteristics
such as network density and number of regulators. Our results allowed us to
estimate the number of interactions that complete networks would have, a
valuable insight that could aid in the daunting task of network curation,
prediction, and validation. Using state-of-the-art statistical approaches, we
also provided new evidence to settle a previously stated controversy that
raised the possibility of complete biological networks being random and
therefore attributing the observed scale-free properties to an artifact
emerging from the sampling process during network discovery. Furthermore, we
identified a set of properties that enabled us to assess the consistency of the
connectivity distribution for various GRNs against different alternative
statistical distributions. Our results favor the hypothesis that highly
connected nodes (hubs) are not a consequence of network incompleteness.
Finally, an interaction coverage computed for the GRNs as a proxy for
completeness revealed that high-throughput based reconstructions of GRNs could
yield biased networks with a low average clustering coefficient, showing that
classical targeted discovery of interactions is still needed.Comment: 28 pages, 5 figures, 12 pages supplementary informatio
Enabling Social Applications via Decentralized Social Data Management
An unprecedented information wealth produced by online social networks,
further augmented by location/collocation data, is currently fragmented across
different proprietary services. Combined, it can accurately represent the
social world and enable novel socially-aware applications. We present
Prometheus, a socially-aware peer-to-peer service that collects social
information from multiple sources into a multigraph managed in a decentralized
fashion on user-contributed nodes, and exposes it through an interface
implementing non-trivial social inferences while complying with user-defined
access policies. Simulations and experiments on PlanetLab with emulated
application workloads show the system exhibits good end-to-end response time,
low communication overhead and resilience to malicious attacks.Comment: 27 pages, single ACM column, 9 figures, accepted in Special Issue of
Foundations of Social Computing, ACM Transactions on Internet Technolog
- …