18,460 research outputs found
A survey of statistical network models
Networks are ubiquitous in science and have become a focal point for
discussion in everyday life. Formal statistical models for the analysis of
network data have emerged as a major topic of interest in diverse areas of
study, and most of these involve a form of graphical representation.
Probability models on graphs date back to 1959. Along with empirical studies in
social psychology and sociology from the 1960s, these early works generated an
active network community and a substantial literature in the 1970s. This effort
moved into the statistical literature in the late 1970s and 1980s, and the past
decade has seen a burgeoning network literature in statistical physics and
computer science. The growth of the World Wide Web and the emergence of online
networking communities such as Facebook, MySpace, and LinkedIn, and a host of
more specialized professional network communities has intensified interest in
the study of networks and network data. Our goal in this review is to provide
the reader with an entry point to this burgeoning literature. We begin with an
overview of the historical development of statistical network modeling and then
we introduce a number of examples that have been studied in the network
literature. Our subsequent discussion focuses on a number of prominent static
and dynamic network models and their interconnections. We emphasize formal
model descriptions, and pay special attention to the interpretation of
parameters and their estimation. We end with a description of some open
problems and challenges for machine learning and statistics.Comment: 96 pages, 14 figures, 333 reference
LINE: Large-scale Information Network Embedding
This paper studies the problem of embedding very large information networks
into low-dimensional vector spaces, which is useful in many tasks such as
visualization, node classification, and link prediction. Most existing graph
embedding methods do not scale for real world information networks which
usually contain millions of nodes. In this paper, we propose a novel network
embedding method called the "LINE," which is suitable for arbitrary types of
information networks: undirected, directed, and/or weighted. The method
optimizes a carefully designed objective function that preserves both the local
and global network structures. An edge-sampling algorithm is proposed that
addresses the limitation of the classical stochastic gradient descent and
improves both the effectiveness and the efficiency of the inference. Empirical
experiments prove the effectiveness of the LINE on a variety of real-world
information networks, including language networks, social networks, and
citation networks. The algorithm is very efficient, which is able to learn the
embedding of a network with millions of vertices and billions of edges in a few
hours on a typical single machine. The source code of the LINE is available
online.Comment: WWW 201
A Very Brief Introduction to Machine Learning With Applications to Communication Systems
Given the unprecedented availability of data and computing resources, there
is widespread renewed interest in applying data-driven machine learning methods
to problems for which the development of conventional engineering solutions is
challenged by modelling or algorithmic deficiencies. This tutorial-style paper
starts by addressing the questions of why and when such techniques can be
useful. It then provides a high-level introduction to the basics of supervised
and unsupervised learning. For both supervised and unsupervised learning,
exemplifying applications to communication networks are discussed by
distinguishing tasks carried out at the edge and at the cloud segments of the
network at different layers of the protocol stack
Stochastic Online Shortest Path Routing: The Value of Feedback
This paper studies online shortest path routing over multi-hop networks. Link
costs or delays are time-varying and modeled by independent and identically
distributed random processes, whose parameters are initially unknown. The
parameters, and hence the optimal path, can only be estimated by routing
packets through the network and observing the realized delays. Our aim is to
find a routing policy that minimizes the regret (the cumulative difference of
expected delay) between the path chosen by the policy and the unknown optimal
path. We formulate the problem as a combinatorial bandit optimization problem
and consider several scenarios that differ in where routing decisions are made
and in the information available when making the decisions. For each scenario,
we derive a tight asymptotic lower bound on the regret that has to be satisfied
by any online routing policy. These bounds help us to understand the
performance improvements we can expect when (i) taking routing decisions at
each hop rather than at the source only, and (ii) observing per-link delays
rather than end-to-end path delays. In particular, we show that (i) is of no
use while (ii) can have a spectacular impact. Three algorithms, with a
trade-off between computational complexity and performance, are proposed. The
regret upper bounds of these algorithms improve over those of the existing
algorithms, and they significantly outperform state-of-the-art algorithms in
numerical experiments.Comment: 18 page
- …