490 research outputs found
A survey of statistical network models
Networks are ubiquitous in science and have become a focal point for
discussion in everyday life. Formal statistical models for the analysis of
network data have emerged as a major topic of interest in diverse areas of
study, and most of these involve a form of graphical representation.
Probability models on graphs date back to 1959. Along with empirical studies in
social psychology and sociology from the 1960s, these early works generated an
active network community and a substantial literature in the 1970s. This effort
moved into the statistical literature in the late 1970s and 1980s, and the past
decade has seen a burgeoning network literature in statistical physics and
computer science. The growth of the World Wide Web and the emergence of online
networking communities such as Facebook, MySpace, and LinkedIn, and a host of
more specialized professional network communities has intensified interest in
the study of networks and network data. Our goal in this review is to provide
the reader with an entry point to this burgeoning literature. We begin with an
overview of the historical development of statistical network modeling and then
we introduce a number of examples that have been studied in the network
literature. Our subsequent discussion focuses on a number of prominent static
and dynamic network models and their interconnections. We emphasize formal
model descriptions, and pay special attention to the interpretation of
parameters and their estimation. We end with a description of some open
problems and challenges for machine learning and statistics.Comment: 96 pages, 14 figures, 333 reference
Community Detection on Evolving Graphs
Clustering is a fundamental step in many information-retrieval and data-mining applications. Detecting clusters in graphs is also a key tool for finding the community structure in social and behavioral networks. In many of these applications, the input graph evolves over time in a continual and decentralized manner, and, to maintain a good clustering, the clustering algorithm needs to repeatedly probe the graph. Furthermore, there are often limitations on the frequency of such probes, either imposed explicitly by the online platform (e.g., in the case of crawling proprietary social networks like twitter) or implicitly because of resource limitations (e.g., in the case of crawling the web). In this paper, we study a model of clustering on evolving graphs that captures this aspect of the problem. Our model is based on the classical stochastic block model, which has been used to assess rigorously the quality of various static clustering methods. In our model, the algorithm is supposed to reconstruct the planted clustering, given the ability to query for small pieces of local information about the graph, at a limited rate. We design and analyze clustering algorithms that work in this model, and show asymptotically tight upper and lower bounds on their accuracy. Finally, we perform simulations, which demonstrate that our main asymptotic results hold true also in practice
Graphs in machine learning: an introduction
Graphs are commonly used to characterise interactions between objects of
interest. Because they are based on a straightforward formalism, they are used
in many scientific fields from computer science to historical sciences. In this
paper, we give an introduction to some methods relying on graphs for learning.
This includes both unsupervised and supervised methods. Unsupervised learning
algorithms usually aim at visualising graphs in latent spaces and/or clustering
the nodes. Both focus on extracting knowledge from graph topologies. While most
existing techniques are only applicable to static graphs, where edges do not
evolve through time, recent developments have shown that they could be extended
to deal with evolving networks. In a supervised context, one generally aims at
inferring labels or numerical values attached to nodes using both the graph
and, when they are available, node characteristics. Balancing the two sources
of information can be challenging, especially as they can disagree locally or
globally. In both contexts, supervised and un-supervised, data can be
relational (augmented with one or several global graphs) as described above, or
graph valued. In this latter case, each object of interest is given as a full
graph (possibly completed by other characteristics). In this context, natural
tasks include graph clustering (as in producing clusters of graphs rather than
clusters of nodes in a single graph), graph classification, etc. 1 Real
networks One of the first practical studies on graphs can be dated back to the
original work of Moreno [51] in the 30s. Since then, there has been a growing
interest in graph analysis associated with strong developments in the modelling
and the processing of these data. Graphs are now used in many scientific
fields. In Biology [54, 2, 7], for instance, metabolic networks can describe
pathways of biochemical reactions [41], while in social sciences networks are
used to represent relation ties between actors [66, 56, 36, 34]. Other examples
include powergrids [71] and the web [75]. Recently, networks have also been
considered in other areas such as geography [22] and history [59, 39]. In
machine learning, networks are seen as powerful tools to model problems in
order to extract information from data and for prediction purposes. This is the
object of this paper. For more complete surveys, we refer to [28, 62, 49, 45].
In this section, we introduce notations and highlight properties shared by most
real networks. In Section 2, we then consider methods aiming at extracting
information from a unique network. We will particularly focus on clustering
methods where the goal is to find clusters of vertices. Finally, in Section 3,
techniques that take a series of networks into account, where each network i
- …