Leveraging Node Attributes for Incomplete Relational Data
Relational data are usually highly incomplete in practice, which inspires us
to leverage side information to improve the performance of community detection
and link prediction. This paper presents a Bayesian probabilistic approach that
incorporates various kinds of node attributes encoded in binary form in
relational models with Poisson likelihood. Our method works flexibly with both
directed and undirected relational networks. The inference can be done by
efficient Gibbs sampling which leverages sparsity of both networks and node
attributes. Extensive experiments show that our models achieve
state-of-the-art link prediction results, especially with highly incomplete
relational data. Comment: Appearing in ICML 201
Learning Edge Representations via Low-Rank Asymmetric Projections
We propose a new method for embedding graphs while preserving directed edge
information. Learning such continuous-space vector representations (or
embeddings) of nodes in a graph is an important first step for using network
information (from social networks, user-item graphs, knowledge bases, etc.) in
many machine learning tasks.
Unlike previous work, we (1) explicitly model an edge as a function of node
embeddings, and we (2) propose a novel objective, the "graph likelihood", which
contrasts information from sampled random walks with non-existent edges.
Individually, both of these contributions improve the learned representations,
especially when there are memory constraints on the total size of the
embeddings. When combined, our contributions enable us to significantly improve
the state-of-the-art by learning more concise representations that better
preserve the graph structure.
We evaluate our method on a variety of link-prediction tasks, including social
networks, collaboration networks, and protein interactions, showing that our
proposed method learns representations with error reductions of up to 76% and
55% on directed and undirected graphs, respectively. In addition, we show that
the representations learned by our method are quite space efficient, producing
embeddings which have higher structure-preserving accuracy but are 10 times
smaller.
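To make the idea of "an edge as a function of node embeddings" concrete, here is a minimal sketch, not the paper's implementation. It assumes the edge function is a bilinear form in which two hypothetical low-rank projection matrices, `L` and `R`, act on the source and target embeddings; the function names and toy values are illustrative only.

```python
# Hypothetical sketch: score a directed edge u -> v with a low-rank,
# asymmetric bilinear form. Names (edge_score, L, R) are assumptions
# made for illustration; they do not come from the paper.

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def edge_score(y_u, y_v, L, R):
    """Score u -> v as (L y_u) . (R y_v); L and R have shape (b, d), b < d."""
    return dot(matvec(L, y_u), matvec(R, y_v))

# Toy embeddings with d = 3 and projection rank b = 1.
y_a = [1.0, 0.0, 2.0]
y_b = [0.0, 1.0, 1.0]
L = [[1.0, 0.0, 0.0]]  # projects the source node
R = [[0.0, 1.0, 0.0]]  # projects the target node

forward = edge_score(y_a, y_b, L, R)   # a -> b
backward = edge_score(y_b, y_a, L, R)  # b -> a
```

Because `L` and `R` differ, `forward` and `backward` are not equal, which is what lets a single embedding per node still represent edge direction.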
Modeling homophily and stochastic equivalence in symmetric relational data
This article discusses a latent variable model for inference and prediction
of symmetric relational data.
The model, based on the idea of the eigenvalue decomposition, represents the
relationship between two nodes as the weighted inner-product of node-specific
vectors of latent characteristics. This "eigenmodel" generalizes other
popular latent variable models, such as latent class and distance models: It is
shown mathematically that any latent class or distance model has a
representation as an eigenmodel, but not vice versa. The practical implications
of this are examined in the context of three real datasets, for which the
eigenmodel's out-of-sample predictive performance is as good as or better than
that of the other two models. Comment: 12 pages, 4 figures, 1 table
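The "weighted inner-product of node-specific vectors" can be sketched as follows. This is an illustrative reading of the abstract only: it omits any covariates, intercept, or link function the full model may use, and the function name and toy values are assumptions.

```python
# Hypothetical sketch of an eigenmodel-style score: sum_k lam[k] * u_i[k] * u_j[k].
# Weights may be negative, unlike a plain inner product.

def eigenmodel_score(u_i, u_j, lam):
    """Weighted inner product of two latent vectors with weights lam."""
    return sum(l * a * b for l, a, b in zip(lam, u_i, u_j))

lam = [2.0, -1.0]  # one positive and one negative weight

# Two nodes sharing the first latent characteristic score high.
same_first = eigenmodel_score([1.0, 0.0], [1.0, 0.0], lam)

# Two nodes sharing the second characteristic score low, because the
# weight on that dimension is negative.
same_second = eigenmodel_score([0.0, 1.0], [0.0, 1.0], lam)
```

A positive weight rewards similarity on a dimension, while a negative weight penalizes it; allowing both signs is what gives the weighted form more flexibility than an unweighted inner product.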
The Strength of Arcs and Edges in Interaction Networks: Elements of a Model-Based Approach
When analyzing interaction networks, it is common to interpret the amount of
interaction between two nodes as the strength of their relationship. We argue
that this interpretation may not be appropriate, since the interaction between
a pair of nodes could potentially be explained by characteristics of the
individual nodes that compose the pair rather than by pair-specific features. In
interaction networks, where edges or arcs are count-valued, the above scenario
corresponds to a model of independence for the expected interaction in the
network, and consequently we propose that the notions of arc strength and edge
strength be understood as departures from this model of independence. We
discuss how our notion of arc/edge strength can be used as a guide to studying
network structure, and in particular we develop a latent arc strength
stochastic blockmodel for directed interaction networks. We illustrate our
approach by studying the interactions among the Kolkata users of the myGamma
mobile network. Comment: 23 pages, 5 figures, 4 tables
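As a rough illustration of "departure from a model of independence" for count-valued arcs, the sketch below uses the simplest multiplicative baseline, in which the expected count is (row total × column total) / grand total, and reports the ratio of observed to expected. This is an assumption made for illustration: the paper's arc strength is defined within its own model, the baseline here ignores the fact that self-interactions are impossible, and the counts are invented.

```python
# Hypothetical sketch: observed counts versus an independence baseline.
# counts[i][j] is the number of interactions sent from node i to node j.
# The data and the ratio-based "departure" are illustrative assumptions.

def independence_expected(counts):
    """Expected counts under independence: out_i * in_j / total."""
    n = len(counts)
    out_totals = [sum(row) for row in counts]
    in_totals = [sum(counts[i][j] for i in range(n)) for j in range(n)]
    total = sum(out_totals)
    return [[out_totals[i] * in_totals[j] / total for j in range(n)]
            for i in range(n)]

counts = [
    [0, 6, 2],
    [3, 0, 1],
    [1, 2, 0],
]

expected = independence_expected(counts)

# Ratio above 1: more interaction on this arc than node activity alone predicts.
departure_0_to_1 = counts[0][1] / expected[0][1]
```

Here node 0 sends 8 interactions and node 1 receives 8 out of 15 in total, so the baseline expects 64/15 on the arc 0 → 1; the observed count of 6 exceeds that.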
Consistency of adjacency spectral embedding for the mixed membership stochastic blockmodel
The mixed membership stochastic blockmodel is a statistical model for a
graph, which extends the stochastic blockmodel by allowing every node to
randomly choose a different community each time a decision of whether to form
an edge is made. Whereas spectral analysis for the stochastic blockmodel is
increasingly well established, theory for the mixed membership case is
considerably less developed. Here we show that adjacency spectral embedding
into $\mathbb{R}^k$, followed by fitting the minimum volume enclosing convex
$k$-polytope to the $k-1$ principal components, leads to a consistent estimate
of a $k$-community mixed membership stochastic blockmodel. The key is to
identify a direct correspondence between the mixed membership stochastic
blockmodel and the random dot product graph, which greatly facilitates
theoretical analysis. Specifically, a $2 \to \infty$ norm and central
limit theorem for the random dot product graph are exploited to respectively
show consistency and partially correct the bias of the procedure. Comment: 12 pages, 6 figures
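The correspondence between the two models can be checked numerically in a small case. The sketch below assumes a two-community model whose block probability matrix `B` is positive definite, so that it factors as `B = V V^T`; each node's latent position is then the convex combination of the rows of `V` given by its membership vector. The matrix values and function names are illustrative assumptions, not taken from the paper.

```python
import math

# Hypothetical sketch: a 2-community mixed membership blockmodel written as
# a random dot product graph. Assumes B is symmetric positive definite.

def cholesky_2x2(B):
    """Return lower-triangular V with V V^T = B for a 2x2 SPD matrix B."""
    l11 = math.sqrt(B[0][0])
    l21 = B[0][1] / l11
    l22 = math.sqrt(B[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

def mmsbm_prob(pi_i, pi_j, B):
    """Edge probability pi_i^T B pi_j under mixed membership."""
    return sum(pi_i[a] * B[a][b] * pi_j[b]
               for a in range(2) for b in range(2))

def latent_position(pi, V):
    """Convex combination of the rows of V with weights pi."""
    return [sum(pi[a] * V[a][k] for a in range(2)) for k in range(2)]

B = [[0.5, 0.1], [0.1, 0.4]]
V = cholesky_2x2(B)

pi_1 = [0.7, 0.3]  # membership weights sum to 1
pi_2 = [0.2, 0.8]

x_1 = latent_position(pi_1, V)
x_2 = latent_position(pi_2, V)

p_blockmodel = mmsbm_prob(pi_1, pi_2, B)
p_dot_product = sum(a * b for a, b in zip(x_1, x_2))
```

The two probabilities agree, and every latent position lies in the convex hull of the rows of `V`, which is why fitting an enclosing polytope to embedded points is a natural way to recover the community vertices.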
An efficient and principled method for detecting communities in networks
A fundamental problem in the analysis of network data is the detection of
network communities, groups of densely interconnected nodes, which may be
overlapping or disjoint. Here we describe a method for finding overlapping
communities based on a principled statistical approach using generative network
models. We show how the method can be implemented using a fast, closed-form
expectation-maximization algorithm that allows us to analyze networks of
millions of nodes in reasonable running times. We test the method both on
real-world networks and on synthetic benchmarks and find that it gives results
competitive with previous methods. We also show that the same approach can be
used to extract nonoverlapping community divisions via a relaxation method, and
demonstrate that the algorithm is competitively fast and accurate for the
nonoverlapping problem. Comment: 14 pages, 5 figures, 1 table