Estimating Graphlet Statistics via Lifting
Exploratory analysis over network data is often limited by the ability to
efficiently calculate graph statistics, which can provide a model-free
understanding of the macroscopic properties of a network. We introduce a
framework for estimating the graphlet count---the number of occurrences of a
small subgraph motif (e.g. a wedge or a triangle) in the network. For massive
graphs, where accessing the whole graph is not possible, the only viable
algorithms are those that make a limited number of vertex neighborhood queries.
We introduce a Monte Carlo sampling technique for graphlet counts, called {\em
Lifting}, which can simultaneously sample all graphlets of size up to $k$
vertices for arbitrary $k$. This is the first graphlet sampling method that can
provably sample every graphlet with positive probability and can sample
graphlets of arbitrary size $k$. We outline variants of lifted graphlet counts,
including the ordered, unordered, and shotgun estimators, random walk starts,
and parallel vertex starts. We prove that our graphlet count updates are
unbiased for the true graphlet count and have a controlled variance for all
graphlets. We compare the experimental performance of lifted graphlet counts to
the state-of-the-art graphlet sampling procedures: Waddling and the pairwise
subgraph random walk.
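The authors' Lifting estimator itself is not reproduced here, but the query model it operates in can be illustrated with a much simpler case: an unbiased Monte Carlo estimate of the wedge (2-path) count that touches only sampled vertex neighborhoods, never the whole graph. This is a hedged sketch; the toy graph and sample size are made up.

```python
import random

# Toy graph as adjacency sets (hypothetical; any vertex-neighborhood
# oracle would do -- the estimator never reads the whole graph at once).
graph = {
    0: {1, 2, 3},
    1: {0, 2},
    2: {0, 1, 3},
    3: {0, 2},
}

def exact_wedge_count(g):
    # A wedge (path on 3 vertices) is counted once per center vertex:
    # vertex v contributes C(deg(v), 2) wedges.
    return sum(d * (d - 1) // 2 for d in map(len, g.values()))

def sampled_wedge_estimate(g, n_samples, rng=random.Random(0)):
    # Unbiased Monte Carlo estimate: sample vertices uniformly, query
    # only their neighborhoods, and rescale by the number of vertices.
    verts = list(g)
    total = 0
    for _ in range(n_samples):
        d = len(g[rng.choice(verts)])
        total += d * (d - 1) // 2
    return len(verts) * total / n_samples
```

Counting larger motifs with controlled variance, as the paper does, is much harder; this only shows why per-vertex neighborhood queries suffice for an unbiased estimate.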
Neural Collective Entity Linking
Entity Linking aims to link entity mentions in texts to knowledge bases, and
neural models have achieved recent success in this task. However, most existing
methods rely on local contexts to resolve entities independently, which often
fail due to the sparsity of local information. To address this issue, we
propose a novel neural model for collective entity linking, named NCEL. NCEL
applies a Graph Convolutional Network to integrate both local
contextual features and global coherence information for entity linking. To
improve the computation efficiency, we approximately perform graph convolution
on a subgraph of adjacent entity mentions instead of those in the entire text.
We further introduce an attention scheme to improve the robustness of NCEL to
data noise and train the model on Wikipedia hyperlinks to avoid overfitting and
domain bias. In experiments, we evaluate NCEL on five publicly available
datasets to verify the linking performance as well as generalization ability.
We also conduct an extensive analysis of time complexity, the impact of key
modules, and qualitative results, which demonstrate the effectiveness and
efficiency of our proposed method.
Comment: 12 pages, 3 figures, COLING201
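NCEL's exact architecture is not given in this excerpt, but its core operation, graph convolution over a small subgraph of adjacent entity mentions, can be sketched in a few lines. Everything below (graph, feature sizes, random weights) is hypothetical, not the authors' implementation.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    # One graph-convolution step: mix each node's features with its
    # neighbours' via a row-normalised adjacency with self-loops,
    # then apply a learned linear map and a ReLU nonlinearity.
    a_hat = adj + np.eye(adj.shape[0])
    a_hat /= a_hat.sum(axis=1, keepdims=True)
    return np.maximum(a_hat @ feats @ weight, 0.0)

# Hypothetical subgraph of three adjacent entity mentions (a chain).
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4))    # stand-in for local contextual features
weight = rng.normal(size=(4, 2))   # learned parameters (random here)
out = gcn_layer(adj, feats, weight)  # one (3, 2) embedding per mention
```

Restricting the adjacency to a few neighbouring mentions, rather than every mention in the text, is what keeps each convolution cheap, which is the efficiency point the abstract makes.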
Matrices of forests, analysis of networks, and ranking problems
The matrices of spanning rooted forests are studied as a tool for analysing
the structure of networks and measuring their properties. The problems of
revealing the basic bicomponents, measuring vertex proximity, and ranking from
preference relations / sports competitions are considered. It is shown that the
vertex accessibility measure based on spanning forests has a number of
desirable properties. An interpretation for the stochastic matrix of
out-forests in terms of information dissemination is given.
Comment: 8 pages. This article draws heavily from arXiv:math/0508171.
Published in Proceedings of the First International Conference on Information
Technology and Quantitative Management (ITQM 2013). This version contains
some corrections and additions.
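The central object here is easy to compute directly. By the matrix-forest theorem (Chebotarev and Shamis), $Q = (I + L)^{-1}$, where $L$ is the graph Laplacian, has entries $q_{ij} = f_{ij}/f$: the share of spanning rooted forests in which $j$ lies in the tree rooted at $i$. A small sketch on a hypothetical 4-vertex graph:

```python
import numpy as np

# Hypothetical 4-vertex undirected graph (adjacency matrix).
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)

laplacian = np.diag(adj.sum(axis=1)) - adj

# Matrix-forest theorem: Q = (I + L)^{-1}.  Entry q[i, j] is the
# proportion of spanning rooted forests in which vertex j belongs to
# the tree rooted at i; each row sums to 1, so q[i, j] serves as a
# proximity (accessibility) measure from i to j.
q = np.linalg.inv(np.eye(4) + laplacian)
```

Because rows of $Q$ are nonnegative and sum to 1, the entries can be read as relative accessibilities, which is the "desirable properties" claim in the abstract.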
Inference, Learning, and Population Size: Projectivity for SRL Models
A subtle difference between propositional and relational data is that in many
relational models, marginal probabilities depend on the population or domain
size. This paper connects the dependence on population size to the classic
notion of projectivity from statistical theory: Projectivity implies that
relational predictions are robust with respect to changes in domain size. We
discuss projectivity for a number of common SRL systems, and identify syntactic
fragments that are guaranteed to yield projective models. The syntactic
conditions are restrictive, which suggests that projectivity is difficult to
achieve in SRL, and care must be taken when working with different domain
sizes.
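As a toy illustration of the phenomenon (my own example, not from the paper): take one unary predicate S over a domain of size n and the uniform distribution over worlds satisfying the hard constraint "at least one individual has S". The marginal P(S(a)) equals 2^{n-1}/(2^n - 1), which shrinks as n grows, so this model is not projective.

```python
from fractions import Fraction
from itertools import product

def marginal(n):
    # Uniform distribution over the 2**n - 1 worlds on n individuals
    # that satisfy "at least one individual has property S";
    # return the marginal probability that the first individual has S.
    worlds = [w for w in product([0, 1], repeat=n) if any(w)]
    return Fraction(sum(w[0] for w in worlds), len(worlds))

# The marginal drifts with domain size: 1, 2/3, 4/7, 8/15, ...
print([marginal(n) for n in (1, 2, 3, 4)])
```

A projective model would return the same marginal for every n; here the dependence on n is exactly the robustness failure the abstract warns about.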
Subgraphs in preferential attachment models
We consider subgraph counts in general preferential attachment models with
power-law degree exponent $\tau$. For all subgraphs $H$, we find the scaling
of the expected number of subgraphs as a power of the number of vertices. We
prove our results on the expected number of subgraphs by defining an
optimization problem that finds the optimal subgraph structure in terms of the
indices of the vertices that together span it and by using the representation
of the preferential attachment model as a P\'olya urn model.
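The paper's results are asymptotic, but the objects involved are easy to simulate. Below is a hedged sketch (a simplified attachment rule, not the general model analysed in the paper) of a preferential attachment graph plus an exact count of one subgraph, the triangle; the sanity check uses $K_4$, which contains exactly 4 triangles.

```python
import random
from itertools import combinations

def preferential_attachment(n, m, rng=random.Random(0)):
    # Simplified preferential attachment: each new vertex attaches up
    # to m edges to existing vertices drawn from a degree-weighted pool.
    adj = {v: set() for v in range(n)}
    pool = list(range(m))                 # seed vertices
    for v in range(m, n):
        targets = {rng.choice(pool) for _ in range(m)}
        for u in targets:
            adj[v].add(u)
            adj[u].add(v)
            pool += [u, v]                # keeps weights ~ degree
    return adj

def triangle_count(adj):
    # Count each triangle once, from its smallest vertex.
    return sum(1 for u in adj
               for v, w in combinations(sorted(x for x in adj[u] if x > u), 2)
               if w in adj[v])

# Sanity check graph: the complete graph K4 has exactly 4 triangles.
k4 = {v: set(range(4)) - {v} for v in range(4)}
```

Running `triangle_count` on graphs generated with growing `n` is a quick empirical way to eyeball the power-of-n scaling the paper derives.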