EvalNE : a framework for evaluating network embeddings on link prediction
In this paper, we present EvalNE, a Python toolbox for evaluating network embedding methods on
link prediction tasks. Link prediction is one of the most popular choices for evaluating the quality of
network embeddings. However, the complexity of this task requires a carefully designed evaluation
pipeline to provide consistent, reproducible and comparable results. EvalNE simplifies this process
by providing automation and abstraction of tasks such as hyper-parameter tuning and model validation,
edge sampling and negative edge sampling, computation of edge embeddings from node
embeddings, and evaluation metrics. The toolbox allows for the evaluation of any off-the-shelf embedding
method without the need to write extra code. Moreover, it can be used to evaluate
link prediction methods directly and integrates several link prediction heuristics as baselines. Finally, to demonstrate
the usefulness of EvalNE in practice, we conduct an extensive analysis in which we replicate
the experimental sections of several influential papers in the community.
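The evaluation pipeline the abstract describes (edge sampling, negative edge sampling, and metric computation) can be sketched in a few lines. The code below is a minimal illustration, not EvalNE's actual API: it splits a graph's edges into train/test sets, samples non-edges as negatives, scores candidate pairs with the common-neighbors heuristic, and reports AUC. All function and variable names are our own.

```python
import random

def common_neighbors(adj, u, v):
    # Score a candidate pair by the number of shared train-graph neighbors.
    return len(adj[u] & adj[v])

def evaluate_auc(edges, n_nodes, test_frac=0.3, n_neg=100, seed=0):
    """Split edges into train/test, sample non-edges as negatives, and
    compute the AUC of the common-neighbors link prediction heuristic."""
    rng = random.Random(seed)
    edges = list(edges)
    rng.shuffle(edges)
    n_test = max(1, int(len(edges) * test_frac))
    test, train = edges[:n_test], edges[n_test:]

    # Adjacency built from training edges only: test edges stay hidden.
    adj = {i: set() for i in range(n_nodes)}
    for u, v in train:
        adj[u].add(v)
        adj[v].add(u)

    # Negative sampling: node pairs that are edges in neither split.
    edge_set = set(map(frozenset, edges))
    negatives = []
    while len(negatives) < n_neg:
        u, v = rng.randrange(n_nodes), rng.randrange(n_nodes)
        if u != v and frozenset((u, v)) not in edge_set:
            negatives.append((u, v))

    # AUC: probability that a held-out edge outscores a sampled non-edge,
    # with ties counted as half.
    wins = ties = 0
    for u, v in test:
        for x, y in negatives:
            s_pos = common_neighbors(adj, u, v)
            s_neg = common_neighbors(adj, x, y)
            if s_pos > s_neg:
                wins += 1
            elif s_pos == s_neg:
                ties += 1
    return (wins + 0.5 * ties) / (len(test) * len(negatives))
```

Swapping a different scoring function into this skeleton is exactly the kind of variation a toolbox like EvalNE automates: the split, the negative sampling, and the metric stay fixed while the predictor changes.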
Toward link predictability of complex networks
The organization of real networks usually embodies both regularities and irregularities, and, in principle, the former can be modeled. The extent to which the formation of a network can be explained coincides with our ability to predict missing links. To understand network organization, we should be able to estimate link predictability. We assume that the regularity of a network is reflected in the consistency of structural features before and after a random removal of a small set of links. Based on the perturbation of the adjacency matrix, we propose a universal structural consistency index that requires no prior knowledge of network organization. Extensive experiments on disparate real-world networks demonstrate that (i) structural consistency is a good estimate of link predictability and (ii) a derived algorithm outperforms state-of-the-art link prediction methods in both accuracy and robustness. This analysis has further applications in evaluating link prediction algorithms and in monitoring sudden changes in the mechanisms of evolving networks. It provides fundamental insights into the research fields mentioned above and will foster the development of advanced information filtering technologies of interest to information technology practitioners.
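The structural consistency index can be made concrete: remove a random fraction p of links (the perturbation set), eigendecompose the remaining adjacency matrix, correct each eigenvalue to first order using the removed links while keeping the eigenvectors fixed, and measure what fraction of the top-ranked unobserved pairs in the perturbed matrix are exactly the removed links. The code below is a minimal NumPy sketch under that reading of the abstract; the function name and defaults are our own assumptions.

```python
import numpy as np

def structural_consistency(A, p=0.1, seed=0):
    """Estimate link predictability via first-order matrix perturbation.
    A: symmetric 0/1 adjacency matrix (no self-loops)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    iu, ju = np.triu_indices(n, k=1)
    edge_idx = np.where(A[iu, ju] > 0)[0]
    n_perturb = max(1, int(len(edge_idx) * p))
    perturb = rng.choice(edge_idx, size=n_perturb, replace=False)

    # dA holds the removed links; A_R is the observed remainder.
    dA = np.zeros_like(A, dtype=float)
    dA[iu[perturb], ju[perturb]] = 1.0
    dA += dA.T
    A_R = A - dA

    # First-order perturbed matrix: keep the eigenvectors of A_R and
    # shift each eigenvalue by x_k^T dA x_k (X is orthonormal).
    lam, X = np.linalg.eigh(A_R)
    dlam = np.einsum('ik,ij,jk->k', X, dA, X)
    A_tilde = (X * (lam + dlam)) @ X.T

    # Rank pairs unobserved in A_R; the index is the fraction of the
    # top-|dE| scores that are actually the removed links.
    scores = A_tilde[iu, ju]
    cand = np.where(A_R[iu, ju] == 0)[0]
    top = cand[np.argsort(scores[cand])[::-1][:n_perturb]]
    removed = set(perturb.tolist())
    return sum(1 for t in top if t in removed) / n_perturb
```

A value near 1 means the removed links were largely recoverable from the surviving structure (high regularity, high predictability); a value near 0 means the perturbation looked like noise.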
Evaluating Overfit and Underfit in Models of Network Community Structure
A common data mining task on networks is community detection, which seeks an
unsupervised decomposition of a network into structural groups based on
statistical regularities in the network's connectivity. Although many methods
exist, the No Free Lunch theorem for community detection implies that each
makes some kind of tradeoff, and no algorithm can be optimal on all inputs.
Thus, different algorithms will over or underfit on different inputs, finding
more, fewer, or just different communities than is optimal, and evaluation
methods that use a metadata partition as a ground truth will produce misleading
conclusions about general accuracy. Here, we present a broad evaluation of over
and underfitting in community detection, comparing the behavior of 16
state-of-the-art community detection algorithms on a novel and structurally
diverse corpus of 406 real-world networks. We find that (i) algorithms vary
widely both in the number of communities they find and in their corresponding
composition, given the same input, (ii) algorithms can be clustered into
distinct high-level groups based on similarities of their outputs on real-world
networks, and (iii) these differences induce wide variation in accuracy on link
prediction and link description tasks. We introduce a new diagnostic for
evaluating overfitting and underfitting in practice, and use it to roughly
divide community detection methods into general and specialized learning
algorithms. Across methods and inputs, Bayesian techniques based on the
stochastic block model and a minimum description length approach to
regularization represent the best general learning approach, but can be
outperformed under specific circumstances. These results introduce both a
theoretically principled approach to evaluate over and underfitting in models
of network community structure and a realistic benchmark by which new methods
may be evaluated and compared.
Comment: 22 pages, 13 figures, 3 tables
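The link-prediction diagnostic from point (iii) can be illustrated with a small sketch: hold out a fraction of the network's edges, score each candidate pair 1 if the partition places its endpoints in the same community and 0 otherwise, and compute AUC against sampled non-edges. A partition that overfits (many tiny blocks) or underfits (few large blocks) will rank held-out edges poorly. This is our own minimal illustration of the idea, not the paper's evaluation code.

```python
import random

def partition_auc(edges, partition, heldout_frac=0.3, n_neg=100, seed=0):
    """Score a community partition by how well 'same community' predicts
    held-out links: pairs inside one block score 1, pairs across blocks 0."""
    rng = random.Random(seed)
    edges = list(edges)
    rng.shuffle(edges)
    n_test = max(1, int(len(edges) * heldout_frac))
    test = edges[:n_test]
    nodes = sorted({u for e in edges for u in e})
    edge_set = set(map(frozenset, edges))

    def score(u, v):
        return 1.0 if partition[u] == partition[v] else 0.0

    # Sample non-edges as negatives (duplicates allowed for simplicity).
    negatives = []
    while len(negatives) < n_neg:
        u, v = rng.choice(nodes), rng.choice(nodes)
        if u != v and frozenset((u, v)) not in edge_set:
            negatives.append((u, v))

    # AUC: held-out edge vs. sampled non-edge, ties counted as half.
    wins = ties = 0
    for u, v in test:
        for x, y in negatives:
            if score(u, v) > score(x, y):
                wins += 1
            elif score(u, v) == score(x, y):
                ties += 1
    return (wins + 0.5 * ties) / (len(test) * len(negatives))
```

Running this with the partitions produced by two different algorithms on the same graph gives one concrete, metadata-free way to compare them, which is the spirit of the benchmark the abstract describes.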
Network Model Selection for Task-Focused Attributed Network Inference
Networks are models representing relationships between entities. Often these
relationships are explicitly given, or we must learn a representation which
generalizes and predicts observed behavior in underlying individual data (e.g.
attributes or labels). Whether given or inferred, choosing the best
representation affects subsequent tasks and questions on the network. This work
focuses on model selection to evaluate network representations from data,
focusing on fundamental predictive tasks on networks. We present a modular
methodology using general, interpretable network models, task neighborhood
functions found across domains, and several criteria for robust model
selection. We demonstrate our methodology on three online user activity
datasets and show that network model selection for the appropriate network task
vs. an alternate task increases performance by an order of magnitude in our
experiments.
Predicting the relevance of distributional semantic similarity with contextual information
Using distributional analysis methods to compute semantic proximity links between words has become commonplace in NLP. The resulting relations are often noisy or difficult to interpret in general. This paper focuses on the issues of evaluating a distributional resource and filtering the relations it contains, but instead of considering it in abstracto, we focus on pairs of words in context. In a discourse, we are interested in knowing whether the semantic link between two items is a by-product of textual coherence or is irrelevant. We first set up a human annotation of semantic links with and without contextual information to show the importance of the textual context in evaluating the relevance of semantic similarity, and to assess the prevalence of actual semantic relations between word tokens. We then built an experiment to automatically predict this relevance, evaluated on the reliable reference data set produced by the first annotation. We show that in-document information greatly improves the prediction made by the similarity level alone.
Leveraging Friendship Networks for Dynamic Link Prediction in Social Interaction Networks
On-line social networks (OSNs) often contain many different types of
relationships between users. When studying the structure of OSNs such as
Facebook, two of the most commonly studied networks are friendship and
interaction networks. The link prediction problem in friendship networks has
been heavily studied. There has also been prior work on link prediction in
interaction networks, independent of friendship networks. In this paper, we
study the predictive power of combining friendship and interaction networks. We
hypothesize that, by leveraging friendship networks, we can improve the
accuracy of link prediction in interaction networks. We augment several
interaction link prediction algorithms to incorporate friendships and predicted
friendships. From experiments on Facebook data, we find that incorporating
friendships into interaction link prediction algorithms results in higher
accuracy, but incorporating predicted friendships does not, compared to
incorporating current friendships.
Comment: To appear in ICWSM 2018. This version corrects some minor errors in
Table 1. MATLAB code available at
https://github.com/IdeasLabUT/Friendship-Interaction-Predictio
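The augmentation strategy, feeding friendship information into an interaction-network link predictor, can be sketched with a simple heuristic. The snippet below is a hypothetical illustration, not the authors' MATLAB implementation: it scores a candidate interaction edge by common neighbors in the interaction network plus a weighted bonus when the pair is also connected in the friendship network. The function name and the weight alpha are our own.

```python
def combined_score(inter_adj, friend_adj, u, v, alpha=0.5):
    """Hypothetical combined heuristic: common neighbors in the interaction
    network, boosted when (u, v) is also an edge in the friendship network."""
    # Common neighbors computed on the interaction network only.
    cn = len(inter_adj.get(u, set()) & inter_adj.get(v, set()))
    # Friendship membership acts as an auxiliary binary feature on the pair.
    is_friend = 1.0 if v in friend_adj.get(u, set()) else 0.0
    return cn + alpha * is_friend
```

Replacing `friend_adj` with a predicted friendship graph is the paper's second variant; its finding is that this substitution does not help relative to using the observed friendships.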