
    EvalNE : a framework for evaluating network embeddings on link prediction

    In this paper, we present EvalNE, a Python toolbox for evaluating network embedding methods on link prediction tasks. Link prediction is one of the most popular choices for evaluating the quality of network embeddings. However, the complexity of this task requires a carefully designed evaluation pipeline to provide consistent, reproducible and comparable results. EvalNE simplifies this process by providing automation and abstraction of tasks such as hyper-parameter tuning and model validation, edge sampling and negative edge sampling, computation of edge embeddings from node embeddings, and evaluation metrics. The toolbox allows for the evaluation of any off-the-shelf embedding method without the need to write extra code. Moreover, it can also be used for evaluating link prediction methods and integrates several link prediction heuristics as baselines. Finally, demonstrating the usefulness of EvalNE in practice, we conduct an extensive analysis where we replicate the experimental sections of several influential papers in the community.
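
Two of the pipeline steps named in this abstract, negative edge sampling and computing edge embeddings from node embeddings, can be illustrated with a minimal NumPy sketch. It does not use EvalNE's own API: the function names, the operator names, and the toy graph are assumptions made for illustration only.

```python
import numpy as np

def edge_embeddings(node_emb, edges, op="hadamard"):
    """Combine the two endpoint embeddings of each edge into one edge vector."""
    u = node_emb[edges[:, 0]]
    v = node_emb[edges[:, 1]]
    if op == "hadamard":
        return u * v
    if op == "average":
        return (u + v) / 2.0
    if op == "l1":
        return np.abs(u - v)
    if op == "l2":
        return (u - v) ** 2
    raise ValueError(f"unknown operator: {op}")

def sample_negative_edges(adj, n_samples, rng):
    """Draw distinct unconnected node pairs from an undirected binary graph."""
    n = adj.shape[0]
    available = int(np.sum(adj[np.triu_indices(n, k=1)] == 0))
    if n_samples > available:
        raise ValueError("not enough non-edges to sample from")
    negatives = set()
    while len(negatives) < n_samples:
        u, v = (int(x) for x in rng.integers(0, n, size=2))
        if u != v and adj[u, v] == 0:
            negatives.add((min(u, v), max(u, v)))
    return np.array(sorted(negatives))

# Toy graph (hypothetical): 4 nodes, edges 0-1, 0-2, 1-2, 2-3.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]])
node_emb = np.arange(8, dtype=float).reshape(4, 2)  # stand-in embeddings

pos = np.array([[0, 1], [2, 3]])                    # held-out true edges
neg = sample_negative_edges(adj, 2, np.random.default_rng(0))

# Feature matrix and labels for a downstream binary classifier.
X = np.vstack([edge_embeddings(node_emb, pos), edge_embeddings(node_emb, neg)])
y = np.array([1, 1, 0, 0])
```

The resulting `X` and `y` could be passed to any binary classifier. A full evaluation would also hold out the positive edges before the embeddings are trained, which this sketch leaves out.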

    Toward link predictability of complex networks

    The organization of real networks usually embodies both regularities and irregularities, and, in principle, the former can be modeled. The extent to which the formation of a network can be explained coincides with our ability to predict missing links. To understand network organization, we should be able to estimate link predictability. We assume that the regularity of a network is reflected in the consistency of structural features before and after a random removal of a small set of links. Based on the perturbation of the adjacency matrix, we propose a universal structural consistency index that is free of prior knowledge of network organization. Extensive experiments on disparate real-world networks demonstrate that (i) structural consistency is a good estimate of link predictability and (ii) a derivative algorithm outperforms state-of-the-art link prediction methods in both accuracy and robustness. This analysis has further applications in evaluating link prediction algorithms and monitoring sudden changes in evolving network mechanisms. It will provide unique fundamental insights into the above-mentioned academic research fields, and will foster the development of advanced information filtering technologies of interest to information technology practitioners.
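
The abstract describes the index only in words, so the following Python sketch rests on assumptions: it removes a random fraction of links, applies a first-order eigenvalue perturbation to the remaining adjacency matrix, and measures how many removed links rank among the top-scored unlinked pairs. The function name, the `frac_removed` parameter, and the omission of any special handling for degenerate eigenvalues are choices made for this sketch, not details taken from the paper.

```python
import numpy as np

def structural_consistency(adj, frac_removed=0.1, rng=None):
    """Estimate consistency of an undirected binary graph under link removal."""
    rng = np.random.default_rng() if rng is None else rng
    n = adj.shape[0]
    iu, ju = np.triu_indices(n, k=1)          # all unordered node pairs

    # Pick the links to remove (the perturbation set).
    edge_idx = np.flatnonzero(adj[iu, ju] > 0)
    n_removed = max(1, int(round(frac_removed * edge_idx.size)))
    removed = rng.choice(edge_idx, size=n_removed, replace=False)

    delta = np.zeros(adj.shape, dtype=float)
    delta[iu[removed], ju[removed]] = 1.0
    delta[ju[removed], iu[removed]] = 1.0
    remaining = adj.astype(float) - delta

    # Eigen-decomposition of the remaining graph (unit-norm eigenvectors).
    eigvals, eigvecs = np.linalg.eigh(remaining)

    # First-order eigenvalue shifts: x_k^T (delta) x_k for each eigenvector.
    shifts = np.einsum("ik,ij,jk->k", eigvecs, delta, eigvecs)

    # Perturbed matrix: shifted eigenvalues, unchanged eigenvectors.
    perturbed = (eigvecs * (eigvals + shifts)) @ eigvecs.T

    # Rank pairs that are unlinked in the remaining graph by perturbed score.
    cand_idx = np.flatnonzero(remaining[iu, ju] == 0)
    scores = perturbed[iu[cand_idx], ju[cand_idx]]
    top = cand_idx[np.argsort(-scores)[:n_removed]]

    # Fraction of removed links recovered among the top-ranked pairs.
    return np.intersect1d(top, removed).size / n_removed
```

In practice the value would be averaged over many random removals, since a single draw can be noisy on small graphs.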

    Evaluating Overfit and Underfit in Models of Network Community Structure

    A common data mining task on networks is community detection, which seeks an unsupervised decomposition of a network into structural groups based on statistical regularities in the network's connectivity. Although many methods exist, the No Free Lunch theorem for community detection implies that each makes some kind of tradeoff, and no algorithm can be optimal on all inputs. Thus, different algorithms will over- or underfit on different inputs, finding more, fewer, or just different communities than is optimal, and evaluation methods that use a metadata partition as a ground truth will produce misleading conclusions about general accuracy. Here, we present a broad evaluation of over- and underfitting in community detection, comparing the behavior of 16 state-of-the-art community detection algorithms on a novel and structurally diverse corpus of 406 real-world networks. We find that (i) algorithms vary widely both in the number of communities they find and in their corresponding composition, given the same input, (ii) algorithms can be clustered into distinct high-level groups based on similarities of their outputs on real-world networks, and (iii) these differences induce wide variation in accuracy on link prediction and link description tasks. We introduce a new diagnostic for evaluating overfitting and underfitting in practice, and use it to roughly divide community detection methods into general and specialized learning algorithms. Across methods and inputs, Bayesian techniques based on the stochastic block model and a minimum description length approach to regularization represent the best general learning approach, but can be outperformed under specific circumstances. These results introduce both a theoretically principled approach to evaluate over- and underfitting in models of network community structure and a realistic benchmark by which new methods may be evaluated and compared. Comment: 22 pages, 13 figures, 3 tables

    Network Model Selection for Task-Focused Attributed Network Inference

    Networks are models representing relationships between entities. Often these relationships are explicitly given, or we must learn a representation which generalizes and predicts observed behavior in underlying individual data (e.g. attributes or labels). Whether given or inferred, choosing the best representation affects subsequent tasks and questions on the network. This work focuses on model selection to evaluate network representations from data, with an emphasis on fundamental predictive tasks on networks. We present a modular methodology using general, interpretable network models, task neighborhood functions found across domains, and several criteria for robust model selection. We demonstrate our methodology on three online user activity datasets and show that network model selection for the appropriate network task vs. an alternate task increases performance by an order of magnitude in our experiments.

    Predicting the relevance of distributional semantic similarity with contextual information

    Using distributional analysis methods to compute semantic proximity links between words has become commonplace in NLP. The resulting relations are often noisy or difficult to interpret in general. This paper focuses on the issues of evaluating a distributional resource and filtering the relations it contains, but instead of considering it in abstracto, we focus on pairs of words in context. In a discourse, we are interested in knowing if the semantic link between two items is a by-product of textual coherence or is irrelevant. We first set up a human annotation of semantic links with or without contextual information to show the importance of the textual context in evaluating the relevance of semantic similarity, and to assess the prevalence of actual semantic relations between word tokens. We then built an experiment to automatically predict this relevance, evaluated on the reliable reference data set which was the outcome of the first annotation. We show that in-document information greatly improves the prediction made by the similarity level alone.

    Leveraging Friendship Networks for Dynamic Link Prediction in Social Interaction Networks

    On-line social networks (OSNs) often contain many different types of relationships between users. When studying the structure of OSNs such as Facebook, two of the most commonly studied networks are friendship and interaction networks. The link prediction problem in friendship networks has been heavily studied. There has also been prior work on link prediction in interaction networks, independent of friendship networks. In this paper, we study the predictive power of combining friendship and interaction networks. We hypothesize that, by leveraging friendship networks, we can improve the accuracy of link prediction in interaction networks. We augment several interaction link prediction algorithms to incorporate friendships and predicted friendships. From experiments on Facebook data, we find that incorporating friendships into interaction link prediction algorithms results in higher accuracy, but incorporating predicted friendships does not when compared to incorporating current friendships. Comment: To appear in ICWSM 2018. This version corrects some minor errors in Table 1. MATLAB code available at https://github.com/IdeasLabUT/Friendship-Interaction-Predictio
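
The abstract does not say how the algorithms are augmented, so the Python sketch below shows just one plausible scheme and is not the authors' method: friendship ties are added to the interaction graph with a weight `alpha` before a common-neighbour score is computed. The function names, `alpha`, and the toy graphs are all hypothetical.

```python
import numpy as np

def common_neighbor_scores(adj):
    """Weighted count of shared neighbours for every node pair."""
    scores = adj @ adj
    np.fill_diagonal(scores, 0)   # ignore self-pairs
    return scores

def combined_scores(interaction, friendship, alpha=0.5):
    """Hypothetical blend: down-weight friendship ties by alpha, then score."""
    blended = interaction + alpha * friendship
    return common_neighbor_scores(blended)

# Toy data (hypothetical): 4 users.
interaction = np.zeros((4, 4))
friendship = np.zeros((4, 4))
for u, v in [(0, 1), (2, 3)]:     # observed interactions
    interaction[u, v] = interaction[v, u] = 1.0
for u, v in [(1, 2)]:             # friendship tie with no interaction yet
    friendship[u, v] = friendship[v, u] = 1.0

baseline = common_neighbor_scores(interaction)
augmented = combined_scores(interaction, friendship, alpha=0.5)
```

Here users 0 and 2 share no neighbour in the interaction graph alone, but the friendship between 1 and 2 gives the pair a nonzero score in the blended graph.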