20,965 research outputs found

    Entropy-based approach to missing-links prediction

    Get PDF
    Link-prediction is an active research field within network theory, aiming at uncovering missing connections or predicting the emergence of future relationships from the observed network structure. This paper represents our contribution to the stream of research concerning missing links prediction. Here, we propose an entropy-based method to predict a given percentage of missing links, by identifying them with the most probable non-observed ones. The probability coefficients are computed by solving opportunely defined null-models over the accessible network structure. Upon comparing our likelihood-based, local method with the most popular algorithms over a set of economic, financial and food networks, we find ours to perform best, as pointed out by a number of statistical indicators (e.g. the precision, the area under the ROC curve, etc.). Moreover, the entropy-based formalism adopted in the present paper allows us to straightforwardly extend the link-prediction exercise to directed networks as well, thus overcoming one of the main limitations of current algorithms. The higher accuracy achievable by employing these methods - together with their larger flexibility - makes them strong competitors of available link-prediction algorithms

    Conditional network embeddings

    Get PDF
    Network Embeddings (NEs) map the nodes of a given network into dd-dimensional Euclidean space Rd\mathbb{R}^d. Ideally, this mapping is such that 'similar' nodes are mapped onto nearby points, such that the NE can be used for purposes such as link prediction (if 'similar' means being 'more likely to be connected') or classification (if 'similar' means 'being more likely to have the same label'). In recent years various methods for NE have been introduced, all following a similar strategy: defining a notion of similarity between nodes (typically some distance measure within the network), a distance measure in the embedding space, and a loss function that penalizes large distances for similar nodes and small distances for dissimilar nodes. A difficulty faced by existing methods is that certain networks are fundamentally hard to embed due to their structural properties: (approximate) multipartiteness, certain degree distributions, assortativity, etc. To overcome this, we introduce a conceptual innovation to the NE literature and propose to create \emph{Conditional Network Embeddings} (CNEs); embeddings that maximally add information with respect to given structural properties (e.g. node degrees, block densities, etc.). We use a simple Bayesian approach to achieve this, and propose a block stochastic gradient descent algorithm for fitting it efficiently. We demonstrate that CNEs are superior for link prediction and multi-label classification when compared to state-of-the-art methods, and this without adding significant mathematical or computational complexity. Finally, we illustrate the potential of CNE for network visualization

    Benchmarking Network Embedding Models for Link Prediction: Are We Making Progress?

    Get PDF
    Network embedding methods map a network's nodes to vectors in an embedding space, in such a way that these representations are useful for estimating some notion of similarity or proximity between pairs of nodes in the network. The quality of these node representations is then showcased through results of downstream prediction tasks. Commonly used benchmark tasks such as link prediction, however, present complex evaluation pipelines and an abundance of design choices. This, together with a lack of standardized evaluation setups can obscure the real progress in the field. In this paper, we aim to shed light on the state-of-the-art of network embedding methods for link prediction and show, using a consistent evaluation pipeline, that only thin progress has been made over the last years. The newly conducted benchmark that we present here, including 17 embedding methods, also shows that many approaches are outperformed even by simple heuristics. Finally, we argue that standardized evaluation tools can repair this situation and boost future progress in this field
    corecore