60 research outputs found

Uncovering missing links with cold ends

Author: Adamic
Albert
Amaral
Barabási
Biernacki
Boccaletti
Butts
Cohen
Costa
Dorogovtsev
Getoor
Guimerà
Hanely
Jaccard
Kossinets
Leicht
Liben-Nowell
Linyuan Lü
Liu
Liu
Liu
Lovász
Lü
Lü
Lü
Molloy
Neal
Newman
Newman
Newman
Newman
Ou
Qian-Ming Zhang
Ravasz
Salton
Stumpf
Sørensen
Tao Zhou
von Mering
Wang
Watts
Yan
Yu
Yu-Xiao Zhu
Zeng
Zhang
Zhou
Zhou
Publication venue: 'Elsevier BV'
Publication date: 02/10/2011

To evaluate the performance of missing-link prediction, the known data are randomly divided into two parts, the training set and the probe set. We argue that this straightforward and standard method may lead to serious bias, since in real biological and information networks, missing links are more likely to be links connecting low-degree nodes. We therefore study how to uncover missing links between low-degree nodes, that is, the case where links in the probe set have lower degree products than a random sample would. Experimental analysis on ten local similarity indices and four disparate real networks reveals a surprising result: the Leicht-Holme-Newman index [E. A. Leicht, P. Holme, and M. E. J. Newman, Phys. Rev. E 73, 026120 (2006)] performs the best, although it was known to be one of the worst indices when the probe set is a random sample of all links. We further propose a parameter-dependent index, which considerably improves the prediction accuracy. Finally, we show the relevance of the proposed index on three real sampling methods. Comment: 16 pages, 5 figures, 6 tables
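The Leicht-Holme-Newman index named in this abstract is commonly defined as the number of common neighbours of two nodes divided by the product of their degrees. The sketch below assumes that definition and compares it with the plain common-neighbours count. The toy edge list and the function names are hypothetical and are not taken from the paper.

```python
def build_adjacency(edges):
    """Build {node: set(neighbours)} from a list of undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, x, y):
    """Number of neighbours shared by x and y."""
    return len(adj[x] & adj[y])


def lhn_index(adj, x, y):
    """Common neighbours divided by the degree product k_x * k_y
    (assumed form of the Leicht-Holme-Newman index)."""
    return len(adj[x] & adj[y]) / (len(adj[x]) * len(adj[y]))


# Hypothetical training set: "h" and "g" are high-degree nodes,
# "x" and "y" are low-degree nodes that share the single neighbour "p".
train_edges = [
    ("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"),
    ("g", "a"), ("g", "b"), ("g", "c"), ("g", "e"),
    ("x", "p"), ("y", "p"),
]
adj = build_adjacency(train_edges)

for pair in [("h", "g"), ("x", "y")]:
    print(pair, common_neighbors(adj, *pair), lhn_index(adj, *pair))
```

On this toy graph the common-neighbours count favours the high-degree pair, while the degree-normalised score favours the low-degree pair, which is the kind of candidate link the abstract argues is under-represented in a random probe set.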

arXiv.org e-Print Archive

Crossref

RERO DOC Digital Library

Handling oversampling in dynamic networks using link prediction

Author: A-L Barabási
J Zhu
LA Adamic
MH Hansen
N Eagle
P Baldi
P Erdős
P Holme
P Sarkar
V Freschi
Y Liu
Publication venue
Publication date: 11/08/2015

Oversampling is a common characteristic of data representing dynamic networks. It introduces noise into representations of dynamic networks, but there has been little work so far to compensate for it. Oversampling can affect the quality of solutions to many important algorithmic problems on dynamic networks, including link prediction. Link prediction seeks to predict edges that will be added to the network given previous snapshots. We show not only that oversampling affects the quality of link prediction, but also that link prediction can be used to recover from the effects of oversampling. We also introduce a novel generative model of noise in dynamic networks that represents oversampling. We demonstrate the results of our approach on both synthetic and real-world data. Comment: ECML/PKDD 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

Effective and Efficient Similarity Index for Link Prediction of Complex Networks

Author: A. Popescul
B. Gallagher
C. D. Manning
Ci-Hang Jin
D. Lin
F. Lorrain
G. H. Golub
G. Jeh
G. Salton
G. Salton
J. A. Hanely
J. Zhu
K. Yu
L. Getoor
L. Lü
Linyuan Lü
M. Bilgic
P. Jaccard
S. Geisser
T. Murata
T. Sørensen
T. Zhou
Tao Zhou
Z. Huang
Publication venue: 'American Physical Society (APS)'
Publication date: 26/08/2009

Predictions of missing links in incomplete networks such as protein-protein interaction networks, or of very likely but not yet existent links in evolving networks such as online friendship networks, can serve as a guideline for further experiments or as valuable information for web users. In this paper, we introduce a local path index to estimate the likelihood of the existence of a link between two nodes. We propose a network model with controllable density and noise strength in generating links, and collect data from six real networks. Extensive numerical simulations on both modeled networks and real networks demonstrate the high effectiveness and efficiency of the local path index compared with two well-known and widely used indices, the common neighbors and the Katz index. Indeed, the local path index provides predictions competitive in accuracy with the Katz index while requiring much less CPU time and memory space, and is therefore a strong candidate for practical applications in data mining of huge-size networks. Comment: 8 pages, 5 figures, 3 tables
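The abstract does not state the formula, but the local path index is commonly written as S = A^2 + eps * A^3, where A is the adjacency matrix and eps is a small free parameter. The sketch below assumes that form. The four-node path graph, the value of eps, and the helper names are hypothetical choices for illustration.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def local_path_index(adj_matrix, eps):
    """Assumed form S = A^2 + eps * A^3: counts paths of length 2,
    plus paths of length 3 weighted by eps."""
    a2 = matmul(adj_matrix, adj_matrix)
    a3 = matmul(a2, adj_matrix)
    n = len(adj_matrix)
    return [[a2[i][j] + eps * a3[i][j] for j in range(n)] for i in range(n)]


# Hypothetical toy network: the path 0 - 1 - 2 - 3.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
A2 = matmul(A, A)            # A2[i][j] is the common-neighbours count
S = local_path_index(A, eps=0.01)

print("common neighbours (0, 3):", A2[0][3])
print("local path score  (0, 3):", S[0][3])
print("local path score  (0, 2):", S[0][2])
```

Nodes 0 and 3 share no neighbours, so the common-neighbours count cannot separate them from any other unconnected pair, while the length-3 term still gives them a small nonzero score. Only two matrix products are needed, in contrast to the Katz index, which sums over paths of all lengths.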

arXiv.org e-Print Archive

Crossref

RERO DOC Digital Library

Automatic Metadata Generation using Associative Networks

Author: de Lin S.
Han H.
Herbert Van De Sompel
Johan Bollen
Mao S.
Marko A. Rodriguez
Rorvig M.
Yang H.-C.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 06/03/2009

In spite of its tremendous value, metadata is generally sparse and incomplete, thereby hampering the effectiveness of digital information services. Many of the existing mechanisms for the automated creation of metadata rely primarily on content analysis, which can be costly and inefficient. The automatic metadata generation system proposed in this article leverages resource relationships generated from existing metadata as a medium for propagation from metadata-rich to metadata-poor resources. Because of its independence from content analysis, it can be applied to a wide variety of resource media types and is shown to be computationally inexpensive. The proposed method operates in two distinct phases. First, occurrence and co-occurrence algorithms generate an associative network of repository resources, leveraging existing repository metadata. Second, using the associative network as a substrate, metadata associated with metadata-rich resources is propagated to metadata-poor resources by means of a discrete-form spreading activation algorithm. This article discusses the general framework for building associative networks, an algorithm for disseminating metadata through such networks, and the results of an experiment and validation of the proposed method using a standard bibliographic dataset.
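The second phase can be pictured with a generic spreading activation sketch. This is not the article's algorithm: the decay factor, the fixed number of steps, the weight-proportional split of energy, and all names below are assumptions made for illustration.

```python
def spread_activation(graph, seeds, steps=1, decay=0.5):
    """Propagate energy over a weighted graph.

    graph: {node: {neighbour: weight}}, a hypothetical associative network
    seeds: {node: initial energy}, e.g. metadata-rich resources
    Returns the total energy received by each node.
    """
    received = {}
    frontier = dict(seeds)
    for _ in range(steps):
        next_frontier = {}
        for node, energy in frontier.items():
            out_edges = graph.get(node, {})
            total_weight = sum(out_edges.values())
            if total_weight == 0:
                continue  # isolated node: nothing to pass on
            for neighbour, weight in out_edges.items():
                # Split the decayed energy in proportion to edge weight.
                share = decay * energy * weight / total_weight
                next_frontier[neighbour] = next_frontier.get(neighbour, 0.0) + share
                received[neighbour] = received.get(neighbour, 0.0) + share
        frontier = next_frontier
    return received


# Hypothetical network: one metadata-rich resource linked to two
# metadata-poor resources with co-occurrence weights 3 and 1.
network = {
    "rich": {"poor1": 3, "poor2": 1},
    "poor1": {"rich": 3},
    "poor2": {"rich": 1},
}
result = spread_activation(network, seeds={"rich": 1.0}, steps=1, decay=0.5)
print(result)
```

In a metadata setting, each unit of energy would carry a metadata value (for example a keyword) from its seed resource, and a metadata-poor resource would adopt the values for which it accumulated the most energy. That final assignment step is left out of the sketch.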

arXiv.org e-Print Archive

Crossref