60 research outputs found
Uncovering missing links with cold ends
To evaluate the performance of prediction of missing links, the known data
are randomly divided into two parts, the training set and the probe set. We
argue that this straightforward and standard method may lead to terrible bias,
since in real biological and information networks, missing links are more
likely to be links connecting low-degree nodes. We therefore study how to
uncover missing links between low-degree nodes; that is, the links in the probe
set have lower degree products than links drawn by random sampling. Experimental analysis on ten
local similarity indices and four disparate real networks reveals a surprising
result that the Leicht-Holme-Newman index [E. A. Leicht, P. Holme, and M. E. J.
Newman, Phys. Rev. E 73, 026120 (2006)] performs the best, although it was
known to be one of the worst indices if the probe set is a random sampling of
all links. We further propose a parameter-dependent index, which considerably
improves the prediction accuracy. Finally, we show the relevance of the
proposed index on three real sampling methods.
Comment: 16 pages, 5 figures, 6 tables
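As a rough illustration of why a degree-normalised index can favour links
between low-degree nodes, the sketch below compares the common-neighbours count
with the Leicht-Holme-Newman index in its usual local form, s(x, y) =
|N(x) ∩ N(y)| / (k_x k_y). The toy graph and the helper names
(build_adjacency, rank_non_edges) are assumptions made for this example and do
not come from the paper.

```python
from itertools import combinations


def build_adjacency(edges):
    """Return an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbors(adj, x, y):
    """Number of neighbours shared by x and y."""
    return len(adj[x] & adj[y])


def lhn_index(adj, x, y):
    """Common neighbours divided by the degree product k_x * k_y."""
    kx, ky = len(adj[x]), len(adj[y])
    if kx == 0 or ky == 0:
        return 0.0
    return common_neighbors(adj, x, y) / (kx * ky)


def rank_non_edges(adj, score):
    """Sort all unconnected node pairs by descending score."""
    candidates = [
        (x, y) for x, y in combinations(sorted(adj), 2) if y not in adj[x]
    ]
    return sorted(candidates, key=lambda pair: score(adj, *pair), reverse=True)


# Hypothetical toy graph: two hubs (g, h) sharing four neighbours,
# plus two degree-1 nodes (p, q) that share the single neighbour r.
edges = [
    ("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"),
    ("g", "a"), ("g", "b"), ("g", "c"), ("g", "d"),
    ("a", "b"), ("c", "d"),
    ("p", "r"), ("q", "r"),
]
adj = build_adjacency(edges)

top_cn = rank_non_edges(adj, common_neighbors)[0]   # hub pair ("g", "h")
top_lhn = rank_non_edges(adj, lhn_index)[0]         # low-degree pair ("p", "q")
```

On this toy graph the common-neighbours count ranks the hub pair first, while
the degree-normalised score ranks the low-degree pair first, which is the kind
of candidate a low-degree-product probe set would contain.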
Handling oversampling in dynamic networks using link prediction
Oversampling is a common characteristic of data representing dynamic
networks. It introduces noise into representations of dynamic networks, but
there has been little work so far to compensate for it. Oversampling can affect
the quality of many important algorithmic problems on dynamic networks,
including link prediction. Link prediction seeks to predict edges that will be
added to the network given previous snapshots. We show that not only does
oversampling affect the quality of link prediction, but that we can use link
prediction to recover from the effects of oversampling. We also introduce a
novel generative model of noise in dynamic networks that represents
oversampling. We demonstrate the results of our approach on both synthetic and
real-world data.
Comment: ECML/PKDD 201
Effective and Efficient Similarity Index for Link Prediction of Complex Networks
Predicting missing links in incomplete networks, such as protein-protein
interaction networks, or links that are very likely but do not yet exist in
evolving networks, such as online friendship networks, can guide further
experiments or provide valuable information to web users. In
this paper, we introduce a local path index to estimate the likelihood of the
existence of a link between two nodes. We propose a network model with
controllable density and noise strength for generating links, and we collect
data from six real networks. Extensive numerical simulations on both modeled
networks and real networks demonstrate the high effectiveness and efficiency
of the local path index compared with two well-known and widely used indices,
the common neighbors and the Katz index. Indeed, the local path index gives
predictions competitive in accuracy with those of the Katz index while requiring
much less CPU time and memory, which makes it a strong candidate for
practical applications in data mining of very large networks.
Comment: 8 pages, 5 figures, 3 tables
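The abstract does not state the formula for the local path index. A minimal
sketch, assuming the commonly cited form S = A^2 + epsilon * A^3 (A is the
adjacency matrix, epsilon a small free parameter), is given below. The function
names, the default epsilon, and the four-node path graph are illustrative
assumptions, and plain Python lists are used to keep the example dependency-free.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def local_path_index(adj_matrix, epsilon=0.01):
    """Assumed form: paths of length 2 plus epsilon-weighted paths of length 3."""
    a2 = matmul(adj_matrix, adj_matrix)
    a3 = matmul(a2, adj_matrix)
    n = len(adj_matrix)
    return [
        [a2[i][j] + epsilon * a3[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical path graph 0 - 1 - 2 - 3.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
lp = local_path_index(A, epsilon=0.01)

# Nodes 0 and 2 share one neighbour, so their score is 1.0.
# Nodes 0 and 3 share none, but one path of length 3 gives them a score of 0.01,
# where the common-neighbours count alone would give 0.
```

Because only A^2 and A^3 are needed, the cost stays far below that of the
matrix inversion behind the full Katz index, which is consistent with the
efficiency claim in the abstract.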
Automatic Metadata Generation using Associative Networks
In spite of its tremendous value, metadata is generally sparse and
incomplete, thereby hampering the effectiveness of digital information
services. Many of the existing mechanisms for the automated creation of
metadata rely primarily on content analysis which can be costly and
inefficient. The automatic metadata generation system proposed in this article
leverages resource relationships generated from existing metadata as a medium
for propagation from metadata-rich to metadata-poor resources. Because of its
independence from content analysis, it can be applied to a wide variety of
resource media types and is shown to be computationally inexpensive. The
proposed method operates through two distinct phases. Occurrence and
co-occurrence algorithms first generate an associative network of repository
resources leveraging existing repository metadata. Second, using the
associative network as a substrate, metadata associated with metadata-rich
resources is propagated to metadata-poor resources by means of a discrete-form
spreading activation algorithm. This article discusses the general framework
for building associative networks, an algorithm for disseminating metadata
through such networks, and the results of an experiment and validation of the
proposed method using a standard bibliographic dataset.
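The two phases described above can be sketched roughly as follows. This is not
the article's algorithm: the co-occurrence weighting (number of shared field
values), the energy, decay, and hop-limit parameters, and the toy records are
all assumptions chosen only to show the general shape of building an
associative network and then spreading metadata through it.

```python
from collections import defaultdict


def build_associative_network(resources, link_field):
    """Phase 1 (assumed form): weight = number of shared values in link_field."""
    network = defaultdict(dict)
    ids = sorted(resources)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            shared = len(
                set(resources[a].get(link_field, ()))
                & set(resources[b].get(link_field, ()))
            )
            if shared:
                network[a][b] = shared
                network[b][a] = shared
    return network


def spread_metadata(network, source, values, energy=1.0, decay=0.5, max_hops=2):
    """Phase 2 (assumed form): push values outward with decaying energy.

    Returns {resource: {value: accumulated energy}} for every resource reached.
    """
    received = defaultdict(lambda: defaultdict(float))
    frontier = {source: energy}
    for _ in range(max_hops):
        next_frontier = defaultdict(float)
        for node, node_energy in frontier.items():
            neighbours = network.get(node, {})
            total = sum(neighbours.values())
            if total == 0:
                continue
            for neighbour, weight in neighbours.items():
                # Split the decayed energy in proportion to edge weight.
                share = node_energy * decay * weight / total
                next_frontier[neighbour] += share
                if neighbour != source:
                    for value in values:
                        received[neighbour][value] += share
        frontier = next_frontier
    return received


# Hypothetical records: only r1 has keywords; r2 and r3 are metadata-poor.
resources = {
    "r1": {"authors": ["A", "B"], "keywords": ["link prediction"]},
    "r2": {"authors": ["B", "C"]},
    "r3": {"authors": ["C"]},
}
network = build_associative_network(resources, "authors")
received = spread_metadata(network, "r1", resources["r1"]["keywords"])
```

Here r2, which shares an author with r1, receives the keyword with energy 0.5,
and r3, two hops away, receives it with energy 0.125; a threshold on the
accumulated energy could then decide which values to accept.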