Distributed Holistic Clustering on Linked Data
Link discovery is an active field of research to support data integration in
the Web of Data. Due to the huge size and number of available data sources,
efficient and effective link discovery is a very challenging task. Common
pairwise link discovery approaches do not scale to many sources with very large
entity sets. Here we propose a distributed holistic approach to link many data
sources based on a clustering of entities that represent the same real-world
object. Our clustering approach provides a compact and fused representation of
entities, and can identify errors in existing links as well as many new links.
The clustering approach supports distributed execution to achieve faster
runtimes and scalability for large real-world data sets. We provide a
novel gold standard for multi-source clustering, and evaluate our methods with
respect to effectiveness and efficiency for large data sets from the geographic
and music domains.
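The abstract does not spell out the clustering algorithm itself. As a minimal illustrative sketch only, assume that (1) existing links are given as pairs of entity IDs, (2) each entity is tagged with the data source it comes from, and (3) each source is duplicate-free. Under those assumptions, one simple starting point is to form initial clusters as connected components of the link graph with union-find, then flag any cluster that holds two entities from the same source as likely containing an erroneous link. The functions `cluster_links` and `suspicious_clusters` and the example entity IDs are hypothetical, not the authors' method.

```python
from collections import defaultdict


def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def cluster_links(entities, links):
    """Group entities into connected components of the link graph.

    entities: dict mapping entity ID -> source name (hypothetical format).
    links: iterable of (id1, id2) pairs; both IDs must appear in entities.
    """
    parent = {e: e for e in entities}
    for a, b in links:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb  # merge the two components
    clusters = defaultdict(set)
    for e in entities:
        clusters[find(parent, e)].add(e)
    return list(clusters.values())


def suspicious_clusters(clusters, entities):
    # Assumption: sources are duplicate-free, so two entities from the same
    # source in one cluster suggest that some link in the cluster is wrong.
    return [c for c in clusters if len({entities[e] for e in c}) < len(c)]


if __name__ == "__main__":
    ents = {"dbp:Berlin": "dbpedia", "geo:Berlin": "geonames",
            "nyt:Berlin": "nyt", "dbp:Bern": "dbpedia", "geo:Bern": "geonames"}
    links = [("dbp:Berlin", "geo:Berlin"), ("geo:Berlin", "nyt:Berlin"),
             ("dbp:Bern", "geo:Bern"), ("nyt:Berlin", "geo:Bern")]  # last link is wrong
    clusters = cluster_links(ents, links)
    print(suspicious_clusters(clusters, ents))
```

Connected components alone merge everything reachable through a single bad link, which is why a source-consistency check (or a further cluster-splitting step) is needed before the clusters can serve as a fused representation.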
Taming computational complexity: efficient and parallel SimRank optimizations on undirected graphs
SimRank has been regarded as a promising link-based ranking algorithm for evaluating similarities between web documents in many modern search engines. In this paper, we investigate the optimization problem of SimRank similarity computation on undirected web graphs. We first present a novel algorithm that estimates the SimRank between vertices in O(n^3 + Kn^2) time, where n is the number of vertices and K is the number of iterations. In comparison, the most efficient implementation of the SimRank algorithm in [1] takes O(Kn^3) time in the worst case. To handle large-scale computations efficiently, we also propose a parallel implementation of the SimRank algorithm on multiple processors. Experimental evaluations on both synthetic and real-life data sets demonstrate the improved computation time and parallel efficiency of the proposed techniques.
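For reference, the standard SimRank iteration on an undirected graph sets s(a, a) = 1 and, for a ≠ b, s(a, b) = C / (|N(a)| |N(b)|) · Σ_{i∈N(a)} Σ_{j∈N(b)} s(i, j), where N(v) is the neighbor set of v and C is a decay factor. The sketch below is a baseline of the O(Kn^3) kind the abstract compares against, not the paper's O(n^3 + Kn^2) algorithm, which the abstract does not describe. It caches partial sums over N(a) so that each iteration costs at most O(n^3). The function name, the adjacency-list input format, and the defaults C = 0.8 and K = 10 are illustrative assumptions.

```python
def simrank_undirected(adj, c=0.8, iterations=10):
    """Iterative SimRank on an undirected graph (illustrative baseline).

    adj: list of neighbor lists, adj[v] = neighbors of vertex v (assumed format).
    c: decay factor (assumed default 0.8); iterations: number of rounds K.
    """
    n = len(adj)
    # s_0 is the identity: every vertex is maximally similar to itself.
    s = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    for _ in range(iterations):
        # partial[a][j] = sum of s[i][j] over neighbors i of a.
        # Reusing these sums avoids recomputing the inner sum for every pair.
        partial = [[sum(s[i][j] for i in adj[a]) for j in range(n)]
                   for a in range(n)]
        new = [[0.0] * n for _ in range(n)]
        for a in range(n):
            for b in range(n):
                if a == b:
                    new[a][b] = 1.0
                elif adj[a] and adj[b]:
                    # Double sum over N(a) x N(b) via the cached partial sums.
                    total = sum(partial[a][j] for j in adj[b])
                    new[a][b] = c * total / (len(adj[a]) * len(adj[b]))
                # Vertices without neighbors keep similarity 0 to others.
        s = new
    return s


if __name__ == "__main__":
    # Path graph 0 - 1 - 2: vertices 0 and 2 share their only neighbor.
    scores = simrank_undirected([[1], [0, 2], [1]], c=0.8, iterations=5)
    print(scores[0][2])
```

Each row of the new matrix depends only on the previous iteration's scores, so the rows of one iteration could be computed independently on different processors; this is one natural way to parallelize such a baseline, not necessarily the scheme proposed in the paper.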
Effective Mechanism for Social Recommendation of News
Recommendation systems represent an important tool for news distribution on
the Internet. In this work we modify a recently proposed social recommendation
model so that it can handle the absence of explicit user ratings of news. The
model consists of a network of users that continually adapts in order to achieve
efficient news traffic. To optimize the network's topology we propose several
stochastic algorithms that scale with the network's size.
Agent-based simulations reveal the features and the performance of these
algorithms. To overcome the drawbacks of each method we introduce two
improved algorithms and show that they can optimize the network's topology almost
as fast and effectively as non-scalable methods that make use of much
more information.