A Review on Computing Semantic Similarity of Concepts in Knowledge Graphs
Semantic similarity is a metric defined over a set of documents or terms, in which the distance between items reflects the likeness of their meaning or semantic content, as opposed to similarity estimated from their syntactic representation (e.g. their string format). One drawback of conventional knowledge-based approaches (e.g. path or lch) is that any two concept pairs with the same path length receive the same semantic similarity (the uniform distance problem). A weighted path length (wpath) method is proposed to combine both path length and information content (IC) in measuring the semantic similarity between concepts. The IC of two concepts' least common subsumer (LCS) is used to weight their shortest path length, so that concept pairs with the same path length can have different semantic similarity scores if they have different LCSs.
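The weighting idea can be illustrated with a minimal Python sketch. It assumes the form sim = 1 / (1 + length * k^IC(LCS)) with k in (0, 1], which is one common way of writing wpath and is not given in the abstract itself. The toy taxonomy, the IC values, and the names `PARENT`, `IC`, `ancestors`, `lcs_and_path_length` and `wpath` are all hypothetical and chosen only for illustration.

```python
# Toy taxonomy as child -> parent links (hypothetical, for illustration only).
PARENT = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "artifact",
}

# Hypothetical information content values: more specific concepts get higher IC.
IC = {"entity": 0.0, "animal": 1.0, "artifact": 1.0,
      "dog": 2.0, "cat": 2.0, "car": 2.0}


def ancestors(concept):
    """Return the chain [concept, parent, ..., root]."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def lcs_and_path_length(a, b):
    """Return the least common subsumer of a and b and the number of
    edges on the shortest path between them in the tree."""
    steps_from_b = {c: i for i, c in enumerate(ancestors(b))}
    for steps_from_a, c in enumerate(ancestors(a)):
        if c in steps_from_b:
            return c, steps_from_a + steps_from_b[c]
    raise ValueError("concepts share no ancestor")


def wpath(a, b, k=0.5):
    """Path length weighted by the IC of the LCS (assumed formula)."""
    lcs, length = lcs_and_path_length(a, b)
    return 1.0 / (1.0 + length * k ** IC[lcs])


if __name__ == "__main__":
    # Both pairs are two edges apart, but their LCSs differ in IC.
    print(wpath("dog", "cat"))          # LCS is "animal"
    print(wpath("animal", "artifact"))  # LCS is "entity"
```

In this toy setting both pairs have path length 2, so a plain path measure would score them identically; the pair whose LCS is more specific receives the higher wpath score.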
SiGMa: Simple Greedy Matching for Aligning Large Knowledge Bases
The Internet has enabled the creation of a growing number of large-scale
knowledge bases in a variety of domains containing complementary information.
Tools for automatically aligning these knowledge bases would make it possible
to unify many sources of structured knowledge and answer complex queries.
However, the efficient alignment of large-scale knowledge bases still poses a
considerable challenge. Here, we present Simple Greedy Matching (SiGMa), a
simple algorithm for aligning knowledge bases with millions of entities and
facts. SiGMa is an iterative propagation algorithm which leverages both the
structural information from the relationship graph as well as flexible
similarity measures between entity properties in a greedy local search, thus
making it scalable. Despite its greedy nature, our experiments indicate that
SiGMa can efficiently match some of the world's largest knowledge bases with
high precision. We provide additional experiments on benchmark datasets which
demonstrate that SiGMa can outperform state-of-the-art approaches both in
accuracy and efficiency.
Comment: 10 pages + 2 pages appendix; 5 figures -- initial preprint
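The greedy propagation idea described in this abstract can be sketched roughly as follows. This is not the authors' implementation: the scoring function (a token-overlap Jaccard on names plus a count of already-matched neighbours), the `alpha` and `min_score` parameters, and the toy knowledge bases are all assumptions made for illustration. The sketch starts from seed matches, keeps candidate pairs in a priority queue, and repeatedly accepts the best-scoring pair whose entities are both still unmatched.

```python
import heapq


def jaccard(name_a, name_b):
    """Token-set Jaccard similarity between two entity names."""
    a, b = set(name_a.lower().split()), set(name_b.lower().split())
    return len(a & b) / len(a | b) if (a | b) else 0.0


def greedy_align(graph1, graph2, names1, names2, seeds, alpha=0.5, min_score=0.25):
    """Greedy one-to-one alignment of two graphs (illustrative sketch).

    graph1/graph2 map each entity to the set of its neighbours,
    names1/names2 map each entity to a name string,
    seeds maps some entities of graph1 to their known match in graph2.
    """
    matched1 = dict(seeds)
    matched2 = {e2: e1 for e1, e2 in seeds.items()}
    heap = []  # entries are (-score, e1, e2); heapq is a min-heap

    def score(e1, e2):
        # Structural part: neighbours of e1 already matched to a neighbour of e2.
        shared = sum(1 for n1 in graph1[e1] if matched1.get(n1) in graph2[e2])
        return alpha * jaccard(names1[e1], names2[e2]) + (1 - alpha) * shared

    def push_neighbours(e1, e2):
        # A new match makes pairs of its unmatched neighbours candidates.
        for n1 in graph1[e1]:
            for n2 in graph2[e2]:
                if n1 not in matched1 and n2 not in matched2:
                    heapq.heappush(heap, (-score(n1, n2), n1, n2))

    for e1, e2 in seeds.items():
        push_neighbours(e1, e2)

    while heap:
        neg_score, e1, e2 = heapq.heappop(heap)
        if e1 in matched1 or e2 in matched2 or -neg_score < min_score:
            continue  # stale or too weak candidate
        matched1[e1] = e2
        matched2[e2] = e1
        push_neighbours(e1, e2)
    return matched1


if __name__ == "__main__":
    g1 = {"a_fr": {"a_paris", "a_lyon"}, "a_paris": {"a_fr"}, "a_lyon": {"a_fr"}}
    g2 = {"b_fr": {"b_paris", "b_lyon"}, "b_paris": {"b_fr"}, "b_lyon": {"b_fr"}}
    n1 = {"a_fr": "France", "a_paris": "Paris", "a_lyon": "Lyon"}
    n2 = {"b_fr": "French Republic", "b_paris": "Paris city", "b_lyon": "Lyon"}
    print(greedy_align(g1, g2, n1, n2, seeds={"a_fr": "b_fr"}))
```

Because each accepted match is final and only the neighbours of new matches are scored, the work stays local to the frontier of the current alignment, which is the property the abstract credits for scalability.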