Ranking Triples using Entity Links in a Large Web Crawl - The Chicory Triple Scorer at WSDM Cup 2017
This paper describes the participation of team Chicory in the Triple Ranking
Challenge of the WSDM Cup 2017. Our approach deploys a large collection of
entity tagged web data to estimate the correctness of the relevance relation
expressed by the triples, in combination with a baseline approach using
Wikipedia abstracts following [1]. Relevance estimations are drawn from
ClueWeb12 annotated by Google's entity linker, available publicly as the FACC1
dataset. Our implementation is automatically generated from a so-called 'search
strategy' that specifies declaratively how the input data are combined into a
final ranking of triples.
Comment: Triple Scorer at WSDM Cup 2017, see arXiv:1712.0808
Overview of the Triple Scoring Task at the WSDM Cup 2017
This paper provides an overview of the triple scoring task at the WSDM Cup
2017, including a description of the task and the dataset, an overview of the
participating teams and their results, and a brief account of the methods
employed. In a nutshell, the task was to compute relevance scores for
knowledge-base triples from relations for which such scores make sense. Due to the
way the ground truth was constructed, scores were required to be integers from
the range 0..7. For example, reasonable scores for the triples "Tim Burton
profession Director" and "Tim Burton profession Actor" would be 7 and 2,
respectively, because Tim Burton is well-known as a director, but he acted only
in a few lesser known movies.
The triple scoring task attracted considerable interest, with 52 initial
registrations and 21 teams who submitted a valid run before the deadline. The
winning team achieved an accuracy of 87%, that is, for that fraction of the
triples from the test set (which was revealed only after the deadline) the
difference from the ground-truth score was at most 2. The best result
for the average difference from the test set scores was 1.50.
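
The two measures quoted in this abstract can be illustrated with a short sketch. It assumes that "average difference" means the mean absolute difference between predicted and ground-truth scores. The function names and the toy scores are hypothetical and are not taken from the task's official evaluation code.

```python
from typing import Sequence


def accuracy_within(predicted: Sequence[int], truth: Sequence[int],
                    tolerance: int = 2) -> float:
    """Fraction of triples whose predicted score differs from the
    ground-truth score by at most `tolerance` (2 in the abstract)."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("score lists must be non-empty and of equal length")
    hits = sum(1 for p, t in zip(predicted, truth) if abs(p - t) <= tolerance)
    return hits / len(truth)


def average_score_difference(predicted: Sequence[int],
                             truth: Sequence[int]) -> float:
    """Mean absolute difference between predicted and ground-truth scores
    (an assumed reading of 'average difference')."""
    if len(predicted) != len(truth) or not truth:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(predicted, truth)) / len(truth)


# Toy data only: integer scores in the range 0..7, as the task requires.
truth_scores = [7, 2, 5, 0]
predicted_scores = [7, 5, 4, 1]

print(accuracy_within(predicted_scores, truth_scores))           # 0.75
print(average_score_difference(predicted_scores, truth_scores))  # 1.25
```

In the toy data, three of the four predictions lie within 2 of the ground truth, so the accuracy is 0.75. The absolute differences 0, 3, 1 and 1 average to 1.25.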