An important editing policy in Wikipedia is to provide citations for added
statements in Wikipedia pages, where statements can be arbitrary pieces of
text, ranging from a sentence to a paragraph. In many cases citations are
either outdated or missing altogether.
In this work we address the problem of finding and updating news citations
for statements in entity pages. We propose a two-stage supervised approach for
this problem. In the first step, we construct a classifier to determine whether
a statement needs a news citation or another kind of citation (web, book,
journal, etc.). In the second step, we develop a news citation algorithm for
Wikipedia statements, which recommends appropriate citations from a given news
collection. Apart from IR techniques that use the statement to query the news
collection, we also formalize three properties of an appropriate citation,
namely: (i) the citation should entail the Wikipedia statement, (ii) the
statement should be central to the citation, and (iii) the citation should be
from an authoritative source.
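
The three properties above can be illustrated with a minimal sketch. It is not the supervised model proposed in this work: the token-overlap proxies for entailment and centrality, the precomputed `source_authority` field, the equal weights, and all function and class names are assumptions made purely for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class NewsArticle:
    title: str
    text: str
    source_authority: float  # hypothetical precomputed score in [0, 1]


def tokenize(text):
    """Lowercase the text and return the set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def entailment_score(statement, article):
    """Crude proxy for (i): fraction of statement tokens found in the article body."""
    s = tokenize(statement)
    return len(s & tokenize(article.text)) / len(s) if s else 0.0


def centrality_score(statement, article, lead_chars=200):
    """Crude proxy for (ii): fraction of statement tokens found in the title and lead."""
    s = tokenize(statement)
    lead = tokenize(article.title + " " + article.text[:lead_chars])
    return len(s & lead) / len(s) if s else 0.0


def rank_citations(statement, candidates, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Rank candidates by a weighted sum of the three property scores.

    The linear combination is a stand-in for a learned model.
    """
    w_ent, w_cen, w_auth = weights
    scored = [
        (
            w_ent * entailment_score(statement, a)
            + w_cen * centrality_score(statement, a)
            + w_auth * a.source_authority,  # property (iii)
            a,
        )
        for a in candidates
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


statement = "Barack Obama won the 2012 presidential election"
candidates = [
    NewsArticle("Local bakery opens", "A new bakery opened downtown.", 0.2),
    NewsArticle(
        "Obama wins 2012 presidential election",
        "Barack Obama won the 2012 presidential election on Tuesday.",
        0.9,
    ),
]
ranked = rank_citations(statement, candidates)
```

In this toy setup the candidate list would normally come from an IR query over the news collection, and the scores would serve as features for a trained ranker rather than being summed with fixed weights.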
We perform an extensive evaluation of both steps, using 20 million articles
from a real-world news collection. Our results are promising and show that
this task can be performed with high precision and at scale.