Correcting Knowledge Base Assertions
The usefulness and usability of knowledge bases (KBs) are often limited by quality issues. One common issue is the presence of erroneous assertions, often caused by lexical or semantic confusion. We study the problem of correcting such assertions and present a general correction framework that combines lexical matching, semantic embedding, soft constraint mining, and semantic consistency checking. The framework is evaluated using DBpedia and an enterprise medical KB.
Why We Read Wikipedia
Wikipedia is one of the most popular sites on the Web, with millions of users
relying on it to satisfy a broad range of information needs every day. Although
it is crucial to understand what exactly these needs are in order to be able to
meet them, little is currently known about why users visit Wikipedia. The goal
of this paper is to fill this gap by combining a survey of Wikipedia readers
with a log-based analysis of user activity. Based on an initial series of user
surveys, we build a taxonomy of Wikipedia use cases along several dimensions,
capturing users' motivations to visit Wikipedia, the depth of knowledge they
are seeking, and their knowledge of the topic of interest prior to visiting
Wikipedia. Then, we quantify the prevalence of these use cases via a
large-scale user survey conducted on live Wikipedia with almost 30,000
responses. Our analyses highlight the variety of factors driving users to
Wikipedia, such as current events, media coverage of a topic, personal
curiosity, work or school assignments, or boredom. Finally, we match survey
responses to the respondents' digital traces in Wikipedia's server logs,
enabling the discovery of behavioral patterns associated with specific use
cases. For instance, we observe long and fast-paced page sequences across
topics for users who are bored or exploring randomly, whereas those using
Wikipedia for work or school spend more time on individual articles focused on
topics such as science. Our findings advance our understanding of reader
motivations and behavior on Wikipedia and can have implications for developers
aiming to improve Wikipedia's user experience, editors striving to cater to
their readers' needs, third-party services (such as search engines) providing
access to Wikipedia content, and researchers aiming to build tools such as
recommendation engines.
Comment: Published in WWW'17; v2 fixes caption of Table
The Evolution of Wikipedia's Norm Network
Social norms have traditionally been difficult to quantify. In any particular
society, their sheer number and complex interdependencies often limit a
system-level analysis. One exception is that of the network of norms that
sustain the online Wikipedia community. We study the fifteen-year evolution of
this network using the interconnected set of pages that establish, describe,
and interpret the community's norms. Despite Wikipedia's reputation for
ad hoc governance, we find that its normative evolution is highly
conservative. The earliest users create norms that both dominate the network
and persist over time. These core norms govern both content and interpersonal
interactions using abstract principles such as neutrality, verifiability, and
assume good faith. As the network grows, norm neighborhoods decouple
topologically from each other, while increasing in semantic coherence. Taken
together, these results suggest that the evolution of Wikipedia's norm network
is akin to bureaucratic systems that predate the information age.
Comment: 22 pages, 9 figures. Matches published version. Data available at
http://bit.ly/wiki_nor
Collaboratively Patching Linked Data
Today's Web of Data is noisy. Linked Data often needs extensive preprocessing
to enable efficient use of heterogeneous resources. While consistent and valid
data is the key to efficient data processing and aggregation, we face
two main challenges: (1) identifying erroneous facts and
tracking their origins in dynamically connected datasets is a difficult task,
and (2) efforts to curate deficient facts in Linked Data are
rarely shared. Since erroneous data is often duplicated and
(re-)distributed by mashup applications, keeping data tidy is not only the
responsibility of a few original publishers but becomes a mission
for all distributors and consumers of Linked Data too. We present a new
approach to expose and to reuse patches on erroneous data to enhance and to add
quality information to the Web of Data. The feasibility of our approach is
demonstrated by the example of a collaborative game that patches statements in
DBpedia data and provides notifications for relevant changes.
Comment: 2nd International Workshop on Usage Analysis and the Web of Data
(USEWOD2012) in the 21st International World Wide Web Conference (WWW2012),
Lyon, France, April 17th, 201
Lexical patterns, features and knowledge resources for coreference resolution in clinical notes
Generation of entity coreference chains provides a means to extract linked narrative events from clinical notes, but despite being a well-researched topic in natural language processing, general-purpose coreference tools perform poorly on clinical texts. This paper presents a knowledge-centric and pattern-based approach to resolving coreference across a wide variety of clinical records comprising discharge summaries, progress notes, pathology, radiology and surgical reports from two corpora (Ontology Development and Information Extraction (ODIE) and i2b2/VA). In addition, a method for generating coreference chains using progressively pruned linked lists is demonstrated that reduces the search space and facilitates evaluation by a number of metrics. Independent evaluation results show an F-measure for each corpus of 79.2% and 87.5%, respectively, which offers performance at least as good as human annotators, greatly increased performance over general-purpose tools, and improvement on previously reported clinical coreference systems. The system uses a number of open-source components that are available to download.