A matter of words: NLP for quality evaluation of Wikipedia medical articles
Automatic quality evaluation of Web information is a task with many fields of application and of great relevance, especially in critical domains like the medical one. We start from the intuition that the quality of the content of medical Web documents is affected by features related to the specific domain: first, the usage of a specific vocabulary (Domain Informativeness); then, the adoption of specific codes (like those used in the infoboxes of Wikipedia articles) and the type of document (e.g., historical and technical ones). In this paper, we propose to leverage such domain-specific features to improve the evaluation of Wikipedia medical articles. In particular, we evaluate the articles with an "actionable" model, whose features are tied to the content of the articles, so that the model can also directly suggest strategies for improving the quality of a given article. We rely on Natural Language Processing (NLP) and dictionary-based techniques to extract the bio-medical concepts in a text. We demonstrate the effectiveness of our approach by classifying the medical articles of the Wikipedia Medicine Portal, which were previously manually labeled by the WikiProject team. The results of our experiments confirm that, by considering domain-oriented features, it is possible to obtain substantial improvements over existing solutions, mainly for those articles that other approaches classify less accurately. Besides being interesting in their own right, the results call for further research on domain-specific features suited to Web data quality assessment.
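The dictionary-based Domain Informativeness feature mentioned in the abstract can be pictured as the share of a text's tokens that match a bio-medical vocabulary. A minimal sketch, assuming a toy stand-in vocabulary (the entries below are hypothetical, not the authors' actual dictionary resources):

```python
import re

# Hypothetical stand-in for a real bio-medical dictionary.
MEDICAL_VOCAB = {
    "diagnosis", "symptom", "pathology", "chronic", "lesion",
    "etiology", "prognosis", "biopsy", "carcinoma", "hypertension",
}

def domain_informativeness(text: str) -> float:
    """Fraction of tokens that match the domain vocabulary."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in MEDICAL_VOCAB)
    return hits / len(tokens)
```

A real pipeline would replace the toy set with a curated bio-medical lexicon and normalize terms (lemmatization, multi-word concepts) before matching.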
Dynamics of Content Quality in Collaborative Knowledge Production
We explore the dynamics of user performance in collaborative knowledge
production by studying the quality of answers to questions posted on Stack
Exchange. We propose four indicators of answer quality: answer length, the
number of code lines and hyperlinks to external web content it contains, and
whether it is accepted by the asker as the most helpful answer to the question.
Analyzing millions of answers posted over the period from 2008 to 2014, we
uncover regular short-term and long-term changes in quality. In the short-term,
quality deteriorates over the course of a single session, with each successive
answer becoming shorter, with fewer code lines and links, and less likely to be
accepted. In contrast, performance improves over the long-term, with more
experienced users producing higher quality answers. These trends are not a
consequence of data heterogeneity, but rather have a behavioral origin. Our
findings highlight the complex interplay between short-term deterioration in
performance, potentially due to mental fatigue or attention depletion, and
long-term performance improvement due to learning and skill acquisition, and
its impact on the quality of user-generated content.
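The four indicators proposed in the abstract are straightforward to compute from an answer's Markdown body. A minimal sketch (not the paper's actual pipeline; the counting heuristics here are assumptions):

```python
import re

def answer_indicators(body_md: str, accepted: bool) -> dict:
    """Compute the four answer-quality indicators from Markdown text."""
    # Code lines: lines inside ``` fences, plus 4-space-indented lines.
    code_lines, in_fence = 0, False
    for line in body_md.splitlines():
        if line.strip().startswith("```"):
            in_fence = not in_fence
        elif in_fence or line.startswith("    "):
            code_lines += 1
    # Hyperlinks to external content: bare URL matches.
    links = len(re.findall(r"https?://\S+", body_md))
    return {
        "length": len(body_md),   # answer length in characters
        "code_lines": code_lines,
        "links": links,
        "accepted": accepted,     # flag supplied by the data dump
    }
```

Applied per answer and grouped by session position or by user tenure, indicators like these would expose the short-term and long-term trends the study reports.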
The organisational impact of open educational resources
The open educational resource (OER) movement has been growing rapidly since 2001, stimulated by funding from benefactors such as the Hewlett Foundation and UNESCO, and providing educational content freely to institutions and learners across the world. Individuals and organisations are motivated by a variety of drivers, both altruistic and self-interested, to produce OERs. There are parallels with the open source movement, where authors and others combine their efforts to provide a product which they and others can use freely and adapt to their own purposes. There are many different ways in which OER initiatives are organised and a wide range of possibilities for how the OERs themselves are constituted. If institutions are to develop sustainable OER initiatives, they need to build successful change management programmes: developing models for the production and quality assurance of OERs, licensing them through appropriate mechanisms such as Creative Commons, and considering how the resources will be discovered and used by learners.
Overview of VideoCLEF 2009: New perspectives on speech-based multimedia content enrichment
VideoCLEF 2009 offered three tasks related to enriching video content for improved multimedia access in a multilingual environment. For each task, video data (Dutch-language television, predominantly documentaries) accompanied by speech recognition transcripts were provided.
The Subject Classification Task involved automatic tagging of videos with subject theme labels. The best performance was achieved by approaching subject tagging as an information retrieval task and using both speech recognition transcripts and archival metadata. Alternatively, classifiers were trained using either the training data provided or data collected from Wikipedia or via general Web search. The Affect Task involved detecting narrative peaks, defined as points where viewers perceive heightened dramatic tension. The task was carried out on the "Beeldenstorm" collection containing 45 short-form documentaries on the visual arts. The best runs exploited affective vocabulary and audience-directed speech. Other approaches included using topic changes, elevated speaking pitch, increased speaking intensity and radical visual changes. The Linking Task, also called "Finding Related Resources Across Languages," involved linking video to material on the same subject in a different language.
Participants were provided with a list of multimedia anchors (short video segments) in the Dutch-language "Beeldenstorm" collection and were expected to return target pages drawn from English-language Wikipedia. The best-performing methods used the transcript of the speech spoken during the multimedia anchor to build a query to search an index of the Dutch-language Wikipedia. The Dutch Wikipedia pages returned were used to identify related English pages. Participants also experimented with pseudo-relevance feedback, query translation and methods that targeted proper names.
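The best-performing Linking Task strategy, as described above, amounts to ranking Dutch Wikipedia pages against the anchor transcript and then following interlanguage links to English. A minimal sketch with a toy in-memory index (the page titles, index contents, and link table below are hypothetical illustrations, not VideoCLEF data):

```python
# Toy "index": Dutch page title -> bag of words from that page.
NL_INDEX = {
    "Rembrandt": {"schilder", "portret", "amsterdam"},
    "Beeldenstorm": {"kerk", "beelden", "reformatie"},
}
# Toy interlanguage links: Dutch title -> English Wikipedia title.
NL_TO_EN = {
    "Rembrandt": "Rembrandt",
    "Beeldenstorm": "Beeldenstorm (Iconoclastic Fury)",
}

def link_anchor(transcript_words: set) -> list:
    """Rank Dutch pages by word overlap with the anchor transcript,
    then map the matching pages to their English counterparts."""
    scored = sorted(
        NL_INDEX,
        key=lambda t: len(NL_INDEX[t] & transcript_words),
        reverse=True,
    )
    return [NL_TO_EN[t] for t in scored if NL_INDEX[t] & transcript_words]
```

A real system would use a proper retrieval engine over the full Dutch Wikipedia and the interlanguage link table of the live encyclopedia in place of these toy dictionaries.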