Explicit diversification of event aspects for temporal summarization
During major events, such as emergencies and disasters, a large volume of information is reported on newswire and social media platforms. Temporal summarization (TS) approaches are used to automatically produce concise overviews of such events by extracting text snippets from related articles over time. Current TS approaches rely on a combination of event relevance and textual novelty for snippet selection. However, for events that span multiple days, textual novelty is often a poor criterion for selecting snippets, since many snippets are textually unique but semantically redundant or non-informative. In this article, we propose a framework for the diversification of snippets using explicit event aspects, building on recent work in search result diversification. In particular, we first propose two techniques to identify explicit aspects that a user might want to see covered in a summary for different types of event. We then extend a state-of-the-art explicit diversification framework to maximize the coverage of these aspects when selecting summary snippets for unseen events. Through experimentation over the TREC TS 2013, 2014, and 2015 datasets, we show that explicit diversification for temporal summarization significantly outperforms classical novelty-based diversification, as the use of explicit event aspects reduces the number of redundant and off-topic snippets returned, while also increasing summary timeliness.
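The aspect-coverage step can be illustrated with a minimal greedy selector. This sketch assumes an xQuAD-style objective (a common explicit search result diversification framework, used here as an assumption, not as the authors' exact extension) and assumes that snippet relevance P(s|e), aspect weights P(a|e) and per-aspect coverage P(s|a) are already estimated. The function name `xquad_select`, the parameter `lam` and the toy scores are hypothetical.

```python
from typing import Dict, List


def xquad_select(
    relevance: Dict[str, float],            # assumed P(s|event) for each snippet
    aspect_weights: Dict[str, float],       # assumed P(a|event) for each aspect
    coverage: Dict[str, Dict[str, float]],  # assumed P(s|a): how well s covers a
    k: int,
    lam: float = 0.5,                       # hypothetical relevance/diversity trade-off
) -> List[str]:
    """Greedily pick up to k snippets, balancing relevance against uncovered aspects."""
    selected: List[str] = []
    # Probability that each aspect is still NOT covered by the selected snippets.
    uncovered = {a: 1.0 for a in aspect_weights}
    candidates = set(relevance)

    while candidates and len(selected) < k:
        def score(s: str) -> float:
            diversity = sum(
                aspect_weights[a] * coverage.get(s, {}).get(a, 0.0) * uncovered[a]
                for a in aspect_weights
            )
            return (1 - lam) * relevance[s] + lam * diversity

        best = max(sorted(candidates), key=score)  # sorting makes ties deterministic
        selected.append(best)
        candidates.remove(best)
        # Aspects covered by the chosen snippet contribute less to later picks.
        for a in aspect_weights:
            uncovered[a] *= 1.0 - coverage.get(best, {}).get(a, 0.0)
    return selected


# Toy event: s1 and s2 are near-duplicates about casualties, s3 covers rescue efforts.
rel = {"s1": 0.9, "s2": 0.85, "s3": 0.6}
weights = {"casualties": 0.5, "rescue": 0.5}
cov = {"s1": {"casualties": 0.9}, "s2": {"casualties": 0.9}, "s3": {"rescue": 0.9}}
print(xquad_select(rel, weights, cov, k=2))           # ['s1', 's3']
print(xquad_select(rel, weights, cov, k=2, lam=0.0))  # ['s1', 's2']
```

With `lam=0.0` the selector ranks by relevance alone and keeps both casualty snippets; with aspect coverage enabled, the redundant second snippet loses to one covering a different aspect, which is the effect the abstract attributes to explicit diversification.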
Table-to-text Generation by Structure-aware Seq2seq Learning
Table-to-text generation aims to generate a description for a factual table, which can be viewed as a set of field-value records. To encode both the content and the structure of a table, we propose a novel structure-aware seq2seq architecture consisting of a field-gating encoder and a description generator with dual attention. In the encoding phase, we update the cell memory of the LSTM unit with a field gate and its corresponding field value in order to incorporate field information into the table representation. In the decoding phase, a dual attention mechanism, which combines word-level and field-level attention, models the semantic relevance between the generated description and the table. We conduct experiments on the WIKIBIO dataset, which contains over 700k biographies and corresponding infoboxes from Wikipedia. The attention visualizations and case studies show that our model is capable of generating coherent and informative descriptions based on a comprehensive understanding of both the content and the structure of a table. Automatic evaluations also show that our model outperforms the baselines by a large margin. Code for this work is available at https://github.com/tyliupku/wiki2bio.
Comment: Accepted by AAAI201
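The two mechanisms above can be sketched element-wise in plain Python. This is a minimal illustration under stated assumptions: the field gate is assumed to add a term sigmoid(l) * tanh(z) to the standard LSTM cell update, with both pre-activations computed from the field embedding, and the two attention distributions are assumed to be combined by element-wise product followed by renormalization. Weight matrices are omitted (pre-activations are passed in directly), and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def field_gated_cell(
    c_prev: Sequence[float],  # previous cell memory c_{t-1}
    i_gate: Sequence[float],  # input gate (already activated)
    f_gate: Sequence[float],  # forget gate (already activated)
    c_cand: Sequence[float],  # candidate cell value (already activated)
    l_pre: Sequence[float],   # hypothetical field-gate pre-activation from the field embedding
    z_pre: Sequence[float],   # hypothetical field-value pre-activation from the field embedding
) -> List[float]:
    """Assumed update: c_t = f * c_{t-1} + i * c_cand + sigmoid(l_pre) * tanh(z_pre)."""
    return [
        f * c + i * cc + sigmoid(lp) * math.tanh(zp)
        for c, i, f, cc, lp, zp in zip(c_prev, i_gate, f_gate, c_cand, l_pre, z_pre)
    ]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dual_attention(word_scores: Sequence[float], field_scores: Sequence[float]) -> List[float]:
    """Combine word-level (alpha) and field-level (beta) attention over table cells."""
    alpha = softmax(word_scores)
    beta = softmax(field_scores)
    joint = [a * b for a, b in zip(alpha, beta)]
    total = sum(joint)
    return [j / total for j in joint]
```

Under these assumptions, uniform field scores leave the word-level distribution unchanged, and in general the product-then-renormalize step equals a softmax over the summed word and field scores, so a cell must score well on both levels to receive most of the attention.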
First Women, Second Sex: Gender Bias in Wikipedia
Contributing to history has never been as easy as it is today. Anyone with access to the Web is able to play a part on Wikipedia, an open and free encyclopedia. Wikipedia, available in many languages, is one of the most visited websites in the world and arguably one of the primary sources of knowledge on the Web. However, its contributor base is far from diverse: several groups are severely underrepresented. One of those groups is women, who make up approximately 16% of the current contributor community, meaning that most of the content is written by men. In addition, although Wikipedia content must adhere to specific guidelines of verifiability, notability, and neutral point of view, these guidelines are supervised and enforced by men.
In this paper, we propose that gender bias is not only about participation and representation, but also about the characterization of women. We approach the analysis of gender bias by defining a methodology for comparing the characterizations of men and women in biographies in three aspects: metadata, language, and network structure. Our results show that there are indeed differences in characterization and structure. Some of these differences reflect the offline world documented by Wikipedia, but others can be attributed to gender bias in Wikipedia content. We contextualize these differences in feminist theory and discuss their implications for Wikipedia policy.
Comment: 10 pages, ACM style. Author's version of a paper to be presented at ACM Hypertext 201
A matter of words: NLP for quality evaluation of Wikipedia medical articles
Automatic quality evaluation of Web information is a highly relevant task with applications in many fields, especially in critical domains such as medicine. We start from the intuition that the content quality of medical Web documents is affected by domain-specific features: first, the use of a specific vocabulary (Domain Informativeness); then, the adoption of specific codes (like those used in the infoboxes of Wikipedia articles) and the type of document (e.g., historical or technical). In this paper, we propose to leverage domain-specific features to improve the quality evaluation of Wikipedia medical articles. In particular, we evaluate the articles with an "actionable" model, whose features are related to the content of the articles, so that the model can also directly suggest strategies for improving a given article's quality. We rely on Natural Language Processing (NLP) and dictionary-based techniques to extract the biomedical concepts in a text. We demonstrate the effectiveness of our approach by classifying the medical articles of the Wikipedia Medicine Portal, which had previously been manually labeled by the WikiProject team. The results of our experiments confirm that, by considering domain-oriented features, it is possible to obtain noticeable improvements over existing solutions, mainly for those articles that other approaches classified less accurately. Besides being interesting in their own right, the results call for further research into domain-specific features suitable for Web data quality assessment.
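A crude version of the vocabulary-based feature (Domain Informativeness) can be sketched as the share of tokens that match a medical dictionary. This is an assumption-laden illustration, not the authors' feature: the dictionary is a hypothetical toy, only single-word terms are matched, and tokenization is a simple regular expression rather than the NLP pipeline the abstract refers to. The function name `domain_informativeness` is likewise hypothetical.

```python
import re
from typing import Iterable


def domain_informativeness(text: str, medical_terms: Iterable[str]) -> float:
    """Fraction of word tokens in `text` that appear in a (hypothetical) medical dictionary."""
    vocab = {term.lower() for term in medical_terms}
    tokens = re.findall(r"[a-z]+", text.lower())  # letters-only tokens, lowercased
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in vocab)
    return hits / len(tokens)


# Toy dictionary (hypothetical; a real system would use a curated biomedical vocabulary).
TERMS = {"aspirin", "fever", "inflammation"}
print(domain_informativeness("Aspirin reduces fever and inflammation.", TERMS))  # 0.6
```

Because the score is a plain ratio over the article's own text, a low value points directly at an actionable fix (more precise domain terminology), in the spirit of the "actionable" model the abstract describes; multi-word concepts and code lookups would need a richer matcher.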
The future of work: Towards a progressive agenda for all. EPC Issue Paper 9 DECEMBER 2019
Europe’s labour markets and the world of work in general are being transformed by the megatrends of globalisation, the fragmentation of the production and value chain, demographic ageing, new societal aspirations and the digitalisation of the economy. This Issue Paper presents the findings and policy recommendations of “The future of work – Towards a progressive agenda for all”, a European Policy Centre research project. Its main objectives were to expand public knowledge about these profound changes and to reverse the negative narrative often associated with this topic. It aimed to show how human decisions and the right policies can mitigate upcoming disruptions and provide European and national policymakers with a comprehensive toolkit for a progressive agenda for the new world of work.