1,335 research outputs found

    Explicit diversification of event aspects for temporal summarization

    Get PDF
    During major events, such as emergencies and disasters, a large volume of information is reported on newswire and social media platforms. Temporal summarization (TS) approaches are used to automatically produce concise overviews of such events by extracting text snippets from related articles over time. Current TS approaches rely on a combination of event relevance and textual novelty for snippet selection. However, for events that span multiple days, textual novelty is often a poor criterion for selecting snippets, since many snippets are textually unique but are semantically redundant or non-informative. In this article, we propose a framework for the diversification of snippets using explicit event aspects, building on recent works in search result diversification. In particular, we first propose two techniques to identify explicit aspects that a user might want to see covered in a summary for different types of event. We then extend a state-of-the-art explicit diversification framework to maximize the coverage of these aspects when selecting summary snippets for unseen events. Through experimentation over the TREC TS 2013, 2014, and 2015 datasets, we show that explicit diversification for temporal summarization significantly outperforms classical novelty-based diversification, as the use of explicit event aspects reduces the amount of redundant and off-topic snippets returned, while also increasing summary timeliness

    Table-to-text Generation by Structure-aware Seq2seq Learning

    Full text link
    Table-to-text generation aims to generate a description for a factual table which can be viewed as a set of field-value records. To encode both the content and the structure of a table, we propose a novel structure-aware seq2seq architecture which consists of field-gating encoder and description generator with dual attention. In the encoding phase, we update the cell memory of the LSTM unit by a field gate and its corresponding field value in order to incorporate field information into table representation. In the decoding phase, dual attention mechanism which contains word level attention and field level attention is proposed to model the semantic relevance between the generated description and the table. We conduct experiments on the \texttt{WIKIBIO} dataset which contains over 700k biographies and corresponding infoboxes from Wikipedia. The attention visualizations and case studies show that our model is capable of generating coherent and informative descriptions based on the comprehensive understanding of both the content and the structure of a table. Automatic evaluations also show our model outperforms the baselines by a great margin. Code for this work is available on https://github.com/tyliupku/wiki2bio.Comment: Accepted by AAAI201

    First Women, Second Sex: Gender Bias in Wikipedia

    Full text link
    Contributing to history has never been as easy as it is today. Anyone with access to the Web is able to play a part on Wikipedia, an open and free encyclopedia. Wikipedia, available in many languages, is one of the most visited websites in the world and arguably one of the primary sources of knowledge on the Web. However, not everyone is contributing to Wikipedia from a diversity point of view; several groups are severely underrepresented. One of those groups is women, who make up approximately 16% of the current contributor community, meaning that most of the content is written by men. In addition, although there are specific guidelines of verifiability, notability, and neutral point of view that must be adhered by Wikipedia content, these guidelines are supervised and enforced by men. In this paper, we propose that gender bias is not about participation and representation only, but also about characterization of women. We approach the analysis of gender bias by defining a methodology for comparing the characterizations of men and women in biographies in three aspects: meta-data, language, and network structure. Our results show that, indeed, there are differences in characterization and structure. Some of these differences are reflected from the off-line world documented by Wikipedia, but other differences can be attributed to gender bias in Wikipedia content. We contextualize these differences in feminist theory and discuss their implications for Wikipedia policy.Comment: 10 pages, ACM style. Author's version of a paper to be presented at ACM Hypertext 201

    A matter of words: NLP for quality evaluation of Wikipedia medical articles

    Get PDF
    Automatic quality evaluation of Web information is a task with many fields of applications and of great relevance, especially in critical domains like the medical one. We move from the intuition that the quality of content of medical Web documents is affected by features related with the specific domain. First, the usage of a specific vocabulary (Domain Informativeness); then, the adoption of specific codes (like those used in the infoboxes of Wikipedia articles) and the type of document (e.g., historical and technical ones). In this paper, we propose to leverage specific domain features to improve the results of the evaluation of Wikipedia medical articles. In particular, we evaluate the articles adopting an "actionable" model, whose features are related to the content of the articles, so that the model can also directly suggest strategies for improving a given article quality. We rely on Natural Language Processing (NLP) and dictionaries-based techniques in order to extract the bio-medical concepts in a text. We prove the effectiveness of our approach by classifying the medical articles of the Wikipedia Medicine Portal, which have been previously manually labeled by the Wiki Project team. The results of our experiments confirm that, by considering domain-oriented features, it is possible to obtain sensible improvements with respect to existing solutions, mainly for those articles that other approaches have less correctly classified. Other than being interesting by their own, the results call for further research in the area of domain specific features suitable for Web data quality assessment

    The future of work: Towards a progressive agenda for all. EPC Issue Paper 9 DECEMBER 2019

    Get PDF
    Europe’s labour markets and the world of work in general are being transformed by the megatrends of globalisation, the fragmentation of the production and value chain, demographic ageing, new societal aspirations and the digitalisation of the economy. This Issue Paper presents the findings and policy recommendations of “The future of work – Towards a progressive agenda for all”, a European Policy Centre research project. Its main objectives were to expand public knowledge about these profound changes and to reverse the negative narrative often associated with this topic. It aimed to show how human decisions and the right policies can mitigate upcoming disruptions and provide European and national policymakers with a comprehensive toolkit for a progressive agenda for the new world of work
    • …
    corecore