13 research outputs found

    Leveraging syntactic parsing to improve event annotation matching

    Get PDF
    Detecting event mentions is the first step in event extraction from text and annotating them is a notoriously difficult task. Evaluating annotator consistency is crucial when building datasets for mention detection. When event mentions are allowed to cover many tokens, annotators may disagree on their span, which means that overlapping annotations may then refer to the same event or to different events. This paper explores different fuzzy matching functions which aim to resolve this ambiguity. The functions extract the sets of syntactic heads present in the annotations, use the Dice coefficient to measure the similarity between sets and return a judgment based on a given threshold. The functions are tested against the judgments of a human evaluator and a comparison is made between sets of tokens and sets of syntactic heads. The best-performing function is a head-based function that is found to agree with the human evaluator in 89% of cases

    News diversity and recommendation systems : setting the interdisciplinary scene

    Get PDF
    Concerns about selective exposure and filter bubbles in the digital news environment trigger questions regarding how news recommender systems can become more citizen-oriented and facilitate – rather than limit – normative aims of journalism. Accordingly, this chapter presents building blocks for the construction of such a news algorithm as they are being developed by the Ghent University interdisciplinary research project #NewsDNA, of which the primary aim is to actually build, evaluate and test a diversity-enhancing news recommender. As such, the deployment of artificial intelligence could support the media in providing people with information and stimulating public debate, rather than undermine their role in that respect. To do so, it combines insights from computer sciences (news recommender systems), law (right to receive information), communication sciences (conceptualisations of news diversity), and computational linguistics (automated content extraction from text). To gather feedback from scholars of different backgrounds, this research has been presented and discussed during the 2019 IFIP summer school workshop on ‘co-designing a personalised news diversity algorithmic model based on news consumers’ agency and fine-grained content modelling’. This contribution also reflects the results of that dialogue

    News topic classification as a first step towards diverse news recommendation

    Get PDF
    When developing an algorithm that uses news diversity as a key driver for personalized news recommendation it is crucial to focus on means to cluster news articles in a fine-grained manner, ideally by leveraging the content of the text. In this paper we investigate semantic classification of news articles in an unfiltered news stream. We first present an analysis of the EventDNA corpus: a collection of Dutch-language news articles annotated with event data according to a predefined typology. We found that the types assigned as features of events do not allow for such a semantic classification and investigate the IPTC News Media Topics standard as an alternative. By mapping event types with manually-assigned IPTC topics, we observe that a more diversified picture emerges, which leads us to conclude that the IPTC classification is a useful proxy. Based on a historical data sample of Dutch news articles covering the year 2018, we then perform a series of machine learning experiments in order to automatically predict the top two levels of the IPTC taxonomy. Various multi-label classification models are built with BERTje using a bottom-up and top-down approach. The results reveal that the top-down approach yields the best results, with an overall macro F-1 score of 86.4% and a Jaccard accuracy of 89.2% for the level-one topics and one of 83.7% and 87.5% for the level-two predictions

    Joint event extraction based on hierarchical event schemas from framenet

    Get PDF
    Event extraction is useful for many practical applications, such as news summarization and information retrieval. However, the popular automatic context extraction (ACE) event extraction program only defines very limited and coarse event schemas, which may not be suitable for practical applications. FrameNet is a linguistic corpus that defines complete semantic frames and frame-to-frame relations. As frames in FrameNet share highly similar structures with event schemas in ACE and many frames actually express events, we propose to redefine the event schemas based on FrameNet. Specifically, we extract frames expressing event information from FrameNet and leverage the frame-to-frame relations to build a hierarchy of event schemas that are more fine-grained and have much wider coverage than ACE. Based on the new event schemas, we propose a joint event extraction approach that leverages the hierarchical structure of event schemas and frame-to-frame relations in FrameNet. The extensive experiments have verified the advantages of our hierarchical event schemas and the effectiveness of our event extraction model. We further apply the results of our event extraction model on news summarization. The results show that the summarization approach based on our event extraction model achieves significant better performance than several state-of-the-art summarization approaches, which also demonstrates that the hierarchical event schemas and event extraction model are promising to be used in the practical applications

    Dialogue-Based Relation Extraction

    Full text link
    We present the first human-annotated dialogue-based relation extraction (RE) dataset DialogRE, aiming to support the prediction of relation(s) between two arguments that appear in a dialogue. We further offer DialogRE as a platform for studying cross-sentence RE as most facts span multiple sentences. We argue that speaker-related information plays a critical role in the proposed task, based on an analysis of similarities and differences between dialogue-based and traditional RE tasks. Considering the timeliness of communication in a dialogue, we design a new metric to evaluate the performance of RE methods in a conversational setting and investigate the performance of several representative RE methods on DialogRE. Experimental results demonstrate that a speaker-aware extension on the best-performing model leads to gains in both the standard and conversational evaluation settings. DialogRE is available at https://dataset.org/dialogre/.Comment: To appear in ACL 202

    One, no one and one hundred thousand events: Defining and processing events in an inter-disciplinary perspective

    Get PDF
    We present an overview of event definition and processing spanning 25 years of research in NLP. We first provide linguistic background to the notion of event, and then present past attempts to formalize this concept in annotation standards to foster the development of benchmarks for event extraction systems. This ranges from MUC-3 in 1991 to the Time and Space Track challenge at SemEval 2015. Besides, we shed light on other disciplines in which the notion of event plays a crucial role, with a focus on the historical domain. Our goal is to provide a comprehensive study on event definitions and investigate which potential past efforts in the NLP community may have in a different research domain. We present the results of a questionnaire, where the notion of event for historians is put in relation to the NLP perspective

    A Comparison of the Events and Relations Across ACE, ERE, TAC-KBP, and FrameNet Annotation Standards

    No full text

    Novel Event Detection and Classification for Historical Texts

    Get PDF
    Event processing is an active area of research in the Natural Language Processing community but resources and automatic systems developed so far have mainly addressed contemporary texts. However, the recognition and elaboration of events is a crucial step when dealing with historical texts particularly in the current era of massive digitization of historical sources: research in this domain can lead to the development of methodologies and tools that can assist historians in enhancing their work, while having an impact also on the field of Natural Language Processing. Our work aims at shedding light on the complex concept of events when dealing with historical texts. More specifically, we introduce new annotation guidelines for event mentions and types, categorised into 22 classes. Then, we annotate a historical corpus accordingly, and compare two approaches for automatic event detection and classification following this novel scheme. We believe that this work can foster research in a field of inquiry so far underestimated in the area of Temporal Information Processing. To this end, we release new annotation guidelines, a corpus and new models for automatic annotation
    corecore