
    LEAN-LIFE: A Label-Efficient Annotation Framework Towards Learning from Explanation

    Full text link
    Successfully training a deep neural network demands a huge corpus of labeled data. However, each label provides only limited information to learn from, and collecting the requisite number of labels involves massive human effort. In this work, we introduce LEAN-LIFE, a web-based, Label-Efficient AnnotatioN framework for sequence labeling and classification tasks, with an easy-to-use UI that not only allows an annotator to provide the needed labels for a task, but also enables LearnIng From Explanations for each labeling decision. Such explanations enable us to generate useful additional labeled data from unlabeled instances, bolstering the pool of available training data. On three popular NLP tasks (named entity recognition, relation extraction, sentiment analysis), we find that this enhanced supervision allows our models to surpass competitive baseline F1 scores by 5-10 percentage points while using 2x fewer labeled instances. Our framework is the first to utilize this enhanced supervision technique, and it does so for three important tasks -- thus providing improved annotation recommendations to users and the ability to build datasets of (data, label, explanation) triples instead of the regular (data, label) pairs. Comment: Accepted to ACL 2020 (demo). The first two authors contributed equally. Project page: http://inklab.usc.edu/leanlife
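The (data, label, explanation) triple described above can be pictured as a simple record. This is an illustrative sketch only; the field names are assumptions, not LEAN-LIFE's actual schema.

```python
from dataclasses import dataclass

# Hypothetical shape of one annotated example; LEAN-LIFE's real storage
# format is defined by the framework itself, not by this sketch.
@dataclass
class AnnotatedExample:
    data: str         # the raw instance, e.g. a sentence
    label: str        # the annotator's labeling decision
    explanation: str  # free-text rationale for that decision

ex = AnnotatedExample(
    data="The pasta was cold and bland.",
    label="negative",
    explanation="The words 'cold' and 'bland' signal dissatisfaction.",
)
```

The explanation field is what distinguishes this from a regular (data, label) pair: it carries the extra signal used to label additional unlabeled instances.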

    AutoTriggER: Named Entity Recognition with Auxiliary Trigger Extraction

    Full text link
    Deep neural models for low-resource named entity recognition (NER) have shown impressive results by leveraging distant supervision or other meta-level information (e.g. explanations). However, the costs of acquiring such additional information are generally prohibitive, especially in domains where existing resources (e.g. databases to be used for distant supervision) may not exist. In this paper, we present a novel two-stage framework (AutoTriggER) to improve NER performance by automatically generating and leveraging "entity triggers", which are essentially human-readable clues in the text that can help guide the model to make better decisions. Thus, the framework is able to both create and leverage auxiliary supervision by itself. Through experiments on three well-studied NER datasets, we show that our automatically extracted triggers are well-matched to human triggers, and AutoTriggER improves performance over a RoBERTa-CRF architecture by nearly 0.5 F1 points on average and much more in a low-resource setting. Comment: 10 pages, 12 figures, Best paper at TrustNLP@NAACL 2021 and presented at WeaSuL@ICLR 202

    ICEWS Dictionaries

    No full text
    The ICEWS dictionaries contain both named individuals or groups, known as actors, and generic individuals or groups, known as agents. Actors are known by a specific name, such as 'Free Syrian Army' or 'Goodluck Johnathan', while agents are known by a generic improper noun, such as 'insurgents' or 'students'. Both actors and agents have time-dependent affiliations with another actor (in the case of an individual who is a member of an organization, for example), with a country or other autonomous region, or with a general sector/role, such as 'Military' or 'Government'. The dictionaries also include aliases that an actor or agent might be known by: for actors, these are typically alternate spellings of a person's name, while for agents they are typically synonyms. Additional information about the ICEWS program can be found at http://www.icews.com/. Follow our Twitter handle for data updates and other news: @icew
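The entry structure described above (actor vs. agent, aliases, time-dependent affiliations) can be sketched as follows. This is a minimal illustrative model; the real ICEWS dictionaries ship in their own file format, and every field name here is an assumption rather than the official schema.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Affiliation:
    target: str  # another actor, a country/region, or a sector such as 'Military'
    start: date  # affiliations are time-dependent, so each carries an interval
    end: date

@dataclass
class Entry:
    name: str                                         # 'Free Syrian Army' or 'insurgents'
    kind: str                                         # 'actor' (proper name) or 'agent' (generic noun)
    aliases: list = field(default_factory=list)       # alternate spellings or synonyms
    affiliations: list = field(default_factory=list)

def affiliations_on(entry: Entry, day: date) -> list:
    """Return the affiliations an entry held on a given date."""
    return [a for a in entry.affiliations if a.start <= day <= a.end]

# Example entry; the affiliation interval is illustrative, not from the dataset.
fsa = Entry(
    name="Free Syrian Army",
    kind="actor",
    aliases=["FSA"],
    affiliations=[Affiliation("Syria", date(2011, 7, 29), date(2020, 12, 31))],
)
```

Resolving an affiliation then reduces to an interval lookup against the query date, which is what makes the affiliations "time-dependent" in practice.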

    Language Model Priming for Cross-Lingual Event Extraction

    No full text
    We present a novel, language-agnostic approach to "priming" language models for the task of event extraction, providing particularly effective performance in low-resource and zero-shot cross-lingual settings. With priming, we augment the input to the transformer stack's language model differently depending on the question(s) being asked of the model at runtime. For instance, if the model is being asked to identify arguments for the trigger "protested", we will provide that trigger as part of the input to the language model, allowing it to produce different representations for candidate arguments than when it is asked about arguments for the trigger "arrest" elsewhere in the same sentence. We show that by enabling the language model to better compensate for the deficits of sparse and noisy training data, our approach improves both trigger and argument detection and classification significantly over the state of the art in a zero-shot cross-lingual setting.
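The priming mechanism described above can be sketched in a few lines: the trigger currently being asked about is appended to the model input, so the encoder sees a different input (and builds different representations) for each question over the same sentence. The `[SEP]` separator and list-of-tokens format are assumptions for illustration, not the paper's exact input scheme.

```python
def prime_input(tokens, trigger, sep="[SEP]"):
    # Append the trigger under question so the encoder can condition on it.
    return tokens + [sep, trigger]

sentence = ["Crowds", "protested", "before", "police", "made", "an", "arrest", "."]
primed_a = prime_input(sentence, "protested")
primed_b = prime_input(sentence, "arrest")
assert primed_a != primed_b  # same sentence, two distinct primed inputs
```

Because the sentence tokens are unchanged, the same example can be queried once per trigger, which is how one sentence yields separate argument questions for "protested" and "arrest".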

    Selecting on-topic sentences from natural language corpora

    No full text
    We describe a system that examines input sentences with respect to arbitrary topics formulated as natural language expressions. It extracts predicate-argument structures from text intervals and links them into semantically organized proposition trees. By instantiating trees constructed for topic descriptions in trees representing input sentences or parts thereof, we are able to assess the degree of “topicality” for each sentence. The presented strategy was used in the BBN distillation system for the GALE Year 1 evaluation and achieved outstanding results compared to other systems and human participants. Index Terms: machine learning, question answering, topicality

    ICEWS Coded Event Data

    No full text
    Event data consists of coded interactions between socio-political actors (i.e., cooperative or hostile actions between individuals, groups, sectors, and nation states). Events are automatically identified and extracted from news articles by the BBN ACCENT event coder. These events are essentially triples consisting of a source actor, an event type (according to the CAMEO taxonomy of events), and a target actor. Geographical-temporal metadata are also extracted and associated with the relevant events within a news article. We plan to update this data on a periodic basis. Additional event data may be made available, For Official Use Only (FOUO), to government-sponsored research activities.
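The event triple described above, plus its geographical-temporal metadata, can be pictured as a flat record. The field names and the example values are illustrative assumptions, not the columns or contents of the released ICEWS files.

```python
from dataclasses import dataclass

@dataclass
class CodedEvent:
    source_actor: str  # who acted
    cameo_code: str    # event type drawn from the CAMEO taxonomy
    target_actor: str  # who was acted upon
    country: str       # geographical metadata from the source article
    event_date: str    # temporal metadata (ISO date)

# Hypothetical record for illustration only.
ev = CodedEvent("Protesters (Egypt)", "145", "Police (Egypt)",
                "Egypt", "2013-07-03")
```

Keeping the triple flat alongside its metadata is what makes the dataset easy to filter by actor, event type, country, or date.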

    ICEWS Automated Daily Event Data

    No full text
    Event data consists of coded interactions between socio-political actors (i.e., cooperative or hostile actions between individuals, groups, sectors, and nation states). Events are automatically identified and extracted from news articles by the BBN ACCENT event coder. These events are essentially triples consisting of a source actor, an event type (according to the CAMEO taxonomy of events), and a target actor. Geographical-temporal metadata are also extracted and associated with the relevant events within a news article. We plan to update this data on a periodic basis. Additional event data may be made available, For Official Use Only (FOUO), to government-sponsored research activities.