LEAN-LIFE: A Label-Efficient Annotation Framework Towards Learning from Explanation
Successfully training a deep neural network demands a huge corpus of labeled
data. However, each label provides only limited information to learn from, and
collecting the requisite number of labels involves massive human effort. In
this work, we introduce LEAN-LIFE, a web-based, Label-Efficient AnnotatioN
framework for sequence labeling and classification tasks, with an easy-to-use
UI that not only allows an annotator to provide the needed labels for a task,
but also enables LearnIng From Explanations for each labeling decision. Such
explanations enable us to generate useful additional labeled data from
unlabeled instances, bolstering the pool of available training data. On three
popular NLP tasks (named entity recognition, relation extraction, sentiment
analysis), we find that using this enhanced supervision allows our models to
surpass competitive baseline F1 scores by 5-10 percentage points while using
2x fewer labeled instances. Our framework is the first to
utilize this enhanced supervision technique and does so for three important
tasks -- thus providing improved annotation recommendations to users and an
ability to build datasets of (data, label, explanation) triples instead of the
regular (data, label) pairs.
Comment: Accepted to ACL 2020 (demo). The first two authors contributed
equally. Project page: http://inklab.usc.edu/leanlife
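The core idea of learning from explanations can be sketched in a few lines: an annotator's explanation is compiled into a matching rule that labels additional unlabeled instances. The rule format and function names below are hypothetical illustrations, not LEAN-LIFE's actual implementation:

```python
# Illustrative sketch: an annotator's explanation (e.g. 'because the phrase
# "not good" appears') is compiled into a cue-phrase rule, which is then
# applied to unlabeled sentences to bolster the pool of training data.
# Names and rule format are assumptions, not LEAN-LIFE's API.

def rule_from_explanation(cue_phrase: str, label: str):
    """Turn an explanation keyed on a cue phrase into a labeling rule."""
    def rule(sentence: str):
        return label if cue_phrase in sentence.lower() else None
    return rule

def expand_training_data(rules, unlabeled):
    """Apply every rule to every unlabeled sentence; keep matches."""
    extra = []
    for sentence in unlabeled:
        for rule in rules:
            label = rule(sentence)
            if label is not None:
                extra.append((sentence, label))
                break  # first matching rule wins
    return extra

rules = [rule_from_explanation("not good", "NEGATIVE"),
         rule_from_explanation("loved it", "POSITIVE")]
unlabeled = ["The food was not good at all.",
             "We loved it and will return.",
             "Service was average."]
print(expand_training_data(rules, unlabeled))
```

The third sentence matches no rule and is left unlabeled, illustrating why such rule-generated labels augment rather than replace direct annotation.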
AutoTriggER: Named Entity Recognition with Auxiliary Trigger Extraction
Deep neural models for low-resource named entity recognition (NER) have shown
impressive results by leveraging distant supervision or other meta-level
information (e.g. explanation). However, the costs of acquiring such additional
information are generally prohibitive, especially in domains where existing
resources (e.g. databases to be used for distant supervision) may not exist. In
this paper, we present a novel two-stage framework (AutoTriggER) to improve NER
performance by automatically generating and leveraging "entity triggers" which
are essentially human-readable clues in the text that can help guide the model
to make better decisions. Thus, the framework is able to both create and
leverage auxiliary supervision by itself. Through experiments on three
well-studied NER datasets, we show that our automatically extracted triggers
are well-matched to human triggers, and AutoTriggER improves performance over a
RoBERTa-CRF architecture by nearly 0.5 F1 points on average, and by much more in a
low-resource setting.
Comment: 10 pages, 12 figures. Best paper at TrustNLP@NAACL 2021 and presented
at WeaSuL@ICLR 2021.
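One simple way to automatically score candidate triggers, shown here purely as an illustration (a simplification, not AutoTriggER's actual extraction method), is occlusion: each context token is scored by how much removing it lowers a model's confidence that the span is an entity. The toy confidence function below stands in for a real NER model:

```python
# Illustrative occlusion-style trigger scoring: context tokens whose removal
# most hurts the (toy) model's entity confidence are taken as trigger words.
# This is a hypothetical simplification, not AutoTriggER's actual algorithm.

def extract_triggers(tokens, entity_idxs, confidence, top_k=2):
    base = confidence(tokens, entity_idxs)
    scores = {}
    for i in range(len(tokens)):
        if i in entity_idxs:
            continue  # never occlude the entity itself
        occluded = tokens[:i] + tokens[i + 1:]
        # entity indices shift left if the removed token precedes them
        shifted = {j - 1 if j > i else j for j in entity_idxs}
        scores[i] = base - confidence(occluded, shifted)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy stand-in model: confidence rises with cue words present in the sentence.
def toy_confidence(tokens, entity_idxs):
    cues = {"had", "dinner", "at"}
    return sum(0.3 for t in tokens if t in cues) + 0.1

tokens = ["We", "had", "dinner", "at", "Rumble", "Fish", "yesterday"]
triggers = extract_triggers(tokens, {4, 5}, toy_confidence, top_k=3)
print([tokens[i] for i in triggers])
```

Here the cue words "had dinner at" score highest, matching the intuition that triggers are human-readable clues pointing at the entity.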
ICEWS Dictionaries
The ICEWS dictionaries contain both named individuals or groups, known as actors, and generic individuals or groups, known as agents. Actors are known by a specific name, such as 'Free Syrian Army' or 'Goodluck Jonathan', while agents are known by a generic improper noun, such as 'insurgents' or 'students'. Both actors and agents have time-dependent affiliations with another actor (in the case of an individual being a member of an organization, for example), with a country or other autonomous region, or with a general sector/role, such as 'Military' or 'Government'. The dictionaries also include aliases that an actor or agent might be known by. In the case of actors, these are typically alternate spellings of a person's name, while for agents they are typically synonyms. Additional information about the ICEWS program can be found at http://www.icews.com/. Follow our Twitter handle for data updates and other news: @icews.
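The shape of a dictionary entry, with aliases and time-dependent affiliations resolved for a given date, can be sketched as follows (field names here are assumptions for illustration, not the dictionaries' actual schema):

```python
# Illustrative sketch of an ICEWS-style actor record: aliases plus
# time-dependent affiliations resolved for the date of interest.
# Field names are assumptions, not the actual dictionary schema.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Affiliation:
    target: str   # a country, sector/role, or parent organization
    start: date
    end: date

@dataclass
class Actor:
    name: str
    aliases: list = field(default_factory=list)
    affiliations: list = field(default_factory=list)

    def affiliations_on(self, when: date):
        """Return the affiliations in force on a given date."""
        return [a.target for a in self.affiliations if a.start <= when <= a.end]

actor = Actor(
    name="Goodluck Jonathan",
    aliases=["Goodluck Ebele Jonathan"],
    affiliations=[Affiliation("Government (Nigeria)",
                              date(2010, 5, 6), date(2015, 5, 29))],
)
print(actor.affiliations_on(date(2012, 1, 1)))
```

Querying a date outside the affiliation window returns an empty list, which is the point of making affiliations time-dependent rather than static.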
Language Model Priming for Cross-Lingual Event Extraction
We present a novel, language-agnostic approach to "priming" language models for the task of event extraction, providing particularly effective performance in low-resource and zero-shot cross-lingual settings. With priming, we augment the input to the transformer stack's language model differently depending on the question(s) being asked of the model at runtime. For instance, if the model is being asked to identify arguments for the trigger "protested", we will provide that trigger as part of the input to the language model, allowing it to produce different representations for candidate arguments than when it is asked about arguments for the trigger "arrest" elsewhere in the same sentence. We show that by enabling the language model to better compensate for the deficits of sparse and noisy training data, our approach improves both trigger and argument detection and classification significantly over the state of the art in a zero-shot cross-lingual setting.
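The input-augmentation step can be sketched minimally: the query trigger is prepended to the sentence so the encoder produces trigger-specific representations of the same text. The separator and input format below are assumptions for illustration, not the system's actual encoding:

```python
# Minimal sketch of trigger-conditioned "priming": the trigger being queried
# is prepended to the sentence, so the same sentence yields a different model
# input (and thus different representations) per trigger.
# The "[SEP]" separator and string format are illustrative assumptions.

def prime(trigger: str, sentence: str, sep: str = " [SEP] ") -> str:
    """Build the primed input string for one (trigger, sentence) query."""
    return trigger + sep + sentence

sentence = "Protesters were arrested after they protested downtown."
primed_a = prime("protested", sentence)
primed_b = prime("arrested", sentence)
# Two different inputs for the same sentence, one per trigger, so candidate
# arguments can be represented differently for each event being asked about.
print(primed_a)
print(primed_b)
```

Because nothing in the primed string is language-specific, the same construction applies unchanged across languages, which is what makes the approach usable zero-shot cross-lingually.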
Selecting on-topic sentences from natural language corpora
We describe a system that examines input sentences with respect to arbitrary topics formulated as natural language expressions. It extracts predicate-argument structures from text intervals and links them into semantically organized proposition trees. By instantiating trees constructed for topic descriptions in trees representing input sentences or parts thereof, we are able to assess the degree of “topicality” for each sentence. The presented strategy was used in the BBN distillation system for the GALE Year 1 evaluation and achieved outstanding results compared to other systems and human participants. Index Terms: machine learning, question answering, topicality.
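The instantiation step can be caricatured with flat (predicate, argument) pairs instead of full proposition trees (a deliberate simplification for illustration, not the system's actual representation): the topicality of a sentence is the fraction of the topic's structures it instantiates.

```python
# Illustrative simplification of proposition-tree instantiation: topic and
# sentence are reduced to sets of (predicate, argument) pairs, and topicality
# is the fraction of the topic's pairs found in the sentence.
# Real proposition trees are hierarchical; this flat version is a sketch only.

def topicality(topic_props, sentence_props):
    """Fraction of the topic's (predicate, argument) pairs instantiated
    in the sentence's extracted propositions."""
    if not topic_props:
        return 0.0
    matched = sum(1 for p in topic_props if p in sentence_props)
    return matched / len(topic_props)

topic = {("attack", "pipeline"), ("attack", "rebels")}
sent_a = {("attack", "pipeline"), ("attack", "rebels"), ("occur", "tuesday")}
sent_b = {("meet", "ministers")}
print(topicality(topic, sent_a))  # fully on-topic
print(topicality(topic, sent_b))  # off-topic
```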
ICEWS Coded Event Data
Event data consists of coded interactions between socio-political actors (i.e., cooperative or hostile actions between individuals, groups, sectors, and nation states). Events are automatically identified and extracted from news articles by the BBN ACCENT event coder. These events are essentially triples consisting of a source actor, an event type (according to the CAMEO taxonomy of events), and a target actor. Geographical-temporal metadata are also extracted and associated with the relevant events within a news article. We plan to update this data on a periodic basis. Additional event data may be made available, For Official Use Only (FOUO), to government-sponsored research activities.
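The triple-plus-metadata shape of one coded event can be sketched as follows (field names are assumptions based on the description above, not the dataset's actual column names):

```python
# Illustrative shape of one CAMEO-coded event: source actor, CAMEO event
# type, target actor, plus geographic-temporal metadata.
# Field names and the example values are assumptions for illustration.
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class CodedEvent:
    source_actor: str
    cameo_code: str    # event type from the CAMEO taxonomy
    target_actor: str
    event_date: date   # temporal metadata from the article
    location: str      # geographic metadata from the article

event = CodedEvent(
    source_actor="Government (France)",
    cameo_code="042",
    target_actor="Government (Germany)",
    event_date=date(2014, 3, 1),
    location="Berlin, Germany",
)
print(event.cameo_code, event.source_actor, "->", event.target_actor)
```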
ICEWS Automated Daily Event Data
Event data consists of coded interactions between socio-political actors (i.e., cooperative or hostile actions between individuals, groups, sectors, and nation states). Events are automatically identified and extracted from news articles by the BBN ACCENT event coder. These events are essentially triples consisting of a source actor, an event type (according to the CAMEO taxonomy of events), and a target actor. Geographical-temporal metadata are also extracted and associated with the relevant events within a news article. We plan to update this data on a periodic basis. Additional event data may be made available, For Official Use Only (FOUO), to government-sponsored research activities.