LEAN-LIFE: A Label-Efficient Annotation Framework Towards Learning from Explanation
Successfully training a deep neural network demands a huge corpus of labeled
data. However, each label provides only limited information to learn from, and
collecting the requisite number of labels involves massive human effort. In
this work, we introduce LEAN-LIFE, a web-based, Label-Efficient AnnotatioN
framework for sequence labeling and classification tasks, with an easy-to-use
UI that not only allows an annotator to provide the needed labels for a task,
but also enables LearnIng From Explanations for each labeling decision. Such
explanations enable us to generate useful additional labeled data from
unlabeled instances, bolstering the pool of available training data. On three
popular NLP tasks (named entity recognition, relation extraction, sentiment
analysis), we find that using this enhanced supervision allows our models to
surpass competitive baseline F1 scores by 5-10 percentage points while using
2x fewer labeled instances. Our framework is the first to
utilize this enhanced supervision technique and does so for three important
tasks -- thus providing improved annotation recommendations to users and an
ability to build datasets of (data, label, explanation) triples instead of the
regular (data, label) pairs.
Comment: Accepted to ACL 2020 (demo). The first two authors contributed
equally. Project page: http://inklab.usc.edu/leanlife
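The core idea of learning from explanations can be sketched in a few lines: an annotator's explanation is compiled into a matching rule that labels additional unlabeled instances. The rule format and function names below are hypothetical illustrations, not LEAN-LIFE's actual implementation:

```python
# Illustrative sketch: an annotator's explanation (e.g. 'because the phrase
# "not good" appears') is compiled into a cue-phrase rule, which is then
# applied to unlabeled sentences to bolster the pool of training data.
# Names and rule format are assumptions, not LEAN-LIFE's API.

def rule_from_explanation(cue_phrase: str, label: str):
    """Turn an explanation keyed on a cue phrase into a labeling rule."""
    def rule(sentence: str):
        return label if cue_phrase in sentence.lower() else None
    return rule

def expand_training_data(rules, unlabeled):
    """Apply every rule to every unlabeled sentence; keep matches."""
    extra = []
    for sentence in unlabeled:
        for rule in rules:
            label = rule(sentence)
            if label is not None:
                extra.append((sentence, label))
                break  # first matching rule wins
    return extra

rules = [rule_from_explanation("not good", "NEGATIVE"),
         rule_from_explanation("loved it", "POSITIVE")]
unlabeled = ["The food was not good at all.",
             "We loved it and will return.",
             "Service was average."]
print(expand_training_data(rules, unlabeled))
```

The third sentence matches no rule and is left unlabeled, illustrating why such rule-generated labels augment rather than replace direct annotation.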
AutoTriggER: Named Entity Recognition with Auxiliary Trigger Extraction
Deep neural models for low-resource named entity recognition (NER) have shown
impressive results by leveraging distant supervision or other meta-level
information (e.g. explanation). However, the costs of acquiring such additional
information are generally prohibitive, especially in domains where existing
resources (e.g. databases to be used for distant supervision) may not exist. In
this paper, we present a novel two-stage framework (AutoTriggER) to improve NER
performance by automatically generating and leveraging "entity triggers" which
are essentially human-readable clues in the text that can help guide the model
to make better decisions. Thus, the framework is able to both create and
leverage auxiliary supervision by itself. Through experiments on three
well-studied NER datasets, we show that our automatically extracted triggers
are well-matched to human triggers, and AutoTriggER improves performance over a
RoBERTa-CRF architecture by nearly 0.5 F1 points on average, and by much more in a
low-resource setting.
Comment: 10 pages, 12 figures. Best paper at TrustNLP@NAACL 2021 and presented
at WeaSuL@ICLR 2021.
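One simple way to automatically score candidate triggers, shown here purely as an illustration (a simplification, not AutoTriggER's actual extraction method), is occlusion: each context token is scored by how much removing it lowers a model's confidence that the span is an entity. The toy confidence function below stands in for a real NER model:

```python
# Illustrative occlusion-style trigger scoring: context tokens whose removal
# most hurts the (toy) model's entity confidence are taken as trigger words.
# This is a hypothetical simplification, not AutoTriggER's actual algorithm.

def extract_triggers(tokens, entity_idxs, confidence, top_k=2):
    base = confidence(tokens, entity_idxs)
    scores = {}
    for i in range(len(tokens)):
        if i in entity_idxs:
            continue  # never occlude the entity itself
        occluded = tokens[:i] + tokens[i + 1:]
        # entity indices shift left if the removed token precedes them
        shifted = {j - 1 if j > i else j for j in entity_idxs}
        scores[i] = base - confidence(occluded, shifted)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Toy stand-in model: confidence rises with cue words present in the sentence.
def toy_confidence(tokens, entity_idxs):
    cues = {"had", "dinner", "at"}
    return sum(0.3 for t in tokens if t in cues) + 0.1

tokens = ["We", "had", "dinner", "at", "Rumble", "Fish", "yesterday"]
triggers = extract_triggers(tokens, {4, 5}, toy_confidence, top_k=3)
print([tokens[i] for i in triggers])
```

Here the cue words "had dinner at" score highest, matching the intuition that triggers are human-readable clues pointing at the entity.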
ICEWS Dictionaries
The ICEWS dictionaries contain both named individuals or groups, known as actors, and generic individuals or groups, known as agents. Actors are known by a specific name, such as 'Free Syrian Army' or 'Goodluck Jonathan', while agents are known by a generic improper noun, such as 'insurgents' or 'students'. Both actors and agents have time-dependent affiliations with another actor (in the case of an individual being a member of an organization, for example), with a country or other autonomous region, or with a general sector/role, such as 'Military' or 'Government'. The dictionaries also include aliases that an actor or agent might be known by. In the case of actors, these are typically alternate spellings of a person's name, while for agents they are typically synonyms. Additional information about the ICEWS program can be found at http://www.icews.com/. Follow our Twitter handle for data updates and other news: @icews.
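The shape of a dictionary entry, with aliases and time-dependent affiliations resolved for a given date, can be sketched as follows (field names here are assumptions for illustration, not the dictionaries' actual schema):

```python
# Illustrative sketch of an ICEWS-style actor record: aliases plus
# time-dependent affiliations resolved for the date of interest.
# Field names are assumptions, not the actual dictionary schema.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Affiliation:
    target: str   # a country, sector/role, or parent organization
    start: date
    end: date

@dataclass
class Actor:
    name: str
    aliases: list = field(default_factory=list)
    affiliations: list = field(default_factory=list)

    def affiliations_on(self, when: date):
        """Return the affiliations in force on a given date."""
        return [a.target for a in self.affiliations if a.start <= when <= a.end]

actor = Actor(
    name="Goodluck Jonathan",
    aliases=["Goodluck Ebele Jonathan"],
    affiliations=[Affiliation("Government (Nigeria)",
                              date(2010, 5, 6), date(2015, 5, 29))],
)
print(actor.affiliations_on(date(2012, 1, 1)))
```

Querying a date outside the affiliation window returns an empty list, which is the point of making affiliations time-dependent rather than static.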
Language Model Priming for Cross-Lingual Event Extraction
We present a novel, language-agnostic approach to "priming" language models for the task of event extraction, providing particularly effective performance in low-resource and zero-shot cross-lingual settings. With priming, we augment the input to the transformer stack's language model differently depending on the question(s) being asked of the model at runtime. For instance, if the model is being asked to identify arguments for the trigger "protested", we will provide that trigger as part of the input to the language model, allowing it to produce different representations for candidate arguments than when it is asked about arguments for the trigger "arrest" elsewhere in the same sentence. We show that by enabling the language model to better compensate for the deficits of sparse and noisy training data, our approach improves both trigger and argument detection and classification significantly over the state of the art in a zero-shot cross-lingual setting.
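The input-augmentation step can be sketched minimally: the query trigger is prepended to the sentence so the encoder produces trigger-specific representations of the same text. The separator and input format below are assumptions for illustration, not the system's actual encoding:

```python
# Minimal sketch of trigger-conditioned "priming": the trigger being queried
# is prepended to the sentence, so the same sentence yields a different model
# input (and thus different representations) per trigger.
# The "[SEP]" separator and string format are illustrative assumptions.

def prime(trigger: str, sentence: str, sep: str = " [SEP] ") -> str:
    """Build the primed input string for one (trigger, sentence) query."""
    return trigger + sep + sentence

sentence = "Protesters were arrested after they protested downtown."
primed_a = prime("protested", sentence)
primed_b = prime("arrested", sentence)
# Two different inputs for the same sentence, one per trigger, so candidate
# arguments can be represented differently for each event being asked about.
print(primed_a)
print(primed_b)
```

Because nothing in the primed string is language-specific, the same construction applies unchanged across languages, which is what makes the approach usable zero-shot cross-lingually.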
Selecting on-topic sentences from natural language corpora
We describe a system that examines input sentences with respect to arbitrary topics formulated as natural language expressions. It extracts predicate-argument structures from text intervals and links them into semantically organized proposition trees. By instantiating trees constructed for topic descriptions in trees representing input sentences or parts thereof, we are able to assess the degree of “topicality” for each sentence. The presented strategy was used in the BBN distillation system for the GALE Year 1 evaluation and achieved outstanding results compared to other systems and human participants. Index Terms: machine learning, question answering, topicality.
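The instantiation step can be caricatured with flat (predicate, argument) pairs instead of full proposition trees (a deliberate simplification for illustration, not the system's actual representation): the topicality of a sentence is the fraction of the topic's structures it instantiates.

```python
# Illustrative simplification of proposition-tree instantiation: topic and
# sentence are reduced to sets of (predicate, argument) pairs, and topicality
# is the fraction of the topic's pairs found in the sentence.
# Real proposition trees are hierarchical; this flat version is a sketch only.

def topicality(topic_props, sentence_props):
    """Fraction of the topic's (predicate, argument) pairs instantiated
    in the sentence's extracted propositions."""
    if not topic_props:
        return 0.0
    matched = sum(1 for p in topic_props if p in sentence_props)
    return matched / len(topic_props)

topic = {("attack", "pipeline"), ("attack", "rebels")}
sent_a = {("attack", "pipeline"), ("attack", "rebels"), ("occur", "tuesday")}
sent_b = {("meet", "ministers")}
print(topicality(topic, sent_a))  # fully on-topic
print(topicality(topic, sent_b))  # off-topic
```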
ICEWS Coded Event Data
Event data consists of coded interactions between socio-political actors (i.e., cooperative or hostile actions between individuals, groups, sectors, and nation states). Events are automatically identified and extracted from news articles by the BBN ACCENT event coder. These events are essentially triples consisting of a source actor, an event type (according to the CAMEO taxonomy of events), and a target actor. Geographical-temporal metadata are also extracted and associated with the relevant events within a news article. We plan to update this data on a periodic basis. Additional event data may be made available, For Official Use Only (FOUO), to government-sponsored research activities.
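The triple-plus-metadata shape of one coded event can be sketched as follows (field names are assumptions based on the description above, not the dataset's actual column names):

```python
# Illustrative shape of one CAMEO-coded event: source actor, CAMEO event
# type, target actor, plus geographic-temporal metadata.
# Field names and the example values are assumptions for illustration.
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class CodedEvent:
    source_actor: str
    cameo_code: str    # event type from the CAMEO taxonomy
    target_actor: str
    event_date: date   # temporal metadata from the article
    location: str      # geographic metadata from the article

event = CodedEvent(
    source_actor="Government (France)",
    cameo_code="042",
    target_actor="Government (Germany)",
    event_date=date(2014, 3, 1),
    location="Berlin, Germany",
)
print(event.cameo_code, event.source_actor, "->", event.target_actor)
```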
ICEWS Automated Daily Event Data
Event data consists of coded interactions between socio-political actors (i.e., cooperative or hostile actions between individuals, groups, sectors, and nation states). Events are automatically identified and extracted from news articles by the BBN ACCENT event coder. These events are essentially triples consisting of a source actor, an event type (according to the CAMEO taxonomy of events), and a target actor. Geographical-temporal metadata are also extracted and associated with the relevant events within a news article. We plan to update this data on a periodic basis. Additional event data may be made available, For Official Use Only (FOUO), to government-sponsored research activities.