3 research outputs found
Empower Entity Set Expansion via Language Model Probing
Entity set expansion, aiming at expanding a small seed entity set with new
entities belonging to the same semantic class, is a critical task that benefits
many downstream NLP and IR applications, such as question answering, query
understanding, and taxonomy construction. Existing set expansion methods
bootstrap the seed entity set by adaptively selecting context features and
extracting new entities. A key challenge for entity set expansion is to avoid
selecting ambiguous context features which will shift the class semantics and
lead to accumulative errors in later iterations. In this study, we propose a
novel iterative set expansion framework that leverages automatically generated
class names to address the semantic drift issue. In each iteration, we select
one positive and several negative class names by probing a pre-trained language
model, and further score each candidate entity based on selected class names.
Experiments on two datasets show that our framework generates high-quality
class names and outperforms previous state-of-the-art methods significantly.
Comment: ACL 202
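The scoring step described above can be illustrated with a minimal sketch. The abstract does not give the scoring function, so everything here is an assumption: the similarity values are made up rather than obtained from a language model, and `score_entity` and `rank_candidates` are hypothetical names. The sketch only shows how one positive and several negative class names might be combined to push ambiguous candidates down the ranking.

```python
from typing import Dict, List

def score_entity(sim: Dict[str, float], positive: str, negatives: List[str]) -> float:
    """Similarity to the positive class name minus the strongest negative match.

    This particular combination is an assumption made for illustration; it is
    not taken from the paper.
    """
    strongest_negative = max((sim.get(n, 0.0) for n in negatives), default=0.0)
    return sim.get(positive, 0.0) - strongest_negative

def rank_candidates(sims: Dict[str, Dict[str, float]],
                    positive: str,
                    negatives: List[str]) -> List[str]:
    """Return candidate entities sorted from best to worst score."""
    return sorted(sims,
                  key=lambda e: score_entity(sims[e], positive, negatives),
                  reverse=True)

# Made-up entity/class-name similarities standing in for language model probing.
toy_sims = {
    "Canada":  {"countries": 0.9, "cities": 0.2, "companies": 0.1},
    "Toronto": {"countries": 0.4, "cities": 0.9, "companies": 0.1},
    "Mexico":  {"countries": 0.8, "cities": 0.3, "companies": 0.0},
}

ranking = rank_candidates(toy_sims, "countries", ["cities", "companies"])
print(ranking)
```

In this toy setting "Toronto" has a moderate similarity to the positive class name but is penalized by the negative class name "cities", which is the kind of drift the negative names are meant to guard against.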
Weakly Supervised Named Entity Tagging with Learnable Logical Rules
We study the problem of building entity tagging systems by using a few rules
as weak supervision. Previous methods mostly focus on disambiguating entity
types based on contexts and expert-provided rules, while assuming entity spans
are given. In this work, we propose a novel method TALLOR that bootstraps
high-quality logical rules to train a neural tagger in a fully automated
manner. Specifically, we introduce compound rules that are composed from simple
rules to increase the precision of boundary detection and generate more diverse
pseudo labels. We further design a dynamic label selection strategy to ensure
pseudo label quality and therefore avoid overfitting the neural tagger.
Experiments on three datasets demonstrate that our method outperforms other
weakly supervised methods and even rivals a state-of-the-art distantly
supervised tagger with a lexicon of over 2,000 terms when starting from only 20
simple rules. Our method can serve as a tool for rapidly building taggers in
emerging domains and tasks. Case studies show that learned rules can
potentially explain the predicted entities.
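The idea of composing simple rules into compound rules to sharpen boundary detection can be sketched as follows. This is a toy illustration under stated assumptions, not TALLOR's rule language: the rule types (`ends_with`, `preceded_by`), the `conjunction` combinator, and the example sentence are all hypothetical.

```python
from typing import Callable, List

# A rule takes (tokens, start, end) with end exclusive and says whether
# the span tokens[start:end] should be labeled as an entity.
Rule = Callable[[List[str], int, int], bool]

def ends_with(word: str) -> Rule:
    """Simple rule: the last token of the span equals `word`."""
    return lambda toks, s, e: toks[e - 1].lower() == word

def preceded_by(word: str) -> Rule:
    """Simple rule: the token just before the span equals `word`."""
    return lambda toks, s, e: s > 0 and toks[s - 1].lower() == word

def conjunction(*rules: Rule) -> Rule:
    """Compound rule: fires only when every component rule fires."""
    return lambda toks, s, e: all(r(toks, s, e) for r in rules)

def matching_spans(toks: List[str], rule: Rule, max_len: int = 3) -> List[str]:
    """Enumerate candidate spans up to max_len tokens and keep those the rule accepts."""
    found = []
    for s in range(len(toks)):
        for e in range(s + 1, min(len(toks), s + max_len) + 1):
            if rule(toks, s, e):
                found.append(" ".join(toks[s:e]))
    return found

tokens = "He was diagnosed with lung cancer yesterday".split()

loose = matching_spans(tokens, ends_with("cancer"))
tight = matching_spans(tokens, conjunction(ends_with("cancer"), preceded_by("with")))
print(loose)
print(tight)
```

The simple rule alone accepts several overlapping spans with different boundaries, while the conjunction narrows them to a single span, which is the precision effect the abstract attributes to compound rules.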
SynSetExpan: An Iterative Framework for Joint Entity Set Expansion and Synonym Discovery
Entity set expansion and synonym discovery are two critical NLP tasks.
Previous studies accomplish them separately, without exploring their
interdependencies. In this work, we hypothesize that these two tasks are
tightly coupled because two synonymous entities tend to have similar
likelihoods of belonging to various semantic classes. This motivates us to
design SynSetExpan, a novel framework that enables the two tasks to mutually
enhance each other. SynSetExpan uses a synonym discovery model to include
popular entities' infrequent synonyms into the set, which boosts the set
expansion recall. Meanwhile, the set expansion model, being able to determine
whether an entity belongs to a semantic class, can generate pseudo training
data to fine-tune the synonym discovery model towards better accuracy. To
facilitate research on the interplay between these two tasks, we
create the first large-scale Synonym-Enhanced Set Expansion (SE2) dataset via
crowdsourcing. Extensive experiments on the SE2 dataset and previous benchmarks
demonstrate the effectiveness of SynSetExpan for both entity set expansion and
synonym discovery tasks.
Comment: EMNLP 202
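The recall effect described above, where infrequent synonyms of popular entities are pulled into the expanded set, can be sketched in a few lines. This is a toy illustration and not SynSetExpan itself: the class scores, the synonym group, the threshold, and the function name `expand_with_synonyms` are all assumptions made for the example.

```python
from typing import Dict, Iterable, Set

def expand_with_synonyms(class_scores: Dict[str, float],
                         synonym_groups: Iterable[Set[str]],
                         threshold: float) -> Set[str]:
    """Accept entities scoring above the threshold, then add their synonyms.

    Assumes, as a simplification, that a synonym of an accepted entity
    belongs to the same class even if its own score is low or missing.
    """
    accepted = {e for e, score in class_scores.items() if score >= threshold}
    expanded = set(accepted)
    for group in synonym_groups:
        if group & accepted:          # any group member already accepted
            expanded |= group         # bring in the rest of the group
    return expanded

# Made-up set expansion scores for the class "countries".
toy_scores = {"United States": 0.95, "USA": 0.40, "Canada": 0.90, "Toronto": 0.10}
# Made-up output of a synonym discovery model.
toy_synonyms = [{"United States", "USA", "U.S."}]

expanded_set = expand_with_synonyms(toy_scores, toy_synonyms, threshold=0.5)
print(sorted(expanded_set))
```

Here "USA" and "U.S." enter the set only through their synonym link to "United States", while "Toronto" stays out because it has neither a high score nor an accepted synonym.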