2 research outputs found
Entity Synonym Discovery via Multipiece Bilateral Context Matching
Being able to automatically discover synonymous entities in an open-world
setting benefits various tasks such as entity disambiguation or knowledge graph
canonicalization. Existing works either only utilize entity features, or rely
on structured annotations from a single piece of context where the entity is
mentioned. To leverage diverse contexts where entities are mentioned, in this
paper, we generalize the distributional hypothesis to a multi-context setting
and propose a synonym discovery framework that detects entity synonyms from
free-text corpora with considerations on effectiveness and robustness. As one
of the key components in synonym discovery, we introduce a neural network model
SYNONYMNET to determine whether or not two given entities are synonym with each
other. Instead of using entities features, SYNONYMNET makes use of multiple
pieces of contexts in which the entity is mentioned, and compares the
context-level similarity via a bilateral matching schema. Experimental results
demonstrate that the proposed model is able to detect synonym sets that are not
observed during training on both generic and domain-specific datasets:
Wiki+Freebase, PubMed+UMLS, and MedBook+MKG, with up to 4.16% improvement in
terms of Area Under the Curve and 3.19% in terms of Mean Average Precision
compared to the best baseline method.Comment: In IJCAI 2020 as a long paper. Code and data are available at
https://github.com/czhang99/SynonymNe
Automatic Context Pattern Generation for Entity Set Expansion
Entity Set Expansion (ESE) is a valuable task that aims to find entities of
the target semantic class described by given seed entities. Various NLP and IR
downstream applications have benefited from ESE due to its ability to discover
knowledge. Although existing bootstrapping methods have achieved great
progress, most of them still rely on manually pre-defined context patterns. A
non-negligible shortcoming of the pre-defined context patterns is that they
cannot be flexibly generalized to all kinds of semantic classes, and we call
this phenomenon as "semantic sensitivity". To address this problem, we devise a
context pattern generation module that utilizes autoregressive language models
(e.g., GPT-2) to automatically generate high-quality context patterns for
entities. In addition, we propose the GAPA, a novel ESE framework that
leverages the aforementioned GenerAted PAtterns to expand target entities.
Extensive experiments and detailed analyses on three widely used datasets
demonstrate the effectiveness of our method. All the codes of our experiments
will be available for reproducibility.Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessibl