Mining Entity Synonyms with Efficient Neural Set Generation
Mining entity synonym sets (i.e., sets of terms referring to the same entity)
is an important task for many entity-leveraging applications. Previous work
either ranks terms based on their similarity to a given query term, or treats
the problem as a two-phase task (i.e., detecting synonymy pairs, followed by
organizing these pairs into synonym sets). However, these approaches fail to
model the holistic semantics of a set and suffer from the error propagation
issue. Here we propose a new framework, named SynSetMine, that efficiently
generates entity synonym sets from a given vocabulary, using example sets from
external knowledge bases as distant supervision. SynSetMine consists of two
novel modules: (1) a set-instance classifier that jointly learns how to
represent a permutation invariant synonym set and whether to include a new
instance (i.e., a term) into the set, and (2) a set generation algorithm that
enumerates the vocabulary only once and applies the learned set-instance
classifier to detect all entity synonym sets in it. Experiments on three real
datasets from different domains demonstrate both the effectiveness and the
efficiency of SynSetMine for mining entity synonym sets.
Comment: AAAI 2019 camera-ready version
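
The single-pass set generation idea described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.5 threshold, and the toy string-normalizing scorer are all assumptions made for illustration. In SynSetMine the scorer would be the learned set-instance classifier.

```python
import re
from typing import Callable, List, Sequence


def generate_synonym_sets(
    vocabulary: Sequence[str],
    score_fn: Callable[[List[str], str], float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[List[str]]:
    """Greedy one-pass clustering: each term is visited exactly once."""
    clusters: List[List[str]] = []
    for term in vocabulary:
        best_idx, best_score = -1, threshold
        # Ask the scorer how well the term fits each set built so far.
        for idx, cluster in enumerate(clusters):
            score = score_fn(cluster, term)
            if score > best_score:
                best_idx, best_score = idx, score
        if best_idx >= 0:
            clusters[best_idx].append(term)  # join the best-matching set
        else:
            clusters.append([term])  # no set scored above the threshold
    return clusters


def toy_score(cluster: List[str], term: str) -> float:
    """Stand-in for a learned classifier (assumption): the fraction of set
    members whose normalized spelling equals that of the term. It is
    permutation invariant because it averages over the members."""
    def norm(s: str) -> str:
        return re.sub(r"[^a-z0-9]", "", s.lower())

    return sum(norm(m) == norm(term) for m in cluster) / len(cluster)


sets = generate_synonym_sets(["USA", "NYC", "U.S.A.", "N.Y.C.", "Paris"], toy_score)
print(sets)  # [['USA', 'U.S.A.'], ['NYC', 'N.Y.C.'], ['Paris']]
```

Because every term is compared against whole sets rather than against individual terms, no separate pair-detection phase is needed, which is the property the abstract contrasts with two-phase approaches.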
Automatic Synonym Discovery with Knowledge Bases
Recognizing entity synonyms from text has become a crucial task in many
entity-leveraging applications. However, discovering entity synonyms from
domain-specific text corpora (e.g., news articles, scientific papers) is rather
challenging. Current systems take an entity name string as input to find
other names that are synonymous, ignoring the fact that oftentimes a name
string can refer to multiple entities (e.g., "apple" could refer to both Apple
Inc and the fruit apple). Moreover, most existing methods require training data
manually created by domain experts to construct supervised-learning systems. In
this paper, we study the problem of automatic synonym discovery with knowledge
bases, that is, identifying synonyms for knowledge base entities in a given
domain-specific corpus. The manually-curated synonyms for each entity stored in
a knowledge base not only form a set of name strings to disambiguate the
meaning for each other, but also can serve as "distant" supervision to help
determine important features for the task. We propose a novel framework, called
DPE, to integrate two kinds of mutually-complementing signals for synonym
discovery, i.e., distributional features based on corpus-level statistics and
textual patterns based on local contexts. In particular, DPE jointly optimizes
the two kinds of signals in conjunction with distant supervision, so that they
can mutually enhance each other in the training stage. At the inference stage,
both signals will be utilized to discover synonyms for the given entities.
Experimental results prove the effectiveness of the proposed framework.
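
As a rough illustration of using both signals at inference time, the sketch below scores a candidate synonym by mixing a distributional score with a pattern score. The abstract does not say how DPE combines the two signals, so the weighted sum, the weight alpha, the use of cosine similarity, and all function names here are assumptions rather than the paper's method.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Distributional signal (assumed form): cosine of two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector carries no distributional evidence
    return dot / (norm_u * norm_v)


def combined_score(
    distributional: float,
    pattern: float,
    alpha: float = 0.5,  # hypothetical mixing weight
) -> float:
    """Weighted sum of the two signals (an assumed combination rule)."""
    return alpha * distributional + (1.0 - alpha) * pattern


# Toy usage: made-up embeddings and a made-up pattern score in [0, 1].
entity_vec = [1.0, 0.0]
candidate_vec = [1.0, 0.0]
dist = cosine_similarity(entity_vec, candidate_vec)
print(combined_score(dist, pattern=0.0))
```

A candidate would then be accepted as a synonym when its combined score exceeds some cut-off chosen on held-out data.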