TaxoExpan: Self-supervised Taxonomy Expansion with Position-Enhanced Graph Neural Network
Taxonomies consist of machine-interpretable semantics and provide valuable
knowledge for many web applications. For example, online retailers (e.g.,
Amazon and eBay) use taxonomies for product recommendation, and web search
engines (e.g., Google and Bing) leverage taxonomies to enhance query
understanding. Enormous efforts have been made on constructing taxonomies
either manually or semi-automatically. However, with the fast-growing volume of
web content, existing taxonomies will become outdated and fail to capture
emerging knowledge. Therefore, in many applications, dynamic expansions of an
existing taxonomy are in great demand. In this paper, we study how to expand an
existing taxonomy by adding a set of new concepts. We propose a novel
self-supervised framework, named TaxoExpan, which automatically generates a set
of <query concept, anchor concept> pairs from the existing taxonomy as training
data. Using such self-supervision data, TaxoExpan learns a model to predict
whether a query concept is the direct hyponym of an anchor concept. We develop
two innovative techniques in TaxoExpan: (1) a position-enhanced graph neural
network that encodes the local structure of an anchor concept in the existing
taxonomy, and (2) a noise-robust training objective that enables the learned
model to be insensitive to the label noise in the self-supervision data.
Extensive experiments on three large-scale datasets from different domains
demonstrate both the effectiveness and the efficiency of TaxoExpan for taxonomy
expansion.
Comment: WWW 202
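
To make the self-supervision idea in the TaxoExpan abstract concrete, here is a minimal sketch of how <query concept, anchor concept> training pairs could be derived from an existing taxonomy. It is an illustration under stated assumptions, not the authors' implementation: the function name `make_self_supervision_pairs`, the `parent_of` dictionary representation, the tree-shaped toy taxonomy, and the uniform negative sampling are all hypothetical choices made for this example.

```python
import random


def make_self_supervision_pairs(parent_of, num_negatives=2, seed=0):
    """Build (query, anchor, label) triples from an existing taxonomy.

    parent_of maps each concept to its direct parent (None for the root).
    Assumption: the taxonomy is a tree, so every concept has one parent.
    """
    rng = random.Random(seed)
    concepts = sorted(parent_of)
    pairs = []
    for query in concepts:
        parent = parent_of[query]
        if parent is None:
            continue  # the root has no anchor to predict
        # Positive pair: the query is a direct hyponym of its true parent.
        pairs.append((query, parent, 1))
        # Negative pairs: anchors that are not the query's true parent.
        # Uniform sampling is an assumption made for this sketch.
        candidates = [c for c in concepts if c != parent and c != query]
        k = min(num_negatives, len(candidates))
        for anchor in rng.sample(candidates, k):
            pairs.append((query, anchor, 0))
    return pairs


# Hypothetical toy taxonomy used only for illustration.
toy_taxonomy = {
    "root": None,
    "science": "root",
    "physics": "science",
    "chemistry": "science",
}
pairs = make_self_supervision_pairs(toy_taxonomy)
```

Because the negatives here are sampled automatically rather than labeled by hand, some of them may be wrong. That is the kind of label noise the noise-robust training objective in the abstract is meant to tolerate.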
Comparing Constraints for Taxonomic Organization
Building a taxonomy from the ground up involves several sub-tasks: selecting terms to include, predicting semantic relations between terms, and selecting a subset of relational instances to keep, given constraints on the taxonomy graph. Methods for this final step, taxonomic organization, vary both in terms of the constraints they impose and in whether they enable discovery of synonymous terms. It is hard to isolate the impact of these factors on the quality of the resulting taxonomy because organization methods are rarely compared directly. In this paper, we present a head-to-head comparison of six taxonomic organization algorithms that vary with respect to their structural and transitivity constraints, and treatment of synonymy. We find that while transitive algorithms out-perform their non-transitive counterparts, the top-performing transitive algorithm is prohibitively slow for taxonomies with as few as 50 entities. We propose a simple modification to a non-transitive optimum branching algorithm to explicitly incorporate synonymy, resulting in a method that is substantially faster than the best transitive algorithm while giving complementary performance.