4,731 research outputs found

    Mining Entity Synonyms with Efficient Neural Set Generation

    Full text link
    Mining entity synonym sets (i.e., sets of terms referring to the same entity) is an important task for many entity-leveraging applications. Previous work either rank terms based on their similarity to a given query term, or treats the problem as a two-phase task (i.e., detecting synonymy pairs, followed by organizing these pairs into synonym sets). However, these approaches fail to model the holistic semantics of a set and suffer from the error propagation issue. Here we propose a new framework, named SynSetMine, that efficiently generates entity synonym sets from a given vocabulary, using example sets from external knowledge bases as distant supervision. SynSetMine consists of two novel modules: (1) a set-instance classifier that jointly learns how to represent a permutation invariant synonym set and whether to include a new instance (i.e., a term) into the set, and (2) a set generation algorithm that enumerates the vocabulary only once and applies the learned set-instance classifier to detect all entity synonym sets in it. Experiments on three real datasets from different domains demonstrate both effectiveness and efficiency of SynSetMine for mining entity synonym sets.Comment: AAAI 2019 camera-ready versio

    Automatic correction of part-of-speech corpora

    Get PDF
    In this study a simple method for automatic correction of part-ofspeech corpora is presented, which works as follows: Initially two or more already available part-of-speech taggers are applied on the data. Then a sample of differing outputs is taken to train a classifier to predict for each difference which of the taggers (if any) delivered the correct output. As classifiers we employed instance-based learning, a C4.5 decision tree and a Bayesian classifier. Their performances ranged from 59.1 % to 67.3 %. Training on the automatically corrected data finally lead to significant improvements in tagger performance
    • …
    corecore