203 research outputs found

    Opinion Expression Mining by Exploiting Keyphrase Extraction

    Get PDF

    Title-Guided Encoding for Keyphrase Generation

    Full text link
    Keyphrase generation (KG) aims to generate a set of keyphrases given a document, which is a fundamental task in natural language processing (NLP). Most previous methods solve this problem in an extractive manner, while recently, several attempts are made under the generative setting using deep neural networks. However, the state-of-the-art generative methods simply treat the document title and the document main body equally, ignoring the leading role of the title to the overall document. To solve this problem, we introduce a new model called Title-Guided Network (TG-Net) for automatic keyphrase generation task based on the encoder-decoder architecture with two new features: (i) the title is additionally employed as a query-like input, and (ii) a title-guided encoder gathers the relevant information from the title to each word in the document. Experiments on a range of KG datasets demonstrate that our model outperforms the state-of-the-art models with a large margin, especially for documents with either very low or very high title length ratios.Comment: AAAI 1

    Semi-Supervised Learning for Neural Keyphrase Generation

    Full text link
    We study the problem of generating keyphrases that summarize the key points for a given document. While sequence-to-sequence (seq2seq) models have achieved remarkable performance on this task (Meng et al., 2017), model training often relies on large amounts of labeled data, which is only applicable to resource-rich domains. In this paper, we propose semi-supervised keyphrase generation methods by leveraging both labeled data and large-scale unlabeled samples for learning. Two strategies are proposed. First, unlabeled documents are first tagged with synthetic keyphrases obtained from unsupervised keyphrase extraction methods or a selflearning algorithm, and then combined with labeled samples for training. Furthermore, we investigate a multi-task learning framework to jointly learn to generate keyphrases as well as the titles of the articles. Experimental results show that our semi-supervised learning-based methods outperform a state-of-the-art model trained with labeled data only.Comment: To appear in EMNLP 2018 (12 pages, 7 figures, 6 tables

    Persian Keyphrase Generation Using Sequence-to-Sequence Models

    Full text link
    Keyphrases are a very short summary of an input text and provide the main subjects discussed in the text. Keyphrase extraction is a useful upstream task and can be used in various natural language processing problems, for example, text summarization and information retrieval, to name a few. However, not all the keyphrases are explicitly mentioned in the body of the text. In real-world examples there are always some topics that are discussed implicitly. Extracting such keyphrases requires a generative approach, which is adopted here. In this paper, we try to tackle the problem of keyphrase generation and extraction from news articles using deep sequence-to-sequence models. These models significantly outperform the conventional methods such as Topic Rank, KPMiner, and KEA in the task of keyphrase extraction

    Evaluating anaphora and coreference resolution to improve automatic keyphrase extraction

    Get PDF
    In this paper we analyze the effectiveness of using linguistic knowledge from coreference and anaphora resolution for improving the performance for supervised keyphrase extraction. In order to verify the impact of these features, we de\ufb01ne a baseline keyphrase extraction system and evaluate its performance on a standard dataset using different machine learning algorithms. Then, we consider new sets of features by adding combinations of the linguistic features we propose and we evaluate the new performance of the system. We also use anaphora and coreference resolution to transform the documents, trying to simulate the cohesion process performed by the human mind. We found that our approach has a slightly positive impact on the performance of automatic keyphrase extraction, in particular when considering the ranking of the results

    Efficient Keyphrase Generation with GANs

    Get PDF
    Keyphrase Generation is the task of predicting keyphrases: short text sequences that convey the main semantic meaning of a document. In this paper, we introduce a keyphrase generation approach that makes use of a Generative Adversarial Networks (GANs) architecture. In our system, the Generator produces a sequence of keyphrases for an input document. The Discriminator, in turn, tries to distinguish between machine generated and human curated keyphrases. We propose a novel Discriminator architecture based on a BERT pretrained model fine-tuned for Sequence Classification. We train our proposed architecture using only a small subset of the standard available training dataset, amounting to less than 1% of the total, achieving a great level of data efficiency. The resulting model is evaluated on five public datasets, obtaining competitive and promising results with respect to four state-of-the-art generative models
    corecore