617 research outputs found

    Title-Guided Encoding for Keyphrase Generation

    Full text link
    Keyphrase generation (KG) aims to generate a set of keyphrases given a document, which is a fundamental task in natural language processing (NLP). Most previous methods solve this problem in an extractive manner, while recently, several attempts are made under the generative setting using deep neural networks. However, the state-of-the-art generative methods simply treat the document title and the document main body equally, ignoring the leading role of the title to the overall document. To solve this problem, we introduce a new model called Title-Guided Network (TG-Net) for automatic keyphrase generation task based on the encoder-decoder architecture with two new features: (i) the title is additionally employed as a query-like input, and (ii) a title-guided encoder gathers the relevant information from the title to each word in the document. Experiments on a range of KG datasets demonstrate that our model outperforms the state-of-the-art models with a large margin, especially for documents with either very low or very high title length ratios.Comment: AAAI 1

    A User-Centered Concept Mining System for Query and Document Understanding at Tencent

    Full text link
    Concepts embody the knowledge of the world and facilitate the cognitive processes of human beings. Mining concepts from web documents and constructing the corresponding taxonomy are core research problems in text understanding and support many downstream tasks such as query analysis, knowledge base construction, recommendation, and search. However, we argue that most prior studies extract formal and overly general concepts from Wikipedia or static web pages, which are not representing the user perspective. In this paper, we describe our experience of implementing and deploying ConcepT in Tencent QQ Browser. It discovers user-centered concepts at the right granularity conforming to user interests, by mining a large amount of user queries and interactive search click logs. The extracted concepts have the proper granularity, are consistent with user language styles and are dynamically updated. We further present our techniques to tag documents with user-centered concepts and to construct a topic-concept-instance taxonomy, which has helped to improve search as well as news feeds recommendation in Tencent QQ Browser. We performed extensive offline evaluation to demonstrate that our approach could extract concepts of higher quality compared to several other existing methods. Our system has been deployed in Tencent QQ Browser. Results from online A/B testing involving a large number of real users suggest that the Impression Efficiency of feeds users increased by 6.01% after incorporating the user-centered concepts into the recommendation framework of Tencent QQ Browser.Comment: Accepted by KDD 201

    Semi-Supervised Learning for Neural Keyphrase Generation

    Full text link
    We study the problem of generating keyphrases that summarize the key points for a given document. While sequence-to-sequence (seq2seq) models have achieved remarkable performance on this task (Meng et al., 2017), model training often relies on large amounts of labeled data, which is only applicable to resource-rich domains. In this paper, we propose semi-supervised keyphrase generation methods by leveraging both labeled data and large-scale unlabeled samples for learning. Two strategies are proposed. First, unlabeled documents are first tagged with synthetic keyphrases obtained from unsupervised keyphrase extraction methods or a selflearning algorithm, and then combined with labeled samples for training. Furthermore, we investigate a multi-task learning framework to jointly learn to generate keyphrases as well as the titles of the articles. Experimental results show that our semi-supervised learning-based methods outperform a state-of-the-art model trained with labeled data only.Comment: To appear in EMNLP 2018 (12 pages, 7 figures, 6 tables

    DivGraphPointer: A Graph Pointer Network for Extracting Diverse Keyphrases

    Full text link
    Keyphrase extraction from documents is useful to a variety of applications such as information retrieval and document summarization. This paper presents an end-to-end method called DivGraphPointer for extracting a set of diversified keyphrases from a document. DivGraphPointer combines the advantages of traditional graph-based ranking methods and recent neural network-based approaches. Specifically, given a document, a word graph is constructed from the document based on word proximity and is encoded with graph convolutional networks, which effectively capture document-level word salience by modeling long-range dependency between words in the document and aggregating multiple appearances of identical words into one node. Furthermore, we propose a diversified point network to generate a set of diverse keyphrases out of the word graph in the decoding process. Experimental results on five benchmark data sets show that our proposed method significantly outperforms the existing state-of-the-art approaches.Comment: Accepted to SIGIR 201

    Keyphrase Generation: A Multi-Aspect Survey

    Full text link
    Extractive keyphrase generation research has been around since the nineties, but the more advanced abstractive approach based on the encoder-decoder framework and sequence-to-sequence learning has been explored only recently. In fact, more than a dozen of abstractive methods have been proposed in the last three years, producing meaningful keyphrases and achieving state-of-the-art scores. In this survey, we examine various aspects of the extractive keyphrase generation methods and focus mostly on the more recent abstractive methods that are based on neural networks. We pay particular attention to the mechanisms that have driven the perfection of the later. A huge collection of scientific article metadata and the corresponding keyphrases is created and released for the research community. We also present various keyphrase generation and text summarization research patterns and trends of the last two decades.Comment: 10 pages, 5 tables. Published in proceedings of FRUCT 2019, the 25th Conference of the Open Innovations Association FRUCT, Helsinki, Finlan
    corecore