249 research outputs found
DivGraphPointer: A Graph Pointer Network for Extracting Diverse Keyphrases
Keyphrase extraction from documents is useful to a variety of applications
such as information retrieval and document summarization. This paper presents
an end-to-end method called DivGraphPointer for extracting a set of diversified
keyphrases from a document. DivGraphPointer combines the advantages of
traditional graph-based ranking methods and recent neural network-based
approaches. Specifically, given a document, a word graph is constructed from
the document based on word proximity and is encoded with graph convolutional
networks, which effectively capture document-level word salience by modeling
long-range dependency between words in the document and aggregating multiple
appearances of identical words into one node. Furthermore, we propose a
diversified point network to generate a set of diverse keyphrases out of the
word graph in the decoding process. Experimental results on five benchmark data
sets show that our proposed method significantly outperforms the existing
state-of-the-art approaches.Comment: Accepted to SIGIR 201
Persian Keyphrase Generation Using Sequence-to-Sequence Models
Keyphrases are a very short summary of an input text and provide the main
subjects discussed in the text. Keyphrase extraction is a useful upstream task
and can be used in various natural language processing problems, for example,
text summarization and information retrieval, to name a few. However, not all
the keyphrases are explicitly mentioned in the body of the text. In real-world
examples there are always some topics that are discussed implicitly. Extracting
such keyphrases requires a generative approach, which is adopted here. In this
paper, we try to tackle the problem of keyphrase generation and extraction from
news articles using deep sequence-to-sequence models. These models
significantly outperform the conventional methods such as Topic Rank, KPMiner,
and KEA in the task of keyphrase extraction
Keyphrase Generation: A Multi-Aspect Survey
Extractive keyphrase generation research has been around since the nineties,
but the more advanced abstractive approach based on the encoder-decoder
framework and sequence-to-sequence learning has been explored only recently. In
fact, more than a dozen of abstractive methods have been proposed in the last
three years, producing meaningful keyphrases and achieving state-of-the-art
scores. In this survey, we examine various aspects of the extractive keyphrase
generation methods and focus mostly on the more recent abstractive methods that
are based on neural networks. We pay particular attention to the mechanisms
that have driven the perfection of the later. A huge collection of scientific
article metadata and the corresponding keyphrases is created and released for
the research community. We also present various keyphrase generation and text
summarization research patterns and trends of the last two decades.Comment: 10 pages, 5 tables. Published in proceedings of FRUCT 2019, the 25th
Conference of the Open Innovations Association FRUCT, Helsinki, Finlan
Advances in Automatic Keyphrase Extraction
The main purpose of this thesis is to analyze and propose new improvements in the field of Automatic Keyphrase Extraction, i.e., the field of automatically detecting the key concepts in a document. We will discuss, in particular, supervised machine learning algorithms for keyphrase extraction, by first identifying their shortcomings and then proposing new techniques which exploit contextual information to overcome them.
Keyphrase extraction requires that the key concepts, or \emph{keyphrases}, appear verbatim in the body of the document. We will identify the fact that current algorithms do not use contextual information when detecting keyphrases as one of the main shortcomings of supervised keyphrase extraction. Instead, statistical and positional cues, like the frequency of the candidate keyphrase or its first appearance in the document, are mainly used to determine if a phrase appearing in a document is a keyphrase or not. For this reason, we will prove that a supervised keyphrase extraction algorithm, by using only statistical and positional features, is actually able to extract good keyphrases from documents written in languages that it has never seen.
The algorithm will be trained over a common dataset for the English language, a purpose-collected dataset for the Arabic language, and evaluated on the Italian, Romanian and Portuguese languages as well.
This result is then used as a starting point to develop new algorithms that use contextual information to increase the performance in automatic keyphrase extraction. The first algorithm that we present uses new linguistics features based on anaphora resolution, which is a field of natural language processing that exploits the relations between elements of the discourse as, e.g., pronouns. We evaluate several supervised AKE pipelines based on these features on the well-known SEMEVAL 2010 dataset, and we show that the performance increases when we add such features to a model that employs statistical and positional knowledge only.
Finally, we investigate the possibilities offered by the field of Deep Learning, by proposing six different deep neural networks that perform automatic keyphrase extraction. Such networks are based on bidirectional long-short term memory networks, or on convolutional neural networks, or on a combination of both of them, and on a neural language model which creates a vector representation of each word of the document. These networks are able to learn new features using the the whole document when extracting keyphrases, and they have the advantage of not needing a corpus after being trained to extract keyphrases from new documents. We show that with deep learning based architectures we are able to outperform several other keyphrase extraction algorithms, both supervised and not supervised, used in literature and that the best performances are obtained when we build an additional neural representation of the input document and we append it to the neural language model.
Both the anaphora-based and the deep-learning based approaches show that using contextual information, the performance in supervised algorithms for automatic keyphrase extraction improves. In fact, in the methods presented in this thesis, the algorithms which obtained the best performance are the ones receiving more contextual information, both about the relations of the potential keyphrase with other parts of the document, as in the anaphora based approach, and in the shape of a neural representation of the input document, as in the deep learning approach. In contrast, the approach of using statistical and positional knowledge only allows the building of language agnostic keyphrase extraction algorithms, at the cost of decreased precision and recall
- …