DivGraphPointer: A Graph Pointer Network for Extracting Diverse Keyphrases
Keyphrase extraction from documents is useful to a variety of applications
such as information retrieval and document summarization. This paper presents
an end-to-end method called DivGraphPointer for extracting a set of diversified
keyphrases from a document. DivGraphPointer combines the advantages of
traditional graph-based ranking methods and recent neural network-based
approaches. Specifically, given a document, a word graph is constructed from
the document based on word proximity and is encoded with graph convolutional
networks, which effectively capture document-level word salience by modeling
long-range dependency between words in the document and aggregating multiple
appearances of identical words into one node. Furthermore, we propose a
diversified pointer network to generate a set of diverse keyphrases from the
word graph in the decoding process. Experimental results on five benchmark data
sets show that our proposed method significantly outperforms the existing
state-of-the-art approaches.
Comment: Accepted to SIGIR 201
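The word-graph step described above can be illustrated with a small sketch. It assumes lower-cased tokens, a fixed co-occurrence window, and raw co-occurrence counts as edge weights; the abstract does not specify the paper's exact weighting, and the function name `build_word_graph` is hypothetical. It shows only how repeated words collapse into a single node; the graph convolutional encoder and the diversified pointer decoder are not reproduced.

```python
from collections import defaultdict


def build_word_graph(tokens, window=2):
    """Build an undirected word-proximity graph (hypothetical helper).

    Each distinct lower-cased word becomes one node, so repeated
    occurrences of the same word are merged. Two nodes are linked when
    their occurrences appear within `window` positions of each other;
    the edge weight counts how many such co-occurrences were seen
    (an assumed weighting, not necessarily the paper's).
    """
    words = [t.lower() for t in tokens]
    weights = defaultdict(int)
    for i, w in enumerate(words):
        # Look ahead at most `window` positions from position i.
        for j in range(i + 1, min(i + window + 1, len(words))):
            v = words[j]
            if v != w:  # skip self-loops between identical words
                edge = tuple(sorted((w, v)))
                weights[edge] += 1
    nodes = sorted(set(words))
    return nodes, dict(weights)


tokens = "graph pointer network encodes the word graph".split()
nodes, edges = build_word_graph(tokens, window=1)
print(nodes)  # 7 tokens, but "graph" appears once as a node
print(edges)
```

Merging identical words means both occurrences of "graph" share one node, so its neighbourhood aggregates context from every position where it appears.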
SemEval 2017 Task 10: ScienceIE - Extracting Keyphrases and Relations from Scientific Publications
We describe the SemEval task of extracting keyphrases and relations between
them from scientific documents, which is crucial for understanding which
publications describe which processes, tasks and materials. Although this was a
new task, we had a total of 26 submissions across 3 evaluation scenarios. We
expect the task and the findings reported in this paper to be relevant for
researchers working on understanding scientific content, as well as the broader
knowledge base population and information extraction communities.
Keyphrase Based Evaluation of Automatic Text Summarization
A major challenge for automatic summary evaluation systems that use fixed
n-gram matching is dealing with the informative content of the text units
being matched. This limitation causes inaccurate matching between units in
peer and reference summaries. The present study introduces KpEval, a new
keyphrase-based summary evaluator for automatic summaries. KpEval relies on
keyphrases because they convey the most important concepts of a text. In the
evaluation process, the keyphrases are used in their lemma form as the
matching text unit. The system was applied to evaluate different summaries of
the Arabic multi-document data set presented at TAC 2011. The results show
that the new evaluation technique correlates well with established evaluation
systems: ROUGE-1, ROUGE-2, ROUGE-SU4, and AutoSummENG MeMoG. KpEval correlates
most strongly with AutoSummENG MeMoG, with Pearson and Spearman correlation
coefficients of 0.8840 and 0.9667, respectively.
Comment: 4 pages, 1 figure, 3 tables
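Matching on whole keyphrases instead of fixed n-grams can be sketched as below. The abstract does not give KpEval's scoring formula, so this assumes a simple recall-style score against a single reference, with keyphrases already lemmatized into tuples of lemmas; the function name and the example keyphrases are hypothetical.

```python
def keyphrase_match_score(peer_keyphrases, reference_keyphrases):
    """Fraction of reference keyphrases that also occur in the peer summary.

    Hypothetical recall-style score. Keyphrases are assumed to be
    lemmatized already and are compared as whole units (tuples of
    lemmas), not as fixed-length n-grams.
    """
    peer = {tuple(kp) for kp in peer_keyphrases}
    ref = {tuple(kp) for kp in reference_keyphrases}
    if not ref:
        return 0.0  # nothing to match against
    return len(peer & ref) / len(ref)


# Hypothetical lemmatized keyphrases (each one a list of lemmas).
reference = [["summary", "evaluation"], ["keyphrase"], ["arabic", "text"]]
peer = [["keyphrase"], ["summary", "evaluation"], ["n-gram"]]
print(keyphrase_match_score(peer, reference))
```

Because each keyphrase is one matching unit, a multi-word phrase either matches as a whole or not at all, which is the contrast with n-gram overlap drawn in the abstract.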
Multi-Task Learning of Keyphrase Boundary Classification
Keyphrase boundary classification (KBC) is the task of detecting keyphrases
in scientific articles and labelling them with respect to predefined types.
Although important in practice, this task is so far underexplored, partly due
to the lack of labelled data. To overcome this, we explore several auxiliary
tasks, including semantic super-sense tagging and identification of multi-word
expressions, and cast the task as a multi-task learning problem with deep
recurrent neural networks. Our multi-task models perform significantly better
than previous state-of-the-art approaches on two scientific KBC datasets,
particularly for long keyphrases.
Comment: ACL 201
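Keyphrase boundary classification is commonly framed as token-level sequence labelling. The abstract does not state the tagging scheme, so this sketch assumes a BIO encoding; the type names Process and Material echo the ScienceIE categories and, like the annotations and the helper `spans_to_bio`, are illustrative only. It shows how typed keyphrase spans become per-token targets; the multi-task recurrent models themselves are not reproduced.

```python
def spans_to_bio(num_tokens, spans):
    """Convert typed keyphrase spans into per-token BIO labels.

    `spans` is a list of (start, end, kp_type) tuples with `end`
    exclusive. The first token of a span gets "B-<type>", the rest get
    "I-<type>", and tokens outside every span get "O".
    """
    labels = ["O"] * num_tokens
    for start, end, kp_type in spans:
        labels[start] = f"B-{kp_type}"
        for i in range(start + 1, end):
            labels[i] = f"I-{kp_type}"
    return labels


# Hypothetical sentence and annotations.
tokens = ["We", "train", "recurrent", "neural", "networks",
          "on", "scientific", "articles"]
spans = [(2, 5, "Process"), (6, 8, "Material")]
labels = spans_to_bio(len(tokens), spans)
print(list(zip(tokens, labels)))
```

In a multi-task setup, auxiliary tasks such as super-sense tagging would supply their own per-token labels over the same tokens, which is why a token-level encoding like this is a convenient shared format.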
KERT: Automatic Extraction and Ranking of Topical Keyphrases from Content-Representative Document Titles
We introduce KERT (Keyphrase Extraction and Ranking by Topic), a framework
for topical keyphrase generation and ranking. By shifting from the
unigram-centric traditional methods of unsupervised keyphrase extraction to a
phrase-centric approach, we are able to directly compare and rank phrases of
different lengths. We construct a topical keyphrase ranking function which
implements the four criteria that represent high-quality topical keyphrases
(coverage, purity, phraseness, and completeness). The effectiveness of our
approach is demonstrated on two collections of content-representative titles in
the domains of Computer Science and Physics.
Comment: 9 pages