352 research outputs found
DivGraphPointer: A Graph Pointer Network for Extracting Diverse Keyphrases
Keyphrase extraction from documents is useful to a variety of applications
such as information retrieval and document summarization. This paper presents
an end-to-end method called DivGraphPointer for extracting a set of diversified
keyphrases from a document. DivGraphPointer combines the advantages of
traditional graph-based ranking methods and recent neural network-based
approaches. Specifically, given a document, a word graph is constructed from
the document based on word proximity and is encoded with graph convolutional
networks, which effectively capture document-level word salience by modeling
long-range dependency between words in the document and aggregating multiple
appearances of identical words into one node. Furthermore, we propose a
diversified point network to generate a set of diverse keyphrases out of the
word graph in the decoding process. Experimental results on five benchmark data
sets show that our proposed method significantly outperforms the existing
state-of-the-art approaches.Comment: Accepted to SIGIR 201
Adaptive text mining: Inferring structure from sequences
Text mining is about inferring structure from sequences representing natural language text, and may be defined as the process of analyzing text to extract information that is useful for particular purposes. Although hand-crafted heuristics are a common practical approach for extracting information from text, a general, and generalizable, approach requires adaptive techniques. This paper studies the way in which the adaptive techniques used in text compression can be applied to text mining. It develops several examples: extraction of hierarchical phrase structures from text, identification of keyphrases in documents, locating proper names and quantities of interest in a piece of text, text categorization, word segmentation, acronym extraction, and structure recognition. We conclude that compression forms a sound unifying principle that allows many text mining problems to be tacked adaptively
Interactive document summarisation.
This paper describes the Interactive Document Summariser (IDS), a dynamic document summarisation system, which can help users of digital libraries to access on-line documents more effectively. IDS provides dynamic control over summary characteristics, such as length and topic focus, so that changes made by the user are instantly reflected in an on-screen summary. A range of 'summary-in-context' views support seamless transitions between summaries and their source documents. IDS creates summaries by extracting keyphrases from a document with the Kea system, scoring sentences according to the keyphrases that they contain, and then extracting the highest scoring sentences. We report an evaluation of IDS summaries, in which human assessors identified suitable summary sentences in source documents, against which IDS summaries were judged. We found that IDS summaries were better than baseline summaries, and identify the characteristics of Kea keyphrases that lead to the best summaries
Human evaluation of Kea, an automatic keyphrasing system.
This paper describes an evaluation of the Kea automatic keyphrase extraction algorithm. Tools that automatically identify keyphrases are desirable because document keyphrases have numerous applications in digital library systems, but are costly and time consuming to manually assign. Keyphrase extraction algorithms are usually evaluated by comparison to author-specified keywords, but this methodology has several well-known shortcomings. The results presented in this paper are based on subjective evaluations of the quality and appropriateness of keyphrases by human assessors, and make a number of contributions. First, they validate previous evaluations of Kea that rely on author keywords. Second, they show Kea's performance is comparable to that of similar systems that have been evaluated by human assessors. Finally, they justify the use of author keyphrases as a performance metric by showing that authors generally choose good keywords
- …