Human evaluation of Kea, an automatic keyphrasing system.
This paper describes an evaluation of the Kea automatic keyphrase extraction algorithm. Tools that automatically identify keyphrases are desirable because document keyphrases have numerous applications in digital library systems but are costly and time-consuming to assign manually. Keyphrase extraction algorithms are usually evaluated by comparison with author-specified keywords, but this methodology has several well-known shortcomings. The results presented in this paper are based on subjective evaluations of the quality and appropriateness of keyphrases by human assessors, and make a number of contributions. First, they validate previous evaluations of Kea that rely on author keywords. Second, they show that Kea's performance is comparable to that of similar systems that have been evaluated by human assessors. Finally, they justify the use of author keyphrases as a performance metric by showing that authors generally choose good keywords.
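The author-keyword methodology that this evaluation validates amounts to a set comparison between extracted and author-assigned phrases. The following is a minimal Python sketch, assuming case-insensitive exact matching after whitespace normalisation (Kea-style evaluations typically also stem phrases, which is omitted here); the function names are hypothetical.

```python
def normalise(phrase: str) -> str:
    """Lower-case and collapse whitespace (assumed matching rule; no stemming)."""
    return " ".join(phrase.lower().split())


def score_against_author(extracted: list[str], author: list[str]) -> dict:
    """Precision, recall and F1 of extracted keyphrases w.r.t. author keyphrases."""
    ext = {normalise(p) for p in extracted}
    ref = {normalise(p) for p in author}
    hits = len(ext & ref)
    precision = hits / len(ext) if ext else 0.0
    recall = hits / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"matches": hits, "precision": precision, "recall": recall, "f1": f1}


result = score_against_author(
    ["digital libraries", "Keyphrase extraction", "naive bayes", "evaluation"],
    ["keyphrase extraction", "digital libraries", "human evaluation"],
)
print(result)
```

Note that "evaluation" does not match "human evaluation" under exact matching, which is one of the shortcomings of this methodology that human assessment avoids.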
Interactive document summarisation.
This paper describes the Interactive Document Summariser (IDS), a dynamic document summarisation system that can help users of digital libraries access on-line documents more effectively. IDS provides dynamic control over summary characteristics, such as length and topic focus, so that changes made by the user are instantly reflected in an on-screen summary. A range of 'summary-in-context' views support seamless transitions between summaries and their source documents. IDS creates summaries by extracting keyphrases from a document with the Kea system, scoring sentences according to the keyphrases they contain, and then extracting the highest-scoring sentences. We report an evaluation of IDS summaries, in which human assessors identified suitable summary sentences in source documents, against which IDS summaries were judged. We found that IDS summaries were better than baseline summaries, and we identify the characteristics of Kea keyphrases that lead to the best summaries.
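The sentence-scoring step described above can be sketched as follows. This is a minimal Python sketch, not the IDS implementation: it assumes the keyphrases arrive ranked best-first from an extractor such as Kea, weights them by rank, and matches them by case-insensitive substring search. The function names and the weighting scheme are hypothetical.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def score_sentence(sentence: str, weights: dict[str, int]) -> int:
    """Sum the weights of the keyphrases that occur in the sentence."""
    lowered = sentence.lower()
    return sum(w for phrase, w in weights.items() if phrase in lowered)


def summarise(text: str, keyphrases: list[str], length: int) -> list[str]:
    """Return the `length` highest-scoring sentences in their original order."""
    # Rank-based weights (an assumption): the best keyphrase gets len(keyphrases), the last gets 1.
    weights = {kp.lower(): len(keyphrases) - i for i, kp in enumerate(keyphrases)}
    sentences = split_sentences(text)
    scored = [(score_sentence(s, weights), i) for i, s in enumerate(sentences)]
    # Highest score first; ties go to the earlier sentence.
    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:length]
    return [sentences[i] for _, i in sorted(best, key=lambda pair: pair[1])]


doc = ("Digital libraries hold many documents. Keyphrases summarise a document. "
       "Kea extracts keyphrases from digital libraries. The weather was fine.")
print(summarise(doc, ["keyphrases", "digital libraries"], length=2))
# → ['Keyphrases summarise a document.', 'Kea extracts keyphrases from digital libraries.']
```

Changing `length` and re-running the selection mirrors the dynamic length control that IDS offers. Returning the chosen sentences in document order keeps the summary readable alongside its source.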
Usability of a Keyphrase Browsing Tool Based on a Semantic Cloud Model
The goal of this research was to facilitate the scrutiny and use of Web search engine retrieval results. I used a graphical keyphrase browsing interface to visualize the conceptual information space of the results, presenting document characteristics that make document relevance determinations easier.
Lexical cohesion analysis for topic segmentation, summarization and keyphrase extraction
When we express an idea or tell a story, we inevitably use words that are semantically related to each other. Viewed from the perspective of the words in a language, this phenomenon makes it possible to infer the degree of semantic relatedness between words by observing their distribution and use in discourse. Viewed from the perspective of discourse, it makes it possible to model the structure of a document by observing changes in lexical cohesion, and to apply that model to high-level natural language processing tasks. This research investigates lexical cohesion from both perspectives: it first builds methods for measuring the semantic relatedness of word pairs and then applies these methods to topic segmentation, summarization and keyphrase extraction.
Measuring the semantic relatedness of words requires prior knowledge about the words. Two different knowledge bases are investigated in this research: the first is a manually built network of semantic relationships, while the second relies on distributional patterns in raw text corpora. To discover which method is more effective in lexical cohesion analysis, a comprehensive comparison of state-of-the-art semantic relatedness methods is made.
Various topic segmentation methods that use some form of lexical cohesion have been proposed in the literature. Some of these restrict the relationships they consider to word repetition or to strong semantic relationships such as synonymy, but no previous work uses semantic relatedness measures, which can be calculated for any pair of words in the vocabulary. Our experiments suggest that using such measures improves topic segmentation performance over methods that use both classical relationships and word repetition. Furthermore, the experiments compare the performance of different semantic relatedness methods in a high-level task. The detected topic segments are also used for summarization, which achieves better results than a lexical-chain-based method that uses WordNet.
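The segmentation idea above, scoring lexical cohesion across each gap between sentences with a pluggable word relatedness function, can be sketched in a TextTiling-like style. This is a minimal Python sketch, not the thesis's algorithm: the best-match averaging, the local-minimum rule and the `threshold` value are assumptions, and the default `repetition` measure is only a stand-in for the WordNet-based or corpus-based measures the thesis compares.

```python
from typing import Callable

Relatedness = Callable[[str, str], float]


def repetition(a: str, b: str) -> float:
    """Baseline relatedness: 1.0 for identical words, 0.0 otherwise."""
    return 1.0 if a == b else 0.0


def gap_cohesion(left: list[str], right: list[str], rel: Relatedness) -> float:
    """Average, over the words on the left, of each word's best match on the right."""
    if not left or not right:
        return 0.0
    return sum(max(rel(a, b) for b in right) for a in left) / len(left)


def segment(sentences: list[list[str]], rel: Relatedness = repetition,
            threshold: float = 0.2) -> list[int]:
    """Return indices i such that a topic boundary falls before sentence i.

    A boundary is placed at a gap whose cohesion is a local minimum and
    below `threshold` (both rules are illustrative assumptions).
    """
    scores = [gap_cohesion(sentences[i], sentences[i + 1], rel)
              for i in range(len(sentences) - 1)]
    boundaries = []
    for g, s in enumerate(scores):
        left = scores[g - 1] if g > 0 else float("inf")
        right = scores[g + 1] if g + 1 < len(scores) else float("inf")
        if s < threshold and s <= left and s <= right:
            boundaries.append(g + 1)
    return boundaries


sents = [["cats", "purr", "softly"], ["cats", "chase", "mice"],
         ["stocks", "fell", "sharply"], ["stocks", "rallied", "later"]]
print(segment(sents))  # → [2]
```

Passing a graded measure as `rel` (for example the distributional cosine, or a WordNet-based score) lets related but non-identical words such as "purr" and "mice" contribute to cohesion, which is the extension over repetition-only methods that the abstract describes.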
Finally, the use of lexical cohesion analysis in keyphrase extraction is investigated. Previous research shows that keyphrases are useful tools in document retrieval and navigation. Although these findings point to a relationship between keyphrases and document retrieval performance, no previous work uses this relationship to identify the keyphrases of a given document. We aim to establish a link between the problems of query performance prediction (QPP) and keyphrase extraction. To this end, features used in QPP are evaluated for keyphrase extraction using a Naive Bayes classifier. Our experiments indicate that these features improve the effectiveness of keyphrase extraction on documents of different lengths. More importantly, the commonly used frequency and first-position features perform poorly on shorter documents, whereas the QPP features are more robust and achieve better results.
Ercan, Gönenç. Ph.D.
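The abstract does not list the QPP features, so the following sketch only illustrates the baseline they are compared against. It computes relative frequency and relative first-position features for candidate n-grams (Kea itself uses TF×IDF rather than raw frequency), discretises them, and feeds them to a Naive Bayes classifier with add-one smoothing. This is a minimal Python sketch under assumed cut points and n-gram lengths; `phrase_features`, `discretise` and `DiscreteNaiveBayes` are hypothetical names, not code from the thesis.

```python
import math
from collections import Counter, defaultdict


def phrase_features(tokens: list[str], max_len: int = 2) -> dict[str, tuple[float, float]]:
    """Map every n-gram (n <= max_len) to (relative frequency, relative first position)."""
    counts: Counter = Counter()
    first: dict[str, int] = {}
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            phrase = " ".join(tokens[i:i + n])
            counts[phrase] += 1
            first.setdefault(phrase, i)
    total = len(tokens)
    return {p: (counts[p] / total, first[p] / total) for p in counts}


def discretise(features: tuple[float, float],
               cuts: tuple[float, float] = (0.05, 0.5)) -> tuple[int, int]:
    """Bin each feature into two levels; the cut points are illustrative assumptions."""
    freq, pos = features
    return (int(freq > cuts[0]), int(pos > cuts[1]))


class DiscreteNaiveBayes:
    """Naive Bayes over tuples of discrete feature values, with add-one smoothing."""

    def fit(self, rows: list[tuple[int, ...]], labels: list[bool]) -> "DiscreteNaiveBayes":
        self.classes = Counter(labels)
        self.counts: dict = defaultdict(Counter)  # (label, feature index) -> value counts
        self.values: dict = defaultdict(set)      # feature index -> values seen in training
        for row, label in zip(rows, labels):
            for k, v in enumerate(row):
                self.counts[(label, k)][v] += 1
                self.values[k].add(v)
        return self

    def prob_keyphrase(self, row: tuple[int, ...]) -> float:
        """Posterior probability that a candidate with these feature values is a keyphrase."""
        total = sum(self.classes.values())
        log_post = {}
        for label, n_label in self.classes.items():
            lp = math.log(n_label / total)
            for k, v in enumerate(row):
                lp += math.log((self.counts[(label, k)][v] + 1) / (n_label + len(self.values[k])))
            log_post[label] = lp
        if True not in log_post:
            return 0.0
        top = max(log_post.values())
        norm = sum(math.exp(lp - top) for lp in log_post.values())
        return math.exp(log_post[True] - top) / norm
```

In a training setup, each candidate's label would come from the author-assigned keyphrases (for example `phrase in author_phrases`). QPP-style features would simply be appended as further entries in each discretised tuple.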
Design and Evaluation of Phrasier, an Interactive System for Linking Documents Using Keyphrases
When documents are collected together from diverse sources, they are unlikely to contain useful hypertext links to support browsing among them. Manual or semi-automated link creation is often infeasibly time-consuming for large document collections. We present Phrasier, an interactive system that automatically introduces links to related material into documents as the user browses and queries a digital library collection. Suitable links are identified using keyphrases found within the document text, and they support both topic-based and inter-document navigation. Previews of link destinations are provided to reduce unproductive link traversals, and important segments of document text are identified and highlighted to support skimming of viewed documents. Evaluation has shown that Phrasier's keyphrase-based linking mechanism produces sparse hypertexts, although similar documents tend to have short paths between them. A study using human assessors in a simulated document retrieval task indicated that the generated links are perceived to be useful and relevant.
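Keyphrase-based linking of the kind Phrasier performs can be sketched with an inverted index from keyphrases to documents: each keyphrase in the current document links to the other documents that also contain it, ranked by how many keyphrases they share with the current one. This minimal Python sketch assumes keyphrases have already been extracted for every document. The ranking rule, the `min_shared` parameter and the function names are hypothetical and not taken from Phrasier.

```python
from collections import defaultdict


def build_index(doc_keyphrases: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert document -> keyphrases into keyphrase -> documents."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc, phrases in doc_keyphrases.items():
        for phrase in phrases:
            index[phrase].add(doc)
    return index


def link_targets(doc: str, doc_keyphrases: dict[str, set[str]],
                 index: dict[str, set[str]], min_shared: int = 1) -> dict[str, list[str]]:
    """Map each keyphrase of `doc` to other documents containing it.

    Destinations are ranked by how many keyphrases they share with `doc`
    (ties broken by name) and dropped if they share fewer than `min_shared`.
    """
    mine = doc_keyphrases[doc]
    shared = {other: len(mine & phrases)
              for other, phrases in doc_keyphrases.items() if other != doc}
    links: dict[str, list[str]] = {}
    for phrase in sorted(mine):
        targets = [d for d in index.get(phrase, set())
                   if d != doc and shared[d] >= min_shared]
        if targets:
            links[phrase] = sorted(targets, key=lambda d: (-shared[d], d))
    return links


docs = {
    "a": {"keyphrase extraction", "digital libraries"},
    "b": {"digital libraries", "hypertext"},
    "c": {"keyphrase extraction", "digital libraries", "summarisation"},
    "d": {"hypertext"},
}
print(link_targets("a", docs, build_index(docs)))
# → {'digital libraries': ['c', 'b'], 'keyphrase extraction': ['c']}
```

Raising `min_shared` prunes weakly related destinations and makes the generated hypertext sparser. Ranking by shared keyphrases places the most similar documents first, in line with the observation that similar documents end up close together.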