Using the Annotated Bibliography as a Resource for Indicative Summarization
We report on a language resource consisting of 2000 annotated bibliography
entries, which is being analyzed as part of our research on indicative document
summarization. We show how annotated bibliographies cover certain aspects of
summarization that have not been well-covered by other summary corpora, and
motivate why they constitute an important form to study for information
retrieval. We detail our methodology for collecting the corpus, and overview
our document feature markup that we introduced to facilitate summary analysis.
We present the characteristics of the corpus and its methods of collection, and show
its use in finding the distribution of the types of information included in
indicative summaries and their relative ordering within those summaries.
Comment: 8 pages, 3 figures
Evaluation of the DEFINDER System for Fully Automatic Glossary Construction
In this paper we present a quantitative and qualitative evaluation of DEFINDER, a rule-based system that mines consumer-oriented full-text articles in order to extract definitions and the terms they define. The quantitative evaluation shows that, measured against human performance, DEFINDER obtained 87% precision and 75% recall, revealing both the incompleteness of existing resources and the ability of DEFINDER to address these gaps. Our basis for comparison is definitions from on-line dictionaries, including the UMLS Metathesaurus. Qualitative evaluation shows that the definitions extracted by our system are ranked higher on user-centered criteria of usability and readability than are definitions from on-line specialized dictionaries. The output of DEFINDER can thus be used to enhance these dictionaries, and is being incorporated into a system that clarifies technical terms for non-specialist users in understandable, non-technical language.
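Precision and recall figures like the 87%/75% reported above follow the standard definitions. The sketch below treats extracted (term, definition) pairs as a set compared against a gold set; this is only the generic set arithmetic, not DEFINDER's actual scoring protocol, which judged system output against human performance.

```python
def precision_recall(extracted, gold):
    """Standard precision/recall of an extracted set against a gold set.

    precision = |extracted ∩ gold| / |extracted|
    recall    = |extracted ∩ gold| / |gold|
    """
    extracted, gold = set(extracted), set(gold)
    tp = len(extracted & gold)  # true positives
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall
```

For example, extracting three pairs of which two are in a four-item gold set yields precision 2/3 and recall 1/2.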
Evaluation of DEFINDER: A System to Mine Definitions from Consumer-oriented Medical Text
In this paper we present DEFINDER, a rule-based system that mines consumer-oriented full-text articles in order to extract definitions and the terms they define. This research is part of the Digital Library Project at Columbia University, entitled PERSIVAL (PErsonalized Retrieval and Summarization of Image, Video and Language resources). One goal of the project is to present information to patients in language they can understand. A key component of this stage is to provide accurate and readable lay definitions for technical terms, which may be present in articles of intermediate complexity. The focus of this short paper is the quantitative and qualitative evaluation of the DEFINDER system. Our basis for comparison was definitions from the Unified Medical Language System (UMLS), the On-line Medical Dictionary (OMD) and the Glossary of Popular and Technical Medical Terms (GPTMT). Quantitative evaluations show that DEFINDER obtained 87% precision and 75% recall, and reveal both the incompleteness of existing resources and the ability of DEFINDER to address these gaps. Qualitative evaluation shows that the definitions extracted by our system are ranked higher on user-based criteria of usability and readability than definitions from on-line specialized dictionaries. Thus the output of DEFINDER can be used to enhance existing specialized dictionaries, and also as a key feature in summarizing technical articles for non-specialist users.
A method for automatically building and evaluating dictionary resources
This paper describes a method for automatically building dictionaries from text. We present DEFINDER, a rule-based system for the extraction of definitions from on-line consumer-oriented medical articles. We provide an extensive evaluation on three dimensions: i) performance of the definition extraction technique in terms of precision and recall, ii) quality of the built dictionary as judged both by specialists and lay users, and iii) coverage of existing on-line dictionaries. The corpus we used for the study is publicly available. A major contribution of the paper is the range of quantitative and qualitative evaluation methods.
Cybersecurity - What's Language got to do with it?
A new opportunity to explore and leverage the power of computational
linguistic methods and analysis in ensuring effective Cybersecurity is
presented. This White Paper discusses some of the specific emerging
research opportunities, covering human language technologies such as
language identification, topic modeling, and information extraction for
keyword recognition.
Resources for Evaluation of Summarization Techniques
We report on two corpora to be used in the evaluation of component systems
for the tasks of (1) linear segmentation of text and (2) summary-directed
sentence extraction. We present characteristics of the corpora, methods used in
the collection of user judgments, and an overview of the application of the
corpora to evaluating the component system. Finally, we discuss the problems
and issues with construction of the test set which apply broadly to the
construction of evaluation resources for language technologies.Comment: LaTeX source, 5 pages, US Letter, uses lrec98.st
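For the linear-segmentation task mentioned above, a common way to score a system segmentation against user judgments is the Pk metric. Pk is not named in the abstract; it is offered here only as a standard choice for this task, with segmentations represented as one segment label per text unit.

```python
def pk(reference, hypothesis, k=None):
    """Pk segmentation error: the fraction of unit pairs k apart that the
    hypothesis classifies inconsistently with the reference (same segment
    vs. different segments). Lower is better; 0.0 means the two
    segmentations agree on every probed window.

    Both inputs are sequences of per-unit segment labels, e.g. "000111".
    """
    if k is None:
        # Conventional default: half the mean reference segment length.
        n_segs = len(set(reference))
        k = max(1, round(len(reference) / (2 * n_segs)))
    windows = len(reference) - k
    errors = 0
    for i in range(windows):
        same_ref = reference[i] == reference[i + k]
        same_hyp = hypothesis[i] == hypothesis[i + k]
        if same_ref != same_hyp:
            errors += 1
    return errors / windows
```

Comparing a hypothesis to itself yields 0.0, and a boundary shifted by one unit produces a proportionally small penalty, which is what makes window-based metrics gentler than exact boundary matching.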
Using librarian techniques in automatic text summarization for information retrieval
A current application of automatic text summarization is to provide an overview of relevant documents coming from an information retrieval (IR) system. This paper examines how Centrifuser, one such summarization system, was designed with respect to methods used in the library community. We have reviewed these expert librarian techniques for assisting information seekers and codified them into eight distinct strategies. We detail how we have operationalized six of these strategies in Centrifuser by computing an informative extract, indicative differences between documents, and navigational links to narrow or broaden a user's query. We conclude the paper with results from a preliminary evaluation.
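The narrow/broaden navigational links mentioned above can be illustrated over a topic hierarchy. The hierarchy below is a hard-coded toy example; Centrifuser derives its topic structure from the retrieved documents rather than from a fixed table like this.

```python
# Toy topic hierarchy: parent topic -> narrower child topics.
# Purely illustrative; not Centrifuser's actual topic tree.
HIERARCHY = {
    "heart disease": ["angina", "myocardial infarction", "arrhythmia"],
    "angina": ["stable angina", "unstable angina"],
}
# Invert the hierarchy so each child can look up its parent.
PARENT = {child: parent
          for parent, kids in HIERARCHY.items()
          for child in kids}

def navigation_links(query):
    """Suggest narrower (children) and broader (parent) reformulations
    of the user's query, in the spirit of a librarian guiding a search."""
    return {
        "narrow": HIERARCHY.get(query, []),
        "broaden": PARENT.get(query),
    }
```

A query for "angina" would be offered "stable angina" and "unstable angina" as narrowing links and "heart disease" as the broadening link.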
Tackling the Internet Glossary Glut: Automatic Extraction and Evaluation of Genus Phrases
This paper addresses the problem of developing methods for the identification and extraction of meaningful semantic components from large online glossaries. We present two sets of results. First, we report on the algorithm, ParseGloss, which was used to analyze definitions and extract the main concept, or genus phrase. We ran the system on over 12,000 online glossary entries. Second, we present a method to evaluate our results, using human judgments on a collection of definitions from six different sources. This paper discusses our approach to the evaluation process, since the creation of a standard for evaluation is in itself a contribution to the field. The methods we have developed required addressing the significant challenge of abstracting a single gold standard from multiple naive human judgments on a highly subjective task. Once the method for creating the standard was developed, we established the gold standard data and report our performance in running ParseGloss over this controlled collection of definitions. Our first set of results gives precision and recall on system performance; our second concerns techniques for determining agreement between human subjects. Success in the ParseGloss algorithm will contribute to the automatic creation of ontologies.
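To make the notion of a genus phrase concrete: in "aspirin is a drug that reduces fever", the genus phrase is "drug". The sketch below is a crude token-based heuristic for the common "X is a/an/the Y ..." pattern; ParseGloss itself analyzes definitions far more carefully, so this stands in only as an illustration of the target output.

```python
# Words that typically end the genus phrase and begin the differentia.
STOP = {"that", "which", "of", "used", "for", "in", "with"}

def genus_phrase(definition):
    """Heuristically pull the genus phrase: the noun phrase right after
    the copula in an 'X is a/an/the Y ...' definition. Returns None when
    the pattern is absent. Illustrative only; not ParseGloss."""
    # Split punctuation off so it can act as a phrase boundary.
    tokens = definition.replace(",", " ,").replace(".", " .").split()
    lowered = [t.lower() for t in tokens]
    for i, tok in enumerate(lowered):
        if tok == "is" and i + 1 < len(lowered) and lowered[i + 1] in {"a", "an", "the"}:
            phrase = []
            for t in tokens[i + 2:]:
                if t.lower() in STOP or t in {",", "."}:
                    break
                phrase.append(t)
            return " ".join(phrase) or None
    return None
```

Running it on "DEFINDER is a rule-based system that mines articles." returns "rule-based system", the head concept a gold-standard annotation would mark.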
GIST-IT: Summarizing Email Using Linguistic Knowledge and Machine Learning
We present a system for the automatic extraction of salient information from email messages, thus providing the gist of their meaning. Dealing with email raises several challenges that we address in this paper, including the heterogeneity of the data in terms of length and topic. Our method combines shallow linguistic processing with machine learning to extract phrasal units that are representative of email content. The GIST-IT application is fully implemented and embedded in an active mailbox platform. Evaluation was performed over three machine learning paradigms.
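One simple salience signal for the phrasal units described above is TF-IDF weight across a mailbox. The sketch below ranks single terms per message; GIST-IT's actual pipeline extracts noun phrases with shallow linguistic processing and scores them with learned classifiers, so this is only an assumed, simplified stand-in for the salience-scoring idea.

```python
import math
from collections import Counter

def tokenize(text):
    """Naive whitespace tokenizer with punctuation stripping."""
    return [w.lower().strip(".,!?") for w in text.split()]

def tfidf_rank(emails):
    """For each email, return its terms sorted by descending TF-IDF,
    so terms frequent in one message but rare across the mailbox rank
    highest. Illustrative only; not GIST-IT's learned scoring."""
    docs = [Counter(tokenize(e)) for e in emails]
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(doc.keys())
    n = len(docs)
    ranked = []
    for doc in docs:
        scores = {t: tf * math.log(n / df[t]) for t, tf in doc.items()}
        ranked.append(sorted(scores, key=scores.get, reverse=True))
    return ranked
```

In a small mailbox, a message-specific word such as "meeting" outranks function words like "the" that occur in most messages, which is the intuition behind using distributional weight as one feature among many.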
Applying natural language generation to indicative summarization
Creating indicative summaries that help a searcher decide whether to read a particular document is difficult. This paper examines the indicative summarization task from a generation perspective, first analyzing its required content via published guidelines and corpus analysis. We show how these summaries can be factored into a set of document features, and how an implemented content planner uses the topicality document feature to create indicative multidocument query-based summaries.
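The content-planning step described above can be caricatured as realizing document features in a fixed schema order. Both the slot names and the ordering below are hypothetical placeholders; the paper derives its actual content plan from published guidelines and corpus analysis rather than a hand-written list.

```python
# Hypothetical content-plan schema: the order in which document-feature
# slots are realized in the indicative summary.
SCHEMA = ["topicality", "content-types", "audience", "navigation"]

def plan_summary(features):
    """Return the realized summary sentences in schema order, silently
    skipping any feature absent from this document set."""
    return [features[slot] for slot in SCHEMA if slot in features]
```

Given only a topicality sentence and an audience sentence, the planner emits them in schema order, with topicality first, mirroring the paper's emphasis on the topicality feature.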