Using the Annotated Bibliography as a Resource for Indicative Summarization
We report on a language resource consisting of 2000 annotated bibliography
entries, which is being analyzed as part of our research on indicative document
summarization. We show how annotated bibliographies cover certain aspects of
summarization that have not been well-covered by other summary corpora, and
motivate why they constitute an important form to study for information
retrieval. We detail our methodology for collecting the corpus, and overview
our document feature markup that we introduced to facilitate summary analysis.
We present the characteristics of the corpus and its methods of collection, and show
its use in finding the distribution of the types of information included in
indicative summaries and their relative ordering within those summaries.
Comment: 8 pages, 3 figures
Evaluation of the DEFINDER System for Fully Automatic Glossary Construction
In this paper we present a quantitative and qualitative evaluation of DEFINDER, a rule-based system that mines consumer-oriented full-text articles in order to extract definitions and the terms they define. The quantitative evaluation shows that, measured against human performance, DEFINDER obtained 87% precision and 75% recall, revealing both the incompleteness of existing resources and the ability of DEFINDER to address these gaps. Our basis for comparison is definitions from on-line dictionaries, including the UMLS Metathesaurus. Qualitative evaluation shows that the definitions extracted by our system are ranked higher on user-centered criteria of usability and readability than are definitions from on-line specialized dictionaries. The output of DEFINDER can thus be used to enhance these dictionaries, and is being incorporated into a system that clarifies technical terms for non-specialist users in understandable, non-technical language.
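Precision and recall figures like the 87%/75% reported above follow the standard definitions. The sketch below treats extracted (term, definition) pairs as a set compared against a gold set; this is only the generic set arithmetic, not DEFINDER's actual scoring protocol, which judged system output against human performance.

```python
def precision_recall(extracted, gold):
    """Standard precision/recall of an extracted set against a gold set.

    precision = |extracted ∩ gold| / |extracted|
    recall    = |extracted ∩ gold| / |gold|
    """
    extracted, gold = set(extracted), set(gold)
    tp = len(extracted & gold)  # true positives
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall
```

For example, extracting three pairs of which two are in a four-item gold set yields precision 2/3 and recall 1/2.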
Evaluation of DEFINDER: A System to Mine Definitions from Consumer-oriented Medical Text
In this paper we present DEFINDER, a rule-based system that mines consumer-oriented full-text articles in order to extract definitions and the terms they define. This research is part of the Digital Library Project at Columbia University, entitled PERSIVAL (PErsonalized Retrieval and Summarization of Image, Video and Language resources). One goal of the project is to present information to patients in language they can understand. A key component of this stage is to provide accurate and readable lay definitions for technical terms, which may be present in articles of intermediate complexity. The focus of this short paper is the quantitative and qualitative evaluation of the DEFINDER system. Our basis for comparison was definitions from the Unified Medical Language System (UMLS), the On-line Medical Dictionary (OMD) and the Glossary of Popular and Technical Medical Terms (GPTMT). Quantitative evaluations show that DEFINDER obtained 87% precision and 75% recall, and reveal both the incompleteness of existing resources and the ability of DEFINDER to address these gaps. Qualitative evaluation shows that the definitions extracted by our system are ranked higher on user-based criteria of usability and readability than definitions from on-line specialized dictionaries. Thus the output of DEFINDER can be used to enhance existing specialized dictionaries, and also as a key feature in summarizing technical articles for non-specialist users.
A method for automatically building and evaluating dictionary resources
This paper describes a method for automatically building dictionaries from text. We present DEFINDER, a rule-based system for the extraction of definitions from on-line consumer-oriented medical articles. We provide an extensive evaluation on three dimensions: i) performance of the definition extraction technique in terms of precision and recall, ii) quality of the built dictionary as judged both by specialists and lay users, and iii) coverage of existing on-line dictionaries. The corpus we used for the study is publicly available. A major contribution of the paper is the range of quantitative and qualitative evaluation methods.
Cybersecurity - What's Language got to do with it?
A new opportunity to explore and leverage the power of computational
linguistic methods and analysis in ensuring effective Cybersecurity is
presented. This White Paper discusses some of the specific emerging
research opportunities, covering human language technologies such as
language identification, topic modeling, and information extraction for
keyword recognition.
Resources for Evaluation of Summarization Techniques
We report on two corpora to be used in the evaluation of component systems
for the tasks of (1) linear segmentation of text and (2) summary-directed
sentence extraction. We present characteristics of the corpora, methods used in
the collection of user judgments, and an overview of the application of the
corpora to evaluating the component system. Finally, we discuss the problems
and issues with construction of the test set which apply broadly to the
construction of evaluation resources for language technologies.Comment: LaTeX source, 5 pages, US Letter, uses lrec98.st
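For the linear-segmentation task mentioned above, a common way to score a system segmentation against user judgments is the Pk metric. Pk is not named in the abstract; it is offered here only as a standard choice for this task, with segmentations represented as one segment label per text unit.

```python
def pk(reference, hypothesis, k=None):
    """Pk segmentation error: the fraction of unit pairs k apart that the
    hypothesis classifies inconsistently with the reference (same segment
    vs. different segments). Lower is better; 0.0 means the two
    segmentations agree on every probed window.

    Both inputs are sequences of per-unit segment labels, e.g. "000111".
    """
    if k is None:
        # Conventional default: half the mean reference segment length.
        n_segs = len(set(reference))
        k = max(1, round(len(reference) / (2 * n_segs)))
    windows = len(reference) - k
    errors = 0
    for i in range(windows):
        same_ref = reference[i] == reference[i + k]
        same_hyp = hypothesis[i] == hypothesis[i + k]
        if same_ref != same_hyp:
            errors += 1
    return errors / windows
```

Comparing a hypothesis to itself yields 0.0, and a boundary shifted by one unit produces a proportionally small penalty, which is what makes window-based metrics gentler than exact boundary matching.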
Using librarian techniques in automatic text summarization for information retrieval
A current application of automatic text summarization is to provide an overview of relevant documents coming from an information retrieval (IR) system. This paper examines how Centrifuser, one such summarization system, was designed with respect to methods used in the library community. We have reviewed these expert librarian techniques for assisting information seekers and codified them into eight distinct strategies. We detail how we have operationalized six of these strategies in Centrifuser by computing an informative extract, indicative differences between documents, and navigational links to narrow or broaden a user's query. We conclude the paper with results from a preliminary evaluation.
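The narrow/broaden navigational links mentioned above can be illustrated over a topic hierarchy. The hierarchy below is a hard-coded toy example; Centrifuser derives its topic structure from the retrieved documents rather than from a fixed table like this.

```python
# Toy topic hierarchy: parent topic -> narrower child topics.
# Purely illustrative; not Centrifuser's actual topic tree.
HIERARCHY = {
    "heart disease": ["angina", "myocardial infarction", "arrhythmia"],
    "angina": ["stable angina", "unstable angina"],
}
# Invert the hierarchy so each child can look up its parent.
PARENT = {child: parent
          for parent, kids in HIERARCHY.items()
          for child in kids}

def navigation_links(query):
    """Suggest narrower (children) and broader (parent) reformulations
    of the user's query, in the spirit of a librarian guiding a search."""
    return {
        "narrow": HIERARCHY.get(query, []),
        "broaden": PARENT.get(query),
    }
```

A query for "angina" would be offered "stable angina" and "unstable angina" as narrowing links and "heart disease" as the broadening link.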
Tackling the Internet Glossary Glut: Automatic Extraction and Evaluation of Genus Phrases
This paper addresses the problem of developing methods for the identification and extraction of meaningful semantic components from large online glossaries. We present two sets of results. First, we report on the algorithm, ParseGloss, which was used to analyze definitions and extract the main concept, or genus phrase. We ran the system on over 12,000 online glossary entries. Second, we present a method to evaluate our results, using human judgments on a collection of definitions from six different sources. This paper discusses our approach to the evaluation process, since the creation of a standard for evaluation is in itself a contribution to the field. The methods we have developed required addressing the significant challenge of abstracting a single gold standard from multiple naive human judgments on a highly subjective task. Once the method for creating the standard was developed, we established the gold standard data and report our performance in running ParseGloss over this controlled collection of definitions. Our first set of results gives precision and recall on system performance; our second concerns techniques for determining agreement between human subjects. Success in the ParseGloss algorithm will contribute to the automatic creation of ontologies.
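To make the notion of a genus phrase concrete: in "aspirin is a drug that reduces fever", the genus phrase is "drug". The sketch below is a crude token-based heuristic for the common "X is a/an/the Y ..." pattern; ParseGloss itself analyzes definitions far more carefully, so this stands in only as an illustration of the target output.

```python
# Words that typically end the genus phrase and begin the differentia.
STOP = {"that", "which", "of", "used", "for", "in", "with"}

def genus_phrase(definition):
    """Heuristically pull the genus phrase: the noun phrase right after
    the copula in an 'X is a/an/the Y ...' definition. Returns None when
    the pattern is absent. Illustrative only; not ParseGloss."""
    # Split punctuation off so it can act as a phrase boundary.
    tokens = definition.replace(",", " ,").replace(".", " .").split()
    lowered = [t.lower() for t in tokens]
    for i, tok in enumerate(lowered):
        if tok == "is" and i + 1 < len(lowered) and lowered[i + 1] in {"a", "an", "the"}:
            phrase = []
            for t in tokens[i + 2:]:
                if t.lower() in STOP or t in {",", "."}:
                    break
                phrase.append(t)
            return " ".join(phrase) or None
    return None
```

Running it on "DEFINDER is a rule-based system that mines articles." returns "rule-based system", the head concept a gold-standard annotation would mark.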
GIST-IT: Summarizing Email Using Linguistic Knowledge and Machine Learning
We present a system for the automatic extraction of salient information from email messages, thus providing the gist of their meaning. Dealing with email raises several challenges that we address in this paper, including the heterogeneity of the data in terms of length and topic. Our method combines shallow linguistic processing with machine learning to extract phrasal units that are representative of email content. The GIST-IT application is fully implemented and embedded in an active mailbox platform. Evaluation was performed over three machine learning paradigms.
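One simple salience signal for the phrasal units described above is TF-IDF weight across a mailbox. The sketch below ranks single terms per message; GIST-IT's actual pipeline extracts noun phrases with shallow linguistic processing and scores them with learned classifiers, so this is only an assumed, simplified stand-in for the salience-scoring idea.

```python
import math
from collections import Counter

def tokenize(text):
    """Naive whitespace tokenizer with punctuation stripping."""
    return [w.lower().strip(".,!?") for w in text.split()]

def tfidf_rank(emails):
    """For each email, return its terms sorted by descending TF-IDF,
    so terms frequent in one message but rare across the mailbox rank
    highest. Illustrative only; not GIST-IT's learned scoring."""
    docs = [Counter(tokenize(e)) for e in emails]
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(doc.keys())
    n = len(docs)
    ranked = []
    for doc in docs:
        scores = {t: tf * math.log(n / df[t]) for t, tf in doc.items()}
        ranked.append(sorted(scores, key=scores.get, reverse=True))
    return ranked
```

In a small mailbox, a message-specific word such as "meeting" outranks function words like "the" that occur in most messages, which is the intuition behind using distributional weight as one feature among many.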
Applying natural language generation to indicative summarization
Creating indicative summaries that help a searcher decide whether to read a particular document is difficult. This paper examines the indicative summarization task from a generation perspective, first analyzing its required content via published guidelines and corpus analysis. We show how these summaries can be factored into a set of document features, and how an implemented content planner uses the topicality document feature to create indicative multidocument query-based summaries.
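The content-planning step described above can be caricatured as realizing document features in a fixed schema order. Both the slot names and the ordering below are hypothetical placeholders; the paper derives its actual content plan from published guidelines and corpus analysis rather than a hand-written list.

```python
# Hypothetical content-plan schema: the order in which document-feature
# slots are realized in the indicative summary.
SCHEMA = ["topicality", "content-types", "audience", "navigation"]

def plan_summary(features):
    """Return the realized summary sentences in schema order, silently
    skipping any feature absent from this document set."""
    return [features[slot] for slot in SCHEMA if slot in features]
```

Given only a topicality sentence and an audience sentence, the planner emits them in schema order, with topicality first, mirroring the paper's emphasis on the topicality feature.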