Improving Term Extraction with Terminological Resources

Author: C.G. Chute
T.G.O. Consortium
Y. Tsuruoka
Publication date: 01/01/2006

Studies of several term extractors on a biomedical corpus revealed that their performance decreases when they are applied to highly technical texts. The difficulty, or even impossibility, of customising them to new domains is an additional limitation. In this paper, we propose to use external terminologies to influence generic linguistic data in order to improve the quality of the extraction. The tool we implemented exploits attested terms (terms already recorded in a terminology) at different steps of the process: chunking, parsing and extraction of term candidates. The experiments reported here show that, with this method, more term candidates can be acquired with a higher level of reliability. We also describe the extraction process, which involves endogenous disambiguation, as implemented in the term extractor YaTeA.

arXiv.org e-Print Archive

Crossref

HAL-Paris 13
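The abstract above describes using an external terminology to make term extraction more reliable. The Python sketch below illustrates only one simplified aspect of that idea and is not YaTeA's algorithm. In the sketch, a term candidate gets a higher reliability score when the whole candidate, or a contiguous sub-phrase of it, appears in the terminology. The function names, the three-level score and the sample terms are hypothetical and were chosen only for illustration.

```python
# Hypothetical sketch, NOT YaTeA's actual method: it only shows how an external
# terminology could be used to rate term candidates as more or less reliable.

def normalize(phrase: str) -> str:
    """Lower-case and collapse whitespace so that lookups ignore case and spacing."""
    return " ".join(phrase.lower().split())


def score_candidates(candidates, terminology):
    """Return (candidate, reliability) pairs (hypothetical scoring rule).

    reliability = 2 if the whole candidate is an attested term,
                  1 if some contiguous sub-phrase of it is attested,
                  0 otherwise.
    """
    known = {normalize(t) for t in terminology}
    scored = []
    for cand in candidates:
        words = normalize(cand).split()
        if " ".join(words) in known:
            reliability = 2
        elif any(
            " ".join(words[i:j]) in known
            for i in range(len(words))
            for j in range(i + 1, len(words) + 1)
        ):
            reliability = 1
        else:
            reliability = 0
        scored.append((cand, reliability))
    return scored


if __name__ == "__main__":
    terminology = ["gene expression", "transcription factor"]  # sample terms (assumed)
    candidates = ["Gene Expression", "gene expression profile", "blue sky"]
    for cand, rel in score_candidates(candidates, terminology):
        print(f"{cand!r}: {rel}")
    # 'Gene Expression': 2
    # 'gene expression profile': 1
    # 'blue sky': 0
```

A real extractor would apply this kind of evidence while chunking and parsing, not only as a final filter. The sketch shows just the lookup step.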

Using COTS Search Engines and Custom Query Strategies at CLEF

Author: Barrière Caroline
Foster George
Jarmasz Mario
Nadeau David
St-Jacques Claude
Publication date: 01/01/2004

This paper presents a system for bilingual information retrieval using commercial off-the-shelf (COTS) search engines. Several custom strategies for query construction, expansion and translation are compared. We present the experiments and the corresponding results for the CLEF 2004 event.

Extraction of Keyphrases from Text: Evaluation of Four Algorithms

Author: Turney Peter
Publication date: 01/01/1997

This report presents an empirical evaluation of four algorithms for automatically extracting keywords and keyphrases from documents. The four algorithms are compared using five different collections of documents. For each document, we have a target set of keyphrases, which were generated by hand. The target keyphrases were generated for human readers; they were not tailored for any of the four keyphrase extraction algorithms. Each algorithm was evaluated by the degree to which its keyphrases matched the manually generated keyphrases. The four algorithms were (1) the AutoSummarize feature in Microsoft's Word 97, (2) an algorithm based on Eric Brill's part-of-speech tagger, (3) the Summarize feature in Verity's Search 97, and (4) NRC's Extractor algorithm. For all five document collections, NRC's Extractor yields the best match with the manually generated keyphrases.

CiteSeerX

NRC Publications Archive

CogPrints Cognitive Sciences Eprint Archive
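The evaluation described in the report above compares each algorithm's keyphrases with the hand-generated target keyphrases. The Python sketch below shows one plausible way to count such matches. It makes two assumptions: phrases match only if they are identical after lower-casing and whitespace normalization, and the results are summarized as precision and recall. The report's own matching criterion may differ; for example, it may apply stemming. The function name `keyphrase_match` is hypothetical.

```python
# Hypothetical sketch of matching extracted keyphrases against hand-made targets.
# Assumption: exact match after lower-casing and whitespace normalization.

def keyphrase_match(predicted, target):
    """Return (matches, precision, recall) for predicted vs. target keyphrases."""
    def norm(p):
        return " ".join(p.lower().split())

    pred = {norm(p) for p in predicted}   # duplicates are counted once
    gold = {norm(t) for t in target}
    matches = len(pred & gold)
    precision = matches / len(pred) if pred else 0.0
    recall = matches / len(gold) if gold else 0.0
    return matches, precision, recall


if __name__ == "__main__":
    predicted = ["Keyphrase extraction", "decision tree", "Word 97"]
    target = ["keyphrase extraction", "machine learning", "decision tree", "summarization"]
    print(keyphrase_match(predicted, target))
```

In this example, two of the three predicted phrases match a target phrase, so precision is 2/3 and recall is 2/4 = 0.5.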

Learning to Extract Keyphrases from Text

Author: Turney Peter
Publication date: 01/01/1999

Many academic journals ask their authors to provide a list of about five to fifteen key words, to appear on the first page of each article. Since these key words are often phrases of two or more words, we prefer to call them keyphrases. There is a surprisingly wide variety of tasks for which keyphrases are useful, as we discuss in this paper. Recent commercial software, such as Microsoft's Word 97 and Verity's Search 97, includes algorithms that automatically extract keyphrases from documents. In this paper, we approach the problem of automatically extracting keyphrases from text as a supervised learning task. We treat a document as a set of phrases, which the learning algorithm must learn to classify as positive or negative examples of keyphrases. Our first set of experiments applies the C4.5 decision tree induction algorithm to this learning task. The second set of experiments applies the GenEx algorithm to the task. We developed the GenEx algorithm specifically for this task. The third set of experiments examines the performance of GenEx on the task of metadata generation, relative to the performance of Microsoft's Word 97. The fourth and final set of experiments investigates the performance of GenEx on the task of highlighting, relative to Verity's Search 97. The experimental results support the claim that a specialized learning algorithm (GenEx) can generate better keyphrases than a general-purpose learning algorithm (C4.5) and the non-learning algorithms that are used in commercial software (Word 97 and Search 97).

CiteSeerX

NRC Publications Archive

CogPrints Cognitive Sciences Eprint Archive
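The abstract above frames keyphrase extraction as classifying the phrases of a document as positive or negative examples. The Python sketch below builds such labelled examples under simplifying assumptions. Candidates are word n-grams of up to three words that neither begin nor end with a stop word, and a candidate is labelled positive if it equals one of the author's keyphrases. The stop-word list, the n-gram limit and the function names are hypothetical; this is not the paper's GenEx procedure. Before a learner such as C4.5 could use these pairs, features (for example phrase frequency or first position) would still have to be computed for each one, and that step is not shown.

```python
import re

# Tiny illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "are", "we", "this"}


def candidate_phrases(text, max_len=3):
    """Yield distinct n-grams of 1..max_len words that do not start or end with a stop word."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    seen = set()
    for i in range(len(words)):
        for n in range(1, max_len + 1):
            gram = words[i:i + n]
            if len(gram) < n:          # reached the end of the text
                break
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            phrase = " ".join(gram)
            if phrase not in seen:
                seen.add(phrase)
                yield phrase


def label_examples(text, author_keyphrases, max_len=3):
    """Pair each candidate phrase with True (positive) if it is an author keyphrase, else False."""
    gold = {" ".join(k.lower().split()) for k in author_keyphrases}
    return [(p, p in gold) for p in candidate_phrases(text, max_len)]


if __name__ == "__main__":
    text = "We apply decision tree induction to keyphrase extraction."
    for phrase, positive in label_examples(text, ["Decision tree", "keyphrase extraction"]):
        print(positive, phrase)
```

With the sample sentence, only "decision tree" and "keyphrase extraction" are labelled positive. Every other candidate, such as "apply" or "tree induction", becomes a negative example.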

Generating indicative-informative summaries with SumUM

Author: Benbrahim Mohamed
Guy Lapalme
Horacio Saggion
Jing Hongyan
Johnson Frances C
Jordan Michael P
Radev Dragomir R
Teufel S.
Tombros Anastasios
Publication venue: MIT Press - Journals
Publication date: 01/01/2002

We present and evaluate SumUM, a text summarization system that takes a raw technical text as input and produces an indicative-informative summary. The indicative part of the summary identifies the topics of the document, and the informative part elaborates on some of these topics according to the reader's interest. SumUM motivates the topics, describes entities, and defines concepts. It is a first step toward exploring the issue of dynamic summarization. This is accomplished through a process of shallow syntactic and semantic analysis, concept identification, and text regeneration. Our method was developed through the study of a corpus of abstracts written by professional abstractors. Relying on human judgment, we have evaluated the indicativeness, informativeness, and text acceptability of the automatic summaries. The results so far indicate good performance compared with other summarization technologies.

CiteSeerX

Crossref

White Rose Research Online