Abstracts and Abstracting in Knowledge Discovery
Published or submitted for publication.
Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems (text summarization, information extraction, information retrieval, etc.), including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.
Indexing with WordNet synsets can improve Text Retrieval
The classical, vector space model for text retrieval is shown to give better
results (up to 29% better in our experiments) if WordNet synsets are chosen as
the indexing space, instead of word forms. This result is obtained for a
manually disambiguated test collection (of queries and documents) derived from
the Semcor semantic concordance. The sensitivity of retrieval performance to
(automatic) disambiguation errors when indexing documents is also measured.
Finally, it is observed that if queries are not disambiguated, indexing by
synsets performs (at best) only as well as standard word indexing.
Comment: 7 pages, LaTeX2e, 3 eps figures, uses epsfig, colacl.st
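The contrast between indexing by word forms and indexing by synsets can be illustrated with a minimal sketch. It assumes a tiny, hand-made sense inventory: the synset identifiers, the sense tags, and the toy query and documents below are invented for illustration and are not taken from WordNet, Semcor, or the paper's experiments. Term weights are raw counts and similarity is plain cosine, which is only one of many possible vector space configurations.

```python
import math
from collections import Counter

# Hypothetical sense inventory: (word form, sense tag) -> synset id.
# These ids are invented for this sketch and are not real WordNet synsets.
SYNSETS = {
    ("car", 1): "syn:automobile",
    ("automobile", 1): "syn:automobile",
    ("bank", 1): "syn:financial_institution",
    ("bank", 2): "syn:river_side",
    ("river", 1): "syn:river",
    ("loan", 1): "syn:loan",
}

def index_terms(tagged_tokens, use_synsets):
    """Turn manually sense-tagged tokens into a bag of index terms."""
    if use_synsets:
        return Counter(SYNSETS[(word, sense)] for word, sense in tagged_tokens)
    return Counter(word for word, _ in tagged_tokens)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Toy, manually disambiguated query and documents (assumed data).
query = [("car", 1), ("bank", 1)]
doc_a = [("automobile", 1), ("loan", 1), ("bank", 1)]  # about car loans
doc_b = [("river", 1), ("bank", 2)]                    # about a river bank

def score(doc, use_synsets):
    """Similarity of a document to the query in the chosen indexing space."""
    return cosine(index_terms(query, use_synsets), index_terms(doc, use_synsets))
```

In this toy setting, word-form indexing ranks `doc_b` above `doc_a` because both contain the string "bank", while synset indexing matches "car" with "automobile" and separates the two senses of "bank", so `doc_a` ranks first. The sketch relies on correct sense tags for both the query and the documents, which is the condition the abstract singles out.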
Template Mining for Information Extraction from Digital Documents
Published or submitted for publication.
Automatic summarising: factors and directions
This position paper suggests that progress with automatic summarising demands
a better research methodology and a carefully focussed research strategy. In
order to develop effective procedures it is necessary to identify and respond
to the context factors, i.e. input, purpose, and output factors, that bear on
summarising and its evaluation. The paper analyses and illustrates these
factors and their implications for evaluation. It then argues that this
analysis, together with the state of the art and the intrinsic difficulty of
summarising, implies a nearer-term strategy concentrating on shallow, but not
surface, text analysis and on indicative summarising. This is illustrated with
current work, from which a potentially productive research programme can be
developed.
Machine Learning of Generic and User-Focused Summarization
A key problem in text summarization is finding a salience function which
determines what information in the source should be included in the summary.
This paper describes the use of machine learning on a training corpus of
documents and their abstracts to discover salience functions which describe
what combination of features is optimal for a given summarization task. The
method addresses both "generic" and user-focused summaries.
Comment: In Proceedings of the Fifteenth National Conference on AI (AAAI-98),
p. 821-82
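The idea of learning a salience function from documents paired with their abstracts can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of the paper: the two features (sentence position and overlap with a set of focus terms), the perceptron learner, the rule that a sentence is labelled salient when it appears verbatim in the abstract, and the toy document are all hypothetical choices made for this sketch. The focus terms stand in for a user's topic in the user-focused case; for a generic summary one might substitute title terms.

```python
def features(sentence, index, total, focus_terms):
    """Feature vector for one sentence: bias, position, focus-term overlap."""
    words = set(sentence.lower().split())
    return [
        1.0,                                            # bias term
        1.0 - index / total,                            # earlier sentences score higher
        len(words & focus_terms) / max(len(words), 1),  # share of focus terms
    ]

def salience(weights, x):
    """Linear salience score of a feature vector."""
    return sum(w * xi for w, xi in zip(weights, x))

def train(examples, epochs=20, lr=0.1):
    """Perceptron training on (feature_vector, label) pairs, label in {0, 1}."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for x, y in examples:
            pred = 1 if salience(weights, x) > 0 else 0
            for i, xi in enumerate(x):
                weights[i] += lr * (y - pred) * xi
    return weights

# Toy training pair (assumed data): a document and its abstract.
document = [
    "synsets improve retrieval",
    "retrieval uses the vector space model",
    "we thank our colleagues",
    "funding came from a grant",
]
abstract = {"synsets improve retrieval", "retrieval uses the vector space model"}
focus_terms = {"retrieval"}  # hypothetical user topic

# Label a sentence 1 if it also appears in the abstract, else 0.
examples = [
    (features(s, i, len(document), focus_terms), 1 if s in abstract else 0)
    for i, s in enumerate(document)
]
weights = train(examples)

# Rank sentences by learned salience, most salient first.
ranked = sorted(document,
                key=lambda s: salience(weights, features(s, document.index(s),
                                                         len(document), focus_terms)),
                reverse=True)
```

The learned weights play the role of the salience function: they encode which combination of features separates sentences that made it into the abstract from those that did not. A realistic setup would need many document and abstract pairs, a softer alignment between abstract and source sentences than exact matching, and a richer feature set.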