Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic Text Summarization
In Automatic Text Summarization, preprocessing is an important phase to
reduce the space of textual representation. Classically, stemming and
lemmatization have been widely used for normalizing words. However, even using
normalization on large texts, the curse of dimensionality can disturb the
performance of summarizers. This paper describes a new method for normalization
of words to further reduce the space of representation. We propose to reduce
each word to its initial letters, as a form of Ultra-stemming. The results show
that Ultra-stemming not only preserves the content of summaries produced with this
representation, but can often dramatically improve the performance of the
systems. Summaries on trilingual corpora were evaluated automatically with
Fresa. Results confirm an increase in performance, regardless of the summarizer
system used.
Comment: 22 pages, 12 figures, 9 tables
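The normalization described in this abstract, reducing each word to its initial letters, can be illustrated with a minimal sketch. The function name `ultra_stem`, the parameter `n` (number of leading letters kept), and the regex-based tokenization are assumptions made for illustration; the abstract does not specify the authors' implementation or how many letters they keep.

```python
import re

def ultra_stem(text, n=1):
    """Lowercase the text and truncate every word to its first n letters.

    Hypothetical helper: the tokenizer and the default n are assumptions,
    not details taken from the paper.
    """
    # [^\W\d_]+ matches runs of Unicode letters (no digits, no underscore).
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w[:n] for w in words]

sentence = "Summaries on trilingual corpora were evaluated automatically"
print(ultra_stem(sentence, n=1))  # ['s', 'o', 't', 'c', 'w', 'e', 'a']
print(ultra_stem(sentence, n=3))  # ['sum', 'on', 'tri', 'cor', 'wer', 'eva', 'aut']
```

With `n=1` the vocabulary can never exceed the number of distinct initial letters in the language, which is the kind of reduction of the representation space the abstract refers to.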
Accurate user directed summarization from existing tools
This paper describes a set of experimental
results produced from the TIPSTER
SUMMAC initiative on user directed
summaries: document summaries generated in
the context of an information need expressed
as a query. The summarizer that was
evaluated was based on a set of existing
statistical techniques that had been applied
successfully to the INQUERY retrieval system.
The techniques proved to have a wider utility,
however, as the summarizer was one of the
better performing systems in the SUMMAC
evaluation. The design of this summarizer is
presented with a range of evaluations: both
those provided by SUMMAC and a set of
preliminary, more informal, evaluations that
examined additional aspects of the summaries.
Amongst other conclusions, the results reveal
that users can judge the relevance of
documents from their summary almost as
accurately as if they had had access to the
document’s full text.
Text Summarization Techniques: A Brief Survey
In recent years, there has been an explosion in the amount of text data from a
variety of sources. This volume of text is an invaluable source of information
and knowledge which needs to be effectively summarized to be useful. In this
review, the main approaches to automatic text summarization are described. We
review the different processes for summarization and describe the effectiveness
and shortcomings of the different methods.
Comment: Some of the reference formats have been updated
Politeness and bias in dialogue summarization: two exploratory studies
In this chapter, two empirical pilot studies on the role of politeness in dialogue summarization are described. In these studies, a collection of four dialogues was used. Each dialogue was automatically generated by the NECA system and the politeness of the dialogue participants was systematically manipulated. Subjects were divided into groups who had to summarize the dialogues from a particular dialogue participant’s point of view or the point of view of an impartial observer. In the first study, there were no other constraints. In the second study, the summarizers were restricted to summaries whose length did not exceed 10% of the number of words in the dialogue that was being summarized. Amongst other things, it was found that the politeness of the interaction is included more often in summaries of dialogues that deviate from what would be considered normal or unmarked. A comparison of the results of the two studies suggests that the extent to which politeness is reported is not affected by how long a summary is allowed to be. It was also found that the point of view of the summarizer influences which information is included in the summary and how it is presented. This finding did not seem to be affected by the constraint in our second study on the summary length.