ABSTRACT

By Reinaldo Viana Alvares and Rubem Mondaini

In this work we stress the importance of a qualitative approach to the evaluation of stemming algorithms. Stemming errors are usually evaluated through quantitative analysis [2]. Applying a stemming algorithm to a given word isolates its stem. The stem is considered a concise representation of a word and should be seen as its smallest unambiguous root [1]. It should also be sufficiently broad to capture the word's meaning as well as its multiple variations [3]. A basic example is given by the words biology, biologist and biological, all of which can be represented by the corresponding stem biolog. The errors resulting from the application of an algorithm may be classified as understemming and overstemming. In the first case the predicted stem is longer than the correct one; in the second it is shorter. Consequently, synonymous words receive different stems in the first case and equal stems in the second [4]. Our aim is to propose a qualitative method for the assessment of stemming algorithms. The figure below is intended to convey an example of the proposed method.
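
The biology/biologist/biological example and the two error classes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method proposed by the authors: the functions toy_stem and classify_stem, the suffix list, and the hand-assigned reference stem are all hypothetical, and errors are classified purely by comparing stem lengths as described in the abstract.

```python
def toy_stem(word, suffixes=("ical", "ist", "y")):
    """Strip the longest matching suffix (hypothetical toy rule set)."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def classify_stem(predicted, reference):
    """Compare a predicted stem with a hand-assigned reference stem."""
    if predicted == reference:
        return "correct"
    if len(predicted) > len(reference):
        return "understemming"  # predicted stem longer than the correct one
    if len(predicted) < len(reference):
        return "overstemming"   # predicted stem shorter than the correct one
    return "other"              # same length, different characters


if __name__ == "__main__":
    reference = "biolog"  # assumed reference stem, taken from the example
    for word in ("biology", "biologist", "biological"):
        stem = toy_stem(word)
        print(word, stem, classify_stem(stem, reference))
    # Hand-made erroneous stems to show the two error classes.
    print(classify_stem("biologi", reference))
    print(classify_stem("bio", reference))
```

A length comparison of this kind only labels the type of error. It says nothing about how severe the error is or how it affects meaning, which is the sort of question a qualitative evaluation is meant to address.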

Year: 2015
OAI identifier: oai:CiteSeerX.psu:10.1.1.547.5515
Provided by: CiteSeerX