Institute of Electrical and Electronics Engineers (IEEE) Inc.
Abstract
There is a growing interest in the use of context-awareness as a technique for developing pervasive computing applications that are
flexible and adaptable for users. In this context, however, information retrieval (IR) is often defined in terms of location and delivery
of documents to a user to satisfy their information need. In most cases, morphological variants of a word have similar semantic
interpretations and can be considered equivalent for the purposes of IR applications. Consequently, document indexing will also be
more meaningful if semantically related root words are used instead of stems. We studied the popular Porter’s stemmer with the aim
of producing intelligible stems. In this paper, we propose the Context-Aware Stemming (CAS) algorithm, a modified version of
the widely used Porter’s stemmer. When only meaningful words are accepted as valid stemmer output, the results
show that the modified algorithm significantly reduces the error rate of Porter’s algorithm from 76.7% to 6.7% without compromising
the efficacy of Porter’s algorithm.
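
The error-rate criterion described above (a stem counts as an error when it is not a meaningful word) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `strip_suffix` is a toy suffix stripper, not Porter’s algorithm and not CAS, and `LEXICON` and `WORDS` are hypothetical sample data, not the evaluation set behind the reported figures.

```python
def strip_suffix(word, suffixes=("ational", "ies", "ing", "ed", "s")):
    """Toy stemmer (hypothetical, NOT Porter's algorithm or CAS):
    remove the first matching suffix if a stem of 3+ letters remains."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def error_rate(words, stemmer, lexicon):
    """Fraction of stems that are not meaningful words, i.e. not in the lexicon."""
    stems = [stemmer(w) for w in words]
    errors = sum(1 for s in stems if s not in lexicon)
    return errors / len(stems)


# Hypothetical lexicon of meaningful root words and a small word sample.
LEXICON = {"connect", "pony", "relate", "cat"}
WORDS = ["connected", "connecting", "ponies", "relational", "cats"]

if __name__ == "__main__":
    for w in WORDS:
        print(w, "->", strip_suffix(w))
    # "pon" and "rel" are not in the lexicon, so 2 of 5 stems are errors.
    print("error rate:", error_rate(WORDS, strip_suffix, LEXICON))  # 0.4
```

In this sketch, "ponies" and "relational" are reduced to the unintelligible stems "pon" and "rel", which is the kind of output the meaningful-word criterion penalizes.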