Two-Stage Document Length Normalization for Information Retrieval
The standard approach for term frequency normalization is based only on the
document length. However, it does not distinguish the verbosity from the scope,
these being the two main factors determining the document length. Because the
verbosity and scope have largely different effects on the increase in term
frequency, the standard approach can easily suffer from insufficient or
excessive penalization depending on the specific type of long document. To
overcome these problems, this paper proposes two-stage normalization by
performing verbosity and scope normalization separately, and by employing
different penalization functions. In verbosity normalization, each document is
pre-normalized by dividing the term frequency by the verbosity of the document.
In scope normalization, an existing retrieval model is applied in a
straightforward manner to the pre-normalized document, finally leading us to
formulate our proposed verbosity normalized (VN) retrieval model. Experiments
carried out on standard TREC collections demonstrate that the VN model
leads to marginal but statistically significant improvements over standard
retrieval models.
Comment: 40 pages (to appear in ACM TOIS)
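
The two stages described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how verbosity is measured, so the sketch assumes it is the average number of occurrences per distinct term (document length divided by the number of unique terms). The function names and the BM25-style saturation used for the scope stage are illustrative choices as well, not the paper's exact formulation.

```python
from collections import Counter


def verbosity(tokens):
    """Assumed measure: average occurrences per distinct term."""
    if not tokens:
        raise ValueError("document must contain at least one token")
    return len(tokens) / len(set(tokens))


def pre_normalize(tokens):
    """Stage 1 (verbosity normalization): divide each raw term
    frequency by the document's verbosity."""
    v = verbosity(tokens)
    return {term: count / v for term, count in Counter(tokens).items()}


def bm25_tf(tf, doc_len, avg_len, k1=1.2, b=0.75):
    """Stage 2 (scope normalization): a standard BM25-style term
    frequency component, applied here to pre-normalized values."""
    return tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))


def vn_tf(tokens, term, avg_len):
    """Combine both stages for one term (hypothetical helper).
    avg_len is the average pre-normalized document length."""
    normalized = pre_normalize(tokens)
    doc_len = sum(normalized.values())  # length after stage 1
    return bm25_tf(normalized.get(term, 0.0), doc_len, avg_len)
```

Under this assumed definition, a document that merely repeats its content (higher verbosity, same scope) gets the same pre-normalized frequencies as the shorter original, so only a broader scope is penalized in the second stage.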
INTRODUCTION
This is the introduction to the content of the journal's special issue (vol. 4 no. 1 / January 2013) celebrating the tenth anniversary of the International Society for Comparative Studies of Chinese and Western Philosophy (ISCWP), which includes five peer-reviewed articles by ISCWP members.
- …
