1 research outputs found

    Weighting tags and paths in XML documents according to their topic generalization

    No full text
    Text-centric (or document-centric) XML document retrieval aims to rank search results according to their relevance to a given query. To do this, most existing methods mainly rely on content terms and often ignore an important factor - the XML tags and paths, which are useful in determining the important contents of a document. In some previous studies, each unique tag/path is assigned a weight based on domain (expert) knowledge. However, such a manual assignment is both inefficient and subjective. In this paper, we propose an automatic method to infer the weights of tags/paths according to the topical relationship between the corresponding elements and the whole documents. The more the corresponding element can generalize the document's topic, the more the tag/path is considered to be important. We define a model based on Average Topic Generalization (ATG), which integrates several features used in previous studies. We evaluate the performance of the ATG-based model on two real data sets, the IEEECS collection and the Wikipedia collection, from two different perspectives: the correlation between the weights generated by ATG and those set by experts, and the performance of XML retrieval based on ATG. Experimental results show that the tag/path weights generated by ATG are highly correlated with the manually assigned weights, and the ATG model significantly improves XML retrieval effectiveness. (C) 2013 Elsevier Inc. All rights reserved
    corecore