Research Track Paper

Web Object Indexing Using Domain Knowledge

Abstract

A web object is any meaningful object embedded in a web page (e.g. images, music) or pointed to by a hyperlink (e.g. downloadable files). In many cases, users want to search for information about a particular ‘object’ rather than for web pages containing the query terms. To facilitate searching and organizing web objects, this paper proposes a novel approach to web object indexing that discovers an object's inherent structure information using existing domain knowledge. First, Layered LSI spaces are built to better represent the hierarchically structured domain knowledge, emphasizing the specific semantics and term space of each layer. Meanwhile, the web object representation is constructed by hyperlink analysis and then pruned to remove noise. Next, an optimal matching between the web object and the domain knowledge is performed to pick out the structure attributes of the web object from the knowledge. Finally, the obtained structure attributes are used to re-organize and index the web objects. Our approach also indicates a promising new way to use trustworthy Deep Web knowledge to help organize dispersive information of Surfac
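The idea of building one LSI space per layer of the domain knowledge and matching a web object against each layer can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names (`build_lsi_space`, `fold_in`, `match_layers`), the toy music-domain entries, and the token list standing in for a hyperlink-derived, already pruned object representation are all hypothetical. The "optimal matching" step is replaced here by a plain per-layer argmax over cosine similarity.

```python
import numpy as np

def build_lsi_space(docs, k):
    """Build an LSI space for one layer from tokenized entries (hypothetical helper)."""
    vocab = sorted({t for doc in docs for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    # Term-by-entry count matrix for this layer only.
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for t in doc:
            A[index[t], j] += 1.0
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(S))
    U_k = U[:, :k]
    # Entry coordinates in the latent space: U_k^T A = S_k V_k^T.
    doc_vecs = Vt[:k].T * S[:k]
    return index, U_k, doc_vecs

def fold_in(tokens, index, U_k):
    """Project an object's tokens into the layer's latent space (U_k^T q)."""
    q = np.zeros(U_k.shape[0])
    for t in tokens:
        if t in index:  # terms outside this layer's term space are ignored
            q[index[t]] += 1.0
    return q @ U_k

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def match_layers(layers, object_tokens, k=2):
    """Pick the best-scoring entry in each layer (simple argmax, an assumption)."""
    result = {}
    for layer_name, entries in layers.items():
        names = list(entries)
        index, U_k, doc_vecs = build_lsi_space([entries[n] for n in names], k)
        q = fold_in(object_tokens, index, U_k)
        scores = [cosine(q, doc_vecs[i]) for i in range(len(names))]
        result[layer_name] = names[int(np.argmax(scores))]
    return result

# Hypothetical two-layer domain knowledge and a toy object representation.
layers = {
    "artist": {
        "beatles": ["beatles", "rock", "band", "liverpool"],
        "mozart": ["mozart", "classical", "composer", "vienna"],
    },
    "album": {
        "abbey_road": ["abbey", "road", "beatles", "album"],
        "requiem": ["requiem", "mozart", "mass", "album"],
    },
}
object_tokens = ["download", "abbey", "road", "mp3", "beatles"]
attributes = match_layers(layers, object_tokens, k=2)
print(attributes)
```

With only two entries per layer, `k` equals the rank of each matrix, so no dimensionality reduction actually takes place in this toy run; a realistic layer would use a `k` much smaller than its number of entries. The per-layer spaces keep each layer's vocabulary separate, which is the point the abstract makes about emphasizing the term space of each layer.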
