Query-related data extraction of hidden web documents

By Y. Hedley, M. Younas, A. James and M. Sanderson

Abstract

The greater part of the information on the Web is stored in document databases and is not indexed by general-purpose search engines (e.g., Google and Yahoo). Such information is dynamically generated by querying these databases, which are referred to as Hidden Web databases. Documents returned in response to a user query are typically presented using template-generated Web pages. This paper proposes a novel approach that identifies Web page templates by analysing the textual contents and the adjacent tag structures of a document in order to extract query-related data. Preliminary results demonstrate that our approach effectively detects templates and retrieves data with high recall and precision.
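The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a text segment belongs to the template when the same text, with the same adjacent tags, appears in every result page sampled from one database; whatever remains is treated as query-related data. The names `SegmentParser`, `segments_of` and `extract_query_data` are hypothetical and introduced here for illustration.

```python
from html.parser import HTMLParser


class SegmentParser(HTMLParser):
    """Collect (preceding_tag, text, following_tag) triples from a page."""

    def __init__(self):
        super().__init__()
        self.segments = []     # completed triples
        self._last_tag = None  # most recent tag seen before the current text
        self._pending = None   # (preceding_tag, text) waiting for its next tag

    def _see_tag(self, name):
        # A tag closes any pending text segment and becomes the new left context.
        if self._pending is not None:
            prev, text = self._pending
            self.segments.append((prev, text, name))
            self._pending = None
        self._last_tag = name

    def handle_starttag(self, tag, attrs):
        self._see_tag("<" + tag + ">")

    def handle_endtag(self, tag):
        self._see_tag("</" + tag + ">")

    def handle_data(self, data):
        text = " ".join(data.split())  # normalise whitespace
        if not text:
            return
        if self._pending is not None:
            # Consecutive text chunks with no tag in between are merged.
            prev, earlier = self._pending
            self._pending = (prev, earlier + " " + text)
        else:
            self._pending = (self._last_tag, text)

    def flush(self):
        # Text at the very end of the page has no following tag.
        if self._pending is not None:
            prev, text = self._pending
            self.segments.append((prev, text, None))
            self._pending = None


def segments_of(html):
    """Return the list of text segments, each with its adjacent tags."""
    parser = SegmentParser()
    parser.feed(html)
    parser.close()
    parser.flush()
    return parser.segments


def extract_query_data(pages):
    """Assumption: segments shared by all sampled pages form the template."""
    per_page = [segments_of(page) for page in pages]
    template = set(per_page[0]).intersection(*map(set, per_page[1:]))
    return [[text for (prev, text, nxt) in segs
             if (prev, text, nxt) not in template]
            for segs in per_page]
```

Given at least two result pages that share a heading and a footer but differ in their list items, `extract_query_data` would return only the list items for each page. A single pair of pages is a weak sample: any text that happens to repeat across both is classified as template, so in practice more pages from varied queries would be needed.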

Publisher: ACM
Year: 2004
OAI identifier: oai:eprints.whiterose.ac.uk:4545
