The larger part of the information on the Web is stored in document databases and is not indexed by general-purpose search engines (e.g., Google and Yahoo). Such information is dynamically generated by querying these databases, which are referred to as Hidden Web databases. Documents returned in response to a user query are typically presented using template-generated Web pages. This paper proposes a novel approach that identifies Web page templates by analysing the textual contents and the adjacent tag structures of a document in order to extract query-related data. Preliminary results demonstrate that our approach effectively detects templates and retrieves data with high recall and precision.
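The general idea of separating template text from query-related data can be illustrated with a minimal sketch. The abstract does not specify the algorithm, so the code below is an assumption-laden simplification and not the method proposed in the paper: it treats a text segment as template content when the same text, with the same adjacent tags, appears on every result page. The names `SegmentParser`, `segments_of` and `split_template` are hypothetical.

```python
from html.parser import HTMLParser


class SegmentParser(HTMLParser):
    """Collects (previous tag, text, next tag) triples from one page."""

    def __init__(self):
        super().__init__()
        self.last_tag = None   # most recent tag seen before the current text
        self.pending = None    # (previous tag, text) still waiting for its next tag
        self.segments = []

    def _close_pending(self, tag):
        # A tag ends the current text segment, if there is one.
        if self.pending is not None:
            prev_tag, text = self.pending
            self.segments.append((prev_tag, text, tag))
            self.pending = None
        self.last_tag = tag

    def handle_starttag(self, tag, attrs):
        self._close_pending("<" + tag + ">")

    def handle_endtag(self, tag):
        self._close_pending("</" + tag + ">")

    def handle_data(self, data):
        text = " ".join(data.split())  # normalise whitespace
        if not text:
            return
        if self.pending is not None:
            # Text split across several callbacks belongs to one segment.
            self.pending = (self.pending[0], self.pending[1] + " " + text)
        else:
            self.pending = (self.last_tag, text)


def segments_of(page):
    """Return the text segments of one HTML page with their adjacent tags."""
    parser = SegmentParser()
    parser.feed(page)
    parser.close()
    parser._close_pending(None)  # flush trailing text that has no following tag
    return parser.segments


def split_template(pages):
    """Assumed heuristic: a segment shared by all pages is template content."""
    per_page = [segments_of(page) for page in pages]
    template = set(per_page[0]).intersection(*(set(s) for s in per_page[1:]))
    data = [[text for (prev, text, nxt) in segs if (prev, text, nxt) not in template]
            for segs in per_page]
    return template, data


# Two made-up result pages that share a template but differ in their data.
page_a = "<html><body><h1>Search results</h1><ul><li>Hidden Web crawling</li></ul><p>Contact us</p></body></html>"
page_b = "<html><body><h1>Search results</h1><ul><li>Template detection</li></ul><p>Contact us</p></body></html>"

template, data = split_template([page_a, page_b])
print(sorted(text for (_, text, _) in template))
print(data)
```

In this toy setting the heading and footer text are classified as template content, and the differing list items are kept as data. A realistic detector would need to tolerate near-duplicate text and structural variation between pages, which this sketch does not attempt.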