Query-related data extraction of hidden web documents

Authors: Hedley Y., James A., Sanderson M., Younas M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2004

Most of the information on the Web is stored in document databases and is not indexed by general-purpose search engines (e.g., Google and Yahoo). Such information is generated dynamically by querying these databases, which are referred to as Hidden Web databases. Documents returned in response to a user query are typically presented using template-generated Web pages. This paper proposes a novel approach that identifies Web page templates by analysing the textual contents and the adjacent tag structures of a document in order to extract query-related data. Preliminary results demonstrate that our approach effectively detects templates and retrieves data with high recall and precision.
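The general idea of separating template text from query-related text can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify in detail. It assumes, as a simplification, that a text segment belongs to the template when the same text, preceded by the same tag, occurs on every result page returned by the same database. The names `SegmentParser`, `segments`, and `extract_query_data` are hypothetical and chosen only for this illustration.

```python
from html.parser import HTMLParser


class SegmentParser(HTMLParser):
    """Collects (preceding_tag, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.last_tag = None   # most recent tag seen before the current text
        self.segments = []     # list of (preceding_tag, normalised_text)

    def handle_starttag(self, tag, attrs):
        self.last_tag = "<" + tag + ">"

    def handle_endtag(self, tag):
        self.last_tag = "</" + tag + ">"

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text:
            self.segments.append((self.last_tag, text))


def segments(html):
    """Split one page into text segments paired with their preceding tag."""
    parser = SegmentParser()
    parser.feed(html)
    parser.close()
    return parser.segments


def extract_query_data(pages):
    """Return, for each page, the text segments that do not recur on every page.

    Assumption (not from the paper): a segment shared by all pages, with the
    same preceding tag, is treated as template text and is dropped.
    """
    per_page = [segments(page) for page in pages]
    template = set(per_page[0])
    for segs in per_page[1:]:
        template &= set(segs)
    return [
        [text for tag, text in segs if (tag, text) not in template]
        for segs in per_page
    ]
```

Given two result pages that share a heading and a footer but differ in one paragraph, `extract_query_data` would return only the differing paragraph for each page. A realistic system would need approximate matching and a richer notion of tag context than the single preceding tag used here.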

Crossref

White Rose Research Online

Query-Related Data Extraction of Hidden Web Documents

Authors: A. James, M. Younas, Y. L. Hedley
Publication date: 01/01/2004

CiteSeerX