Downloading Deep Web Data from Real Web Services

Fu, Chong

thesis

Authors: Chong Fu
Publication date: 1 January 2011
Publisher: University of Windsor Leddy Library

Abstract

Deep web data is generally stored in databases that are accessible only through web query forms or web service interfaces. One challenge of deep web crawling is selecting meaningful queries to acquire the data. There is substantial research on query selection, such as approaches based on the set covering problem that use a greedy algorithm or one of its variations. These methods have not been studied extensively in the context of real web services, which may pose new challenges for deep web crawling. This thesis studies several query selection methods on Microsoft’s Bing web service, focusing especially on the impact of result ranking in real data sources. Our results show that for unranked data sources, the weighted method performed slightly better than the unweighted set covering algorithm. For ranked data sources, document frequency estimation is necessary to harvest data more efficiently.
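As a rough illustration of the set covering approach mentioned above, the following sketch greedily selects queries over a document sample. Each candidate query is scored by the (optionally weighted) value of the not-yet-covered documents it returns, divided by the total number of documents it returns, so queries that mostly re-fetch known documents are penalized. The function name `greedy_query_selection`, the sample data, and the weighting scheme (each document weighted by the reciprocal of how many candidate queries match it) are hypothetical choices made for illustration; they are not necessarily the methods evaluated in the thesis.

```python
from typing import Dict, List, Set


def greedy_query_selection(
    queries: Dict[str, Set[int]], weighted: bool = False
) -> List[str]:
    """Greedy set-cover selection of queries over a document sample (sketch).

    `queries` maps each candidate query to the set of sample document IDs it
    matches. A query's cost is the number of documents it returns, including
    documents already harvested by earlier queries.
    """
    # Ignore queries that return nothing (they would have zero cost).
    remaining = {q: docs for q, docs in queries.items() if docs}
    universe: Set[int] = set().union(*remaining.values())

    # Hypothetical weighting: documents matched by few queries are harder to
    # reach, so each document is worth 1 / (number of queries matching it).
    match_count: Dict[int, int] = {d: 0 for d in universe}
    for docs in remaining.values():
        for d in docs:
            match_count[d] += 1
    weight = {d: (1.0 / match_count[d] if weighted else 1.0) for d in universe}

    def score(q: str, covered: Set[int]) -> float:
        # Value of newly covered documents per document returned.
        docs = remaining[q]
        return sum(weight[d] for d in docs - covered) / len(docs)

    covered: Set[int] = set()
    selected: List[str] = []
    while covered != universe:
        # max() keeps the first candidate on ties, so order is deterministic.
        best = max(remaining, key=lambda q: score(q, covered))
        if score(best, covered) == 0:
            break
        selected.append(best)
        covered |= remaining.pop(best)
    return selected


sample = {"a": {1, 2, 3, 4}, "b": {1, 2}, "c": {3, 4, 5}, "d": {5, 6}}
print(greedy_query_selection(sample))                 # ['a', 'd']
print(greedy_query_selection(sample, weighted=True))  # ['d', 'a']
```

In this toy sample both variants cover all six documents with two queries, but the weighted variant starts with `d` because it is the only query reaching document 6, which the hypothetical weighting values most highly.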

Available Versions

Scholarship at UWindsor

oai:scholar.uwindsor.ca:etd-13...

Last time updated on 01/12/2016