Hybrid focused crawling on the Surface and the Dark Web
Focused crawlers enable the automatic discovery of Web resources about a given topic by navigating
through the Web link structure and selecting which hyperlinks to follow based on their estimated relevance to the topic of
interest. This work proposes a generic focused crawling framework for discovering resources on any given topic
that reside on the Surface or the Dark Web. The proposed crawler is able to seamlessly navigate through the
Surface Web and several darknets present in the Dark Web (i.e., Tor, I2P, and Freenet) during a single crawl by
automatically adapting its crawling behavior and its classifier-guided hyperlink selection strategy based on the
destination network type and the strength of the local evidence present in the vicinity of a hyperlink. The work investigates
11 hyperlink selection methods, including a novel strategy based on the dynamic linear combination
of a link-based classifier and a parent Web page classifier. This hybrid focused crawler is demonstrated for the discovery of
Web resources containing recipes for producing homemade explosives. The evaluation experiments indicate the
effectiveness of the proposed focused crawler for both the Surface and the Dark Web.
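The abstract names the hybrid hyperlink selection strategy but does not give its exact weighting rule. Below is a minimal Python sketch of the idea, built on stated assumptions. Each hyperlink gets score = λ·s_link + (1 − λ)·s_parent. Here s_link is the output of a link-based classifier, s_parent is the relevance of the parent Web page, and λ in [0, 1] grows with the strength of the local evidence around the link. Several pieces are hypothetical illustrations and not the authors' implementation: the token-count rule for λ, the `saturation` parameter, the URL rules used to tell Tor, I2P, and Freenet destinations apart, and every name in the code (`network_of`, `Hyperlink`, `evidence_weight`, `hybrid_score`, `rank_frontier`).

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Freenet key types; Freenet content is usually fetched through a local FProxy node.
FREENET_KEY_PREFIXES = ("CHK@", "SSK@", "USK@", "KSK@")


def network_of(url: str) -> str:
    """Guess the destination network from a URL (hypothetical rule set)."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host.endswith(".onion"):
        return "tor"
    if host.endswith(".i2p"):
        return "i2p"
    if parsed.path.lstrip("/").startswith(FREENET_KEY_PREFIXES):
        return "freenet"
    return "surface"


@dataclass
class Hyperlink:
    url: str
    context: str         # anchor text plus nearby text: the assumed "local evidence"
    link_score: float    # link-based classifier relevance, assumed in [0, 1]
    parent_score: float  # parent-page classifier relevance, assumed in [0, 1]


def evidence_weight(context: str, saturation: int = 10) -> float:
    """Hypothetical lambda: fraction of `saturation` context tokens present, capped at 1."""
    return min(1.0, len(context.split()) / saturation)


def hybrid_score(link: Hyperlink, saturation: int = 10) -> float:
    """Dynamic linear combination: lambda * link score + (1 - lambda) * parent score."""
    lam = evidence_weight(link.context, saturation)
    return lam * link.link_score + (1.0 - lam) * link.parent_score


def rank_frontier(links, saturation: int = 10):
    """Return (score, url, network) tuples, most promising link first."""
    scored = [(hybrid_score(link, saturation), link.url, network_of(link.url)) for link in links]
    return sorted(scored, key=lambda item: item[0], reverse=True)


if __name__ == "__main__":
    frontier = [
        # No surrounding text: lambda = 0, so only the parent page score counts.
        Hyperlink("http://exampleabc.onion/page", "", 0.9, 0.3),
        # Ten context tokens: lambda = 1, so only the link-based score counts.
        Hyperlink("https://example.org/guide",
                  "step by step guide with a full list of materials", 0.7, 0.2),
    ]
    for score, url, net in rank_frontier(frontier):
        print(f"{score:.2f} {net} {url}")
    # prints:
    # 0.70 surface https://example.org/guide
    # 0.30 tor http://exampleabc.onion/page
```

The sketch shows one plausible design choice. When a hyperlink has little surrounding text, the link-based classifier has little to work with, so the score falls back toward the parent page's relevance. When the local context is rich, the link-based score dominates. The classifier scores are taken as given. Training those classifiers and fetching pages through Tor, I2P, or Freenet proxies are outside the scope of this sketch.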