
    An Architecture for Efficient Web Crawling

    Virtual Integration systems require a crawling tool able to navigate and reach relevant pages in the Deep Web in an efficient way. Existing proposals in the crawling area fulfill some of these requirements, but most of them need to download pages in order to classify them as relevant or not. We propose a crawler supported by a web page classifier that uses solely a page URL to determine page relevance. Such a crawler is able to choose in each step only the URLs that lead to relevant pages, and therefore reduces the number of unnecessary pages downloaded, minimising bandwidth and making it efficient and suitable for virtual integration systems.

    Funding: Ministerio de Educación y Ciencia TIN2007-64119; Junta de Andalucía P07-TIC-2602; Junta de Andalucía P08-TIC-4100; Ministerio de Ciencia e Innovación TIN2008-04718-E; Ministerio de Ciencia e Innovación TIN2010-21744; Ministerio de Economía, Industria y Competitividad TIN2010-09809-E; Ministerio de Ciencia e Innovación TIN2010-10811-E; Ministerio de Ciencia e Innovación TIN2010-09988-
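
The idea of filtering by URL before downloading can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe the classifier, so the keyword rule in `is_relevant`, the `RELEVANT_TOKENS` list, the `crawl` function and the toy `SITE` link graph are all hypothetical stand-ins chosen for illustration.

```python
from collections import deque
from typing import Callable, Iterable, List
from urllib.parse import urlparse

# Hypothetical keyword list; a real system would use a trained URL classifier.
RELEVANT_TOKENS = ("product", "item", "detail")


def is_relevant(url: str) -> bool:
    """Guess relevance from the URL alone (path and query), without downloading."""
    parts = urlparse(url)
    text = (parts.path + "?" + parts.query).lower()
    return any(token in text for token in RELEVANT_TOKENS)


def crawl(seed: str,
          fetch_links: Callable[[str], Iterable[str]],
          limit: int = 100) -> List[str]:
    """Breadth-first crawl: download the seed, then only URLs judged relevant."""
    downloaded: List[str] = []
    seen = {seed}
    frontier = deque([seed])
    while frontier and len(downloaded) < limit:
        url = frontier.popleft()
        downloaded.append(url)  # the only place a page is "downloaded"
        for link in fetch_links(url):
            # Classify before enqueueing, so irrelevant pages are never fetched.
            if link not in seen and is_relevant(link):
                seen.add(link)
                frontier.append(link)
    return downloaded


# Toy in-memory link graph standing in for real HTTP requests (assumption).
SITE = {
    "http://example.org/": [
        "http://example.org/product/1",
        "http://example.org/about",
        "http://example.org/product/2",
    ],
    "http://example.org/product/1": [
        "http://example.org/product/2",
        "http://example.org/login",
    ],
    "http://example.org/product/2": [],
}

pages = crawl("http://example.org/", lambda u: SITE.get(u, []))
print(pages)
```

In this toy run the `about` and `login` pages are never requested, which is the bandwidth saving the abstract refers to. Swapping `fetch_links` for a function that performs an HTTP request and extracts anchors would turn the sketch into a working crawler.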