    An Experiment to Test URL Features for Web Page Classification

    Web page classification has been extensively researched using different types of features, extracted from the page content, the page structure, or other pages that link to the page. Using features from the page itself means the page must be downloaded before it can be classified. We present an experiment to prove that URL tokens contain enough information to extract features for classifying web pages. A classifier based on these features can classify a web page without downloading it first, avoiding unnecessary downloads.

    Funding: Ministerio de Educación y Ciencia TIN2007-64119; Junta de Andalucía P07-TIC-2602; Junta de Andalucía P08-TIC-4100; Ministerio de Ciencia e Innovación TIN2010-09809-E; Ministerio de Ciencia e Innovación TIN2010-21744; Ministerio de Ciencia e Innovación TIN2008-04718-E; Ministerio de Ciencia e Innovación TIN2010-10811-E; Ministerio de Ciencia e Innovación TIN2010-09988-
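
    The idea of using URL tokens as features can be illustrated with a minimal sketch. The abstract does not say how the authors tokenise URLs, so the function name `url_tokens` and the rule of splitting on non-alphanumeric characters are assumptions made purely for illustration, not the paper's method.

```python
import re
from urllib.parse import urlsplit


def url_tokens(url: str) -> list:
    """Split a URL into lowercase alphanumeric tokens (hypothetical helper).

    Assumption: tokens are taken from the host, path and query string,
    and any run of non-alphanumeric characters acts as a separator.
    """
    parts = urlsplit(url)
    text = " ".join([parts.netloc, parts.path, parts.query])
    return [tok.lower() for tok in re.split(r"[^A-Za-z0-9]+", text) if tok]


if __name__ == "__main__":
    # The page is never downloaded; only the URL string is inspected.
    print(url_tokens("http://www.example.com/products/view-item.php?id=42"))
```

    Tokens such as these could then be counted or otherwise encoded and passed to any standard classifier; which encoding and classifier the experiment actually used is not stated in this abstract.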