An Experiment to Test URL Features for Web Page Classification
Web page classification has been extensively researched, using different
types of features that are extracted either from the page content, the page structure
or from other pages that link to that page. Using features from the page itself
implies having to download it before its classification. We present an experiment
to show that URL tokens contain enough information to extract features for classifying
web pages. A classifier based on these features is able to classify a web page without
having to download it first, avoiding unnecessary downloads.
Funding: Ministerio de Educación y Ciencia TIN2007-64119; Junta de Andalucía P07-TIC-2602; Junta de Andalucía P08-TIC-4100; Ministerio de Ciencia e Innovación TIN2010-09809-E; Ministerio de Ciencia e Innovación TIN2010-21744; Ministerio de Ciencia e Innovación TIN2008-04718-E; Ministerio de Ciencia e Innovación TIN2010-10811-E; Ministerio de Ciencia e Innovación TIN2010-09988-
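As a rough illustration of what "URL tokens" could look like in practice, the sketch below splits a URL into lowercase alphanumeric tokens without fetching the page. The function name `url_tokens` and the splitting rule (break on any non-alphanumeric character in the host, path and query) are assumptions made for this example; the abstract does not specify how the authors tokenize URLs or which features they derive.

```python
import re
from urllib.parse import urlsplit


def url_tokens(url: str) -> list[str]:
    """Split a URL into lowercase alphanumeric tokens (hypothetical helper)."""
    parts = urlsplit(url)
    # Use only the host, path and query; the scheme carries little signal.
    raw = " ".join([parts.netloc, parts.path, parts.query])
    # Assumed rule: any run of non-alphanumeric characters is a separator.
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", raw) if t]


if __name__ == "__main__":
    print(url_tokens("http://www.example.com/products/list.php?id=42"))
```

Tokens of this kind could then be fed to any standard text classifier as a bag of words, which is one possible way to classify a page before downloading it.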