A Focused Crawler in order to Get Semantic Web Resources (CSR)

Barbosa Santillán, Liliana Ibeth; Campos Quirarte, Juana Elizabeth; Castro Munguía, Aldo


Authors: Liliana Ibeth Barbosa Santillán
Juana Elizabeth Campos Quirarte
Aldo Castro Munguía
Publication date: 1 January 2013
Publisher: E.T.S. de Ingenieros Informáticos (UPM)

Abstract

This paper presents a Focused Crawler in order to Get Semantic Web Resources (CSR). Structured data on the web is available in formats such as Extensible Markup Language (XML), Resource Description Framework (RDF) and Web Ontology Language (OWL), all of which can be used for processing. One of the main challenges of searching for and downloading Semantic Web resources manually is that the task is very time-consuming. Our research proposes a focused crawler that downloads these resources automatically and stores them on disk, building a collection that will be used for data processing. CSR consists of three layers: (a) the User Interface Layer, (b) the Focus Crawler Layer and (c) the Base Crawler Layer. CSR uses the Shark-Search method as its selection policy. CSR was evaluated in two experiments. The first started on December 15, 2012 at 7:11 am and ended on December 16, 2012 at 4:01, obtaining 448,123,537 bytes of data. CSR terminated by itself after analyzing 80,4375 seeds with unlimited depth. It collected 16,576 semantic resource files, of which 89% were RDF, 10% were XML and 1% were OWL. The second experiment was based on the Web Data Commons work of the Research Group Data and Web Science at the University of Mannheim and the Institute AIFB at the Karlsruhe Institute of Technology. It began at 4:46 am on June 2, 2013 and ended at 1:37 am on June 9, 2013. After 162.51 hours of execution, the result was 285,279 semantic resources, predominantly XML (99%), with OWL and RDF at 1% each.
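The abstract names Shark-Search as the selection policy but gives no implementation details. The following is a minimal sketch of how a Shark-Search style frontier is commonly described, not the authors' code: each outgoing link receives a potential score that mixes a decayed score inherited from the parent page with a score for the link's anchor text and surrounding context, and the crawler visits the highest-scoring link first. The constants `DELTA`, `BETA` and `GAMMA`, the cosine `similarity` helper, the `Frontier` class and the example URLs are all assumptions made for illustration.

```python
import heapq
import re
from collections import Counter
from math import sqrt

# Assumed tuning constants; the abstract does not report the values CSR used.
DELTA = 0.5  # decay applied to the score a link inherits from its parent
BETA = 0.8   # weight of anchor text versus surrounding context
GAMMA = 0.5  # weight of inherited score versus neighbourhood score


def similarity(text, query):
    """Cosine similarity between term-frequency vectors of two strings."""
    a = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    b = Counter(re.findall(r"[a-z0-9]+", query.lower()))
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def inherited_score(parent_text, parent_inherited, query):
    """A relevant parent passes on its own relevance, otherwise its inherited score."""
    relevance = similarity(parent_text, query)
    return DELTA * relevance if relevance > 0 else DELTA * parent_inherited


def neighbourhood_score(anchor, context, query):
    """Score a link from its anchor text and the text surrounding the anchor."""
    anchor_score = similarity(anchor, query)
    # A relevant anchor makes the context count as fully relevant.
    context_score = 1.0 if anchor_score > 0 else similarity(context, query)
    return BETA * anchor_score + (1 - BETA) * context_score


def potential_score(parent_text, parent_inherited, anchor, context, query):
    """Combine inherited and neighbourhood evidence into one priority value."""
    inherited = inherited_score(parent_text, parent_inherited, query)
    neighbourhood = neighbourhood_score(anchor, context, query)
    return GAMMA * inherited + (1 - GAMMA) * neighbourhood


class Frontier:
    """Priority queue of unvisited URLs, highest potential score first."""

    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url, score):
        if url not in self._seen:
            self._seen.add(url)
            # heapq is a min-heap, so the score is negated.
            heapq.heappush(self._heap, (-score, url))

    def pop(self):
        negated, url = heapq.heappop(self._heap)
        return url, -negated


if __name__ == "__main__":
    query = "rdf owl ontology"
    parent = "Downloads of RDF and OWL ontology files"
    frontier = Frontier()
    frontier.push("http://example.org/ontology.owl",
                  potential_score(parent, 0.0, "OWL ontology file", "", query))
    frontier.push("http://example.org/contact",
                  potential_score(parent, 0.0, "contact us", "office address", query))
    print(frontier.pop()[0])
```

In this sketch a link whose anchor text matches the query outranks a sibling link with an unrelated anchor, which is the behaviour that steers a focused crawler toward pages likely to hold RDF, OWL or XML resources.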


Available Versions

Archivo Digital UPM

oai:oa.upm.es:36867

Last time updated on 08/05/2016

Servicio de Coordinación de Bibliotecas de la Universidad Politécnica de Madrid

oai:oa.upm.es:36867

Last time updated on 10/02/2018