From HTML Documents to Web Tables and Rules

Authors: Kai Simon, Georg Lausen, Harold Boley
Publication date: 01/03/2016

We present a browser-extending Semantic Web extraction system that maps HTML documents to tables and, where possible, to rules. First, the basic data extractor ViPER distills and reorganizes semi-structured information into a tabular data structure, which can in turn be browsed and/or submitted to further machine processing. Second, exemplifying the latter, the extended knowledge extractor Rex ViPER mines the resulting tables for structural properties and functional dependencies. Rules are generated to obtain a more compact and manageable, often also enriched, knowledge representation. The resulting fully structured information, RuleML-serialized facts and rules, can be stored along with the original documents, queried by rule engines such as OO jDREW and FLORID, and interchanged between Web Services. Thus Rex ViPER contributes to automating the construction of a machine-processable Semantic Web.

NRC publication: Ye
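To make the idea of mining a table for functional dependencies concrete, here is a minimal Python sketch. It is not Rex ViPER's algorithm, which this listing does not describe; the function names (`holds`, `single_column_fds`) and the sample product table are hypothetical, and only single-column dependencies are checked.

```python
from itertools import permutations


def holds(rows, lhs, rhs):
    """Return True if column `lhs` functionally determines column `rhs`,
    i.e. equal `lhs` values never map to different `rhs` values."""
    seen = {}
    for row in rows:
        key, value = row[lhs], row[rhs]
        # setdefault stores the first value seen for a key and returns it afterwards
        if seen.setdefault(key, value) != value:
            return False
    return True


def single_column_fds(rows):
    """List every (lhs, rhs) column pair for which lhs -> rhs holds in `rows`."""
    columns = list(rows[0].keys())
    return [(a, b) for a, b in permutations(columns, 2) if holds(rows, a, b)]


# Hypothetical extracted table: one dict per record.
rows = [
    {"model": "A100", "brand": "Acme", "country": "US", "price": 199},
    {"model": "A200", "brand": "Acme", "country": "US", "price": 249},
    {"model": "B10", "brand": "Bolt", "country": "DE", "price": 199},
]

fds = single_column_fds(rows)
```

In this toy table `brand -> country` holds, so the `country` column could be replaced by one rule per brand instead of one fact per row, which is the kind of compaction the abstract alludes to. A dependency found this way only reflects the rows at hand and may not hold for the underlying domain.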

NRC Publications Archive

From HTML Documents to Web Tables and Rules

Authors: Georg Lausen, Kai Simon

Contact: {ksimon,lausen} AT informatik.uni-freiburg.de

We present a browser-extending Semantic Web extraction system that maps HTML documents to tables and, where possible, to rules. First, the basic data extractor ViPER distills and reorganizes semi-structured information into a tabular data structure, which can in turn be browsed and/or submitted to further machine processing. Second, exemplifying the latter, the extended knowledge extractor Rex ViPER mines the resulting tables for structural properties and functional dependencies. Rules are generated to obtain a more compact and manageable, often also enriched, knowledge representation. The resulting fully structured information, RuleML-serialized facts and rules, can be stored along with the original documents, queried by rule engines such as OO jDREW and FLORID, and interchanged between Web Services. Thus Rex ViPER contributes to automating the construction of a machine-processable Semantic Web.
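As a rough illustration of the first step, turning repeated HTML records into table rows, the following Python sketch uses only the standard-library `html.parser` module. It assumes a fixed, known markup (each record is an `<li>` whose fields are `<span>` elements); ViPER's actual extraction method is not described in this listing, and the class name `RecordTableParser`, the helper `html_to_rows`, and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class RecordTableParser(HTMLParser):
    """Collect the text of each <span> inside every <li> as one table row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, one list of cell strings each
        self._row = None        # row under construction, or None outside <li>
        self._in_cell = False   # True while inside a <span> of the current row

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._row = []
        elif tag == "span" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Text outside a <span> (e.g. whitespace between cells) is ignored.
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_cell = False
        elif tag == "li" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_to_rows(html):
    """Parse `html` and return the extracted rows as lists of strings."""
    parser = RecordTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical semi-structured input: a product list rendered as <li> records.
SAMPLE = """
<ul>
  <li><span>A100</span> <span>Acme</span> <span>199</span></li>
  <li><span>A200</span> <span>Acme</span> <span>249</span></li>
</ul>
"""

table = html_to_rows(SAMPLE)
```

Here `table` holds one list of cell strings per `<li>` record. Real pages rarely expose such regular markup, so a practical extractor would first have to detect the repeating record structure rather than assume it.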

CiteSeerX