Comparison of Concept Learning Algorithms With Emphasis on Ontology Engineering for the Semantic Web
In the context of the Semantic Web, ontologies based on Description Logics are gaining importance for knowledge representation on a large scale. While the need arises for high-quality ontologies with large background knowledge to enable powerful machine reasoning, the acquisition of such knowledge is advancing only slowly because of the lack of appropriate tools. Concept learning algorithms have made a great leap forward and can help to speed up knowledge acquisition in the form of induced concept descriptions. This work investigated whether concept learning algorithms have reached a level at which they can produce results usable in an ontology engineering process. Two learning algorithms (YinYang and DL-Learner) were investigated in detail and tested with benchmarks. A method that enables concept learning on large knowledge bases available via a SPARQL endpoint is presented, and the quality of learned concepts is evaluated in a real use case. A proposal is made to increase the complexity of learned concept descriptions by circumventing the Open World Assumption of Description Logics.
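The contrast between open- and closed-world readings mentioned above can be made concrete with a toy sketch (hypothetical data, not from the thesis): under the Open World Assumption, the absence of an asserted fact does not entail its negation, whereas a closed-world reading treats the explicit data as complete.

```python
# Toy fact base: only one hasChild assertion is known.
facts = {
    ("hasChild", "john", "anna"),
}

def has_child_cwa(person):
    """Closed-world reading: no asserted child means no child."""
    return any(p == "hasChild" and s == person for (p, s, o) in facts)

def has_child_owa(person):
    """Open-world reading: only a positive answer is definite."""
    if any(p == "hasChild" and s == person for (p, s, o) in facts):
        return True   # entailed by the data
    return None       # unknown -- not False

print(has_child_cwa("mary"), has_child_owa("mary"))  # False None
```

Under the closed-world reading, "mary" can be classified as childless, which is exactly the kind of stronger conclusion a learner can exploit when the OWA is circumvented.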
OWL-Miner: Concept Induction in OWL Knowledge Bases
The Resource Description Framework (RDF) and Web Ontology Language (OWL) have been widely used in recent years, and automated methods for the analysis of data and knowledge directly within these formalisms are of current interest. Concept induction is a technique for discovering descriptions of data, such as inducing OWL class expressions to describe RDF data. These class expressions capture patterns in the data which can be used to characterise interesting clusters or to act as classification rules over unseen data. The semantics of OWL is underpinned by Description Logics (DLs), a family of expressive and decidable fragments of first-order logic. Recently, methods of concept induction which are well studied in the field of Inductive Logic Programming have been applied to the related formalism of DLs. These methods have been developed for a number of purposes, including unsupervised clustering and supervised classification. Refinement-based search is a concept induction technique which structures the search space of DL concept/OWL class expressions and progressively generalises or specialises candidate concepts to cover example data, as guided by quality criteria such as accuracy. However, the current state of the art in this area is limited in that such methods: were not primarily designed to scale over large RDF/OWL knowledge bases; do not support class languages as expressive as OWL2-DL; or are limited to one purpose, such as learning OWL classes for integration into ontologies. Our work addresses these limitations by increasing the efficiency of these learning methods whilst permitting a concept language up to the expressivity of OWL2-DL classes. We describe methods which support both classification (predictive induction) and subgroup discovery (descriptive induction), which, in this context, are fundamentally related.
We have implemented our methods as the system OWL-Miner and show by evaluation that our methods outperform state-of-the-art systems for DL learning in both the quality of solutions found and the speed at which they are computed. Furthermore, we achieve the best ten-fold cross-validation accuracy results to date on the long-standing benchmark problem of carcinogenesis. Finally, we present a case study on ongoing work in the application of OWL-Miner to a real-world problem directed at improving the efficiency of biological macromolecular crystallisation.
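The refinement-based search described above can be sketched in miniature (this is an illustrative toy, not the OWL-Miner implementation): concepts are conjunctions of attributes, a downward refinement operator specialises a candidate by adding one attribute, and candidates are scored by classification accuracy over labelled examples.

```python
# Hypothetical attribute vocabulary and labelled examples.
ATTRS = ["aromatic", "has_ring", "toxic_group"]

positives = [{"aromatic", "has_ring"}, {"aromatic", "has_ring", "toxic_group"}]
negatives = [{"has_ring"}, {"toxic_group"}]

def covers(concept, example):
    # A conjunction of attributes covers an example holding all of them.
    return concept <= example

def accuracy(concept):
    tp = sum(covers(concept, e) for e in positives)
    tn = sum(not covers(concept, e) for e in negatives)
    return (tp + tn) / (len(positives) + len(negatives))

def refine(concept):
    """Downward refinement: specialise by adding one attribute."""
    return [concept | {a} for a in ATTRS if a not in concept]

def search():
    # Breadth-first search from the most general concept, guided by accuracy.
    best, frontier = frozenset(), [frozenset()]
    while frontier:
        concept = frontier.pop(0)
        if accuracy(concept) > accuracy(best):
            best = concept
        frontier.extend(frozenset(c) for c in refine(concept))
    return best

print(sorted(search()))  # ['aromatic']
```

Here the search finds the single attribute that separates the positives from the negatives; a real DL learner traverses a far richer space of class expressions, but the generalise/specialise-and-score loop has this shape.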
Efficient Extraction and Query Benchmarking of Wikipedia Data
Knowledge bases are playing an increasingly important role for integrating information between systems and over the Web. Today, most knowledge bases cover only specific domains, they are created by relatively small groups of knowledge engineers, and it is very cost intensive to keep them up-to-date as domains change. In parallel, Wikipedia has grown into one of the central knowledge sources of mankind and is maintained by thousands of contributors. The DBpedia (http://dbpedia.org) project makes use of this large collaboratively edited knowledge source by extracting structured content from it, interlinking it with other knowledge bases, and making the result publicly available. DBpedia had and has a great effect on the Web of Data and became a crystallization point for it. Furthermore, many companies and researchers use DBpedia and its public services to improve their applications and research approaches.
However, the DBpedia release process is heavy-weight, and the releases are sometimes based on data that is several months old. Hence, a strategy to keep DBpedia in synchronization with Wikipedia is highly desirable. In this thesis we propose the DBpedia Live framework, which reads a continuous stream of updated Wikipedia articles and processes it on-the-fly to obtain RDF data, updating the DBpedia knowledge base with the newly extracted data. DBpedia Live also publishes the newly added/deleted facts in files, in order to enable synchronization between our DBpedia endpoint and other DBpedia mirrors. Moreover, the new DBpedia Live framework incorporates several significant features, e.g. abstract extraction, ontology changes, and changeset publication.
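The added/deleted-facts mechanism can be pictured as a set diff (an illustrative sketch, not the actual DBpedia Live code; the resource and property names are made up): when an article is re-extracted, the previously stored triples are diffed against the new extraction, the store is updated, and the changeset is published so mirrors can replay it.

```python
def make_changeset(old_triples, new_triples):
    """Diff two triple sets into a publishable changeset."""
    return {"added": new_triples - old_triples,
            "deleted": old_triples - new_triples}

def apply_changeset(store, changeset):
    """Replay a changeset on a mirror's triple store."""
    return (store - changeset["deleted"]) | changeset["added"]

# Hypothetical re-extraction of one article:
old = {("dbr:Berlin", "dbo:populationTotal", "3400000")}
new = {("dbr:Berlin", "dbo:populationTotal", "3645000")}

cs = make_changeset(old, new)
mirror = apply_changeset(old, cs)
print(mirror == new)  # True
```

A mirror that applies every published changeset in order converges to the same state as the live endpoint, which is the synchronization property the framework aims for.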
Knowledge bases, including DBpedia, are stored in triplestores in order to facilitate accessing and querying their respective data. Furthermore, triplestores constitute the backbone of an increasing number of Data Web applications. It is thus evident that the performance of those stores is mission-critical for individual projects as well as for data integration on the Data Web in general.
Consequently, it is of central importance during the implementation of any of these applications to have a clear picture of the weaknesses and strengths of current triplestore implementations. We introduce a generic SPARQL benchmark creation procedure, which we apply to the DBpedia knowledge base. Previous approaches often compared relational databases and triplestores and thus settled on measuring performance against a relational database that had been converted to RDF, using SQL-like queries. In contrast, our benchmark is based on queries that were actually issued by humans and applications against existing RDF data not resembling a relational schema. Our generic procedure for benchmark creation is based on query-log mining, clustering, and SPARQL feature analysis. We argue that a pure SPARQL benchmark is more useful for comparing existing triplestores, and we provide results for the popular triplestore implementations Virtuoso, Sesame, Apache Jena-TDB, and BigOWLIM. The subsequent comparison of our results with other benchmark results indicates that the performance of triplestores is far less homogeneous than suggested by previous benchmarks.
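The SPARQL feature-analysis step of such a benchmark pipeline can be sketched roughly as follows (a hypothetical query log and a hand-picked feature set, not the thesis's actual procedure): each logged query is mapped to a vector of feature occurrences, and those vectors are what a clustering step would then group into representative query classes.

```python
import re

# A small, assumed set of SPARQL features to count per query.
FEATURES = ["OPTIONAL", "FILTER", "UNION", "DISTINCT", "ORDER BY"]

def feature_vector(query):
    """Count occurrences of each feature keyword in a query string."""
    q = query.upper()
    return tuple(len(re.findall(r"\b" + re.escape(f) + r"\b", q))
                 for f in FEATURES)

# Hypothetical query log entries:
log = [
    "SELECT DISTINCT ?s WHERE { ?s ?p ?o . OPTIONAL { ?s rdfs:label ?l } }",
    "SELECT ?s WHERE { { ?s a dbo:City } UNION { ?s a dbo:Town } } ORDER BY ?s",
]

for q in log:
    print(feature_vector(q))
```

A real pipeline would parse the queries rather than match keywords, but the output vectors illustrate the representation over which clustering operates.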
Further, one of the crucial tasks when creating and maintaining knowledge bases is validating their facts and maintaining the quality of their inherent data. This task includes several subtasks, and in this thesis we address two major ones: fact validation and provenance, and data quality. The subtask of fact validation and provenance aims at providing sources for facts in order to ensure the correctness and traceability of the provided knowledge. This subtask is often addressed by human curators in a three-step process: issuing appropriate keyword queries for the statement to check using standard search engines, retrieving potentially relevant documents, and screening those documents for relevant content. The drawbacks of this process are manifold. Most importantly, it is very time-consuming, as the experts have to carry out several search processes and must often read several documents. We present DeFacto (Deep Fact Validation), an algorithm for validating facts by finding trustworthy sources for them on the Web. DeFacto aims to provide an effective way of validating facts by supplying the user with relevant excerpts of webpages as well as useful additional information, including a score for the confidence DeFacto has in the correctness of the input fact. The subtask of data quality maintenance, on the other hand, aims at evaluating and continuously improving the quality of the knowledge bases' data. We present a methodology for assessing the quality of knowledge bases' data, which comprises a manual and a semi-automatic process. The first phase includes the detection of common quality problems and their representation in a quality problem taxonomy. In the manual process, the second phase comprises the evaluation of a large number of individual resources, according to the quality problem taxonomy, via crowdsourcing.
This process is accompanied by a tool wherein a user assesses an individual resource and evaluates each fact for correctness. The semi-automatic process involves the generation and verification of schema axioms. We report the results obtained by applying this methodology to DBpedia.
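One way to picture the kind of confidence score that fact-validation systems like DeFacto produce is a noisy-OR combination of per-source evidence (this is an assumed scoring rule for illustration, not DeFacto's actual model): each supporting source contributes evidence weighted by its trustworthiness, and the fact is scored wrong only if every source fails.

```python
def confidence(source_trust_scores):
    """Noisy-OR: combine independent per-source trust into one score."""
    p_all_fail = 1.0
    for trust in source_trust_scores:
        p_all_fail *= (1.0 - trust)
    return 1.0 - p_all_fail

# Two hypothetical supporting webpages with trust 0.6 and 0.5:
print(round(confidence([0.6, 0.5]), 2))  # 0.8
```

Note how the combined score exceeds either individual source's trust: independent corroboration raises confidence, while a fact with no supporting source scores zero.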
Foundations of refinement operators for description logics
In order to leverage techniques from Inductive Logic Programming for learning in description logics (DLs), which are the foundation of ontology languages in the Semantic Web, it is important to acquire a thorough understanding of the theoretical potential and limitations of using refinement operators within the description logic paradigm. In this paper, we present a comprehensive study which analyses desirable properties such operators should have. In particular, we show that ideal refinement operators in general do not exist, which is indicative of the hardness inherent in learning in DLs. We also show which combinations of desirable properties are theoretically possible, thus providing an important step towards the definition of practically applicable operators.
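Two of the desirable properties such studies examine, properness (every refinement is strictly more specific than its origin) and finiteness (each refinement step yields finitely many concepts), can be checked mechanically on a toy concept space (an illustrative construction, not the paper's DL setting): concepts are attribute sets, and C is subsumed by D iff D's attributes are a subset of C's.

```python
ATTRS = {"a", "b"}

def subsumed_by(c, d):
    """c is subsumed by d in this toy space iff d's attributes ⊆ c's."""
    return d <= c

def rho(c):
    """Downward refinement operator: add one missing attribute."""
    return {frozenset(c | {x}) for x in ATTRS - c}

concepts = [frozenset(), frozenset({"a"}), frozenset({"b"}),
            frozenset({"a", "b"})]

# Properness: every refinement is strictly below its origin.
proper = all(subsumed_by(r, c) and r != c
             for c in concepts for r in rho(c))
# Finiteness: each refinement step yields a finite set (trivial here).
finite = all(isinstance(rho(c), set) and len(rho(c)) <= len(ATTRS)
             for c in concepts)

print(proper, finite)  # True True
```

In this finite toy space an operator can have every desirable property at once; the paper's result is that in expressive DLs no single operator can, which is what forces the trade-offs between such properties in practical learners.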