Inductive Logic Programming in Databases: from Datalog to DL+log
In this paper we address an issue that has been brought to the attention of
the database community with the advent of the Semantic Web, i.e. the issue of
how ontologies (and the semantics conveyed by them) can help solve typical
database problems, through a better understanding of KR aspects related to
databases. In particular, we investigate this issue from the ILP perspective by
considering two database problems, (i) the definition of views and (ii) the
definition of constraints, for a database whose schema is represented also by
means of an ontology. Both can be reformulated as ILP problems and can benefit
from the expressive and deductive power of the KR framework DL+log. We
illustrate the application scenarios by means of examples. Keywords: Inductive
Logic Programming, Relational Databases, Ontologies, Description Logics, Hybrid
Knowledge Representation and Reasoning Systems. Note: To appear in Theory and
Practice of Logic Programming (TPLP). Comment: 30 pages, 3 figures, 2 tables
Applying semantic web technologies to knowledge sharing in aerospace engineering
This paper details an integrated methodology to optimise knowledge reuse and sharing, illustrated with a use case in the aeronautics domain. It uses ontologies as a central modelling strategy for capturing knowledge, either from legacy documents via automated means or directly in systems interfacing with knowledge workers via user-defined, web-based forms. The domain ontologies used for knowledge capture also guide the retrieval of the knowledge extracted from the data, using a semantic search system that supports multiple modalities during search. This approach has been applied and evaluated successfully within the aerospace domain, and is currently being extended for use in other domains on an increasingly large scale.
Initiating organizational memories using ontology network analysis
One of the important problems in organizational memories is their initial set-up. It is difficult to choose the right information to include in an organizational memory, and the right information is also a prerequisite for maximizing the uptake and relevance of the memory content. To tackle this problem, most developers adopt heavy-weight solutions and rely on faithful, continuous interaction with users to create and improve the memory's content. In this paper, we explore the use of an automatic, light-weight solution, drawn from the underlying ingredients of an organizational memory: ontologies. We have developed an ontology-based network analysis method, which we applied to the problem of identifying communities of practice in an organization. We use ontology-based network analysis as a means to provide content automatically for the initial set-up of an organizational memory.
Semi-automated Ontology Generation for Biocuration and Semantic Search
Background:
In the life sciences, the amount of literature and experimental data grows at a tremendous rate. In order to effectively access and integrate these data, biomedical ontologies (controlled, hierarchical vocabularies) are being developed.
Creating and maintaining such ontologies is a difficult, labour-intensive, manual process. Many computational methods which can support ontology construction have been proposed in the past. However, good, validated systems are largely missing.
Motivation:
The biocuration community plays a central role in the development of ontologies. Any method that can support their efforts has the potential to have a huge impact in the life sciences.
Recently, a number of semantic search engines were created that make use of biomedical ontologies for document retrieval. To transfer the technology to other knowledge domains, suitable ontologies need to be created. One area where ontologies may prove particularly useful is the search for alternative methods to animal testing, an area where comprehensive search is of special interest to determine the availability or unavailability of alternative methods.
Results:
The Dresden Ontology Generator for Directed Acyclic Graphs (DOG4DAG) developed in this thesis is a system which supports the creation and extension of ontologies by semi-automatically generating terms, definitions, and parent-child relations from text in PubMed, the web, and PDF repositories. The system is seamlessly integrated into OBO-Edit and Protégé, two widely used ontology editors in the life sciences. DOG4DAG generates terms by identifying statistically significant noun phrases in text. For definitions and parent-child relations it employs pattern-based web searches. Each generation step has been systematically evaluated using manually validated benchmarks. The term generation leads to high-quality terms also found in manually created ontologies. Definitions can be retrieved for up to 78% of terms, child-ancestor relations for up to 54%. No other validated system exists that achieves comparable results.
To improve the search for information on alternative methods to animal testing, an ontology has been developed that contains 17,151 terms, of which 10% were newly created and 90% were re-used from existing resources. This ontology is the core of Go3R, the first semantic search engine in this field. When a user performs a search query with Go3R, the search engine expands this request using the structure and terminology of the ontology. The machine classification employed in Go3R is capable of distinguishing documents related to alternative methods from those which are not, with an F-measure of 90% on a manual benchmark. Approximately 200,000 of the 19 million documents listed in PubMed were identified as relevant, either because they contained a specific term or because of the automatic classification. The Go3R search engine is available online at www.Go3R.org.
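As an illustration of what ontology-based query expansion can look like in general, the following minimal Python sketch expands a query term with its descendant terms and their synonyms. It is not Go3R's implementation: the toy ontology, the synonym table, and the function name `expand_query` are all hypothetical and chosen only for this example.

```python
from collections import deque

# Hypothetical toy ontology (not taken from Go3R): term -> direct child terms.
CHILDREN = {
    "in vitro method": ["cell culture", "organ-on-a-chip"],
    "cell culture": ["3D cell culture"],
}

# Hypothetical synonym table: term -> alternative labels.
SYNONYMS = {
    "cell culture": ["cell cultivation"],
}


def expand_query(term):
    """Return the term, all of its descendants, and their synonyms.

    The ontology is walked breadth-first, so broader terms come first.
    """
    expanded = []
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # a DAG may reach the same term along several paths
        seen.add(current)
        expanded.append(current)
        expanded.extend(SYNONYMS.get(current, []))
        queue.extend(CHILDREN.get(current, []))
    return expanded


if __name__ == "__main__":
    print(expand_query("in vitro method"))
```

A search for the broad term would then also match documents that mention only a narrower term or a synonym, which is the effect the abstract describes.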
OWL-Miner: Concept Induction in OWL Knowledge Bases
The Resource Description Framework (RDF) and Web Ontology Language (OWL) have been widely used in recent years, and automated methods for the analysis of data and knowledge directly within these formalisms are of current interest. Concept induction is a technique for discovering descriptions of data, such as inducing OWL class expressions to describe RDF data. These class expressions capture patterns in the data which can be used to characterise interesting clusters or to act as classification rules over unseen data. The semantics of OWL is underpinned by Description Logics (DLs), a family of expressive and decidable fragments of first-order logic. Recently, methods of concept induction which are well studied in the field of Inductive Logic Programming have been applied to the related formalism of DLs. These methods have been developed for a number of purposes including unsupervised clustering and supervised classification. Refinement-based search is a concept induction technique which structures the search space of DL concept/OWL class expressions and progressively generalises or specialises candidate concepts to cover example data as guided by quality criteria such as accuracy. However, the current state-of-the-art in this area is limited in that such methods: were not primarily designed to scale over large RDF/OWL knowledge bases; do not support class languages as expressive as OWL2-DL; or, are limited to one purpose, such as learning OWL classes for integration into ontologies. Our work addresses these limitations by increasing the efficiency of these learning methods whilst permitting a concept language up to the expressivity of OWL2-DL classes. We describe methods which support both classification (predictive induction) and subgroup discovery (descriptive induction), which, in this context, are fundamentally related.
We have implemented our methods as the system called OWL-Miner and show by evaluation that our methods outperform state-of-the-art systems for DL learning in both the quality of solutions found and the speed in which they are computed. Furthermore, we achieve the best ever ten-fold cross validation accuracy results on the long-standing benchmark problem of carcinogenesis. Finally, we present a case study on ongoing work in the application of OWL-Miner to a real-world problem directed at improving the efficiency of biological macromolecular crystallisation …
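To make the idea of refinement-based search concrete, here is a heavily simplified Python sketch. It is not OWL-Miner's algorithm: concepts are restricted to conjunctions of atomic class names, individuals are modelled as plain sets of class names, specialisation adds one class at a time, and a greedy hill-climb guided by accuracy replaces a full search. All names and the example data are hypothetical.

```python
def covers(concept, individual):
    """A conjunction covers an individual that belongs to every class in it."""
    return concept <= individual


def accuracy(concept, positives, negatives):
    """Fraction of examples classified correctly by the concept."""
    true_pos = sum(covers(concept, i) for i in positives)
    true_neg = sum(not covers(concept, i) for i in negatives)
    return (true_pos + true_neg) / (len(positives) + len(negatives))


def refine(concept, vocabulary):
    """Downward refinement: specialise by adding one more atomic class."""
    return [concept | {name} for name in sorted(vocabulary - concept)]


def learn(positives, negatives, vocabulary, max_steps=5):
    """Greedy top-down search starting from the most general concept."""
    best = frozenset()  # empty conjunction: covers every individual
    best_score = accuracy(best, positives, negatives)
    for _ in range(max_steps):
        candidates = refine(best, vocabulary)
        if not candidates:
            break
        candidate = max(candidates, key=lambda c: accuracy(c, positives, negatives))
        score = accuracy(candidate, positives, negatives)
        if score <= best_score:
            break  # no specialisation improves accuracy
        best, best_score = candidate, score
    return best, best_score


if __name__ == "__main__":
    # Hypothetical example data: each individual is the set of classes it belongs to.
    pos = [{"Aromatic", "Nitro"}, {"Aromatic", "Nitro", "Halide"}]
    neg = [{"Aromatic"}, {"Halide"}]
    print(learn(pos, neg, {"Aromatic", "Nitro", "Halide"}))
```

Real DL learners refine over a much richer language (existential restrictions, negation, cardinality) and rely on a reasoner for coverage checks; the sketch only shows the specialise-and-score loop.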