8 research outputs found

    Expressing Biological Problems with Logical Reasoning Languages

    Biology is a very challenging domain that is typically tackled by experts in the field, with few or no interactions with the Web knowledge and rules interoperation community. However, the volume of biological data has grown considerably in recent decades. Moreover, the COVID-19 pandemic has marked an unprecedented point in history, with large amounts of information collected in laboratories worldwide and deposited into open data banks. Inspired by the current needs and backed by a solid knowledge base (our extensional knowledge source) called CoV2K, we propose to express and resolve a series of problems related to the SARS-CoV-2 virus and its interpretation. We formulate our queries as rules in Vadalog (our knowledge representation and reasoning language) and input them to its related logic-based reasoning system. Four cases are presented that explore 1) the effects of variants and how they are explained in the scientific literature; 2) the most typical mutations of a variant; 3) the most likely acquisition of a new mutation by a given variant and the associated reported effects; 4) the most relevant mutations of the virus according to the community. Expressing biological problems using a logic formalism is a major challenge, due to the intrinsic complexity of the domain. The four use cases show that a logical formalism is effective in expressing relevant problems for understanding the current evolution of SARS-CoV-2 variants, an essential aspect of the COVID-19 pandemic.

    Breaking the Negative Cycle: Exploring the Design Space of Stratification for First-Class Datalog Constraints

    The λ_Dat calculus brings together the power of functional and declarative logic programming in one language. In λ_Dat, Datalog constraints are first-class values that can be constructed, passed around as arguments, returned, composed with other constraints, and solved. A significant part of the expressive power of Datalog comes from the use of negation. Stratified negation is a particularly simple and practical form of negation accessible to ordinary programmers. Stratification requires that Datalog programs must not use recursion through negation. For a Datalog program, this requirement is straightforward to check, but for a λ_Dat program, it is not so simple: a λ_Dat program constructs, composes, and solves Datalog programs at runtime. Hence stratification cannot readily be determined at compile-time. In this paper, we explore the design space of stratification for λ_Dat. We investigate strategies to ensure, at compile-time, that programs constructed at runtime are guaranteed to be stratified, and we argue that previous design choices in the Flix programming language have been suboptimal.
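The check described above for a plain Datalog program (no recursion through negation) can be sketched as follows. This is a minimal illustration and not the analysis proposed in the paper: the rule encoding, the function name `is_stratified`, and the example predicates are all assumptions made for this sketch.

```python
def is_stratified(rules):
    """Return True if no predicate depends on itself through negation.

    `rules` is a list of (head, body) pairs, where body is a list of
    (predicate, negated) pairs. This encoding is hypothetical.
    """
    # Dependency graph: an edge body_pred -> head, flagged when negated.
    edges = {}
    for head, body in rules:
        for pred, negated in body:
            edges.setdefault(pred, set()).add((head, negated))

    def reachable(start):
        """All predicates reachable from `start` along dependency edges."""
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt, _ in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    # A negated edge pred -> head lies on a cycle exactly when
    # pred is reachable again from head.
    for pred, targets in edges.items():
        for head, negated in targets:
            if negated and pred in reachable(head):
                return False
    return True


# Negation is used only "above" the recursion, so this is stratified.
stratified_program = [
    ("path", [("edge", False)]),
    ("path", [("path", False), ("edge", False)]),
    ("unconnected", [("node", False), ("path", True)]),
]

# p and q are defined in terms of each other's negation.
unstratified_program = [
    ("p", [("q", True)]),
    ("q", [("p", True)]),
]
```

The sketch assumes the whole program is known up front, which is exactly the assumption that fails when programs are constructed and composed at runtime.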

    Ontology based data integration in life sciences

    The aim of this thesis is to develop standard and practical approaches for the semantic integration of biological data and services. The thesis considers various scenarios where ontologies may benefit bioinformatics web services development, integration and provenance. In spite of the broad use of ontologies in biology, their usage is usually limited to the definition of taxonomic hierarchies. This thesis examines the utility of ontologies for data integration in the context of semantic web services development. Ontologies of biological datatypes are very valuable for data integration, especially in a context of continuously changing standards. The thesis evaluates the outdated BioMoby ontology for the generation of modern WS-I and RESTful web services. Another important aspect is the use of ontologies for web services description. The thesis evaluates the W3C standard WSDL ontology for bioinformatics web services description and provenance. Finally, integration with modern workflow execution platforms such as Taverna and Galaxy is also considered. Despite the growing popularity of the JSON format, web services depend heavily on the XML type system. The OWL2XS tool facilitates semantic web services development by automatically generating an XML Schema from an appropriate OWL 2 datatype ontology. Web services integration is hardly achievable without broad adoption of standards. The BioNemus application automatically generates standards-based web services from BioMoby ontologies. Semantic representation of web services descriptions simplifies web services search and annotation. The Semantic Web Services Registry (BioSWR) is based on the W3C WSDL ontology and provides a multifaceted view of web services in different formats: OWL 2, WSDL 1.1, WSDL 2.0 and WADL. To demonstrate the benefits of ontology-based web services descriptions, a BioSWR Taverna OSGI plug-in has been developed. A new, experimental, generic Taverna WSDL library has been used in the Galaxy Gears tool, which allows web services to be integrated into Galaxy workflows. The thesis explores the scope of ontology application for the integration of biological data and services, providing a broad set of original tools.

    SEMEDA (Semantic Meta-Database) : ontology based semantic integration of biological databases

    Köhler J. SEMEDA (Semantic Meta-Database) : ontology based semantic integration of biological databases. Bielefeld (Germany): Bielefeld University; 2003. The work presented in this thesis is outlined in the following. The state of the art in the relevant disciplines is introduced and reviewed in chapter 2. This includes, on the one hand, the current state of molecular biological databases, their heterogeneity and the integration of molecular biological databases. On the other hand, the current usage of ontologies in general and with special regard to database integration is described. The principles of semantic database integration introduced in this thesis are new and are also suitable for use in other database integration systems that have to deal with a high number of semantically heterogeneous databases. Therefore, in chapter 3 the newly introduced principles for ontology based semantic database integration are presented independently of their implementation. Chapter 4 introduces the requirements for the implementation of a semantic database integration system (SEMEDA). Several general requirements for the integration of molecular biological systems from the scientific literature are discussed with regard to the feasibility of their implementation in general and in SEMEDA. In addition, the requirements specific to semantic database integration are introduced. The chapter also analyses how the BioDataServer is used to overcome "technical" heterogeneity, so that SEMEDA only has to deal with semantic heterogeneity. In chapter 5, an appropriate data structure for storing ontologies, database metadata and the semantic definitions described in chapter 3 is developed. Subsequently, it is discussed how this data structure can be edited and queried. In chapter 6, SEMEDA's software design, implementation and system architecture are presented. Chapter 7 describes the use of SEMEDA and its interfaces.
The user interface SEMEDA-edit is used to collaboratively edit ontologies and to semantically define databases using ontologies. SEMEDA-query is the query interface that provides uniform access to heterogeneous databases. In addition, a set of procedures exists which can be used by external applications. In order to use SEMEDA to semantically define databases, an appropriate ontology is needed. Although SEMEDA allows ontologies to be built from scratch, generating ontologies is a labour-intensive, time-consuming task, so it would be preferable to use an existing ontology. Therefore, in chapter 8 several ontologies were evaluated for their usability in SEMEDA. The intention was to find out whether a suitable ontology can be found and imported or whether it is more appropriate to build a custom ontology for SEMEDA. It turned out that the existing ontologies were not well suited for semantic database integration. In chapter 9, general and SEMEDA-specific ontology design principles are introduced, which were then followed to build a custom ontology for database integration. The structure of this custom ontology and some issues concerning its use for semantic database integration are explained. In chapter 10, the practical use of SEMEDA is described by two examples. The first section of this chapter shows how SEMEDA supports the building of user schemata for the BioDataServer. The second section describes how the clone database of the RZPD Berlin (Deutsches Ressourcenzentrum für Genomforschung GmbH) is connected to SEMEDA and thus linked to the other databases. In the discussion (chapter 11), SEMEDA is compared to existing database integration systems, especially other ontology based integration systems. It is further discussed how principles for semantic database integration apply to other database integration systems and how they might be implemented there. A database mirror is proposed to improve the overall performance of SEMEDA and the BioDataServer.

    Integration of Logic and Probability in Terminological and Inductive Reasoning

    This thesis deals with Statistical Relational Learning (SRL), a research area combining principles and ideas from three important subfields of Artificial Intelligence: machine learning, knowledge representation and reasoning on uncertainty. Machine learning is the study of systems that improve their behavior over time with experience; the learning process typically involves a search through various generalizations of the examples, in order to discover regularities or classification rules. A wide variety of machine learning techniques have been developed in the past fifty years, most of which used propositional logic as a (limited) representation language. Recently, more expressive knowledge representations have been considered, to cope with a variable number of entities as well as the relationships that hold amongst them. These representations are mostly based on logic that, however, has limitations when reasoning on uncertain domains. These limitations have been lifted allowing a multitude of different formalisms combining probabilistic reasoning with logics, databases or logic programming, where probability theory provides a formal basis for reasoning on uncertainty. In this thesis we consider in particular the proposals for integrating probability in Logic Programming, since the resulting probabilistic logic programming languages present very interesting computational properties. In Probabilistic Logic Programming, the so-called "distribution semantics" has gained a wide popularity. This semantics was introduced for the PRISM language (1995) but is shared by many other languages: Independent Choice Logic, Stochastic Logic Programs, CP-logic, ProbLog and Logic Programs with Annotated Disjunctions (LPADs). A program in one of these languages defines a probability distribution over normal logic programs called worlds.
This distribution is then extended to queries and the probability of a query is obtained by marginalizing the joint distribution of the query and the programs. The languages following the distribution semantics differ in the way they define the distribution over logic programs. The first part of this dissertation presents techniques for learning probabilistic logic programs under the distribution semantics. Two problems are considered: parameter learning and structure learning, that is, the problems of inferring values for the parameters or both the structure and the parameters of the program from data. This work contributes an algorithm for parameter learning, EMBLEM, and two algorithms for structure learning (SLIPCASE and SLIPCOVER) of probabilistic logic programs (in particular LPADs). EMBLEM is based on the Expectation Maximization approach and computes the expectations directly on the Binary Decision Diagrams that are built for inference. SLIPCASE performs a beam search in the space of LPADs while SLIPCOVER performs a beam search in the space of probabilistic clauses and a greedy search in the space of LPADs, improving SLIPCASE performance. All learning approaches have been evaluated in several relational real-world domains. The second part of the thesis concerns the field of Probabilistic Description Logics, where we consider a logical framework suitable for the Semantic Web. Description Logics (DL) are a family of formalisms for representing knowledge. Research in the field of knowledge representation and reasoning is usually focused on methods for providing high-level descriptions of the world that can be effectively used to build intelligent applications. Description Logics have been especially effective as the representation language for formal ontologies. Ontologies model a domain with the definition of concepts and their properties and relations.
Ontologies are the structural frameworks for organizing information and are used in artificial intelligence, the Semantic Web, systems engineering, software engineering, biomedical informatics, etc. They should also allow questions to be asked about the concepts and instances described, through inference procedures. Recently, the issue of representing uncertain information in these domains has led to probabilistic extensions of DLs. The contribution of this dissertation is twofold: (1) a new semantics for the Description Logic SHOIN(D), based on the distribution semantics for probabilistic logic programs, which embeds probability; (2) a probabilistic reasoner for computing the probability of queries from uncertain knowledge bases following this semantics. The explanations of queries are encoded in Binary Decision Diagrams, with the same technique employed in the learning systems developed for LPADs. This approach has been evaluated on a real-world probabilistic ontology.
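The idea that a query's probability is obtained by summing over worlds can be illustrated with a small sketch. It assumes the simplest setting, independent probabilistic facts plus ground definite rules (in the style of ProbLog), and enumerates every world explicitly. It does not use the Binary Decision Diagram techniques of the thesis, and the function name and the alarm example are hypothetical.

```python
from itertools import product


def query_probability(prob_facts, rules, query):
    """Probability of `query` by brute-force enumeration of worlds.

    prob_facts: dict mapping a ground atom to its probability
                (facts are assumed independent).
    rules:      list of (head, body_atoms) ground definite rules.
    """
    atoms = list(prob_facts)
    total = 0.0
    # Each world keeps or drops every probabilistic fact.
    for choices in product([True, False], repeat=len(atoms)):
        weight = 1.0
        world = set()
        for atom, chosen in zip(atoms, choices):
            p = prob_facts[atom]
            weight *= p if chosen else 1.0 - p
            if chosen:
                world.add(atom)
        # Forward chaining: derive everything that holds in this world.
        changed = True
        while changed:
            changed = False
            for head, body in rules:
                if head not in world and all(b in world for b in body):
                    world.add(head)
                    changed = True
        # Marginalize: add the weight of each world entailing the query.
        if query in world:
            total += weight
    return total


facts = {"burglary": 0.1, "earthquake": 0.2}
rules = [("alarm", ["burglary"]), ("alarm", ["earthquake"])]
p_alarm = query_probability(facts, rules, "alarm")
```

Here `p_alarm` equals 1 - 0.9 * 0.8 = 0.28 up to floating-point rounding. Enumeration is exponential in the number of probabilistic facts, which is why practical systems compile explanations into compact structures instead.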

    Enhancement of Query processing on XML data

    Ph.D. (Doctor of Philosophy)

    Fifth Biennial Report : June 1999 - August 2001

    No full text

    Reasoning Algebraically with Description Logics

    Semantic Web applications based on the Web Ontology Language (OWL) often require the use of numbers in class descriptions for expressing cardinality restrictions on properties or even classes. Some of these cardinalities are specified explicitly, but quite a few are entailed and need to be discovered by reasoning procedures. Due to the Description Logic (DL) foundation of OWL, those reasoning services are offered by DL reasoners. Existing DL reasoners employ reasoning procedures that are arithmetically uninformed and replace arithmetic reasoning with "don't know" non-determinism in order to cover all possible cases. This lack of information about arithmetic problems dramatically degrades the performance of DL reasoners in many cases, especially with ontologies relying on the use of Nominals and Qualified Cardinality Restrictions. The contribution of this thesis is twofold: on the theoretical level, it presents algebraic reasoning with DL (ReAl DL) using a sound, complete, and terminating reasoning procedure for the DL SHOQ. ReAl DL combines tableau reasoning procedures with algebraic methods, namely Integer Programming, to ensure arithmetically better informed reasoning. SHOQ extends the standard DL ALC with transitive roles, role hierarchies, qualified cardinality restrictions (QCRs), and nominals, and forms an expressive subset of OWL. Although the proposed algebraic tableau is double exponential in the worst case, it deals with cardinalities with an additional level of information and properties that make the calculus amenable and well suited for optimizations. In order for ReAl DL to have practical merit, suitable optimizations are proposed towards achieving an efficient reasoning approach that addresses the sources of complexity related to nominals and QCRs. On the practical level, a running prototype reasoner (HARD) is implemented based on the proposed calculus and optimizations.
HARD is used to evaluate the practical merit of ReAl DL, as well as the effectiveness of the proposed optimizations. Experimental results based on real-world and synthetic ontologies show that ReAl DL outperforms existing reasoning approaches in handling the interactions between nominals and QCRs. ReAl DL also comes with some interesting features, such as the ability to handle ontologies with cyclic descriptions without adopting special blocking strategies. ReAl DL can form a basis to provide more efficient reasoning support for ontologies using nominals or QCRs.
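To give a rough sense of what treating cardinality restrictions arithmetically can mean, the sketch below turns "at least n" and "at most n" restrictions on the fillers of a single role into integer constraints and searches for a solution. It is a toy under stated assumptions, not the ReAl DL calculus or the HARD reasoner: the function name, the encoding, and the bounded brute-force search (used here in place of an Integer Programming solver) are all choices made for this illustration.

```python
from itertools import product


def cardinalities_satisfiable(at_least, at_most, concepts, bound):
    """Search for filler counts that meet all cardinality restrictions.

    at_least / at_most: lists of (n, concept) for one role.
    concepts:           the concept names mentioned in the restrictions.
    bound:              largest count tried per region (search limit).
    """
    # A region is one combination of concept memberships for a filler.
    regions = list(product([False, True], repeat=len(concepts)))
    index = {concept: i for i, concept in enumerate(concepts)}

    # Try every assignment of a non-negative count to each region.
    for counts in product(range(bound + 1), repeat=len(regions)):

        def fillers_in(concept):
            """Number of fillers that are members of `concept`."""
            i = index[concept]
            return sum(n for region, n in zip(regions, counts) if region[i])

        if all(fillers_in(c) >= n for n, c in at_least) and all(
            fillers_in(c) <= n for n, c in at_most
        ):
            return True
    return False


# At least 3 fillers in C but at most 2 fillers in C: a numeric clash.
clash = cardinalities_satisfiable([(3, "C")], [(2, "C")], ["C", "D"], bound=3)

# At least 2 in C, at least 1 in D, at most 2 in C: counts exist.
fine = cardinalities_satisfiable(
    [(2, "C"), (1, "D")], [(2, "C")], ["C", "D"], bound=3
)
```

The clash in the first example is found by arithmetic alone, without generating individual fillers and merging them one case at a time. The search only considers counts up to `bound`, so a `False` result is conclusive only when `bound` is large enough for the restrictions at hand.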