
    Beyond the grounding bottleneck: Datalog techniques for inference in probabilistic logic programs

    State-of-the-art inference approaches in probabilistic logic programming typically start by computing the relevant ground program with respect to the queries of interest, and then use this program for probabilistic inference via knowledge compilation and weighted model counting. We propose an alternative approach that uses efficient Datalog techniques to integrate knowledge compilation with forward reasoning over a non-ground program. This effectively eliminates the grounding bottleneck that has so far prevented the application of probabilistic logic programming in query answering scenarios over knowledge graphs, while also providing fast approximations on classical benchmarks in the field.
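
    To make the inference task in the abstract above concrete, the following is a minimal sketch of what a probabilistic logic program computes: the probability that a query holds, summed over all possible worlds. It is a brute-force illustration only, not the Datalog-based method the paper proposes. The graph, the probabilities, and the names `PROB_EDGES`, `reachable`, and `query_probability` are assumptions made for this example.

```python
from itertools import product

# Hypothetical probabilistic facts: each edge exists independently
# with the given probability (illustrative values, not from the paper).
PROB_EDGES = {("a", "b"): 0.5, ("b", "c"): 0.5, ("a", "c"): 0.5}


def reachable(edges, source):
    """Forward reasoning: apply the path rules until a fixpoint is reached.

    Encodes path(X, Y) :- edge(X, Y) and
    path(X, Y) :- path(X, Z), edge(Z, Y) for a fixed source X.
    """
    reached = {source}
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            if u in reached and v not in reached:
                reached.add(v)
                changed = True
    return reached


def query_probability(prob_edges, source, target):
    """Sum the weights of all possible worlds in which target is reachable."""
    facts = list(prob_edges)
    total = 0.0
    # Enumerate every truth assignment to the probabilistic facts.
    for choice in product([True, False], repeat=len(facts)):
        weight = 1.0
        world = []
        for fact, present in zip(facts, choice):
            p = prob_edges[fact]
            weight *= p if present else 1.0 - p
            if present:
                world.append(fact)
        if target in reachable(world, source):
            total += weight
    return total


print(query_probability(PROB_EDGES, "a", "c"))
```

    The enumeration is exponential in the number of probabilistic facts, which is why practical systems rely on knowledge compilation and weighted model counting instead.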

    PheNetic : network-based interpretation of molecular profiling data

    Molecular profiling experiments have become standard in current wet-lab practices. Classically, enrichment analysis has been used to identify biological functions related to these experimental results. Combining molecular profiling results with the wealth of currently available interactomics data, however, offers the opportunity to identify the molecular mechanism behind an observed molecular phenotype. In this paper, we therefore introduce 'PheNetic', a user-friendly web server for inferring a sub-network based on probabilistic logical querying. PheNetic extracts from an interactome the sub-network that best explains genes prioritized through a molecular profiling experiment. Depending on its run mode, PheNetic searches either for a regulatory mechanism that gave rise to the observed molecular phenotype or for the pathways (in)activated in the molecular phenotype. The web server provides access to a large number of interactomes, making sub-network inference readily applicable to a wide variety of organisms. The inferred sub-networks can be interactively visualized in the browser. PheNetic's method and use are illustrated using an example analysis of differential expression results of ampicillin-treated Escherichia coli cells. The PheNetic web service is available at http://bioinformatics.intec.ugent.be/phenetic/

    Network-based identification of driver pathways in clonal systems

    Highly ethanol-tolerant bacteria for the production of biofuels, bacterial pathogens that are resistant to antibiotics, and cancer cells are examples of phenotypes that are of importance to society and are currently being studied. In order to better understand these phenotypes and their underlying genotype-phenotype relationships, it is now commonplace to investigate DNA and expression profiles using next generation sequencing (NGS) and microarray techniques. These techniques generate large amounts of omics data, which result in lists of genes that have mutations or expression profiles which potentially contribute to the phenotype. These lists often include a multitude of genes and are troublesome to verify manually, as performing literature studies and wet-lab experiments for a large number of genes consumes a great deal of time and resources. Therefore, (computational) methods are required which can narrow these gene lists down by removing the generally abundant false positives and can ideally provide additional information on the relationships between the selected genes. Other high-throughput techniques such as yeast two-hybrid (Y2H), ChIP-Seq and ChIP-chip, but also a myriad of small-scale experiments and predictive computational methods, have generated a treasure of interactomics data over the last decade, most of which is now publicly available. By combining this data into a biological interaction network, which contains all molecular pathways that an organism can utilize and thus is the equivalent of the blueprint of an organism, it is possible to integrate the omics data obtained from experiments with these biological interaction networks. Biological interaction networks are key to the computational methods presented in this thesis as they enable methods to account for important relations between genes (and gene products). 
In doing so, it is possible not only to identify interesting genes but also to uncover molecular processes important to the phenotype. As the best way to analyze omics data from an interesting phenotype varies widely with the experimental setup and the available data, multiple methods were developed and applied in the context of this thesis. In a first approach, an existing method (PheNetic) was applied to a consortium of three bacterial species that together are able to efficiently degrade a herbicide, although none of the species can do so efficiently on its own. For each of the species, expression data (RNA-seq) was generated for the consortium and for the species in isolation. PheNetic identified molecular pathways which were differentially expressed and likely contribute to a cross-feeding mechanism between the species in the consortium. Having obtained proof-of-concept, PheNetic was adapted to cope with experimental evolution datasets in which, in addition to expression data, genomics data was also available. Two publicly available datasets were analyzed: Amikacin resistance in E. coli and coexisting ecotypes in E. coli. The results revealed both well-known and newly found molecular pathways involved in these phenotypes. Experimental evolution sometimes generates datasets consisting of mutator phenotypes, which have high mutation rates. These datasets are hard to analyze due to the large amount of noise (most mutations have no effect on the phenotype). To address this, IAMBEE was developed. IAMBEE is able to analyze genomic datasets from evolution experiments even if they contain mutator phenotypes. IAMBEE was tested using an E. coli evolution experiment in which cells were exposed to increasing concentrations of ethanol. The results were validated in the wet-lab. In addition to methods for analysis of causal mutations and mechanisms in bacteria, a method for the identification of causal molecular pathways in cancer was developed. 
As bacteria and cancerous cells are both clonal, they can be treated similarly in this context. The big differences are the amount of data available (many more samples are available in cancer) and the fact that cancer is a complex and heterogeneous phenotype. Therefore we developed SSA-ME, which makes use of the concept that a causal molecular pathway has at most one mutation in a cancerous cell (mutual exclusivity). However, enforcing this criterion is computationally hard. SSA-ME is designed to cope with this problem and search for mutually exclusive patterns in relatively large datasets. SSA-ME was tested on cancer data from the TCGA PAN-cancer dataset. From the results we could, in addition to already known molecular pathways and mutated genes, predict the involvement of a few rarely mutated genes. (nrpages: 246; status: published)
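
    The mutual exclusivity criterion mentioned in the abstract above can be illustrated with a small sketch: for a candidate gene set, count how many samples carry at least one mutation in the set and how many carry exactly one. This is not SSA-ME's scoring or search procedure, only an illustration of the underlying idea. The function `exclusivity_stats` and the data in `EXAMPLE` are hypothetical.

```python
from collections import Counter

# Hypothetical mutation data: gene -> set of samples in which it is mutated.
EXAMPLE = {
    "G1": {"s1", "s2"},
    "G2": {"s3"},
    "G3": {"s2", "s4"},
}


def exclusivity_stats(mutations, genes):
    """Return (covered, exclusive) for a candidate gene set.

    covered:   number of samples with at least one mutated gene in the set
    exclusive: number of samples with exactly one mutated gene in the set
    """
    hits_per_sample = Counter()
    for gene in genes:
        for sample in mutations.get(gene, ()):
            hits_per_sample[sample] += 1
    covered = len(hits_per_sample)
    exclusive = sum(1 for hits in hits_per_sample.values() if hits == 1)
    return covered, exclusive


covered, exclusive = exclusivity_stats(EXAMPLE, ["G1", "G2", "G3"])
print(covered, exclusive)
```

    A gene set is perfectly mutually exclusive when the two counts are equal. Searching over all gene sets for high coverage with high exclusivity is the part that becomes computationally hard.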

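    Functional annotation of proteins based on relational representation in the context of systems biology (Anotación funcional de proteínas basada en representación relacional en el entorno de la biología de sistemas)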

    Functional annotation is an open and important research topic in Molecular Biology. Defining function at the level of terminology is difficult, because there is no unified criterion and because the function of a single protein spans many levels. Given these difficulties, the way to determine a protein's function is to annotate it with several terms from different vocabularies. Proteins carry out their function together with other proteins, forming complexes. These interactions are represented in a network of experimentally verified protein-protein interactions. Analyzing and using the interaction network is a task of interest because of the large number of associations and the multiple ways in which one protein can influence the function of others. Therefore, this thesis focuses on network-based prediction of functional annotation. This complex scenario clearly cannot be tackled without computational tools. Indeed, there is considerable activity in Computational Biology devoted specifically to this topic. This thesis is part of this effort to apply computational methods to biological problems in the area of Systems Biology. The approach fits within Systems Biology because it does not analyze function in isolation for each molecule, but at the system level, taking into account all the relations among genes and proteins linked at different levels. To take advantage of all these biological relations and to preserve their structural semantics, this thesis proposes using Relational Representation, which is particularly well suited to this domain. 
Over this representation, multiple transformations and Artificial Intelligence techniques are applied to extract knowledge from the related proteins and to propose new functions through the prediction of functional associations between proteins. The main proposal of this thesis is to characterize the function of proteins and genes based on network information, through Relational Representation and Machine Learning. Specifically, starting from a relational representation for functional annotation, we seek the computational design needed to solve two specific, distinct, and biologically interesting problems. The first is predicting functional associations between pairs of proteins in E. coli, and the second is extending biological pathways in humans. 
Both are assessed in terms of computational performance and biological interpretation. We also propose new putative functional annotations of proteins to be verified experimentally. In addition, the thesis explores diverse approaches to knowledge representation and learning techniques, proposing specific strategies for other bioinformatics problems, especially those shaped by relational information and multi-class and multi-label learning.

    Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Fourth Conference
