73 research outputs found

    Inferring the function of genes from synthetic lethal mutations

    Techniques for detecting synthetic lethal mutations in double gene deletion experiments are emerging as a powerful tool for analysing genes in parallel or overlapping pathways with a shared function. This paper introduces a logic-based approach that uses synthetic lethal mutations for mapping genes of unknown function to enzymes in a known metabolic network. We show how such mappings can be automatically computed by a logical learning system called eXtended Hybrid Abductive Inductive Learning (XHAIL).

    The Best Model of a Cat Is Several Cats

    Modern biotechnology is emerging at the intersection of engineering, biology, physics, and computer science. As such, it carries with it history from several disparate fields of research, including a strong tradition of deductive reasoning derived primarily from discovery-focused molecular biology and physics. Engineering biological systems is a complex undertaking requiring a broader set of epistemic tools and methods than is usually applied in today's discovery-based research. Inductive reasoning, as commonly used in computer science, has proven to be a very efficient approach to building knowledge about complex megadimensional datasets, including synthetic biology applications. The authors conclude that the multi-heuristic nature of modern biotechnology makes it an engineering field primed for inductive reasoning to complement the dominating deductive tradition.

    Annotation concept synthesis and enrichment analysis: a logic-based approach to the interpretation of high-throughput experiments

    Motivation: Annotation Enrichment Analysis (AEA) is a widely used analytical approach to process data generated by high-throughput genomic and proteomic experiments such as gene expression microarrays. The analysis uncovers and summarizes discriminating background information (e.g. GO annotations) for sets of genes identified by experiments (e.g. a set of differentially expressed genes, a cluster). The discovered information is utilized by human experts to find biological interpretations of the experiments
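
To make the idea of enrichment concrete, here is a minimal sketch of how a single annotation term might be scored for a gene set. The abstract above does not say which statistic the authors use; the one-sided hypergeometric test shown here is a common choice in enrichment analysis and is only an assumption. The function name and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, selected, overlap):
    """One-sided hypergeometric p-value, P(X >= overlap).

    population: number of genes in the background set
    annotated:  background genes carrying the annotation term
    selected:   genes picked out by the experiment (e.g. a cluster)
    overlap:    selected genes that carry the term
    """
    upper = min(annotated, selected)
    total = comb(population, selected)
    # Sum the probability of seeing `overlap` or more annotated genes by chance.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Made-up toy numbers: 20 background genes, 5 annotated, 4 selected, 3 overlapping.
p_value = hypergeom_enrichment_p(population=20, annotated=5, selected=4, overlap=3)
print(f"p = {p_value:.4f}")
```

A small p-value suggests the term is over-represented in the selected set. In practice many terms are tested at once, so a multiple-testing correction would normally follow.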

    Biomedical Discovery Acceleration, with Applications to Craniofacial Development

    The profusion of high-throughput instruments and the explosion of new results in the scientific literature, particularly in molecular biomedicine, are both a blessing and a curse to the bench researcher. Even knowledgeable and experienced scientists can benefit from computational tools that help navigate this vast and rapidly evolving terrain. In this paper, we describe a novel computational approach to this challenge, a knowledge-based system that combines reading, reasoning, and reporting methods to facilitate analysis of experimental data. Reading methods extract information from external resources, either by parsing structured data or using biomedical language processing to extract information from unstructured data, and track knowledge provenance. Reasoning methods enrich the knowledge that results from reading by, for example, noting two genes that are annotated to the same ontology term or database entry. Reasoning is also used to combine all sources into a knowledge network that represents the integration of all sorts of relationships between a pair of genes, and to calculate a combined reliability score. Reporting methods combine the knowledge network with a congruent network constructed from experimental data and visualize the combined network in a tool that facilitates the knowledge-based analysis of that data. An implementation of this approach, called the Hanalyzer, is demonstrated on a large-scale gene expression array dataset relevant to craniofacial development. The use of the tool was critical in the creation of hypotheses regarding the roles of four genes never previously characterized as involved in craniofacial development; each of these hypotheses was validated by further experimental work.

    05051 Abstracts Collection -- Probabilistic, Logical and Relational Learning - Towards a Synthesis

    From 30.01.05 to 04.02.05, the Dagstuhl Seminar 05051 "Probabilistic, Logical and Relational Learning - Towards a Synthesis" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.

    From cancer gene expression to protein interaction: Interaction prediction, network reasoning and applications in pancreatic cancer

    Microarray technologies enable scientists to identify co-expressed genes at large scale. However, gene expression analysis does not show functional relationships between co-expressed genes. There is a demand for effective approaches to analyse gene expression data to enable biological discoveries that can lead to identification of markers or therapeutic targets of many diseases. In cancer research, a number of gene expression screens have been carried out to identify genes differentially expressed in cancerous tissue such as Pancreatic Ductal Adenocarcinoma (PDAC). PDAC carries a very poor prognosis: it eludes early detection and is characterised by its aggressiveness and resistance to currently available therapies. To identify molecular markers and suitable targets, there exists a research effort that maps differentially expressed genes to protein interactions to gain an understanding at systems level. Such interaction networks have a complex interconnected structure, the understanding of which is not a trivial task. Several formal approaches use simulation to support the investigation of such networks. These approaches suffer from missing knowledge concerning biological systems. Reasoning, on the other hand, has the advantage of dealing with incomplete and partial knowledge of the network. The initial approach adopted was to provide an algorithm that utilises a network-centric approach to pancreatic cancer, by re-constructing networks from known interactions and predicting novel protein interactions from structural templates. This method was applied to a data set of co-expressed PDAC genes. To this end, structural domains for the gene products are identified by using threading, which is a 3D structure prediction technique.
Next, the Protein Structure Interaction Database (SCOPPI), a database that classifies and annotates domain interactions derived from all known protein structures, is used to find templates of structurally interacting domains. Moreover, a network of related biological pathways for the PDAC data was constructed. In order to reason over molecular networks that are affected by dysregulation of gene expression, BioRevise was implemented. It is a belief revision system where the inhibition behaviour of reactions is modelled using extended logic programming. The system computes a minimal set of enzymes whose malfunction explains the abnormal expression levels of observed metabolites or enzymes. As a result of this research, two complementary approaches for the analysis of pancreatic cancer gene expression data are presented. Using the first approach, the pathways found to be largely affected in pancreatic cancer are signal transduction, actin cytoskeleton regulation, cell growth and cell communication. The analysis indicates that the alteration of the calcium pathway plays an important role in pancreas-specific tumorigenesis. Furthermore, the structural prediction method reveals ~700 potential protein-protein interactions from the PDAC microarray data, among them 81 novel interactions, such as serine/threonine kinase CDC2L1 interacting with cyclin-dependent kinase inhibitor CDKN3, and the tissue factor pathway inhibitor 2 (TFPI2) interacting with the transmembrane protease serine 4 (TMPRSS4). These resulting genes were further investigated and some were found to be potential therapeutic markers for PDAC. Since TMPRSS4 is involved in metastasis formation, it is hypothesised that the upregulation of TMPRSS4 and the downregulation of its predicted inhibitor TFPI2 play an important role in this process. The predicted protein-protein network inspired the analysis of the data from two other perspectives.
The resulting protein-protein interaction network highlighted the importance of the co-expression of KLK6 and KLK10 as prognostic factors for survival in PDAC, as well as the construction of a PDAC-specific apoptosis pathway to study different effects of multiple gene silencing in order to reactivate apoptosis in PDAC. Using the second approach, the behaviour of biological interaction networks is modelled in a computational logic formalism, which enables reasoning over the networks and explains the abnormal behaviour of their components. The usability of the BioRevise system is demonstrated through two examples, a metabolic disorder and a deficiency in a pancreatic cancer associated pathway. The system successfully identified the inhibition of the enzyme glucose-6-phosphatase as responsible for glycogen storage disease type I, which according to the literature is the main cause of this disease. Furthermore, BioRevise was used to model reaction inhibition in the glycolysis pathway, which is known to be affected by pancreatic cancer.
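
The abstract describes computing a minimal set of enzymes whose malfunction explains abnormal observations. The sketch below illustrates that idea only. It is a brute-force search over a hypothetical four-reaction toy pathway, not BioRevise's belief revision in extended logic programming. The pathway, the enzyme and metabolite names, and the reading of "minimal" as smallest cardinality are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy pathway: (enzyme, substrate, product). Names are illustrative only.
REACTIONS = [
    ("E1", "A", "B"),
    ("E2", "B", "C"),
    ("E3", "A", "D"),
    ("E4", "D", "C"),
]
SOURCES = {"A"}  # metabolites assumed to be always available


def producible(broken):
    """Metabolites reachable from SOURCES using only reactions whose enzyme works."""
    present = set(SOURCES)
    changed = True
    while changed:
        changed = False
        for enzyme, substrate, product in REACTIONS:
            if enzyme not in broken and substrate in present and product not in present:
                present.add(product)
                changed = True
    return present


def minimal_explanations(observed_present, observed_absent):
    """Smallest sets of broken enzymes consistent with the observations."""
    enzymes = sorted({enzyme for enzyme, _, _ in REACTIONS})
    for size in range(len(enzymes) + 1):
        hits = []
        for combo in combinations(enzymes, size):
            present = producible(set(combo))
            if observed_present <= present and not (observed_absent & present):
                hits.append(set(combo))
        if hits:
            return hits  # stop at the first size that yields any explanation
    return []


# Observation: B is detected but C is missing.
explanations = minimal_explanations(observed_present={"B"}, observed_absent={"C"})
print(explanations)
```

In this toy network C can be made by two routes, so no single broken enzyme accounts for its absence while B is still present; the search therefore returns pairs of enzymes. Exhaustive enumeration grows exponentially with the number of enzymes, which is one reason a logic-based formulation is attractive for realistic networks.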

    Zsyntax: A Formal Language for Molecular Biology with Projected Applications in Text Mining and Biological Prediction

    We propose a formal language that allows for transposing biological information precisely and rigorously into machine-readable information. This language, which we call Zsyntax (where Z stands for the Greek word ζωή, life), is grounded on a particular type of non-classical logic, and it can be used to write algorithms and computer programs. We present it as a first step towards a comprehensive formal language for molecular biology in which any biological process can be written and analyzed as a sort of logical “deduction”. Moreover, we illustrate the potential value of this language, both in the field of text mining and in that of biological prediction