Modelling epistasis in genetic disease using Petri nets, evolutionary computation and frequent itemset mining
Petri nets are useful for mathematically modelling disease-causing genetic epistasis. A Petri net model of an interaction has the potential to lead to biological insight into the cause of a genetic disease. However, defining a Petri net by hand for a particular interaction is extremely difficult because of the sheer complexity of the problem and degrees of freedom inherent in a Petri net’s architecture.
We therefore propose a novel method, based on evolutionary computation and data mining, for automatically constructing Petri net models of non-linear gene interactions. The method comprises two main steps. Firstly, an initial partial Petri net is set up with several repeated sub-nets that model individual genes, and a set of constraints comprising relevant common-sense and biological knowledge is defined. These constraints characterise the class of Petri nets that are desired. Secondly, this initial Petri net structure and the constraints are used as the input to a genetic algorithm. The genetic algorithm searches for a Petri net architecture that is a superset of the initial net and conforms to all of the given constraints. The evaluation function that we employ gives equal weighting to the accuracy of the net and its parsimony.
We demonstrate our method using an epistatic model related to the presence of digital ulcers in systemic sclerosis patients that was recently reported in the literature. Our results show that although individual “perfect” Petri nets can frequently be discovered for this interaction, the true value of this approach lies in generating many different perfect nets and applying data mining techniques to them in order to elucidate common and statistically significant patterns of interaction.
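The mining step described above can be illustrated with a minimal sketch. It assumes each discovered net is represented as a set of arcs, written here as (source, target) pairs; that representation, the function name `frequent_itemsets`, and the toy data are all hypothetical and are not taken from the paper. The sketch only counts how often arc combinations recur across nets (their support) and does not attempt the significance testing the abstract mentions.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(nets, min_support, max_size=2):
    """Return {arc combination: support} for combinations of up to
    `max_size` arcs that occur in at least `min_support` (a fraction)
    of the given nets. Brute-force enumeration, for illustration only."""
    counts = Counter()
    for arcs in nets:
        items = sorted(set(arcs))  # fixed order so combinations are comparable
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    total = len(nets)
    return {combo: n / total for combo, n in counts.items()
            if n / total >= min_support}


# Hypothetical toy data: three "perfect" nets, each a set of arcs.
nets = [
    {("geneA", "t1"), ("geneB", "t1"), ("t1", "disease")},
    {("geneA", "t1"), ("geneB", "t1"), ("t2", "disease")},
    {("geneA", "t1"), ("geneB", "t2"), ("t1", "disease")},
]

result = frequent_itemsets(nets, min_support=0.6)
for combo, support in sorted(result.items(), key=lambda kv: -kv[1]):
    print(f"{support:.2f}  {combo}")
```

Enumerating every combination grows quickly with net size, so a real analysis would more likely use an Apriori-style or FP-growth miner; the brute-force version is kept here only because it makes the idea of support easy to see.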
Deciphering ocean carbon in a changing world
Author Posting. © The Author(s), 2016. This is the author's version of the work. It is posted here for personal use, not for redistribution. The definitive version was published in Proceedings of the National Academy of Sciences of the United States of America 113 (2016): 3143-3151, doi:10.1073/pnas.1514645113. Dissolved organic matter (DOM) in the oceans is one of the largest pools of reduced carbon on Earth, comparable in size to the atmospheric CO2 reservoir. A vast number of compounds are present in DOM and they play important roles in all major element cycles, contribute to the storage of atmospheric CO2 in the ocean, support marine ecosystems, and facilitate interactions between organisms. At the heart of the DOM cycle lie molecular-level relationships between the individual compounds in DOM and the members of the ocean microbiome that produce and consume them. In the past, these connections have eluded clear definition because of the sheer numerical complexity of both DOM molecules and microorganisms. Emerging tools in analytical chemistry, microbiology and informatics are breaking down the barriers to a fuller appreciation of these connections. Here we highlight questions being addressed using recent methodological and technological developments in those fields and consider how these advances are transforming our understanding of some of the most important reactions of the marine carbon cycle. Support was provided by National Science Foundation grants OCE1356010, OCE1154320, and OCE1356890, and Gordon and Betty Moore Foundation Grant #3304
A Markov classification model for metabolic pathways
Background: This paper considers the problem of identifying pathways through metabolic networks that relate to a specific biological response. Our proposed model, HME3M, first identifies frequently traversed network paths using a Markov mixture model. Then by employing a hierarchical mixture of experts, separate classifiers are built using information specific to each path and combined into an ensemble prediction for the response. Results: We compared the performance of HME3M with logistic regression and support vector machines (SVM) for both simulated pathways and on two metabolic networks, glycolysis and the pentose phosphate pathway for Arabidopsis thaliana. We use AltGenExpress microarray data and focus on the pathway differences in the developmental stages and stress responses of Arabidopsis. The results clearly show that HME3M outperformed the comparison methods in the presence of increasing network complexity and pathway noise. Furthermore an analysis of the paths identified by HME3M for each metabolic network confirmed known biological responses of Arabidopsis. Conclusions: This paper clearly shows HME3M to be an accurate and robust method for classifying metabolic pathways. HME3M is shown to outperform all comparison methods and further is capable of identifying known biologically active pathways within microarray data.
Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?
The organization and mining of malaria genomic and post-genomic data is
highly motivated by the necessity to predict and characterize new biological
targets and new drugs. Biological targets are sought in a biological space
designed from the genomic data of Plasmodium falciparum, but also using the
millions of genomic data from other species. Drug candidates are sought in a
chemical space containing the millions of small molecules stored in public and
private chemolibraries. Data management should therefore be as reliable and
versatile as possible. In this context, we examined five aspects of the
organization and mining of malaria genomic and post-genomic data: 1) the
comparison of protein sequences including compositionally atypical malaria
sequences, 2) the high throughput reconstruction of molecular phylogenies, 3)
the representation of biological processes particularly metabolic pathways, 4)
the versatile methods to integrate genomic data, biological representations and
functional profiling obtained from X-omic experiments after drug treatments and
5) the determination and prediction of protein structures and their molecular
docking with drug candidate structures. Progress toward a grid-enabled
chemogenomic knowledge space is discussed. Comment: 43 pages, 4 figures, to appear in Malaria Journa
Pathway based microarray analysis based on multi-membership gene regulation
This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University. Recent developments in automation and novel experimental techniques have led to the accumulation of vast amounts of biological data and the emergence of numerous databases to store the wealth of information. Consequently, bioinformatics has drawn considerable attention, accompanied by the development of a plethora of tools for the analysis of biological data. DNA microarrays constitute a prominent example of a high-throughput experimental technique that has required a substantial contribution from bioinformatics tools. Following its popularity, there is an ongoing effort to integrate gene expression with other types of data in a common analytical approach. Pathway based microarray analysis seeks to use microarray data in conjunction with biochemical pathway data and look for a coordinated change in the expression of genes constituting a pathway. However, it has been observed that genes in a pathway may show variable expression, with some appearing activated while others are repressed. This thesis aims to contribute to pathway based microarray analysis and assist the interpretation of such observations, based on the fact that in all organisms a substantial number of genes take part in more than one biochemical pathway. It explores the hypothesis that the expression of such genes represents a net effect of their contribution to all their constituent pathways, applying statistical and data mining approaches. A heuristic search methodology is proposed to manipulate the pathway contribution of genes to follow underlying trends and interpret microarray results centred on pathway behaviour. The methodology is further refined to account for distinct genes encoding enzymes that catalyse the same reaction, and applied to modules, shorter chains of reactions forming sub-networks within pathways.
Results based on various datasets are discussed, showing that the methodology is promising and may assist a biologist to decipher the biochemical state of an organism, in experiments where pathways exhibit variable expression. School of Information Systems, Computing and Mathematics, Brunel Universit
Mining metabolic pathways through gene expression
Motivation: An observed metabolic response is the result of the coordinated activation of, and interaction between, multiple genetic pathways. However, the complex structure of metabolism has meant that it is not fully understood which pathways are required to produce an observed metabolic response. In this article, we propose an approach that can identify the genetic pathways which dictate the response of a metabolic network to specific experimental conditions
Statistical and Functional Analysis of Genomic and Proteomic Data
High-throughput technologies have led to an explosion in the availability of data at the genome scale. Such data provide important information about cellular processes and causes of human diseases, as well as for drug discovery. Deciphering the biologically relevant results from these data requires comprehensive analytical methods. In this dissertation, we present methods for gene and protein expression data analysis. Our major contributions include a method for differential in-gel electrophoresis data analysis capable of removing protein-specific dye bias in the data, a method for finding unknown biological groups using expression data, and a method for identifying active and inactive signaling pathways in a gene expression signature based on the enrichment of downstream target genes of pathways
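The abstract does not say which statistic is used to measure enrichment of downstream target genes. As one common possibility, and purely as an assumption, the sketch below computes a one-sided hypergeometric tail probability for the overlap between a pathway's target genes and a gene expression signature. The function name `hypergeom_enrichment_p` and all the numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, targets, signature, overlap):
    """Probability of seeing at least `overlap` pathway targets in a
    signature of `signature` genes drawn without replacement from
    `population` genes, of which `targets` are downstream targets.
    Assumed statistic for illustration; not taken from the dissertation."""
    upper = min(targets, signature)
    total = comb(population, signature)
    tail = sum(
        comb(targets, k) * comb(population - targets, signature - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 genes measured, 4 are targets of the pathway,
# the signature contains 3 genes, and 2 of those are targets.
p_value = hypergeom_enrichment_p(population=10, targets=4, signature=3, overlap=2)
print(f"p = {p_value:.4f}")
```

A small tail probability would suggest that the pathway's targets are over-represented in the signature. Judging a pathway inactive would need a separate criterion, which this sketch does not cover.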