    Blueprint: descrição da complexidade da regulação metabólica através da reconstrução de modelos metabólicos e regulatórios integrados

    Doctoral thesis in Biomedical Engineering. Genome-Scale Metabolic (GEM) models can predict the phenotypic behavior of organisms. However, these models can yield incorrect predictions, because certain metabolic processes are controlled by regulatory mechanisms. Accordingly, many methodologies have been developed to extend the reconstruction and analysis of GEM models by integrating Transcriptional Regulatory Networks (TRNs). Nevertheless, reconstructing integrated genome-scale regulatory and metabolic models for diverse prokaryotes remains an open challenge. In this work, we propose several tools to assist the reconstruction and analysis of regulatory and metabolic models. We start by describing Biological networks constraint-based In Silico Optimization (BioISO), a novel tool to assist the manual curation of GEM models. BioISO uses a recursive relation-like algorithm and Flux Balance Analysis (FBA) to evaluate and guide the debugging of in silico phenotype predictions. Hence, this tool can reduce the number of artifacts in GEM models, decreasing the burden of model refinement and curation. Next, a state-of-the-art repository of TRNs for prokaryotes, the Prokaryotic Transcriptional Regulatory Network Database (ProTReND), was implemented to support the reconstruction and integration of TRNs into GEM models. ProTReND comprises several tools to extract and process regulatory information available in external resources. More importantly, it contains a data integration system that unifies scattered regulatory data into standardized TRNs at the genome scale. In addition, ProTReND provides a web application with full access to the regulatory data. Finally, we have developed a new modeling framework in MEWpy to define, simulate and analyze GEnome-scale Regulatory and Metabolic (GERM) models. The GERM model framework can read a GEM model and/or a TRN from several file formats. It assembles a GERM model using the regulatory interactions and Genes-Proteins-Reactions (GPR) rules encoded in the GEM model and the TRN. In addition, this modeling framework supports several phenotype prediction methods designed specifically for regulatory-metabolic models. I would like to thank Fundação para a Ciência e Tecnologia for the Ph.D. studentship I was awarded (SFRH/BD/139198/2018).
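
The two-layer idea behind a GERM model (regulatory interactions decide which genes are on, and GPR rules decide which reactions those genes enable) can be illustrated with a minimal sketch. This is not the MEWpy API: the function names, the rule syntax, and the toy genes and reactions below are all hypothetical and chosen only for illustration.

```python
import ast


def eval_bool(rule, state):
    """Evaluate a Boolean rule such as "a and (b or not c)" against a dict of states.

    Only names, and/or, and not are accepted; anything else raises ValueError.
    """
    def visit(node):
        if isinstance(node, ast.BoolOp):
            values = [visit(v) for v in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not visit(node.operand)
        if isinstance(node, ast.Name):
            return bool(state[node.id])
        raise ValueError("unsupported expression in rule: " + rule)

    return visit(ast.parse(rule, mode="eval").body)


def active_reactions(regulators, regulatory_rules, gpr_rules):
    """Propagate regulator states to genes, then gene states to reactions."""
    # Regulatory layer: regulator/environment states -> gene on/off states.
    gene_state = {gene: eval_bool(rule, regulators)
                  for gene, rule in regulatory_rules.items()}
    # GPR layer: gene on/off states -> reaction on/off states.
    return {rxn: eval_bool(rule, gene_state) for rxn, rule in gpr_rules.items()}


# Hypothetical toy network (not taken from any real model).
regulators = {"glucose": True, "crp": False}
regulatory_rules = {
    "lacZ": "crp and not glucose",
    "pfkA": "glucose",
    "pfkB": "not crp",
}
gpr_rules = {"LACZ": "lacZ", "PFK": "pfkA or pfkB"}

print(active_reactions(regulators, regulatory_rules, gpr_rules))
```

In a constraint-based setting, a reaction evaluated as inactive would typically have its flux bounds set to zero before running FBA; that step is omitted here to keep the sketch dependency-free.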

    Bioinformatic-driven search for metabolic biomarkers in disease

    The search for and validation of novel disease biomarkers requires the complementary power of professional study planning and execution, modern profiling technologies, and related bioinformatics tools for data analysis and interpretation. Biomarkers have considerable impact on the care of patients and are urgently needed for advancing diagnostics, prognostics and treatment of disease. This survey article highlights emerging bioinformatics methods for biomarker discovery in clinical metabolomics, focusing on data preprocessing and consolidation and on the data-driven search, verification, prioritization and biological interpretation of putative metabolic candidate biomarkers in disease. In particular, data mining tools suitable for omic data gathered from the most frequently used types of experimental design, such as case-control or longitudinal biomarker cohort studies, are reviewed, and case examples of selected discovery steps are described in more detail. This review demonstrates that clinical bioinformatics has evolved into an essential element of biomarker discovery, translating innovations and successes in profiling technologies and bioinformatics into clinical application.

    Protein-protein interactions and metabolic pathways reconstruction of Caenorhabditis elegans

    Metabolic networks are the collections of all cellular activities taking place in a living cell and all the relationships among biological elements of the cell, including genes, proteins, enzymes, metabolites, and reactions. They provide a better understanding of the cellular mechanisms and phenotypic characteristics of the studied organism. To reconstruct a metabolic network, the interactions among genes and their molecular attributes, along with their functions, must be known. Using this information, proteins are distributed among pathways, which are sub-networks of a greater metabolic network. Proteins that carry out the various steps of a biological process operate in the same pathway. The metabolic network of Caenorhabditis elegans was reconstructed based on current genomic information obtained from the KEGG database and also found in SWISS-PROT and WormBase. Assuming that proteins operating in a pathway are interacting proteins, a map of the currently available protein-protein interactions of the studied organism was assembled. This map contains all known protein-protein interactions collected from various sources to date. The topology of the reconstructed network was briefly studied, and the role of key enzymes in the interconnectivity of the network was analysed. The analysis showed that the shortest metabolic paths represent the most probable routes taken by the organism when endogenous nutrient sources are available to it. Nonetheless, there are alternate paths that allow the organism to survive under extraneous variations. The signature content of proteins was used to reveal protein interactions, on the premise that when two proteins share one or more signatures in their primary structures, they are more likely to interact. The signature content of proteins was used to measure the extent of similarity between pairs of proteins based on a binary similarity score. 
Pairs of proteins with a binary similarity score greater than a threshold corresponding to a 95% confidence level were predicted as interacting proteins. The reliability of the predicted pairs was statistically analyzed. The sensitivity and specificity analysis showed that the proposed approach outperformed the maximum likelihood estimation (MLE) approach, with a 22% increase in the area under the receiver operating characteristic (ROC) curve when both were applied to the same datasets. When proteins containing one and two known signatures were removed from the protein dataset, the area under the curve (AUC) increased from 0.549 to 0.584 and 0.655, respectively. The increase in the AUC indicates that proteins with one or two known signatures do not provide sufficient information to predict robust protein-protein interactions. Moreover, it demonstrates that when proteins with more known signatures are used in signature profiling methods, the overlap with experimental findings increases, resulting in a higher true positive rate and eventually a greater AUC. Despite the accuracy of the protein-protein interaction methods proposed here and elsewhere, they often predict true positive interactions along with numerous false positive interactions. A global algorithm was therefore also proposed to reduce the number of false positive predicted protein interacting pairs. This algorithm relies on gene ontology (GO) annotations of the proteins involved in predicted interactions. A dataset of experimentally confirmed protein pair interactions and their GO annotations was used as a training set to train keywords that were able to recover both their source interactions (training set) and predicted interactions in other datasets (test sets). These keywords, along with the cellular component annotation of proteins, were employed to set a pair of rules that must be satisfied by any predicted pair of interacting proteins. 
When this algorithm was applied to four predicted datasets obtained using phylogenetic profiles, gene expression patterns, chance co-occurrence distribution coefficient, and maximum likelihood estimation for S. cerevisiae and C. elegans, the true positive fractions of the datasets improved by 2-fold to 10-fold, depending on the computational method used to create the dataset and the available information on the organism of interest. The predicted protein-protein interactions were incorporated into the previously reconstructed metabolic network of C. elegans, resulting in 1024 new interactions among 94 metabolic pathways. In each of the 1024 new interactions, one unknown protein interacted with a known partner found in the reconstructed metabolic network. Unknown proteins were characterized based on the involvement of their known partners. Based on the binary similarity scores, the function of an uncharacterized protein in an interacting pair was defined according to its known counterpart, whose function was already specified. Incorporating the newly predicted interactions into the metabolic network produced an expanded version of that network, with a 27% increase in the number of known proteins involved in metabolism. The connectivity of proteins in the protein-protein interaction map changed from 42 to 34 due to the increase in the number of characterized proteins in the network.
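
The signature-profiling step described above (score each protein pair by shared signatures, then keep pairs above a threshold) can be sketched as follows. The abstract does not give the formula of its binary similarity score, so the Jaccard index is used here purely as a stand-in assumption; the protein names, signature labels, and the threshold value are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two signature sets (assumed stand-in for the binary score)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_interactions(signatures, threshold):
    """Return (protein, protein, score) for every pair scoring above the threshold."""
    pairs = []
    for p, q in combinations(sorted(signatures), 2):
        score = jaccard(signatures[p], signatures[q])
        if score > threshold:
            pairs.append((p, q, score))
    return pairs


# Hypothetical signature content for three proteins.
signatures = {
    "P1": {"SIG_A", "SIG_B"},
    "P2": {"SIG_A", "SIG_B", "SIG_C"},
    "P3": {"SIG_D"},
}

for p, q, score in predict_interactions(signatures, threshold=0.5):
    print(p, q, round(score, 3))
```

In the study the threshold corresponds to a 95% confidence level; here it is just a fixed number, since the distribution used to derive it is not described in the abstract.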

    Genome-wide discovery of missing genes in biological pathways of prokaryotes

    Background: Reconstruction of biological pathways is typically done by mapping well-characterized pathways of model organisms to a target genome through orthologous gene mapping. A limitation of such pathway-mapping approaches is that the mapped pathway models are constrained by the composition of the template pathways; for example, some genes in a target pathway may not have corresponding genes in the template pathways, the so-called "missing gene" problem. Methods: We present a novel pathway-expansion method for identifying additional genes that are possibly involved in a target pathway after pathway mapping, both to fill holes caused by missing genes and to expand the mapped pathway model. The basic idea of the algorithm is to identify genes in the target genome whose homologous genes share common operons with homologs of any mapped pathway genes in some reference genome, and to add such genes to the target pathway if their functions are consistent with the cellular function of the target pathway. Results: We have implemented this idea using a graph-theoretic approach and demonstrated the effectiveness of the algorithm on known pathways of E. coli in the KEGG database. On all KEGG pathways containing at least 5 genes, our method achieves an average positive predictive value (PPV) of 60%, and the performance increases as more seed genes are added. Analysis shows that our method is highly robust. Conclusions: An effective method is presented to find missing genes in biological pathways of prokaryotes, which achieves high prediction reliability on E. coli at the genome level. Numerous missing genes are found to be related to known E. coli pathways, which can be further validated through biological experiments. Overall, this method is robust and can be used for functional inference.
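
The core idea of the Methods paragraph (a target gene becomes a candidate when one of its homologs sits in the same operon as a homolog of an already mapped pathway gene) can be sketched in a few lines. This is only an illustration under assumptions: the data structures, the gene names, and the ranking of candidates by the number of supporting operons are hypothetical, and the functional-consistency check mentioned in the abstract is left out.

```python
def expand_pathway(seed_genes, homologs, operons):
    """Suggest target-genome genes that may belong to a mapped pathway.

    seed_genes: set of target genes already mapped to the pathway
    homologs:   dict mapping each target gene to its homologs in reference genomes
    operons:    list of sets, each holding the reference genes of one operon
    Returns a dict {candidate target gene: number of supporting operons}.
    """
    # Reference genes that are homologous to any mapped (seed) gene.
    seed_refs = set().union(*(homologs.get(g, set()) for g in seed_genes))

    candidates = {}
    for gene, refs in homologs.items():
        if gene in seed_genes:
            continue
        # An operon supports the gene if it holds both one of the gene's
        # homologs and a homolog of some seed gene.
        support = sum(1 for operon in operons
                      if operon & refs and operon & seed_refs)
        if support:
            candidates[gene] = support
    return candidates


# Hypothetical toy data: t_* are target genes, r1_*/r2_*/r3_* are reference genes.
seed_genes = {"t_argA", "t_argB"}
homologs = {
    "t_argA": {"r1_a"},
    "t_argB": {"r1_b", "r2_b"},
    "t_x": {"r1_x", "r2_x"},
    "t_y": {"r2_y"},
    "t_z": {"r3_z"},
}
operons = [
    {"r1_a", "r1_b", "r1_x"},
    {"r2_b", "r2_x"},
    {"r2_y", "r2_q"},
    {"r3_z"},
]

print(expand_pathway(seed_genes, homologs, operons))
```

Counting supporting operons is one simple way to rank candidates; the published method uses a graph-theoretic formulation whose details are not given in the abstract.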

    ComPath: comparative enzyme analysis and annotation in pathway/subsystem contexts

    Background: Once a new genome is sequenced, one of the important questions is to determine the presence and absence of biological pathways. Analysis of biological pathways in a genome is a complicated task, since a number of biological entities are involved in pathways and biological pathways in different organisms are not identical. Computational pathway identification and analysis thus involves a number of computational tools and databases and is typically done in comparison with pathways in other organisms. This computational requirement is far beyond the capability of biologists, so information systems for reconstructing, annotating, and analyzing biological pathways are much needed. We introduce a new comparative pathway analysis workbench, ComPath, which integrates various resources and computational tools using an interactive spreadsheet-style web interface for reliable pathway analyses. Results: ComPath allows users to compare biological pathways in multiple genomes using a spreadsheet-style web interface in which various sequence-based analyses can be performed to compare enzymes (e.g. sequence clustering) and pathways (e.g. pathway hole identification), to search a genome for de novo prediction of enzymes, or to annotate a genome in comparison with reference genomes of choice. To fill in pathway holes or make de novo enzyme predictions, multiple computational methods such as FASTA, Whole-HMM, CSR-HMM (a method of our own introduced in this paper), and PDB-domain search are integrated in ComPath. Our experiments show that the FASTA and CSR-HMM search methods generally outperform the Whole-HMM and PDB-domain search methods in terms of sensitivity, but FASTA search performs poorly in terms of specificity, detecting more false positives as the E-value cutoff increases. Overall, the CSR-HMM search method performs best in terms of both sensitivity and specificity. Gene neighborhood and pathway neighborhood (global network) visualization tools can be used to obtain context information that is complementary to the conventional KEGG map representation. Conclusion: ComPath is an interactive workbench for pathway reconstruction, annotation, and analysis in which experts can perform various sequence, domain, and context analyses using an intuitive and interactive spreadsheet-style interface.

    Dynamic gene network reconstruction from gene expression data in mice after influenza A (H1N1) infection

    Background: The immune response to viral infection is a temporal process, represented by a dynamic and complex network of gene and protein interactions. Here, we present a reverse engineering strategy aimed at capturing the temporal evolution of the underlying Gene Regulatory Networks (GRN). The proposed approach will be an enabling step towards comprehending the dynamic behavior of gene regulation circuitry and mapping the network structure transitions in response to pathogen stimuli. Results: We applied the Time Varying Dynamic Bayesian Network (TV-DBN) method for reconstructing the gene regulatory interactions based on time series gene expression data for the mouse C57BL/6J inbred strain after infection with influenza A H1N1 (PR8) virus. Initially, 3500 differentially expressed genes were clustered with the use of k-means algorithm. Next, the successive in time GRNs were built over the expression profiles of cluster centroids. Finally, the identified GRNs were examined with several topological metrics and available protein-protein and protein-DNA interaction data, transcription factor and KEGG pathway data. Conclusions: Our results elucidate the potential of TV-DBN approach in providing valuable insights into the temporal rewiring of the lung transcriptome in response to H1N1 virus.

    TEAK: A Novel Computational and GUI Software Pipeline for Reconstructing Biological Networks, Detecting Activated Biological Subnetworks, and Querying Biological Networks

    As high-throughput gene expression data become ever cheaper, researchers face a deluge of data from which biological insights must be extracted, because the rate of data accumulation far exceeds the rate of data analysis. Computational frameworks are needed to bridge this gap and assist researchers in their tasks. The Topology Enrichment Analysis frameworK (TEAK) is an open source GUI and software pipeline that seeks to be one of the tools that fill this gap; it consists of three major modules. The first module, the Gene Set Cultural Algorithm, infers biological networks de novo from gene sets using the KEGG pathways as prior knowledge. The second and third modules query against the KEGG pathways using molecular profiling data and query graphs, respectively. In particular, the second module, also called TEAK, is a network partitioning module that partitions the KEGG pathways into both linear and nonlinear subpathways. In conjunction with molecular profiling data, the subpathways are ranked and displayed to the user within the TEAK GUI. Using a public yeast microarray data set, previously unreported fitness defects for dpl1 delta and lag1 delta mutants under conditions of nitrogen limitation were found using TEAK. Finally, the third module, the Query Structure Enrichment Analysis framework, is a network query module that allows researchers to query their biological hypotheses, in the form of Directed Acyclic Graphs, against the KEGG pathways.
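
To make the partitioning step more concrete, the sketch below enumerates the linear subpathways of a small directed pathway graph. It assumes that a linear subpathway is a root-to-leaf path in an acyclic graph; the abstract does not define the term, so this definition, the function name, and the toy edges are assumptions rather than TEAK's actual implementation. Ranking the subpathways against molecular profiling data is not shown.

```python
def linear_subpathways(edges):
    """List every root-to-leaf path of a directed acyclic pathway graph.

    edges: iterable of (source, target) node pairs.
    Assumes the graph has no cycles; a cyclic graph would recurse without end.
    """
    children = {}
    nodes, has_parent = set(), set()
    for src, dst in edges:
        children.setdefault(src, []).append(dst)
        nodes.update((src, dst))
        has_parent.add(dst)

    paths = []

    def walk(node, path):
        path = path + [node]
        if node not in children:  # leaf reached: one linear subpathway is complete
            paths.append(path)
            return
        for child in children[node]:
            walk(child, path)

    # Roots are nodes that never appear as an edge target.
    for root in sorted(nodes - has_parent):
        walk(root, [])
    return paths


# Hypothetical toy pathway with two roots (A, E) and two leaves (C, D).
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("E", "D")]
for path in linear_subpathways(edges):
    print(" -> ".join(path))
```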

    SEA: a novel computational and GUI software pipeline for detecting activated biological sub-pathways

    With the ever-increasing amount of high-throughput molecular profile data, biologists need versatile tools that enable them to analyze their data quickly and succinctly. Furthermore, pathway databases have grown increasingly robust, with the KEGG database at the forefront. Previous tools have color-coded the genes on different pathways using differential expression analysis. Unfortunately, they do not adequately capture the relationships of the genes to one another. Structure Enrichment Analysis (SEA) thus seeks to take biological analysis to the next level. SEA accomplishes this goal by highlighting for users, in an easy-to-use GUI, the sub-pathways of a biological pathway that best correspond to their molecular profile data.