2,981 research outputs found

    Modeling gene regulatory networks through data integration

    Modeling gene regulatory networks has become a problem of great interest in biology and medical research. Most common methods for learning regulatory dependencies rely on observations in the form of gene expression data. In this dissertation, computational models for gene regulation have been developed based on constrained regression by integrating comprehensive gene expression data for M. tuberculosis with genome-scale ChIP-Seq interaction data. The resulting models showed predictive power for expression in independent stress conditions and identified mechanisms driving hypoxic adaptation and lipid metabolism in M. tuberculosis. I then used the regulatory network model for M. tuberculosis to identify factors responding to stress conditions and drug treatments, revealing drug synergies and conditions that potentiate drug treatments. These results can guide and optimize the design of drug treatments for this pathogen. I took the next step in this direction by proposing a new probabilistic framework for learning modular structures in gene regulatory networks from gene expression and protein-DNA interaction data, combining the ideas of module networks and stochastic blockmodels. These models also capture combinatorial interactions between regulators. Comparisons with other network modeling methods that rely solely on expression data showed that integrating ChIP-Seq data is essential for identifying direct regulatory links in M. tuberculosis. Moreover, this work demonstrates the theoretical advantages of integrating ChIP-Seq data for the class of widely used module network models. The systems approach and statistical modeling presented in this dissertation can also be applied to problems in other organisms. A similar approach was taken to model the regulatory network controlling genes with circadian expression in Neurospora crassa by integrating time-course expression data with ChIP-Seq data. The models explained combinatorial regulation leading to phase differences in circadian rhythms. The Neurospora crassa network model also works as a tool to manipulate the phases of target genes.

    Metabolic modeling of Mycobacterium tuberculosis through the integration of large-scale genomics datasets

    Thesis (Ph.D.)--Boston University. Mycobacterium tuberculosis (MTB) is the bacterium that causes tuberculosis. MTB is estimated to infect one-third of the world's population. The emergence of multidrug-resistant and extensively drug-resistant strains of the bacterium is becoming a larger threat to global health, as these strains decrease the efficacy of current treatments and make the disease more fatal. These factors combine to make MTB an interesting target for study with novel systems biology approaches. Genome-scale metabolic models have emerged as important platforms for the analysis of datasets that describe highly interconnected biological processes. We have the first comprehensive profiling of mRNA, proteins, metabolites, and lipids in MTB during an in vitro model of infection that includes a time course of induced hypoxia and re-aeration. Hypoxia and re-aeration are important cues during infection of the human host and act to model the environment seen in the host. Using genome-scale metabolic modeling methods to integrate these data with our metabolic model will allow us to generate experimentally testable predictions about metabolic adaptations that occur in response to experimental perturbations that represent an in vitro model of important environmental cues present during infection, dormancy, and re-activation in the human host.

    Evolution of substrate specificity in a recipient's enzyme following horizontal gene transfer

    Despite the prominent role of horizontal gene transfer (HGT) in shaping bacterial metabolism, little is known about the impact of HGT on the evolution of enzyme function. Specifically, what is the influence of a recently acquired gene on the function of an existing gene? For example, certain members of the genus Corynebacterium have horizontally acquired a whole L-tryptophan biosynthetic operon, whereas in certain closely related actinobacteria, for example, Mycobacterium, the trpF gene is missing. In Mycobacterium, the function of the trpF gene is performed by a dual-substrate (βα)8 phosphoribosyl isomerase (priA gene) also involved in L-histidine (hisA gene) biosynthesis. We investigated the effect of an HGT-acquired TrpF enzyme upon PriA’s substrate specificity in Corynebacterium through comparative genomics and phylogenetic reconstructions. After comprehensive in vivo and enzyme kinetic analyses of selected PriA homologs, a novel (βα)8 isomerase subfamily with a specialized function in L-histidine biosynthesis, termed subHisA, was confirmed. X-ray crystallography was used to reveal active-site mutations in subHisA important for narrowing of substrate specificity, which when mutated to the naturally occurring amino acid in PriA led to gain of function. Moreover, in silico molecular dynamics analyses demonstrated that the narrowing of substrate specificity of subHisA is concomitant with loss of ancestral protein conformational states. Our results show the importance of HGT in shaping enzyme evolution and metabolism.

    A review of methods for the reconstruction and analysis of integrated genome-scale models of metabolism and regulation

    The current survey aims to describe the main methodologies for extending the reconstruction and analysis of genome-scale metabolic models and phenotype simulation with Flux Balance Analysis mathematical frameworks, via the integration of Transcriptional Regulatory Networks and/or gene expression data. Although the surveyed methods are aimed at improving phenotype simulations obtained from these models, the perspective of reconstructing integrated genome-scale models of metabolism and gene expression for diverse prokaryotes is still an open challenge. This study was supported by the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UIDB/04469/2020 unit and BioTecNorte operation (NORTE-01-0145-FEDER-000004) funded by the European Regional Development Fund under the scope of Norte2020 - Programa Operacional Regional do Norte. Fernando Cruz holds a doctoral fellowship (SFRH/BD/139198/2018) funded by the FCT. This study was supported by the European Commission through project SHIKIFACTORY100 - Modular cell factories for the production of 100 compounds from the shikimate pathway (Reference 814408). The submitted manuscript has been created by UChicago Argonne, LLC as Operator of Argonne National Laboratory ("Argonne") under Contract No. DE-AC02-06CH11357 with the U.S. Department of Energy. The U.S. Government retains for itself, and others acting on its behalf, a paid-up, nonexclusive, irrevocable worldwide license in said article to reproduce, prepare derivative works, distribute copies to the public, and perform publicly and display publicly, by or on behalf of the Government. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan.

    Understanding Communication Signals during Mycobacterial Latency through Predicted Genome-Wide Protein Interactions and Boolean Modeling

    About 90% of the people infected with Mycobacterium tuberculosis carry latent bacteria that are believed to become activated upon immune suppression. One of the fundamental challenges in the control of tuberculosis is therefore to understand the molecular mechanisms involved in the onset of latency and/or reactivation. We have attempted to address this problem at the systems level by combining predicted functional protein-protein interactions, integration of functional interactions with large-scale gene expression studies, a predicted transcription regulatory network and, finally, simulations with a Boolean model of the network. Initially, a prediction for genome-wide protein functional linkages was obtained based on genome-context methods using a Support Vector Machine. This set of protein functional linkages, along with gene expression data of the available models of latency, was employed to identify proteins involved in mediating switch signals during dormancy. We show that genes that are up- and down-regulated during dormancy are coordinately regulated not only under dormancy-like conditions but also under a variety of other experimental conditions. Their synchronized regulation indicates that they form a tightly regulated gene cluster and might form a latency regulon. Conservation of these genes across bacterial species suggests a unique evolutionary history that might be associated with M. tuberculosis dormancy. Finally, simulations with a Boolean model based on the regulatory network, with logical relationships derived from gene expression data, reveal a bistable switch suggesting alternating latent and actively growing states. Our analysis based on the interaction network therefore reveals a potential model of M. tuberculosis latency.
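As a rough illustration of what a bistable switch means in a Boolean model, the sketch below enumerates the fixed points of a hypothetical two-node mutual-inhibition network. The node names, the update rules and the synchronous update scheme are assumptions chosen for illustration; they are not taken from the regulatory network described in the abstract.

```python
from itertools import product

# Hypothetical toy network (an assumption, not the study's network):
# each node is ON at the next step only if the other node is OFF.
RULES = {
    "growth": lambda s: not s["dormancy"],
    "dormancy": lambda s: not s["growth"],
}


def step(state):
    """Synchronously update every node from the current state."""
    return {node: rule(state) for node, rule in RULES.items()}


def fixed_points():
    """Enumerate all states and keep those that map to themselves."""
    nodes = sorted(RULES)
    stable = []
    for values in product([False, True], repeat=len(nodes)):
        state = dict(zip(nodes, values))
        if step(state) == state:
            stable.append(state)
    return stable


if __name__ == "__main__":
    for state in fixed_points():
        print(state)
```

In this toy network the two stable states are the ones in which exactly one node is ON, which is the sense in which a mutual-inhibition motif behaves as a two-state switch. Exhaustive enumeration is only practical for small networks, since the number of states doubles with every added node.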

    Modeling metabolism of Mycobacterium tuberculosis

    Approximately one-fourth of the Mycobacterium tuberculosis (Mtb) genome contains genes that encode enzymes directly involved in its metabolism. These enzymes represent potential drug targets that can be systematically probed with constraint-based (CB) models through the prediction of genes (or combinations thereof) that are essential for the pathogen to grow. However, gene essentiality depends on the growth conditions and, so far, no in vitro model precisely mimics the host at the different stages of mycobacterial infection, limiting model predictions. A first step in creating such a model is a thoroughly curated and extended genome-scale CB metabolic model of Mtb metabolism. The history of genome-scale CB models of Mtb metabolism up to model sMtb is discussed, and sMtb is quantitatively validated using 13C measurements. The human pathogen Mtb has the capacity to escape eradication by professional phagocytes. During infection, Mtb resists the harsh environment of phagosomes and actively manipulates macrophages and dendritic cells to ensure prolonged intracellular survival. In contrast to many other intracellular pathogens, it has remained difficult to capture the transcriptome of mycobacteria during infection due to an unfavorable host-to-pathogen ratio. The human macrophage-like cell line THP-1 was infected with the attenuated Mtb surrogate Mycobacterium bovis Bacillus Calmette–Guérin (M. bovis BCG). Mycobacterial RNA was up to 1000-fold underrepresented in total RNA preparations of infected host cells. By combining microbial enrichment with specific ribosomal RNA depletion, the transcriptional responses of host and pathogen during infection were simultaneously analyzed using dual RNA sequencing. Mycobacterial pathways for cholesterol degradation and iron acquisition are upregulated during infection. In addition, genes involved in the methylcitrate cycle, aspartate metabolism and recycling of mycolic acids are induced. In response to M. bovis BCG infection, host cells upregulate de novo cholesterol biosynthesis, presumably to compensate for the loss of this metabolite by bacterial catabolism. By systematically probing the metabolic network underpinning sMtb, the reactions that are essential for Mtb are identified. A majority of these reactions are catalyzed by enzymes and thus represent candidate drug targets to fight an Mtb infection. Modeling the behavior of the bacteria during infection requires knowledge of the so-called biomass reaction that represents bacterial biomass composition. This composition varies in different environments or bacterial growth phases. Accurate modeling of all fluxes through metabolism under a given condition at a moment in time, the so-called metabolic state, requires a precise description of the biomass reaction for the described condition. The transcript abundance data obtained by dual RNA sequencing were used to develop a straightforward and systematic method to obtain a condition-specific biomass reaction for Mtb during in vitro growth and during infection of its host. The method described herein is virtually free of any pre-set assumptions on uptake rates of nutrients, making it suitable for exploring environments with limited accessibility. The condition-specific biomass reaction represents the 'metabolic objective' of Mtb in a given environment (in-host growth and growth on defined medium) at a specific time point, and as such allows modeling the bacterial metabolic state in these environments. Five different biomass reactions were used to predict nutrient uptake rates and gene essentiality. Predictions were subsequently compared to available experimental data. Nutrient uptake can be predicted accurately, but accurate gene essentiality predictions remain difficult to obtain. By combining sMtb and a model of human metabolism, model sMtb-RECON was developed and used to predict the metabolic state of Mtb during infection of the host. Amino acids are predicted to be used for energy production as well as biomass formation. Subsequently, the effect of increasing dosages of drugs targeting metabolism on the metabolic state of the pathogen was assessed, and the resulting metabolic adaptations and flux rerouting through various pathways were predicted. In particular, the TCA cycle becomes more important upon drug application, as well as alanine, aspartate, glutamate, proline, arginine and porphyrin metabolism, while glycine, serine and threonine metabolism become less important for survival. Notably, an effect of eight out of eleven metabolically active drugs could be recreated and two major profiles of the metabolic state were predicted. The profiles of the metabolic states of Mtb affected by the drugs BTZ043, cycloserine and its derivative terizidone, ethambutol, ethionamide, propionamide, and isoniazid were very similar, while TMC207 is predicted to have quite a different effect on metabolism, as it inhibits ATP synthase and therefore indirectly interferes with a multitude of metabolic pathways.

    _M. tuberculosis_ interactome analysis unravels potential pathways to drug resistance

    Drug resistance is a major problem for combating tuberculosis. Lack of understanding of how resistance emerges in bacteria upon drug treatment limits our ability to counter resistance. By analysis of the _Mycobacterium tuberculosis_ interactome network, along with drug-induced expression data from the literature, we show possible pathways for the emergence of drug resistance. To a curated set of resistance-related proteins, we have identified sets of high-propensity paths from different drug targets. Many top paths were upregulated upon exposure to anti-tubercular drugs. Different targets appear to have different propensities for the four resistance mechanisms. Knowledge of important proteins in such pathways enables identification of appropriate 'co-targets', which, when simultaneously inhibited with the intended target, are likely to help in combating drug resistance. RecA, Rv0823c, Rv0892 and DnaE1 were the best examples of co-targets for combating tuberculosis. This approach is also inherently generic and is likely to significantly impact drug discovery.

    Semantic systems biology of prokaryotes : heterogeneous data integration to understand bacterial metabolism

    The goal of this thesis is to improve the prediction of genotype-to-phenotype associations with a focus on metabolic phenotypes of prokaryotes. This goal is achieved through data integration, which in turn required the development of supporting solutions based on semantic web technologies. Chapter 1 provides an introduction to the challenges associated with data integration. Semantic web technologies provide solutions to some of these challenges, and the basics of these technologies are explained in the Introduction. Furthermore, the basics of constraint-based metabolic modeling and construction of genome-scale models (GEMs) are also provided. The chapters in the thesis are separated into three related topics: chapters 2, 3 and 4 focus on data integration based on heterogeneous networks and their application to the human pathogen M. tuberculosis; chapters 5, 6, 7, 8 and 9 focus on the semantic web based solutions to genome annotation and applications thereof; and chapter 10 focuses on the final goal to associate genotypes to phenotypes using GEMs. Chapter 2 provides the prototype of a workflow to efficiently analyze information generated by different inference and prediction methods. This method relies on providing the user the means to simultaneously visualize and analyze the coexisting networks generated by different algorithms, heterogeneous data sets, and a suite of analysis tools. As a showcase, we have analyzed the gene co-expression networks of M. tuberculosis generated using over 600 expression experiments. We thereby gained new knowledge about the regulation of the DNA repair, dormancy, iron uptake and zinc uptake systems. Furthermore, it enabled us to develop a pipeline to integrate ChIP-seq data and a tool to uncover multiple regulatory layers. In chapter 3 the prototype presented in chapter 2 is further developed into the Synchronous Network Data Integration (SyNDI) framework, which is based on Cytoscape and Galaxy. The functionality and usability of the framework is highlighted with three biological examples. We analyzed the distinct connectivity of plasma metabolites in networks associated with high or low latent cardiovascular disease risk. We obtained deeper insights from a few similar inflammatory response pathways in Staphylococcus aureus infection common to human and mouse. We identified not yet reported regulatory motifs associated with transcriptional adaptations of M. tuberculosis. In chapter 4 we present a review providing a systems-level overview of the molecular and cellular components involved in divalent metal homeostasis and their role in regulating the three main virulence strategies of M. tuberculosis: immune modulation, dormancy and phagosome escape. With the use of the tools presented in chapters 2 and 3 we identified a single regulatory cascade for these three virulence strategies that responds to limited availability of divalent metals in the phagosome. The tools presented in chapters 2 and 3 achieve data integration through the use of multiple similarity, coexistence, coexpression and interaction gene and protein networks. However, the presented tools cannot store additional (genome) annotations. Therefore, we applied semantic web technologies to store and integrate heterogeneous annotation data sets. An increasing number of widely used biological resources are already available in the RDF data model. There are, however, no tools available that provide structural overviews of these resources. Such structural overviews are essential to efficiently query these resources and to assess their structural integrity and design. Therefore, in chapter 5, I present RDF2Graph, a tool that automatically recovers the structure of an RDF resource. The generated overview enables users to create complex queries on these resources and to structurally validate newly created resources. Direct functional comparisons support genotype-to-phenotype predictions. A prerequisite for a direct functional comparison is consistent annotation of the genetic elements with evidence statements. However, the standard structured formats used by the public sequence databases to present genome annotations provide limited support for data mining, hampering comparative analyses at large scale. To enable interoperability of genome annotations for data mining applications, we have developed the Genome Biology Ontology Language (GBOL) and associated infrastructure (GBOL stack), which is presented in chapter 6. GBOL is provenance aware and thus provides a consistent representation of functional genome annotations linked to the provenance. The provenance of a genome annotation describes the contextual details and derivation history of the process that resulted in the annotation. GBOL is modular in design, extensible and linked to existing ontologies. The GBOL stack of supporting tools enforces consistency within and between the GBOL definitions in the ontology. Based on GBOL, we developed the genome annotation pipeline SAPP (Semantic Annotation Platform with Provenance), presented in chapter 7. SAPP automatically predicts, tracks and stores structural and functional annotations and associated dataset- and element-wise provenance in a Linked Data format, thereby enabling information mining and retrieval with Semantic Web technologies. This greatly reduces the administrative burden of handling multiple analysis tools and versions thereof and facilitates multi-level large-scale comparative analysis. In turn this can be used to make genotype-to-phenotype predictions. The development of GBOL and SAPP was done simultaneously. During the development we realized that we had to constantly validate the data exported to RDF to ensure coherence with the ontology. This was an extremely time-consuming and error-prone process; therefore, we developed the Empusa code generator. Empusa is presented in chapter 8. SAPP has been successfully used to annotate 432 sequenced Pseudomonas strains and integrate the resulting annotation in a large-scale functional comparison using protein domains. This comparison is presented in chapter 9. Additionally, data from six metabolic models, nearly a thousand transcriptome measurements and four large-scale transposon mutagenesis experiments were integrated with the genome annotations. In this way, we linked gene essentiality, persistence and expression variability. This gave us insight into the diversity, versatility and evolutionary history of the Pseudomonas genus, which contains some important pathogens as well as some useful species for bioengineering and bioremediation purposes. Genome annotation can be used to create GEMs, which can be used to better link genotypes to phenotypes. Bio-Growmatch, presented in chapter 10, is a tool that can automatically suggest modifications to improve a GEM based on phenotype data, thereby integrating growth data into the complete process of modelling the metabolism of an organism. Chapter 11 presents a general discussion on how the chapters contributed to the central goal. I then discuss provenance requirements for data reuse and integration. I further discuss how this can be used to further improve knowledge generation. The acquired knowledge could, in turn, be used to design new experiments. The principles of the dry-lab cycle and how semantic technologies can contribute to establish these cycles are discussed in chapter 11. Finally, a discussion is presented on how to apply these principles to improve the creation and usability of GEMs.