1,809 research outputs found

    Whole-transcriptome analysis of [psi+] budding yeast via cDNA microarrays

    Introduction: Prions of yeast present a novel analytical challenge in terms of both initial characterization and in vitro manipulation as models for human disease research. Presently, few robust analysis strategies have been successfully implemented that enable the efficient study of prion behavior in vivo. This study sought to evaluate the use of conventional dual-channel cDNA microarrays for the surveillance of transcriptomic regulation patterns by the [PSI+] yeast prion relative to an otherwise identical prion-deficient yeast variant, [psi-]. Methods: A data analysis and normalization workflow was developed and applied to cDNA array images, yielding quality-regulated expression ratios for a subset of genes exhibiting statistical congruence across multiple experimental repetitions and nested hybridization events. The significant gene list was analyzed using classical analytical approaches, including several clustering-based methods and singular value decomposition. To add biological meaning to the differential expression data in hand, functional annotation using the Gene Ontology as well as several pathway-mapping approaches was conducted. Finally, the expression patterns observed were queried against all publicly curated microarray data obtained with S. cerevisiae in order to discover similar expression behavior across a vast array of experimental conditions. Results: These data collectively implicate a low level of overall genomic regulation as a result of the [PSI+] state, where the maximum statistically significant degree of differential expression was less than ±1 Log2(FC) in all cases. Notwithstanding, the [PSI+] differential expression was localized to several specific classes of structural elements and cellular functions, implying that under homeostatic conditions significant up- or down-regulation is likely unnecessary but possible in those specific systems if environmental conditions warrant it.
Given these findings, additional work on this system should expose both yeast variants to controlled insults under differing environmental conditions to promote a potential [PSI+] regulatory response, coupled with co-surveillance of these conditions using transcriptomic and proteomic analysis methodologies
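
The ±1 Log2(FC) bound reported above can be illustrated with a minimal sketch of how a log2 fold change is computed from the two channels of a dual-channel array. The gene names and intensities below are hypothetical, and the function is an illustration only, not the normalization workflow of the study.

```python
import math

def log2_fold_change(test_intensity, reference_intensity):
    """Return log2(test / reference) for one spot on a dual-channel array."""
    if test_intensity <= 0 or reference_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(test_intensity / reference_intensity)

# Hypothetical background-corrected intensities:
# (gene, [PSI+] channel, [psi-] channel). Not data from the study.
spots = [
    ("geneA", 1500.0, 1000.0),
    ("geneB", 800.0, 1000.0),
    ("geneC", 1000.0, 1000.0),
]

ratios = {gene: log2_fold_change(test, ref) for gene, test, ref in spots}

# True when every gene changes by less than two-fold in either direction.
within_one = all(abs(value) < 1.0 for value in ratios.values())
```

A value of +1 corresponds to a doubling and -1 to a halving of expression, so a bound of ±1 means no gene changed by as much as two-fold.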

    Genome-Wide Analysis of Gene Expression during Early Arabidopsis Flower Development

    Detailed information about stage-specific changes in gene expression is crucial for the understanding of the gene regulatory networks underlying development. Here, we describe the global gene expression dynamics during early flower development, a key process in the life cycle of a plant, during which floral patterning and the specification of floral organs is established. We used a novel floral induction system in Arabidopsis, which allows the isolation of a large number of synchronized floral buds, in conjunction with whole-genome microarray analysis to identify genes with differential expression at distinct stages of flower development. We found that the onset of flower formation is characterized by a massive downregulation of genes in incipient floral primordia, which is followed by a predominance of gene activation during the differentiation of floral organs. Among the genes we identified as differentially expressed in the experiment, we detected a significant enrichment of closely related members of gene families. The expression profiles of these related genes were often highly correlated, indicating similar temporal expression patterns. Moreover, we found that the majority of these genes is specifically up-regulated during certain developmental stages. Because co-expressed members of gene families in Arabidopsis frequently act in a redundant manner, these results suggest a high degree of functional redundancy during early flower development, but also that its extent may vary in a stage-specific manner
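
As a hedged illustration of what "highly correlated expression profiles" means here, the sketch below computes the Pearson correlation between temporal profiles. The profile values and names are hypothetical, and the abstract does not state which correlation measure the authors used, so Pearson correlation is an assumption.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("a constant profile has no defined correlation")
    return sum(a * b for a, b in zip(dx, dy)) / denominator

# Hypothetical expression levels at four developmental stages.
family_member_1 = [1.0, 2.0, 3.0, 4.0]
family_member_2 = [2.1, 3.9, 6.2, 8.0]   # rises in step with member 1
unrelated_gene = [4.0, 1.0, 3.0, 2.0]    # no shared temporal trend

r_family = pearson(family_member_1, family_member_2)
r_unrelated = pearson(family_member_1, unrelated_gene)
```

With these made-up values the two family members correlate at close to 1, while the unrelated gene gives a weak negative correlation.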

    Systems biology applications to study mechanisms of human immunodeficiency virus latency and reactivation

    Eradication of human immunodeficiency virus (HIV) in infected individuals is currently not possible because of the presence of the persistent cellular reservoir of latent infection. The identification of HIV latency biomarkers and a better understanding of the molecular mechanisms contributing to regulation of HIV expression might provide essential tools to eliminate these latently infected cells. This review aims at summarizing gene expression profiling and systems biology applications to studies of HIV latency and eradication. Studies comparing gene expression in latently infected and uninfected cells identify candidate latency biomarkers and novel mechanisms of latency control. Studies that profiled gene expression changes induced by existing latency reversing agents (LRAs) highlight uniting themes driving HIV reactivation and novel mechanisms that contribute to regulation of HIV expression by different LRAs. Among the reviewed gene expression studies, the common approaches included identification of differentially expressed genes and gene functional category assessment. Integration of transcriptomic data with other biological data types is presently scarce, and the field would benefit from increased adoption of these methods in future studies. In addition, designing prospective studies that use the same methods of data acquisition and statistical analyses will facilitate a more reliable identification of latency biomarkers using different model systems and the comparison of the effects of different LRAs on host factors with a role in HIV reactivation. The results from such studies would have the potential to significantly impact the process by which candidate drugs are selected and combined for future evaluations and advancement to clinical trials

    Computationally Linking Chemical Exposure to Molecular Effects with Complex Data: Comparing Methods to Disentangle Chemical Drivers in Environmental Mixtures and Knowledge-based Deep Learning for Predictions in Environmental Toxicology

    Chemical exposures affect the environment and may lead to adverse outcomes in its organisms. Omics-based approaches, like standardised microarray experiments, have expanded the toolbox to monitor the distribution of chemicals and assess the risk to organisms in the environment. The resulting complex data have extended the scope of toxicological knowledge bases and published literature. A plethora of computational approaches have been applied in environmental toxicology considering systems biology and data integration. Still, the complexity of environmental and biological systems given in data challenges investigations of exposure-related effects. This thesis aimed at computationally linking chemical exposure to biological effects on the molecular level considering sources of complex environmental data. The first study employed data of an omics-based exposure study considering mixture effects in a freshwater environment. We compared three data-driven analyses in their suitability to disentangle mixture effects of chemical exposures to biological effects and their reliability in attributing potentially adverse outcomes to chemical drivers with toxicological databases on gene and pathway levels. Differential gene expression analysis and a network inference approach resulted in toxicologically meaningful outcomes and uncovered individual chemical effects — stand-alone and in combination. We developed an integrative computational strategy to harvest exposure-related gene associations from environmental samples considering mixtures of lowly concentrated compounds. The applied approaches allowed assessing the hazard of chemicals more systematically with correlation-based compound groups. This dissertation presents another achievement toward a data-driven hypothesis generation for molecular exposure effects. The approach combined text-mining and deep learning. 
The study was entirely data-driven and involved state-of-the-art computational methods of artificial intelligence. We employed literature-based relational data and curated toxicological knowledge to predict chemical-biomolecule interactions. A word embedding neural network with a subsequent feed-forward network was implemented. Data augmentation and recurrent neural networks were beneficial for training with curated toxicological knowledge. The trained models reached accuracies of up to 94% for unseen test data of the employed knowledge base. However, we could not reliably confirm known chemical-gene interactions across selected data sources. Still, the predictive models might derive unknown information from toxicological knowledge sources, like literature, databases or omics-based exposure studies. Thus, the deep learning models might allow predicting hypotheses of exposure-related molecular effects. Both achievements of this dissertation might support the prioritisation of chemicals for testing and an intelligent selection of chemicals for monitoring in future exposure studies.

    Table of Contents
    Abstract; Acknowledgements; Prelude
    1 Introduction
      1.1 An overview of environmental toxicology: 1.1.1 Environmental toxicology; 1.1.2 Chemicals in the environment; 1.1.3 Systems biological perspectives in environmental toxicology
      1.2 Computational toxicology: 1.2.1 Omics-based approaches; 1.2.2 Linking chemical exposure to transcriptional effects; 1.2.3 Up-scaling from the gene level to higher biological organisation levels; 1.2.4 Biomedical literature-based discovery; 1.2.5 Deep learning with knowledge representation
      1.3 Research question and approaches
    2 Methods and Data
      2.1 Linking environmental relevant mixture exposures to transcriptional effects: 2.1.1 Exposure and microarray data; 2.1.2 Preprocessing; 2.1.3 Differential gene expression; 2.1.4 Association rule mining; 2.1.5 Weighted gene correlation network analysis; 2.1.6 Method comparison
      2.2 Predicting exposure-related effects on a molecular level: 2.2.1 Input; 2.2.2 Input preparation; 2.2.3 Deep learning models; 2.2.4 Toxicogenomic application
    3 Method comparison to link complex stream water exposures to effects on the transcriptional level
      3.1 Background and motivation: 3.1.1 Workflow
      3.2 Results: 3.2.1 Data preprocessing; 3.2.2 Differential gene expression analysis; 3.2.3 Association rule mining; 3.2.4 Network inference; 3.2.5 Method comparison; 3.2.6 Application case of method integration
      3.3 Discussion
      3.4 Conclusion
    4 Deep learning prediction of chemical-biomolecule interactions
      4.1 Motivation: 4.1.1 Workflow
      4.2 Results: 4.2.1 Input preparation; 4.2.2 Model selection; 4.2.3 Model comparison; 4.2.4 Toxicogenomic application; 4.2.5 Horizontal augmentation without tail-padding; 4.2.6 Four-class problem formulation; 4.2.7 Training with CTD data
      4.3 Discussion: 4.3.1 Transferring biomedical knowledge towards toxicology; 4.3.2 Deep learning with biomedical knowledge representation; 4.3.3 Data integration
      4.4 Conclusion
    5 Conclusion and Future perspectives
      5.1 Conclusion: 5.1.1 Investigating complex mixtures in the environment; 5.1.2 Complex knowledge from literature and curated databases predict chemical-biomolecule interactions; 5.1.3 Linking chemical exposure to biological effects by integrating CTD
      5.2 Future perspectives
    S1 Supplement Chapter 1: S1.1 Example of an estrogen bioassay; S1.2 Types of mode of action; S1.3 The dogma of molecular biology; S1.4 Transcriptomics
    S2 Supplement Chapter 3
    S3 Supplement Chapter 4: S3.1 Hyperparameter tuning results; S3.2 Functional enrichment with predicted chemical-gene interactions and CTD reference pathway genesets; S3.3 Reduction of learning rate in a model with large word embedding vectors; S3.4 Horizontal augmentation without tail-padding; S3.5 Four-relationship classification; S3.6 Interpreting loss observations for SemMedDB trained models
    List of Abbreviations; List of Figures; List of Tables; Bibliography; Curriculum scientiae; Declaration of authorship (Selbständigkeitserklärung)

    Analysis of the dynamic co-expression network of heart regeneration in the zebrafish.

    The zebrafish has the capacity to regenerate its heart after severe injury. While the function of a few genes during this process has been studied, we are far from fully understanding how genes interact to coordinate heart regeneration. To enable systematic insights into this phenomenon, we generated and integrated a dynamic co-expression network of heart regeneration in the zebrafish and linked systems-level properties to the underlying molecular events. Across multiple post-injury time points, the network displays topological attributes of biological relevance. We show that regeneration steps are mediated by modules of transcriptionally coordinated genes, and by genes acting as network hubs. We also established direct associations between hubs and validated drivers of heart regeneration with murine and human orthologs. The resulting models and interactive analysis tools are available at http://infused.vital-it.ch. Using a worked example, we demonstrate the usefulness of this unique open resource for hypothesis generation and in silico screening for genes involved in heart regeneration
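
The notions of a co-expression network and a network hub used above can be sketched as follows: genes become nodes, gene pairs whose profiles correlate strongly become edges, and a hub is a node with many edges. The profiles, gene names and the threshold of 0.6 are hypothetical choices for illustration and do not reproduce the construction used for the zebrafish resource.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denominator

# Hypothetical centred log-expression values at four post-injury time points.
profiles = {
    "hub_gene": [2.0, 0.0, 0.0, -2.0],
    "gene_a": [1.0, -1.0, 1.0, -1.0],
    "gene_b": [1.0, 1.0, -1.0, -1.0],
    "gene_c": [1.0, -1.0, -1.0, 1.0],
}

THRESHOLD = 0.6  # assumed cut-off on |r| for drawing an edge

edges = []
degree = {gene: 0 for gene in profiles}
for gene_1, gene_2 in combinations(profiles, 2):
    if abs(pearson(profiles[gene_1], profiles[gene_2])) >= THRESHOLD:
        edges.append((gene_1, gene_2))
        degree[gene_1] += 1
        degree[gene_2] += 1

# The hub is the gene connected to the most other genes.
hub = max(degree, key=degree.get)
```

In this toy network `hub_gene` is linked to `gene_a` and `gene_b`, which are not linked to each other, while `gene_c` stays isolated.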

    Dissection of Complex Genetic Correlations into Interaction Effects

    Living systems are overwhelmingly complex and consist of many interacting parts. Even the quantitative characterization of a single human cell type at the genetic level requires measuring the expression of at least 20,000 genes. It remains a big challenge for theoretical approaches to discover patterns in these signals that represent specific interactions in such systems. A major problem is that available standard procedures summarize gene expressions in a hard-to-interpret way. For example, principal components represent axes of maximal variance in the gene vector space and thus often correspond to a superposition of multiple different gene regulation effects (e.g. I.1.4). Here, a novel approach to analyze and interpret such complex data is developed (Chapter II). It is based on an extremum principle that identifies an axis in the gene vector space to which as many samples as possible are correlated as highly as possible (II.3). This axis is maximally specific and thus most probably corresponds to exactly one gene regulation effect, making it considerably easier to interpret than principal components. To stabilize and optimize effect discovery, axes in the sample vector space are identified simultaneously. Genes and samples are always handled symmetrically by the algorithm. While sufficient for effect discovery, effect axes can only linearly approximate regulation laws. To represent a broader class of nonlinear regulations, including saturation effects or activity thresholds (e.g. II.1.1.2), a bimonotonic effect model is defined (II.2.1.2). A corresponding regression is realized that is monotonic over projections of samples (or genes) onto discovered gene (or sample) axes. Resulting effect curves can approximate regulation laws precisely (II.4.1). This enables the dissection of exclusively the discovered effect from the signal (II.4.2). Signal parts from other potentially overlapping effects remain untouched. This continues iteratively. 
In this way, the high-dimensional initial signal (II.2.1.1) can be dissected into highly specific effects. Method validation demonstrates that superposed effects of various size, shape and signal strength can be dissected reliably (II.6.2). Simulated laws of regulation are reconstructed with high correlation. Detection limits, e.g. for signal strength or for missing values, lie above practical requirements (II.6.4). The novel approach is systematically compared with standard procedures such as principal component analysis. Signal dissection is shown to have clear advantages, especially for many overlapping effects of comparable size (II.6.3). Cancer cells are an ideal test field for such approaches, as they may be driven by multiple overlapping gene regulation networks that are largely unknown. Additionally, quantification and classification of cancer cells by their particular set of driving gene regulations is a prerequisite for precision medicine. To validate the novel method against real biological data, it is applied to gene expressions of over 1000 tumor samples from Diffuse Large B-Cell Lymphoma (DLBCL) patients (Chapter III). Two already known subtypes of this disease (cf. I.1.2.1) with significantly different survival following the same chemotherapy were originally also discovered as a gene expression effect. These subtypes can only be precisely determined by this effect at the molecular level. Such previous results offer a possibility for method validation and indeed, this effect has been rediscovered without supervision (III.3.2.2). Several additional biologically relevant effects have been discovered and validated across four patient cohorts. Multivariate analyses (III.2) identify combinations of validated effects that can predict significant differences in patient survival. One novel effect possesses an even higher predictive value (cf. III.2.5.1) than the rediscovered subtype effect and is genetically more specific (cf. III.3.3.1). 
A trained and validated Cox survival model (III.2.5) can predict significant survival differences within known DLBCL subtypes (III.2.5.6), demonstrating that they are genetically heterogeneous as well. Detailed biostatistical evaluations of all survival effects (III.3.3) may help to clarify the molecular pathogenesis of DLBCL. Furthermore, the applicability of signal dissection is not limited to biological data. For instance, dissecting spectral energy distributions of stars observed in astrophysics might be useful to discover laws of light emission
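
The remark above that a principal component can superpose several regulation effects can be illustrated with a two-gene toy example. It assumes, purely for illustration, that one effect acts only on the first gene and another only on the second, with strengths that co-vary across samples. The sample values are hypothetical, and the sketch shows ordinary principal component analysis only, not the signal dissection method of the thesis.

```python
import math

# Hypothetical centred expression values per sample: (gene_1, gene_2).
# Assumption: effect A acts only on gene_1, effect B only on gene_2,
# and their strengths happen to co-vary across the samples.
samples = [
    (2.0, 1.0), (-2.0, -1.0),
    (1.0, 2.0), (-1.0, -2.0),
    (2.0, 2.0), (-2.0, -2.0),
]

n = len(samples)
# Entries of the 2x2 covariance matrix (the data already have zero mean).
var_1 = sum(x * x for x, _ in samples) / n
var_2 = sum(y * y for _, y in samples) / n
cov_12 = sum(x * y for x, y in samples) / n

# Closed-form orientation of the axis of maximal variance for a
# symmetric 2x2 covariance matrix.
angle = 0.5 * math.atan2(2.0 * cov_12, var_1 - var_2)
pc1 = (math.cos(angle), math.sin(angle))
```

Here the first principal component points along the diagonal and loads equally on both genes, so it mixes the two assumed effects rather than isolating either one.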