83 research outputs found

    Finding Exogenous Variables in Data with Many More Variables than Observations

    Many statistical methods have been proposed to estimate causal models in classical situations with fewer variables than observations (p<n, p: the number of variables and n: the number of observations). However, modern datasets including gene expression data need high-dimensional causal modeling in challenging situations with orders of magnitude more variables than observations (p>>n). In this paper, we propose a method to find exogenous variables in a linear non-Gaussian causal model, which requires much smaller sample sizes than conventional methods and works even when p>>n. The key idea is to identify which variables are exogenous based on non-Gaussianity instead of estimating the entire structure of the model. Exogenous variables work as triggers that activate a causal chain in the model, and their identification leads to more efficient experimental designs and better understanding of the causal mechanism. We present experiments with artificial data and real-world gene expression data to evaluate the method. Comment: A revised version of this was published in Proc. ICANN201

    Chemogenetic fingerprinting by analysis of cellular growth dynamics

    Background: A fundamental goal in chemical biology is the elucidation of on- and off-target effects of drugs and biocides. To this end, chemogenetic screens that quantify drug-induced changes in cellular fitness, typically taken as changes in composite growth, are commonly applied. Results: Using the model organism Saccharomyces cerevisiae, we here report that resolving cellular growth dynamics into its individual components (growth lag, growth rate and growth efficiency) increases the predictive power of chemogenetic screens. For both drug-drug and gene-drug interactions, the individual growth variables captured distinct and only partially overlapping aspects of cell physiology. In fact, the impact on cellular growth dynamics represented functionally distinct chemical fingerprints. Discussion: Our findings suggest that resolving and quantifying all facets of growth increases the informational and interpretational output of chemogenetic screening. Hence, by facilitating a physiologically more complete analysis of gene-drug and drug-drug interactions, the results reported here may simplify the assignment of mode-of-action to orphan bioactive compounds.
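
To make the three growth variables concrete, here is a minimal sketch of how lag, rate and efficiency might be read off a single growth curve. The definitions used (steepest slope of ln(OD) for rate, the tangent's intercept with the starting level for lag, total OD gain for efficiency) are common conventions assumed for illustration; the abstract does not state how the authors computed them, and `growth_components` and its `window` parameter are hypothetical names.

```python
import math


def growth_components(times, od, window=3):
    """Return (lag, rate, efficiency) for one growth curve.

    Assumed definitions (illustrative, not taken from the paper):
      rate       -- steepest slope of ln(OD) across `window` consecutive points
      lag        -- time at which that steepest line crosses the initial ln(OD)
      efficiency -- total gain in OD (final reading minus initial reading)
    """
    if len(times) != len(od) or len(times) < window or window < 2:
        raise ValueError("need equal-length series with at least `window` points")

    log_od = [math.log(v) for v in od]

    # Scan every window and keep the one with the steepest secant slope.
    best_slope, best_i = float("-inf"), 0
    for i in range(len(times) - window + 1):
        j = i + window - 1
        slope = (log_od[j] - log_od[i]) / (times[j] - times[i])
        if slope > best_slope:
            best_slope, best_i = slope, i

    efficiency = od[-1] - od[0]
    if best_slope <= 0:
        # No growth observed, so a lag time is undefined.
        return None, best_slope, efficiency

    # Extend the steepest line back to the starting ln(OD) to obtain the lag.
    j = best_i + window - 1
    t_mid = (times[best_i] + times[j]) / 2.0
    y_mid = (log_od[best_i] + log_od[j]) / 2.0
    lag = t_mid - (y_mid - log_od[0]) / best_slope
    return lag, best_slope, efficiency
```

Two compounds that yield the same composite growth can differ in any one of the three returned values, which is the kind of separation the abstract describes.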

    Validation of Plasmodium falciparum deoxyhypusine synthase as an antimalarial target

    Background: Hypusination is an essential post-translational modification in eukaryotes. The two enzymes required for this modification, namely deoxyhypusine synthase (DHS) and deoxyhypusine hydroxylase, are also conserved. Plasmodium falciparum human malaria parasites possess genes for both hypusination enzymes, which are hypothesized to be targets of antimalarial drugs. Methods: Transgenic P. falciparum parasites with modification of the PF3D7_1412600 gene encoding PfDHS enzyme were created by insertion of the glmS riboswitch or the M9 inactive variant. The PfDHS protein was studied in transgenic parasites by confocal microscopy and Western immunoblotting. The biochemical function of PfDHS enzyme in parasites was assessed by hypusination and nascent protein synthesis assays. Gene essentiality was assessed by competitive growth assays and chemogenomic profiling. Results: Clonal transgenic parasites with integration of glmS riboswitch downstream of the PfDHS gene were established. PfDHS protein was present in the cytoplasm of transgenic parasites in asexual stages. The PfDHS protein could be attenuated fivefold in transgenic parasites with an active riboswitch, whereas PfDHS protein expression was unaffected in control transgenic parasites with insertion of the riboswitch-inactive sequence. Attenuation of PfDHS expression for 72 h led to a significant reduction of hypusinated protein; however, global protein synthesis was unaffected. Parasites with attenuated PfDHS expression showed a significant growth defect, although their decline was not as rapid as parasites with attenuated dihydrofolate reductase-thymidylate synthase (PfDHFR-TS) expression. PfDHS-attenuated parasites showed increased sensitivity to N1-guanyl-1,7-diaminoheptane, a structural analog of spermidine, and a known inhibitor of DHS enzymes. Discussion: Loss of PfDHS function leads to reduced hypusination, which may be important for synthesis of some essential proteins. The growth defect in parasites with attenuated PfDHS expression suggests that this gene is essential. However, the slower decline of PfDHS mutants compared with PfDHFR-TS mutants in competitive growth assays suggests that PfDHS is less vulnerable as an antimalarial target. Nevertheless, the data validate PfDHS as an antimalarial target which can be inhibited by spermidine-like compounds.

    Perturbation Detection Through Modeling of Gene Expression on a Latent Biological Pathway Network: A Bayesian hierarchical approach

    Cellular response to a perturbation is the result of a dynamic system of biological variables linked in a complex network. A major challenge in drug and disease studies is identifying the key factors of a biological network that are essential in determining the cell's fate. Here our goal is the identification of perturbed pathways from high-throughput gene expression data. We develop a three-level hierarchical model, where (i) the first level captures the relationship between gene expression and biological pathways using confirmatory factor analysis, (ii) the second level models the behavior within an underlying network of pathways induced by an unknown perturbation using a conditional autoregressive model, and (iii) the third level is a spike-and-slab prior on the perturbations. We then identify perturbations through posterior-based variable selection. We illustrate our approach using gene transcription drug perturbation profiles from the DREAM7 drug sensitivity prediction challenge data set. Our proposed method identified regulatory pathways that are known to play a causative role and that were not readily resolved using gene set enrichment analysis or exploratory factor models. Simulation results are presented assessing the performance of this model relative to a network-free variant and its robustness to inaccuracies in biological databases.
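
The selection step can be illustrated in a much simpler setting than the three-level model described above. The sketch below assumes a single observed effect `y` with Gaussian noise and a point-mass spike at zero mixed with a zero-mean Gaussian slab, for which the posterior inclusion probability has a closed form. The function and parameter names are hypothetical, and this toy model is not the authors' hierarchical model.

```python
import math


def normal_pdf(x, var):
    """Density of a zero-mean normal distribution with variance `var` at `x`."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def inclusion_probability(y, noise_var, slab_var, prior_incl):
    """Posterior probability that the effect behind `y` is non-zero.

    Toy model (an assumption for illustration):
      y | theta ~ N(theta, noise_var)
      theta = 0               with probability 1 - prior_incl  (spike)
      theta ~ N(0, slab_var)  with probability prior_incl      (slab)
    """
    # Marginal likelihood of y under each mixture component.
    slab = prior_incl * normal_pdf(y, noise_var + slab_var)
    spike = (1.0 - prior_incl) * normal_pdf(y, noise_var)
    return slab / (slab + spike)
```

Thresholding this probability (for example at 0.5) is one simple form of posterior-based variable selection: observations near zero are pulled toward the spike, while large effects are assigned to the slab.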

    Inferring functional modules of protein families with probabilistic topic models

    Background: Genome and metagenome studies have identified thousands of protein families whose functions are poorly understood and for which techniques for functional characterization provide only partial information. For such proteins, the genome context can give further information about their functional context. Results: We describe a Bayesian method, based on a probabilistic topic model, which directly identifies functional modules of protein families. The method explores the co-occurrence patterns of protein families across a collection of sequence samples to infer a probabilistic model of arbitrarily-sized functional modules. Conclusions: We show that our method identifies protein modules - some of which correspond to well-known biological processes - that are tightly interconnected with known functional interactions and are different from the interactions identified by pairwise co-occurrence. The modules are not specific to any given organism and may combine different realizations of a protein complex or pathway within different taxa.

    Mining for genotype-phenotype relations in Saccharomyces using partial least squares

    Background: Multivariate approaches are important because of their versatility and their applications in many fields, where they provide decisive advantages over univariate analysis. Genome-wide association studies are rapidly emerging, but the approaches at hand pay less attention to the multivariate relation between genotype and phenotype. We introduce a methodology based on a BLAST approach for extracting information from genomic sequences and Soft-Thresholding Partial Least Squares (ST-PLS) for mapping genotype-phenotype relations. Results: Applying this methodology to an extensive data set for the model yeast Saccharomyces cerevisiae, we found that the relationship between genotype and phenotype involves surprisingly few genes, in the sense that an overwhelmingly large fraction of the phenotypic variation can be explained by variation in less than 1% of the full gene reference set containing 5791 genes. These phenotype-influencing genes were evolving 20% faster than non-influential genes and were unevenly distributed over cellular functions, with strong enrichments in functions such as cellular respiration and transposition. These genes were also enriched with known paralogs, stop codon variations and copy number variations, suggesting that such molecular adjustments have had a disproportionate influence on Saccharomyces yeasts' recent adaptation to environmental changes in their ecological niche. Conclusions: The BLAST- and PLS-based multivariate approach produced results that adhere to the known yeast phylogeny and gene ontology, and thus verify that the methodology extracts a set of fast-evolving genes that capture the phylogeny of the yeast strains. The approach is worth pursuing, and future investigations should aim to improve the computation of genotype signals as well as the variable selection procedure within the PLS framework.
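
The variable selection in ST-PLS rests on soft-thresholding the PLS loading weights so that weakly contributing variables (here, genes) drop out of a component. The sketch below shows that shrinkage step in isolation, assuming the usual scale, shrink, renormalize sequence; the function name and the `delta` parameter are illustrative, and the abstract does not specify the authors' exact settings.

```python
import math


def soft_threshold_weights(weights, delta):
    """Soft-threshold one PLS loading-weight vector (illustrative sketch).

    Steps, assumed for illustration:
      1. scale the weights so the largest absolute value is 1
      2. shrink every scaled weight toward zero by `delta`, clipping at zero
      3. renormalize the result to unit Euclidean length
    """
    if not 0.0 <= delta < 1.0:
        raise ValueError("delta must lie in [0, 1)")
    largest = max(abs(w) for w in weights)
    if largest == 0.0:
        raise ValueError("weight vector is all zeros")

    # Steps 1 and 2: scale, then shrink while keeping each weight's sign.
    shrunk = [
        math.copysign(max(abs(w) / largest - delta, 0.0), w) for w in weights
    ]

    # Step 3: the largest weight always survives because delta < 1,
    # so the norm is strictly positive.
    norm = math.sqrt(sum(s * s for s in shrunk))
    return [s / norm for s in shrunk]
```

With `delta = 0` the weights are only renormalized; as `delta` approaches 1, fewer and fewer variables keep a non-zero weight, which is how a small fraction of genes can end up carrying the model.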