Finding Exogenous Variables in Data with Many More Variables than Observations
Many statistical methods have been proposed to estimate causal models in
classical situations with fewer variables than observations (p<n, p: the number
of variables and n: the number of observations). However, modern datasets,
including gene expression data, require high-dimensional causal modeling in
challenging situations with orders of magnitude more variables than
observations (p>>n). In this paper, we propose a method to find exogenous
variables in a linear non-Gaussian causal model, which requires much smaller
sample sizes than conventional methods and works even when p>>n. The key idea
is to identify which variables are exogenous based on non-Gaussianity instead
of estimating the entire structure of the model. Exogenous variables work as
triggers that activate a causal chain in the model, and their identification
leads to more efficient experimental designs and better understanding of the
causal mechanism. We present experiments with artificial data and real-world
gene expression data to evaluate the method. Comment: A revised version of this paper was published in Proc. ICANN201
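The key idea above is to score each variable by its non-Gaussianity instead of fitting the whole causal structure. A deliberately simplified heuristic illustrates it. This is a minimal Python sketch, not the paper's procedure. It assumes absolute sample excess kurtosis as the non-Gaussianity score, and `excess_kurtosis` and `rank_by_nongaussianity` are hypothetical names. The intuition is that an exogenous variable equals a single non-Gaussian disturbance. Downstream variables mix several disturbances and therefore often, though not always, look closer to Gaussian. Each variable is scored only on its own n observations, so no p x p structure is estimated.

```python
from typing import List, Sequence


def excess_kurtosis(xs: Sequence[float]) -> float:
    """Sample excess kurtosis (about 0 for Gaussian data)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 ** 2) - 3.0


def rank_by_nongaussianity(columns: List[Sequence[float]]) -> List[int]:
    """Hypothetical heuristic: return variable indices ordered from most to
    least non-Gaussian, using |excess kurtosis| as the (assumed) score.
    Candidates for exogenous variables come first."""
    scores = [abs(excess_kurtosis(col)) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
```

Suppose a uniform exogenous disturbance drives two downstream variables. The exogenous column then typically ranks first. With heavy-tailed or mixed disturbance distributions, this heuristic can mis-rank variables, so it is only an illustration.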
Chemogenetic fingerprinting by analysis of cellular growth dynamics
Background: A fundamental goal in chemical biology is the elucidation of on- and off-target effects of drugs and biocides. To this end, chemogenetic screens that quantify drug-induced changes in cellular fitness, typically taken as changes in composite growth, are commonly applied. Results: Using the model organism Saccharomyces cerevisiae, we report that resolving cellular growth dynamics into its individual components (growth lag, growth rate and growth efficiency) increases the predictive power of chemogenetic screens. In terms of both drug-drug and gene-drug interactions, the individual growth variables captured distinct and only partially overlapping aspects of cell physiology. In fact, the effects on cellular growth dynamics represented functionally distinct chemical fingerprints. Discussion: Our findings suggest that resolving and quantifying all facets of growth increases the informational and interpretational output of chemogenetic screening. Hence, by facilitating a physiologically more complete analysis of gene-drug and drug-drug interactions, the results reported here may simplify the assignment of mode of action to orphan bioactive compounds.
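The three growth variables named above can be extracted from a single optical-density (OD) curve. This minimal Python sketch uses one common set of definitions, which are assumptions and not necessarily those used in the study. Rate is the maximal slope of ln(OD) over a sliding window. Lag is where the tangent at that window meets the initial ln(OD). Efficiency is the total OD gain. The function name `growth_parameters` and the default window of five time points are hypothetical.

```python
import math
from typing import Sequence, Tuple


def growth_parameters(times: Sequence[float], od: Sequence[float],
                      window: int = 5) -> Tuple[float, float, float]:
    """Return (lag, rate, efficiency) for one growth curve (assumed definitions)."""
    if len(times) != len(od) or len(od) < window:
        raise ValueError("need equal-length series at least `window` points long")
    log_od = [math.log(v) for v in od]
    best_slope, best_intercept = -math.inf, 0.0
    # Rate: steepest least-squares slope of ln(OD) over any `window` consecutive points.
    for start in range(len(od) - window + 1):
        t = times[start:start + window]
        y = log_od[start:start + window]
        t_mean, y_mean = sum(t) / window, sum(y) / window
        sxx = sum((ti - t_mean) ** 2 for ti in t)
        sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
        slope = sxy / sxx
        if slope > best_slope:
            best_slope, best_intercept = slope, y_mean - slope * t_mean
    if best_slope <= 0:
        raise ValueError("no growth detected")
    # Lag: time at which the steepest tangent line reaches the initial ln(OD).
    lag = (log_od[0] - best_intercept) / best_slope
    # Efficiency: total increase in OD over the experiment.
    efficiency = max(od) - od[0]
    return lag, best_slope, efficiency
```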
Validation of Plasmodium falciparum deoxyhypusine synthase as an antimalarial target
Background: Hypusination is an essential post-translational modification in eukaryotes. The two enzymes required for this modification, deoxyhypusine synthase (DHS) and deoxyhypusine hydrolase, are also conserved. Plasmodium falciparum human malaria parasites possess genes for both hypusination enzymes, which are hypothesized to be targets of antimalarial drugs. Methods: Transgenic P. falciparum parasites with modification of the PF3D7_1412600 gene encoding the PfDHS enzyme were created by insertion of the glmS riboswitch or the M9 inactive variant. The PfDHS protein was studied in transgenic parasites by confocal microscopy and Western immunoblotting. The biochemical function of the PfDHS enzyme in parasites was assessed by hypusination and nascent protein synthesis assays. Gene essentiality was assessed by competitive growth assays and chemogenomic profiling. Results: Clonal transgenic parasites with integration of the glmS riboswitch downstream of the PfDHS gene were established. PfDHS protein was present in the cytoplasm of transgenic parasites in asexual stages. PfDHS protein expression could be attenuated fivefold in transgenic parasites with an active riboswitch, whereas it was unaffected in control transgenic parasites carrying the inactive riboswitch sequence. Attenuation of PfDHS expression for 72 h led to a significant reduction of hypusinated protein; however, global protein synthesis was unaffected. Parasites with attenuated PfDHS expression showed a significant growth defect, although their decline was not as rapid as that of parasites with attenuated dihydrofolate reductase-thymidylate synthase (PfDHFR-TS) expression. PfDHS-attenuated parasites showed increased sensitivity to N1-guanyl-1,7-diaminoheptane, a structural analog of spermidine and a known inhibitor of DHS enzymes. Discussion: Loss of PfDHS function leads to reduced hypusination, which may be important for the synthesis of some essential proteins. The growth defect in parasites with attenuated PfDHS expression suggests that this gene is essential. However, the slower decline of PfDHS mutants compared with PfDHFR-TS mutants in competitive growth assays suggests that PfDHS is a less vulnerable antimalarial target. Nevertheless, the data validate PfDHS as an antimalarial target that can be inhibited by spermidine-like compounds.
Perturbation Detection Through Modeling of Gene Expression on a Latent Biological Pathway Network: A Bayesian hierarchical approach
Cellular response to a perturbation is the result of a dynamic system of
biological variables linked in a complex network. A major challenge in drug and
disease studies is identifying the key factors of a biological network that are
essential in determining the cell's fate.
Here our goal is the identification of perturbed pathways from
high-throughput gene expression data. We develop a three-level hierarchical
model, where (i) the first level captures the relationship between gene
expression and biological pathways using confirmatory factor analysis, (ii) the
second level models the behavior within an underlying network of pathways
induced by an unknown perturbation using a conditional autoregressive model,
and (iii) the third level is a spike-and-slab prior on the perturbations. We
then identify perturbations through posterior-based variable selection.
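The final step, posterior-based variable selection under a spike-and-slab prior, has a closed form in the simplest one-dimensional case. This Python sketch is a toy illustration, not the paper's three-level model. It assumes one noisy observation y of each perturbation effect d, with y | d ~ N(d, sigma^2). It also assumes the prior d ~ (1 - pi) * (point mass at 0) + pi * N(0, tau^2) and a 0.5 threshold on the posterior inclusion probability. The names `inclusion_probability` and `select_perturbed` are hypothetical.

```python
import math
from typing import List, Sequence


def normal_logpdf(y: float, var: float) -> float:
    """Log density of N(0, var) evaluated at y."""
    return -0.5 * (math.log(2.0 * math.pi * var) + y * y / var)


def inclusion_probability(y: float, noise_var: float, slab_var: float,
                          prior_incl: float) -> float:
    """P(d != 0 | y) under a spike-and-slab prior (toy, one observation).
    Marginally, y ~ N(0, noise_var + slab_var) under the slab
    and y ~ N(0, noise_var) under the spike."""
    log_slab = math.log(prior_incl) + normal_logpdf(y, noise_var + slab_var)
    log_spike = math.log(1.0 - prior_incl) + normal_logpdf(y, noise_var)
    m = max(log_slab, log_spike)  # subtract the max for numerical stability
    a, b = math.exp(log_slab - m), math.exp(log_spike - m)
    return a / (a + b)


def select_perturbed(ys: Sequence[float], noise_var: float = 1.0,
                     slab_var: float = 9.0, prior_incl: float = 0.5) -> List[int]:
    """Indices whose posterior inclusion probability exceeds 0.5 (assumed threshold)."""
    return [i for i, y in enumerate(ys)
            if inclusion_probability(y, noise_var, slab_var, prior_incl) > 0.5]
```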
We illustrate our approach using gene transcription drug perturbation
profiles from the DREAM7 drug sensitivity prediction challenge data set. Our
proposed method identified regulatory pathways that are known to play a
causative role and that were not readily resolved using gene set enrichment
analysis or exploratory factor models. Simulation results are presented
assessing the performance of this model relative to a network-free variant and
its robustness to inaccuracies in biological databases.
Inferring functional modules of protein families with probabilistic topic models
Background: Genome and metagenome studies have identified thousands of protein families whose functions are poorly understood and for which techniques for functional characterization provide only partial information. For such proteins, the genome context can give further information about their functional context. Results: We describe a Bayesian method, based on a probabilistic topic model, which directly identifies functional modules of protein families. The method explores the co-occurrence patterns of protein families across a collection of sequence samples to infer a probabilistic model of arbitrarily sized functional modules. Conclusions: We show that our method identifies protein modules, some of which correspond to well-known biological processes, that are tightly interconnected with known functional interactions and are different from the interactions identified by pairwise co-occurrence. The modules are not specific to any given organism and may combine different realizations of a protein complex or pathway within different taxa.
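The topic-model idea can be made concrete by treating each sequence sample as a document and each protein family as a word. Topics then play the role of functional modules. This Python sketch uses standard latent Dirichlet allocation fitted by collapsed Gibbs sampling. The abstract does not say which topic model or inference scheme the authors used, so this choice, the hyperparameters and the function name `lda_gibbs` are assumptions.

```python
import random
from typing import List, Sequence


def lda_gibbs(docs: Sequence[Sequence[int]], n_words: int, n_topics: int,
              alpha: float = 0.1, beta: float = 0.01, iters: int = 200,
              seed: int = 0) -> List[List[float]]:
    """Collapsed Gibbs sampling for LDA.
    docs: each sample is a list of protein-family ids in [0, n_words).
    Returns phi[k][w], the probability of family w under topic (module) k."""
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]            # topic counts per sample
    nkw = [[0] * n_words for _ in range(n_topics)]  # family counts per topic
    nk = [0] * n_topics                             # total tokens per topic
    z = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments).
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + n_words * beta) for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return [[(nkw[k][w] + beta) / (nk[k] + n_words * beta) for w in range(n_words)]
            for k in range(n_topics)]
```

Families that repeatedly co-occur in the same samples end up concentrated in the same topic. In that sense, a topic can be read as a candidate module.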
Mining for genotype-phenotype relations in Saccharomyces using partial least squares
Background: Multivariate approaches are important because of their versatility and their applications in many fields, and they provide decisive advantages over univariate analysis. Genome-wide association studies are rapidly emerging, but current approaches pay little attention to the multivariate relation between genotype and phenotype. We introduce a methodology based on a BLAST approach for extracting information from genomic sequences and on Soft-Thresholding Partial Least Squares (ST-PLS) for mapping genotype-phenotype relations. Results: Applying this methodology to an extensive data set for the model yeast Saccharomyces cerevisiae, we found that the genotype-phenotype relationship involves surprisingly few genes: an overwhelmingly large fraction of the phenotypic variation can be explained by variation in less than 1% of the full gene reference set of 5791 genes. These phenotype-influencing genes were evolving 20% faster than non-influential genes and were unevenly distributed over cellular functions, with strong enrichment in functions such as cellular respiration and transposition. These genes were also enriched for known paralogs, stop codon variations and copy number variations, suggesting that such molecular adjustments have had a disproportionate influence on the recent adaptation of Saccharomyces yeasts to environmental changes in their ecological niche. Conclusions: The BLAST- and PLS-based multivariate approach produced results that agree with the known yeast phylogeny and gene ontology, verifying that the methodology extracts a set of fast-evolving genes that capture the phylogeny of the yeast strains. The approach is worth pursuing, and future work should improve both the computation of genotype signals and the variable selection procedure within the PLS framework.
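The soft-thresholding step in ST-PLS can be sketched for the first PLS component. First compute the covariance-based weight vector between the gene features and the phenotype. Then shrink small weights to exactly zero and renormalize. This minimal Python sketch assumes a threshold expressed as a fraction `delta` of the largest absolute weight. The function name `st_pls_weights` and that parameterization are illustrative assumptions. A full ST-PLS fit would also deflate X and extract further components.

```python
import math
from typing import List, Sequence


def st_pls_weights(X: Sequence[Sequence[float]], y: Sequence[float],
                   delta: float) -> List[float]:
    """First PLS weight vector with soft-thresholding (illustrative sketch).
    X: n x p matrix (rows = strains, columns = gene features); y: phenotype.
    Weights with |w_j| <= delta * max|w| become zero; the rest shrink toward zero."""
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # PLS direction: covariance of each centred feature with the centred phenotype.
    w = [sum((X[i][j] - x_means[j]) * (y[i] - y_mean) for i in range(n))
         for j in range(p)]
    cutoff = delta * max(abs(v) for v in w)
    # Soft-thresholding: sign(w) * max(|w| - cutoff, 0).
    w = [math.copysign(max(abs(v) - cutoff, 0.0), v) for v in w]
    norm = math.sqrt(sum(v * v for v in w))
    return [v / norm for v in w] if norm > 0 else w
```

Raising `delta` yields sparser weight vectors. This mirrors how a small subset of genes can end up carrying most of the signal.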
Inference of gene regulatory networks from genome-wide knockout fitness data
Motivation: Genome-wide fitness is an emerging type of high-throughput biological data, generated for individual organisms by creating libraries of knockouts, subjecting them to broad ranges of environmental conditions, and measuring the resulting clone-specific fitnesses. Since fitness is an organism-scale measure of gene regulatory network behaviour, it may offer advantages over individual gene expression when insights into such phenotypic and functional features are of primary interest. Previous work has shown that genome-wide fitness data can uncover novel gene regulatory interactions compared with more conventional gene expression analysis. Yet, to date, few algorithms have been proposed for systematically using genome-wide mutant fitness data for gene regulatory network inference. Results: In this article, we describe a model and propose an inference algorithm for using fitness data from knockout libraries to identify underlying gene regulatory networks. Unlike most prior methods, the presented approach captures not only the structural but also the dynamical and non-linear nature of the biomolecular systems involved. A state-space model with a non-linear basis is used to describe gene regulatory networks dynamically. Network structure is then elucidated by estimating the unknown model parameters. An unscented Kalman filter is used to cope with the non-linearities introduced in the model, which also enables the algorithm to run online for practical use. We demonstrate that the algorithm provides satisfactory results for both synthetic data and empirical measurements of the GAL network in the yeast Saccharomyces cerevisiae and the TyrR–LiuR network in the bacterium Shewanella oneidensis.
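The unscented Kalman filter mentioned above avoids linearizing the non-linear model. Instead, it propagates a few deterministically chosen sigma points. This Python sketch shows one predict-update step for a scalar state with Julier's original weighting (kappa = 2, so n + kappa = 3 for n = 1). The paper's filter runs on a multivariate state-space model with a non-linear basis and unknown parameters. The scalar setting, `ukf_step` and its arguments are therefore simplifying assumptions.

```python
import math
from typing import Callable, List, Tuple


def sigma_points(mean: float, var: float,
                 kappa: float = 2.0) -> Tuple[List[float], List[float]]:
    """Scalar unscented transform (n = 1): three sigma points and their weights."""
    n = 1
    spread = math.sqrt((n + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    return points, weights


def ukf_step(x: float, P: float, y: float,
             f: Callable[[float], float], h: Callable[[float], float],
             Q: float, R: float) -> Tuple[float, float]:
    """One UKF predict-update step for x' = f(x) + w, y = h(x') + v."""
    # Predict: push sigma points through the (possibly non-linear) transition f.
    pts, wts = sigma_points(x, P)
    fx = [f(p) for p in pts]
    x_pred = sum(w * v for w, v in zip(wts, fx))
    P_pred = sum(w * (v - x_pred) ** 2 for w, v in zip(wts, fx)) + Q
    # Update: redraw sigma points around the prediction and map them through h.
    pts, wts = sigma_points(x_pred, P_pred)
    hy = [h(p) for p in pts]
    y_pred = sum(w * v for w, v in zip(wts, hy))
    P_yy = sum(w * (v - y_pred) ** 2 for w, v in zip(wts, hy)) + R
    P_xy = sum(w * (p - x_pred) * (v - y_pred) for w, p, v in zip(wts, pts, hy))
    K = P_xy / P_yy  # Kalman gain
    return x_pred + K * (y - y_pred), P_pred - K * P_yy * K
```

For linear f and h, the unscented transform is exact, so the step reproduces the ordinary Kalman filter. This gives a convenient sanity check.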