
    Algebraic Comparison of Partial Lists in Bioinformatics

    The outcome of a functional genomics pipeline is usually a partial list of genomic features, ranked by their relevance in modelling biological phenotype in terms of a classification or regression model. Due to resampling protocols or just within a meta-analysis comparison, instead of one list it is often the case that sets of alternative feature lists (possibly of different lengths) are obtained. Here we introduce a method, based on the algebraic theory of symmetric groups, for studying the variability between lists ("list stability") in the case of lists of unequal length. We provide algorithms evaluating stability for lists embedded in the full feature set or just limited to the features occurring in the partial lists. The method is demonstrated first on synthetic data in a gene filtering task and then for finding gene profiles on a recent prostate cancer dataset

    Assessing and selecting gene expression signals based upon the quality of the measured dynamics

    BACKGROUND: One of the challenges with modeling the temporal progression of biological signals is dealing with the effect of noise and the limited number of replicates at each time point. Given the rising interest in utilizing predictive mathematical models to describe the biological response of an organism, or analyses such as clustering and gene ontology enrichment, it is important to determine whether the dynamic progression of the data has been accurately captured despite the limited number of replicates, so that one can have confidence that the results of the analysis capture the salient dynamic features. RESULTS: By pre-selecting genes based upon quality before the identification of differential expression via an algorithm such as EDGE, it was found that the percentage of statistically enriched ontologies (p < .05) was improved. Furthermore, it was found that a majority of the genes found via the proposed technique were also selected via an EDGE selection, though the reverse was not necessarily true. It was also found that improvements offered by the proposed algorithm are anti-correlated with improvements in the various microarray platforms and the number of replicates. This is illustrated by the fact that newer arrays and experiments with more replicates show less improvement when the filtering for quality is run before the selection of differentially expressed genes. This suggests that the increase in the number of replicates, as well as improvements in array technologies, increase the confidence one has in the dynamics obtained from the experiment. CONCLUSION: We have developed an algorithm that quantifies the quality of a temporal biological signal rather than whether the signal illustrates a significant change over the experimental time course. Because the use of these temporal signals, whether in mathematical modeling or clustering, focuses upon the entire time series, it is necessary to develop a method to quantify and select for signals which conform to this ideal. By doing this, we have demonstrated a marked and consistent improvement in the results of a clustering exercise over multiple experiments, microarray platforms, and experimental designs.

    Strain-dependent host transcriptional responses to toxoplasma infection are largely conserved in mammalian and avian hosts

    Toxoplasma gondii has a remarkable ability to infect an enormous variety of mammalian and avian species. Given this, it is surprising that three strains (Types I/II/III) account for the majority of isolates from Europe/North America. The selective pressures that have driven the emergence of these particular strains, however, remain enigmatic. We hypothesized that strain selection might be partially driven by adaptation of strains for mammalian versus avian hosts. To test this, we examine in vitro, strain-dependent host responses in fibroblasts of a representative avian host, the chicken (Gallus gallus). Using gene expression profiling of infected chicken embryonic fibroblasts and pathway analysis to assess host response, we show here that chicken cells respond with distinct transcriptional profiles upon infection with Type II versus III strains that are reminiscent of profiles observed in mammalian cells. To identify the parasite drivers of these differences, chicken fibroblasts were infected with individual F1 progeny of a Type II x III cross and host gene expression was assessed for each by microarray. QTL mapping of transcriptional differences suggested, and deletion strains confirmed, that, as in mammalian cells, the polymorphic rhoptry kinase ROP16 is the major driver of strain-specific responses. We originally hypothesized that comparing avian versus mammalian host response might reveal an inversion in parasite strain-dependent phenotypes; specifically, for polymorphic effectors like ROP16, we hypothesized that the allele with most activity in mammalian cells might be less active in avian cells. Instead, we found that activity of ROP16 alleles appears to be conserved across host species; moreover, additional parasite loci that were previously mapped for strain-specific effects on mammalian response showed similar strain-specific effects in chicken cells. 
These results indicate that if different hosts select for different parasite genotypes, the selection operates downstream of the signaling occurring during the beginning of the host's immune response. © 2011 Ong et al

    Combinatorial CRISPR-Cas9 screens for de novo mapping of genetic interactions.

    We developed a systematic approach to map human genetic networks by combinatorial CRISPR-Cas9 perturbations coupled to robust analysis of growth kinetics. We targeted all pairs of 73 cancer genes with dual guide RNAs in three cell lines, comprising 141,912 tests of interaction. Numerous therapeutically relevant interactions were identified, and these patterns replicated with combinatorial drugs at 75% precision. From these results, we anticipate that cellular context will be critical to synthetic-lethal therapies

    A full Bayesian hierarchical mixture model for the variance of gene differential expression

    BACKGROUND: In many laboratory-based high throughput microarray experiments, there are very few replicates of gene expression levels. Thus, estimates of gene variances are inaccurate. Visual inspection of graphical summaries of these data usually reveals that heteroscedasticity is present, and the standard approach to address this is to take a log2 transformation. In such circumstances, it is then common to assume that gene variability is constant when an analysis of these data is undertaken. However, this is perhaps too stringent an assumption. More careful inspection reveals that the simple log2 transformation does not remove the problem of heteroscedasticity. An alternative strategy is to assume independent gene-specific variances, although again this is problematic as variance estimates based on few replications are highly unstable. More meaningful and reliable comparisons of gene expression might be achieved, for different conditions or different tissue samples, where the test statistics are based on accurate estimates of gene variability, a crucial step in the identification of differentially expressed genes. RESULTS: We propose a Bayesian mixture model, which classifies genes according to similarity in their variance. The result is that genes in the same latent class share a similar variance, estimated from a larger number of replicates than purely those per gene, i.e. the total of all replicates of all genes in the same latent class. An example dataset, consisting of 9216 genes with four replicates per condition, resulted in four latent classes based on the similarity of their variance. CONCLUSION: The mixture variance model provides a realistic and flexible estimate for the variance of gene expression data under limited replicates. We believe that in using the latent class variances, estimated from a larger number of genes in each derived latent group, the p-values obtained are more robust than those obtained using either a constant gene variance or a gene-specific variance estimate.

    Probe-level linear model fitting and mixture modeling results in high accuracy detection of differential gene expression

    BACKGROUND: The identification of differentially expressed genes (DEGs) from Affymetrix GeneChip arrays is currently done by first computing expression levels from the low-level probe intensities, then deriving significance by comparing these expression levels between conditions. The proposed PL-LM (Probe-Level Linear Model) method implements a linear model applied on the probe-level data to directly estimate the treatment effect. A finite mixture of Gaussian components is then used to identify DEGs using the coefficients estimated by the linear model. This approach can readily be applied to experimental designs with or without replication. RESULTS: On a wholly defined dataset, the PL-LM method was able to identify 75% of the differentially expressed genes within 10% of false positives. This accuracy was achieved both using the three replicates per condition available in the dataset and using only one replicate per condition. CONCLUSION: The method achieves, on this dataset, a higher accuracy than the best set of tools identified by the authors of the dataset, and does so using only one replicate per condition

    A comprehensive re-analysis of the Golden Spike data: Towards a benchmark for differential expression methods

    BACKGROUND: The Golden Spike data set has been used to validate a number of methods for summarizing Affymetrix data sets, sometimes with seemingly contradictory results. Much less use has been made of this data set to evaluate differential expression methods. It has been suggested that this data set should not be used for method comparison due to a number of inherent flaws. RESULTS: We have used this data set in a comparison of methods which is far more extensive than any previous study. We outline six stages in the analysis pipeline where decisions need to be made, and show how the results of these decisions can lead to the apparently contradictory results previously found. We also show that, while flawed, this data set is still a useful tool for method comparison, particularly for identifying combinations of summarization and differential expression methods that are unlikely to perform well on real data sets. We describe a new benchmark, AffyDEComp, that can be used for such a comparison. CONCLUSION: We conclude with recommendations for preferred Affymetrix analysis tools, and for the development of future spike-in data sets.

    Role of Esrrg in the Fibrate-Mediated Regulation of Lipid Metabolism Genes in Human ApoA-I Transgenic Mice

    We have used a new ApoA-I transgenic mouse model to identify, by global gene expression profiling, candidate genes that affect lipid and lipoprotein metabolism in response to fenofibrate treatment. Multilevel bioinformatical analysis and stringent selection criteria (2-fold change, 0% false discovery rate) identified 267 significantly changed genes involved in several molecular pathways. The fenofibrate-treated group did not have significantly altered levels of hepatic human APOA-I mRNA and plasma ApoA-I compared with the control group. However, the treatment increased cholesterol levels 1.95-fold, mainly due to the increase in high-density lipoprotein (HDL) cholesterol. The observed changes in HDL are associated with the upregulation of genes involved in phospholipid biosynthesis and lipid hydrolysis, as well as phospholipid transfer protein. Significant upregulation was observed in genes involved in fatty acid transport and β-oxidation, but not in those of fatty acid and cholesterol biosynthesis, Krebs cycle and gluconeogenesis. Fenofibrate significantly changed the expression of seven transcription factors. The estrogen receptor-related gamma gene was upregulated 2.36-fold and had a significant positive correlation with genes of lipid and lipoprotein metabolism and mitochondrial functions, indicating an important role of this orphan receptor in mediating the fenofibrate-induced activation of a specific subset of its target genes. National Institutes of Health (HL48739 and HL68216); European Union (LSHM-CT-2006-0376331, LSHG-CT-2006-037277); the Biomedical Research Foundation of the Academy of Athens; the Hellenic Cardiological Society; the John F Kostopoulos Foundatio

    Semi-supervised discovery of differential genes

    BACKGROUND: Various statistical scores have been proposed for evaluating the significance of genes that may exhibit differential expression between two or more controlled conditions. However, in many clinical studies, for example those that aim to detect clinical marker genes, the conditions have not necessarily been well controlled, and condition labels are sometimes hard to obtain due to physical, financial, and time costs. In such a situation, we can consider an unsupervised case where labels are not available, or a semi-supervised case where labels are available for only a part of the whole sample set, rather than the well-studied supervised case where all samples have their labels. RESULTS: We assume a latent variable model for the expression of active genes and apply the optimal discovery procedure (ODP) proposed by Storey (2005) to the model. Our latent variable model allows gene significance scores to be applied to unsupervised and semi-supervised cases. The ODP framework improves detectability by sharing the estimated parameters of null and alternative models of multiple tests over multiple genes. A theoretical consideration leads to two different interpretations of the latent variable: either it only implicitly affects the alternative model through the model parameters, or it is explicitly included in the alternative model. These interpretations correspond to two different implementations of ODP. By comparing the two implementations through experiments with simulation data, we have found that sharing the latent variable estimation is effective for increasing the detectability of truly active genes. We also show that the unsupervised and semi-supervised rating of genes, which takes into account the samples without condition labels, can improve detection of active genes in real gene discovery problems. 
CONCLUSION: The experimental results indicate that the ODP framework is effective for hypotheses including latent variables and is further improved by sharing the estimations of hidden variables over multiple tests