Algebraic Comparison of Partial Lists in Bioinformatics
The outcome of a functional genomics pipeline is usually a partial list of
genomic features, ranked by their relevance in modelling biological phenotype
in terms of a classification or regression model. Due to resampling protocols
or just within a meta-analysis comparison, instead of one list it is often the
case that sets of alternative feature lists (possibly of different lengths) are
obtained. Here we introduce a method, based on the algebraic theory of
symmetric groups, for studying the variability between lists ("list stability")
in the case of lists of unequal length. We provide algorithms evaluating
stability for lists embedded in the full feature set or just limited to the
features occurring in the partial lists. The method is demonstrated first on
synthetic data in a gene filtering task and then for finding gene profiles on a
recent prostate cancer dataset
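As an illustration of what a stability score for lists of unequal length could look like, here is a minimal sketch. The abstract does not name the metric, so the Canberra distance on ranks, the convention of giving every absent feature the rank `len(list) + 1`, and the function names `rank_vector`, `canberra` and `list_instability` are all assumptions made for this example rather than the authors' algorithm. Leaving `universe` unset restricts the comparison to features occurring in the partial lists; passing the full feature set corresponds to the embedded case.

```python
from itertools import combinations


def rank_vector(ranked_list, universe):
    """Rank of each feature in `universe`; absent features share rank len(list) + 1 (assumed convention)."""
    position = {feature: i + 1 for i, feature in enumerate(ranked_list)}
    tail_rank = len(ranked_list) + 1
    return [position.get(feature, tail_rank) for feature in universe]


def canberra(u, v):
    """Canberra distance between two equal-length vectors of positive ranks."""
    return sum(abs(a - b) / (a + b) for a, b in zip(u, v))


def list_instability(lists, universe=None):
    """Mean pairwise Canberra distance between partial ranked lists (lower means more stable).

    If `universe` is None, only features occurring in at least one list are compared.
    """
    if universe is None:
        universe = sorted(set().union(*lists))
    vectors = [rank_vector(ranked, universe) for ranked in lists]
    pairs = list(combinations(vectors, 2))
    return sum(canberra(u, v) for u, v in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical feature lists of different lengths.
    print(list_instability([["g1", "g2", "g3"], ["g2", "g1"], ["g1"]]))
```

Identical lists score 0, and the score grows as the lists disagree on membership or order.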
Combined transcriptomic-(1)H NMR metabonomic study reveals that monoethylhexyl phthalate stimulates adipogenesis and glyceroneogenesis in human adipocytes
Adipose tissue is a major storage site for lipophilic environmental contaminants. The environmental metabolic disruptor hypothesis postulates that some pollutants can promote obesity or metabolic disorders by activating nuclear receptors involved in the control of energetic homeostasis. In this context, monoethylhexyl phthalate (MEHP) is of particular concern since it was shown to activate the peroxisome proliferator-activated receptor γ (PPARγ) in 3T3-L1 murine preadipocytes. In the present work, we used an untargeted, combined transcriptomic-(1)H NMR-based metabonomic approach to describe the overall effect of MEHP on primary cultures of human subcutaneous adipocytes differentiated in vitro. MEHP stimulated rapidly and selectively the expression of genes involved in glyceroneogenesis, enhanced the expression of the cytosolic phosphoenolpyruvate carboxykinase, and reduced fatty acid release. These results demonstrate that MEHP increased glyceroneogenesis and fatty acid reesterification in human adipocytes. A longer treatment with MEHP induced the expression of genes involved in triglycerides uptake, synthesis, and storage; decreased intracellular lactate, glutamine, and other amino acids; increased aspartate and NAD, and resulted in a global increase in triglycerides. Altogether, these results indicate that MEHP promoted the differentiation of human preadipocytes to adipocytes. These mechanisms might contribute to the suspected obesogenic effect of MEHP
Assessing and selecting gene expression signals based upon the quality of the measured dynamics
BACKGROUND: One of the challenges with modeling the temporal progression of biological signals is dealing with the effect of noise and the limited number of replicates at each time point. Given the rising interest in using predictive mathematical models to describe the biological response of an organism, and in analyses such as clustering and gene ontology enrichment, it is important to determine whether the dynamic progression of the data has been accurately captured despite the limited number of replicates, so that one can be confident that the results of the analysis capture the salient dynamic features. RESULTS: By pre-selecting genes based upon quality before the identification of differential expression via an algorithm such as EDGE, it was found that the percentage of statistically enriched ontologies (p < 0.05) was improved. Furthermore, it was found that a majority of the genes found via the proposed technique were also selected via an EDGE selection, though the reverse was not necessarily true. It was also found that improvements offered by the proposed algorithm are anti-correlated with improvements in the various microarray platforms and the number of replicates. This is illustrated by the fact that newer arrays and experiments with more replicates show less improvement when the filtering for quality is run before the selection of differentially expressed genes. This suggests that the increase in the number of replicates, as well as improvements in array technologies, increases the confidence one has in the dynamics obtained from the experiment. CONCLUSION: We have developed an algorithm that quantifies the quality of a temporal biological signal rather than whether the signal illustrates a significant change over the experimental time course. Because the use of these temporal signals, whether in mathematical modeling or clustering, focuses upon the entire time series, it is necessary to develop a method to quantify and select for signals which conform to this ideal. By doing this, we have demonstrated a marked and consistent improvement in the results of a clustering exercise over multiple experiments, microarray platforms, and experimental designs
Strain-dependent host transcriptional responses to toxoplasma infection are largely conserved in mammalian and avian hosts
Toxoplasma gondii has a remarkable ability to infect an enormous variety of mammalian and avian species. Given this, it is surprising that three strains (Types I/II/III) account for the majority of isolates from Europe/North America. The selective pressures that have driven the emergence of these particular strains, however, remain enigmatic. We hypothesized that strain selection might be partially driven by adaptation of strains for mammalian versus avian hosts. To test this, we examine in vitro, strain-dependent host responses in fibroblasts of a representative avian host, the chicken (Gallus gallus). Using gene expression profiling of infected chicken embryonic fibroblasts and pathway analysis to assess host response, we show here that chicken cells respond with distinct transcriptional profiles upon infection with Type II versus III strains that are reminiscent of profiles observed in mammalian cells. To identify the parasite drivers of these differences, chicken fibroblasts were infected with individual F1 progeny of a Type II x III cross and host gene expression was assessed for each by microarray. QTL mapping of transcriptional differences suggested, and deletion strains confirmed, that, as in mammalian cells, the polymorphic rhoptry kinase ROP16 is the major driver of strain-specific responses. We originally hypothesized that comparing avian versus mammalian host response might reveal an inversion in parasite strain-dependent phenotypes; specifically, for polymorphic effectors like ROP16, we hypothesized that the allele with most activity in mammalian cells might be less active in avian cells. Instead, we found that activity of ROP16 alleles appears to be conserved across host species; moreover, additional parasite loci that were previously mapped for strain-specific effects on mammalian response showed similar strain-specific effects in chicken cells. 
These results indicate that if different hosts select for different parasite genotypes, the selection operates downstream of the signaling occurring during the beginning of the host's immune response. © 2011 Ong et al
Combinatorial CRISPR-Cas9 screens for de novo mapping of genetic interactions.
We developed a systematic approach to map human genetic networks by combinatorial CRISPR-Cas9 perturbations coupled to robust analysis of growth kinetics. We targeted all pairs of 73 cancer genes with dual guide RNAs in three cell lines, comprising 141,912 tests of interaction. Numerous therapeutically relevant interactions were identified, and these patterns replicated with combinatorial drugs at 75% precision. From these results, we anticipate that cellular context will be critical to synthetic-lethal therapies
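The screen size quoted above can be checked with a short calculation. This sketch assumes that "all pairs" means unordered pairs of distinct genes; the abstract does not say how the tests per gene pair break down across guides, cell lines or other factors, so the code only reports the ratio.

```python
from math import comb

n_genes = 73        # cancer genes targeted, from the abstract
n_tests = 141_912   # tests of interaction, from the abstract

# Assumption: one gene pair = one unordered pair of distinct genes.
gene_pairs = comb(n_genes, 2)
tests_per_pair, remainder = divmod(n_tests, gene_pairs)

print(gene_pairs, tests_per_pair, remainder)  # prints "2628 54 0"
```

Under that assumption the 141,912 tests divide evenly into 54 tests for each of the 2,628 gene pairs.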
A full Bayesian hierarchical mixture model for the variance of gene differential expression
BACKGROUND: In many laboratory-based high-throughput microarray experiments, there are very few replicates of gene expression levels. Thus, estimates of gene variances are inaccurate. Visual inspection of graphical summaries of these data usually reveals that heteroscedasticity is present, and the standard approach to address this is to take a log2 transformation. In such circumstances, it is then common to assume that gene variability is constant when an analysis of these data is undertaken. However, this is perhaps too stringent an assumption. More careful inspection reveals that the simple log2 transformation does not remove the problem of heteroscedasticity. An alternative strategy is to assume independent gene-specific variances, although again this is problematic as variance estimates based on few replications are highly unstable. More meaningful and reliable comparisons of gene expression might be achieved, for different conditions or different tissue samples, where the test statistics are based on accurate estimates of gene variability, a crucial step in the identification of differentially expressed genes. RESULTS: We propose a Bayesian mixture model, which classifies genes according to similarity in their variance. The result is that genes in the same latent class share a similar variance, estimated from a larger number of replicates than purely those per gene, i.e. the total of all replicates of all genes in the same latent class. An example dataset, consisting of 9216 genes with four replicates per condition, resulted in four latent classes based on the similarity of their variance. CONCLUSION: The mixture variance model provides a realistic and flexible estimate for the variance of gene expression data under limited replicates. We believe that in using the latent class variances, estimated from a larger number of genes in each derived latent group, the p-values obtained are more robust than those obtained using either a constant or a gene-specific variance estimate
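The pooling idea in this abstract, where a class variance is estimated from all replicates of all genes in the same latent class, can be sketched as follows. This example assumes the class labels are already known; the paper infers them with a Bayesian mixture model, which this sketch does not implement. The function name `pooled_class_variances` and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import variance


def pooled_class_variances(replicates, labels):
    """Pool within-gene sums of squares over all genes that share a latent class.

    replicates: one sequence of replicate measurements per gene (at least two values each).
    labels:     one latent-class label per gene (assumed given, not inferred here).
    """
    sum_sq = defaultdict(float)  # within-gene sum of squares, accumulated per class
    dof = defaultdict(int)       # degrees of freedom, accumulated per class
    for values, label in zip(replicates, labels):
        n = len(values)
        sum_sq[label] += variance(values) * (n - 1)
        dof[label] += n - 1
    return {label: sum_sq[label] / dof[label] for label in sum_sq}


if __name__ == "__main__":
    # Three hypothetical genes with four replicates each, in two classes.
    data = [[1, 2, 3, 4], [2, 4, 6, 8], [5, 5, 5, 5]]
    print(pooled_class_variances(data, ["high", "high", "low"]))
```

With four replicates per gene, a single-gene estimate rests on 3 degrees of freedom, while a class of many genes pools 3 degrees of freedom per member, which is why the class-level estimate is more stable.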
Probe-level linear model fitting and mixture modeling results in high accuracy detection of differential gene expression
BACKGROUND: The identification of differentially expressed genes (DEGs) from Affymetrix GeneChip arrays is currently done by first computing expression levels from the low-level probe intensities, then deriving significance by comparing these expression levels between conditions. The proposed PL-LM (Probe-Level Linear Model) method fits a linear model to the probe-level data to estimate the treatment effect directly. A finite mixture of Gaussian components is then used to identify DEGs from the coefficients estimated by the linear model. This approach can readily be applied to experimental designs with or without replication. RESULTS: On a wholly defined dataset, the PL-LM method was able to identify 75% of the differentially expressed genes within 10% of false positives. This accuracy was achieved both using the three replicates per condition available in the dataset and using only one replicate per condition. CONCLUSION: The method achieves, on this dataset, a higher accuracy than the best set of tools identified by the authors of the dataset, and does so using only one replicate per condition
A comprehensive re-analysis of the Golden Spike data: Towards a benchmark for differential expression methods
BACKGROUND: The Golden Spike data set has been used to validate a number of methods for summarizing Affymetrix data sets, sometimes with seemingly contradictory results. Much less use has been made of this data set to evaluate differential expression methods. It has been suggested that this data set should not be used for method comparison due to a number of inherent flaws. RESULTS: We have used this data set in a comparison of methods which is far more extensive than any previous study. We outline six stages in the analysis pipeline where decisions need to be made, and show how the results of these decisions can lead to the apparently contradictory results previously found. We also show that, while flawed, this data set is still a useful tool for method comparison, particularly for identifying combinations of summarization and differential expression methods that are unlikely to perform well on real data sets. We describe a new benchmark, AffyDEComp, that can be used for such a comparison. CONCLUSION: We conclude with recommendations for preferred Affymetrix analysis tools, and for the development of future spike-in data sets
Role of Esrrg in the Fibrate-Mediated Regulation of Lipid Metabolism Genes in Human ApoA-I Transgenic Mice
We have used a new ApoA-I transgenic mouse model to identify, by global gene expression profiling, candidate genes that affect lipid and lipoprotein metabolism in response to fenofibrate treatment. Multilevel bioinformatic analysis and stringent selection criteria (2-fold change, 0% false discovery rate) identified 267 significantly changed genes involved in several molecular pathways. The fenofibrate-treated group did not have significantly altered levels of hepatic human APOA-I mRNA and plasma ApoA-I compared with the control group. However, the treatment increased cholesterol levels 1.95-fold, mainly due to the increase in high-density lipoprotein (HDL) cholesterol. The observed changes in HDL are associated with the upregulation of genes involved in phospholipid biosynthesis and lipid hydrolysis, as well as phospholipid transfer protein. Significant upregulation was observed in genes involved in fatty acid transport and β-oxidation, but not in those of fatty acid and cholesterol biosynthesis, the Krebs cycle and gluconeogenesis. Fenofibrate significantly changed the expression of seven transcription factors. The estrogen receptor-related gamma gene was upregulated 2.36-fold and had a significant positive correlation with genes of lipid and lipoprotein metabolism and mitochondrial functions, indicating an important role of this orphan receptor in mediating the fenofibrate-induced activation of a specific subset of its target genes. National Institutes of Health (HL48739 and HL68216); European Union (LSHM-CT-2006-0376331, LSHG-CT-2006-037277); the Biomedical Research Foundation of the Academy of Athens; the Hellenic Cardiological Society; the John F Kostopoulos Foundation
Semi-supervised discovery of differential genes
BACKGROUND: Various statistical scores have been proposed for evaluating the significance of genes that may exhibit differential expression between two or more controlled conditions. However, in many clinical studies, for example those aiming to detect clinical marker genes, the conditions have not necessarily been well controlled, and condition labels are sometimes hard to obtain due to physical, financial, and time costs. In such a situation, we can consider an unsupervised case where labels are not available, or a semi-supervised case where labels are available for only part of the sample set, rather than the well-studied supervised case where all samples have labels. RESULTS: We assume a latent variable model for the expression of active genes and apply the optimal discovery procedure (ODP) proposed by Storey (2005) to the model. Our latent variable model allows gene significance scores to be applied to unsupervised and semi-supervised cases. The ODP framework improves detectability by sharing the estimated parameters of the null and alternative models of multiple tests over multiple genes. A theoretical consideration leads to two different interpretations of the latent variable: either it only implicitly affects the alternative model through the model parameters, or it is explicitly included in the alternative model. These interpretations correspond to two different implementations of ODP. By comparing the two implementations through experiments with simulation data, we have found that sharing the latent variable estimation is effective for increasing the detectability of truly active genes. We also show that the unsupervised and semi-supervised rating of genes, which takes into account the samples without condition labels, can improve detection of active genes in real gene discovery problems. 
CONCLUSION: The experimental results indicate that the ODP framework is effective for hypotheses including latent variables and is further improved by sharing the estimations of hidden variables over multiple tests