5,362 research outputs found

    A novel approach to detect differentially expressed genes from count-based digital databases by normalizing with housekeeping genes

    Abstract: Sequence tag count-based gene expression analysis is a powerful way to identify candidate genes relevant to the cancerous phenotype. With the public availability of count-based data, computational approaches for detecting differentially expressed genes, which are mainly based on the Binomial or beta-Binomial distribution, have become practical and important in cancer biology. There remains a persistent need to select a proper statistical model for these methods. In this study, we developed a novel Bayesian algorithm-based method, Electronic Differential Gene Expression Screener (EDGES), in which a statistical model was determined by geometric averaging of 12 common housekeeping genes. EDGES identified a set of differentially expressed genes in lung, breast and colorectal cancers by using publicly available Serial Analysis of Gene Expression (SAGE) and Expressed Sequence Tag (EST) data. Gene expression microarray analysis and quantitative reverse transcription real-time PCR demonstrated the effectiveness of this procedure. We conclude that current normalization of calibrators provides a new insight into count-based digital subtraction in cancer research.

    GaGa: A parsimonious and flexible model for differential expression analysis

    Hierarchical models are a powerful tool for high-throughput data with a small to moderate number of replicates, as they allow sharing information across units of information, for example, genes. We propose two such models and show their increased sensitivity in microarray differential expression applications. We build on the gamma-gamma hierarchical model introduced by Kendziorski et al. [Statist. Med. 22 (2003) 3899-3914] and Newton et al. [Biostatistics 5 (2004) 155-176], by addressing important limitations that may have hampered its performance and its more widespread use. The models parsimoniously describe the expression of thousands of genes with a small number of hyper-parameters. This makes them easy to interpret and analytically tractable. The first model is a simple extension that improves the fit substantially with almost no increase in complexity. We propose a second extension that uses a mixture of gamma distributions to further improve the fit, at the expense of increased computational burden. We derive several approximations that significantly reduce the computational cost. We find that our models outperform the original formulation of the model, as well as some other popular methods for differential expression analysis. The improved performance is especially noticeable for the small sample sizes commonly encountered in high-throughput experiments. Our methods are implemented in the freely available Bioconductor gaga package. Comment: Published at http://dx.doi.org/10.1214/09-AOAS244 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    Differential gene expression graphs: A data structure for classification in DNA microarrays

    This paper proposes an innovative data structure to be used as a backbone in designing microarray phenotype sample classifiers. The data structure is based on graphs and is built from a differential analysis of the expression levels of healthy and diseased tissue samples in a microarray dataset. By construction, it has a number of properties that are well suited to addressing problems such as feature extraction, clustering, and classification.

    Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics.

    Background: Single-cell transcriptomics allows researchers to investigate complex communities of heterogeneous cells. It can be applied to stem cells and their descendants in order to chart the progression from multipotent progenitors to fully differentiated cells. While a variety of statistical and computational methods have been proposed for inferring cell lineages, the problem of accurately characterizing multiple branching lineages remains difficult to solve. Results: We introduce Slingshot, a novel method for inferring cell lineages and pseudotimes from single-cell gene expression data. In previously published datasets, Slingshot correctly identifies the biological signal for one to three branching trajectories. Additionally, our simulation study shows that Slingshot infers more accurate pseudotimes than other leading methods. Conclusions: Slingshot is a uniquely robust and flexible tool which combines the highly stable techniques necessary for noisy single-cell data with the ability to identify multiple trajectories. Accurate lineage inference is a critical step in the identification of dynamic temporal gene expression.