
    Laplace Approximated EM Microarray Analysis: An Empirical Bayes Approach for Comparative Microarray Experiments

    A two-groups mixed-effects model for the comparison of (normalized) microarray data from two treatment groups is considered. Most competing parametric methods that have appeared in the literature are obtained as special cases or by minor modification of the proposed model. Approximate maximum likelihood fitting is accomplished via a fast and scalable algorithm, which we call LEMMA (Laplace approximated EM Microarray Analysis). The posterior odds of treatment × gene interactions, derived from the model, involve shrinkage estimates of both the interactions and the gene-specific error variances. Genes are classified as being associated with treatment based on the posterior odds and the local false discovery rate (f.d.r.) with a fixed cutoff. Our model-based approach also allows one to declare the non-null status of a gene by controlling the false discovery rate (FDR). A detailed simulation study shows that the approach outperforms well-known competitors. We also apply the proposed methodology to two previously analyzed microarray examples. Extensions of the proposed method to paired treatments and multiple treatments are also discussed. Comment: Published at http://dx.doi.org/10.1214/10-STS339 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
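The two classification rules mentioned here (a fixed local f.d.r. cutoff and FDR control) can be illustrated with a generic two-groups calculation. This is a minimal sketch, not LEMMA: it assumes the null proportion `p0` and both densities are already known (a standard normal null and, as a hypothetical choice, a wider zero-mean normal alternative with standard deviation `alt_sd`), whereas LEMMA estimates its model by Laplace-approximated EM. The FDR rule shown selects the largest set of genes whose average local f.d.r. stays at or below `alpha`. All function names are hypothetical.

```python
import math

def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def local_fdr(z_scores, p0, alt_sd):
    """lfdr_i = p0 f0(z_i) / (p0 f0(z_i) + (1 - p0) f1(z_i)).

    Assumed model: f0 = N(0, 1) for null genes, f1 = N(0, alt_sd^2) for non-null genes.
    """
    out = []
    for z in z_scores:
        null_part = p0 * normal_pdf(z)
        alt_part = (1.0 - p0) * normal_pdf(z, 0.0, alt_sd)
        out.append(null_part / (null_part + alt_part))
    return out

def posterior_odds(lfdr):
    """Posterior odds of being non-null: (1 - lfdr) / lfdr."""
    return [(1.0 - v) / v if v > 0 else math.inf for v in lfdr]

def classify_by_cutoff(lfdr, cutoff=0.2):
    """Indices of genes whose local f.d.r. is at or below a fixed cutoff."""
    return [i for i, v in enumerate(lfdr) if v <= cutoff]

def classify_by_fdr(lfdr, alpha=0.05):
    """Largest set of genes, ranked by lfdr, whose mean lfdr (estimated FDR) is <= alpha."""
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])
    selected, running_sum = [], 0.0
    for k, i in enumerate(order, start=1):
        running_sum += lfdr[i]
        if running_sum / k > alpha:  # the running mean only grows, so stop here
            break
        selected.append(i)
    return sorted(selected)
```

Because genes are added in increasing order of local f.d.r., the running mean is non-decreasing, which is why the loop can stop at the first violation.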

    Strategy for multivariate identification of differentially expressed genes in microarray data

    Abstract. Microarray technology has become one of the most important tools for understanding gene expression in biological processes. Because microarrays contain measurements of the expression levels of thousands of genes across multiple conditions, identifying differentially expressed genes necessarily involves data mining or large-scale multiple testing procedures. To date, advances in this area have been either multivariate but descriptive, or inferential but univariate. In this work, we present a new multivariate inferential method for detecting differentially expressed genes in microarray data. It estimates the positive false discovery rate (pFDR) using artificial components close to the data's principal components, but with an exact interpretation in terms of differential gene expression. Our method works best under some very common assumptions and gives rise to a new understanding of differential gene expression in microarray data. We provide a methodology to analyse time-course microarray experiments and heuristic guidelines for assessing whether the required assumptions hold in a given data set. We illustrate our method on two publicly available microarray data sets. Master's thesis (Maestría).
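The component-based pFDR estimator of this work is not specified in enough detail to reproduce here. For orientation only, the sketch below computes the standard univariate point estimate of pFDR due to Storey from a vector of p-values; it is a swapped-in reference technique, not the method described above, and the tuning parameter `lam` (default 0.5) and the cap at 1 are assumptions. The function names are hypothetical.

```python
def storey_pi0(p_values, lam=0.5):
    """Estimated proportion of true nulls: #{p > lam} / ((1 - lam) * m), capped at 1."""
    m = len(p_values)
    w = sum(1 for p in p_values if p > lam)
    return min(1.0, w / ((1.0 - lam) * m))

def storey_pfdr(p_values, t, lam=0.5):
    """Point estimate of pFDR for the rejection region {p <= t}.

    pFDR(t) = pi0 * m * t / (max(R(t), 1) * (1 - (1 - t)^m)),
    where R(t) is the number of p-values <= t. The factor 1 - (1 - t)^m
    accounts for conditioning on at least one rejection (the "positive" in pFDR).
    The estimate is capped at 1 here.
    """
    m = len(p_values)
    pi0 = storey_pi0(p_values, lam)
    r = max(sum(1 for p in p_values if p <= t), 1)  # number of rejections, at least 1
    return min(1.0, pi0 * m * t / (r * (1.0 - (1.0 - t) ** m)))
```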

    Modeling and analysis of RNA-seq data: a review from a statistical perspective

    Background: Since their invention, next-generation RNA sequencing (RNA-seq) technologies have become a powerful tool for studying the presence and quantity of RNA molecules in biological samples and have revolutionized transcriptomic studies. The analysis of RNA-seq data at four different levels (samples, genes, transcripts, and exons) involves multiple statistical and computational questions, some of which remain challenging to date. Results: We review RNA-seq analysis tools at the sample, gene, transcript, and exon levels from a statistical perspective. We also highlight the biological and statistical questions of greatest practical relevance. Conclusion: The development of statistical and computational methods for analyzing RNA-seq data has made significant advances in the past decade. However, methods developed to answer the same biological question often rely on diverse statistical models and exhibit different performance under different scenarios. This review discusses and compares multiple commonly used statistical models regarding their assumptions, in the hope of helping users select appropriate methods as needed and assisting developers in future method development.
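One assumption on which gene-level count models commonly differ is how a gene's count variance relates to its mean: a Poisson model assumes Var = μ, whereas a negative binomial (NB) model adds an overdispersion term, Var = μ + φμ². The sketch below is illustrative and not taken from the review: it estimates φ for a single gene by the method of moments from replicate counts, assuming the libraries have equal sequencing depth so that no normalization is needed. Dedicated tools typically use more careful, often shrunken, dispersion estimates. The function names are hypothetical.

```python
from statistics import mean, variance

def poisson_variance(mu):
    """Variance implied by a Poisson model: equal to the mean."""
    return mu

def nb_variance(mu, phi):
    """Variance implied by a negative binomial model with dispersion phi."""
    return mu + phi * mu ** 2

def nb_moment_dispersion(counts):
    """Method-of-moments estimate of phi from s^2 = mu + phi * mu^2, floored at 0.

    Assumes replicate counts from libraries of equal depth (no normalization applied).
    """
    mu = mean(counts)
    if mu == 0:
        return 0.0  # an unexpressed gene carries no information about phi
    s2 = variance(counts)  # sample variance with an n - 1 denominator
    return max(0.0, (s2 - mu) / mu ** 2)
```

A floored estimate of 0 means the replicates look no more variable than a Poisson model predicts.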

    Relative Abundance of Transcripts (RATs): Identifying differential isoform abundance from RNA-seq [version 1; referees: 1 approved, 2 approved with reservations]

    The biological importance of changes in RNA expression is reflected by the wide variety of tools available to characterise these changes from RNA-seq data. Several tools exist for detecting differential transcript isoform usage (DTU) from aligned or assembled RNA-seq data, but few exist for DTU detection from alignment-free RNA-seq quantifications. We present RATs, an R package that identifies DTU transcriptome-wide directly from transcript abundance estimates. RATs is unique in applying bootstrapping to estimate the reliability of detected DTU events and shows good performance at all replication levels (median false positive fraction < 0.05). We compare RATs to two existing DTU tools, DRIM-Seq and SUPPA2, using two publicly available simulated RNA-seq datasets and a published human RNA-seq dataset in which 248 genes have previously been identified as displaying significant DTU. On the simulated human data, RATs with default threshold values has a sensitivity of 0.55, a Matthews correlation coefficient of 0.71 and a false discovery rate (FDR) of 0.04, outperforming both other tools. Applying the same thresholds to SUPPA2 results in higher sensitivity (0.61) but poorer FDR performance (0.33). RATs and DRIM-Seq use different methods for measuring DTU effect sizes, which complicates the comparison of results between these tools; however, at a likelihood-ratio threshold of 30, DRIM-Seq has an FDR (0.06) similar to that of RATs but worse sensitivity (0.47). These differences persist for the simulated Drosophila dataset. On the published human RNA-seq dataset, the greatest agreement between the tools tested is 53%, observed between RATs and SUPPA2. The bootstrapping quality filter in RATs is responsible for removing the majority of DTU events called by SUPPA2 that are not reported by RATs. All methods, including the previously published qRT-PCR of three of the 248 detected DTU events, were found to be sensitive to annotation differences between Ensembl v60 and v87.
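The central quantity here is an isoform's relative abundance (its share of its gene's total expression), and the bootstrap step asks how consistently a DTU call survives quantification uncertainty. The Python sketch below illustrates both ideas under simplifying assumptions. It is not the RATs R package or its API. It uses only a proportion-difference threshold (`min_dprop`, a hypothetical default of 0.2), omits any significance testing, and assumes paired bootstrap abundance vectors for the two conditions are supplied. All names are hypothetical.

```python
def isoform_proportions(abundances):
    """Relative abundance of each isoform within one gene (zeros if the gene is unexpressed)."""
    total = sum(abundances)
    if total == 0:
        return [0.0] * len(abundances)
    return [a / total for a in abundances]

def dtu_calls(abund_a, abund_b, min_dprop=0.2):
    """Flag isoforms whose within-gene proportion changes by at least min_dprop from A to B."""
    pa = isoform_proportions(abund_a)
    pb = isoform_proportions(abund_b)
    return [abs(b - a) >= min_dprop for a, b in zip(pa, pb)]

def bootstrap_reproducibility(boots_a, boots_b, min_dprop=0.2):
    """Fraction of paired bootstrap iterations in which each isoform is called as DTU."""
    n_iter = len(boots_a)
    hits = [0] * len(boots_a[0])
    for ab_a, ab_b in zip(boots_a, boots_b):
        for i, called in enumerate(dtu_calls(ab_a, ab_b, min_dprop)):
            hits[i] += called  # True counts as 1, False as 0
    return [h / n_iter for h in hits]
```

A reliability filter would then keep only calls whose reproducibility exceeds some chosen level; that level is left to the user here.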

    ARSyN: a method for the identification and removal of systematic noise in multifactorial time-course microarray experiments

    Transcriptomic profiling experiments that aim to identify responsive genes in specific biological conditions are commonly set up under defined experimental designs that try to assess the effects of factors and their interactions on gene expression. Data from these controlled experiments, however, may also contain sources of unwanted noise that can distort the signal under study, affect the residuals of applied statistical models, and hamper data analysis. Normalization methods are commonly applied to transcriptomics data to remove technical artifacts, but these are normally based on general assumptions about transcript distribution and largely ignore both the characteristics of the experiment under consideration and the coordinated nature of gene expression. In this paper, we propose a novel methodology, ARSyN, for the preprocessing of microarray data that takes these last two aspects into account. By combining analysis of variance (ANOVA) modeling of gene expression values and multivariate analysis of estimated effects, the method identifies the nonstructured part of the signal associated with the experimental factors (the noise within the signal) and the structured variation of the ANOVA errors (the signal of the noise). By removing these noise fractions from the original data, we create a filtered data set that is rich in the information of interest and includes only the random noise required for inferential analysis. In this work, we focus on multifactorial time-course microarray (MTCM) experiments with two factors: one quantitative, such as time or dosage, and the other qualitative, such as tissue, strain, or treatment. However, the method can be used in other situations, such as experiments with only one factor or more complex designs with more than two factors. The filtered data obtained after applying ARSyN can be further analyzed with the appropriate statistical technique to obtain the required biological information.
To evaluate the performance of the filtering strategy, we applied different statistical approaches for MTCM analysis to several real and simulated data sets, also studying the efficiency of these techniques. By comparing the results obtained with the original and ARSyN-filtered data, and also with other filtering techniques, we conclude that the proposed method increases the statistical power to detect biological signals, especially in cases with high levels of structural noise. Software for ARSyN is freely available at http://www.ua.es/personal/mj.nueda. Funding: Spanish MICINN Project (BIO2008-04368-E and DPI2008-06880-C03-03/DPI). Nueda, MJ.; Ferrer Riquelme, AJ.; Conesa, A. (2011). ARSyN: a method for the identification and removal of systematic noise in multifactorial time-course microarray experiments. Biostatistics. 13(3):553-566. doi:10.1093/biostatistics/kxr042
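The two noise fractions described in the ARSyN abstract can be sketched for the simplest case it mentions, a design with a single factor. The Python/NumPy code below is a loose illustration, not the published ARSyN software. It fits a one-way ANOVA decomposition (overall mean + factor effect + residuals) on a samples × genes matrix. It then keeps only the leading principal components of the effect matrix that explain a fraction `var_keep` of its variance, discarding the noise within the signal, and subtracts from the residuals the components whose variance exceeds `beta` times the average component variance (the signal of the noise). These selection rules, their defaults, and all function names are assumptions made for illustration.

```python
import numpy as np

def low_rank(M, k):
    """Rank-k reconstruction of M from its leading singular vectors."""
    if k <= 0:
        return np.zeros_like(M)
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def arsyn_like_filter(X, groups, var_keep=0.75, beta=2.0):
    """One-factor ANOVA + PCA noise filter (illustrative sketch, not ARSyN itself).

    X: samples x genes matrix; groups: factor level of each sample (row).
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    grand_mean = X.mean(axis=0)
    Xc = X - grand_mean

    # ANOVA decomposition: each row of `effect` is its group's mean profile.
    effect = np.zeros_like(Xc)
    for g in np.unique(groups):
        rows = groups == g
        effect[rows] = Xc[rows].mean(axis=0)
    resid = Xc - effect

    # Noise within the signal: keep the fewest effect components reaching var_keep.
    s_eff = np.linalg.svd(effect, compute_uv=False)
    total = np.sum(s_eff ** 2)
    if total > 0:
        frac = np.cumsum(s_eff ** 2) / total
        k_eff = min(int(np.searchsorted(frac, var_keep)) + 1, len(s_eff))
    else:
        k_eff = 0
    effect_kept = low_rank(effect, k_eff)

    # Signal of the noise: remove residual components with above-threshold variance.
    var_res = np.linalg.svd(resid, compute_uv=False) ** 2
    k_res = int(np.sum(var_res > beta * var_res.mean()))
    resid_kept = resid - low_rank(resid, k_res)

    return grand_mean + effect_kept + resid_kept
```

With `var_keep=1.0` and a very large `beta`, nothing is removed and the input is returned unchanged, which is a convenient sanity check.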

    Statistical methods for transcriptomics: From microarrays to RNA-seq

    Transcriptomics studies gene expression levels under different experimental conditions in order to identify the genes associated with a given phenotype, as well as the regulatory relationships between genes. Omics data are characterized by containing information on thousands of variables in a sample with few observations. The most common high-throughput technologies for measuring the expression levels of thousands of genes simultaneously are microarrays and, more recently, RNA sequencing (RNA-seq). This thesis addresses the evaluation, adaptation, and development of statistical models for the analysis of gene expression data, whether estimated with microarrays or with RNA-seq. The study uses both univariate and multivariate tools and methods. Tarazona Campos, S. (2014). Statistical methods for transcriptomics: From microarrays to RNA-seq [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/48485. Premios Extraordinarios de tesis doctorales (Extraordinary Doctoral Thesis Awards).