15 research outputs found

    MultiBaC: an R package to remove batch effects in multi-omic experiments

    Motivation: Batch effects in omics datasets are usually a source of technical noise that masks the biological signal and hampers data analysis. Batch effect removal has been widely addressed for individual omics technologies. However, multi-omic datasets may combine data obtained in different batches where omics type and batch are often confounded. Moreover, systematic biases may be introduced without notice during data acquisition, which creates a hidden batch effect. Current methods fail to address batch effect correction in these cases. Results: In this article, we introduce the MultiBaC R package, a tool for batch effect removal in multi-omics and hidden batch effect scenarios. The package includes a variety of graphical outputs for model validation and assessment of the batch effect correction. Availability and implementation: The MultiBaC package is available on Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/MultiBaC.html) and GitHub (https://github.com/ConesaLab/MultiBaC.git). The data underlying this article are available in the Gene Expression Omnibus repository (accession numbers GSE11521, GSE1002, GSE56622 and GSE43747).
    Funding: This work was funded by the Generalitat Valenciana through the PROMETEO grants program for excellence research groups [PROMETEO 2016/093] and by the Spanish MICINN [PID2020-119537RB-I00]. Funding for open access charge: Universitat Politècnica de València.

    MIA and NIR Chemical Imaging for pharmaceutical product characterization

    [EN] This paper presents a three-step methodology based on chemically oriented models (MCR and CLS) for extracting the chemical distribution maps (CDMs) from hyperspectral images, then performing multivariate image analysis (MIA) on the CDMs, and finally extracting 'channel' and textural features from the score images related to quality characteristics. These features show complementary properties to those directly obtained from the CDMs, since they take advantage of their internal correlation structure. The approach has been successfully applied to the evaluation of homogeneity and cluster presence of API in a novel formulation developed to improve the dissolution of poorly soluble drugs. © 2012 Elsevier B.V. All rights reserved.
    Funding: Research in this study was partially supported by the Spanish Ministry of Science and Innovation and FEDER funds from the European Union through grant DPI2011-28112-C04-02, and also by the NSF Engineering Research Center for Structured Organic Particulate Systems (ERC-SOPS, EEC-0540855) and the NSF Major Research Instrumentation program, grant 0821113.
    Citation: Prats-Montalbán, JM.; Jerez-Rozo, J.; Romanach, R.; Ferrer Riquelme, AJ. (2012). MIA and NIR Chemical Imaging for pharmaceutical product characterization. Chemometrics and Intelligent Laboratory Systems. 117(117):240-249. https://doi.org/10.1016/j.chemolab.2012.04.002

    MultiBaC: A strategy to remove batch effects between different omic data types

    [EN] The diversity of omic technologies has expanded in recent years, together with the number of omic data integration strategies. However, multiomic data generation is costly, and many research groups cannot afford research projects where many different omic techniques are generated, at least at the same time. As most researchers share their data in public repositories, different omic datasets of the same biological system obtained at different labs can be combined to construct a multiomic study. However, data obtained at different labs or moments in time are typically subject to batch effects that need to be removed for successful data integration. While there are methods to correct batch effects on the same data types obtained in different studies, they cannot be applied to correct lab or batch effects across omics. This impairs multiomic meta-analysis. Fortunately, in many cases, at least one omics platform, i.e. gene expression, is repeatedly measured across labs, together with the additional omic modalities that are specific to each study. This creates an opportunity for batch analysis. We have developed MultiBaC (Multiomics Batch-effect Correction), a strategy to correct batch effects from multiomic datasets distributed across different labs or data acquisition events. Our strategy is based on the existence of at least one shared data type, which allows data prediction across omics. We validate this approach both on simulated data and on a case where the multiomic design is fully shared by two labs, so that batch effect correction within the same omic modality using traditional methods can be compared with the MultiBaC correction across data types. Finally, we apply MultiBaC to a true multiomic data integration problem to show that we are able to improve the detection of meaningful biological effects.
    Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is part of a research project that is totally funded by Conselleria d'Educacio, Cultura i Esport (Generalitat Valenciana) through the PROMETEO grants program for excellence research groups.
    Citation: Ugidos, M.; Tarazona Campos, S.; Prats-Montalbán, JM.; Ferrer, A.; Conesa, A. (2020). MultiBaC: A strategy to remove batch effects between different omic data types. Statistical Methods in Medical Research. 29(10):2851-2864. https://doi.org/10.1177/0962280220907365

    Measurement of the colour index of early-season citrus fruits using computer vision

    The appearance and colour of a food's surface are among the first quality parameters evaluated by consumers and are a key factor in their acceptance of a product. In the Mediterranean area, early-season citrus fruit reach acceptable internal maturity standards for marketing while the skin of the fruit is still green. A degreening treatment is widely used as a postharvest practice to improve the external colour, and this treatment depends on the initial colour of the fruit. Accurate assessment of the colour of these citrus fruits at harvest is therefore an application where colour inspection is needed. The classification that determines whether the fruit needs to be treated for degreening is based on the citrus colour index (CCI). In this work, the potential of an industrial computer vision system for in-line CCI assessment is studied and compared with two other devices: a characterized computer vision system and a spectrocolorimeter used as the reference in the analysis of colour in food. The results show that the industrial computer vision system predicts the colour index of citrus with good reliability (R² = 0.975) and is effective for classifying the fruit according to its colour.
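    The abstract names the citrus colour index without stating how it is computed. As a minimal sketch, assuming the commonly used definition CCI = 1000·a/(L·b) on Hunter Lab coordinates (the function name and the sample colour values are illustrative, not taken from the paper):

```python
def citrus_colour_index(L, a, b):
    """Citrus colour index from Hunter Lab coordinates.

    Assumes the commonly used definition CCI = 1000 * a / (L * b);
    the abstract itself does not spell the formula out.
    """
    if L * b == 0:
        raise ValueError("L and b must be non-zero")
    return 1000.0 * a / (L * b)


# Illustrative (made-up) readings: a negative `a` (green) gives a
# negative index, a positive `a` (orange-red) gives a positive one.
green_fruit = citrus_colour_index(60.0, -15.0, 30.0)
orange_fruit = citrus_colour_index(50.0, 10.0, 40.0)  # 1000*10/(50*40) = 5.0
```

    Because the index changes sign with `a`, green and coloured fruit fall on opposite sides of zero; any threshold used to decide on degreening would have to come from the paper or from local practice.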

    Estimación de los componentes del racimo mediante análisis de imagen [Estimation of grape cluster components by image analysis]

    Berry weight, together with the number of berries and the cluster weight, are fundamental parameters for yield estimation in the wine and table grape industries. At present, the methods used to estimate and predict vineyard yield are destructive, slow and labour-intensive. This work presents a new methodology, based on image analysis, to determine the cluster components quickly and inexpensively. Clusters of seven different grape varieties (Vitis vinifera L.) were photographed under laboratory conditions, and the cluster components were determined manually after image acquisition. Image processing included the development of two algorithms, based on the Canny and LIP (Logarithmic Image Processing) strategies, to find the berry contours as a step prior to detecting the berries with the Hough Transform. The performance of the developed algorithms was also compared using either a single image per cluster or four images per cluster taken from different orientations. The best results (R² between 69% and 95% for the number of berries per cluster, and R² between 65% and 97% for the estimation of cluster weight) were obtained using four images per cluster and applying the Canny algorithm. In addition, the capability of the image-analysis-based model to predict berry weight was 84%. The novel methodology developed and presented in this work has allowed the cluster components to be estimated quickly and inexpensively compared with current manual methods.

    Un nuevo método para la evaluación de la compacidad del racimo mediante análisis de imagen [A new method for assessing grape cluster compactness by image analysis]

    Cluster compactness is a key trait that can strongly influence grape and wine quality. The OIV descriptor, the most widely used method for assessing cluster compactness, requires visual inspection by trained evaluators and yields subjective, qualitative values. This work presents a new methodology based on image analysis to determine cluster compactness in a non-invasive, objective and quantitative way. The PLS model built from a set of morphological features extracted automatically by image analysis techniques showed a prediction capability of 85.3% in the assessment of compactness with respect to the visual evaluation.

    Near infrared hyperspectral imaging for forensic analysis of document forgery

    [EN] Hyperspectral images in the near infrared range (HSI-NIR) were evaluated as a nondestructive method to detect fraud in documents. Three different types of typical forgeries were simulated by (a) obliterating text, (b) adding text and (c) approaching the crossing lines problem. The simulated samples were imaged in the range of 928–2524 nm with spectral and spatial resolutions of 6.3 nm and 10 mm, respectively. After data pre-processing, different chemometric techniques were evaluated for each type of forgery. Principal component analysis (PCA) was performed to elucidate the first two types of adulteration, (a) and (b). Moreover, Multivariate Curve Resolution Alternating Least Squares (MCR-ALS) was used in an attempt to improve the results for the type (a) obliteration and type (b) added-text problems. Finally, MCR-ALS and Partial Least Squares Discriminant Analysis (PLS-DA), employed as a variable selection tool, were used to study the type (c) forgeries, i.e. the crossing lines problem. Type (a) forgeries (obliterating text) were successfully identified in 43% of the samples using both chemometric methods (PCA and MCR-ALS). Type (b) forgeries (adding text) were successfully identified in 82% of the samples using both methods (PCA and MCR-ALS). Finally, type (c) forgeries (crossing lines) were successfully identified in 85% of the samples. The results demonstrate the potential of HSI-NIR associated with chemometric tools to support document forgery identification.
    Funding: INCTAA (Processes no.: CNPq 573894/2008-6; FAPESP 2008/57808-1), NUQAAPE, FACEPE, CNPq, CAPES, Spanish Ministry of Science and Innovation MICINN (grant DPI2011-28112-C04-02).
    Citation: Silva, CS.; Pimentel, MF.; Honorato, RS.; Pasquini, C.; Prats Montalbán, JM.; Ferrer Riquelme, AJ. (2014). Near infrared hyperspectral imaging for forensic analysis of document forgery. Analyst. 139(20):5176-5184. https://doi.org/10.1039/C4AN00961D
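    PCA on a hyperspectral image is usually run on the unfolded cube, with each pixel treated as one observation and each wavelength band as one variable, and the scores are then refolded into an image. The sketch below illustrates only that idea on a hypothetical 2 x 2 pixel, 3-band toy cube; it uses plain power iteration in place of a full PCA routine, omits the pre-processing mentioned in the abstract, and is not the authors' pipeline.

```python
import math


def unfold(cube):
    """Flatten a rows x cols x bands cube into a (rows*cols) x bands matrix."""
    return [list(pixel) for row in cube for pixel in row]


def mean_center(X):
    """Subtract each band's mean over all pixels."""
    means = [sum(col) / len(X) for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def first_component(Xc, iterations=100):
    """Leading loading vector and pixel scores via power iteration on X^T X."""
    bands = len(Xc[0])
    w = [1.0 + j for j in range(bands)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        scores = [sum(x * wj for x, wj in zip(row, w)) for row in Xc]
        w = [sum(s * row[j] for s, row in zip(scores, Xc)) for j in range(bands)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0.0:
            raise ValueError("no variance along the start vector")
        w = [v / norm for v in w]
    scores = [sum(x * wj for x, wj in zip(row, w)) for row in Xc]
    return w, scores


# Hypothetical toy image: the top row of pixels and the bottom row
# differ only in the second band (e.g. two inks with different spectra).
cube = [
    [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]],
    [[1.0, 3.0, 1.0], [1.0, 3.0, 1.0]],
]
loading, scores = first_component(mean_center(unfold(cube)))

# Refold the scores into an image with the original pixel layout.
cols = len(cube[0])
score_image = [scores[i:i + cols] for i in range(0, len(scores), cols)]
```

    In this toy case the loading concentrates on the second band and the score image separates the two rows by sign, which is the kind of contrast a score image is inspected for when looking for added or obliterated text.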