679 research outputs found

    MultiBaC: A strategy to remove batch effects between different omic data types

    The diversity of omic technologies has expanded in recent years, together with the number of omic data integration strategies. However, multiomic data generation is costly, and many research groups cannot afford projects in which many different omic data types are generated, at least at the same time. As most researchers share their data in public repositories, different omic datasets of the same biological system obtained at different labs can be combined to construct a multiomic study. However, data obtained at different labs or at different times are typically subject to batch effects that need to be removed for successful data integration. While there are methods to correct batch effects on the same data types obtained in different studies, they cannot be applied to correct lab or batch effects across omics. This impairs multiomic meta-analysis. Fortunately, in many cases, at least one omics platform (i.e. gene expression) is repeatedly measured across labs, together with the additional omic modalities that are specific to each study. This creates an opportunity for batch analysis. We have developed MultiBaC (Multiomics Batch-effect Correction), a strategy to correct batch effects from multiomic datasets distributed across different labs or data acquisition events. Our strategy is based on the existence of at least one shared data type, which allows data prediction across omics. We validate this approach both on simulated data and on a case where the multiomic design is fully shared by two labs, so that batch effect correction within the same omic modality using traditional methods can be compared with the MultiBaC correction across data types.
Finally, we apply MultiBaC to a true multiomic data integration problem to show that we are able to improve the detection of meaningful biological effects. The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is part of a research project that is totally funded by Conselleria d'Educacio, Cultura i Esport (Generalitat Valenciana) through PROMETEO grants program for excellence research groups. Ugidos, M.; Tarazona Campos, S.; Prats-Montalbán, JM.; Ferrer, A.; Conesa, A. (2020). MultiBaC: A strategy to remove batch effects between different omic data types. Statistical Methods in Medical Research. 29(10):2851-2864. https://doi.org/10.1177/0962280220907365

    ME3CA - Monitoring environment exercise and emotion by a cognitive assistant

    The elderly population has increased dramatically in today’s society. This fact implies the need to propose new policies of attention to this group but without increasing social spending. Currently, there is a need to promote the care of elderly people in their own homes, avoiding being transferred to saturated residences. Bearing this in mind, in recent years numerous approaches have tried to offer solutions in this sense using the continuous advances in new information and communication technologies. In this way, this article proposes the employment of a personal assistant to help the elderly in the development of their daily life activities. The proposed system, called ME3CA, is a cognitive assistant that involves users in rehabilitating exercise, consisting of a sensorization platform and different integrated decision-making mechanisms. The system tries to plan and recommend activities to older people trying to improve their physical activity. In addition, in the decision making process the assistant takes into account the emotions of the user. In this way, the system is more personalized and emotionally intelligent.

    Extensive Copy-Number Variation of Young Genes across Stickleback Populations

    MM received funding from the Max Planck innovation funds for this project. PGDF was supported by a Marie Curie European Reintegration Grant (proposal no. 270891). CE was supported by German Science Foundation grants (DFG, EI 841/4-1 and EI 841/6-1). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

    Initial Genomics of the Human Nucleolus

    We report for the first time the genomics of a nuclear compartment of the eukaryotic cell. 454 sequencing and microarray analysis revealed the pattern of nucleolus-associated chromatin domains (NADs) in the linear human genome and identified different gene families and certain satellite repeats as the major building blocks of NADs, which constitute about 4% of the genome. Bioinformatic evaluation showed that NAD–localized genes take part in specific biological processes, like the response to other organisms, odor perception, and tissue development. 3D FISH and immunofluorescence experiments illustrated the spatial distribution of NAD–specific chromatin within interphase nuclei and its alteration upon transcriptional changes. Altogether, our findings describe the nature of DNA sequences associated with the human nucleolus and provide insights into the function of the nucleolus in genome organization and establishment of nuclear architecture

    Quantitative sequence-function relationships in proteins based on gene ontology

    Background: The relationship between divergence of amino-acid sequence and divergence of function among homologous proteins is complex. The assumption that homologs share function – the basis of transfer of annotations in databases – must therefore be regarded with caution. Here, we present a quantitative study of sequence and function divergence, based on the Gene Ontology classification of function. We determined the relationship between sequence divergence and function divergence in 6828 protein families from the PFAM database. Within families there is a broad range of sequence similarity from very closely related proteins – for instance, orthologs in different mammals – to very distantly-related proteins at the limit of reliable recognition of homology. Results: We correlated the divergence in sequences determined from pairwise alignments, and the divergence in function determined by path lengths in the Gene Ontology graph, taking into account the fact that many proteins have multiple functions. Our results show that, among homologous proteins, the proportion of divergent functions decreases dramatically above a threshold of sequence similarity at about 50% residue identity. For proteins with more than 50% residue identity, transfer of annotation between homologs will lead to an erroneous attribution with a totally dissimilar function in fewer than 6% of cases. This means that for very similar proteins (about 50% identical residues) the chance of completely incorrect annotation is low; however, because of the phenomenon of recruitment, it is still non-zero. Conclusion: Our results describe general features of the evolution of protein function, and serve as a guide to the reliability of annotation transfer, based on the closeness of the relationship between a new protein and its nearest annotated relative.

    The genomes of two key bumblebee species with primitive eusocial organization

    Background: The shift from solitary to social behavior is one of the major evolutionary transitions. Primitively eusocial bumblebees are uniquely placed to illuminate the evolution of highly eusocial insect societies. Bumblebees are also invaluable natural and agricultural pollinators, and there is widespread concern over recent population declines in some species. High-quality genomic data will inform key aspects of bumblebee biology, including susceptibility to implicated population viability threats. Results: We report the high quality draft genome sequences of Bombus terrestris and Bombus impatiens, two ecologically dominant bumblebees and widely utilized study species. Comparing these new genomes to those of the highly eusocial honeybee Apis mellifera and other Hymenoptera, we identify deeply conserved similarities, as well as novelties key to the biology of these organisms. Some honeybee genome features thought to underpin advanced eusociality are also present in bumblebees, indicating an earlier evolution in the bee lineage. Xenobiotic detoxification and immune genes are similarly depauperate in bumblebees and honeybees, and multiple categories of genes linked to social organization, including development and behavior, show high conservation. Key differences identified include a bias in bumblebee chemoreception towards gustation from olfaction, and striking differences in microRNAs, potentially responsible for gene regulation underlying social and other traits. Conclusions: These two bumblebee genomes provide a foundation for post-genomic research on these key pollinators and insect societies. Overall, gene repertoires suggest that the route to advanced eusociality in bees was mediated by many small changes in many genes and processes, and not by notable expansion or depauperation

    Phlebotomine sand fly survey in the focus of leishmaniasis in Madrid, Spain (2012-2014): seasonal dynamics, Leishmania infantum infection rates and blood meal preferences

    BACKGROUND: An unusual increase of human leishmaniasis cases due to Leishmania infantum has been occurring in an urban area of southwestern Madrid, Spain, since 2010. Entomological surveys have shown that Phlebotomus perniciosus is the only potential vector. Direct xenodiagnosis in hares (Lepus granatensis) and rabbits (Oryctolagus cuniculus) collected in the focus area proved that they can transmit parasites to colonized P. perniciosus. Isolates were characterized as L. infantum. The aim of the present work was to conduct a comprehensive study of sand flies in the outbreak area, with special emphasis on P. perniciosus. METHODS: Entomological surveys were done from June to October 2012-2014 in 4 stations located close to the affected area. Twenty sticky traps (ST) and two CDC light traps (LT) were placed monthly during two consecutive days in every station. LT were replaced every morning. Sand fly infection rates were determined by dissecting females collected with LT. Molecular procedures applied to study blood meal preferences and to detect L. infantum were performed for a better understanding of the epidemiology of the outbreak. RESULTS: A total of 45,127 specimens belonging to 4 sand fly species were collected: P. perniciosus (75.34%), Sergentomyia minuta (24.65%), Phlebotomus sergenti (0.005%) and Phlebotomus papatasi (0.005%). No Phlebotomus ariasi were captured. Of 3,203 P. perniciosus females dissected, 117 were infected with flagellates (3.7%). Furthermore, 13.31% and 7.78% of blood-fed and unfed female sand flies, respectively, were found infected with L. infantum by PCR. The highest rates of infected P. perniciosus were detected at the end of the transmission periods. Regarding blood meal preferences, hares and rabbits were preferred, although human, cat and dog blood were also found. CONCLUSIONS: This entomological study highlights the exceptional nature of the Leishmania outbreak occurring in southwestern Madrid, Spain. It is confirmed that P. perniciosus is the only vector in the affected area, with high densities and infection rates. Rabbits and hares were the main blood meal sources of this species. These results reinforce the need for an extensive and permanent surveillance in this region, and others of similar characteristics, in order to control the vector and regulate the populations of wild reservoirs. This study was partially sponsored and funded by: Dirección General de Salud Pública, Consejería de Sanidad, Comunidad de Madrid; Colegio de Veterinarios de Madrid; Colegio de Biólogos de Madrid and EU grant FP7-261504 EDENext (http://www.edenext.eu).

    Transcriptome Sequencing and De Novo Analysis for Yesso Scallop (Patinopecten yessoensis) Using 454 GS FLX

    BACKGROUND: Bivalves comprise 30,000 extant species, constituting the second largest group of mollusks. However, limited genetic research has focused on this group of animals so far, which is, in part, due to the lack of genomic resources. The advent of high-throughput sequencing technologies enables generation of genomic resources in a short time and at a minimal cost, and therefore provides a turning point for bivalve research. In the present study, we performed de novo transcriptome sequencing to first produce a comprehensive expressed sequence tag (EST) dataset for the Yesso scallop (Patinopecten yessoensis). RESULTS: In a single 454 sequencing run, 805,330 reads were produced and then assembled into 32,590 contigs, with about six-fold sequencing coverage. A total of 25,237 unique protein-coding genes were identified from a variety of developmental stages and adult tissues based on sequence similarities with known proteins. As determined by GO annotation and KEGG pathway mapping, functional annotation of the unigenes recovered diverse biological functions and processes. Transcripts putatively involved in growth, reproduction and stress/immune-response were identified. More than 49,000 single nucleotide polymorphisms (SNPs) and 2,700 simple sequence repeats (SSRs) were also detected. CONCLUSION: Our data provide the most comprehensive transcriptomic resource currently available for P. yessoensis. Candidate genes potentially involved in growth, reproduction, and stress/immunity-response were identified, and are worthy of further investigation. A large number of SNPs and SSRs were also identified and ready for marker development. This resource should lay an important foundation for future genetic or genomic studies on this species