42 research outputs found

    Evaluating the accuracy of AIM panels at quantifying genome ancestry

    Background: There is a growing interest among geneticists in developing panels of Ancestry Informative Markers (AIMs) aimed at measuring the biogeographical ancestry of individual genomes. The efficiency of these panels is commonly tested empirically by contrasting self-reported ancestry with the ancestry estimated from these panels. Results: Using SNP data from HapMap, we carried out a simulation-based study aimed at measuring the effect of SNP coverage on the estimation of genome ancestry. For three of the main continental groups (Africans, East Asians, Europeans), ancestry was first estimated using the whole HapMap SNP database as a proxy for global genome ancestry; these estimates were subsequently compared to those obtained from pre-designed AIM panels. Panels that consider >400 AIMs capture genome ancestry reasonably well, while those containing a few dozen AIMs show a large variability in ancestry estimates. Curiously, 500-1,000 SNPs selected at random from the genome provide an unbiased estimate of genome ancestry and perform as well as any AIM panel of similar size. In simulated scenarios of population admixture, panels containing few AIMs also show important deficiencies in measuring genome ancestry. Conclusions: The results indicate that the ability to estimate genome ancestry is strongly dependent on the number of AIMs used, and not primarily on their individual informativeness. Caution should be taken when making individual (medical, forensic, or anthropological) inferences based on AIMs. The research leading to these results has received funding from the “Ministerio de Ciencia e Innovación” (SAF2008-02971) and from the Plan Galego IDT, Xunta de Galicia (EM 2012/045) (A.S.) and Consellería de Sanidade/Xunta de Galicia (RHI07/2-intensificación actividad investigadora and 10PXIB918184PR), Instituto Carlos III (Intensificación de la actividad investigadora) and Fondo de Investigación Sanitaria (FIS; PI070069 and PI1000540) del Plan Nacional de I + D + I and ‘fondos FEDER’ (F.M.T.), and the grant from the Sistema Universitario Gallego-Modalidad REDES (2012-PG226) of the Consellería de Cultura, Educación e Ordenación Universitaria of the Xunta de Galicia (A.S., F.M.T.).

    The saga of the many studies wrongly associating mitochondrial DNA with breast cancer

    Background: A large body of genetic research has focused on the potential role that mitochondrial DNA (mtDNA) variants might play in the predisposition to common and complex (multi-factorial) diseases. It has been argued, however, that many of these studies could be inconclusive due to artifacts related to genotyping errors or inadequate design. Methods: Analyses of the data published in case–control breast cancer association studies have been performed using a phylogenetic-based approach. Variation observed in these studies has been interpreted in the light of data available on public resources, which now include more than 27,000 complete mitochondrial sequences and the worldwide phylogeny determined by these mitogenomes. Complementary analyses were carried out using public datasets of partial mtDNA sequences, mainly corresponding to control-region segments. Results: By way of example, we show here another kind of fallacy in these medical studies, namely, the phenomenon of SNP-SNP interaction wrongly applied to haploid data in a breast cancer study. We also reassessed the mutually conflicting studies suggesting some functional role of the non-synonymous polymorphism m.10398A>G (ND3 subunit of mitochondrial complex I) in breast cancer. In some studies, control groups were employed that showed an extremely odd haplogroup frequency spectrum compared to comparable information from much larger databases. Moreover, the use of inappropriate statistics signaled spurious “significance” in several instances. Conclusions: Every case–control study should come under scrutiny in regard to the plausibility of the control-group data presented and appropriateness of the statistical methods employed; and this is best done before potential publication. AS has been supported by grants from the “Ministerio de Ciencia e Innovación” (SAF2011-26983) and from the Plan Galego IDT, Xunta de Galicia (EM 2012/045).

    A 2-transcript host cell signature distinguishes viral from bacterial diarrhea and it is influenced by the severity of symptoms

    Recently, a biomarker signature consisting of two host RNA transcripts was proposed for discriminating bacterial from viral infections in febrile children. We evaluated the performance of this signature in a different disease scenario, namely a cohort of Mexican children (n = 174) suffering from acute diarrhea of different infectious etiologies. We first examined the admixed background of the patients, indicating that most of them have a predominantly Native American genetic ancestry with a variable amount of European background (ranging from 0% to 57%). The results confirm that the RNA test can discriminate between viral and bacterial causes of infection (t-test; P-value = 6.94 × 10^−11; AUC = 80%; sensitivity: 68% [95% CI: 55%–79%]; specificity: 84% [95% CI: 78%–90%]), but the strength of the signal differs substantially depending on the causal pathogen, with the strongest signal being that of Shigella (P-value = 3.14 × 10^−12; AUC = 89%; sensitivity: 70% [95% CI: 57%–83%]; specificity: 100% [95% CI: 100%–100%]). The accuracy of this test improves significantly when excluding mild cases (P-value = 2.13 × 10^−6; AUC = 85%; sensitivity: 79% [95% CI: 58%–95%]; specificity: 78% [95% CI: 65%–88%]). The results broaden the scope of previous studies by incorporating different pathogens, variable levels of disease severity, and different ancestral backgrounds of patients, and add confirmatory support to the clinical utility of these 2-transcript biomarkers.
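
The sensitivity and specificity values quoted in this abstract are derived from a 2×2 confusion table. The following is a minimal sketch of those two definitions only. The function name and the counts are hypothetical and are not taken from the study, and treating "bacterial" as the positive class is an assumption made here for illustration.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-table counts.

    tp/fn: actual positives classified correctly/incorrectly.
    tn/fp: actual negatives classified correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)  # fraction of actual positives detected
    specificity = tn / (tn + fp)  # fraction of actual negatives detected
    return sensitivity, specificity


# Hypothetical counts for illustration only (not data from the study).
sens, spec = sensitivity_specificity(tp=30, fn=10, tn=45, fp=5)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

The confidence intervals and AUC reported in the abstract need further steps (for example an interval method for proportions and a ranking of the continuous test score) that are not shown here.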

    Rotavirus and autoimmunity

    Rotavirus, a major etiological agent of acute diarrhea in children worldwide, has historically been linked to autoimmunity. In the last few years, several physiopathological approaches have been proposed to explain the leading mechanism triggering autoimmunity, from the old concept of molecular mimicry to the emerging theory of bystander activation and break of tolerance. Epidemiological and immunological data indicate a strong link between rotavirus infection and two of the autoimmune pathologies with the highest incidence: celiac disease and diabetes. The role of current oral rotavirus vaccines is now being elucidated, with a so far positive protective association demonstrated.

    SPSmart: adapting population based SNP genotype databases for fast and comprehensive web access

    Background: In the last five years large online resources of human variability have appeared, notably HapMap, Perlegen and the CEPH foundation. These databases of genotypes with population information act as catalogues of human diversity, and are widely used as reference sources for population genetics studies. Although many useful conclusions may be extracted by querying databases individually, the lack of flexibility for combining data from within and between each database does not allow the calculation of key population variability statistics. Results: We have developed a novel tool for accessing and combining large-scale genomic databases of single nucleotide polymorphisms (SNPs) in widespread use in human population genetics: SPSmart (SNPs for Population Studies). A fast pipeline creates and maintains a data mart from the most commonly accessed databases of genotypes containing population information: data is mined, summarized into the standard statistical reference indices, and stored into a relational database that currently handles as many as 4 × 10^9 genotypes and that can be easily extended to new database initiatives. We have also built a web interface to the data mart that allows the browsing of underlying data indexed by population and the combining of populations, allowing intuitive and straightforward comparison of population groups. All the information served is optimized for web display, and most of the computations are already pre-processed in the data mart to speed up the data browsing and any computational treatment requested. Conclusion: In practice, SPSmart allows populations to be combined into user-defined groups, while multiple databases can be accessed and compared in a few simple steps from a single query. It performs the queries rapidly and gives straightforward graphical summaries of SNP population variability through visual inspection of allele frequencies outlined in standard pie-chart format. In addition, full numerical description of the data is output in statistical results panels that include common population genetics metrics such as heterozygosity, Fst and In. The grants from the Xunta de Galicia (PGIDIT06PXIB208079PR) and Fundación de Investigación Médica Mutua Madrileña awarded to AS partially supported this project.
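
Two of the metrics named in this abstract, heterozygosity and Fst, can be illustrated with their textbook definitions. The sketch below is an assumption-laden illustration and not SPSmart's implementation: the abstract does not state which estimators the tool uses, the function names are hypothetical, populations are weighted equally, and no sample-size correction is applied.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity H = 1 - sum(p_i^2) for one locus."""
    return 1.0 - sum(p * p for p in allele_freqs)


def fst_biallelic(pop_freqs):
    """Wright's F_ST = (H_T - H_S) / H_T for one biallelic SNP.

    pop_freqs: frequency of the same allele in each population
    (populations are given equal weight, an assumption of this sketch).
    """
    # Mean within-population expected heterozygosity.
    h_s = sum(2 * p * (1 - p) for p in pop_freqs) / len(pop_freqs)
    # Expected heterozygosity of the pooled population.
    p_bar = sum(pop_freqs) / len(pop_freqs)
    h_t = 2 * p_bar * (1 - p_bar)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical allele frequencies in two populations.
print(expected_heterozygosity([0.5, 0.5]))
print(fst_biallelic([0.2, 0.8]))
```

Identical frequencies in every population give an Fst of zero, and the value grows as the populations diverge.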

    A Generalized Model to Estimate the Statistical Power in Mitochondrial Disease Studies Involving 2×k Tables

    Mitochondrial DNA (mtDNA) variation (i.e. haplogroups) has been analyzed with regard to a number of multifactorial diseases. The statistical power of a case-control study determines the a priori probability of rejecting the null hypothesis of homogeneity between cases and controls. The research leading to these results has received funding from the “Ministerio de Ciencia e Innovación” (SAF2008-02971) and from the Plan Galego IDT, Xunta de Galicia (EM 2012/045) given to A.S.
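
The notion of statistical power in a 2×k case-control table can be made concrete with a small Monte Carlo simulation. The sketch below is a generic illustration and not the generalized model of the paper, whose details are not given in this abstract. All function names, haplogroup frequencies and sample sizes are hypothetical, and the critical value is taken from a simulated null distribution rather than from a chi-square table.

```python
import random


def chi_square_2xk(cases, controls):
    """Pearson chi-square statistic for a 2 x k table of counts."""
    n_cases, n_controls = sum(cases), sum(controls)
    total = n_cases + n_controls
    stat = 0.0
    for a, b in zip(cases, controls):
        col = a + b
        if col == 0:
            continue  # an empty category contributes nothing
        exp_a = n_cases * col / total
        exp_b = n_controls * col / total
        stat += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return stat


def sample_counts(rng, freqs, n):
    """Draw n individuals from k categories with the given frequencies."""
    counts = [0] * len(freqs)
    for idx in rng.choices(range(len(freqs)), weights=freqs, k=n):
        counts[idx] += 1
    return counts


def simulated_power(case_freqs, control_freqs, n_cases, n_controls,
                    alpha=0.05, reps=1000, seed=1):
    """Estimate the probability of rejecting homogeneity at level alpha."""
    rng = random.Random(seed)
    # Null distribution: cases and controls share the control frequencies.
    null_stats = sorted(
        chi_square_2xk(sample_counts(rng, control_freqs, n_cases),
                       sample_counts(rng, control_freqs, n_controls))
        for _ in range(reps)
    )
    # Approximate empirical (1 - alpha) quantile of the null statistics.
    critical = null_stats[min(reps - 1, int((1 - alpha) * reps))]
    # Alternative: cases follow their own frequencies.
    hits = sum(
        chi_square_2xk(sample_counts(rng, case_freqs, n_cases),
                       sample_counts(rng, control_freqs, n_controls)) > critical
        for _ in range(reps)
    )
    return hits / reps


# Hypothetical frequencies for k = 4 haplogroups.
controls = [0.40, 0.30, 0.20, 0.10]
cases = [0.30, 0.30, 0.20, 0.20]
print(simulated_power(cases, controls, n_cases=200, n_controls=200))
```

Re-running the function with larger sample sizes or a larger frequency difference shows how the estimated power rises toward 1.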

    Mapping genome variation of SARS-CoV-2 worldwide highlights the impact of COVID-19 super-spreaders

    The human pathogen severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is responsible for the major pandemic of the twenty-first century. We analyzed more than 4700 SARS-CoV-2 genomes and associated metadata retrieved from public repositories. SARS-CoV-2 sequences have a high sequence identity (>99.9%), which drops to >96% when compared to the bat coronavirus genome. We built a mutation-annotated reference SARS-CoV-2 phylogeny with two main macro-haplogroups, A and B, both of Asian origin, and more than 160 sub-branches representing virus strains of variable geographical origins worldwide, revealing a rather uniform mutation occurrence along branches that could have implications for diagnostics and the design of future vaccines. Identification of the root of SARS-CoV-2 genomes is not without problems, owing to conflicting interpretations derived from either using the bat coronavirus genomes as an outgroup or relying on the sampling chronology of the SARS-CoV-2 genomes and TMRCA estimates; however, the overall scenario favors haplogroup A as the ancestral node. Phylogenetic analysis indicates a TMRCA for SARS-CoV-2 genomes dating to November 12, 2019, thus matching epidemiological records. Sub-haplogroup A2 most likely originated in Europe from an Asian ancestor and gave rise to subclade A2a, which represents the major non-Asian outbreak, especially in Africa and Europe. Multiple founder effect episodes, most likely associated with super-spreader hosts, might explain the COVID-19 pandemic to a large extent. This study received support from the Instituto de Salud Carlos III: project GePEM (Instituto de Salud Carlos III (ISCIII)/PI16/01478/Cofinanciado FEDER), DIAVIR (Instituto de Salud Carlos III (ISCIII)/DTS19/00049/Cofinanciado FEDER; Proyecto de Desarrollo Tecnológico en Salud) and Resvi-Omics (Instituto de Salud Carlos III (ISCIII)/PI19/01039/Cofinanciado FEDER) and project BI-BACVIR (PRIS-3; Agencia de Conocimiento en Salud (ACIS), Servicio Gallego de Salud (SERGAS), Xunta de Galicia; Spain) given to A.S.; and projects ReSVinext (Instituto de Salud Carlos III (ISCIII)/PI16/01569/Cofinanciado FEDER), and Enterogen (Instituto de Salud Carlos III (ISCIII)/PI19/01090/Cofinanciado FEDER) given to F.M.-T.

    A Genome-Wide Study of Modern-Day Tuscans: Revisiting Herodotus's Theory on the Origin of the Etruscans

    Background: The origin of the Etruscan civilization (Etruria, Central Italy) is a long-standing subject of debate among scholars from different disciplines. The bulk of the information has been reconstructed from ancient texts and archaeological findings and, in the last few years, through the analysis of uniparental genetic markers. Methods: By meta-analyzing genome-wide data from The 1000 Genomes Project and the literature, we were able to compare the genomic patterns (>540,000 SNPs) of present-day Tuscans (N = 98) with other population groups from the main hypothetical source populations, namely, Europe and the Middle East. Results: Admixture analysis indicates the presence of a 25–34% Middle Eastern component in modern Tuscans. Different analyses carried out using identity-by-state (IBS) values and genetic distances point to Eastern Anatolia/Southern Caucasus as the most likely geographic origin of the main Middle Eastern genetic component observed in the genome of modern Tuscans. Conclusions: The data indicate that the admixture event between local Tuscans and Middle Easterners could have occurred in Central Italy about 2,600–3,100 years ago (y.a.). On the whole, the results validate the theory of the ancient historian Herodotus on the origin of Etruscans. The research leading to these results has received funding from the “Ministerio de Ciencia e Innovación” (SAF2011-26983) and from the Plan Galego IDT, Xunta de Galicia (EM 2012/045) (A.S.) and Consellería de Sanidade/Xunta de Galicia (RHI07/2-intensificación de la actividad investigadora and 10PXIB918184PR), Instituto Carlos III (Intensificación de la actividad investigadora) and Fondo de Investigación Sanitaria (FIS; PI07/0069, PI10/00540 and PI13/02382) of the Plan Nacional de I+D+I and ‘fondos FEDER’ (F.M.T.). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

    Mapping the genomic mosaic of two ‘Afro-Bolivians’ from the isolated Yungas valleys

    Background: Unraveling the ancestry of ‘Afro-American’ communities is hampered by the complex demographic processes that took place during the Transatlantic Slave Trade (TAST) and the (post-)colonization periods. ‘Afro-Bolivians’ from the subtropical Yungas valleys constitute small and isolated communities that live surrounded by the predominant Native American community of Bolivia. By genotyping >580,000 SNPs in two ‘Afro-Bolivians’, and comparing these genomic profiles with data compiled from more than 57 African groups and other reference ancestral populations (n = 1,161 in total), we aimed to disentangle the complex admixture processes undergone by ‘Afro-Bolivians’. Results: The data indicate that these two genomes constitute a complex mosaic of ancestries that is approximately 80% of recent African origin, the remaining ~20% being European and Native American. West-Central Africa contributed most of the African ancestry to ‘Afro-Bolivians’, and this component is related to populations living along the Atlantic coast (i.e. Senegal, Ghana, Nigeria). Using the tract length distribution of genomic segments attributable to distinct ancestries, we could date the admixture to about 400 years ago. This time coincides with the maximum importation of slaves to Bolivia to compensate for the diminishing indigenous labor force needed for the development of the National Mint of Potosí. Conclusions: Overall, the data indicate that the genome of ‘Afro-Bolivians’ was shaped by a complex process of admixture occurring in America among individuals originating in different West-Central African populations; their genomic mosaics received additional contributions of Europeans and local Native Americans (e.g. Aymaras). The research leading to these results has received funding from the People Program (Marie Curie Actions) of the European Union’s Seventh Framework Program FP7/2007–2013 under REA grant agreement no. 290344, from the “Ministerio de Ciencia e Innovación” (SAF2011–26983), the “Plan Galego IDT” (EM 2012/045) and the grant from the “Sistema Universitario Gallego-Modalidad REDES” (2012-PG226) from the Xunta de Galicia (A.S.). F.M-T received support from the grant “ISCIII/INT14/00245/Cofinanciado FEDER”.

    Ancestry patterns inferred from massive RNA-seq data

    There is a growing body of evidence suggesting that patterns of gene expression vary within and between human populations. However, the impact of this variation on human diseases has been poorly explored, in part owing to the lack of a standardized protocol to estimate biogeographical ancestry from gene expression studies. Here we examine several studies that provide new solid evidence indicating that the ancestral background of individuals impacts gene expression patterns. Next, we test a procedure to infer genetic ancestry from RNA-seq data in 25 data sets where information on ethnicity was reported. Genome data of reference continental populations retrieved from The 1000 Genomes Project were used for comparisons. Remarkably, only eight out of 25 data sets passed FastQC default filters. We demonstrate that, for these eight population sets, the ancestral background of donors could be inferred very efficiently, even in data sets including samples with complex patterns of admixture (e.g., American-admixed populations). For most of the gene expression data sets of suboptimal quality, ancestral inference yielded odd patterns. The present study thus brings a cautionary note for gene expression studies, highlighting the importance of controlling for the potential confounding effect of ancestral genetic background.