Separation of the largest eigenvalues in eigenanalysis of genotype data from discrete subpopulations
We present a mathematical model, and the corresponding mathematical analysis, that justifies and quantifies the use of principal component analysis of biallelic genetic marker data for a set of individuals to detect the number of subpopulations represented in the data. We indicate that the power of the technique relies more on the number of individuals genotyped than on the number of markers. Comment: Corrected typos in Section 3.1 (M=120, N=2500) and proof of Lemma
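The following Python sketch illustrates the idea. It is not the paper's own analysis. It simulates 0/1/2 genotypes for three discrete subpopulations under an assumed Balding-Nichols-style frequency model, standardizes each marker, and counts how many eigenvalues of the individual-by-individual covariance matrix stand clear of the bulk. The separation threshold is a permutation null in the spirit of Horn's parallel analysis, swapped in for illustration. The paper derives its own analytical criterion, which this sketch does not reproduce. All function names and parameters (`fst`, `n_perm`, and so on) are hypothetical.

```python
import numpy as np

def simulate_genotypes(n_per_pop, n_markers, n_pops, fst=0.05, seed=0):
    """Simulate 0/1/2 genotypes for discrete subpopulations.

    Assumption: each subpopulation's allele frequency is a Beta draw around a
    shared ancestral frequency (Balding-Nichols-style), chosen for illustration.
    """
    rng = np.random.default_rng(seed)
    p_anc = rng.uniform(0.1, 0.9, size=n_markers)
    a = p_anc * (1 - fst) / fst
    b = (1 - p_anc) * (1 - fst) / fst
    blocks = []
    for _ in range(n_pops):
        p_pop = rng.beta(a, b)  # drifted frequencies for this subpopulation
        blocks.append(rng.binomial(2, p_pop, size=(n_per_pop, n_markers)))
    return np.vstack(blocks)

def sample_eigenvalues(G):
    """Descending eigenvalues of the individual-by-individual covariance of standardized genotypes."""
    p = G.mean(axis=0) / 2
    keep = (p > 0) & (p < 1)  # drop monomorphic markers
    X = (G[:, keep] - 2 * p[keep]) / np.sqrt(2 * p[keep] * (1 - p[keep]))
    return np.linalg.eigvalsh(X @ X.T / X.shape[1])[::-1]

def count_separated(G, n_perm=3, seed=1):
    """Count eigenvalues above the largest eigenvalue seen after permuting each marker.

    Shuffling each column independently destroys population structure but keeps
    every marker's allele frequency (a parallel-analysis-style null).
    """
    rng = np.random.default_rng(seed)
    observed = sample_eigenvalues(G)
    null_max = max(sample_eigenvalues(rng.permuted(G, axis=0))[0] for _ in range(n_perm))
    return int(np.sum(observed > null_max))

G = simulate_genotypes(n_per_pop=100, n_markers=2000, n_pops=3)
n_sep = count_separated(G)
print("separated eigenvalues:", n_sep, "-> estimated subpopulations:", n_sep + 1)
```

After centering, K discrete subpopulations leave K − 1 between-group directions. That is why the sketch adds one to the number of separated eigenvalues.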
The Genetic Ancestry of African Americans, Latinos, and European Americans Across the United States
Over the past 500 years, North America has been the site of ongoing mixing of Native Americans, European settlers, and Africans (brought largely by the trans-Atlantic slave trade), shaping the early history of what became the United States. We studied the genetic ancestry of 5,269 self-described African Americans, 8,663 Latinos, and 148,789 European Americans who are 23andMe customers and show that the legacy of these historical interactions is visible in the genetic ancestry of present-day Americans. We document pervasive mixed ancestry and asymmetrical male and female ancestry contributions in all groups studied. We show that regional ancestry differences reflect historical events, such as early Spanish colonization, waves of immigration from many regions of Europe, and forced relocation of Native Americans within the US. This study sheds light on the fine-scale differences in ancestry within and across the United States and informs our understanding of the relationship between racial and ethnic identities and genetic ancestry.
PCAdmix: Principal Components-Based Assignment of Ancestry along Each Chromosome in Individuals with Admixed Ancestry from Two or More Populations
Identifying ancestry along each chromosome in admixed individuals provides a wealth of information for understanding the population genetic history of admixture events and is valuable for admixture mapping and identifying recent targets of selection. We present PCAdmix (available at https://sites.google.com/site/pcadmix/home), a Principal Components-based algorithm for determining ancestry along each chromosome from a high-density, genome-wide set of phased single-nucleotide polymorphism (SNP) genotypes of admixed individuals. We compare our method to HAPMIX on simulated data from two ancestral populations, and we find high concordance between the methods. Our method also has better accuracy than LAMP when applied to three-population admixture, a situation as yet unaddressed by HAPMIX. Finally, we apply our method to a data set of four Latino populations with European, African, and Native American ancestry. We find evidence of assortative mating in each of the four populations, and we identify regions of shared ancestry that may be recent targets of selection and could serve as candidate regions for admixture-based association mapping.
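The following Python sketch makes the windowed-PCA idea concrete. It is heavily simplified and is not PCAdmix itself. For each window of SNPs, it computes principal components of the reference haplotypes, projects one phased admixed haplotype into that space, and calls the ancestry of the nearest population centroid. PCAdmix does more than this. In particular, this sketch calls every window independently, with no smoothing along the chromosome. The function name, window size, and the two simulated reference populations are all hypothetical.

```python
import numpy as np

def assign_windows(ref_haps, ref_labels, query_hap, window=50, n_pcs=2):
    """Hypothetical per-window ancestry call for one phased query haplotype.

    ref_haps:   (n_ref, n_snps) 0/1 array of reference haplotypes
    ref_labels: (n_ref,) array of population labels
    query_hap:  (n_snps,) 0/1 array
    Each window is assigned to the population whose reference centroid is
    nearest in that window's principal-component space.
    """
    labels = np.unique(ref_labels)
    calls = []
    for start in range(0, ref_haps.shape[1], window):
        R = ref_haps[:, start:start + window].astype(float)
        q = query_hap[start:start + window].astype(float)
        mu = R.mean(axis=0)
        # principal axes of the centered reference haplotypes in this window
        _, _, vt = np.linalg.svd(R - mu, full_matrices=False)
        axes = vt[:n_pcs]
        ref_scores = (R - mu) @ axes.T
        q_score = (q - mu) @ axes.T
        centroids = np.array([ref_scores[ref_labels == lab].mean(axis=0) for lab in labels])
        nearest = np.argmin(np.linalg.norm(centroids - q_score, axis=1))
        calls.append(str(labels[nearest]))
    return calls

rng = np.random.default_rng(0)
n_snps, n_ref = 200, 50
p_a = rng.uniform(0.05, 0.95, n_snps)
p_b = 1 - p_a  # hypothetical, strongly differentiated second population
ref_haps = np.vstack([rng.binomial(1, p_a, (n_ref, n_snps)),
                      rng.binomial(1, p_b, (n_ref, n_snps))])
ref_labels = np.array(["A"] * n_ref + ["B"] * n_ref)
# query haplotype: population A ancestry on the first half, B on the second half
query = np.concatenate([rng.binomial(1, p_a[:100]), rng.binomial(1, p_b[100:])])
calls = assign_windows(ref_haps, ref_labels, query, window=50)
print(calls)
```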
On Identifying the Optimal Number of Population Clusters via the Deviance Information Criterion
Inferring population structure using Bayesian clustering programs often requires a priori specification of the number of subpopulations, K, from which the sample has been drawn. Here, we explore the utility of a common Bayesian model selection criterion, the Deviance Information Criterion (DIC), for estimating K. We evaluate the accuracy of DIC, as well as other popular approaches, on datasets generated by coalescent simulations under various demographic scenarios. We find that DIC outperforms competing methods in many genetic contexts, validating its application in assessing population structure.
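For reference, the standard definition of DIC uses the deviance D(θ) = −2 log L(θ) and posterior draws of θ. It sets DIC = D̄ + p_D, where D̄ is the mean deviance over draws and p_D = D̄ − D(θ̄) is the effective number of parameters. The Python sketch below computes this quantity. It uses a one-parameter normal model rather than a clustering model, so p_D should come out close to 1. In the clustering setting, DIC would be computed for each candidate K, with smaller values preferred. The model, the flat prior, and the function names here are assumptions for illustration.

```python
import numpy as np

def normal_loglik(y, mu, sigma=1.0):
    """Log-likelihood of data y under N(mu, sigma^2), with sigma treated as known."""
    n = len(y)
    return -0.5 * np.sum((y - mu) ** 2) / sigma**2 - 0.5 * n * np.log(2 * np.pi * sigma**2)

def dic(y, samples, loglik):
    """Deviance Information Criterion from posterior draws.

    D(theta) = -2 log L(theta); DIC = mean deviance + p_D,
    where p_D = mean deviance - deviance at the posterior mean.
    """
    deviances = np.array([-2.0 * loglik(y, theta) for theta in samples])
    d_bar = deviances.mean()
    d_at_mean = -2.0 * loglik(y, np.mean(samples, axis=0))
    p_d = d_bar - d_at_mean
    return d_bar + p_d, p_d

rng = np.random.default_rng(0)
y = rng.normal(0.3, 1.0, size=50)
# exact posterior of mu under a flat prior with known sigma = 1: N(mean(y), 1/n)
samples = rng.normal(y.mean(), 1 / np.sqrt(len(y)), size=20000)
dic_value, p_d = dic(y, samples, normal_loglik)
print(f"DIC = {dic_value:.2f}, effective parameters p_D = {p_d:.3f}")
```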
Author Correction: Discovery of 42 genome-wide significant loci associated with dyslexia
Correction to: Nature Genetics https://doi.org/10.1038/s41588-022-01192-y. Published online 20 October 2022.
In the version of this article originally published, a paragraph was omitted in the Methods section, reading “Genomic control. Top SNPs are reported from the more conservative GWAS results adjusted for genomic control (Fig. 1, Extended Data Figs. 1–4, and Supplementary Tables 1, 2, 9 and 10), whereas downstream analyses (including gene-set analysis, enrichment and heritability partitioning, genetic correlations, polygenic prediction, candidate gene replication) are based on GWAS results without genomic control.” The paragraph has now been included in the HTML and PDF versions of the article.
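The correction refers to GWAS results "adjusted for genomic control". In its usual form, genomic control divides each 1-df association chi-square statistic by an inflation factor λ_GC. This factor is the median observed chi-square divided by the median expected under the null, which is about 0.4549. The notice gives no implementation details, so the Python sketch below is a generic illustration with hypothetical inputs.

```python
from math import erfc, sqrt
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 df: (Phi^{-1}(0.75))^2, about 0.4549
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2

def genomic_control(z_scores):
    """Return the inflation factor lambda_GC and GC-adjusted two-sided p-values."""
    chi2 = [z * z for z in z_scores]
    lam = median(chi2) / CHI2_1DF_MEDIAN
    adjusted_chi2 = [c / lam for c in chi2]
    # P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    adjusted_p = [erfc(sqrt(c / 2)) for c in adjusted_chi2]
    return lam, adjusted_p

# Hypothetical test statistics: standard-normal quantiles inflated by 10%
z = [1.1 * NormalDist().inv_cdf((i + 0.5) / 1001) for i in range(1001)]
lam, adj_p = genomic_control(z)
print(f"lambda_GC = {lam:.3f}")
```

Scaling every z-score by 1.1 inflates the chi-square statistics by about 1.21, so λ_GC should land near that value. Dividing by λ_GC makes every adjusted p-value larger, which is why the correction calls the adjusted results more conservative.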
Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals
Publisher Copyright: © 2022, The Author(s). We conduct a genome-wide association study (GWAS) of educational attainment (EA) in a sample of ~3 million individuals and identify 3,952 approximately uncorrelated genome-wide-significant single-nucleotide polymorphisms (SNPs). A genome-wide polygenic predictor, or polygenic index (PGI), explains 12–16% of EA variance and contributes to risk prediction for ten diseases. Direct effects (i.e., controlling for parental PGIs) explain roughly half the PGI’s magnitude of association with EA and other phenotypes. The correlation between mate-pair PGIs is far too large to be consistent with phenotypic assortment alone, implying additional assortment on PGI-associated factors. In an additional GWAS of dominance deviations from the additive model, we identify no genome-wide-significant SNPs, and a separate X-chromosome additive GWAS identifies 57. Peer reviewed.
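A polygenic index is commonly built as a weighted sum of allele dosages, with weights taken from GWAS effect estimates. The abstract does not describe how this PGI was constructed, and in practice the weights are often adjusted for linkage disequilibrium. The Python sketch below therefore uses raw weights as an assumption. It also uses simulated families with made-up effect sizes to show why controlling for parental PGIs separates a direct effect from the population association. All names and numbers are hypothetical.

```python
import numpy as np

def polygenic_index(dosages, weights):
    """PGI as the weighted sum of allele dosages (0/1/2) with per-SNP GWAS weights."""
    return dosages @ weights

def ols(y, *predictors):
    """Ordinary least squares; returns [intercept, slope_1, slope_2, ...]."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Toy PGI for two individuals and three SNPs (hypothetical weights)
toy = polygenic_index(np.array([[0, 1, 2], [2, 0, 1]]), np.array([0.1, -0.2, 0.05]))

# Simulated families (hypothetical effect sizes): the child's PGI is the parental
# mean plus segregation noise, and the outcome also depends on the parents' PGIs.
rng = np.random.default_rng(0)
n = 20000
mother, father = rng.normal(size=n), rng.normal(size=n)
child = (mother + father) / 2 + rng.normal(scale=np.sqrt(0.5), size=n)
outcome = 0.2 * child + 0.2 * (mother + father) / 2 + rng.normal(size=n)

# population slope; expected value 0.2 + 0.2 * 0.5 = 0.3 under this simulation
population_slope = ols(outcome, child)[1]
# slope on the child's PGI after controlling for both parental PGIs; expected 0.2
direct_effect = ols(outcome, child, mother, father)[1]
print(toy, round(population_slope, 3), round(direct_effect, 3))
```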
Genome-wide association and epidemiological analyses reveal common genetic origins between uterine leiomyomata and endometriosis
Uterine leiomyomata (UL) are the most common neoplasms of the female reproductive tract and the primary cause of hysterectomy, leading to considerable morbidity and high economic burden. Here we conduct a GWAS meta-analysis in 35,474 cases and 267,505 female controls of European ancestry, identifying eight novel genome-wide significant (P < 5 × 10⁻⁸) loci, in addition to confirming 21 previously reported loci, including multiple independent signals at 10 loci. Phenotypic stratification of UL by heavy menstrual bleeding in 3,409 cases and 199,171 female controls reveals genome-wide significant associations at three of the 29 UL loci: 5p15.33 (TERT), 5q35.2 (FGFR4) and 11q22.3 (ATM). Four loci identified in the meta-analysis are also associated with endometriosis risk; an epidemiological meta-analysis across 402,868 women suggests at least a doubling of risk for UL diagnosis among those with a history of endometriosis. These findings increase our understanding of the genetic contribution to, and biology underlying, UL development, and suggest overlapping genetic origins with endometriosis.
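The abstract does not say which meta-analysis model was used. The Python sketch below assumes the common fixed-effect, inverse-variance-weighted approach for a single SNP. Each cohort's estimate is weighted by 1/SE², and the pooled estimate is then checked against the P < 5 × 10⁻⁸ genome-wide threshold quoted above. The per-cohort numbers are hypothetical.

```python
from math import erfc, sqrt

GENOME_WIDE_ALPHA = 5e-8  # genome-wide significance threshold quoted in the abstract

def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis of one SNP across cohorts."""
    weights = [1.0 / se**2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = sqrt(1.0 / total)
    z = beta / se
    p = erfc(abs(z) / sqrt(2))  # two-sided normal p-value
    return beta, se, p

# Hypothetical per-cohort estimates for a single SNP
beta, se, p = ivw_meta([0.10, 0.08], [0.020, 0.015])
print(f"beta={beta:.4f} se={se:.4f} p={p:.2e} genome-wide significant: {p < GENOME_WIDE_ALPHA}")
```

`erfc` is used instead of `1 - cdf` because it stays accurate for the very small p-values typical of GWAS hits.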
Association studies of up to 1.2 million individuals yield new insights into the genetic etiology of tobacco and alcohol use
Tobacco and alcohol use are leading causes of mortality that influence risk for many complex diseases and disorders [1]. They are heritable [2,3] and etiologically related [4,5] behaviors that have been resistant to gene discovery efforts [6–11]. In sample sizes of up to 1.2 million individuals, we discovered 566 genetic variants in 406 loci associated with multiple stages of tobacco use (initiation, cessation, and heaviness) as well as alcohol use, with 150 loci evidencing pleiotropic association. Smoking phenotypes were positively genetically correlated with many health conditions, whereas alcohol use was negatively correlated with these conditions, such that increased genetic risk for alcohol use is associated with lower disease risk. We report evidence for the involvement of many systems in tobacco and alcohol use, including genes involved in nicotinic, dopaminergic, and glutamatergic neurotransmission. The results provide a solid starting point to evaluate the effects of these loci in model organisms and with more precise substance use measures.