16 research outputs found

    Algorithms for Selecting Informative Marker Panels for Population Assignment

    Given a set of potential source populations, genotypes of an individual of unknown origin at a collection of markers can be used to predict the correct source population of the individual. For improved efficiency, informative markers can be chosen from a larger set of markers to maximize the accuracy of this prediction. However, selecting the loci that are individually most informative does not necessarily produce the optimal panel. Here, using genotypes from eight species (carp, cat, chicken, dog, fly, grayling, human, and maize), this univariate accumulation procedure is compared to new multivariate "greedy" and "maximin" algorithms for choosing marker panels. The procedures generally suggest similar panels, although the greedy method often recommends inclusion of loci that are not chosen by the other algorithms. In seven of the eight species, when applied to five or more markers, all methods achieve at least 94% assignment accuracy on simulated individuals, with one species (dog) producing this level of accuracy with only three markers, and the eighth species (human) requiring ∼13–16 markers. The new algorithms produce substantial improvements over use of randomly selected markers; where differences among the methods are noticeable, the greedy algorithm leads to slightly higher probabilities of correct assignment. Although none of the approaches necessarily chooses the panel with optimal performance, the algorithms all likely select panels with performance near enough to the maximum that they all are suitable for practical use.
    Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/63393/1/cmb.2005.12.1183.pd
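
    The contrast between univariate accumulation and a multivariate greedy search can be sketched as follows. This is only an illustration under strong simplifying assumptions that are not taken from the paper: haploid individuals, biallelic loci, equal prior weight on each source population, and exact enumeration of genotypes as the performance measure. The function names and the toy allele frequencies are hypothetical.

```python
from itertools import product


def assignment_accuracy(freqs, panel):
    """Expected probability of correct maximum-likelihood assignment.

    freqs[k][l] is the frequency of allele 1 at locus l in population k
    (assumed haploid and biallelic; populations are equally likely a priori).
    """
    total = 0.0
    for genotype in product((0, 1), repeat=len(panel)):
        best = 0.0
        for pop in freqs:
            p = 1.0
            for allele, locus in zip(genotype, panel):
                p *= pop[locus] if allele == 1 else 1.0 - pop[locus]
            best = max(best, p)  # the ML rule assigns to the likeliest population
        total += best
    return total / len(freqs)


def univariate_panel(freqs, size):
    """Rank loci by their individual accuracy and keep the top `size`."""
    n_loci = len(freqs[0])
    ranked = sorted(range(n_loci),
                    key=lambda l: assignment_accuracy(freqs, [l]),
                    reverse=True)
    return ranked[:size]


def greedy_panel(freqs, size):
    """Add, one at a time, the locus that most improves the whole panel."""
    n_loci = len(freqs[0])
    panel = []
    for _ in range(size):
        candidates = [l for l in range(n_loci) if l not in panel]
        best = max(candidates,
                   key=lambda l: assignment_accuracy(freqs, panel + [l]))
        panel.append(best)
    return panel


# Toy frequencies (hypothetical): loci 0 and 1 carry the same information,
# while locus 2 is weaker alone but separates populations B and C.
freqs = [
    [0.9, 0.9, 0.5],  # population A
    [0.1, 0.1, 0.5],  # population B
    [0.1, 0.1, 0.9],  # population C
]
```

    On these toy frequencies the univariate ranking keeps the two redundant loci (0 and 1, accuracy 0.60), whereas the greedy search keeps loci 0 and 2 (accuracy 0.72), which shows why individually informative loci need not form the best panel.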

    Effective selection of informative SNPs and classification on the HapMap genotype data

    Background: Because single nucleotide polymorphisms (SNPs) are genetic variations that distinguish any two unrelated individuals, they can be used to identify the correct source population of an individual. For efficient population identification with the HapMap genotype data, as few informative SNPs as possible are required from the original 4 million SNPs. Recently, Park et al. (2006) adopted the nearest shrunken centroid method to classify the three populations, i.e., Utah residents with ancestry from Northern and Western Europe (CEU), Yoruba in Ibadan, Nigeria in West Africa (YRI), and Han Chinese in Beijing together with Japanese in Tokyo (CHB+JPT), from which 100,736 SNPs were obtained and the top 82 SNPs could completely classify the three populations. Results: In this paper, we propose to first rank each feature (SNP) using a ranking measure, i.e., a modified t-test or F-statistics. Then from the ranking list, we form different feature subsets by sequentially choosing different numbers of features (e.g., 1, 2, 3, ..., 100) with top ranking values, train and test them by a classifier, e.g., the support vector machine (SVM), thereby finding one subset which has the highest classification accuracy. Compared to the classification method of Park et al., we obtain a better result, i.e., good classification of the 3 populations using on average 64 SNPs. Conclusion: Experimental results show that both the modified t-test and the F-statistics method are very effective in ranking SNPs by their classification capability. Combined with the SVM classifier, a desirable feature subset (of minimum size and maximum informativeness) can be quickly found in a greedy manner after ranking all SNPs. Our method is able to identify a very small number of important SNPs that can determine the populations of individuals.
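
    The rank-then-select procedure described in this abstract can be sketched in a few lines. The sketch below is an assumption-laden illustration, not the authors' implementation: it ranks SNPs with a plain one-way ANOVA F-statistic (not the modified t-test), and it swaps the SVM for a nearest-centroid classifier so that it runs with the standard library alone. All names and the toy genotypes (coded 0/1/2) are hypothetical.

```python
def f_statistic(values, labels):
    """One-way ANOVA F-statistic of one SNP across population labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    grand = sum(values) / n
    ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values())
    ssw = sum((v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g)
    if ssw == 0:
        return float("inf") if ssb > 0 else 0.0
    return (ssb / (k - 1)) / (ssw / (n - k))


def rank_snps(X, y):
    """Return SNP indices sorted from most to least discriminative."""
    columns = list(zip(*X))
    return sorted(range(len(columns)),
                  key=lambda j: f_statistic(columns[j], y), reverse=True)


def centroid_accuracy(train_X, train_y, test_X, test_y, features):
    """Accuracy of a nearest-centroid classifier (stand-in for the SVM)."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, j in enumerate(features):
            acc[i] += row[j]
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
    correct = 0
    for row, label in zip(test_X, test_y):
        pred = min(centroids, key=lambda c: sum(
            (row[j] - m) ** 2 for j, m in zip(features, centroids[c])))
        correct += pred == label
    return correct / len(test_y)


def best_top_k(train_X, train_y, test_X, test_y, max_k):
    """Try the top 1..max_k ranked SNPs; keep the smallest best subset."""
    ranking = rank_snps(train_X, train_y)
    best = (None, -1.0)
    for k in range(1, max_k + 1):
        acc = centroid_accuracy(train_X, train_y, test_X, test_y, ranking[:k])
        if acc > best[1]:  # strict '>' prefers the smaller subset on ties
            best = (ranking[:k], acc)
    return best


# Toy data (hypothetical): columns 1 and 3 are informative, 0 and 2 are not.
train_X = [[0, 2, 1, 0], [1, 2, 1, 0], [2, 2, 1, 1],
           [0, 0, 1, 0], [1, 0, 1, 1], [2, 1, 1, 0],
           [0, 0, 1, 2], [1, 0, 1, 2], [2, 0, 1, 2]]
train_y = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
test_X = [[1, 2, 1, 0], [2, 0, 1, 0], [0, 0, 1, 2], [0, 1, 1, 1]]
test_y = ["A", "B", "C", "B"]
```

    Calling `best_top_k(train_X, train_y, test_X, test_y, 4)` on the toy data selects the two informative SNPs. Choosing the subset size on the same data used to report accuracy is optimistic; a separate validation split or cross-validation would be the more careful choice.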

    Evidence of discrete yellowfin tuna (Thunnus albacares) populations demands rethink of management for this globally important resource

    Tropical tuna fisheries are central to food security and economic development of many regions of the world. Contemporary population assessment and management generally assume these fisheries exploit a single mixed spawning population within ocean basins. To date, population genetics has lacked the required power to conclusively test this assumption. Here we demonstrate heterogeneous population structure among yellowfin tuna sampled at three locations across the Pacific Ocean (western, central, and eastern) via analysis of double digest restriction-site associated DNA using Next Generation Sequencing technology. The differences among locations are such that individuals sampled from one of the three regions examined can be assigned with close to 100% accuracy, demonstrating the power of this approach for providing practical markers for fishery-independent verification of catch provenance in a way not achieved by previous techniques. Given these results, an extended pan-tropical survey of yellowfin tuna using this approach will not only help combat the largest threat to sustainable fisheries (i.e. illegal, unreported, and unregulated fishing) but will also provide a basis to transform current monitoring, assessment, and management approaches for this globally significant species.

    Review of the Forensic Applicability of Biostatistical Methods for Inferring Ancestry from Autosomal Genetic Markers

    The inference of ancestry has become a part of the services many forensic genetic laboratories provide. Interest in ancestry may be to provide investigative leads or identify the region of origin in cases of unidentified missing persons. There exist many biostatistical methods developed for the study of population structure in the area of population genetics. However, the challenges and questions are slightly different in the context of forensic genetics, where the origin of a specific sample is of interest rather than the understanding of population histories and genealogies. In this paper, the methodologies for modelling population admixture and inferring ancestral populations are reviewed with a focus on their strengths and weaknesses in relation to ancestry inference in the forensic context.

    Detecting individual ancestry in the human genome

    Detecting and quantifying the population substructure present in a sample of individuals are of main interest in the fields of genetic epidemiology, population genetics, and forensics, among others. To date, several algorithms have been proposed for estimating the amount of genetic ancestry within an individual. In the present review, we introduce the most widely used methods in population genetics for detecting individual genetic ancestry. We further show, by means of simulations, the performance of popular algorithms for detecting individual ancestry in various controlled demographic scenarios. Finally, we provide some hints on how to interpret the results from these algorithms.

    Population Identifiability from Forensic Genetic Markers: Ancestry Variation in Latin America

    The Combined DNA Index System (CODIS) loci are a standard microsatellite marker set widely used for distinguishing among individuals in forensic DNA identity testing for medico-legal casework in the United States and in other countries. In anthropological genetic research, CODIS markers have become an important tool for uses extending beyond case investigations to quantify ancestry proportions, reveal patterns of admixture, and trace population histories. These investigations are especially prevalent in studies of Latin American population structure. Nevertheless, the accuracy of the ancestry estimates computed from the CODIS loci for highly admixed Latino populations has not been formally tested. Long-standing arguments have been made that small ancestry panels, including the CODIS loci specifically, are not suitable for ancestry inference in admixed populations, due to the high heterozygosity and limited number of the loci used. Recent studies on ancestry inference using the CODIS loci suggest that these do confer more information on population-level identifiability than recognized in forensic genetic scholarship and by the medico-legal community. Here, we formally test the ability of CODIS and CODIS-Proxy (e.g. high heterozygosity and individual identifiability loci) marker panels to accurately estimate admixture proportions of individuals, including a sample of Latinos with a wide range of ancestry proportions. Using the same individuals in order to make direct comparisons of the outcomes, we produce ancestry estimates from 1) a small CODIS/CODIS-Proxy loci panel and 2) a robust and validated microsatellite ancestry informative panel. We find evidence (e.g. ρ = 0.80 to 0.88) that supports the use of CODIS/CODIS-Proxy loci to capture the general ancestry estimation trends of a sample. This finding is in line with what studies using CODIS on Latin American populations have found, in that the ancestry estimations generated by CODIS present trends supported by documented population histories (e.g. colonialism and population movements) and microevolutionary events (e.g. gene flow) in Latin America. However, the present study also highlights the limitations of CODIS for making individual-level inferences of ancestry, as the associated estimates for an acceptable level of statistical confidence (95%) are demonstrated here to be too broad to make any nuanced inferences regarding the individual’s actual ancestry composition.

    Cultural transmission of move choice in chess

    The study of cultural evolution benefits from detailed analysis of cultural transmission in specific human domains. Chess provides a platform for understanding the transmission of knowledge due to its active community of players, precise behaviors, and long-term records of high-quality data. In this paper, we perform an analysis of chess in the context of cultural evolution, describing multiple cultural factors that affect move choice. We then build a population-level statistical model of move choice in chess, based on the Dirichlet-multinomial likelihood, to analyze cultural transmission over decades of recorded games played by leading players. For moves made in specific positions, we evaluate the relative effects of frequency-dependent bias, success bias, and prestige bias on the dynamics of move frequencies. We observe that negative frequency-dependent bias plays a role in the dynamics of certain moves, and that other moves are compatible with transmission under prestige bias or success bias. These apparent biases may reflect recent changes, namely the introduction of computer chess engines and online tournament broadcasts. Our analysis of chess provides insights into broader questions concerning evolution of human behavioral preferences and modes of social learning.
    Comment: 25 pages
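
    For readers unfamiliar with the Dirichlet-multinomial likelihood named in this abstract, the standard form of its log probability mass function can be sketched as below. This is the textbook distribution only; how the paper parameterizes the concentration vector in terms of frequency, success, and prestige bias is not given in the abstract, so `alpha` here is simply an assumed input and the function name is hypothetical.

```python
from math import lgamma


def dirichlet_multinomial_logpmf(counts, alpha):
    """Log-probability of observing `counts` (e.g. how often each move was
    played in a position) under a Dirichlet-multinomial distribution with
    positive concentration parameters `alpha`."""
    n = sum(counts)
    a = sum(alpha)
    # log of n! * Gamma(a) / Gamma(n + a)
    log_p = lgamma(n + 1) + lgamma(a) - lgamma(n + a)
    # per-category terms Gamma(x + alpha) / (Gamma(alpha) * x!)
    for x, al in zip(counts, alpha):
        log_p += lgamma(x + al) - lgamma(al) - lgamma(x + 1)
    return log_p
```

    With two moves and `alpha = [1.0, 1.0]`, every split of two games is equally likely, so `dirichlet_multinomial_logpmf([1, 1], [1.0, 1.0])` equals log(1/3).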

    Variation in genetic admixture and population structure among Latinos: the Los Angeles Latino eye study (LALES)

    Background: Population structure and admixture have strong confounding effects on genetic association studies. Discordant frequencies for age-related macular degeneration (AMD) risk alleles and for AMD incidence and prevalence rates are reported across different ethnic groups. We examined the genomic ancestry characterizing 538 Latinos drawn from the Los Angeles Latino Eye Study [LALES] as part of an ongoing AMD-association study. To help assess the degree of Native American ancestry inherited by Latino populations we sampled 25 Mayans and 5 Mexican Indians collected through Coriell's Institute. Levels of European, Asian, and African descent in Latinos were inferred through the USC Multiethnic Panel (USC MEP), formed from a sample from the Multiethnic Cohort (MEC) study, the Yoruba African samples from HapMap II, the Singapore Chinese Health Study, and a prospective cohort from Shanghai, China. A total of 233 ancestry informative markers were genotyped for 538 LALES Latinos, 30 Native Americans, and 355 USC MEP individuals (African Americans, Japanese, Chinese, European Americans, Latinos, and Native Hawaiians). Sensitivity of ancestry estimates to relative sample size was considered. Results: We detected strong evidence for recent population admixture in LALES Latinos. Gradients of increasing Native American background and of correspondingly decreasing European ancestry were observed as a function of birth origin from North to South. The strongest excess of homozygosity, a reflection of recent population admixture, was observed in non-US born Latinos that recently populated the US. A set of 42 SNPs especially informative for distinguishing between Native Americans and Europeans was identified. Conclusion: These findings reflect the historic migration patterns of Native Americans and suggest that while the 'Latino' label is used to categorize the entire population, there exists a strong degree of heterogeneity within that population, and that it will be important to assess this heterogeneity within future association studies on Latino populations. Our study raises awareness of the diversity within "Latinos" and the necessity to assess appropriate risk and treatment management.

    Genealogical lineage sorting leads to significant, but incorrect Bayesian multilocus inference of population structure

    Over the past decades, the use of molecular markers has revolutionized biology and led to the foundation of a new research discipline, phylogeography. Of particular interest has been the inference of population structure and biogeography. While initial studies focused on mtDNA as a molecular marker, it has become apparent that selection and genealogical lineage sorting could lead to erroneous inferences. As it is not clear to what extent these forces affect a given marker, it has become common practice to use the combined evidence from a set of molecular markers as an attempt to recover the signals that approximate the true underlying demography. Typically, the number of markers used is determined by either budget constraints or by statistical power required to recognize significant population differentiation. Using microsatellite markers from Drosophila and humans, we show that even large numbers of loci (>50) can frequently result in statistically well-supported, but incorrect inference of population structure using the software BAPS. Most importantly, genomic features, such as chromosomal location, variability of the markers, or recombination rate, cannot explain this observation. Instead, it can be attributed to sampling variation among loci with different realizations of the stochastic lineage sorting. This phenomenon is particularly pronounced for low levels of population differentiation. Our results have important implications for ongoing studies of population differentiation, as we unambiguously demonstrate that statistical significance of population structure inferred from a random set of genetic markers cannot necessarily be taken as evidence for a reliable demographic inference.