27 research outputs found

    A genome-wide study of Hardy–Weinberg equilibrium with next generation sequence data

    Statistical tests for Hardy–Weinberg equilibrium have been an important tool for detecting genotyping errors in the past, and remain important in the quality control of next generation sequence data. In this paper, we analyze complete chromosomes of the 1000 genomes project by using exact test procedures for autosomal and X-chromosomal variants. We find that the rate of disequilibrium largely exceeds what might be expected by chance alone for all chromosomes. Observed disequilibrium is, in about 60% of the cases, due to heterozygote excess. We suggest that most excess disequilibrium can be explained by sequencing problems, and hypothesize mechanisms that can explain exceptional heterozygosities. We report higher rates of disequilibrium for the MHC region on chromosome 6, regions flanking centromeres and p-arms of acrocentric chromosomes. We also detected long-range haplotypes and areas with incidental high disequilibrium. We report disequilibrium to be related to read depth, with variants having extreme read depths being more likely to be out of equilibrium. Disequilibrium rates were found to be 11 times higher in segmental duplications and simple tandem repeat regions. The variants with significant disequilibrium are seen to be concentrated in these areas. For next generation sequence data, Hardy–Weinberg disequilibrium seems to be a major indicator for copy number variation. Peer reviewed. Postprint (published version).
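As a rough illustration of what an exact test for Hardy–Weinberg equilibrium at a biallelic autosomal variant computes, the sketch below enumerates every heterozygote count compatible with the observed allele counts and sums the probabilities of outcomes no more likely than the observed one. This is a generic textbook formulation, not the procedure or software used in the paper above; the function name `hwe_exact_pvalue` and its arguments are hypothetical.

```python
from math import exp, lgamma, log


def _log_factorial(k: int) -> float:
    """Natural log of k! via the log-gamma function."""
    return lgamma(k + 1)


def hwe_exact_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Two-sided exact HWE p-value for one biallelic autosomal variant.

    Assumption: this is a plain enumeration over heterozygote counts,
    conditional on the observed allele counts (illustrative sketch only).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")
    n_a = 2 * n_aa + n_ab  # copies of allele A
    n_b = 2 * n_bb + n_ab  # copies of allele B

    def log_prob(het: int) -> float:
        # Probability of `het` heterozygotes given n, n_a and n_b under HWE.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (
            het * log(2)
            + _log_factorial(n)
            - _log_factorial(hom_a)
            - _log_factorial(het)
            - _log_factorial(hom_b)
            + _log_factorial(n_a)
            + _log_factorial(n_b)
            - _log_factorial(2 * n)
        )

    # The heterozygote count must share the parity of the allele counts.
    het_values = range(n_a % 2, min(n_a, n_b) + 1, 2)
    probs = {het: exp(log_prob(het)) for het in het_values}
    p_obs = probs[n_ab]
    # Sum outcomes at most as likely as the observed one (small float tolerance).
    p_value = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))
    return min(1.0, p_value)
```

For example, `hwe_exact_pvalue(25, 50, 25)` describes a sample that matches the equilibrium proportions, whereas `hwe_exact_pvalue(50, 0, 50)` describes an extreme heterozygote deficit. X-chromosomal variants need a different conditional distribution and are not covered by this sketch.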

    Kinship Index Variations among Populations and Thresholds for Familial Searching

    Current familial searching strategies are developed primarily based on autosomal STR loci, since most of the offender profiles in the forensic DNA databases do not contain Y-STR or mitochondrial DNA data. There are generally two familial searching methods: Identity-by-State (IBS) based methods and kinship index (KI) based methods. The KI based method is analytically superior because it takes allele frequency information into account rather than relying solely on allele counting. However, multiple KIs should be calculated if the unknown forensic profile may be attributed to multiple possible relevant populations. An important practical issue is the KI threshold to select for limiting the list of candidates from a search. There are generally three strategies for setting the KI threshold for familial searching: (1) SWGDAM recommendation 6; (2) minimum KI≥KI threshold; and (3) maximum KI≥KI threshold. These strategies were evaluated and compared by using both simulation data and empirical data. The minimum KI will tend to be closer to the KI appropriate for the population to which the forensic profile belongs. The minimum KI≥KI threshold performs better than the maximum KI≥KI threshold. The SWGDAM strategy may be too stringent for familial searching with large databases (e.g., 1 million or more profiles), because its KI thresholds depend on the database size and the KI thresholds of large databases have a higher probability of excluding true relatives than those of smaller databases. The minimum KI≥KI threshold strategy is a better option, as it provides the flexibility to adjust the KI threshold according to a pre-determined number of candidates or false positive/negative rates. Joint use of both IBS and KI does not significantly reduce the chance of including true relatives in a candidate list, but does provide a higher efficiency of familial searching.
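The difference between the minimum KI≥KI threshold and maximum KI≥KI threshold strategies can be sketched as follows, assuming each database profile already has one kinship index per candidate population. The function names, population labels and KI values are hypothetical and serve only to illustrate the decision rule; they are not taken from the study.

```python
from typing import Dict, List


def passes_threshold(
    ki_by_population: Dict[str, float], threshold: float, strategy: str = "min"
) -> bool:
    """Apply the 'min' or 'max' KI threshold rule to one database profile."""
    values = list(ki_by_population.values())
    if not values:
        raise ValueError("at least one population-specific KI is required")
    if strategy == "min":
        statistic = min(values)  # every population's KI must reach the threshold
    elif strategy == "max":
        statistic = max(values)  # one population's KI reaching it is enough
    else:
        raise ValueError("strategy must be 'min' or 'max'")
    return statistic >= threshold


def candidate_list(
    database_kis: Dict[str, Dict[str, float]], threshold: float, strategy: str = "min"
) -> List[str]:
    """Return the profile IDs that would be kept as familial-search candidates."""
    return [
        profile_id
        for profile_id, kis in database_kis.items()
        if passes_threshold(kis, threshold, strategy)
    ]


# Hypothetical KIs computed with allele frequencies from three populations.
example_kis = {
    "profile_1": {"pop_A": 1500.0, "pop_B": 900.0, "pop_C": 2200.0},
    "profile_2": {"pop_A": 40.0, "pop_B": 1200.0, "pop_C": 15.0},
    "profile_3": {"pop_A": 3.0, "pop_B": 8.0, "pop_C": 5.0},
}
min_candidates = candidate_list(example_kis, threshold=500.0, strategy="min")
max_candidates = candidate_list(example_kis, threshold=500.0, strategy="max")
```

With these made-up values, the minimum rule keeps only `profile_1`, while the maximum rule also keeps `profile_2`, whose KI is high under a single population's allele frequencies. In practice the threshold itself would be tuned to a target candidate-list size or error rate, as the abstract describes.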

    Genetic prediction of complex traits: integrating infinitesimal and marked genetic effects

    Genetic prediction for complex traits is usually based on models including individual (infinitesimal) or marker effects. Here, we concentrate on models including both the individual and the marker effects. In particular, we develop a "Mendelian segregation" model combining infinitesimal effects for base individuals and realized Mendelian sampling in descendants described by the available DNA data. The model is illustrated with an example and the analyses of a public simulated data file. Further, the potential contribution of such models is assessed by simulation. Accuracy, measured as the correlation between true (simulated) and predicted genetic values, was similar for all models compared under different genetic backgrounds. As expected, the segregation model is worthwhile when markers capture a low fraction of total genetic variance. (Author's abstract)

    Widespread adaptive evolution during repeated evolutionary radiations in New World lupins

    The evolutionary processes that drive rapid species diversification are poorly understood. In particular, it is unclear whether Darwinian adaptation or non-adaptive processes are the primary drivers of explosive species diversifications. Here we show that repeated rapid radiations within New World lupins (Lupinus, Leguminosae) were underpinned by a major increase in the frequency of adaptation acting on coding and regulatory changes genome-wide. This contrasts with far less frequent adaptation in genomes of slowly diversifying lupins and all other plant genera analysed. Furthermore, widespread shifts in optimal gene expression coincided with shifts to high rates of diversification and evolution of perenniality, a putative key adaptive trait thought to have triggered the evolutionary radiations in New World lupins. Our results reconcile a long-standing debate about the relative importance of protein-coding and regulatory evolution, and represent the first unambiguous evidence for the rapid onset of lineage- and genome-wide accelerated Darwinian evolution during rapid species diversification.