23 research outputs found

    A genome-wide study of Hardy–Weinberg equilibrium with next generation sequence data

    Statistical tests for Hardy–Weinberg equilibrium have been an important tool for detecting genotyping errors in the past, and remain important in the quality control of next generation sequence data. In this paper, we analyze complete chromosomes of the 1000 genomes project by using exact test procedures for autosomal and X-chromosomal variants. We find that the rate of disequilibrium largely exceeds what might be expected by chance alone for all chromosomes. Observed disequilibrium is, in about 60% of the cases, due to heterozygote excess. We suggest that most excess disequilibrium can be explained by sequencing problems, and hypothesize mechanisms that can explain exceptional heterozygosities. We report higher rates of disequilibrium for the MHC region on chromosome 6, regions flanking centromeres and p-arms of acrocentric chromosomes. We also detected long-range haplotypes and areas with incidental high disequilibrium. We report disequilibrium to be related to read depth, with variants having extreme read depths being more likely to be out of equilibrium. Disequilibrium rates were found to be 11 times higher in segmental duplications and simple tandem repeat regions. The variants with significant disequilibrium are seen to be concentrated in these areas. For next generation sequence data, Hardy–Weinberg disequilibrium seems to be a major indicator for copy number variation.
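
The abstract above refers to exact test procedures for autosomal variants. As a rough illustration only, the following sketch implements the classical exact test for a biallelic autosomal marker, conditioning on the observed allele counts and summing the probabilities of all heterozygote counts that are no more likely than the observed one. The function name `hwe_exact_pvalue` and the genotype-count arguments are hypothetical, and this is not claimed to be the procedure or software the authors used.

```python
from math import lgamma, exp, log


def hwe_exact_pvalue(n_aa, n_ab, n_bb):
    """Two-sided exact HWE p-value for one biallelic autosomal marker.

    Arguments are the observed genotype counts (AA, AB, BB).
    Illustrative sketch; names are assumptions, not from the source.
    """
    n = n_aa + n_ab + n_bb          # number of individuals
    n_a = 2 * n_aa + n_ab           # count of A alleles
    n_b = 2 * n_bb + n_ab           # count of B alleles

    def log_prob(het):
        # log P(het heterozygotes | n, n_a) under HWE
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (lgamma(n + 1) - lgamma(hom_a + 1) - lgamma(het + 1)
                - lgamma(hom_b + 1) + het * log(2)
                + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1))

    # The heterozygote count must have the same parity as n_a.
    hets = range(n_a % 2, min(n_a, n_b) + 1, 2)
    probs = {h: exp(log_prob(h)) for h in hets}
    p_obs = probs[n_ab]
    # Sum outcomes at most as likely as the observed one
    # (small tolerance guards against floating-point ties).
    p_value = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))
    return min(1.0, p_value)


if __name__ == "__main__":
    print(hwe_exact_pvalue(25, 50, 25))   # counts close to HWE proportions
    print(hwe_exact_pvalue(0, 100, 0))    # extreme heterozygote excess
```

A sample consisting only of heterozygotes, the kind of excess the abstract attributes to sequencing problems, yields a very small p-value, while counts near the expected proportions do not.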

    Measuring Nepotism through Shared Last Names: The Case of Italian Academia

    Nepotistic practices are detrimental to academia. Here I show how disciplines with a high likelihood of nepotism can be detected using standard statistical techniques based on shared last names among professors. As an example, I analyze the set of all 61,340 Italian academics. I find that nepotism is prominent in Italy, with particular disciplinary sectors being detected as especially problematic. Out of 28 disciplines, 9 (accounting for more than half of Italian professors) display a significant paucity of last names. Moreover, in most disciplines a clear north-south trend emerges, with likelihood of nepotism increasing with latitude. Even accounting for the geographic clustering of last names, I find that for many disciplines the probability of name-sharing is boosted when professors work in the same institution or sub-discipline. Using these techniques, policy makers can target cuts and funding in order to promote fair practices.
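
The abstract does not spell out which "standard statistical techniques" are used. One plausible reading of "a significant paucity of last names" is a Monte Carlo comparison: count the distinct last names in a discipline and compare that count with random groups of the same size drawn from the whole population. The sketch below follows that assumption; the function name, arguments, number of trials and seed are all hypothetical and are not taken from the paper.

```python
import random


def distinct_name_pvalue(discipline_names, population_names,
                         trials=999, seed=0):
    """Monte Carlo p-value for a paucity of distinct last names.

    discipline_names: last names of the professors in one discipline.
    population_names: last names of all professors (a list).
    Illustrative sketch under stated assumptions, not the paper's code.
    """
    rng = random.Random(seed)            # fixed seed for reproducibility
    k = len(discipline_names)
    observed = len(set(discipline_names))
    as_extreme = 0
    for _ in range(trials):
        # Random group of the same size, drawn without replacement.
        sample = rng.sample(population_names, k)
        if len(set(sample)) <= observed:
            as_extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (as_extreme + 1) / (trials + 1)


if __name__ == "__main__":
    # Toy data: one last name is heavily over-represented in the discipline.
    population = ["rossi"] * 10 + [f"name{i}" for i in range(990)]
    discipline = ["rossi"] * 10 + [f"name{i}" for i in range(10)]
    print(distinct_name_pvalue(discipline, population))
```

This simple version ignores the geographic clustering of last names that the abstract says is accounted for; handling it would require sampling within regions or institutions.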

    A straightforward multiallelic significance test for the Hardy-Weinberg equilibrium law

    Much forensic inference based upon DNA evidence is made assuming Hardy-Weinberg Equilibrium (HWE) for the genetic loci being used. Several statistical tests to detect and measure deviation from HWE have been devised, and their limitations become more obvious when testing for deviation within multiallelic DNA loci. The most popular methods, the Chi-square and Likelihood-ratio tests, are based on asymptotic results and cannot guarantee good performance in the presence of low-frequency genotypes. Since the dimension of the parameter space grows quadratically with the number of alleles, some authors suggest applying sequential methods, where the multiallelic case is reformulated as a sequence of “biallelic” tests. However, in this approach it is not obvious how to assess the overall evidence for the original hypothesis, nor is it clear how to establish the significance level for its acceptance or rejection. In this work, we introduce a straightforward method for the multiallelic HWE test, which overcomes the aforementioned issues of sequential methods. The core theory for the proposed method is given by the Full Bayesian Significance Test (FBST), an intuitive Bayesian approach which does not assign positive probabilities to zero measure sets when testing sharp hypotheses. We compare FBST performance to Chi-square, Likelihood-ratio and Markov chain tests in three numerical experiments. The results suggest that FBST is a robust and high-performance method for the HWE test, even in the presence of several alleles and small sample sizes.

    Quantitative Analysis of Single Nucleotide Polymorphisms within Copy Number Variation

    BACKGROUND: Single nucleotide polymorphisms (SNPs) have been used extensively in genetics and epidemiology studies. Traditionally, SNPs that did not pass the Hardy-Weinberg equilibrium (HWE) test were excluded from these analyses. Many investigators have addressed possible causes for departure from HWE, including genotyping errors, population admixture and segmental duplication. Recent large-scale surveys have revealed abundant structural variations in the human genome, including copy number variations (CNVs). This suggests that a significant number of SNPs must be within these regions, which may cause deviation from HWE. RESULTS: We performed a Bayesian analysis on the potential effect of copy number variation, segmental duplication and genotyping errors on the behavior of SNPs. Our results suggest that copy number variation is a major factor of HWE violation for SNPs with a small minor allele frequency, when the sample size is large and the genotyping error rate is 0–1%. CONCLUSIONS: Our study provides the posterior probability that a SNP falls in a CNV or a segmental duplication, given the observed allele frequency of the SNP, sample size and the significance level of HWE testing.

    Association between DNA Damage Response and Repair Genes and Risk of Invasive Serous Ovarian Cancer

    BACKGROUND: We analyzed the association between 53 genes related to DNA repair and p53-mediated damage response and serous ovarian cancer risk using data from the North Carolina Ovarian Cancer Study (NCOCS), a population-based case-control study. METHODS/PRINCIPAL FINDINGS: The analysis was restricted to 364 invasive serous ovarian cancer cases and 761 controls of white, non-Hispanic race. Statistical analysis was two-staged: a screen using marginal Bayes factors (BFs) for 484 SNPs and a modeling stage in which we calculated multivariate adjusted posterior probabilities of association for 77 SNPs that passed the screen. These probabilities were conditional on subject age at diagnosis/interview, batch, a DNA quality metric and genotypes of other SNPs, and allowed for uncertainty in the genetic parameterizations of the SNPs and number of associated SNPs. Six SNPs had Bayes factors greater than 10 in favor of an association with invasive serous ovarian cancer. These included rs5762746 (median per-allele odds ratio (OR) = 0.66; 95% credible interval (CI) = 0.44-1.00) and rs6005835 (median per-allele OR = 0.69; 95% CI = 0.53-0.91) in CHEK2, rs2078486 (median per-allele OR = 1.65; 95% CI = 1.21-2.25) and rs12951053 (median per-allele OR = 1.65; 95% CI = 1.20-2.26) in TP53, rs411697 (median rare-homozygote OR = 0.53; 95% CI = 0.35-0.79) in BACH1 and rs10131 (median rare-homozygote OR not estimable) in LIG4. The six most highly associated SNPs are either predicted to be functionally significant or are in LD with such a variant. The variants in TP53 were confirmed to be associated in a large follow-up study. CONCLUSIONS/SIGNIFICANCE: Based on our findings, further follow-up of the DNA repair and response pathways in a larger dataset is warranted to confirm these results.

    Testing for Hardy–Weinberg equilibrium at biallelic genetic markers on the X chromosome

    Testing genetic markers for Hardy–Weinberg equilibrium (HWE) is an important tool for detecting genotyping errors in large-scale genotyping studies. For markers on the X chromosome, typically the χ² or exact test is applied to the females only, and the hemizygous males are considered to be uninformative. In this paper we show that the males are relevant, because a difference in allele frequency between males and females may indicate HWE not to hold. The testing of markers on the X chromosome has received little attention, and in this paper we lay down the foundation for testing biallelic X-chromosomal markers for HWE. We develop four frequentist statistical test procedures for X-linked markers that take both males and females into account: the χ² test, likelihood ratio test, exact test and permutation test. Exact tests that include males are shown to have a better Type I error rate. Empirical data from the GENEVA project on venous thromboembolism is used to illustrate the proposed tests. Results obtained with the new tests differ substantially from tests that are based on female genotype counts only. The new tests detect differences in allele frequencies and seem able to uncover additional genotyping error that would have gone unnoticed in HWE tests based on females only.
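
To make concrete how males can contribute to an X-chromosomal test, here is a minimal sketch of a χ² statistic over five categories (male A, male B, female AA, AB, BB), with expected counts computed from a single allele frequency pooled over both sexes. It assumes the statistic is referred to a χ² distribution with 2 degrees of freedom (five categories, two fixed sample sizes, one estimated frequency), whose upper tail is exactly exp(-x/2). The function name and arguments are hypothetical, and the paper's own formulation may differ in detail.

```python
from math import exp


def x_chrom_hwe_chisq(m_a, m_b, f_aa, f_ab, f_bb):
    """Chi-square HWE test for a biallelic X-linked marker, both sexes.

    m_a, m_b: hemizygous male counts; f_aa, f_ab, f_bb: female counts.
    Assumes both sexes are present in the sample.
    Returns (statistic, p_value). Illustrative sketch only.
    """
    n_m = m_a + m_b
    n_f = f_aa + f_ab + f_bb
    # Allele frequency of A pooled over males (1 copy) and females (2 copies).
    p = (m_a + 2 * f_aa + f_ab) / (n_m + 2 * n_f)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 0.0, 1.0              # monomorphic marker: nothing to test
    observed = [m_a, m_b, f_aa, f_ab, f_bb]
    expected = [n_m * p, n_m * q,
                n_f * p * p, 2 * n_f * p * q, n_f * q * q]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square with 2 degrees of freedom.
    return stat, exp(-stat / 2.0)


if __name__ == "__main__":
    # Females look perfectly in equilibrium, but male and female
    # allele frequencies differ markedly.
    print(x_chrom_hwe_chisq(90, 10, 25, 50, 25))
```

In the toy call, a females-only test would see genotype counts in exact Hardy–Weinberg proportions, whereas the pooled statistic is large because the male allele frequency differs from the female one.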

    Compatible priors for Bayesian model comparison with an application to the Hardy-Weinberg equilibrium model

    Suppose we entertain Bayesian inference under a collection of models. This requires assigning a corresponding collection of prior distributions, one for each model’s parameter space. In this paper we address the issue of relating priors across models, and provide both a conceptual and a pragmatic justification for this task. Specifically, we consider the notion of “compatible” priors across models, and discuss and compare several strategies to construct such distributions. To explicate the issues involved, we refer to a specific problem, namely, testing the Hardy–Weinberg Equilibrium model, for which we provide a detailed analysis using Bayes factors.