99 research outputs found

    A genome-wide study of Hardy–Weinberg equilibrium with next generation sequence data

    Statistical tests for Hardy–Weinberg equilibrium have been an important tool for detecting genotyping errors in the past, and remain important in the quality control of next generation sequence data. In this paper, we analyze complete chromosomes of the 1000 genomes project by using exact test procedures for autosomal and X-chromosomal variants. We find that the rate of disequilibrium largely exceeds what might be expected by chance alone for all chromosomes. Observed disequilibrium is, in about 60% of the cases, due to heterozygote excess. We suggest that most excess disequilibrium can be explained by sequencing problems, and hypothesize mechanisms that can explain exceptional heterozygosities. We report higher rates of disequilibrium for the MHC region on chromosome 6, regions flanking centromeres and p-arms of acrocentric chromosomes. We also detected long-range haplotypes and areas with incidental high disequilibrium. We report disequilibrium to be related to read depth, with variants having extreme read depths being more likely to be out of equilibrium. Disequilibrium rates were found to be 11 times higher in segmental duplications and simple tandem repeat regions. The variants with significant disequilibrium are seen to be concentrated in these areas. For next generation sequence data, Hardy–Weinberg disequilibrium seems to be a major indicator for copy number variation. Peer Reviewed. Postprint (published version).
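
The abstract above relies on exact tests for Hardy–Weinberg equilibrium. As a rough illustration only, the following Python sketch implements one common formulation of the exact test for a biallelic autosomal variant: it conditions on the observed allele counts and sums the probabilities of all heterozygote counts that are no more likely than the observed one. It is not claimed to be the authors' procedure, it does not cover the X-chromosomal case, and the function name `hwe_exact_pvalue` is hypothetical.

```python
from math import exp, lgamma, log


def hwe_exact_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Two-sided exact HWE p-value for genotype counts AA, AB, BB (autosomal, biallelic)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotyped individual is required")
    n_a = 2 * n_aa + n_ab  # copies of allele A
    n_b = 2 * n_bb + n_ab  # copies of allele B

    def log_prob(het: int) -> float:
        # Log-probability of `het` heterozygotes given n individuals and n_a copies of A.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (
            het * log(2)
            + lgamma(n + 1)
            - lgamma(hom_a + 1) - lgamma(het + 1) - lgamma(hom_b + 1)
            + lgamma(n_a + 1) + lgamma(n_b + 1)
            - lgamma(2 * n + 1)
        )

    # The heterozygote count must have the same parity as the allele count.
    het_counts = range(n_a % 2, min(n_a, n_b) + 1, 2)
    probs = {het: exp(log_prob(het)) for het in het_counts}
    observed = probs[n_ab]
    # Sum every configuration that is at most as probable as the observed one.
    p_value = sum(p for p in probs.values() if p <= observed * (1 + 1e-7))
    return min(1.0, p_value)


if __name__ == "__main__":
    print(hwe_exact_pvalue(25, 50, 25))   # counts close to HWE proportions
    print(hwe_exact_pvalue(0, 100, 0))    # extreme heterozygote excess
```

A very small p-value combined with an excess of heterozygotes is the pattern the abstract associates with sequencing problems and copy number variation.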

    Measuring Nepotism through Shared Last Names: The Case of Italian Academia

    Nepotistic practices are detrimental to academia. Here I show how disciplines with a high likelihood of nepotism can be detected using standard statistical techniques based on shared last names among professors. As an example, I analyze the set of all 61,340 Italian academics. I find that nepotism is prominent in Italy, with particular disciplinary sectors being detected as especially problematic. Out of 28 disciplines, 9 – accounting for more than half of Italian professors – display a significant paucity of last names. Moreover, in most disciplines a clear north-south trend emerges, with likelihood of nepotism increasing with latitude. Even accounting for the geographic clustering of last names, I find that for many disciplines the probability of name-sharing is boosted when professors work in the same institution or sub-discipline. Using these techniques, policy makers can target cuts and funding in order to promote fair practices.

    The combined effect of SNP-marker and phenotype attributes in genome-wide association studies

    The last decade has seen rapid improvements in high-throughput single nucleotide polymorphism (SNP) genotyping technologies that have consequently made genome-wide association studies (GWAS) possible. With tens to hundreds of thousands of SNP markers being tested simultaneously in GWAS, it is imperative to appropriately pre-process, or filter out, those SNPs that may lead to false associations. This paper explores the relationships between various SNP genotype and phenotype attributes and their effects on false associations. We show that (i) uniformly distributed ordinal data as well as binary data are more easily influenced, though not necessarily negatively, by differences in various SNP attributes compared with normally distributed data; (ii) filtering SNPs on minor allele frequency (MAF) and extent of Hardy–Weinberg equilibrium (HWE) deviation has little effect on the overall false positive rate; (iii) in some cases, filtering on MAF only serves to exclude SNPs from the analysis without reduction of the overall proportion of false associations; and (iv) HWE, MAF and heterozygosity are all dependent on minor genotype frequency, a newly proposed measure for genotype integrity.

    Quantitative Analysis of Single Nucleotide Polymorphisms within Copy Number Variation

    BACKGROUND: Single nucleotide polymorphisms (SNPs) have been used extensively in genetics and epidemiology studies. Traditionally, SNPs that did not pass the Hardy-Weinberg equilibrium (HWE) test were excluded from these analyses. Many investigators have addressed possible causes for departure from HWE, including genotyping errors, population admixture and segmental duplication. Recent large-scale surveys have revealed abundant structural variations in the human genome, including copy number variations (CNVs). This suggests that a significant number of SNPs must be within these regions, which may cause deviation from HWE. RESULTS: We performed a Bayesian analysis on the potential effect of copy number variation, segmental duplication and genotyping errors on the behavior of SNPs. Our results suggest that copy number variation is a major cause of HWE violation for SNPs with a small minor allele frequency, when the sample size is large and the genotyping error rate is 0–1%. CONCLUSIONS: Our study provides the posterior probability that a SNP falls in a CNV or a segmental duplication, given the observed allele frequency of the SNP, sample size and the significance level of HWE testing.

    A straightforward multiallelic significance test for the Hardy-Weinberg equilibrium law

    Much forensic inference based upon DNA evidence is made assuming Hardy-Weinberg Equilibrium (HWE) for the genetic loci being used. Several statistical tests to detect and measure deviation from HWE have been devised, and their limitations become more obvious when testing for deviation within multiallelic DNA loci. The most popular methods, the Chi-square and Likelihood-ratio tests, are based on asymptotic results and cannot guarantee good performance in the presence of low frequency genotypes. Since the dimension of the parameter space increases quadratically with the number of alleles, some authors suggest applying sequential methods, where the multiallelic case is reformulated as a sequence of “biallelic” tests. However, in this approach it is not obvious how to assess the general evidence of the original hypothesis; nor is it clear how to establish the significance level for its acceptance/rejection. In this work, we introduce a straightforward method for the multiallelic HWE test, which overcomes the aforementioned issues of sequential methods. The core theory for the proposed method is given by the Full Bayesian Significance Test (FBST), an intuitive Bayesian approach which does not assign positive probabilities to zero measure sets when testing sharp hypotheses. We compare FBST performance to Chi-square, Likelihood-ratio and Markov chain tests, in three numerical experiments. The results suggest that FBST is a robust and high performance method for the HWE test, even in the presence of several alleles and small sample sizes.

    Association between DNA Damage Response and Repair Genes and Risk of Invasive Serous Ovarian Cancer

    BACKGROUND: We analyzed the association between 53 genes related to DNA repair and p53-mediated damage response and serous ovarian cancer risk using case-control data from the North Carolina Ovarian Cancer Study (NCOCS), a population-based, case-control study. METHODS/PRINCIPAL FINDINGS: The analysis was restricted to 364 invasive serous ovarian cancer cases and 761 controls of white, non-Hispanic race. Statistical analysis was two staged: a screen using marginal Bayes factors (BFs) for 484 SNPs and a modeling stage in which we calculated multivariate adjusted posterior probabilities of association for 77 SNPs that passed the screen. These probabilities were conditional on subject age at diagnosis/interview, batch, a DNA quality metric and genotypes of other SNPs and allowed for uncertainty in the genetic parameterizations of the SNPs and number of associated SNPs. Six SNPs had Bayes factors greater than 10 in favor of an association with invasive serous ovarian cancer. These included rs5762746 (median per-allele odds ratio (OR) = 0.66; 95% credible interval (CI) = 0.44-1.00) and rs6005835 (median per-allele OR = 0.69; 95% CI = 0.53-0.91) in CHEK2, rs2078486 (median per-allele OR = 1.65; 95% CI = 1.21-2.25) and rs12951053 (median per-allele OR = 1.65; 95% CI = 1.20-2.26) in TP53, rs411697 (median rare-homozygote OR = 0.53; 95% CI = 0.35-0.79) in BACH1 and rs10131 (median rare-homozygote OR not estimable) in LIG4. The six most highly associated SNPs are either predicted to be functionally significant or are in LD with such a variant. The variants in TP53 were confirmed to be associated in a large follow-up study. CONCLUSIONS/SIGNIFICANCE: Based on our findings, further follow-up of the DNA repair and response pathways in a larger dataset is warranted to confirm these results.

    Association between TCF7L2 gene polymorphisms and susceptibility to Type 2 Diabetes Mellitus: a large Human Genome Epidemiology (HuGE) review and meta-analysis

    Background: Transcription factor 7-like 2 (TCF7L2) has been shown to be associated with type 2 diabetes mellitus (T2DM) in multiple ethnic groups in the past two years, but contradictory results were reported for Chinese and Pima Indian populations. The authors therefore performed a large meta-analysis of 36 studies examining the association of T2DM with polymorphisms in the TCF7L2 gene in various ethnicities, covering the rs7903146 C-to-T (IVS3C>T), rs7901695 T-to-C (IVS3T>C), rs12255372 G-to-T (IVS4G>T), and rs11196205 G-to-C (IVS4G>C) polymorphisms, to evaluate the size of the gene effect and the possible genetic mode of action. Methods: A literature search was conducted to collect data, and three methods (fixed-effects, random-effects and Bayesian multivariate meta-analysis) were used to pool the odds ratio (OR). Publication bias and between-study heterogeneity were also examined. Results: The studies included 35,843 cases of T2DM and 39,123 controls, using mainly primary data. For T2DM and the IVS3C>T polymorphism, the Bayesian ORs for TT homozygotes and TC heterozygotes versus CC homozygotes were 1.968 (95% credible interval (CrI): 1.790, 2.157) and 1.406 (95% CrI: 1.341, 1.476), respectively, and the population attributable risk (PAR) for the TT/TC genotypes of this variant is 16.9% overall. For T2DM and the IVS4G>T polymorphism, the ORs for TT homozygotes and TG heterozygotes versus GG homozygotes were 1.885 (95% CrI: 1.698, 2.088) and 1.360 (95% CrI: 1.291, 1.433), respectively. All four ORs for these two polymorphisms showed significant between-study heterogeneity (P < 0.05), and the main source of heterogeneity was ethnic differences. Data also showed significant associations between T2DM and the other two polymorphisms, but with low heterogeneity (P > 0.10). Pooled ORs fit a codominant, multiplicative genetic model for all four polymorphisms of the TCF7L2 gene, and this model was also confirmed in different ethnic populations when the IVS3C>T and IVS4G>T polymorphisms were stratified, except for Africans, where a dominant, additive genetic mode is suggested for the IVS3C>T polymorphism. Conclusion: This meta-analysis demonstrates that all four variants of the TCF7L2 gene are associated with T2DM, indicates a multiplicative genetic model for all four polymorphisms, and suggests that the TCF7L2 gene is involved in nearly 1/5 of all T2DM. Potential gene-gene and gene-environment interactions by which common variants in the TCF7L2 gene influence the risk of T2DM need further exploration.
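
Of the three pooling methods named in the abstract above, the fixed-effects one is the simplest to illustrate. The following Python sketch shows generic inverse-variance fixed-effect pooling of log odds ratios. It is an assumption about the general technique, not a reconstruction of the authors' analysis; the function name `pool_fixed_effect` and the example inputs are hypothetical, and the standard errors are assumed to be on the log-OR scale.

```python
from math import exp, log, sqrt


def pool_fixed_effect(odds_ratios, log_or_std_errors, z=1.96):
    """Return (pooled OR, lower bound, upper bound) using inverse-variance weights."""
    if len(odds_ratios) != len(log_or_std_errors) or not odds_ratios:
        raise ValueError("need one standard error per odds ratio")
    # Each study is weighted by the inverse of the variance of its log odds ratio.
    weights = [1.0 / se ** 2 for se in log_or_std_errors]
    log_ors = [log(odds_ratio) for odds_ratio in odds_ratios]
    pooled_log = sum(w * x for w, x in zip(weights, log_ors)) / sum(weights)
    pooled_se = sqrt(1.0 / sum(weights))
    return (
        exp(pooled_log),
        exp(pooled_log - z * pooled_se),
        exp(pooled_log + z * pooled_se),
    )


if __name__ == "__main__":
    # Hypothetical per-study estimates, for illustration only.
    print(pool_fixed_effect([1.35, 1.48, 1.41], [0.08, 0.12, 0.10]))
```

A fixed-effect model assumes one common true effect across studies. When between-study heterogeneity is significant, as reported for two of the polymorphisms, a random-effects model is the usual alternative.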

    The Framingham Heart Study 100K SNP genome-wide association study resource: overview of 17 phenotype working group reports

    Background: The Framingham Heart Study (FHS), founded in 1948 to examine the epidemiology of cardiovascular disease, is among the most comprehensively characterized multi-generational studies in the world. Many collected phenotypes have substantial genetic contributors; yet most genetic determinants remain to be identified. Using single nucleotide polymorphisms (SNPs) from a 100K genome-wide scan, we examine the associations of common polymorphisms with phenotypic variation in this community-based cohort and provide a full-disclosure, web-based resource of results for future replication studies. Methods: Adult participants (n = 1345) of the largest 310 pedigrees in the FHS, many biologically related, were genotyped with the 100K Affymetrix GeneChip. These genotypes were used to assess their contribution to 987 phenotypes collected in FHS over 56 years of follow up, including: cardiovascular risk factors and biomarkers; subclinical and clinical cardiovascular disease; cancer and longevity traits; and traits in pulmonary, sleep, neurology, renal, and bone domains. We conducted genome-wide variance components linkage and population-based and family-based association tests. Results: The participants were white of European descent and from the FHS Original and Offspring Cohorts (examination 1 Offspring mean age 32 ± 9 years, 54% women). This overview summarizes the methods, selected findings and limitations of the results presented in the accompanying series of 17 manuscripts. The presented association results are based on 70,897 autosomal SNPs meeting the following criteria: minor allele frequency ≥ 10%, genotype call rate ≥ 80%, Hardy-Weinberg equilibrium p-value ≥ 0.001, and satisfying Mendelian consistency. Linkage analyses are based on 11,200 SNPs and short-tandem repeats. 
Results of phenotype-genotype linkages and associations for all autosomal SNPs are posted on the NCBI dbGaP website at http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?id=phs000007. Conclusion: We have created a full-disclosure resource of results, posted on the dbGaP website, from a genome-wide association study in the FHS. Because we used three analytical approaches to examine the association and linkage of 987 phenotypes with thousands of SNPs, our results must be considered hypothesis-generating and need to be replicated. Results from the FHS 100K project with NCBI web posting provide a resource for investigators to identify high priority findings for replication.
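
The SNP inclusion criteria listed in the abstract above (minor allele frequency ≥ 10%, genotype call rate ≥ 80%, Hardy-Weinberg equilibrium p-value ≥ 0.001, Mendelian consistency) can be expressed as a simple filter. The Python sketch below is illustrative only: the `SnpStats` record, its field names and the example SNP identifiers are hypothetical, and it assumes the per-SNP statistics have already been computed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    snp_id: str                # hypothetical identifier
    maf: float                 # minor allele frequency, 0..0.5
    call_rate: float           # fraction of samples with a genotype call
    hwe_p: float               # Hardy-Weinberg equilibrium test p-value
    mendel_consistent: bool    # True if no Mendelian inconsistency was flagged


def passes_qc(snp, min_maf=0.10, min_call_rate=0.80, min_hwe_p=0.001):
    """Apply the four thresholds quoted in the abstract to one SNP."""
    return (
        snp.maf >= min_maf
        and snp.call_rate >= min_call_rate
        and snp.hwe_p >= min_hwe_p
        and snp.mendel_consistent
    )


if __name__ == "__main__":
    snps = [
        SnpStats("snp_example_1", maf=0.25, call_rate=0.97, hwe_p=0.40, mendel_consistent=True),
        SnpStats("snp_example_2", maf=0.04, call_rate=0.99, hwe_p=0.60, mendel_consistent=True),
        SnpStats("snp_example_3", maf=0.30, call_rate=0.95, hwe_p=0.0002, mendel_consistent=True),
    ]
    kept = [s.snp_id for s in snps if passes_qc(s)]
    print(kept)
```

Keeping the thresholds as keyword arguments makes it easy to see how the retained SNP set changes under stricter or looser criteria.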