16 research outputs found

    A systematic search for SNPs/haplotypes associated with disease phenotypes using a haplotype-based stepwise procedure

    BACKGROUND: Genotyping technologies enable us to genotype multiple Single Nucleotide Polymorphisms (SNPs) within selected genes/regions, providing data for haplotype association analysis. While haplotype-based association analysis is powerful for detecting untyped causal alleles in linkage-disequilibrium (LD) with neighboring SNPs/haplotypes, the inclusion of extraneous SNPs could reduce its power by increasing the number of haplotypes with each additional SNP. METHODS: Here, we propose a haplotype-based stepwise procedure (HBSP) to eliminate extraneous SNPs. To evaluate its properties, we applied HBSP to both simulated and real data, generated from a study of genetic associations of the bactericidal/permeability-increasing (BPI) gene with pulmonary function in a cohort of patients following bone marrow transplantation. RESULTS: Under the null hypothesis, use of the HBSP gave results that retained the desired false positive error rates when multiple comparisons were considered. Under various alternative hypotheses, HBSP had adequate power to detect modest genetic associations in case-control studies with 500, 1,000 or 2,000 subjects. In the current application, HBSP led to the identification of two specific SNPs with a positive validation. CONCLUSION: These results demonstrate that HBSP retains the essence of haplotype-based association analysis while improving analytic power by excluding extraneous SNPs. Minimizing the number of SNPs also enables simpler interpretation and more cost-effective applications.

    Sequencing genes in silico using single nucleotide polymorphisms

    BACKGROUND: The advent of high throughput sequencing technology has enabled the 1000 Genomes Project Pilot 3 to generate complete sequence data for more than 906 genes and 8,140 exons representing 697 subjects. The 1000 Genomes database provides a critical opportunity for further interpreting disease associations with single nucleotide polymorphisms (SNPs) discovered from genetic association studies. Currently, direct sequencing of candidate genes or regions on a large number of subjects remains both cost- and time-prohibitive. RESULTS: To accelerate the translation from discovery to functional studies, we propose an in silico gene sequencing method (ISS), which predicts phased sequences of intragenic regions, using SNPs. The key underlying idea of our method is to infer diploid sequences (a pair of phased sequences/alleles) at every functional locus utilizing the deep sequencing data from the 1000 Genomes Project and SNP data from the HapMap Project, and to build prediction models using flanking SNPs. Using this method, we have developed a database of prediction models for 611 known genes. Sequence prediction accuracy for these genes is 96.26% on average (ranges 79%-100%). This database of prediction models can be enhanced and scaled up to include new genes as the 1000 Genomes Project sequences additional genes on additional individuals. Applying our predictive model for the KCNJ11 gene to the Wellcome Trust Case Control Consortium (WTCCC) Type 2 diabetes cohort, we demonstrate how the prediction of phased sequences inferred from GWAS SNP genotype data can be used to facilitate interpretation and identify a probable functional mechanism such as protein changes. CONCLUSIONS: Prior to the general availability of routine sequencing of all subjects, the ISS method proposed here provides a time- and cost-effective approach to broadening the characterization of disease associated SNPs and regions, and facilitating the prioritization of candidate genes for more detailed functional and mechanistic studies.

    A Common Dominant TLR5 Stop Codon Polymorphism Abolishes Flagellin Signaling and Is Associated with Susceptibility to Legionnaires' Disease

    Although Toll-like receptors (TLRs) are critical mediators of the immune response to pathogens, the influence of polymorphisms in this gene family on human susceptibility to infection is poorly understood. We demonstrated recently that TLR5 recognizes flagellin, a potent inflammatory stimulus present in the flagellar structure of many bacteria. Here, we show that a common stop codon polymorphism in the ligand-binding domain of TLR5 (TLR5^392STOP) is unable to mediate flagellin signaling, acts in a dominant fashion, and is associated with susceptibility to pneumonia caused by Legionella pneumophila, a flagellated bacterium. We also show that flagellin is a principal stimulant of proinflammatory cytokine production in lung epithelial cells. Together, these observations suggest that TLR5^392STOP increases human susceptibility to infection through an unusual dominant mechanism that compromises TLR5's essential role as a regulator of the lung epithelial innate immune response.

    Empirical vs Bayesian approach for estimating haplotypes from genotypes of unrelated individuals

    BACKGROUND: The completion of the HapMap project has stimulated further development of haplotype-based methodologies for disease associations. A key aspect of such development is the statistical inference of individual diplotypes from unphased genotypes. Several methodologies for inferring haplotypes have been developed, but they have not been evaluated extensively to determine which method not only performs well, but also can be easily incorporated in downstream haplotype-based association analyses. In this paper, we attempt to do so. Our evaluation was carried out by comparing the two leading Bayesian methods, implemented in PHASE and HAPLOTYPER, and the two leading empirical methods, implemented in PL-EM and HPlus. We used these methods to analyze real data, namely the dense genotypes on the X chromosome of 30 European and 30 African trios provided by the International HapMap Project, and simulated genotype data. Our conclusions are based on these analyses. RESULTS: All programs performed very well on X-chromosome data, with an average similarity index of 0.99 and an average prediction rate of 0.99 for both European and African trios. On simulated data with approximation of coalescence, PHASE, which implements the Bayesian method based on the coalescence approximation, outperformed other programs on small sample sizes. When the sample size increased, other programs performed as well as PHASE. PL-EM and HPlus, which implement empirical methods, required much less running time than the programs implementing the Bayesian methods. They required only one hundredth or one thousandth of the running time required by PHASE, particularly when analyzing large sample sizes and large numbers of SNPs.
    CONCLUSION: For large sample sizes (hundreds or more), which most association studies require, the two empirical methods might be used since they infer the haplotypes as accurately as any Bayesian methods and can be incorporated easily into downstream haplotype-based analyses such as haplotype-association analyses.
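
    To make the inference problem in the abstract above concrete, here is a minimal sketch of expectation-maximization (EM) haplotype frequency estimation for two biallelic SNPs in unrelated individuals. It is a textbook illustration only and is not the algorithm implemented in PHASE, HAPLOTYPER, PL-EM or HPlus; the function name `em_two_snp` and the 0/1/2 genotype coding are assumptions made for this example. Only double heterozygotes are phase-ambiguous, so the E-step splits them between the two possible haplotype pairs.

```python
from collections import Counter

def em_two_snp(genotypes, n_iter=100, tol=1e-10):
    """Estimate haplotype frequencies for two biallelic SNPs by EM.

    genotypes: list of (g1, g2) tuples, each value the count (0, 1, 2)
    of allele '1' at that SNP (hypothetical coding for this sketch).
    Returns a dict mapping '00', '01', '10', '11' to frequencies.
    """
    alleles = {0: (0, 0), 1: (0, 1), 2: (1, 1)}
    base = {"00": 0.0, "01": 0.0, "10": 0.0, "11": 0.0}
    counts = Counter(genotypes)
    n_double_het = counts.get((1, 1), 0)

    # Count haplotypes whose phase is unambiguous (at least one SNP homozygous).
    for (g1, g2), c in counts.items():
        if (g1, g2) == (1, 1):
            continue
        for a1, a2 in zip(alleles[g1], alleles[g2]):
            base[f"{a1}{a2}"] += c

    total = 2 * len(genotypes)
    freq = {h: 0.25 for h in base}  # uniform starting values
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote is 00/11 rather than 01/10.
        cis = freq["00"] * freq["11"]
        trans = freq["01"] * freq["10"]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies from expected haplotype counts.
        new = {
            "00": (base["00"] + n_double_het * w) / total,
            "11": (base["11"] + n_double_het * w) / total,
            "01": (base["01"] + n_double_het * (1 - w)) / total,
            "10": (base["10"] + n_double_het * (1 - w)) / total,
        }
        converged = max(abs(new[h] - freq[h]) for h in freq) < tol
        freq = new
        if converged:
            break
    return freq

# Toy data: 3 individuals 00/00, 3 individuals 11/11, 4 double heterozygotes.
toy = [(0, 0)] * 3 + [(2, 2)] * 3 + [(1, 1)] * 4
estimates = em_two_snp(toy)
```

    With more SNPs the number of compatible haplotype pairs per individual grows quickly, which is one reason dedicated programs are used in practice.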

    A Method for the Assessment of Disease Associations with Single-Nucleotide Polymorphism Haplotypes and Environmental Variables in Case-Control Studies

    The rough draft of the human genome map has been used to identify most of the functional genes in the human genome, as well as to identify nucleotide variations, known as “single-nucleotide polymorphisms” (SNPs), in these genes. By use of advanced biotechnologies, researchers are beginning to genotype thousands of SNPs from biological samples. One of the many possible applications is the study of SNP associations with complex human diseases, such as cancers or coronary heart diseases, by using a case-control study design. Through the gathering of environmental risk factors and other lifestyle factors, such a study can be effectively used to investigate interactions between genes and environmental factors in their associations with disease phenotype. Earlier, we developed a method to statistically construct individuals’ haplotypes and to estimate the distribution of haplotypes of multiple SNPs in a defined population, by use of estimating-equation techniques. Extending this idea, we describe here an analytic method for assessing the association between the constructed haplotypes along with environmental factors and the disease phenotype. This method is also robust to the model assumptions and is scalable to a large number of SNPs. Asymptotic properties of estimations in the method are proved theoretically and are tested for finite sample sizes by use of simulations. To demonstrate the use of the method, we applied it to assess the possible association between apolipoprotein CIII (six coding SNPs) and restenosis by using a case-control data set. Our analysis revealed two haplotypes that may reduce the risk of restenosis.

    Genetic Association of the Antiviral Restriction Factor TRIM5α with Human Immunodeficiency Virus Type 1 Infection

    The innate antiviral factor TRIM5α restricts the replication of some retroviruses through its interaction with the viral capsid protein, leading to abortive infection. While overexpression of human TRIM5α results in modest restriction of human immunodeficiency virus type 1 (HIV-1), this inhibition is insufficient to block productive infection of human cells. We hypothesized that polymorphisms within TRIM5 may result in increased restriction of HIV-1 infection. We sequenced the TRIM5 gene (excluding exon 5) and the 4.8-kb 5′ putative regulatory region in genomic DNA from 110 HIV-1-infected subjects and 96 exposed seronegative persons, along with targeted gene sequencing in a further 30 HIV-1-infected individuals. Forty-eight single nucleotide polymorphisms (SNPs), including 20 with allele frequencies of >1.0%, were identified. Among these were two synonymous and eight nonsynonymous coding polymorphisms. We observed no association between TRIM5 polymorphism in HIV-1-infected subjects and their set-point viral load after acute infection, although one TRIM5 haplotype was weakly associated with more rapid CD4(+) T-cell loss. Importantly, a TRIM5 haplotype containing the nonsynonymous SNP R136Q showed increased frequency among HIV-1-infected subjects relative to exposed seronegative persons, with an odds ratio of 5.49 (95% confidence interval = 1.83 to 16.45; P = 0.002). Nonetheless, we observed no effect of individual TRIM5α nonsynonymous mutations on the in vitro HIV-1 susceptibility of CD4(+) T cells. Therefore, any effect of TRIM5α polymorphism on HIV-1 infection in primary lymphocytes may depend on combinations of SNPs or on DNA sequences in linkage disequilibrium with the TRIM5α coding sequence.
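
    As an aid to reading the odds ratio and confidence interval reported in the abstract above, the following sketch computes an odds ratio with a Wald-type 95% interval from a 2x2 table. The counts used are hypothetical and are not the study's data, the abstract does not say which interval method the authors used, and the function name `odds_ratio_ci` is an assumption for this example.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a: cases carrying the haplotype      b: cases not carrying it
    c: controls carrying the haplotype   d: controls not carrying it
    z: normal quantile (1.96 gives an approximate 95% interval)
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts chosen only to illustrate the arithmetic.
or_value, ci_low, ci_high = odds_ratio_ci(20, 80, 10, 90)
```

    An interval that excludes 1, as the reported 1.83 to 16.45 does, indicates an association at the corresponding significance level; wide intervals reflect small carrier counts.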