
    Characterisation of the genomic architecture of human chromosome 17q and evaluation of different methods for haplotype block definition

    BACKGROUND: The selection of markers in association studies can be informed through the use of haplotype blocks. Recent reports have determined the genomic architecture of chromosomal segments through different haplotype block definitions based on linkage disequilibrium (LD) measures or haplotype diversity criteria. The relative applicability of distinct block definitions to association studies, however, remains unclear. We compared different block definitions in 6.1 Mb of chromosome 17q in 189 unrelated healthy individuals. Using 137 single nucleotide polymorphisms (SNPs), at a median spacing of 15.5 kb, we constructed haplotype block maps using published methods and additional methods we have developed. Haplotype tagging SNPs (htSNPs) were identified for each map. RESULTS: Blocks were found to be shorter and coverage of the region limited with methods based on LD measures, compared to the method based on haplotype diversity. Although the distribution of blocks was highly variable, the number of SNPs that needed to be typed in order to capture the maximum number of haplotypes was consistent. CONCLUSION: For the marker spacing used in this study, choice of block definition is not important when used as an initial screen of the region to identify htSNPs. However, choice of block definition has consequences for the downstream interpretation of association study results

    Haplotype reconstruction error as a classical misclassification problem

    Statistically reconstructing haplotypes from single nucleotide polymorphism (SNP) genotypes can lead to falsely classified haplotypes. This can be an issue when interpreting haplotype association results or when selecting subjects with certain haplotypes for subsequent functional studies. Our aim was to quantify haplotype reconstruction error and to provide tools for doing so. Across numerous simulation scenarios, we systematically investigated several error measures, including discrepancy, error rate, and R², and introduced sensitivity and specificity to this context. We exemplified several measures in the KORA study, a large population-based study from Southern Germany. We found that the specificity was slightly reduced only for common haplotypes, while the sensitivity was decreased for some, but not all, rare haplotypes. The overall error rate generally increased with increasing number of loci, increasing minor allele frequency of SNPs, decreasing correlation between the alleles, and increasing ambiguity. We conclude that, with the analytical approach presented here, haplotype-specific error measures can be computed to gain insight into haplotype uncertainty. This method indicates whether a specific risk haplotype can be expected to be reconstructed with virtually no or with high misclassification, and thus the magnitude of expected bias in association estimates. We also illustrate that sensitivity and specificity separate two dimensions of the haplotype reconstruction error, which completely describe the misclassification matrix and thus provide the prerequisite for methods accounting for misclassification.
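
The sensitivity and specificity described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' tool: the function name `haplotype_sens_spec` and the input layout (two lists aligned chromosome by chromosome, one holding the true haplotype and one the reconstructed haplotype) are assumptions made for illustration. Real data would first require matching each individual's unordered haplotype pair.

```python
import math


def haplotype_sens_spec(true_haps, inferred_haps, target):
    """Sensitivity and specificity of reconstruction for one haplotype.

    Assumption (not from the source): both lists are aligned so that
    position i refers to the same chromosome in each list.
    """
    if len(true_haps) != len(inferred_haps):
        raise ValueError("haplotype lists must have the same length")

    tp = fn = fp = tn = 0
    for truth, inferred in zip(true_haps, inferred_haps):
        if truth == target and inferred == target:
            tp += 1  # target haplotype correctly reconstructed
        elif truth == target:
            fn += 1  # target haplotype missed
        elif inferred == target:
            fp += 1  # another haplotype misclassified as the target
        else:
            tn += 1  # non-target haplotype not called as the target

    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return sensitivity, specificity
```

Calling the function once per observed haplotype gives one (sensitivity, specificity) pair per haplotype, which is the haplotype-specific view of the misclassification matrix that the abstract refers to.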

    Association of FCGR3A and FCGR3B haplotypes with rheumatoid arthritis and primary Sjögren's syndrome [POSTER PRESENTATION]

    Background: Rheumatoid arthritis (RA) is an autoimmune disease that is thought to arise from a complex interaction between multiple genetic factors and environmental triggers. We have previously demonstrated an association between an Fc gamma receptor (FcγR) haplotype and RA in a cross-sectional cohort of RA patients. We have sought to confirm this association in an inception cohort of RA patients and matched controls. We also extended our study to investigate a second autoantibody-associated rheumatic disease, primary Sjögren's syndrome (PSS). Methods: The FCGR3A-158F/V and FCGR3B-NA1/NA2 functional polymorphisms were examined for association in an inception cohort of RA patients (n = 448), and a well-characterised PSS cohort (n = 83) from the United Kingdom. Pairwise disequilibrium coefficients (D') were calculated in 267 Blood Service healthy controls. The EHPlus program was used to estimate haplotype frequencies for patients and controls and to determine whether significant linkage disequilibrium was present. A likelihood ratio test was performed to test for differences between the haplotype frequencies in cases and controls. A permutation procedure implemented in this program enabled 1000 permutations to be performed on all haplotype associations to assess significance. Results: There was significant linkage disequilibrium between FCGR3A and FCGR3B (D' = -0.445, P = 0.001). There was no significant difference in the FCGR3A or FCGR3B allele or genotype frequencies in the RA or PSS patients compared with controls. However, there was a significant difference in the FCGR3A-FCGR3B haplotype distributions with increased homozygosity for the FCGR3A-FCGR3B 158V-NA2 haplotype in both our inception RA cohort (odds ratio = 2.15, 95% confidence interval = 1.1–4.2, P = 0.027) and PSS (odds ratio = 2.83, 95% confidence interval = 1.0–8.2, P = 0.047) compared with controls. The reference group for these analyses comprised individuals who did not possess a copy of the FCGR3A-FCGR3B 158V-NA2 haplotype. Conclusions: We have confirmed our original findings of association between the FCGR3A-FCGR3B 158V-NA2 haplotype and RA in a new inception cohort of RA patients. This suggests that there may be an RA-susceptibility gene at this locus. The significantly increased frequency of an identical haplotype in PSS suggests the FcγR genetic locus may contribute to the pathogenesis of diverse autoantibody-mediated rheumatic diseases.
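
For readers unfamiliar with the pairwise disequilibrium coefficient D' reported in this abstract, the following Python sketch shows the standard Lewontin normalisation for two biallelic loci. It is an illustration only and does not reproduce the EHPlus program: the function name `lewontin_d_prime` and its inputs (one haplotype frequency and two allele frequencies, assumed already estimated) are hypothetical.

```python
import math


def lewontin_d_prime(p_ab, p_a, p_b):
    """Signed Lewontin's D' for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    # Raw disequilibrium: observed haplotype frequency minus the
    # frequency expected if the two loci were independent.
    d = p_ab - p_a * p_b

    # Largest magnitude that D can reach given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))

    if d_max == 0:
        return math.nan  # undefined when a locus is monomorphic
    return d / d_max
```

A negative value, such as the D' of -0.445 quoted above, means the two labelled alleles occur together on the same haplotype less often than expected under independence. The sign depends on which allele at each locus is chosen as the reference.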

    Joint Haplotype Assembly and Genotype Calling via Sequential Monte Carlo Algorithm

    Genetic variations predispose individuals to hereditary diseases, play an important role in the development of complex diseases, and impact drug metabolism. The full information about the DNA variations in the genome of an individual is given by haplotypes, the ordered lists of single nucleotide polymorphisms (SNPs) located on chromosomes. Affordable high-throughput DNA sequencing technologies enable routine acquisition of data needed for the assembly of single individual haplotypes. However, state-of-the-art high-throughput sequencing platforms generate erroneous data, which induces uncertainty in the SNP and genotype calling procedures and, ultimately, adversely affects the accuracy of haplotyping. When inferring haplotype phase information, the vast majority of the existing techniques for haplotype assembly assume that the genotype information is correct. This motivates the development of methods capable of joint genotype calling and haplotype assembly. Results: We present a haplotype assembly algorithm, ParticleHap, that relies on a probabilistic description of the sequencing data to jointly infer genotypes and assemble the most likely haplotypes. Our method employs a deterministic sequential Monte Carlo algorithm that associates single nucleotide polymorphisms with haplotypes by exhaustively exploring all possible extensions of the partial haplotypes. The algorithm relies on genotype likelihoods rather than on often erroneously called genotypes, thus ensuring a more accurate assembly of the haplotypes. Results on both the 1000 Genomes Project experimental data and simulation studies demonstrate that the proposed approach enables highly accurate solutions to the haplotype assembly problem while being computationally efficient and scalable, generally outperforming existing methods in terms of both accuracy and speed. Conclusions: The developed probabilistic framework and sequential Monte Carlo algorithm enable joint haplotype assembly and genotyping in a computationally efficient manner. Our results demonstrate fast and highly accurate haplotype assembly aided by the re-examination of erroneously called genotypes. National Science Foundation CCF-1320273. Electrical and Computer Engineerin

    Analysis of Fcγ receptor haplotypes in rheumatoid arthritis: FCGR3A remains a major susceptibility gene at this locus, with an additional contribution from FCGR3B

    The Fcγ receptors play important roles in the initiation and regulation of many immunological and inflammatory processes, and genetic variants (FCGR) have been associated with numerous autoimmune and infectious diseases. The data in rheumatoid arthritis (RA) are conflicting and we previously demonstrated an association between FCGR3A and RA. In view of the close molecular proximity with FCGR2A, FCGR2B and FCGR3B, additional polymorphisms within these genes and FCGR haplotypes were examined to refine the extent of association with RA. Biallelic polymorphisms in FCGR2A, FCGR2B and FCGR3B were examined for association with RA in two well characterized UK Caucasian and North Indian/Pakistani cohorts, in which FCGR3A genotyping had previously been undertaken. Haplotype frequencies and linkage disequilibrium were estimated across the FCGR locus and a model-free analysis was performed to determine association with RA. This was followed by regression analysis, allowing for phase uncertainty, to identify the particular haplotype(s) that influences disease risk. Our results reveal that FCGR2A, FCGR2B and FCGR3B were not associated with RA. The haplotype with the strongest association with RA susceptibility was the FCGR3A–FCGR3B 158V-NA2 haplotype (odds ratio 3.18, 95% confidence interval 1.13–8.92 [P = 0.03] for homozygotes compared with all genotypes). The association was stronger in the presence of nodules (odds ratio 5.03, 95% confidence interval 1.44–17.56; P = 0.01). This haplotype was also more common in North Indian/Pakistani RA patients than in control individuals, but not significantly so. Logistic regression analyses suggested that FCGR3A remained the most significant gene at this locus. The increased association with an FCGR3A–FCGR3B haplotype suggests that other polymorphic variants within FCGR3A or FCGR3B, or in linkage disequilibrium with this haplotype, may additionally contribute to disease pathogenesis

    Haplotype Assembly: An Information Theoretic View

    This paper studies the haplotype assembly problem from an information theoretic perspective. A haplotype is a sequence of nucleotide bases on a chromosome, often conveniently represented by a binary string, that differs from the bases in the corresponding positions on the other chromosome in a homologous pair. Information about the order of bases in a genome is readily inferred using short reads provided by high-throughput DNA sequencing technologies. In this paper, the recovery of the target pair of haplotype sequences using short reads is rephrased as a joint source-channel coding problem. Two messages, representing haplotypes and chromosome memberships of reads, are encoded and transmitted over a channel with erasures and errors, where the channel model reflects salient features of high-throughput sequencing. The focus of this paper is on the required number of reads for reliable haplotype reconstruction, and both the necessary and sufficient conditions are presented with order-wise optimal bounds. Comment: 30 pages, 5 figures, 1 table, journa
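
The channel view in this abstract can be made concrete with a small Python simulation. This is a toy sketch rather than the paper's model: the function `simulate_reads`, its parameters `erasure_prob` and `error_prob`, and the assumption that every read spans all SNP positions with independent per-site erasures and flips are all hypothetical simplifications. The second haplotype is taken as the bitwise complement of the first, following the binary-string representation described above.

```python
import random


def simulate_reads(haplotype, n_reads, erasure_prob, error_prob, seed=0):
    """Simulate noisy reads from a complementary haplotype pair.

    Returns a list of (membership, read) tuples. membership is 0 or 1
    (which chromosome the read came from). read is a list whose entries
    are 0, 1, or None, where None marks an erased (uncovered) site.
    """
    rng = random.Random(seed)
    complement = [1 - bit for bit in haplotype]
    reads = []
    for _ in range(n_reads):
        # First message: chromosome membership of the read.
        membership = rng.randrange(2)
        source = haplotype if membership == 0 else complement
        read = []
        for bit in source:
            if rng.random() < erasure_prob:
                read.append(None)     # site not covered by this read
            elif rng.random() < error_prob:
                read.append(1 - bit)  # sequencing error flips the bit
            else:
                read.append(bit)      # site observed correctly
        reads.append((membership, read))
    return reads
```

In the assembly problem the membership labels are hidden and must be recovered together with the haplotype. They are returned here only so that a reconstruction can be checked against the ground truth.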

    Search for Risk Haplotype Segments with GWAS Data by Use of Finite Mixture Models

    Region-based association analysis has been proposed to capture the collective behavior of sets of variants by testing the association of each set, rather than of individual variants, with the disease. Such an analysis typically involves a list of unphased multiple-locus genotypes with potentially sparse frequencies in cases and controls. To tackle the problem of the sparse distribution, a two-stage approach was proposed in the literature: in the first stage, haplotypes are computationally inferred from genotypes, followed by a haplotype co-classification. In the second stage, the association analysis is performed on the inferred haplotype groups. If a haplotype is unevenly distributed between the case and control samples, this haplotype is labeled as a risk haplotype. Unfortunately, the in-silico reconstruction of haplotypes might produce a proportion of false haplotypes, which hampers the detection of rare but true haplotypes. Here, to address the issue, we propose an alternative approach: in Stage 1, we cluster genotypes instead of inferred haplotypes and estimate the risk genotypes based on a finite mixture model. In Stage 2, we infer risk haplotypes from the risk genotypes inferred in the previous stage. To estimate the finite mixture model, we propose an EM algorithm with a novel data partition-based initialization. The performance of the proposed procedure is assessed by simulation studies and a real data analysis. Compared to the existing multiple Z-test procedure, we find that the power of genome-wide association studies can be increased by using the proposed procedure.