Common <i>a1</i>-containing extended haplotypes in the WTCCC<sup>††</sup>.
The WTCCC dataset consists of 59,884 haplotypes, of which 10,078 represent different (unique) combinations of the 5 HLA alleles and the SNP haplotypes (see text).
<p>For the purpose of this graph, these unique haplotypes (CEHs) have been sorted in descending order of their frequency of occurrence in the WTCCC dataset. The cumulative number of unique haplotypes (beginning with the most frequent haplotype) has been plotted against the percentage of the total number of haplotypes in the population. As the graph shows, the large majority (~80%) of the different CEHs occur at very low frequency, whereas 80% of the haplotypes in the population are accounted for by only a small number (~10) of very common CEHs.</p>
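The cumulative curve described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes each haplotype is encoded as a hashable value (for example, a tuple of the five HLA alleles plus the SNP-haplotype string), and the function names `cumulative_coverage` and `n_needed_for` are hypothetical.

```python
from collections import Counter


def cumulative_coverage(haplotypes):
    """Sort unique haplotypes by descending frequency and return a list of
    (rank, cumulative % of all haplotype copies covered by ranks 1..rank)."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    curve = []
    running = 0
    # most_common() with no argument yields every unique haplotype, most frequent first
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        running += n
        curve.append((rank, 100.0 * running / total))
    return curve


def n_needed_for(curve, pct):
    """Smallest number of top-ranked unique haplotypes covering >= pct % of copies."""
    for rank, cum_pct in curve:
        if cum_pct >= pct:
            return rank
    return None
```

With a toy input of ten haplotype copies (5 × h1, 3 × h2, 1 × h3, 1 × h4), the two most common unique haplotypes already cover 80% of all copies, mirroring the skew the graph describes.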
Plot of the proportion of carriers of the <i>HLA</i>-<i>DRB1*15</i>:<i>01~HLA</i>-<i>DQB1*06</i>:<i>02</i> haplotype at different Hamming distances from the (<i>a1</i>) SNP haplotype.
<p>The magenta line represents the average of all haplotypes at a given Hamming distance. Also plotted are the subgroups of SNP haplotypes that carry <i>HLA</i>-<i>DRB1*15</i>:<i>01~HLA</i>-<i>DQB1*06</i>:<i>02</i> less than 10% of the time (blue line) and those that carry this HLA haplotype 10% or more of the time (orange line). Black dots represent individual observations. As the Hamming distance increases, the percentage of haplotypes carrying <i>HLA</i>-<i>DRB1*15</i>:<i>01~HLA</i>-<i>DQB1*06</i>:<i>02</i> diminishes (magenta). However, even at a Hamming distance of 4, some specific SNP haplotypes carry this HLA haplotype almost half of the time.</p>
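A minimal sketch of this per-distance summary follows. It assumes each chromosome is recorded as a (SNP-haplotype string, carries-HLA-haplotype flag) pair and that SNP haplotypes are equal-length allele strings. The function names are hypothetical, and averaging the per-SNP-haplotype proportions without weighting is an assumption; the exact averaging behind the magenta line is not specified in the caption.

```python
from collections import defaultdict
from statistics import mean


def hamming(a: str, b: str) -> int:
    """Number of SNP positions at which two equal-length haplotypes differ."""
    if len(a) != len(b):
        raise ValueError("SNP haplotypes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def per_haplotype_proportions(records):
    """records: (snp_haplotype, carries_hla) pairs, one per chromosome.
    Returns {snp_haplotype: fraction of its copies carrying the HLA haplotype}."""
    n = defaultdict(int)
    k = defaultdict(int)
    for snp_hap, carries in records:
        n[snp_hap] += 1
        k[snp_hap] += int(carries)
    return {h: k[h] / n[h] for h in n}


def summarize_by_distance(records, reference, threshold=0.10):
    """Group per-SNP-haplotype proportions by Hamming distance from `reference`.
    Returns {distance: (mean_all, mean_below_threshold, mean_at_or_above)};
    a subgroup mean is None when that subgroup is empty."""
    groups = defaultdict(list)
    for h, p in per_haplotype_proportions(records).items():
        groups[hamming(h, reference)].append(p)
    out = {}
    for d in sorted(groups):
        ps = groups[d]
        low = [p for p in ps if p < threshold]    # "blue" subgroup
        high = [p for p in ps if p >= threshold]  # "orange" subgroup
        out[d] = (mean(ps), mean(low) if low else None, mean(high) if high else None)
    return out
```

Each entry in `per_haplotype_proportions` corresponds to one black dot in the plot; the 10% threshold splits them into the two subgroups.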
Common <i>a2-</i>, <i>a6-</i>, or <i>a14-</i>containing (or other) extended haplotypes<sup>††</sup>.
Highly conserved extended haplotypes of the major histocompatibility complex and their relationship to multiple sclerosis susceptibility - Fig 5
<p>Disease associations for the different SNP-haplotype combinations with the Class II HLA haplotypes (A) <i>DRB1*15:01~DQB1*06:02</i> and (B) <i>DRB1*03</i>:<i>01~DQB1*02</i>:<i>01 & DRB1*13</i>:<i>03~DQB1*03</i>:<i>01</i>. The odds ratios (OR) compare cases to controls with regard to carrying either one or two copies of the risk haplotype as opposed to carrying zero copies. The disease association varied markedly depending on which SNP haplotype carried the HLA haplotype. This observation indicates that the observed disease associations were not due to these specific HLA alleles but, rather, to something else that was present on these SNP haplotypes (see text). For unclear reasons, this data set did not replicate the findings of Chao and coworkers [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0190043#pone.0190043.ref019" target="_blank">19</a>] with respect to the <i>HLA-B*08</i>, <i>HLA-B*13</i>, <i>HLA-B*27</i>, <i>HLA-B*32</i>, and <i>HLA-B*52</i> haplotypes (see text). In the WTCCC data, however, the vast majority (96–100%) of the haplotypes that carried these <i>HLA-B</i> alleles together with the <i>HLA-DRB1*15</i>:<i>01</i> allele also carried the (<i>a1</i>) SNP haplotype. As a result, each of these haplotypes was strongly associated with an increased MS risk, except for the extremely rare <i>HLA-B*52~HLA-DRB1*15</i>:<i>01~a1</i> haplotype (OR = 1.01).</p>
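The odds ratios can be illustrated with the standard 2 × 2 calculation. This sketch assumes counts of individuals tabulated by number of risk-haplotype copies (0, 1, or 2) and uses Woolf's log-based 95% confidence interval; the function name and the counts in the usage example are hypothetical, and under this reading it would be run separately for each SNP-haplotype/HLA-haplotype combination.

```python
import math


def odds_ratio(case_counts, control_counts, copies):
    """case_counts / control_counts: {0: n, 1: n, 2: n}, individuals by number of
    risk-haplotype copies. Returns (OR, lower95, upper95) for `copies` vs 0 copies,
    using Woolf's log-based confidence interval."""
    a = case_counts[copies]     # cases carrying `copies` copies
    b = control_counts[copies]  # controls carrying `copies` copies
    c = case_counts[0]          # cases carrying zero copies
    d = control_counts[0]       # controls carrying zero copies
    if 0 in (a, b, c, d):
        raise ValueError("zero cell; a continuity correction would be needed")
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    return (
        or_,
        math.exp(math.log(or_) - 1.96 * se),
        math.exp(math.log(or_) + 1.96 * se),
    )
```

For example, with hypothetical counts of 100/80/20 cases and 200/60/10 controls carrying 0/1/2 copies, `odds_ratio(cases, controls, 1)` gives an OR of about 2.67 and `odds_ratio(cases, controls, 2)` gives 4.0.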
Selected SNP haplotypes in the Class II region of chromosome 6<sup>†</sup>.
Highly conserved extended haplotypes of the major histocompatibility complex and their relationship to multiple sclerosis susceptibility - Fig 2
<p>The HLA haplotype/SNP haplotype associations, both by SNP haplotype (A) and by HLA haplotype (B), for selected SNP haplotypes (some of which are presented in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0190043#pone.0190043.t001" target="_blank">Table 1</a>). Other haplotypes not presented here also had very specific haplotype associations [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0190043#pone.0190043.ref032" target="_blank">32</a>].</p>
Location of the 11 SNPs in the haplotype surrounding the Class II DRB1 gene on chromosome 6 (6p21.3), which had the greatest disease association of any SNP haplotype in the region (see text).
<p>The blue rectangles span the regions from the start to the stop points of the Class II genes: <i>HLA</i>-<i>DRB5</i>, <i>HLA</i>-<i>DRB1</i>, <i>HLA</i>-<i>DQA1</i>, and <i>HLA</i>-<i>DQB1</i>. The centromere of chromosome 6 lies to the right of this portion of 6p21.3.</p>
High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs
<div><p>Genetic variation at the Human Leucocyte Antigen (HLA) genes is associated with many autoimmune and infectious disease phenotypes, is an important element of the immunological distinction between self and non-self, and shapes immune epitope repertoires. Determining the allelic state of the HLA genes (HLA typing) as a by-product of standard whole-genome sequencing data would therefore be highly desirable and enable the immunogenetic characterization of samples in currently ongoing population sequencing projects. Extensive hyperpolymorphism and sequence similarity between the HLA genes, however, pose problems for accurate read mapping and make HLA type inference from whole-genome sequencing data a challenging problem. We describe how to address these challenges in a Population Reference Graph (PRG) framework. First, we construct a PRG for 46 (mostly HLA) genes and pseudogenes, their genomic context and their characterized sequence variants, integrating a database of over 10,000 known allele sequences. Second, we present a sequence-to-PRG paired-end read mapping algorithm that enables accurate read mapping for the HLA genes. Third, we infer the most likely pair of underlying alleles at G group resolution from the IMGT/HLA database at each locus, employing a simple likelihood framework. We show that HLA*PRG, our algorithm, outperforms existing methods by a wide margin. We evaluate HLA*PRG on six classical class I and class II HLA genes (HLA-A, -B, -C, -DQA1, -DQB1, -DRB1) and on a set of 14 samples (3 samples with 2 x 100bp, 11 samples with 2 x 250bp Illumina HiSeq data). Of 158 alleles tested, we correctly infer 157 alleles (99.4%). We also identify and re-type two erroneous alleles in the original validation data. 
We conclude that HLA*PRG for the first time achieves accuracies comparable to gold-standard reference methods from standard whole-genome sequencing data, though high computational demands (currently ~30–250 CPU hours per sample) remain a significant challenge to practical application.</p></div>
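The final step, choosing the most likely allele pair at each locus, can be illustrated with a generic diploid read-likelihood model. This is an assumption for illustration, not HLA*PRG's actual implementation: it takes per-read likelihoods P(read | allele) as given (in HLA*PRG these would come from the read-to-PRG mapping step), assumes each read is drawn from either allele of the pair with probability 1/2, and scores every unordered pair, including homozygous ones. The function name and allele labels are hypothetical.

```python
import math
from itertools import combinations_with_replacement


def best_allele_pair(read_likelihoods, alleles):
    """read_likelihoods: list of dicts {allele: P(read | allele)}, one per read.
    Returns the unordered allele pair (x, y) maximizing
    sum over reads of log(0.5 * P(r|x) + 0.5 * P(r|y)), with its log-likelihood.
    Returns (None, -inf) if every pair has zero likelihood."""
    best_pair, best_ll = None, -math.inf
    for x, y in combinations_with_replacement(alleles, 2):
        ll = 0.0
        for lik in read_likelihoods:
            # Each read comes from either chromosome with equal probability
            p = 0.5 * lik.get(x, 0.0) + 0.5 * lik.get(y, 0.0)
            if p == 0.0:
                ll = -math.inf  # this pair cannot explain the read
                break
            ll += math.log(p)
        if ll > best_ll:
            best_pair, best_ll = (x, y), ll
    return best_pair, best_ll
```

Enumerating all unordered pairs is quadratic in the number of candidate alleles, which is manageable per locus at G group resolution; a real tool may prune candidates first.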
Runtime and memory requirements comparison of HLA*PRG, PHLAT and HLAReporter on NA12878.
<p>Upper part: NA12878 2 x 100bp reads from the Platinum cohort; lower part: NA12878 2 x 250bp reads from the 1000 Genomes cohort. We provide a detailed analysis in <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005151#pcbi.1005151.s014" target="_blank">S1 Text</a>.</p>