
    Common, Intermediate and Well-Documented HLA Alleles in World Populations: CIWD Version 3.0.0

    A catalog of common, intermediate and well-documented (CIWD) HLA-A, -B, -C, -DRB1, -DRB3, -DRB4, -DRB5, -DQB1 and -DPB1 alleles has been compiled from over 8 million individuals using data from 20 unrelated hematopoietic stem cell volunteer donor registries. Individuals are divided into seven geographic/ancestral/ethnic groups, and data are summarized for each group and for the total population. P (two-field) and G group assignments are each placed in one of four frequency categories: common (≥1 in 10 000), intermediate (≥1 in 100 000), well-documented (≥5 occurrences) or not-CIWD. Overall, 26% of alleles in IPD-IMGT/HLA version 3.31.0 at P group resolution fall into the three CIWD categories. The two-field catalog includes 18% (n = 545) common, 17% (n = 513) intermediate and 65% (n = 1997) well-documented alleles. Full-field allele frequency data are provided but are of limited value because typing resolution varied across the registries. A recommended CIWD list is based on the most frequent category reached in the total population or in any of the seven geographic/ancestral/ethnic groups. Data are also provided so that users can compile a catalog specific to the population groups they serve. Comparisons are made to three previous CWD reports representing more limited population groups. This catalog, CIWD version 3.0.0, is a step closer to the collection of global HLA frequencies and to a clearer view of HLA diversity in the human population as a whole.
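
    The four frequency categories can be expressed as a small classifier. This is a minimal sketch, not the catalog's actual pipeline: it assumes allele frequency is computed per allele copy (count divided by 2N for N typed individuals), and the function names and example counts are hypothetical. The second function mirrors the stated rule for the recommended list, taking the most frequent category reached in the total population or in any single group.

```python
from enum import IntEnum


class Ciwd(IntEnum):
    # Higher value = more frequent category, so max() picks the "best" one.
    NOT_CIWD = 0
    WELL_DOCUMENTED = 1
    INTERMEDIATE = 2
    COMMON = 3


def classify(allele_count: int, individuals: int) -> Ciwd:
    """Assign one CIWD category from an allele count in one population.

    Assumption (hypothetical): frequency = count / (2 * individuals).
    """
    freq = allele_count / (2 * individuals)
    if freq >= 1 / 10_000:
        return Ciwd.COMMON
    if freq >= 1 / 100_000:
        return Ciwd.INTERMEDIATE
    if allele_count >= 5:
        return Ciwd.WELL_DOCUMENTED
    return Ciwd.NOT_CIWD


def recommended(per_group: dict[str, tuple[int, int]]) -> Ciwd:
    """Most frequent category across the total and every group.

    per_group maps a group label to (allele_count, individuals).
    """
    return max(classify(count, n) for count, n in per_group.values())
```

    With hypothetical counts such as {"total": (150, 8_000_000), "group_x": (30, 100_000)}, the allele is only well-documented in the pooled total but common in one group, so it is listed as common.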

    Schematic representation of HLA type inference using HLA*PRG.

    (a) Broad-scale structure of the HLA PRG. The included genes are separated by spacer blocks consisting of N characters. (b) Fine-scale structure of the PRG input sequences. Exons, introns and UTRs are embedded in regional haplotypes (padding sequence). Exon sequences typically outnumber intron sequences. The red line indicates the region covered by IMGT genomic sequences. X-axis not to scale. (c) For each gene represented in the PRG, multiple sequence alignments (MSAs) representing up to three sources of sequence data are merged for PRG construction: exonic sequences, genomic (UTR, exon, intron) sequences and regional haplotypes (“xMHC Ref.”). Using alleles present in both the current and the next-higher-level MSA (identifiers printed in red), the merging algorithm determines consensus boundaries (blue bars) that connect the MSAs of the different input sequence types. For each segment so defined, we use the MSA corresponding to the highest-resolution input sequence type (sequence characters thereby ignored are printed in grey). (d) The PRG corresponding to the input sequences shown in c, and a seed-and-extend alignment of a sequencing read to the PRG. PRG nodes are represented by boxes and edges by labelled arrows. The four blue markers correspond to the consensus MSA boundaries shown in c. The aligned sequence of the read is displayed below the PRG, and the alignment path (the sequence of edges and nodes traversed in the PRG) is highlighted. The red component of the alignment path corresponds to the exact-match “seed” component of the alignment (spanning a graph-encoded gap), whereas the orange components correspond to the “extend” component (where mismatches are allowed).
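
    The read-to-graph alignment in panel d can be illustrated on a toy graph. This is a minimal sketch, not HLA*PRG's implementation: it assumes a simplified PRG given as a hypothetical list of segments, each holding non-empty alternative sequences. The seed is an exact match of the first k read bases along any graph path, and the extension is ungapped, counting mismatches and keeping the best-scoring path. Graph-encoded gaps and the MSA-merging step of panel c are not modeled.

```python
from functools import lru_cache


def build_char_graph(segments):
    """Expand segments (lists of alternative strings) into a character graph.

    Returns (chars, succ): chars[v] is the base at vertex v and succ[v]
    lists the vertices reachable from v by one edge.
    """
    chars, succ, prev_ends = [], [], []
    for alternatives in segments:
        starts, ends = [], []
        for alt in alternatives:
            first = len(chars)
            for i, base in enumerate(alt):
                chars.append(base)
                succ.append([])
                if i > 0:
                    succ[first + i - 1].append(first + i)  # within one alternative
            starts.append(first)
            ends.append(first + len(alt) - 1)
        for end in prev_ends:  # join every previous alternative to every new one
            succ[end].extend(starts)
        prev_ends = ends
    return chars, succ


def seed_and_extend(chars, succ, read, k):
    """Fewest mismatches for an ungapped alignment of `read` whose first k
    bases match the graph exactly; returns None if no alignment exists."""

    def seed_ends(v, i):
        # Vertices where read[:k] ends, given read[i] is matched at vertex v.
        if chars[v] != read[i]:
            return []
        if i == k - 1:
            return [v]
        return [end for w in succ[v] for end in seed_ends(w, i + 1)]

    @lru_cache(maxsize=None)
    def extend(v, i):
        # Minimum mismatches placing read[i:] on a path starting at vertex v.
        cost = int(chars[v] != read[i])
        if i == len(read) - 1:
            return cost
        tails = [extend(w, i + 1) for w in succ[v]]
        return cost + min(tails) if tails else float("inf")

    best = None
    for v in range(len(chars)):
        for end in seed_ends(v, 0):
            if k == len(read):
                score = 0
            else:
                options = [extend(w, k) for w in succ[end]]
                score = min(options) if options else float("inf")
            if score != float("inf") and (best is None or score < best):
                best = score
    return best
```

    For example, with segments [["ACG"], ["T", "C"], ["GGA"]], the read "ACGTGGA" aligns with no mismatches through the T branch, while "ACGAGGA" needs one mismatch on either branch.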

    Runtime and memory requirements comparison of HLA*PRG, PHLAT and HLAReporter on NA12878.

    Upper part: NA12878 2 × 100 bp reads from the Platinum cohort; lower part: NA12878 2 × 250 bp reads from the 1000 Genomes cohort. We provide a detailed analysis in S1 Text (http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005151#pcbi.1005151.s014).

    Distinguishing functional polymorphism from random variation in the sequences of >10,000 HLA-A, -B and -C alleles

    HLA class I glycoproteins contain the functional sites that bind peptide antigens and engage lymphocyte receptors. Recently, clinical application of sequence-based HLA typing has uncovered an unprecedented number of novel HLA class I alleles. Here we define the nature and extent of the variation in 3,489 HLA-A, 4,356 HLA-B and 3,111 HLA-C alleles. This analysis required the development of suites of generally applicable methods for comparing and analyzing large numbers of homologous sequences. At least three amino-acid substitutions are present at every position in the polymorphic α1 and α2 domains of HLA-A, -B and -C. A minority of positions have an incidence >1% for the ‘second’ most frequent nucleotide: 70 positions in HLA-A, 85 in HLA-B and 54 in HLA-C. The majority of these positions have three or four alternative nucleotides. These positions were subject to positive selection and correspond to binding sites for peptides and receptors. Most HLA class I alleles (>80%) are very rare, often identified in a single person or family, and they differ by point mutation from older, more common alleles. These alleles with single nucleotide polymorphisms reflect the germ-line mutation rate. Their frequency predicts that the human population harbors 8–9 million HLA class I variants. The common alleles of human populations comprise 42 core alleles, which represent all selected polymorphism, and recombinants that have assorted this polymorphism.

    Frequencies of the second most common nucleotide at positions in exons 2 and 3.

    The histograms show, for HLA-A (top), HLA-B (center) and HLA-C (bottom), the frequency of the second most common nucleotide at each position in exons 2 and 3.
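
    The quantity plotted in these histograms can be computed column by column from aligned sequences. A minimal sketch, assuming equal-length, gap-free aligned exon 2 and 3 sequences; the function names and toy sequences are hypothetical, not real HLA data. The default 1% cutoff is the one the abstract above uses to count polymorphic positions.

```python
from collections import Counter


def second_most_common_freq(aligned):
    """Frequency of the second most common nucleotide at each column.

    `aligned` is a list of equal-length sequences; a monomorphic column gets 0.0.
    """
    n = len(aligned)
    freqs = []
    for column in zip(*aligned):
        ranked = Counter(column).most_common(2)
        freqs.append(ranked[1][1] / n if len(ranked) > 1 else 0.0)
    return freqs


def polymorphic_positions(aligned, threshold=0.01):
    """1-based positions whose second most common nucleotide exceeds `threshold`."""
    freqs = second_most_common_freq(aligned)
    return [i + 1 for i, f in enumerate(freqs) if f > threshold]
```

    Using the second most common nucleotide, rather than any non-reference base, means a column split between several rare variants is not overcounted; a single variant seen once in 200 sequences (0.5%) stays below the cutoff.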

    Pairwise comparison defines allele groups with high sequence similarity.

    The dot plots show the results of pairwise comparison of nucleotide sequences within HLA-A (Panel A), HLA-B (Panel B), HLA-C (Panel C) and all three combined (Panel D). A color scale indicates the number of nucleotide differences in each pair, with red representing the most closely related alleles. The diagonal labels in the individual gene plots indicate the allele groups (e.g., 01 in Panel A is HLA-A*01), and N indicates the number of alleles in each group that were used in the analysis. Groups with N less than 25 are not labeled on the diagonal: A*34, A*36, A*43, A*66, A*69, A*74, A*80, B*42, B*45, B*47, B*49, B*50, B*54, B*59, B*67, B*73, B*78, B*81, B*82, B*83 and C*18.
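
    Each dot plot is, in effect, a matrix of pairwise nucleotide differences ordered by allele group. A minimal sketch, assuming aligned sequences of equal length so that the Hamming distance (number of mismatched positions) serves as the difference count; the sequences are toy data and the color mapping is omitted.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def allele_group(name: str) -> str:
    """First field of an allele name, e.g. 'A*01:01:01:01' -> 'A*01'."""
    return name.split(":")[0]


def difference_matrix(alleles: dict[str, str]) -> dict[tuple[str, str], int]:
    """Symmetric matrix of nucleotide differences keyed by allele-name pairs."""
    matrix = {(name, name): 0 for name in alleles}
    for (n1, s1), (n2, s2) in combinations(alleles.items(), 2):
        d = hamming(s1, s2)
        matrix[(n1, n2)] = matrix[(n2, n1)] = d
    return matrix
```

    Sorting allele names by allele_group before plotting would place each group in a contiguous block along the diagonal, which is what the diagonal labels mark.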

    Most alleles within a SEG are related by single point substitutions.

    All members of the final HLA-A*02:01:01:01 SEG. (A) The parental HLA-A*02:01:01:01 allele has 423 “child” alleles that differ from A*02:01:01:01 by a single point substitution (green). Additional alleles can be connected by two or more point substitutions (red). In the algorithm, intermediate SEGs, for example HLA-A*02:07:01, are constructed and subsequently added to other, larger SEGs. (B) Ten other HLA-A*02 SEGs were identified that could not be linked directly to the HLA-A*02:01:01:01 SEG because they differed from it by more than one point substitution and no intermediate alleles were identified. All of the SEGs with more than a single child are derived by intragenic recombination. Six of these seven are based on an intragenic recombinant involving the HLA-A*02:01:01:01 SEG; the seventh is the core allele HLA-A*02:05:01.
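
    The single-substitution links in panel A can be sketched as a graph whose edges join alleles at Hamming distance 1; connected components then collect alleles reachable through chains of single substitutions, much as intermediate SEGs are merged into larger ones. This is a simplified illustration under assumed equal-length aligned sequences, not the paper's algorithm: it does not choose a parental allele or model recombinant SEGs, and the allele labels and sequences are hypothetical.

```python
from itertools import combinations


def single_substitution_groups(alleles: dict[str, str]) -> list[set[str]]:
    """Connected components of the graph linking alleles that differ at
    exactly one aligned position (all sequences must share one length)."""
    neighbours = {name: set() for name in alleles}
    for (n1, s1), (n2, s2) in combinations(alleles.items(), 2):
        if sum(x != y for x, y in zip(s1, s2)) == 1:
            neighbours[n1].add(n2)
            neighbours[n2].add(n1)

    groups, seen = [], set()
    for start in alleles:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # iterative depth-first search over single-substitution edges
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        groups.append(component)
    return groups
```

    With toy sequences p = "AAAA", c1 = "AAAT", c2 = "CAAA", g = "CAAT" and x = "GGGG", allele g is two substitutions from p but joins p's group through c1 or c2 (analogous to the red links), whereas x, with no intermediate alleles, forms a separate group, as with the ten unlinked SEGs in panel B.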