ATHLATES: accurate typing of human leukocyte antigen through exome sequencing
Human leukocyte antigen (HLA) typing at the allelic level can in theory be achieved using whole exome sequencing (exome-seq) data with no added cost but has been hindered by its computational challenge. We developed ATHLATES, a program that applies assembly, allele identification and allelic pair inference to short read sequences, and applied it to data from Illumina platforms. In 15 data sets with adequate coverage for HLA-A, -B, -C, -DRB1 and -DQB1 genes, ATHLATES correctly reported 74 out of 75 allelic pairs with an overall concordance rate of 99% compared with conventional typing. This novel approach should be broadly applicable to research and clinical laboratories
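The reported concordance can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of ATHLATES; the helper name `concordance_rate` is hypothetical, and the total assumes one allelic pair per gene per data set, as the abstract implies (15 data sets × 5 genes = 75 pairs).

```python
def concordance_rate(correct: int, total: int) -> float:
    """Fraction of allelic pairs that agree with conventional typing."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


# Figures taken from the abstract: 15 data sets, 5 HLA genes, 74 correct pairs.
n_datasets = 15
genes = ["HLA-A", "HLA-B", "HLA-C", "HLA-DRB1", "HLA-DQB1"]
total_pairs = n_datasets * len(genes)  # one allelic pair per gene per data set

rate = concordance_rate(74, total_pairs)
print(f"{total_pairs} pairs, concordance {rate:.1%}")
```

Rounded to the nearest whole percent, 74 of 75 gives the 99% quoted in the abstract.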
Whole Genome Pyrosequencing of Rare Hepatitis C Virus Genotypes Enhances Subtype Classification and Identification of Naturally Occurring Drug Resistance Variants
Background. Infection with hepatitis C virus (HCV) is a burgeoning worldwide public health problem, with 170 million infected individuals and an estimated 20 million deaths in the coming decades. While 6 main genotypes generally distinguish the global geographic diversity of HCV, a multitude of closely related subtypes within these genotypes are poorly defined and may influence clinical outcome and treatment options. Unfortunately, the paucity of genetic data from many of these subtypes makes time-consuming primer walking the limiting step for sequencing understudied subtypes. Methods. Here we combined long-range polymerase chain reaction amplification with pyrosequencing for a rapid approach to generate the complete viral coding region of 31 samples representing poorly defined HCV subtypes. Results. Phylogenetic classification based on full genome sequences validated previously identified HCV subtypes, identified a recombinant sequence, and identified a new distinct subtype of genotype 4. Unlike conventional sequencing methods, use of deep sequencing also facilitated characterization of minor drug resistance variants within these uncommon or, in some cases, previously uncharacterized HCV subtypes. Conclusions. These data aid in the classification of uncommon HCV subtypes while also providing a high-resolution view of viral diversity within infected patients, which may be relevant to the development of therapeutic regimens to minimize drug resistance.
Identifying novel constrained elements by exploiting biased substitution patterns
Motivation: Comparing the genomes from closely related species provides a powerful tool to identify functional elements in a reference genome. Many methods have been developed to identify conserved sequences across species; however, existing methods only model conservation as a decrease in the rate of mutation and have ignored selection acting on the pattern of mutations
The Rose-comb Mutation in Chickens Constitutes a Structural Rearrangement Causing Both Altered Comb Morphology and Defective Sperm Motility
Rose-comb, a classical monogenic trait of chickens, is characterized by a drastically altered comb morphology compared to the single-combed wild-type. Here we show that Rose-comb is caused by a 7.4 Mb inversion on chromosome 7 and that a second Rose-comb allele arose by unequal crossing over between a Rose-comb and wild-type chromosome. The comb phenotype is caused by the relocalization of the MNR2 homeodomain protein gene leading to transient ectopic expression of MNR2 during comb development. We also provide a molecular explanation for the first example of epistatic interaction reported by Bateson and Punnett 104 years ago, namely that walnut-comb is caused by the combined effects of the Rose-comb and Pea-comb alleles. Transient ectopic expression of MNR2 and SOX5 (causing the Pea-comb phenotype) occurs in the same population of mesenchymal cells and with at least partially overlapping expression in individual cells in the comb primordium. Rose-comb has pleiotropic effects, as homozygosity in males has been associated with poor sperm motility. We postulate that this is caused by the disruption of the CCDC108 gene located at one of the inversion breakpoints. CCDC108 is a poorly characterized protein, but it contains a MSP (major sperm protein) domain and is expressed in testis. The study illustrates several characteristic features of the genetic diversity present in domestic animals, including the evolution of alleles by two or more consecutive mutations and the fact that structural changes have contributed to fast phenotypic evolution
Comparison of Illumina and 454 Deep Sequencing in Participants Failing Raltegravir-Based Antiretroviral Therapy
Background: The impact of raltegravir-resistant HIV-1 minority variants (MVs) on raltegravir treatment failure is unknown. Illumina sequencing offers greater throughput than 454, but sequence analysis tools for viral sequencing are needed. We evaluated Illumina and 454 for the detection of HIV-1 raltegravir-resistant MVs. Methods: A5262 was a single-arm study of raltegravir and darunavir/ritonavir in treatment-naïve patients. Pre-treatment plasma was obtained from 5 participants with raltegravir resistance at the time of virologic failure. A control library was created by pooling integrase clones at predefined proportions. Multiplexed sequencing was performed with Illumina and 454 platforms at comparable costs. Illumina sequence analysis was performed with the novel snp-assess tool and 454 sequencing was analyzed with V-Phaser. Results: Illumina sequencing resulted in significantly higher sequence coverage and a 0.095% limit of detection. Illumina accurately detected all MVs in the control library at ≥0.5% and 7/10 MVs expected at 0.1%. 454 sequencing failed to detect any MVs at 0.1% with 5 false positive calls. For MVs detected in the patient samples by both 454 and Illumina, the correlation in the detected variant frequencies was high (R² = 0.92, P < 0.001). Illumina sequencing detected 2.4-fold greater nucleotide MVs and 2.9-fold greater amino acid MVs compared to 454. The only raltegravir-resistant MV detected was an E138K mutation in one participant by Illumina sequencing, but not by 454. Conclusions: In participants of A5262 with raltegravir resistance at virologic failure, baseline raltegravir-resistant MVs were rarely detected. At comparable costs to 454 sequencing, Illumina demonstrated greater depth of coverage, increased sensitivity for detecting HIV MVs, and fewer false positive variant calls.
A high-resolution map of human evolutionary constraint using 29 mammals
The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease
Highly Sensitive and Specific Detection of Rare Variants in Mixed Viral Populations from Massively Parallel Sequence Data
Viruses diversify over time within hosts, often undercutting the effectiveness of host defenses and therapeutic interventions. To design successful vaccines and therapeutics, it is critical to better understand viral diversification, including comprehensively characterizing the genetic variants in viral intra-host populations and modeling changes from transmission through the course of infection. Massively parallel sequencing technologies can overcome the cost constraints of older sequencing methods and obtain the high sequence coverage needed to detect rare genetic variants (<1%) within an infected host, and to assay variants without prior knowledge. Critical to interpreting deep sequence data sets is the ability to distinguish biological variants from process errors with high sensitivity and specificity. To address this challenge, we describe V-Phaser, an algorithm able to recognize rare biological variants in mixed populations. V-Phaser uses covariation (i.e. phasing) between observed variants to increase sensitivity and an expectation maximization algorithm that iteratively recalibrates base quality scores to increase specificity. Overall, V-Phaser achieved >97% sensitivity and >97% specificity on control read sets. On data derived from a patient after four years of HIV-1 infection, V-Phaser detected 2,015 variants across the ∼10 kb genome, including 603 rare variants (<1% frequency) detected only using phase information. V-Phaser identified variants at frequencies down to 0.2%, comparable to the detection threshold of allele-specific PCR, a method that requires prior knowledge of the variants. The high sensitivity and specificity of V-Phaser enables identifying and tracking changes in low frequency variants in mixed populations such as RNA viruses
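The core problem described above, separating true low-frequency variants from sequencing errors, can be illustrated with a much simpler baseline than V-Phaser itself. The sketch below is not the V-Phaser algorithm: it uses neither phasing nor expectation-maximization recalibration of base qualities. It only asks whether the number of reads carrying an alternate base at one site is larger than a fixed per-base error rate would plausibly produce, using a binomial tail probability. The function names, the error rate of 0.001 and the significance threshold are all assumptions chosen for illustration.

```python
import math


def binom_log_pmf(k: int, n: int, p: float) -> float:
    """Log of P(X = k) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )


def binom_sf(k: int, n: int, p: float) -> float:
    """Upper tail P(X >= k) for X ~ Binomial(n, p)."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    total = sum(math.exp(binom_log_pmf(i, n, p)) for i in range(k, n + 1))
    return min(1.0, total)


def is_candidate_variant(alt_count: int, depth: int,
                         error_rate: float = 0.001,  # assumed per-base error rate
                         alpha: float = 1e-6) -> bool:  # assumed threshold
    """True if alt_count is unlikely to arise from sequencing error alone."""
    return binom_sf(alt_count, depth, error_rate) < alpha


# At 10,000x coverage, about 10 erroneous reads are expected per site.
print(is_candidate_variant(12, 10_000))  # close to the error expectation
print(is_candidate_variant(50, 10_000))  # 0.5% frequency, far above it
```

A per-site test like this loses power as the variant frequency approaches the error rate, which is the gap that the abstract says V-Phaser addresses by exploiting covariation between variants and recalibrated quality scores.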