5,358 research outputs found

    A method for finding single-nucleotide polymorphisms with allele frequencies in sequences of deep coverage

    BACKGROUND: The allele frequencies of single-nucleotide polymorphisms (SNPs) are needed to select an optimal subset of common SNPs for use in association studies. Sequence-based methods for finding SNPs with allele frequencies may need to handle thousands of sequences from the same genome location (sequences of deep coverage). RESULTS: We describe a computational method for finding common SNPs with allele frequencies in single-pass sequences of deep coverage. The method enhances a widely used program named PolyBayes in several aspects. We present results from our method and PolyBayes on eighteen data sets of human expressed sequence tags (ESTs) with deep coverage. The results indicate that our method used almost all single-pass sequences in computation of the allele frequencies of SNPs. CONCLUSION: The new method is able to handle single-pass sequences of deep coverage efficiently. Our work shows that it is possible to analyze sequences of deep coverage by using pairwise alignments of the sequences with the finished genome sequence, instead of multiple sequence alignments
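    The idea of estimating allele frequencies from reads that are each aligned pairwise to the finished genome sequence, rather than from a multiple sequence alignment, can be illustrated with a minimal sketch. This is not the authors' method or PolyBayes. The function name `allele_frequencies`, the `(start, sequence)` read format, the gapless-alignment simplification, and the `min_depth` parameter are all assumptions made for illustration, and no base-quality model is included.

```python
from collections import Counter


def allele_frequencies(reference, aligned_reads, min_depth=2):
    """Estimate per-site allele frequencies from reads placed on a reference.

    Hypothetical input format (an assumption, not from the source):
    aligned_reads is a list of (start, sequence) pairs, where each read is a
    gapless alignment beginning at 0-based position `start` on `reference`.
    """
    # One counter of observed bases per reference position.
    counts = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            # Ignore bases that fall off the reference or are ambiguous (e.g. N).
            if 0 <= pos < len(reference) and base in "ACGT":
                counts[pos][base] += 1

    frequencies = {}
    for pos, column in enumerate(counts):
        depth = sum(column.values())
        # Report only sites with enough coverage and more than one observed allele.
        if depth < min_depth or len(column) < 2:
            continue
        frequencies[pos] = {base: n / depth for base, n in column.items()}
    return frequencies


reference = "ACGTAC"
reads = [(0, "ACGT"), (0, "ACTT"), (2, "GTAC"), (1, "CTTA")]
candidate_sites = allele_frequencies(reference, reads)
```

    Because each read contributes to the per-position counters independently, the cost grows linearly with the total number of aligned bases, which is one reason a pairwise-alignment approach can scale to thousands of reads at a single location.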

    Systematic genetic analysis of the MHC region reveals mechanistic underpinnings of HLA type associations with disease.

    The MHC region is highly associated with autoimmune and infectious diseases. Here we conduct an in-depth interrogation of associations between genetic variation, gene expression and disease. We create a comprehensive map of regulatory variation in the MHC region using WGS from 419 individuals to call eight-digit HLA types and RNA-seq data from matched iPSCs. Building on this regulatory map, we explored GWAS signals for 4083 traits, detecting colocalization for 180 disease loci with eQTLs. We show that eQTL analyses taking HLA type haplotypes into account have substantially greater power compared with only using single variants. We examined the association between the 8.1 ancestral haplotype and delayed colonization in Cystic Fibrosis, postulating that downregulation of RNF5 expression is the likely causal mechanism. Our study provides insights into the genetic architecture of the MHC region and pinpoints disease associations that are due to differential expression of HLA genes and non-HLA genes

    Inferring Genomic Sequences

    Recent advances in next generation sequencing have provided unprecedented opportunities for high-throughput genomic research, inexpensively producing millions of genomic sequences in a single run. Analysis of massive volumes of data results in a more accurate picture of the genome complexity and requires adequate bioinformatics support. We explore computational challenges of applying next generation sequencing to particular applications, focusing on the problem of reconstructing the viral quasispecies spectrum from pyrosequencing shotgun reads and the problem of inferring informative single nucleotide polymorphisms (SNPs), statistically covering the genetic variation of a genome region in genome-wide association studies. The genomic diversity of viral quasispecies is a subject of great interest, particularly for chronic infections, since it can lead to resistance to existing therapies. High-throughput sequencing is a promising approach to characterizing viral diversity, but unfortunately standard assembly software cannot be used to simultaneously assemble and estimate the abundance of multiple closely related (but non-identical) quasispecies sequences. Here, we introduce a new Viral Spectrum Assembler (ViSpA) for inferring the quasispecies spectrum and compare it with the state-of-the-art ShoRAH tool on both synthetic and real 454 pyrosequencing shotgun reads from HCV and HIV quasispecies. While ShoRAH has an advanced error correction algorithm, ViSpA is better at quasispecies assembly, producing a more accurate reconstruction of a viral population. We also foresee ViSpA application to the analysis of high-throughput sequencing data from bacterial metagenomic samples and ecological samples of eukaryote populations. Due to the large data volume in genome-wide association studies, it is desirable to find a small subset of SNPs (tags) that covers the genetic variation of the entire set. We explore the trade-off between the number of tags used per non-tagged SNP and possible overfitting and propose an efficient 2LR-Tagging heuristic
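    The tag-selection problem mentioned at the end of this abstract can be illustrated with a deliberately simple baseline. The sketch below is not the 2LR-Tagging heuristic, whose details are not given here. It is a plain greedy tagger that predicts each non-tagged SNP from a single tag, using pairwise r² on 0/1 haplotypes. The function names, the input layout (one row per haplotype, one column per SNP), and the 0.8 threshold are assumptions chosen for illustration.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length 0/1 allele vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic SNP carries no information about another SNP
    return cov * cov / (var_x * var_y)


def greedy_tags(haplotypes, threshold=0.8):
    """Pick tag SNPs so every SNP is a tag or has r^2 >= threshold with a tag.

    haplotypes: list of rows, each row a list of 0/1 alleles, one per SNP
    (a hypothetical input layout assumed for this sketch).
    """
    n_snps = len(haplotypes[0])
    columns = [[row[j] for row in haplotypes] for j in range(n_snps)]
    # covers[j] = set of SNPs that SNP j can stand in for (always includes j).
    covers = {
        j: {k for k in range(n_snps)
            if k == j or r_squared(columns[j], columns[k]) >= threshold}
        for j in range(n_snps)
    }
    uncovered = set(range(n_snps))
    tags = []
    while uncovered:
        # Choose the SNP covering the most still-uncovered SNPs; ties go to
        # the lowest index because the candidates are scanned in sorted order.
        best = max(sorted(covers), key=lambda j: len(covers[j] & uncovered))
        tags.append(best)
        uncovered -= covers[best]
    return tags


haplotypes = [
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
]
tags = greedy_tags(haplotypes)
```

    Using one tag per non-tagged SNP keeps the predictor simple. Allowing more tags per predicted SNP can reduce the number of tags needed overall but adds parameters to fit, which is the overfitting trade-off the abstract refers to.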

    Single-nucleotide polymorphism discovery by high-throughput sequencing in sorghum

    BACKGROUND: Eight diverse sorghum (Sorghum bicolor L. Moench) accessions were subjected to short-read genome sequencing to characterize the distribution of single-nucleotide polymorphisms (SNPs). Two strategies were used for DNA library preparation. Missing SNP genotype data were imputed by local haplotype comparison. The effects of library type and genomic diversity on SNP discovery and imputation are evaluated. RESULTS: Alignment of eight genome equivalents (6 Gb) to the public reference genome revealed 283,000 SNPs at ≥82% confirmation probability. Sequencing from libraries constructed to limit sequencing to start at defined restriction sites led to genotyping 10-fold more SNPs in all 8 accessions, and correctly imputing 11% more missing data, than from semirandom libraries. The SNP yield advantage of the reduced-representation method was less than expected, since up to one fifth of reads started at noncanonical restriction sites and up to one third of restriction sites predicted in silico to yield unique alignments were not sampled at near-saturation. For imputation accuracy, the availability of a genomically similar accession in the germplasm panel was more important than panel size or sequencing coverage. CONCLUSIONS: A sequence quantity of 3 million 50-base reads per accession using a BsrFI library would conservatively provide satisfactory genotyping of 96,000 sorghum SNPs. For most reliable SNP-genotype imputation in shallowly sequenced genomes, germplasm panels should consist of pairs or groups of genomically similar entries. These results may help in designing strategies for economical genotyping-by-sequencing of large numbers of plant accessions.

    Heterogeneity of Human Neutrophil CD177 Expression Results from CD177P1 Pseudogene Conversion

    Most humans harbor both CD177neg and CD177pos neutrophils but 1–10% of people are CD177null, placing them at risk for formation of anti-neutrophil antibodies that can cause transfusion-related acute lung injury and neonatal alloimmune neutropenia. By deep sequencing the CD177 locus, we catalogued CD177 single nucleotide variants and identified a novel stop codon in CD177null individuals arising from a single base substitution in exon 7. This is not a mutation in CD177 itself, rather the CD177null phenotype arises when exon 7 of CD177 is supplied entirely by the CD177 pseudogene (CD177P1), which appears to have resulted from allelic gene conversion. In CD177 expressing individuals the CD177 locus contains both CD177P1 and CD177 sequences. The proportion of CD177hi neutrophils in the blood is a heritable trait. Abundance of CD177hi neutrophils correlates with homozygosity for CD177 reference allele, while heterozygosity for ectopic CD177P1 gene conversion correlates with increased CD177neg neutrophils, in which both CD177P1 partially incorporated allele and paired intact CD177 allele are transcribed. Human neutrophil heterogeneity for CD177 expression arises by ectopic allelic conversion. Resolution of the genetic basis of CD177null phenotype identifies a method for screening for individuals at risk of CD177 isoimmunisation
