Search CORE

A method for finding single-nucleotide polymorphisms with allele frequencies in sequences of deep coverage

Author: Huang Xiaoqiu
Wang Jianmin
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: The allele frequencies of single-nucleotide polymorphisms (SNPs) are needed to select an optimal subset of common SNPs for use in association studies. Sequence-based methods for finding SNPs with allele frequencies may need to handle thousands of sequences from the same genome location (sequences of deep coverage). RESULTS: We describe a computational method for finding common SNPs with allele frequencies in single-pass sequences of deep coverage. The method improves on the widely used program PolyBayes in several respects. We present results from our method and PolyBayes on eighteen data sets of human expressed sequence tags (ESTs) with deep coverage. The results indicate that our method used almost all single-pass sequences in computing the allele frequencies of SNPs. CONCLUSION: The new method handles single-pass sequences of deep coverage efficiently. Our work shows that sequences of deep coverage can be analyzed using pairwise alignments of the sequences with the finished genome sequence instead of multiple sequence alignments.
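The abstract states that allele frequencies can be computed from pairwise alignments of each read with the finished genome rather than from a multiple sequence alignment. As a minimal sketch of that counting step only, the Python below assumes gap-free read-to-genome alignments and uses hypothetical names (`PairwiseHit`, `allele_counts`, `call_common_snp`) and thresholds (`min_minor_freq`, `min_depth`). It is not PolyBayes and has no base-quality or paralog model.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class PairwiseHit:
    """One read aligned to the finished genome (hypothetical, gap-free)."""
    ref_start: int  # 0-based genome coordinate of the first aligned base
    bases: str      # read bases, one per consecutive genome position


def allele_counts(hits, position):
    """Count the read bases that cover one genome position."""
    counts = Counter()
    for hit in hits:
        offset = position - hit.ref_start
        if 0 <= offset < len(hit.bases):
            base = hit.bases[offset]
            if base in "ACGT":  # ignore ambiguous calls such as N
                counts[base] += 1
    return counts


def call_common_snp(hits, position, min_minor_freq=0.05, min_depth=10):
    """Return (major, minor, minor_freq) if the site looks like a common SNP, else None.

    Thresholds are illustrative assumptions, not values from the paper.
    """
    counts = allele_counts(hits, position)
    depth = sum(counts.values())
    if depth < min_depth or len(counts) < 2:
        return None
    (major, _), (minor, n_minor) = counts.most_common(2)
    minor_freq = n_minor / depth
    if minor_freq < min_minor_freq:
        return None
    return major, minor, minor_freq
```

In this sketch each read is compared only with the genome, so the work grows linearly with coverage; a multiple alignment would instead have to reconcile all reads at a site with one another.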

Digital Repository @ Iowa State University (ISU)

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Recommended from our members

Worldwide genetic variation of the IGHV and TRBV immune receptor gene families in humans.

Author: Li Heng
Luo Shishi
Song Yun
Yu Jane
Publication venue: eScholarship, University of California
Publication date: 01/04/2019

The immunoglobulin heavy variable (IGHV) and T cell beta variable (TRBV) loci are among the most complex and variable regions in the human genome. Generated through a process of gene duplication/deletion and diversification, these loci can vary extensively between individuals in copy number and contain genes that are highly similar, making their analysis technically challenging. Here, we present a comprehensive study of the functional gene segments in the IGHV and TRBV loci, quantifying their copy number and single-nucleotide variation in a globally diverse sample of 109 (IGHV) and 286 (TRBV) humans from over 100 populations. We find that the IGHV and TRBV gene families exhibit starkly different patterns of variation. In addition to providing insight into the different evolutionary paths of the IGHV and TRBV loci, our results are also important to the adaptive immune repertoire sequencing community, where the lack of frequency data for common alleles and copy number variants is hampering existing analytical pipelines.
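Copy number in loci such as IGHV and TRBV is commonly approached by comparing read depth over a gene segment with depth over regions assumed to be single-copy per haplotype. The sketch below illustrates only that general idea; the function name, the diploid-control assumption and the use of medians are hypothetical choices, not the procedure described in the paper.

```python
from statistics import median


def estimate_copy_numbers(segment_depths, control_depths, control_ploidy=2):
    """Estimate integer copy numbers from per-base read depths (illustrative sketch).

    segment_depths: dict segment name -> per-base depths over that segment
    control_depths: per-base depths over regions assumed present in
                    `control_ploidy` copies (an assumption of this sketch)
    """
    per_copy = median(control_depths) / control_ploidy  # expected depth from one copy
    if per_copy <= 0:
        raise ValueError("control regions have no coverage")
    # Medians damp the effect of local coverage spikes or dropouts.
    return {
        name: round(median(depths) / per_copy)
        for name, depths in segment_depths.items()
    }
```

Highly similar segments complicate this in practice, because a read may map equally well to several paralogs; the sketch assumes reads have already been assigned to segments unambiguously.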

eScholarship - University of California

Systematic genetic analysis of the MHC region reveals mechanistic underpinnings of HLA type associations with disease.

Author: Aguiar
Assis
Auton
Bijvelds
Blackwell
Bonder
D'Antonio
D'Antonio-Chronowska
DeBoever
Dechecchi
DeGiorgio
Dendrou
Diwakar
Dobin
Eguchi
Ernst
Fehrmann
Freudenberg
Gambino
Gensterblum-Miller
Giambartolomei
González-Galarza
Gough
Graffelman
Guo
Hardy
Harrow
Herrmann
Holoshitz
Huang
Jakubosky
Jakubosky
Jensen
Jia
Kilpinen
Kilpinen
Klein
Kontakioti
Kundaje
Laki
Lam
Lee
Leung
Li
Li
Li
Li
Li
Lyczak
Mahdi
Mall
Matzaraki
Mayba
McNicholas
Miretti
Morison
Munder
Nariai
Norman
Oldstone
Panopoulos
Panousis
Pier
Robinson
Sondo
Stegle
Stoltz
Streeter
Tan
Tomati
Trowsdale
Van der Auwera
Vicente
Wilke
Yin
Zhang
Zheng
Publication venue: eScholarship, University of California
Publication date: 01/11/2019

The MHC region is highly associated with autoimmune and infectious diseases. Here we conduct an in-depth interrogation of associations between genetic variation, gene expression and disease. We create a comprehensive map of regulatory variation in the MHC region, using whole-genome sequencing (WGS) data from 419 individuals to call eight-digit HLA types and RNA-seq data from matched iPSCs. Building on this regulatory map, we explore GWAS signals for 4,083 traits, detecting colocalization with eQTLs for 180 disease loci. We show that eQTL analyses that take HLA-type haplotypes into account have substantially greater power than analyses using only single variants. We examine the association between the 8.1 ancestral haplotype and delayed colonization in cystic fibrosis, and postulate that downregulation of RNF5 expression is the likely causal mechanism. Our study provides insights into the genetic architecture of the MHC region and pinpoints disease associations that are due to differential expression of HLA genes and non-HLA genes.
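The abstract contrasts eQTL tests on single variants with tests on HLA-type haplotypes. The core of either test can be sketched as a regression of gene expression on a per-individual count, such as 0, 1 or 2 copies of a variant allele or of a given HLA type. The function below (`eqtl_effect`, a hypothetical name) is a minimal ordinary-least-squares version returning the effect size and r²; it omits covariates and p-values and is not the authors' pipeline.

```python
def eqtl_effect(dosages, expression):
    """Regress expression on allele or HLA-type dosage (0/1/2) by ordinary least squares.

    Returns (beta, r_squared): beta is the expression change per extra copy.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matched dosage/expression values for at least 3 individuals")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("dosage does not vary in this sample")
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    beta = sxy / sxx
    r_squared = sxy * sxy / (sxx * syy) if syy > 0 else 0.0
    return beta, r_squared
```

Counting copies of a full HLA type instead of a single variant changes only the `dosages` vector, which is one way to see why haplotype-aware tests can capture effects that any one variant tags imperfectly.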

Crossref

eScholarship - University of California

Inferring Genomic Sequences

Author: Astrovskaya Irina A
Publication venue: ScholarWorks @ Georgia State University
Publication date: 01/01/2011

Recent advances in next-generation sequencing have provided unprecedented opportunities for high-throughput genomic research, inexpensively producing millions of genomic sequences in a single run. Analyzing such massive data volumes yields a more accurate picture of genome complexity and requires adequate bioinformatics support. We explore computational challenges of applying next-generation sequencing to particular applications, focusing on two problems: reconstructing the viral quasispecies spectrum from pyrosequencing shotgun reads, and inferring informative single nucleotide polymorphisms (SNPs) that statistically cover the genetic variation of a genome region in genome-wide association studies. The genomic diversity of viral quasispecies is of great interest, particularly for chronic infections, since it can lead to resistance to existing therapies. High-throughput sequencing is a promising approach to characterizing viral diversity, but standard assembly software cannot simultaneously assemble and estimate the abundance of multiple closely related (but non-identical) quasispecies sequences. Here, we introduce a new Viral Spectrum Assembler (ViSpA) for inferring the quasispecies spectrum and compare it with the state-of-the-art ShoRAH tool on both synthetic and real 454 pyrosequencing shotgun reads from HCV and HIV quasispecies. While ShoRAH has an advanced error correction algorithm, ViSpA is better at quasispecies assembly, producing a more accurate reconstruction of the viral population. We also foresee applying ViSpA to high-throughput sequencing data from bacterial metagenomic samples and ecological samples of eukaryote populations. Due to the large data volume in genome-wide association studies, it is desirable to find a small subset of SNPs (tags) that covers the genetic variation of the entire set.
We explore the trade-off between the number of tags used per non-tagged SNP and possible overfitting, and propose an efficient 2LR-Tagging heuristic.
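The abstract does not detail the 2LR-Tagging heuristic, so the sketch below instead shows the classic greedy pairwise-r² approach to tag selection, a different and simpler technique, to make the underlying goal concrete: choose few SNPs so that every SNP is either a tag or highly correlated with one. The function names and the 0.8 threshold are assumptions.

```python
def r_squared(a, b):
    """Squared Pearson correlation between two genotype vectors coded 0/1/2."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        return 0.0  # a monomorphic SNP cannot be predicted by correlation
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return sab * sab / (saa * sbb)


def greedy_tags(genotypes, threshold=0.8):
    """Greedy pairwise tagging (not 2LR-Tagging).

    genotypes: dict SNP id -> genotype list over the same individuals.
    Repeatedly picks the SNP that covers the most still-untagged SNPs.
    """
    snps = list(genotypes)
    covers = {
        s: {t for t in snps
            if t == s or r_squared(genotypes[s], genotypes[t]) >= threshold}
        for s in snps
    }
    untagged = set(snps)
    tags = []
    while untagged:
        # max() returns the first SNP with the largest coverage, so ties are deterministic.
        best = max(snps, key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags
```

Pairwise tagging uses exactly one tag per non-tagged SNP, the most conservative end of the trade-off the abstract describes; combining several tags per SNP can shrink the tag set further but risks overfitting.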

CiteSeerX

ScholarWorks @ Georgia State University

Single-nucleotide polymorphism discovery by high-throughput sequencing in sorghum

Author: A Howe
A Roberts
AH Paterson
AJ Amaral
AM Casa
C Castano-Sanchez
CLL Gowda
CP Van Tassell
D Altshuler
DA Nickerson
DL Hyten
DR Bentley
FA Feltus
FR Miller
Frank F White
Ginny Antony
H-M Lam
HHD Kerstens
IY Choi
J Lai
J Marchini
J Yu
JA Bedell
James C Nelson
JC Stephens
JD Faris
Jianming Yu
KL McNally
M Kimura
M Margulies
M Trick
MA Gore
Na Baird
NJ van Orsouw
PJ Brown
PJ Maughan
PJ Maughan
R Li
RM Clark
RT Wiedmann
S Atwell
S Deschamps
S Ossowski
Shichen Wang
SM Al-Janabi
T Murashige
T Sasaki
WB Barbazuk
WL Rooney
X Wu
XH Huang
XH Huang
Xianran Li
Y Arai-Kichise
Y Chutimanitsakun
Y Fu
Yuye Wu
Publication venue: BioMed Central
Publication date: 01/01/2011

BACKGROUND: Eight diverse sorghum (Sorghum bicolor L. Moench) accessions were subjected to short-read genome sequencing to characterize the distribution of single-nucleotide polymorphisms (SNPs). Two strategies were used for DNA library preparation. Missing SNP genotype data were imputed by local haplotype comparison. The effects of library type and genomic diversity on SNP discovery and imputation were evaluated. RESULTS: Alignment of eight genome equivalents (6 Gb) to the public reference genome revealed 283,000 SNPs at ≥82% confirmation probability. Libraries constructed so that sequencing starts at defined restriction sites yielded genotypes for 10-fold more SNPs in all 8 accessions, and correctly imputed 11% more missing data, than semirandom libraries. The SNP yield advantage of the reduced-representation method was smaller than expected, because up to one fifth of reads started at noncanonical restriction sites and up to one third of restriction sites predicted in silico to yield unique alignments were not sampled at near-saturation. For imputation accuracy, the availability of a genomically similar accession in the germplasm panel was more important than panel size or sequencing coverage. CONCLUSIONS: A sequence quantity of 3 million 50-base reads per accession using a BsrFI library would conservatively provide satisfactory genotyping of 96,000 sorghum SNPs. For the most reliable SNP-genotype imputation in shallowly sequenced genomes, germplasm panels should consist of pairs or groups of genomically similar entries. These results may help in designing strategies for economical genotyping-by-sequencing of large numbers of plant accessions.
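The abstract reports that imputation by local haplotype comparison worked best when a genomically similar accession was in the panel. A minimal sketch of that intuition: fill each missing call from the accession whose flanking calls mismatch least. The input layout, the `window` size and the function name are hypothetical, and this is not claimed to be the authors' exact procedure.

```python
def impute_missing(panel, target, window=5):
    """Fill missing calls (None) in one accession from its locally most similar accession.

    panel:  dict accession name -> list of allele calls at the same ordered SNP sites,
            with None marking a missing call (hypothetical input layout)
    target: name of the accession whose missing calls should be filled
    window: number of flanking SNPs on each side used to compare haplotypes
    """
    calls = panel[target]
    imputed = list(calls)
    for i, call in enumerate(calls):
        if call is not None:
            continue
        lo, hi = max(0, i - window), min(len(calls), i + window + 1)
        best_donor, best_rate = None, None
        for name, other in panel.items():
            # A donor must be another accession with an observed call at this site.
            if name == target or other[i] is None:
                continue
            # Compare only flanking sites called in both accessions.
            shared = [(a, b) for a, b in zip(calls[lo:hi], other[lo:hi])
                      if a is not None and b is not None]
            if not shared:
                continue
            rate = sum(a != b for a, b in shared) / len(shared)
            if best_rate is None or rate < best_rate:
                best_donor, best_rate = name, rate
        if best_donor is not None:
            imputed[i] = panel[best_donor][i]
    return imputed
```

With no closely related donor, the lowest mismatch rate can still be high and the copied call is little better than a guess, which matches the abstract's recommendation to build panels from pairs or groups of similar entries.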

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Characterization of duplicate gene evolution in the recent natural allopolyploid Tragopogon miscellus by next-generation sequencing and Sequenom iPLEX MassARRAY genotyping

Author: Adams
Adams
Adams
Ainouche
Barbazuk
Bottley
Buggs
Buggs
Buggs
Cannon
Chaisson
Chaudhary
Chen
Chen
Cheung
Cheung
Cook
Cook
Dong
Doyle
Dunstan
Ellegren
Elmer
Emrich
Eveland
Feldman
Ferguson
Flagel
Flagel
Gabriel
Gaeta
Goetz
Guo
Gut
Harr
Hegarty
Hegarty
Hillier
Holt
Hudson
Johnson
Joly
Judd
Kashkush
Kihara
Kim
Kim
Kovarik
Kwok
Levy
Lim
Lim
Liu
Lynch
Margulies
Marth
Matyasek
Mavrodiev
Novaes
Novak
Ownbey
Pavy
Petit
Quackenbush
Renaut
Sadava
Slate
Soltis
Soltis
Soltis
Soltis
Soltis
Song
Stebbins
Stupar
Swanson-Wagner
Tate
Tate
Tate
Udall
Van Bers
Van Tassell
Vera
Wang
Wang
Whittall
Wolf
Publication venue: Wiley
Publication date: 01/03/2010

The definitive version is available at www.blackwell-synergy.co

Crossref

Queen Mary Research Online

Computational Tools for Immune Repertoire Characterization and Primer Set Design

Author: Yu Jane
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

The enormous decrease in the cost of genomic sequencing over the past two decades has enabled researchers to revisit previously unaddressable questions in sequence analysis. However, this boom in genomic information has introduced new sets of problems that often demand computationally efficient methods. In this work, we describe computational tools for two such settings involving large-scale genomic data: 1) estimating copy number and allelic variation in two highly complex gene families, and 2) selective sequencing of a target genome in a complex DNA sample. We first describe a method that takes short reads from high-throughput sequencing and characterizes both copy number and allelic variation in the IGHV and TRBV loci. These two loci can vary extensively between individuals in copy number and contain genes that are highly similar, making their analysis technically challenging. Additionally, we have conducted the first study of these two loci in a globally diverse sample of hundreds of individuals from over a hundred populations. In addition to providing insight into the different evolutionary paths of the IGHV and TRBV loci, our results are also important to the adaptive immune repertoire sequencing community, where the lack of frequency data for common alleles and copy number variants is hampering existing analytical pipelines. In our second problem setting, we describe SOAPswga, an optimized and parallelized pipeline for primer design in the context of selective amplification. Unlike previous heuristic-based methods, SOAPswga uses machine learning methods to evaluate both individual primers and primer sets. Additionally, rather than performing a brute-force search for primer sets as predecessor methods do, SOAPswga uses branch-and-bound principles to pursue only the most promising sets. These optimizations, including the parallelization of each step, reduce runtime from the order of weeks to minutes.
We also discuss the results of our pipeline applied to the selective amplification of Mycobacterium tuberculosis in a sample of human blood. Lastly, we expand on the importance of this work and, more generally, its potential usefulness in any targeted-sequencing setting.
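The abstract says SOAPswga uses branch-and-bound principles to pursue only the most promising primer sets. The sketch below illustrates branch and bound on a deliberately simplified version of the problem: pick at most `k` primers maximizing a summed score while avoiding listed incompatible pairs. The additive, non-negative per-primer scores and the incompatibility set are assumptions of this sketch; SOAPswga's own scoring is machine-learning based and evaluates whole sets.

```python
def best_primer_set(scores, incompatible, k):
    """Branch-and-bound search for the highest-scoring compatible set of at most k primers.

    scores:       dict primer -> non-negative score (hypothetical, additive)
    incompatible: set of frozenset({p, q}) pairs that must not appear together
    Returns (best_total, best_primers).
    """
    # Sorting by score makes the next `slots` primers an upper bound on any extension.
    primers = sorted(scores, key=scores.get, reverse=True)
    best = [0.0, ()]

    def compatible(chosen, p):
        return all(frozenset((p, q)) not in incompatible for q in chosen)

    def bound(start, slots):
        # Optimistic estimate: best remaining scores, ignoring compatibility.
        return sum(scores[p] for p in primers[start:start + slots])

    def search(start, chosen, total):
        if total > best[0]:
            best[0], best[1] = total, tuple(chosen)
        slots = k - len(chosen)
        if slots == 0 or start >= len(primers):
            return
        if total + bound(start, slots) <= best[0]:
            return  # prune: this branch cannot beat the best set found so far
        for i in range(start, len(primers)):
            p = primers[i]
            if compatible(chosen, p):
                chosen.append(p)
                search(i + 1, chosen, total + scores[p])
                chosen.pop()

    search(0, [], 0.0)
    return best[0], best[1]
```

The bound is what separates this from brute force: once a good set is found, whole subtrees whose optimistic total cannot exceed it are skipped without being enumerated.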

eScholarship - University of California

Heterogeneity of Human Neutrophil CD177 Expression Results from CD177P1 Pseudogene Conversion

Author: Abhayaratna Walter
Andrews Thomas (Dan)
Cho Eun
Cook Matthew
Field Matthew
Gatenby Paul
Goodnow Christopher
Lam Wesley
Liang Rong
Ohnesorg Thomas
Perera L Chandima
Sinclair Andrew
Whittle Belinda
Wu Zuopeng
Zhang Yafei
Publication venue: Public Library of Science (PLoS)
Publication date: 29/11/2018

Most humans harbor both CD177neg and CD177pos neutrophils, but 1–10% of people are CD177null, placing them at risk of forming anti-neutrophil antibodies that can cause transfusion-related acute lung injury and neonatal alloimmune neutropenia. By deep sequencing the CD177 locus, we catalogued CD177 single nucleotide variants and identified a novel stop codon in CD177null individuals arising from a single base substitution in exon 7. This is not a mutation in CD177 itself; rather, the CD177null phenotype arises when exon 7 of CD177 is supplied entirely by the CD177 pseudogene (CD177P1), which appears to have resulted from allelic gene conversion. In CD177-expressing individuals, the CD177 locus contains both CD177P1 and CD177 sequences. The proportion of CD177hi neutrophils in the blood is a heritable trait. Abundance of CD177hi neutrophils correlates with homozygosity for the CD177 reference allele, while heterozygosity for ectopic CD177P1 gene conversion correlates with increased CD177neg neutrophils, in which both the allele partially incorporating CD177P1 and the paired intact CD177 allele are transcribed. Human neutrophil heterogeneity for CD177 expression arises by ectopic allelic conversion. Resolving the genetic basis of the CD177null phenotype provides a method for screening individuals at risk of CD177 isoimmunisation.

The Australian National University