68 research outputs found

    Comprehensive assessment of sequence variation within the copy number variable defensin cluster on 8p23 by target enriched in-depth 454 sequencing

    Background: In highly copy number variable (CNV) regions such as the human defensin gene locus, comprehensive assessment of sequence variations is challenging. PCR approaches are practically restricted to tiny fractions, and next-generation sequencing (NGS) approaches of whole individual genomes, e.g. by the 1000 Genomes Project, are confined by an affordable sequence depth. Combining target enrichment with NGS may represent a feasible approach. Results: As a proof of principle, we enriched a ~850 kb section comprising the CNV defensin gene cluster DEFB, the invariable DEFA part and 11 control regions from two genomes by sequence capture and sequenced it by 454 technology. 6,651 differences to the human reference genome were found. Comparison to HapMap genotypes revealed sensitivities and specificities in the range of 94% to 99% for the identification of variations. Using error probabilities for rigorous filtering revealed 2,886 unique single nucleotide variations (SNVs) including 358 putative novel ones. DEFB CN determinations by haplotype ratios were in agreement with alternative methods. Conclusion: Although currently labor-intensive and costly, target enriched NGS provides a powerful tool for the comprehensive assessment of SNVs in highly polymorphic CNV regions of individual genomes. Furthermore, it reveals considerable amounts of putative novel variations and simultaneously allows CN estimation.

    Sequencing of BAC pools by different next generation sequencing platforms and strategies

    Background: Next generation sequencing of BACs is a viable option for deciphering the sequence of even large and highly repetitive genomes. In order to optimize this strategy, we examined the influence of read length on the quality of Roche/454 sequence assemblies, to what extent Illumina/Solexa mate pairs (MPs) improve the assemblies by scaffolding and whether barcoding of BACs is dispensable. Results: Sequencing four BACs with both FLX and Titanium technologies revealed similar sequencing accuracy, but showed that the longer Titanium reads produce considerably fewer misassemblies and gaps. The 454 assemblies of 96 barcoded BACs were improved by scaffolding 79% of the total contig length with MPs from a non-barcoded library. Assembly of the unmasked 454 sequences without separation by barcodes revealed chimeric contig formation to be a major problem, encompassing 47% of the total contig length. Masking the sequences reduced this fraction to 24%. Conclusion: Optimal BAC pool sequencing should be based on the longest available reads, with barcoding essential for a comprehensive assessment of both repetitive and non-repetitive sequence information. When interest is restricted to non-repetitive regions and repeats are masked prior to assembly, barcoding is non-essential. In any case, the assemblies can be improved considerably by scaffolding with non-barcoded BAC pool MPs.

    An efficient approach to BAC based assembly of complex genomes

    Background: There has been an exponential growth in the number of genome sequencing projects since the introduction of next generation DNA sequencing technologies. Genome projects have increasingly involved assembly of whole genome data which produces inferior assemblies compared to traditional Sanger sequencing of genomic fragments cloned into bacterial artificial chromosomes (BACs). While whole genome shotgun sequencing using next generation sequencing (NGS) is relatively fast and inexpensive, this method is extremely challenging for highly complex genomes, where polyploidy or high repeat content confounds accurate assembly, or where a highly accurate ‘gold’ reference is required. Several attempts have been made to improve genome sequencing approaches by incorporating NGS methods, to variable success. Results: We present the application of a novel BAC sequencing approach which combines indexed pools of BACs, Illumina paired read sequencing, a sequence assembler specifically designed for complex BAC assembly, and a custom bioinformatics pipeline. We demonstrate this method by sequencing and assembling BAC cloned fragments from bread wheat and sugarcane genomes. Conclusions: We demonstrate that our assembly approach is accurate, robust, cost effective and scalable, with applications for complete genome sequencing in large and complex genomes

    A next-generation sequencing method for overcoming the multiple gene copy problem in polyploid phylogenetics, applied to Poa grasses

    Background: Polyploidy is important from a phylogenetic perspective because of its immense past impact on evolution and its potential future impact on diversification, survival and adaptation, especially in plants. Molecular population genetics studies of polyploid organisms have been difficult because of problems in sequencing multiple-copy nuclear genes using Sanger sequencing. This paper describes a method for sequencing a barcoded mixture of targeted gene regions using next-generation sequencing methods to overcome these problems. Results: Using 64 3-bp barcodes, we successfully sequenced three chloroplast and two nuclear gene regions (each of which contained two gene copies with up to two alleles per individual) in a total of 60 individuals across 11 species of Australian Poa grasses. This method had high replicability, a low sequencing error rate (after appropriate quality control) and a low rate of missing data. Eighty-eight percent of the 320 gene/individual combinations produced sequence reads, and >80% of individuals produced sufficient reads to detect all four possible nuclear alleles of the homeologous nuclear loci with 95% probability. We applied this method to a group of sympatric Australian alpine Poa species, which we discovered to share an allopolyploid ancestor with a group of American Poa species. All markers revealed extensive allele sharing among the Australian species and so we recommend that the current taxonomy be re-examined. We also detected hypermutation in the trnH-psbA marker, suggesting it should not be used as a land plant barcode region. Some markers indicated differentiation between Tasmanian and mainland samples. Significant positive spatial genetic structure was detected at <100 km with chloroplast but not nuclear markers, which may be a result of restricted seed flow and long-distance pollen flow in this wind-pollinated group. Conclusions: Our results demonstrate that 454 sequencing of barcoded amplicon mixtures can be used to reliably sample all alleles of homeologous loci in polyploid species and successfully investigate phylogenetic relationships among species, as well as to investigate phylogeographic hypotheses. This next-generation sequencing method is more affordable than and at least as reliable as bacterial cloning. It could be applied to any experiment involving sequencing of amplicon mixtures.

    Genetic Factors of the Disease Course after Sepsis: A Genome-Wide Study for 28Day Mortality.

    Sepsis is the dysregulated host response to an infection which leads to life-threatening organ dysfunction that varies by host genomic factors. We conducted a genome-wide association study (GWAS) in 740 adult septic patients and focused on 28-day mortality as outcome. Variants with suggestive evidence for an association (p≤10⁻⁵) were validated in two additional GWA studies (n=3470) and gene coding regions related to the variants were assessed in an independent exome sequencing study (n=74). In the discovery GWAS, we identified 243 autosomal variants which clustered in 14 loci (p≤10⁻⁵). The best association signal (rs117983287; p=8.16×10⁻⁸) was observed for a missense variant located at chromosome 9q21.2 in the VPS13A gene. VPS13A was further supported by additional GWAS (p=0.03) and sequencing data (p=0.04). Furthermore, CRISPLD2 (p=5.99×10⁻⁶) and a region on chromosome 13q21.33 (p=3.34×10⁻⁷) were supported by both our data and external biological evidence. We found 14 loci with suggestive evidence for an association with 28-day mortality and found supportive, converging evidence for three of them in independent data sets. Elucidating the underlying biological mechanisms of VPS13A, CRISPLD2, and the chromosome 13 locus should be a focus of future research activities. The project was supported by the Paul-Martini-Sepsis Research Group, funded by the Thuringian Ministry of Education, Science and Culture (ProExcellence; grant PE 108-2); the public funded Thuringian Foundation for Technology, Innovation and Research (STIFT) and the German Sepsis Society (GSS); the Jena Center of Sepsis Control and Care (CSCC), funded by the German Ministry of Education and Research (BMBF; 01 EO 1002, 01 EO 1502). The VISEP and MAXSEP trials from the SepNet Study Group had been supported by a BMBF grant (01 KI 0106) and by unrestricted grants from B. Braun, HemoCue, Novo Nordisk, Astra Zeneca GmbH, Wedel, Germany and Bayer HealthCare, Leverkusen, Germany. The exome sequencing study was funded in part by the Hellenic Institute for the Study of Sepsis. The GenOSept study was supported by the European Union and benefits from the 6th framework programme of RTD funding. The PROGRESS study is supported by the German Federal Ministry of Education and Research, grant numbers 01KI07110 (Giessen), 01KI07111 (Jena), 01KI07113 (Leipzig), 01KI07114 (Berlin), 01KI1010I (Leipzig), and 01KI1010D (Greifswald)

    A chromosome conformation capture ordered sequence of the barley genome

    201

    Sequence Composition and Gene Content of the Short Arm of Rye (Secale cereale) Chromosome 1

    BACKGROUND: The purpose of the study is to elucidate the sequence composition of the short arm of rye chromosome 1 (Secale cereale) with special focus on its gene content, because this portion of the rye genome is an integrated part of several hundreds of bread wheat varieties worldwide. METHODOLOGY/PRINCIPAL FINDINGS: Multiple Displacement Amplification of 1RS DNA, obtained from flow sorted 1RS chromosomes, using 1RS ditelosomic wheat-rye addition line, and subsequent Roche 454FLX sequencing of this DNA yielded 195,313,589 bp sequence information. This quantity of sequence information resulted in 0.43× sequence coverage of the 1RS chromosome arm, permitting the identification of genes with estimated probability of 95%. A detailed analysis revealed that more than 5% of the 1RS sequence consisted of gene space, identifying at least 3,121 gene loci representing 1,882 different gene functions. Repetitive elements comprised about 72% of the 1RS sequence, Gypsy/Sabrina (13.3%) being the most abundant. More than four thousand simple sequence repeat (SSR) sites mostly located in gene related sequence reads were identified for possible marker development. The existence of chloroplast insertions in 1RS has been verified by identifying chimeric chloroplast-genomic sequence reads. Synteny analysis of 1RS to the full genomes of Oryza sativa and Brachypodium distachyon revealed that about half of the genes of 1RS correspond to the distal end of the short arm of rice chromosome 5 and the proximal region of the long arm of Brachypodium distachyon chromosome 2. Comparison of the gene content of 1RS to 1HS barley chromosome arm revealed high conservation of genes related to chromosome 5 of rice. CONCLUSIONS: The present study revealed the gene content and potential gene functions on this chromosome arm and demonstrated numerous sequence elements like SSRs and gene-related sequences, which can be utilised for future research as well as in breeding of wheat and rye

    Initial sequencing and analysis of the human genome

    The human genome holds an extraordinary trove of information about human development, physiology, medicine and evolution. Here we report the results of an international collaboration to produce and make freely available a draft sequence of the human genome. We also present an initial analysis of the data, describing some of the insights that can be gleaned from the sequence. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/62798/1/409860a0.pd