51 research outputs found

    Exome sequencing of case-unaffected-parents trios reveals recessive and de novo genetic variants in sporadic ALS

    The contribution of genetic variants to sporadic amyotrophic lateral sclerosis (ALS) remains largely unknown. Either recessive or de novo variants could result in an apparently sporadic occurrence of ALS. In an attempt to find such variants we sequenced the exomes of 44 ALS-unaffected-parents trios. Rare and potentially damaging compound heterozygous variants were found in 27% of ALS patients, homozygous recessive variants in 14% and coding de novo variants in 27%. In 20% of patients more than one of the above variants was present. Genes with recessive variants were enriched in nucleotide binding capacity, ATPase activity, and the dynein heavy chain. Genes with de novo variants were enriched in transcription regulation and cell cycle processes. This trio study indicates that rare private recessive variants could be a mechanism underlying some cases of sporadic ALS, and that de novo mutations are also likely to play a part in the disease.

    Discovery and genotyping of structural variation from long-read haploid genome sequence data

    In an effort to more fully understand the full spectrum of human genetic variation, we generated deep single-molecule, real-time (SMRT) sequencing data from two haploid human genomes. By using an assembly-based approach (SMRT-SV), we systematically assessed each genome independently for structural variants (SVs) and indels resolving the sequence structure of 461,553 genetic variants from 2 bp to 28 kbp in length. We find that >89% of these variants have been missed as part of analysis of the 1000 Genomes Project even after adjusting for more common variants (MAF > 1%). We estimate that this theoretical human diploid differs by as much as ∼16 Mbp with respect to the human reference, with long-read sequencing data providing a fivefold increase in sensitivity for genetic variants ranging in size from 7 bp to 1 kbp compared with short-read sequence data. Although a large fraction of genetic variants were not detected by short-read approaches, once the alternate allele is sequence-resolved, we show that 61% of SVs can be genotyped in short-read sequence data sets with high accuracy. Uncoupling discovery from genotyping thus allows for the majority of this missed common variation to be genotyped in the human population. Interestingly, when we repeat SV detection on a pseudodiploid genome constructed in silico by merging the two haploids, we find that ∼59% of the heterozygous SVs are no longer detected by SMRT-SV. These results indicate that haploid resolution of long-read sequencing data will significantly increase sensitivity of SV detection.

    SeqAnt: A web service to rapidly identify and annotate DNA sequence variations

    Background: The enormous throughput and low cost of second-generation sequencing platforms now allow research and clinical geneticists to routinely perform single experiments that identify tens of thousands to millions of variant sites. Existing methods to annotate variant sites using information from publicly available databases via web browsers are too slow to be useful for the large sequencing datasets being routinely generated by geneticists. Because sequence annotation of variant sites is required before functional characterization can proceed, the lack of a high-throughput pipeline to efficiently annotate variant sites can act as a significant bottleneck in genetics research. Results: SeqAnt (Sequence Annotator) is an open source web service and software package that rapidly annotates DNA sequence variants and identifies recessive or compound heterozygous loci in human, mouse, fly, and worm genome sequencing experiments. Variants are characterized with respect to their functional type, frequency, and evolutionary conservation. Annotated variants can be viewed on a web browser, downloaded in a tab-delimited text file, or directly uploaded in a BED format to the UCSC genome browser. To demonstrate the speed of SeqAnt, we annotated a series of publicly available datasets that ranged in size from 37 to 3,439,107 variant sites. The total time to completely annotate these data ranged from 0.17 seconds to 28 minutes 49.8 seconds. Conclusion: SeqAnt is an open source web service and software package that overcomes a critical bottleneck facing research and clinical geneticists using second-generation sequencing platforms. SeqAnt will prove especially useful for those investigators who lack dedicated bioinformatics personnel or infrastructure in their laboratories.

    Building and Improving Reference Genome Assemblies: This paper reviews the problems and algorithms of assembling a complete genome from millions of short DNA sequencing reads

    A genome sequence assembly provides the foundation for studies of genotypic and phenotypic variation, genome structure, and evolution of the target organism. In the past four decades, there has been a surge of new sequencing technologies, and with these developments, computational scientists have developed new algorithms to improve genome assembly. Here we discuss the relationship between sequencing technology improvements and assembly algorithm development and how these are applied to extend and improve human and nonhuman genome assemblies. © 1963-2012 IEEE

    Single haplotype assembly of the human genome from a hydatidiform mole

    A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content shows this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicates misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly.

    Predicting the public health benefit of vaccinating cattle against Escherichia coli O157

    Identifying the major sources of risk in disease transmission is key to designing effective controls. However, understanding of transmission dynamics across species boundaries is typically poor, making the design and evaluation of controls particularly challenging for zoonotic pathogens. One such global pathogen is Escherichia coli O157, which causes a serious and sometimes fatal gastrointestinal illness. Cattle are the main reservoir for E. coli O157, and vaccines for cattle now exist. However, adoption of vaccines is being delayed by conflicting responsibilities of veterinary and public health agencies, economic drivers, and, because clinical trials cannot easily test interventions across species boundaries, a lack of information on the public health benefits. Here, we examine transmission risk across the cattle–human species boundary and show three key results. First, supershedding of the pathogen by cattle is associated with the genetic marker stx2. Second, by quantifying the link between shedding density in cattle and human risk, we show that only the relatively rare supershedding events contribute significantly to human risk. Third, we show that this finding has profound consequences for the public health benefits of the cattle vaccine. A naïve evaluation based on efficacy in cattle would suggest a 50% reduction in risk; however, because the vaccine targets the major source of human risk, we predict a reduction in human cases of nearly 85%. By accounting for nonlinearities in transmission across the human–animal interface, we show that adoption of these vaccines by the livestock industry could prevent substantial numbers of human E. coli O157 cases.