34,327 research outputs found

    Assembly of non-unique insertion content using next-generation sequencing

    Get PDF
    Recent studies in genomics have highlighted the significance of sequence insertions in determining individual variation. Efforts to discover the content of these sequence insertions have been limited to short insertions and long unique insertions. Much of the inserted sequence in the typical human genome, however, is a mixture of repeated and unique sequence. Current methods are designed to assemble only unique sequence insertions, using reads that do not map to the reference. These methods are not able to assemble repeated sequence insertions, as the reads will map to the reference in a different locus

    Identifying Structural Variation in Haploid Microbial Genomes from Short-Read Resequencing Data Using Breseq

    Get PDF
    Mutations that alter chromosomal structure play critical roles in evolution and disease, including in the origin of new lifestyles and pathogenic traits in microbes. Large-scale rearrangements in genomes are often mediated by recombination events involving new or existing copies of mobile genetic elements, recently duplicated genes, or other repetitive sequences. Most current software programs for predicting structural variation from short-read DNA resequencing data are intended primarily for use on human genomes. They typically disregard information in reads mapping to repeat sequences, and significant post-processing and manual examination of their output is often required to rule out false-positive predictions and precisely describe mutational events. Results: We have implemented an algorithm for identifying structural variation from DNA resequencing data as part of the breseq computational pipeline for predicting mutations in haploid microbial genomes. Our method evaluates the support for new sequence junctions present in a clonal sample from split-read alignments to a reference genome, including matches to repeat sequences. Then, it uses a statistical model of read coverage evenness to accept or reject these predictions. Finally, breseq combines predictions of new junctions and deleted chromosomal regions to output biologically relevant descriptions of mutations and their effects on genes. We demonstrate the performance of breseq on simulated Escherichia coli genomes with deletions generating unique breakpoint sequences, new insertions of mobile genetic elements, and deletions mediated by mobile elements. Then, we reanalyze data from an E. coli K-12 mutation accumulation evolution experiment in which structural variation was not previously identified. Transposon insertions and large-scale chromosomal changes detected by breseq account for similar to 25% of spontaneous mutations in this strain. In all cases, we find that breseq is able to reliably predict structural variation with modest read-depth coverage of the reference genome (>40-fold). Conclusions: Using breseq to predict structural variation should be useful for studies of microbial epidemiology, experimental evolution, synthetic biology, and genetics when a reference genome for a closely related strain is available. In these cases, breseq can discover mutations that may be responsible for important or unintended changes in genomes that might otherwise go undetected.U.S. National Institutes of Health R00-GM087550U.S. National Science Foundation (NSF) DEB-0515729NSF BEACON Center for the Study of Evolution in Action DBI-0939454Cancer Prevention & Research Institute of Texas (CPRIT) RP130124University of Texas at Austin startup fundsUniversity of Texas at AustinCPRIT Cancer Research TraineeshipMolecular Bioscience

    Comparative genome analysis of Wolbachia strain wAu

    Get PDF
    BACKGROUND: Wolbachia intracellular bacteria can manipulate the reproduction of their arthropod hosts, including inducing sterility between populations known as cytoplasmic incompatibility (CI). Certain strains have been identified that are unable to induce or rescue CI, including wAu from Drosophila. Genome sequencing and comparison with CI-inducing related strain wMel was undertaken in order to better understand the molecular basis of the phenotype. RESULTS: Although the genomes were broadly similar, several rearrangements were identified, particularly in the prophage regions. Many orthologous genes contained single nucleotide polymorphisms (SNPs) between the two strains, but a subset containing major differences that would likely cause inactivation in wAu were identified, including the absence of the wMel ortholog of a gene recently identified as a CI candidate in a proteomic study. The comparative analyses also focused on a family of transcriptional regulator genes implicated in CI in previous work, and revealed numerous differences between the strains, including those that would have major effects on predicted function. CONCLUSIONS: The study provides support for existing candidates and novel genes that may be involved in CI, and provides a basis for further functional studies to examine the molecular basis of the phenotype

    Evolutionary relationships in Panicoid grasses based on plastome phylogenomics (Panicoideae; Poaceae)

    Get PDF
    Background: Panicoideae are the second largest subfamily in Poaceae (grass family), with 212 genera and approximately 3316 species. Previous studies have begun to reveal relationships within the subfamily, but largely lack resolution and/or robust support for certain tribal and subtribal groups. This study aims to resolve these relationships, as well as characterize a putative mitochondrial insert in one linage. Results: 35 newly sequenced Panicoideae plastomes were combined in a phylogenomic study with 37 other species: 15 Panicoideae and 22 from outgroups. A robust Panicoideae topology largely congruent with previous studies was obtained, but with some incongruences with previously reported subtribal relationships. A mitochondrial DNA (mtDNA) to plastid DNA (ptDNA) transfer was discovered in the Paspalum lineage. Conclusions: The phylogenomic analysis returned a topology that largely supports previous studies. Five previously recognized subtribes appear on the topology to be non-monophyletic. Additionally, evidence for mtDNA to ptDNA transfer was identified in both Paspalum fimbriatum and P. dilatatum, and suggests a single rare event that took place in a common progenitor. Finally, the framework from this study can guide larger whole plastome sampling to discern the relationships in Cyperochloeae, Steyermarkochloeae, Gynerieae, and other incertae sedis taxa that are weakly supported or unresolved.Fil: Burke, Sean V.. Northern Illinois University; Estados UnidosFil: Wysocki, William P.. Northern Illinois University; Estados UnidosFil: Zuloaga, Fernando Omar. Consejo Nacional de Investigaciones Científicas y Técnicas. Instituto de Botánica Darwinion. Academia Nacional de Ciencias Exactas, Físicas y Naturales. Instituto de Botánica Darwinion; ArgentinaFil: Craine, Joseph M.. Jonah Ventures; Estados UnidosFil: Pires, J. Chris. University of Missouri; Estados UnidosFil: Edger, Patrick P.. Michigan State University; Estados UnidosFil: Mayfield Jones, Dustin. Donald Danforth Plant Science Center; Estados UnidosFil: Clark, Lynn G.. Iowa State University; Estados UnidosFil: Kelchner, Scot A.. University of Idaho; Estados UnidosFil: Duvall, Melvin R.. Northern Illinois University; Estados Unido

    Complete Sequences of Organelle Genomes from the Medicinal Plant Rhazya Stricta (Apocynaceae) and Contrasting Patterns of Mitochondrial Genome Evolution Across Asterids

    Get PDF
    Rhazya stricta is native to arid regions in South Asia and the Middle East and is used extensively in folk medicine to treat a wide range of diseases. In addition to generating genomic resources for this medicinally important plant, analyses of the complete plastid and mitochondrial genomes and a nuclear transcriptome from Rhazya provide insights into inter-compartmental transfers between genomes and the patterns of evolution among eight asterid mitochondrial genomes. Results: The 154,841 bp plastid genome is highly conserved with gene content and order identical to the ancestral organization of angiosperms. The 548,608 bp mitochondrial genome exhibits a number of phenomena including the presence of recombinogenic repeats that generate a multipartite organization, transferred DNA from the plastid and nuclear genomes, and bidirectional DNA transfers between the mitochondrion and the nucleus. The mitochondrial genes sdh3 and rps14 have been transferred to the nucleus and have acquired targeting presequences. In the case of rps14, two copies are present in the nucleus; only one has a mitochondrial targeting presequence and may be functional. Phylogenetic analyses of both nuclear and mitochondrial copies of rps14 across angiosperms suggests Rhazya has experienced a single transfer of this gene to the nucleus, followed by a duplication event. Furthermore, the phylogenetic distribution of gene losses and the high level of sequence divergence in targeting presequences suggest multiple, independent transfers of both sdh3 and rps14 across asterids. Comparative analyses of mitochondrial genomes of eight sequenced asterids indicates a complicated evolutionary history in this large angiosperm clade with considerable diversity in genome organization and size, repeat, gene and intron content, and amount of foreign DNA from the plastid and nuclear genomes. Conclusions: Organelle genomes of Rhazya stricta provide valuable information for improving the understanding of mitochondrial genome evolution among angiosperms. The genomic data have enabled a rigorous examination of the gene transfer events. Rhazya is unique among the eight sequenced asterids in the types of events that have shaped the evolution of its mitochondrial genome. Furthermore, the organelle genomes of R. stricta provide valuable genomic resources for utilizing this important medicinal plant in biotechnology applications.King Abdulaziz UniversityIntegrative Biolog

    De novo human genome assemblies reveal spectrum of alternative haplotypes in diverse populations.

    Get PDF
    The human reference genome is used extensively in modern biological research. However, a single consensus representation is inadequate to provide a universal reference structure because it is a haplotype among many in the human population. Using 10× Genomics (10×G) "Linked-Read" technology, we perform whole genome sequencing (WGS) and de novo assembly on 17 individuals across five populations. We identify 1842 breakpoint-resolved non-reference unique insertions (NUIs) that, in aggregate, add up to 2.1 Mb of so far undescribed genomic content. Among these, 64% are considered ancestral to humans since they are found in non-human primate genomes. Furthermore, 37% of the NUIs can be found in the human transcriptome and 14% likely arose from Alu-recombination-mediated deletion. Our results underline the need of a set of human reference genomes that includes a comprehensive list of alternative haplotypes to depict the complete spectrum of genetic diversity across populations
    corecore