78 research outputs found

    A compartmentalized approach to the assembly of physical maps

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Physical maps have been historically one of the cornerstones of genome sequencing and map-based cloning strategies. They also support marker assisted breeding and EST mapping. The problem of building a high quality physical map is computationally challenging due to unavoidable noise in the input fingerprint data.</p> <p>Results</p> <p>We propose a novel compartmentalized method for the assembly of high quality physical maps from fingerprinted clones. The knowledge of genetic markers enables us to group clones into clusters so that clones in the same cluster are more likely to overlap. For each cluster of clones, a local physical map is first constructed using FingerPrinted Contigs (FPC). Then, all the individual maps are carefully merged into the final physical map. Experimental results on the genomes of rice and barley demonstrate that the compartmentalized assembly produces significantly more accurate maps, and that it can detect and isolate clones that would induce "chimeric" contigs if used in the final assembly.</p> <p>Conclusion</p> <p>The software is available for download at <url>http://www.cs.ucr.edu/~sbozdag/assembler/</url></p

    The Sequence of the Human Genome

    Get PDF
    A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly strategies—a whole-genome assembly and a regional chromosome assembly—were used, each combining sequence data from Celera and the publicly funded genome effort. The public data were shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly funded group. This brought the effective cov- erage in the assemblies to eightfold, reducing the number and size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two assembly strategies yielded very similar results that largely agree with independent mapping data. The assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for which there was strong corroborating evidence and an additiona

    BACCardI - a tool for the validation of genomic assemblies, assisting genome finishing and intergenome comparison

    Get PDF
    Bartels D, Kespohl S, Albaum S, et al. BACCardI - a tool for the validation of genomic assemblies, assisting genome finishing and intergenome comparison. Bioinformatics. 2005;21(7):853-859.Summary: We provide the graphical tool BACCardI for the construction of virtual clone maps from standard assembler output files or BLAST based sequence comparisons. This new tool has been applied to numerous genome projects to solve various problems including (a) validation of whole genome shotgun assemblies, (b) support for contig ordering in the finishing phase of a genome project, and (c) intergenome comparison between related strains when only one of the strains has been sequenced and a large insert library is available for the other. The BACCardI software can seamlessly interact with various sequence assembly packages. Motivation: Genomic assemblies generated from sequence information need to be validated by independent methods such as physical maps. The time-consuming task of building physical maps can be circumvented by virtual clone maps derived from read pair information of large insert libraries

    Application of a superword array in genome assembly

    Get PDF
    We introduce a data structure called a superword array for finding quickly matches between DNA sequences. The superword array possesses some desirable features of the lookup table and suffix array. We describe simple algorithms for constructing and using a superword array to find pairs of sequences that share a unique superword. The algorithms are implemented in a genome assembly program called PCAP.REP for computation of overlaps between reads. Experimental results produced by PCAP.REP and PCAP on a whole-genome dataset show that PCAP.REP produced a more accurate and contiguous assembly than PCAP

    Development and evaluation of an enterovirus D68 real-time reverse transcriptase PCR assay

    Get PDF
    We have developed and evaluated a real-time reverse transcriptase PCR (RT-PCR) assay for the detection of human enterovirus D68 (EV-D68) in clinical specimens. This assay was developed in response to the unprecedented 2014 nationwide EV-D68 outbreak in the United States associated with severe respiratory illness. As part of our evaluation of the outbreak, we sequenced and published the genome sequence of the EV-D68 virus circulating in St. Louis, MO. This sequence, along with other GenBank sequences from past EV-D68 occurrences, was used to computationally select a region of EV-D68 appropriate for targeting in a strain-specific RT-PCR assay. The RT-PCR assay amplifies a segment of the VP1 gene, with an analytic limit of detection of 4 copies per reaction, and it was more sensitive than commercially available assays that detect enteroviruses and rhinoviruses without distinguishing between the two, including three multiplex respiratory panels approved for clinical use by the FDA. The assay did not detect any other enteroviruses or rhinoviruses tested and did detect divergent strains of EV-D68, including the first EV-D68 strain (Fermon) identified in California in 1962. This assay should be useful for identifying and studying current and future outbreaks of EV-D68 viruses

    A Graph-Theoretical Approach to the Selection of the Minimum Tiling Path from a Physical Map

    Get PDF
    The problem of computing the minimum tiling path (MTP) from a set of clones arranged in a physical map is a cornerstone of hierarchical (clone-by-clone) genome sequencing projects. We formulate this problem in a graph theoretical framework, and then solve by a combination of minimum hitting set and minimum spanning tree algorithms. The tool implementing this strategy, called FMTP, shows improved performance compared to the widely used software FPC. When we execute FMTP and FPC on the same physical map, the MTP produced by FMTP covers a higher portion of the genome, and uses a smaller number of clones. For instance, on the rice genome the MTP produced by our tool would reduce by about 11 percent the cost of a clone-by-clone sequencing project. Source code, benchmark data sets, and documentation of FMTP are freely available at \u3ehttp://code.google.com/p/fingerprint-based-minimal-tiling-path/ under MIT license

    Novel Plasmids and Resistance Phenotypes in Yersinia pestis: Unique Plasmid Inventory of Strain Java 9 Mediates High Levels of Arsenic Resistance

    Get PDF
    Growing evidence suggests that the plasmid repertoire of Yersinia pestis is not restricted to the three classical virulence plasmids. The Java 9 strain of Y. pestis is a biovar Orientalis isolate obtained from a rat in Indonesia. Although it lacks the Y. pestis-specific plasmid pMT, which encodes the F1 capsule, it retains virulence in mouse and non-human primate animal models. While comparing diverse Y. pestis strains using subtractive hybridization, we identified sequences in Java 9 that were homologous to a Y. enterocolitica strain carrying the transposon Tn2502, which is known to encode arsenic resistance. Here we demonstrate that Java 9 exhibits high levels of arsenic and arsenite resistance mediated by a novel promiscuous class II transposon, named Tn2503. Arsenic resistance was self-transmissible from Java 9 to other Y. pestis strains via conjugation. Genomic analysis of the atypical plasmid inventory of Java 9 identified pCD and pPCP plasmids of atypical size and two previously uncharacterized cryptic plasmids. Unlike the Tn2502-mediated arsenic resistance encoded on the Y. enterocolitica virulence plasmid; the resistance loci in Java 9 are found on all four indigenous plasmids, including the two novel cryptic plasmids. This unique mobilome introduces more than 105 genes into the species gene pool. The majority of these are encoded by the two entirely novel self-transmissible plasmids, which show partial homology and synteny to other enterics. In contrast to the reductive evolution in Y. pestis, this study underlines the major impact of a dynamic mobilome and lateral acquisition in the genome evolution of the plague bacterium

    Assembly complexity of prokaryotic genomes using short reads

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>De Bruijn graphs are a theoretical framework underlying several modern genome assembly programs, especially those that deal with very short reads. We describe an application of de Bruijn graphs to analyze the global repeat structure of prokaryotic genomes.</p> <p>Results</p> <p>We provide the first survey of the repeat structure of a large number of genomes. The analysis gives an upper-bound on the performance of genome assemblers for <it>de novo </it>reconstruction of genomes across a wide range of read lengths. Further, we demonstrate that the majority of genes in prokaryotic genomes can be reconstructed uniquely using very short reads even if the genomes themselves cannot. The non-reconstructible genes are overwhelmingly related to mobile elements (transposons, IS elements, and prophages).</p> <p>Conclusions</p> <p>Our results improve upon previous studies on the feasibility of assembly with short reads and provide a comprehensive benchmark against which to compare the performance of the short-read assemblers currently being developed.</p
    • 

    corecore