365 research outputs found

    Minimus: a fast, lightweight genome assembler

    BACKGROUND: Genome assemblers have grown very large and complex in response to the need for algorithms that handle the challenges of large whole-genome sequencing projects. Many of the most common uses of assemblers, however, are best served by a simpler type of assembler that requires fewer software components, uses less memory, and is far easier to install and run. RESULTS: We have developed the Minimus assembler to address these issues and tested it on a range of assembly problems. We show that Minimus performs well on several small assembly tasks, including the assembly of viral genomes, individual genes, and BAC clones. In addition, we evaluate Minimus' performance in assembling bacterial genomes in order to assess its suitability as a component of a larger assembly pipeline. We show that, compared with other software currently used for these tasks, Minimus produces significantly fewer assembly errors, at the cost of generating a more fragmented assembly. CONCLUSION: We find that for small genomes and other small assembly tasks, Minimus is faster and far more flexible than existing tools. Due to its small size and modular design, Minimus is perfectly suited to be a component of complex assembly pipelines. Minimus is released as an open-source software project, and the code is available as part of the AMOS project at SourceForge.

    The driver landscape of sporadic chordoma.

    Chordoma is a malignant, often incurable bone tumour showing notochordal differentiation. Here, we defined the somatic driver landscape of 104 cases of sporadic chordoma. We reveal somatic duplications of the notochordal transcription factor brachyury (T) in up to 27% of cases. These variants recapitulate the rearrangement architecture of the pathogenic germline duplications of T that underlie familial chordoma. In addition, we find potentially clinically actionable PI3K signalling mutations in 16% of cases. Intriguingly, one of the most frequently altered genes, mutated exclusively by inactivating mutations, was LYST (10%), which may represent a novel cancer gene in chordoma. Chordoma is a rare, often incurable malignant bone tumour. Here, the authors investigate driver mutations of sporadic chordoma in 104 cases, revealing duplications in the notochordal transcription factor brachyury (T), PI3K signalling mutations, and mutations in LYST, a potential novel cancer gene in chordoma.

    ReCoil: an algorithm for compression of extremely large datasets of DNA data

    The growing volume of generated DNA sequencing data makes the problem of its long-term storage increasingly important. In this work we present ReCoil, an I/O-efficient external-memory algorithm designed for compression of very large collections of short-read DNA data. Typically, each position of a DNA sequence is covered by multiple reads of a short-read dataset, and our algorithm makes use of the resulting redundancy to achieve a high compression rate.
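    The redundancy described in the abstract above can be illustrated with a toy sketch. It does not reproduce ReCoil's external-memory algorithm, which the abstract does not describe; it simply stores each read, where possible, as a pointer to an earlier read it overlaps plus the bases that extend past the overlap. The functions `encode`/`decode` and the `min_overlap` parameter are hypothetical, and the quadratic in-memory search would not scale to real datasets.

```python
from typing import List, Tuple, Union

# Either a raw read, or (index of earlier read, overlap length, new suffix).
Encoded = Union[str, Tuple[int, int, str]]


def encode(reads: List[str], min_overlap: int = 4) -> List[Encoded]:
    """Hypothetical toy encoder: replace each read's prefix with a pointer to
    the earlier read whose suffix overlaps it the most (>= min_overlap bases)."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    out: List[Encoded] = []
    for i, read in enumerate(reads):
        best = None  # (overlap length, index of earlier read)
        for j in range(i):
            prev = reads[j]
            # Try the longest possible suffix/prefix overlap first.
            for k in range(min(len(prev), len(read)), min_overlap - 1, -1):
                if prev.endswith(read[:k]):
                    if best is None or k > best[0]:
                        best = (k, j)
                    break
        if best is None:
            out.append(read)  # no usable overlap: store the read verbatim
        else:
            k, j = best
            out.append((j, k, read[k:]))  # store only the non-overlapping tail
    return out


def decode(encoded: List[Encoded]) -> List[str]:
    """Rebuild reads in order; pointers always refer to already-decoded reads."""
    reads: List[str] = []
    for item in encoded:
        if isinstance(item, str):
            reads.append(item)
        else:
            j, k, suffix = item
            reads.append(reads[j][-k:] + suffix)
    return reads


reads = ["ACGTACGG", "ACGGTCAG", "TCAGTTGC"]
encoded = encode(reads)
print(encoded)  # ['ACGTACGG', (0, 4, 'TCAG'), (1, 4, 'TTGC')]
assert decode(encoded) == reads
```

    For these three overlapping 8-nt reads, only 16 bases are stored explicitly instead of 24 (plus two small pointers), and decoding restores the original reads exactly. Higher coverage means more overlap and therefore more savings, which is the intuition behind exploiting read redundancy.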

    Gene-Boosted Assembly of a Novel Bacterial Genome from Very Short Reads

    Recent improvements in technology have made DNA sequencing dramatically faster and more efficient than ever before. The new technologies produce highly accurate sequences, but one drawback is that the most efficient technology produces the shortest read lengths. Short-read sequencing has been applied successfully to resequence the human genome and those of other species, but not to whole-genome sequencing of novel organisms. Here we describe the sequencing and assembly of a novel clinical isolate of Pseudomonas aeruginosa, strain PAb1, using very short read technology. From 8,627,900 reads, each 33 nucleotides in length, we assembled the genome into one scaffold of 76 ordered contiguous sequences containing 6,290,005 nucleotides, including one contig spanning 512,638 nucleotides, plus an additional 436 unordered contigs containing 416,897 nucleotides. Our method includes a novel gene-boosting algorithm that uses amino acid sequences from predicted proteins to build a better assembly. This study demonstrates the feasibility of very short read sequencing for the sequencing of bacterial genomes, particularly those for which a related species has been sequenced previously, and expands the potential application of this new technology to most known prokaryotic species.
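    As a rough sanity check on the read numbers quoted in the abstract above, the average sequencing depth is total read bases divided by assembled length. This sketch assumes the assembled total (ordered scaffold plus unordered contigs) approximates the genome size; the abstract itself does not state a coverage figure.

```python
# Figures taken from the abstract above.
n_reads = 8_627_900
read_len = 33  # nucleotides per read
scaffold_nt = 6_290_005
unordered_nt = 416_897

total_read_nt = n_reads * read_len
# Assumption: the assembled length approximates the true genome size.
assembled_nt = scaffold_nt + unordered_nt
coverage = total_read_nt / assembled_nt

print(f"{total_read_nt:,} read bases over {assembled_nt:,} nt ~ {coverage:.1f}x")
# 284,720,700 read bases over 6,706,902 nt ~ 42.5x
```

    Under that assumption, each position is covered by roughly 42 reads on average.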

    Selection of Metal-poor Giant Stars Using the Sloan Digital Sky Survey Photometric System

    We present a method for photometric selection of metal-poor halo giants from the imaging data of the Sloan Digital Sky Survey (SDSS). These stars are offset from the stellar locus in the (g-r) vs. (u-g) color-color diagram. Based on a sample of 29 candidates for which spectra were taken, we derive a selection efficiency of the order of 50% for stars brighter than r ∼ 17 mag. The candidates selected in 400 deg² of sky from the SDSS Early Data Release trace the known halo structures (tidal streams from the Sagittarius dwarf galaxy, the Draco dwarf spheroidal galaxy), indicating that such a color-selected sample can be used to study the halo structure even without spectroscopic information. This method, and supplemental techniques for selecting halo stars, such as RR Lyrae stars and other blue horizontal branch stars, can produce an unprecedented three-dimensional map of the Galactic halo based on the SDSS imaging survey. Comment: 8 pages, 3 figures, 1 table. Accepted by Ap
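    The selection described in the abstract above (stars offset from the stellar locus in the (g-r) vs. (u-g) diagram and brighter than r ∼ 17) can be sketched as a simple color cut. The locus is approximated here by a straight line, and its slope, intercept, and the offset threshold are placeholder values, not the cuts used in the paper; the sketch also assumes that candidates lie on the smaller-(u-g) side of the locus.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Star:
    u: float  # SDSS u magnitude
    g: float  # SDSS g magnitude
    r: float  # SDSS r magnitude


# Placeholder parameters (hypothetical, NOT the published selection box).
LOCUS_SLOPE = 2.0      # d(u-g)/d(g-r) of a straight-line stand-in for the locus
LOCUS_INTERCEPT = 0.3  # (u-g) of that line at g-r = 0
MIN_OFFSET = 0.2       # required offset below the locus, in magnitudes
R_LIMIT = 17.0         # brightness limit quoted in the abstract (r ~ 17)


def locus_ug(gr: float) -> float:
    """(u-g) of the assumed straight-line stellar locus at a given (g-r)."""
    return LOCUS_SLOPE * gr + LOCUS_INTERCEPT


def is_candidate(star: Star) -> bool:
    gr = star.g - star.r
    ug = star.u - star.g
    offset = ug - locus_ug(gr)  # negative = bluer in (u-g) than the locus
    return star.r < R_LIMIT and offset < -MIN_OFFSET


def select_candidates(stars: List[Star]) -> List[Star]:
    return [s for s in stars if is_candidate(s)]


stars = [
    Star(u=19.0, g=17.5, r=16.9),  # on the locus: rejected
    Star(u=18.6, g=17.5, r=16.9),  # 0.4 mag below the locus: kept
    Star(u=19.6, g=18.5, r=17.9),  # offset, but fainter than r = 17: rejected
]
print(len(select_candidates(stars)))  # 1
```

    A real selection would replace the straight line with a fitted locus and calibrate the threshold against spectroscopic follow-up, as the 29-candidate sample in the abstract was used to estimate efficiency.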

    Towards an Intelligent Tutor for Mathematical Proofs

    Computer-supported learning is an increasingly important form of study since it allows for independent learning and individualized instruction. In this paper, we discuss a novel approach to developing an intelligent tutoring system for teaching textbook-style mathematical proofs. We characterize the particularities of the domain and discuss common ITS design models. Our approach is motivated by phenomena found in a corpus of tutorial dialogs that were collected in a Wizard-of-Oz experiment. We show how an intelligent tutor for textbook-style mathematical proofs can be built on top of an adapted assertion-level proof assistant by reusing representations and proof search strategies originally developed for automated and interactive theorem proving. The resulting prototype was successfully evaluated on a corpus of tutorial dialogs and yielded good results. Comment: In Proceedings THedu'11, arXiv:1202.453

    Genome sequence and rapid evolution of the rice pathogen Xanthomonas oryzae pv. oryzae PXO99A

    Background: Xanthomonas oryzae pv. oryzae causes bacterial blight of rice (Oryza sativa L.), a major disease that constrains production of this staple crop in many parts of the world. We report here on the complete genome sequence of strain PXO99A and its comparison to two previously sequenced strains, KACC10331 and MAFF311018, which are highly similar to one another. Results: The PXO99A genome is a single circular chromosome of 5,240,075 bp, considerably longer than the genomes of the other strains (4,941,439 bp and 4,940,217 bp, respectively), and it contains 5083 protein-coding genes, including 87 not found in KACC10331 or MAFF311018. PXO99A contains a greater number of virulence-associated transcription activator-like effector genes and has at least ten major chromosomal rearrangements relative to KACC10331 and MAFF311018. PXO99A contains numerous copies of diverse insertion sequence elements, members of which are associated with 7 out of 10 of the major rearrangements. A rapidly evolving CRISPR (clustered regularly interspaced short palindromic repeats) region contains evidence of dozens of phage infections unique to the PXO99A lineage. PXO99A also contains a unique, near-perfect tandem repeat of 212 kilobases close to the replication terminus. Conclusion: Our results provide striking evidence of genome plasticity and rapid evolution within Xanthomonas oryzae pv. oryzae. The comparisons point to sources of genomic variation and candidates for strain-specific adaptations of this pathogen that help to explain the extraordinary diversity of Xanthomonas oryzae pv. oryzae genotypes and races that have been isolated from around the world. © 2008 Salzberg et al; licensee BioMed Central Ltd

    Revisiting Brain Atrophy and Its Relationship to Disability in Multiple Sclerosis

    Brain atrophy is a well-accepted imaging biomarker of multiple sclerosis (MS) that partially correlates with both physical disability and cognitive impairment. Based on MRI scans of 60 MS cases and 37 healthy volunteers, we measured the volumes of white matter (WM) lesions, cortical gray matter (GM), cerebral WM, caudate nucleus, putamen, thalamus, ventricles, and brainstem using a validated and completely automated segmentation method. We correlated these volumes with the Expanded Disability Status Scale (EDSS), MS Severity Scale (MSSS), MS Functional Composite (MSFC), and quantitative measures of ankle strength and toe sensation. Normalized volumes of both cortical and subcortical GM structures were abnormally low in the MS group, whereas no abnormality was found in the volume of the cerebral WM. High physical disability was associated with low cerebral WM, thalamus, and brainstem volumes (partial correlation coefficients ~0.3-0.4) but not with low cortical GM volume. Thalamus volumes were inversely correlated with lesion load (r = -0.36, p < 0.005). The GM is atrophic in MS. Although lower WM volume is associated with greater disability, as might be expected, WM volume was on average in the normal range. This paradoxical result might be explained by the presence of coexisting pathological processes, such as tissue damage and repair, that cause both atrophy and hypertrophy and that underlie the observed disability.
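    The partial correlation coefficients quoted in the abstract above measure the association between a regional volume and a disability score after removing the linear effect of other variables. The abstract does not say which covariates were controlled for, so this sketch uses a single generic covariate z and the standard first-order formula r_xy.z = (r_xy - r_xz*r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)). The data are synthetic and the function names are illustrative.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x: Sequence[float], y: Sequence[float], z: Sequence[float]) -> float:
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Synthetic example: x and y correlate only because both depend on z.
z = [1.0, -1.0, 1.0, -1.0]
x = [2.0, 0.0, 0.0, -2.0]   # z + [1, 1, -1, -1]
y = [2.0, -2.0, 0.0, 0.0]   # z + [1, -1, -1, 1]
print(round(pearson(x, y), 3))  # 0.5
print(partial_corr(x, y, z))    # ~0, up to floating-point rounding
```

    Here x and y share only their dependence on z, so their raw correlation of 0.5 vanishes once z is partialled out; a nonzero partial correlation, like those reported for WM, thalamus, and brainstem volumes, indicates association beyond the controlled covariates.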