12 research outputs found

    A space-efficient construction of the Burrows-Wheeler transform for genomic data

    Algorithms for exact string matching have substantial application in computational biology. Time-efficient data structures that support a variety of exact string matching queries, such as the suffix tree and the suffix array, have been applied to such problems. As sequence databases grow, space-efficient approaches to exact matching are becoming more important. One such data structure, the compressed suffix array (CSA), based on the Burrows-Wheeler transform, has been shown to require memory nearly equal to that of the original database while supporting common queries time-efficiently. However, building a CSA from a sequence in efficient space and time is challenging. In 2002, the first space-efficient CSA construction algorithm was presented. That implementation used (1 + 2 log_2 |Σ|)(1 + ε) bits per character (where ε is a small fraction). The construction algorithm ran in as much as twice that space, in O(|Σ| n log n) time. We have created an implementation which can also achieve these asymptotic bounds but, for small alphabets, uses only (1/2)(1 + |Σ|)(1 + ε) bits per character, a factor of 2 less space for nucleotide alphabets. We present time and space results for the CSA construction and querying of our implementation on publicly available genome data, which demonstrate the practicality of this approach.
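As a rough illustration of the space figures in this abstract, the sketch below evaluates both per-character bounds for a nucleotide alphabet (|Σ| = 4). It assumes the small-alphabet bound reads (1/2)(1 + |Σ|)(1 + ε), which is the reading consistent with the stated factor of 2 for nucleotides. The function names are hypothetical, and `naive_bwt` is a textbook rotation-sorting construction included only to show what the transform is; it is not the space-efficient algorithm the abstract describes.

```python
import math


def bits_per_char_2002(sigma: int, eps: float = 0.0) -> float:
    """Bound quoted for the 2002 implementation: (1 + 2 log2 |Sigma|)(1 + eps)."""
    return (1 + 2 * math.log2(sigma)) * (1 + eps)


def bits_per_char_small_alphabet(sigma: int, eps: float = 0.0) -> float:
    """Assumed reading of the small-alphabet bound: (1/2)(1 + |Sigma|)(1 + eps)."""
    return 0.5 * (1 + sigma) * (1 + eps)


def naive_bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform by sorting all rotations.

    Illustration only: this needs quadratic memory, unlike a CSA construction.
    The sentinel must sort before every character of `text`.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


if __name__ == "__main__":
    sigma = 4  # nucleotide alphabet {A, C, G, T}
    old = bits_per_char_2002(sigma)
    new = bits_per_char_small_alphabet(sigma)
    print(f"2002 bound: {old} bits/char, small-alphabet bound: {new} bits/char")
    print(f"ratio: {old / new}")
    print(naive_bwt("GATTACA"))
```

With ε set to zero, the two bounds for |Σ| = 4 come out to 5 and 2.5 bits per character, which matches the factor of 2 claimed for nucleotide alphabets.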

    PARAMESH: A Parallel Adaptive Mesh Refinement Community Toolkit

    In this paper, we describe a community toolkit which is designed to provide parallel support with adaptive mesh capability for a large and important class of computational models, those using structured, logically Cartesian meshes. The package of Fortran 90 subroutines, called PARAMESH, is designed to provide an application developer with an easy route to extend an existing serial code which uses a logically Cartesian structured mesh into a parallel code with adaptive mesh refinement. Alternatively, in its simplest use, and with minimal effort, it can operate as a domain decomposition tool for users who want to parallelize their serial codes but who do not wish to use adaptivity. The package can provide them with an incremental evolutionary path for their code, converting it first to uniformly refined parallel code and then later, if they so desire, adding adaptivity.

    A Whole-Genome Assembly of Drosophila

    We report on the quality of a whole-genome assembly of Drosophila melanogaster and the nature of the computer algorithms that accomplished it. Three independent external data sources essentially agree with and support the assembly’s sequence and ordering of contigs across the euchromatic portion of the genome. In addition, there are isolated contigs that we believe represent nonrepetitive pockets within the heterochromatin of the centromeres. Comparison with a previously sequenced 2.9-megabase region indicates that sequencing accuracy within nonrepetitive segments is greater than 99.99% without manual curation. As such, this initial reconstruction of the Drosophila sequence should be of substantial value to the scientific community. The primary obstacle to determining the sequence of a very large genome is that, with current technology, one can directly determine the sequence of at most a thousan

    Whole-genome shotgun assembly and comparison of human genome assemblies

    We report a whole-genome shotgun assembly (called WGSA) of the human genome generated at Celera in 2001. The Celera-generated shotgun data set consisted of 27 million sequencing reads organized in pairs by virtue of end-sequencing 2-kbp, 10-kbp, and 50-kbp inserts from shotgun clone libraries. The quality-trimmed reads covered the genome 5.3 times, and the inserts from which pairs of reads were obtained covered the genome 39 times. With the nearly complete human DNA sequence [National Center for Biotechnology Information (NCBI) Build 34] now available, it is possible to directly assess the quality, accuracy, and completeness of WGSA and of the first reconstructions of the human genome reported in two landmark papers in February 2001 [Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., et al. (2001) Science 291, 1304–1351; International Human Genome Sequencing Consortium (2001) Nature 409, 860–921]. The analysis of WGSA shows 97% order and orientation agreement with NCBI Build 34, where most of the 3% of sequence out of order is due to scaffold placement problems as opposed to assembly errors within the scaffolds themselves. In addition, WGSA fills some of the remaining gaps in NCBI Build 34. The early genome sequences all covered about the same amount of the genome, but they did so in different ways. The Celera results provide more order and orientation, and the consortium sequence provides better coverage of exact and nearly exact repeats.

    Factors influencing the density of aerobic granular sludge.

    In the present study, the factors influencing the density of granular sludge particles were evaluated. Granules consist of microbes, precipitates and extracellular polymeric substances. The volume fractions of the bacterial layers were experimentally estimated by fluorescent in situ hybridisation staining. The volume fraction occupied by precipitates was determined by computed tomography scanning. PHREEQC was used to estimate potential formation of precipitates to determine a density of the inorganic fraction. Densities of bacteria were investigated by Percoll density centrifugation. The volume fractions were then coupled with the corresponding densities and the total density of a granule was calculated. The sensitivity of the settling velocity to the density of the entire granule was evaluated by changing the volume fractions of precipitates or bacteria in a settling model. Results from granules originating from a Nereda reactor for simultaneous phosphate, COD and nitrogen removal revealed that phosphate-accumulating organisms (PAOs) had a higher density than glycogen-accumulating organisms, leading to significantly higher settling velocities for PAO-dominated granules and explaining earlier observations of the segregation of the granular sludge bed inside reactors. The model showed that a small increase in the volume fraction of precipitates (1-5%) strongly increased the granular density and thereby the settling velocity. For nitritation-anammox granular sludge, differences in granule diameter, rather than in density, are the main cause of segregation of the biomass in the bed.
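The coupling of volume fractions with component densities described in this abstract can be pictured as a volume-weighted average. The sketch below is a minimal illustration under loud assumptions: every numeric value is a hypothetical placeholder and not a measurement from the study, and Stokes' law stands in for the settling model, which the abstract does not name. Millimetre-scale granules generally fall outside the strict Stokes regime, so only the direction of the trend is meaningful here.

```python
def granule_density(fractions: dict, densities: dict) -> float:
    """Volume-weighted mean density (kg/m^3); fractions must sum to 1."""
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("volume fractions must sum to 1")
    return sum(fractions[name] * densities[name] for name in fractions)


def stokes_velocity(diameter_m: float, rho_particle: float,
                    rho_fluid: float = 1000.0, viscosity: float = 1.0e-3,
                    g: float = 9.81) -> float:
    """Stokes settling velocity (m/s); an assumed stand-in settling model."""
    return g * diameter_m ** 2 * (rho_particle - rho_fluid) / (18.0 * viscosity)


if __name__ == "__main__":
    # Hypothetical component densities in kg/m^3, chosen only for illustration.
    densities = {"bacteria": 1050.0, "eps": 1010.0, "precipitates": 2500.0}
    for precipitate_fraction in (0.01, 0.05):
        fractions = {
            "bacteria": 0.60,
            "eps": 0.40 - precipitate_fraction,
            "precipitates": precipitate_fraction,
        }
        rho = granule_density(fractions, densities)
        v = stokes_velocity(1.0e-3, rho)  # hypothetical 1 mm granule
        print(f"precipitates {precipitate_fraction:.0%}: "
              f"density {rho:.1f} kg/m^3, velocity {v:.4f} m/s")
```

Because the settling velocity in this stand-in model scales with the difference between granule and fluid density, a few percent of a dense inorganic phase shifts the result noticeably, which is the qualitative effect the abstract reports.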

    The genome sequence of the malaria mosquito Anopheles gambiae

    Anopheles gambiae is the principal vector of malaria, a disease that afflicts more than 500 million people and causes more than 1 million deaths each year. Tenfold shotgun sequence coverage was obtained from the PEST strain of A. gambiae and assembled into scaffolds that span 278 million base pairs. A total of 91% of the genome was organized in 303 scaffolds; the largest scaffold was 23.1 million base pairs. There was substantial genetic variation within this strain, and the apparent existence of two haplotypes of approximately equal frequency ("dual haplotypes") in a substantial fraction of the genome likely reflects the outbred nature of the PEST strain. The sequence produced a conservative inference of more than 400,000 single-nucleotide polymorphisms that showed a markedly bimodal density distribution. Analysis of the genome sequence revealed strong evidence for about 14,000 protein-encoding transcripts. Prominent expansions in specific families of proteins likely involved in cell adhesion and immunity were noted. An expressed sequence tag analysis of genes regulated by blood feeding provided insights into the physiological adaptations of a hematophagous insect.