
    Compressing DNA sequence databases with coil

    Background: Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil.
    Results: We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database.
    Conclusion: coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.

    France and the Bretton Woods International Monetary System: 1960-1968

    We reinterpret the commonly held view in the U.S. that France, by following a policy from 1965 to 1968 of deliberately converting its dollar holdings into gold, helped perpetuate the collapse of the Bretton Woods International Monetary System. We argue that French international monetary policy under Charles de Gaulle was consistent with strategies developed in the interwar period and the French Plan of 1943. France used proposals to return to an orthodox gold standard, as well as conversions of its dollar reserves into gold, as tactical threats to induce the United States to initiate the reform of the international monetary system towards a more symmetrical and cooperative gold-exchange standard regime.

    Legal Restraints and Responses to the Allocation and Distribution of Water


    A Guide to the Examination of Water Tabulations


    What Will It Take for Bank Insurance to Succeed in the United States


    Target enrichment of ultraconserved elements from arthropods provides a genomic perspective on relationships among Hymenoptera

    Gaining a genomic perspective on phylogeny requires the collection of data from many putatively independent loci collected across the genome. Among insects, an increasingly common approach to collecting this class of data involves transcriptome sequencing, because few insects have high-quality genome sequences available; assembling new genomes remains a limiting factor; the transcribed portion of the genome is a reasonable, reduced subset of the genome to target; and the data collected from transcribed portions of the genome are similar in composition to the types of data with which biologists have traditionally worked (e.g., exons). However, molecular techniques requiring RNA as a template are limited to using very high quality source materials, which are often unavailable from a large proportion of biologically important insect samples. Recent research suggests that DNA-based target enrichment of conserved genomic elements offers another path to collecting phylogenomic data across insect taxa, provided that conserved elements are present in and can be collected from insect genomes. Here, we identify a large set (n = 1510) of ultraconserved elements (UCE) shared among the insect order Hymenoptera. We use in silico analyses to show that these loci accurately reconstruct relationships among genome-enabled Hymenoptera, and we design a set of baits for enriching these loci that researchers can use with DNA templates extracted from a variety of sources. We use our UCE bait set to enrich an average of 721 UCE loci from 30 hymenopteran taxa, and we use these UCE loci to reconstruct phylogenetic relationships spanning very old (≥220 MYA) to very young (≤1 MYA) divergences among hymenopteran lineages. In contrast to a recent study addressing hymenopteran phylogeny using transcriptome data, we found ants to be sister to all remaining aculeate lineages with complete support.