    Compressing DNA sequence databases with coil

    Background: Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results: We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion: coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.

    The strength and timing of the mitochondrial bottleneck in salmon suggests a conserved mechanism in vertebrates

    In most species mitochondrial DNA (mtDNA) is inherited maternally in an apparently clonal fashion, although how this is achieved remains uncertain. Population genetic studies show not only that individuals can harbor more than one type of mtDNA (heteroplasmy) but that heteroplasmy is common and widespread across a diversity of taxa. Females harboring a mixture of mtDNAs may transmit varying proportions of each mtDNA type (haplotype) to their offspring. However, mtDNA variants are also observed to segregate rapidly between generations despite the high mtDNA copy number in the oocyte, which suggests a genetic bottleneck acts during mtDNA transmission. Understanding the size and timing of this bottleneck is important for interpreting population genetic relationships and for predicting the inheritance of mtDNA-based disease, but despite its importance the underlying mechanisms remain unclear. Empirical studies, restricted to mice, have shown that the mtDNA bottleneck could act at embryogenesis, oogenesis, or both. To investigate whether the size and timing of the mitochondrial bottleneck are conserved between distant vertebrates, we measured the genetic variance in mtDNA heteroplasmy at three developmental stages (female, ova and fry) in chinook salmon and applied a new mathematical model to estimate the number of segregating units (Ne) of the mitochondrial bottleneck between each stage. Using these data we estimate values for mtDNA Ne of 88.3 for oogenesis, and 80.3 for embryogenesis. Our results confirm the presence of a mitochondrial bottleneck in fish, and show that segregation of mtDNA variation is effectively complete by the end of oogenesis. Considering the extensive differences in reproductive physiology between fish and mammals, our results suggest the mechanism underlying the mtDNA bottleneck is conserved in these distant vertebrates both in terms of its magnitude and timing. This finding may lead to improvements in our understanding of mitochondrial disorders and population interpretations using mtDNA data.

    Color Breaking Baryogenesis

    We propose a scenario that generates the observed baryon asymmetry of the Universe through a multi-step phase transition in which SU(3) color symmetry is first broken and then restored. A spontaneous violation of B−L conservation leads to a contribution to the baryon asymmetry that becomes negligible in the final phase. The baryon asymmetry is therefore produced exclusively through the electroweak mechanism in the intermediate phase. We illustrate this scenario with a simple model that reproduces the observed baryon asymmetry. We discuss how future electric dipole moment and collider searches may probe this scenario, though future EDM searches would require an improved sensitivity of several orders of magnitude.

    The impact of origin region and internal migration on Italian fertility

    We examine the impact of population distribution on fertility in a nationally representative sample. We exploit detailed life-history data to conduct an event-history analysis of transition to first birth, examining mechanisms that might link migration and fertility: socialization, adaptation, selection, and disruption. Our multivariate analysis examines various socio-demographic traits, the place of birth, and interregional migration. Differences by region and migration stream are partly explained by compositional factors, such as female employment, union type, and education. The analysis presents much evidence for demographic selection and socialization and less for adaptation or disruption. The persistence of the region of origin differentials points to the continuing importance of the context. Keywords: adaptations, event history analysis, fertility, international migration, selection

    Iphigenie auf Tauris, Torquato Tasso, and the imagery of character


    Homo Laborans: Work in Modern Catholic Social Thought

