Compressing DNA sequence databases with coil
Background: Publicly available DNA sequence databases such as GenBank are large, and are
growing at an exponential rate. The sheer volume of data being dealt with presents serious storage
and data communications problems. Currently, sequence data is usually kept in large "flat files,"
which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which
rarely achieves good compression ratios. While much research has been done on compressing
individual DNA sequences, surprisingly little has focused on the compression of entire databases
of such sequences. In this study we introduce the sequence database compression software coil.
Results: We have designed and implemented a portable software package, coil, for compressing
and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared
towards achieving high compression ratios at the expense of execution time and memory usage
during compression – the compression time represents a "one-off investment" whose cost is
quickly amortised if the resulting compressed file is transmitted many times. Decompression
requires little memory and is extremely fast. We demonstrate a 5% improvement in compression
ratio over state-of-the-art general-purpose compression tools for a large GenBank database file
containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental
additions to a sequence database.
Conclusion: coil presents a compelling alternative to conventional compression of flat files for the
storage and distribution of DNA sequence databases having a narrow distribution of sequence
lengths, such as EST data. Increasing compression levels for databases having a wide distribution of
sequence lengths is a direction for future work.
The strength and timing of the mitochondrial bottleneck in salmon suggests a conserved mechanism in vertebrates
In most species mitochondrial DNA (mtDNA) is inherited maternally in an apparently clonal fashion, although how this is achieved remains uncertain. Population genetic studies show not only that individuals can harbor more than one type of mtDNA (heteroplasmy) but that heteroplasmy is common and widespread across a diversity of taxa. Females harboring a mixture of mtDNAs may transmit varying proportions of each mtDNA type (haplotype) to their offspring. However, mtDNA variants are also observed to segregate rapidly between generations despite the high mtDNA copy number in the oocyte, which suggests that a genetic bottleneck acts during mtDNA transmission. Understanding the size and timing of this bottleneck is important for interpreting population genetic relationships and for predicting the inheritance of mtDNA-based disease, but the underlying mechanisms remain unclear. Empirical studies, restricted to mice, have shown that the mtDNA bottleneck could act at embryogenesis, oogenesis, or both. To investigate whether the size and timing of the mitochondrial bottleneck are conserved between distant vertebrates, we measured the genetic variance in mtDNA heteroplasmy at three developmental stages (female, ova and fry) in chinook salmon and applied a new mathematical model to estimate the number of segregating units (Ne) of the mitochondrial bottleneck between each stage. Using these data we estimate values for mtDNA Ne of 88.3 for oogenesis and 80.3 for embryogenesis. Our results confirm the presence of a mitochondrial bottleneck in fish, and show that segregation of mtDNA variation is effectively complete by the end of oogenesis. Considering the extensive differences in reproductive physiology between fish and mammals, our results suggest that the mechanism underlying the mtDNA bottleneck is conserved in these distant vertebrates in terms of both its magnitude and its timing.
This finding may lead to improvements in our understanding of mitochondrial disorders and population interpretations using mtDNA data.
Color Breaking Baryogenesis
We propose a scenario that generates the observed baryon asymmetry of the
Universe through a multi-step phase transition in which SU(3) color symmetry
is first broken and then restored. A spontaneous violation of $B-L$
conservation leads to a contribution to the baryon asymmetry that becomes
negligible in the final phase. The baryon asymmetry is therefore produced
exclusively through the electroweak mechanism in the intermediate phase. We
illustrate this scenario with a simple model that reproduces the observed
baryon asymmetry. We discuss how future electric dipole moment and collider
searches may probe this scenario, though future EDM searches would require an
improved sensitivity of several orders of magnitude.
The impact of origin region and internal migration on Italian fertility
We examine the impact of population distribution on fertility in a nationally representative sample. We exploit detailed life-history data to conduct an event-history analysis of transition to first birth, examining mechanisms that might link migration and fertility: socialization, adaptation, selection, and disruption. Our multivariate analysis examines various socio-demographic traits, the place of birth, and interregional migration. Differences by region and migration stream are partly explained by compositional factors, such as female employment, union type, and education. The analysis presents much evidence for demographic selection and socialization and less for adaptation or disruption. The persistence of the region of origin differentials points to the continuing importance of the context.
Keywords: adaptations, event history analysis, fertility, international migration, selection