Characterization of DNA methylation as a function of biological complexity via dinucleotide inter-distances
We perform a statistical study of the distances between successive
occurrences of a given dinucleotide in the DNA sequence for a number of
organisms of different complexity. Our analysis highlights peculiar features of
the dinucleotide CG distribution in mammalian DNA, pointing towards a
connection with the role of such dinucleotide in DNA methylation. While the CG
distributions of mammals exhibit exponential tails with comparable parameters,
the picture for the other organisms studied (e.g., fish, insects, bacteria and
viruses) is more heterogeneous, possibly because in these organisms DNA
methylation has different functional roles. Our analysis suggests that the
distribution of the distances between dinucleotides CG provides useful insights
in characterizing and classifying organisms in terms of methylation
functionalities. Comment: 13 pages, 5 figures. To be published in the Philosophical
Transactions A theme issue "DNA as information"
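The inter-distance statistic described in this abstract can be illustrated with a minimal Python sketch. It assumes a plain single-strand sequence string, measures distance as the difference between the start positions of successive occurrences, and counts overlapping matches; the function names `dinucleotide_distances` and `distance_histogram` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def dinucleotide_distances(sequence, dinucleotide="CG"):
    """Return distances between start positions of successive occurrences."""
    seq = sequence.upper()
    target = dinucleotide.upper()
    positions = []
    start = seq.find(target)
    while start != -1:
        positions.append(start)
        # Advance by one so overlapping occurrences (e.g. "AA" in "AAA") count.
        start = seq.find(target, start + 1)
    return [b - a for a, b in zip(positions, positions[1:])]


def distance_histogram(distances):
    """Normalise distance counts into an empirical probability distribution."""
    total = len(distances)
    if total == 0:
        return {}
    counts = Counter(distances)
    return {d: c / total for d, c in sorted(counts.items())}
```

On a real genome, the tail of this histogram would be inspected on a semi-logarithmic scale, where an exponential tail appears as a straight line.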
Bayesian modeling of recombination events in bacterial populations
Background: We consider the discovery of recombinant segments jointly with their origins within multilocus DNA sequences from bacteria representing heterogeneous populations of fairly closely related species. The currently available methods for recombination detection capable of probabilistic characterization of uncertainty have a limited applicability in practice as the number of strains in a data set increases.
Results: We introduce a Bayesian spatial structural model representing the continuum of origins over sites within the observed sequences, including a probabilistic characterization of uncertainty related to the origin of any particular site. To enable a statistically accurate and practically feasible approach to the analysis of large-scale data sets representing a single genus, we have developed a novel software tool (BRAT, Bayesian Recombination Tracker) implementing the model and the corresponding learning algorithm, which is capable of identifying the posterior optimal structure and estimating the marginal posterior probabilities of putative origins over the sites.
Conclusion: A multitude of challenging simulation scenarios and an analysis of real data from seven housekeeping genes of 120 strains of genus Burkholderia are used to illustrate the possibilities offered by our approach. The software is freely available for download at URL http://web.abo.fi/fak/mnf//mate/jc/software/brat.html
BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction
A novel discrete mathematical approach is proposed as an additional tool for molecular systematics which does not require prior statistical assumptions concerning the evolutionary process. The method is based on algorithms generating mathematical representations directly from DNA/RNA or protein sequences, followed by the output of numerical (scalar or vector) and visual characteristics (graphs). The binary encoded sequence information is transformed into a compact analytical form, called the Iterative Canonical Form (or ICF) of Boolean functions, which can then be used as a generalized molecular descriptor. The method provides raw vector data for calculating different distance matrices, which in turn can be analyzed by neighbor-joining or UPGMA to derive a phylogenetic tree, or by principal coordinates analysis to get an ordination scattergram. The new method and the associated software for inferring phylogenetic trees are called the Boolean analysis or BOOL-AN
Genomic Selective Constraints in Murid Noncoding DNA
Recent work has suggested that there are many more selectively constrained, functional noncoding than coding sites in mammalian genomes. However, little is known about how selective constraint varies amongst different classes of noncoding DNA. We estimated the magnitude of selective constraint on a large dataset of mouse-rat gene orthologs and their surrounding noncoding DNA. Our analysis indicates that there are more than three times as many selectively constrained, nonrepetitive sites within noncoding DNA as in coding DNA in murids. The majority of these constrained noncoding sites appear to be located within intergenic regions, at distances greater than 5 kilobases from known genes. Our study also shows that in murids, intron length and mean intronic selective constraint are negatively correlated with intron ordinal number. Our results therefore suggest that functional intronic sites tend to accumulate toward the 5' end of murid genes. Our analysis also reveals that mean number of selectively constrained noncoding sites varies substantially with the function of the adjacent gene. We find that, among others, developmental and neuronal genes are associated with the greatest numbers of putatively functional noncoding sites compared with genes involved in electron transport and a variety of metabolic processes. Combining our estimates of the total number of constrained coding and noncoding bases we calculate that over twice as many deleterious mutations have occurred in intergenic regions as in known genic sequence and that the total genomic deleterious point mutation rate is 0.91 per diploid genome, per generation. This estimated rate is over twice as large as a previous estimate in murids
Glassy transition in a disordered model for the RNA secondary structure
We numerically study a disordered model for the RNA secondary structure and
we find that it undergoes a phase transition, with a breaking of the replica
symmetry in the low temperature region (like in spin glasses). Our results are
based on the exact evaluation of the partition function. Comment: 4 pages, 3 figures
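As an illustration of what an exact partition-function evaluation can look like for RNA secondary structures, the sketch below uses the standard recursion over non-crossing pairings. The abstract does not specify the model, so everything here is an assumption: the pairing energies are supplied as an arbitrary matrix `energy[k][j]`, `min_loop` is a hypothetical parameter for the minimum number of unpaired bases enclosed by a pair, and the function name is illustrative rather than the authors' code.

```python
import math


def partition_function(energy, beta, min_loop=0):
    """Sum Boltzmann weights over all non-crossing pairings of a chain.

    energy[k][j] is the (assumed) pairing energy of bases k < j;
    beta is the inverse temperature.
    """
    n = len(energy)
    if n == 0:
        return 1.0
    # Z[i][j] holds the partition function of the segment i..j (inclusive).
    Z = [[1.0] * n for _ in range(n)]

    def z(i, j):
        # Empty segments contribute a factor of one.
        return Z[i][j] if i <= j else 1.0

    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            total = z(i, j - 1)  # case: base j is unpaired
            for k in range(i, j - min_loop):  # case: base j pairs with base k
                weight = math.exp(-beta * energy[k][j])
                total += z(i, k - 1) * weight * z(k + 1, j - 1)
            Z[i][j] = total
    return Z[0][n - 1]
```

With all pairing energies set to zero and `min_loop=0`, the result simply counts the allowed structures, which gives a quick sanity check: a chain of four bases has nine non-crossing pairings. The triple loop costs O(n^3) time, which is what makes exact evaluation feasible for moderate chain lengths.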
Extending colonic mucosal microbiome analysis - Assessment of colonic lavage as a proxy for endoscopic colonic biopsies
This study was supported through GI Research funds and MRC Grant Ref: MR/M00533X/1 to GH. Peer reviewed.
Ground state and glass transition of the RNA secondary structure
RNA molecules form a sequence-specific self-pairing pattern at low
temperatures. We analyze this problem using a random pairing energy model as
well as a random sequence model that includes a base stacking energy in favor
of helix propagation. The free energy cost for separating a chain into two
equal halves offers a quantitative measure of sequence specific pairing. In the
low temperature glass phase, this quantity grows quadratically with the
logarithm of the chain length, but it switches to a linear behavior of entropic
origin in the high temperature molten phase. Transition between the two phases
is continuous, with characteristics that resemble those of a disordered elastic
manifold in two dimensions. For designed sequences, however, a power-law
distribution of pairing energies on a coarse-grained level may be more
appropriate. Extreme value statistics arguments then predict a power-law growth
of the free energy cost to break a chain, in agreement with numerical
simulations. Interestingly, the distribution of pairing distances in the ground
state secondary structure follows a remarkable power-law with an exponent -4/3,
independent of the specific assumptions for the base pairing energies
Free energy landscape and characteristic forces for the initiation of DNA unzipping
DNA unzipping, the separation of its double helix into single strands, is
crucial in modulating a host of genetic processes. Although the large-scale
separation of double-stranded DNA has been studied with a variety of
theoretical and experimental techniques, the minute details of the very first
steps of unzipping are still unclear. Here, we use atomistic molecular dynamics
(MD) simulations, coarse-grained simulations and a statistical-mechanical model
to study the initiation of DNA unzipping by an external force. The calculation
of the potential of mean force profiles for the initial separation of the first
few terminal base pairs in a DNA oligomer reveals that forces ranging between
130 and 230 pN are needed to disrupt the first base pair, values an order of
magnitude larger than those needed to disrupt base pairs in partially unzipped
DNA. The force peak has an "echo," of approximately 50 pN, at the distance that
unzips the second base pair. We show that the high peak needed to initiate
unzipping derives from a free energy basin that is distinct from the basins of
subsequent base pairs because of entropic contributions and we highlight the
microscopic origin of the peak. Our results suggest a new window of exploration
for single molecule experiments. Comment: 25 pages, 6 figures. Accepted for publication in Biophysical Journal
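The link between a potential of mean force profile and a characteristic force can be sketched numerically: the force needed to advance along the separation coordinate is the slope of the profile, and its maximum marks the force peak. The Python sketch below is a generic central-difference estimate, not the authors' procedure. The function names are hypothetical, and the unit conversion assumes the profile is given in kcal/mol against a distance in ångströms.

```python
# 1 kcal/mol per angstrom expressed in piconewtons (approximate).
PN_PER_KCAL_MOL_ANGSTROM = 69.48


def mean_force(positions, pmf):
    """Central-difference slope dW/dx at the interior grid points."""
    forces = []
    for i in range(1, len(positions) - 1):
        rise = pmf[i + 1] - pmf[i - 1]
        run = positions[i + 1] - positions[i - 1]
        forces.append(rise / run)
    return forces


def peak_force_pn(positions, pmf):
    """Largest slope of the profile, converted to piconewtons."""
    return max(mean_force(positions, pmf)) * PN_PER_KCAL_MOL_ANGSTROM
```

Central differences amplify noise in a sampled profile, so in practice the profile would usually be smoothed or fitted before differentiating.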