
    Characterization of DNA methylation as a function of biological complexity via dinucleotide inter-distances

    We perform a statistical study of the distances between successive occurrences of a given dinucleotide in the DNA sequences of a number of organisms of different complexity. Our analysis highlights peculiar features of the CG dinucleotide distribution in mammalian DNA, pointing towards a connection with the role of this dinucleotide in DNA methylation. While the CG distributions of mammals exhibit exponential tails with comparable parameters, the picture for the other organisms studied (e.g., fish, insects, bacteria and viruses) is more heterogeneous, possibly because DNA methylation has different functional roles in these organisms. Our analysis suggests that the distribution of the distances between CG dinucleotides provides useful insights for characterizing and classifying organisms in terms of methylation functionalities. Comment: 13 pages, 5 figures. To be published in the Philosophical Transactions A theme issue "DNA as information".
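
    A minimal sketch of the basic measurement, assuming that the distance is the difference between the 0-based start positions of successive CG occurrences (the paper's exact convention is not given here). The tail-rate helper is a hypothetical continuous-exponential approximation, not the authors' fitting procedure.

```python
from collections import Counter


def dinucleotide_distances(seq: str, dinuc: str = "CG") -> list[int]:
    """Distances between start positions of successive occurrences of `dinuc`.

    Assumption: distance = difference of 0-based start indices.
    """
    seq = seq.upper()
    positions = [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc]
    return [b - a for a, b in zip(positions, positions[1:])]


def distance_histogram(distances: list[int]) -> dict[int, int]:
    """Empirical distribution: distance -> count, sorted by distance."""
    return dict(sorted(Counter(distances).items()))


def exponential_tail_rate(distances: list[int], d0: int) -> float:
    """Rough rate of an exponential tail exp(-rate * (d - d0)) for d >= d0.

    Hypothetical helper: maximum-likelihood estimate for a continuous shifted
    exponential, ignoring that the distances are integers.
    """
    tail = [d for d in distances if d >= d0]
    if not tail:
        raise ValueError("no distances at or beyond d0")
    mean_excess = sum(d - d0 for d in tail) / len(tail)
    return float("inf") if mean_excess == 0 else 1.0 / mean_excess


if __name__ == "__main__":
    d = dinucleotide_distances("ACGTTCGAACGCG")
    print(d, distance_histogram(d))  # prints [4, 4, 2] {2: 1, 4: 2}
```

    On a real genome one would compare the histograms (or fitted tail rates) across organisms; the toy sequence above only checks the bookkeeping.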

    Bayesian modeling of recombination events in bacterial populations

    Background: We consider the discovery of recombinant segments, jointly with their origins, within multilocus DNA sequences from bacteria representing heterogeneous populations of fairly closely related species. Currently available methods for recombination detection that provide a probabilistic characterization of uncertainty have limited practical applicability as the number of strains in a data set increases. Results: We introduce a Bayesian spatial structural model representing the continuum of origins over sites within the observed sequences, including a probabilistic characterization of the uncertainty about the origin of any particular site. To enable a statistically accurate and practically feasible analysis of large-scale data sets representing a single genus, we have developed a novel software tool, BRAT (Bayesian Recombination Tracker), which implements the model and the corresponding learning algorithm and is capable of identifying the posterior optimal structure and of estimating the marginal posterior probabilities of putative origins over the sites. Conclusion: A multitude of challenging simulation scenarios and an analysis of real data from seven housekeeping genes of 120 strains of the genus Burkholderia are used to illustrate the possibilities offered by our approach. The software is freely available for download at http://web.abo.fi/fak/mnf//mate/jc/software/brat.html
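
    BRAT's spatial structural model and learning algorithm are not described in this abstract, so the sketch below swaps in a plainly different and much simpler technique: a hidden Markov model over sites, with hypothetical switch and error rates, whose forward-backward pass yields per-site marginal posterior probabilities of origin. It only illustrates what "marginal posterior probabilities of putative origins over the sites" means; it is not BRAT.

```python
def origin_posteriors(query: str, references: list[str],
                      switch: float = 0.01,
                      error: float = 0.05) -> list[list[float]]:
    """Per-site posterior probability that `query` derives from each reference.

    Hypothetical HMM (not BRAT): the hidden state is the origin index; at each
    step the origin is kept with prob 1 - switch, otherwise it moves uniformly
    to another origin; a site matches its origin with prob 1 - error.
    """
    k_n, n = len(references), len(query)
    if k_n == 0 or n == 0 or any(len(r) != n for r in references):
        raise ValueError("need >= 1 reference, all aligned to the query")
    if not 0.0 < error < 1.0:
        raise ValueError("error must lie strictly between 0 and 1")

    def emit(k: int, i: int) -> float:
        # Match probability vs. mismatch spread over the 3 other bases.
        return 1.0 - error if references[k][i] == query[i] else error / 3.0

    def trans(a: int, b: int) -> float:
        if k_n == 1:
            return 1.0
        return 1.0 - switch if a == b else switch / (k_n - 1)

    def normalise(v: list[float]) -> list[float]:
        s = sum(v)
        return [x / s for x in v]

    # Forward pass with a uniform prior over origins, rescaled at every site.
    alpha = [normalise([emit(k, 0) / k_n for k in range(k_n)])]
    for i in range(1, n):
        prev = alpha[-1]
        alpha.append(normalise([
            emit(k, i) * sum(prev[j] * trans(j, k) for j in range(k_n))
            for k in range(k_n)]))

    # Backward pass, also rescaled; per-site constants cancel below.
    beta = [[1.0] * k_n for _ in range(n)]
    for i in range(n - 2, -1, -1):
        beta[i] = normalise([
            sum(trans(k, j) * emit(j, i + 1) * beta[i + 1][j]
                for j in range(k_n))
            for k in range(k_n)])

    # Marginal posterior at each site is proportional to alpha * beta.
    return [normalise([alpha[i][k] * beta[i][k] for k in range(k_n)])
            for i in range(n)]


if __name__ == "__main__":
    post = origin_posteriors("AAAAAAAACCCCCCCC", ["A" * 16, "C" * 16])
    print([round(p[0], 3) for p in post])  # probability of origin 0 per site
```

    The toy query switches origin halfway, and the posterior for origin 0 drops accordingly; real data would use aligned multilocus sequences and many candidate origins.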

    BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction

    A novel discrete mathematical approach, which does not require prior statistical assumptions about the evolutionary process, is proposed as an additional tool for molecular systematics. The method is based on algorithms that generate mathematical representations directly from DNA/RNA or protein sequences and output numerical (scalar or vector) and visual (graph) characteristics. The binary-encoded sequence information is transformed into a compact analytical form, called the Iterative Canonical Form (ICF) of Boolean functions, which can then be used as a generalized molecular descriptor. The method provides raw vector data for calculating different distance matrices, which in turn can be analyzed by neighbor-joining or UPGMA to derive a phylogenetic tree, or by principal coordinates analysis to obtain an ordination scattergram. The new method and the associated software for inferring phylogenetic trees are called Boolean analysis, or BOOL-AN.
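
    The abstract names neighbor-joining and UPGMA for the tree-building step. The sketch below implements only plain UPGMA on an already computed distance matrix; it does not compute ICF descriptors, and the example matrix is made up for illustration.

```python
def upgma(names: list[str], dist: list[list[float]]) -> str:
    """Build a rooted ultrametric tree by UPGMA and return it in Newick form."""
    n = len(names)
    if n == 0 or len(dist) != n:
        raise ValueError("need a square distance matrix matching names")
    # cluster id -> (Newick subtree, number of leaves, node height)
    clusters = {i: (names[i], 1, 0.0) for i in range(n)}
    # Pairwise cluster distances, keyed by (smaller id, larger id).
    d = {(i, j): float(dist[i][j]) for i in range(n) for j in range(i + 1, n)}

    def get(x: int, y: int) -> float:
        return d[(x, y)] if x < y else d[(y, x)]

    next_id = n
    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # closest pair of clusters
        sa, na, ha = clusters.pop(a)
        sb, nb, hb = clusters.pop(b)
        height = d[(a, b)] / 2.0
        # Size-weighted average distance from the merged cluster to the rest.
        new = {c: (na * get(a, c) + nb * get(b, c)) / (na + nb)
               for c in clusters}
        for key in [k for k in d if a in k or b in k]:
            del d[key]
        for c, dc in new.items():
            d[(c, next_id)] = dc
        subtree = f"({sa}:{height - ha:.3g},{sb}:{height - hb:.3g})"
        clusters[next_id] = (subtree, na + nb, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


if __name__ == "__main__":
    print(upgma(["A", "B", "C"], [[0, 2, 6], [2, 0, 6], [6, 6, 0]]))
    # prints (C:3,(A:1,B:1):2);
```

    UPGMA assumes a constant evolutionary rate (an ultrametric tree); neighbor-joining drops that assumption, which is one reason both options are offered.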

    Genomic Selective Constraints in Murid Noncoding DNA

    Recent work has suggested that there are many more selectively constrained, functional noncoding sites than coding sites in mammalian genomes. However, little is known about how selective constraint varies among different classes of noncoding DNA. We estimated the magnitude of selective constraint on a large dataset of mouse-rat gene orthologs and their surrounding noncoding DNA. Our analysis indicates that there are more than three times as many selectively constrained, nonrepetitive sites within noncoding DNA as in coding DNA in murids. The majority of these constrained noncoding sites appear to be located within intergenic regions, at distances greater than 5 kilobases from known genes. Our study also shows that in murids, intron length and mean intronic selective constraint are negatively correlated with intron ordinal number. Our results therefore suggest that functional intronic sites tend to accumulate toward the 5' end of murid genes. Our analysis also reveals that the mean number of selectively constrained noncoding sites varies substantially with the function of the adjacent gene. We find that developmental and neuronal genes, among others, are associated with the greatest numbers of putatively functional noncoding sites, compared with genes involved in electron transport and a variety of metabolic processes. Combining our estimates of the total number of constrained coding and noncoding bases, we calculate that over twice as many deleterious mutations have occurred in intergenic regions as in known genic sequence, and that the total genomic deleterious point mutation rate is 0.91 per diploid genome per generation. This estimated rate is more than twice a previous estimate in murids.
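
    The chain of estimates in the abstract (constraint, then constrained-site counts, then the genomic deleterious mutation rate) can be sketched with the usual bookkeeping, assuming that constraint is the relative reduction in divergence at test sites compared with putatively neutral reference sites, and that the diploid rate is U = 2 × (per-site mutation rate) × (number of constrained sites). The function names are hypothetical, and all inputs in the usage block are invented placeholders, not the paper's data.

```python
def selective_constraint(div_test: float, div_neutral: float) -> float:
    """Fraction of test sites under constraint: 1 - K_test / K_neutral.

    Assumption: div_neutral is divergence at a neutral reference class.
    """
    if div_neutral <= 0:
        raise ValueError("neutral divergence must be positive")
    return 1.0 - div_test / div_neutral


def constrained_sites(constraint: float, n_sites: float) -> float:
    """Expected number of selectively constrained sites in a class."""
    return constraint * n_sites


def diploid_deleterious_rate(mu_per_site: float, n_constrained: float) -> float:
    """Deleterious point mutations per diploid genome per generation.

    Each constrained site is present in two copies, hence the factor 2.
    """
    return 2.0 * mu_per_site * n_constrained


if __name__ == "__main__":
    # Placeholder numbers for illustration only:
    # class -> (test divergence, neutral divergence, number of sites)
    classes = {"coding": (0.02, 0.10, 3.0e7),
               "intergenic": (0.08, 0.10, 1.0e9)}
    total = sum(constrained_sites(selective_constraint(kt, kn), n)
                for kt, kn, n in classes.values())
    print(diploid_deleterious_rate(5.0e-9, total))
```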

    Glassy transition in a disordered model for the RNA secondary structure

    We numerically study a disordered model for RNA secondary structure and find that it undergoes a phase transition, with replica symmetry breaking in the low-temperature region (as in spin glasses). Our results are based on the exact evaluation of the partition function. Comment: 4 pages, 3 figures
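
    Exact evaluation of the partition function is possible because pseudoknot-free secondary structures decompose recursively. A minimal sketch, assuming a hypothetical random-pairing-energy model with i.i.d. Gaussian energies and an optional minimum loop length `min_loop`; it shows only the exact O(N^3) recursion, not the replica analysis.

```python
import math
import random


def random_pairing_energies(n: int, sigma: float = 1.0,
                            seed: int = 0) -> list[list[float]]:
    """Hypothetical model: i.i.d. Gaussian pairing energy for each pair (i, k)."""
    rng = random.Random(seed)
    e = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1, n):
            e[i][k] = e[k][i] = rng.gauss(0.0, sigma)
    return e


def partition_function(energy: list[list[float]], temperature: float,
                       min_loop: int = 0) -> list[list[float]]:
    """Exact Z[i][j] over all pseudoknot-free secondary structures of bases i..j.

    Recursion: base i is either unpaired, or paired with some k > i + min_loop,
    which splits the chain into independent segments (i+1..k-1) and (k+1..j).
    Long chains would need log-space arithmetic to avoid overflow.
    """
    n = len(energy)
    Z = [[1.0] * n for _ in range(n)]  # a single base has Z = 1

    def z(i: int, j: int) -> float:
        return Z[i][j] if i <= j else 1.0  # empty segment contributes 1

    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            total = z(i + 1, j)  # base i unpaired
            for k in range(i + min_loop + 1, j + 1):
                weight = math.exp(-energy[i][k] / temperature)
                total += weight * z(i + 1, k - 1) * z(k + 1, j)
            Z[i][j] = total
    return Z


if __name__ == "__main__":
    n, T = 40, 0.5
    Z = partition_function(random_pairing_energies(n), T)
    print("free energy F =", -T * math.log(Z[0][n - 1]))
```

    With all energies set to zero, Z simply counts secondary structures (the Motzkin numbers 1, 1, 2, 4, 9, ... for chain lengths 0, 1, 2, 3, 4), which is a convenient sanity check.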

    Ground state and glass transition of the RNA secondary structure

    RNA molecules form a sequence-specific self-pairing pattern at low temperatures. We analyze this problem using a random pairing energy model as well as a random sequence model that includes a base stacking energy favoring helix propagation. The free energy cost of separating a chain into two equal halves offers a quantitative measure of sequence-specific pairing. In the low-temperature glass phase, this quantity grows quadratically with the logarithm of the chain length, but it switches to a linear behavior of entropic origin in the high-temperature molten phase. The transition between the two phases is continuous, with characteristics that resemble those of a disordered elastic manifold in two dimensions. For designed sequences, however, a power-law distribution of pairing energies on a coarse-grained level may be more appropriate. Extreme value statistics arguments then predict a power-law growth of the free energy cost to break a chain, in agreement with numerical simulations. Interestingly, the distribution of pairing distances in the ground-state secondary structure follows a power law with exponent -4/3, independent of the specific assumptions for the base pairing energies.
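
    At zero temperature the free energy reduces to the ground-state energy, so the splitting cost can be sketched with a minimum-energy recursion: the cost is E0(left half) + E0(right half) - E0(whole chain), because forbidding pairs across the midpoint makes the two halves independent. The traceback also returns the ground-state pairs, from which pairing distances k - i can be histogrammed. The energy matrix and `min_loop` are assumptions of this sketch, which does not reproduce the paper's scaling results.

```python
def ground_state_table(energy: list[list[float]],
                       min_loop: int = 0) -> list[list[float]]:
    """E[i][j]: minimum energy over pseudoknot-free structures of bases i..j."""
    n = len(energy)
    E = [[0.0] * n for _ in range(n)]

    def e(i: int, j: int) -> float:
        return E[i][j] if i <= j else 0.0  # empty segment has energy 0

    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            best = e(i + 1, j)  # base i unpaired
            for k in range(i + min_loop + 1, j + 1):
                best = min(best, energy[i][k] + e(i + 1, k - 1) + e(k + 1, j))
            E[i][j] = best
    return E


def ground_state_pairs(energy: list[list[float]], E: list[list[float]],
                       min_loop: int = 0) -> list[tuple[int, int]]:
    """Trace back one ground-state structure of the whole chain as (i, k) pairs."""
    n = len(energy)

    def e(i: int, j: int) -> float:
        return E[i][j] if i <= j else 0.0

    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        # Exact float comparison is safe: E[i][j] came from these same expressions.
        if E[i][j] == e(i + 1, j):
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if E[i][j] == energy[i][k] + e(i + 1, k - 1) + e(k + 1, j):
                pairs.append((i, k))
                stack.extend([(i + 1, k - 1), (k + 1, j)])
                break
    return sorted(pairs)


def split_cost(E: list[list[float]]) -> float:
    """Zero-temperature cost of forbidding pairs between the two halves."""
    n = len(E)
    if n < 2:
        raise ValueError("need at least two bases")
    m = n // 2
    return E[0][m - 1] + E[m][n - 1] - E[0][n - 1]


if __name__ == "__main__":
    energy = [[1.0] * 4 for _ in range(4)]  # toy: unfavorable by default
    energy[0][3], energy[1][2] = -2.0, -1.0
    E = ground_state_table(energy)
    pairs = ground_state_pairs(energy, E)
    print(pairs, [k - i for i, k in pairs], split_cost(E))
```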

    Free energy landscape and characteristic forces for the initiation of DNA unzipping

    DNA unzipping, the separation of the double helix into single strands, is crucial in modulating a host of genetic processes. Although the large-scale separation of double-stranded DNA has been studied with a variety of theoretical and experimental techniques, the details of the very first steps of unzipping remain unclear. Here, we use atomistic molecular dynamics (MD) simulations, coarse-grained simulations, and a statistical-mechanical model to study the initiation of DNA unzipping by an external force. Calculation of the potential of mean force profiles for the initial separation of the first few terminal base pairs in a DNA oligomer reveals that forces between 130 and 230 pN are needed to disrupt the first base pair, an order of magnitude larger than those needed to disrupt base pairs in partially unzipped DNA. The force peak has an "echo" of approximately 50 pN at the distance that unzips the second base pair. We show that the high peak needed to initiate unzipping derives from a free energy basin that, because of entropic contributions, is distinct from the basins of subsequent base pairs, and we highlight the microscopic origin of the peak. Our results suggest a new window of exploration for single-molecule experiments. Comment: 25 pages, 6 figures. Accepted for publication in Biophysical Journal
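
    Forces are read off a potential of mean force W(x) through its slope: the mean force is -dW/dx, so an external force of +dW/dx is needed to hold the system at x. A minimal sketch, assuming W in kcal/mol and x in Å on a grid; the PMF values in the usage block are made up and are not data from the paper.

```python
# 1 kcal/mol/Å in pN: (4184 J / N_A) per 1e-10 m, times 1e12 pN per N.
AVOGADRO = 6.02214076e23
KCAL_MOL_ANGSTROM_TO_PN = 4184.0 / AVOGADRO / 1e-10 * 1e12  # about 69.48 pN


def required_force(x: list[float], pmf: list[float]) -> list[float]:
    """External force (pN) balancing the mean force -dW/dx along the coordinate.

    x in Å (strictly increasing), pmf W(x) in kcal/mol. Central differences
    in the interior, one-sided differences at the two ends.
    """
    n = len(x)
    if n < 2 or len(pmf) != n:
        raise ValueError("need >= 2 points and matching lengths")
    forces = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        slope = (pmf[hi] - pmf[lo]) / (x[hi] - x[lo])  # dW/dx
        forces.append(slope * KCAL_MOL_ANGSTROM_TO_PN)
    return forces


if __name__ == "__main__":
    # Made-up PMF: a steep rise followed by a plateau.
    xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
    ws = [0.0, 0.8, 2.4, 3.2, 3.4, 3.5]
    f = required_force(xs, ws)
    print(f"peak force about {max(f):.0f} pN at x = {xs[f.index(max(f))]} Å")
```

    The conversion factor shows why a modest PMF slope of 2 to 3 kcal/mol/Å already corresponds to forces of well over 100 pN.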