Numerical evidence for relevance of disorder in a Poland-Scheraga DNA denaturation model with self-avoidance: Scaling behavior of average quantities
We study numerically the effect of sequence heterogeneity on the
thermodynamic properties of a Poland-Scheraga model for DNA denaturation taking
into account self-avoidance, i.e. with exponent c_p=2.15 for the loop length
probability distribution. Complementing previous on-lattice Monte Carlo-like
studies, we consider here off-lattice numerical calculations for large sequence
lengths, relying on efficient algorithmic methods. We investigate finite size
effects with the definition of an appropriate intrinsic length scale x,
depending on the parameters of the model. Based on the occurrence of large
enough rare regions, for a given sequence length N, this study provides a
qualitative picture for the finite size behavior, suggesting that the effect of
disorder could be sensed only with sequence lengths diverging exponentially
with x. We further look in detail at average quantities for the particular case
x=1.3, ensuring through this parameter choice the correspondence between the
off-lattice and the on-lattice studies. Taken together, the various results can
be cast in a coherent picture with a crossover between a nearly pure system
like behavior for small sizes N < 1000, as observed in the on-lattice
simulations, and the apparent asymptotic behavior indicative of disorder
relevance, with an (average) correlation length exponent \nu_r >= 2/d (=2).
Comment: LaTeX, 33 pages with 15 PostScript figures
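As a rough illustration of the kind of model this abstract discusses, the sketch below computes a forward partition function for a homogeneous Poland-Scheraga-type chain with a power-law loop weight. It is a generic textbook-style recursion written for illustration, not the authors' algorithm. The function name `ps_partition_function` and the weights `w` (per bound pair) and `sigma` (loop initiation) are assumptions; only the exponent value 2.15 comes from the abstract.

```python
import math

def ps_partition_function(n_pairs, w, sigma, c=2.15):
    """Return Z[i] for chains in which pair 0 and pair i are both bound.

    Assumed toy weights (hypothetical, not taken from the paper):
    each bound pair after the first contributes a factor w, and a loop
    spanning l >= 2 steps between two bound pairs contributes
    sigma * l**(-c).
    """
    z = [0.0] * n_pairs
    z[0] = 1.0
    for i in range(1, n_pairs):
        # Pair i stacked directly on the bound pair i - 1.
        total = z[i - 1]
        # Pair i closes a loop opened after an earlier bound pair j.
        for j in range(i - 1):
            total += z[j] * sigma * (i - j) ** (-c)
        z[i] = w * total
    return z

if __name__ == "__main__":
    print(ps_partition_function(6, w=1.5, sigma=0.1))
```

This direct double loop costs O(N^2) operations, which is why studies at large sequence lengths need faster schemes.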
Numerical study of the disordered Poland-Scheraga model of DNA denaturation
We numerically study the binary disordered Poland-Scheraga model of DNA
denaturation, in the regime where the pure model displays a first order
transition (loop exponent ). We use a Fixman-Freire scheme for the
entropy of loops and consider chain length up to , with
averages over samples. We present in parallel the results of various
observables for two boundary conditions, namely bound-bound (bb) and
bound-unbound (bu), because they present very different finite-size behaviors,
both in the pure case and in the disordered case. Our main conclusion is that
the transition remains first order in the disordered case: in the (bu) case,
the disorder averaged energy and contact densities present crossings for
different values of without rescaling. In addition, we obtain that these
disorder averaged observables do not satisfy finite size scaling, as a
consequence of strong sample to sample fluctuations of the pseudo-critical
temperature. For a given sample, we propose a procedure to identify its
pseudo-critical temperature, and show that this sample then obeys first order
transition finite size scaling behavior. Finally, we obtain that the disorder
averaged critical loop distribution is still governed by in
the regime , as in the pure case.
Comment: 12 pages, 13 figures. Revised version
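The Fixman-Freire scheme mentioned in this abstract replaces the power-law loop weight by a sum of exponentials, so that loop sums can be carried forward recursively instead of being recomputed at every position. The sketch below shows only that bookkeeping idea. The coefficients `a` and `b` are hypothetical placeholders rather than fitted Fixman-Freire values, and the function names are assumptions introduced for illustration.

```python
import math

def loop_sums_direct(z, a, b):
    """O(N^2) reference: S[i] = sum_{j<i} z[j] * g(i - j),
    with g(l) = sum_k a[k] * exp(-b[k] * l)."""
    out = []
    for i in range(len(z)):
        s = 0.0
        for j in range(i):
            l = i - j
            s += z[j] * sum(ak * math.exp(-bk * l) for ak, bk in zip(a, b))
        out.append(s)
    return out

def loop_sums_recursive(z, a, b):
    """O(N * K): each exponential term is updated with one multiply-add."""
    # acc[k] holds sum_{j<i} z[j] * exp(-b[k] * (i - j)) for the current i.
    acc = [0.0] * len(a)
    out = []
    for i in range(len(z)):
        out.append(sum(ak * x for ak, x in zip(a, acc)))
        # Move from i to i + 1: old terms decay by exp(-b[k]),
        # and z[i] enters at distance 1.
        acc = [(x + z[i]) * math.exp(-bk) for x, bk in zip(acc, b)]
    return out

if __name__ == "__main__":
    z = [1.0, 0.5, 2.0, 0.25, 3.0]
    a, b = [0.7, 0.2], [1.0, 0.1]  # placeholder coefficients, not fitted
    print(loop_sums_direct(z, a, b))
    print(loop_sums_recursive(z, a, b))
```

Both functions return the same sums. The recursive version scales linearly in the chain length for a fixed number of exponentials, which is what makes long chains tractable.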
Prayer: its psychological and philosophical aspects.
Thesis (M.A.)--Boston University
Evolution of proteomes: fundamental signatures and global trends in amino acid compositions
BACKGROUND: The evolutionary characterization of species and lifestyles at global levels is currently a subject of considerable interest, particularly with the availability of many complete genomes. Are there specific properties associated with lifestyles and phylogenies? What are the underlying evolutionary trends? One of the simplest analyses to address such questions concerns the characterization of proteomes at the level of amino acid composition.
RESULTS: In this work, the amino acid compositions of a large set of 208 proteomes, with a significant number of representatives from the three phylogenetic domains and different lifestyles, are analyzed using an appropriate multidimensional method: correspondence analysis. The analysis reveals striking discrimination between eukaryotes, prokaryotic mesophiles and hyperthermophiles-thermophiles, following amino acid usage. In sharp contrast, no similar discrimination is observed for psychrophiles. The observed distributional properties are compared with various inferred chronologies for the recruitment of amino acids into the genetic code. Such comparisons reveal correlations between the observed segregations of species following amino acid usage, and the separation of amino acids following early or late recruitment.
CONCLUSION: A simple description of proteomes according to amino acid compositions reveals striking signatures, with sharp segregations or, on the contrary, non-discriminations following phylogenies and lifestyles. The distribution of species, following amino acid usage, exhibits a discrimination between [high GC]-[high optimal growth temperatures] and [low GC]-[moderate temperatures] characteristics. This discrimination appears to coincide closely with the separation of amino acids following their inferred early or late recruitment into the genetic code. Taken together, the various results provide a consistent picture for the evolution of proteomes, in terms of amino acid usage.
Probabilistic sequence alignments: realistic models with efficient algorithms
Alignment algorithms usually rely on simplified models of gaps for
computational efficiency. Based on an isomorphism between alignments and
physical helix-coil models, we show in statistical mechanics that alignments
with realistic laws for gaps can be computed with fast algorithms. Improved
performances of probabilistic alignments with realistic models of gaps are
illustrated. Probabilistic and optimization formulations are compared, with
potential implications in many fields and perspectives for computationally
efficient extensions to Markov models with realistic long-range interactions.
A stitch in time: Efficient computation of genomic DNA melting bubbles
Background: It is of biological interest to make genome-wide predictions of
the locations of DNA melting bubbles using statistical mechanics models.
Computationally, this poses the challenge that a generic search through all
combinations of bubble starts and ends is quadratic.
Results: An efficient algorithm is described, which shows that the time
complexity of the task is O(NlogN) rather than quadratic. The algorithm
exploits that bubble lengths may be limited, but without a prior assumption of
a maximal bubble length. No approximations, such as windowing, have been
introduced to reduce the time complexity. More than just finding the bubbles,
the algorithm produces a stitch profile, which is a probabilistic graphical
model of bubbles and helical regions. The algorithm applies a probability peak
finding method based on a hierarchical analysis of the energy barriers in the
Poland-Scheraga model.
Conclusions: Exact and fast computation of genomic stitch profiles is thus
feasible. Sequences of several megabases have been computed, only limited by
computer memory. Possible applications are the genome-wide comparisons of
bubbles with promoters, TSS, viral integration sites, and other melting-related
regions.
Comment: 16 pages, 10 figures
The Mystery of Two Straight Lines in Bacterial Genome Statistics. Release 2007
In special coordinates (codon position--specific nucleotide frequencies)
bacterial genomes form two straight lines in 9-dimensional space: one line for
eubacterial genomes, another for archaeal genomes. All 348 distinct
bacterial genomes available in GenBank in April 2007 belong to these lines
with high accuracy. The main challenge now is to explain the observed high
accuracy. The new phenomenon of complementary symmetry for codon
position--specific nucleotide frequencies is observed. The results of analysis
of several codon usage models are presented. We demonstrate that the
mean--field approximation, which is also known as context--free, or complete
independence model, or Segre variety, can serve as a reasonable approximation
to the real codon usage. The first two principal components of codon usage
correlate strongly with genomic G+C content and the optimal growth temperature
respectively. The variation of codon usage along the third component is related
to the curvature of the mean-field approximation. The first three eigenvalues in
codon usage PCA explain 59.1%, 7.8% and 4.7% of the variation. The codon usage of
eubacterial and archaeal genomes is clearly distributed along two third-order
curves with genomic G+C content as a parameter.
Comment: Significantly extended version with new data for all the 348 distinct
bacterial genomes available in GenBank in April 2007
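The mean-field (complete independence) approximation named in this abstract can be sketched as follows: each codon frequency is approximated by the product of the nucleotide frequencies at its three positions. With four nucleotides at three positions there are 12 such frequencies, and the three sum-to-one constraints leave nine independent coordinates, which is presumably the 9-dimensional space the abstract refers to. The function names and the toy codon counts below are assumptions made for illustration, not data from the paper.

```python
from collections import defaultdict

def position_frequencies(codon_counts):
    """Return freqs[pos][nucleotide] for pos in 0..2 from a {codon: count} map."""
    total = sum(codon_counts.values())
    freqs = [defaultdict(float) for _ in range(3)]
    for codon, count in codon_counts.items():
        for pos, nt in enumerate(codon):
            freqs[pos][nt] += count / total
    return freqs

def mean_field_codon_frequency(freqs, codon):
    """Approximate a codon frequency as a product of position-specific
    nucleotide frequencies (independence across the three positions)."""
    p = 1.0
    for pos, nt in enumerate(codon):
        p *= freqs[pos][nt]
    return p

if __name__ == "__main__":
    counts = {"ATG": 2, "GCC": 1, "GCG": 1}  # hypothetical toy counts
    freqs = position_frequencies(counts)
    print(mean_field_codon_frequency(freqs, "ATG"))
```

Comparing these products with the observed codon frequencies gives a simple measure of how far real codon usage departs from the independence model.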