254 research outputs found
Tracking repeats using significance and transitivity.
Keywords: transitivity; extreme value distribution. Motivation: Internal repeats in coding sequences correspond to structural and functional units of proteins. Moreover, duplication of fragments of coding sequences is known to be a mechanism to facilitate evolution. Identification of repeats is crucial to shed light on the function and structure of proteins, and to explain their evolutionary past. The task is difficult because during the course of evolution many repeats diverged beyond recognition. Results: We introduce a new method, TRUST, for ab-initio determination of internal repeats in proteins. It provides an improvement in prediction quality as compared to alternative state-of-the-art methods. The increased sensitivity and accuracy of the method are achieved by exploiting the concept of transitivity of alignments. Starting from significant local suboptimal alignments, the application of transitivity allows us to: 1) identify distant repeat homologues for which no alignments were found; 2) gain confidence about consistently well-aligned regions; and 3) recognize and reduce the contribution of nonhomologous repeats. This reassessment step enables us to derive a virtually noise-free profile representing a generalized repeat with high fidelity. We also obtained superior specificity by employing rigid statistical testing for self-sequence and profile-sequence alignments. Assessment was done using a database of repeat annotations based on structural superpositioning. The results show that TRUST is a useful and reliable tool for mining tandem and non-tandem repeats in protein sequence databases, able to predict multiple repeat types with varying intervening segments within a single sequence.
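The transitivity idea can be illustrated with a minimal sketch. It assumes each alignment is stored as a mapping from residue positions in one repeat to positions in another; the function name `compose` and the example positions are hypothetical and are not taken from TRUST itself, whose actual procedure is not detailed in the abstract.

```python
def compose(ab, bc):
    """Infer an A-to-C alignment from A-to-B and B-to-C alignments.

    Each alignment is a dict mapping a residue position in the first
    segment to the aligned position in the second segment. Only the
    positions of B that appear in both alignments carry over.
    """
    return {a: bc[b] for a, b in ab.items() if b in bc}


# Hypothetical alignments between three repeat copies A, B and C.
ab = {10: 50, 11: 51, 12: 52, 13: 53}
bc = {51: 90, 52: 91, 53: 92, 54: 93}

# A and C were never aligned directly, yet an alignment follows from B.
ac = compose(ab, bc)
```

If a direct A-to-C alignment also exists, agreement with the composed one is the kind of consistency that raises confidence in a region; disagreement flags a possibly nonhomologous match.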
Three-dimensional Ising model in the fixed-magnetization ensemble: a Monte Carlo study
We study the three-dimensional Ising model at the critical point in the
fixed-magnetization ensemble, by means of the recently developed geometric
cluster Monte Carlo algorithm. We define a magnetic-field-like quantity in
terms of microscopic spin-up and spin-down probabilities in a given
configuration of neighbors. In the thermodynamic limit, the relation between
this field and the magnetization reduces to the canonical relation M(h).
However, for finite systems, the relation is different. We establish a close
connection between this relation and the probability distribution of the
magnetization of a finite-size system in the canonical ensemble. Comment: 8 pages, 2 Postscript figures, uses RevTe
A Monte Carlo study of the triangular lattice gas with the first- and the second-neighbor exclusions
We formulate a Swendsen-Wang-like version of the geometric cluster algorithm.
As an application, we study the hard-core lattice gas on the triangular lattice
with the first- and the second-neighbor exclusions. The data are analyzed by
finite-size scaling, but the possible existence of logarithmic corrections is
not considered due to the limited data. We determine the critical chemical
potential and the critical particle density. The thermal and magnetic
exponents, estimated from the Binder ratio and the susceptibility, strongly
support the general belief that the model is in the 4-state Potts universality
class. On the other hand, the analyses of energy-like quantities yield
thermal-exponent estimates spanning a range of values that differ
significantly from the expected value 3/2, and thus imply the existence of
logarithmic corrections. Comment: 4 figures 2 table
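As an illustration of the Binder-ratio analysis mentioned above, here is a minimal sketch. The abstract does not state which definition is used; the form Q = <m^2>^2 / <m^4> is assumed here, and the function name `binder_ratio` and the sample arrays are hypothetical.

```python
import numpy as np


def binder_ratio(m):
    """Dimensionless ratio Q = <m^2>^2 / <m^4> of order-parameter samples.

    Assumed definition; other conventions (for example 1 - <m^4>/(3<m^2>^2))
    are also in use.
    """
    m = np.asarray(m, dtype=float)
    m2 = np.mean(m ** 2)
    m4 = np.mean(m ** 4)
    return m2 ** 2 / m4


# Hypothetical samples: a perfectly ordered system gives Q = 1.
ordered = [1.0, -1.0, 1.0, -1.0]
q_ordered = binder_ratio(ordered)
```

In a finite-size scaling analysis, Q is computed as a function of the control parameter (here the chemical potential) for several system sizes; the curves for different sizes cross near the critical point, up to corrections to scaling.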
RNA structure prediction from evolutionary patterns of nucleotide composition
Structural elements in RNA molecules have a distinct nucleotide composition, which changes gradually over evolutionary time. We discovered certain features of these compositional patterns that are shared between all RNA families. Based on this information, we developed a structure prediction method that evaluates candidate structures for a set of homologous RNAs on their ability to reproduce the patterns exhibited by biological structures. The method is named SPuNC for “Structure Prediction using Nucleotide Composition”. In a performance test on a diverse set of RNA families we demonstrate that the SPuNC algorithm succeeds in selecting the most realistic structures in an ensemble. The average accuracy of top-scoring structures is significantly higher than the average accuracy of all ensemble members (improvements of more than 20% observed). In addition, a consensus structure that includes the most reliable base pairs gleaned from a set of top-scoring structures is generally more accurate than a consensus derived from the full structural ensemble. Our method achieves better accuracy than existing methods on several RNA families, including novel riboswitches and ribozymes. The results clearly show that nucleotide composition can be used to reveal the quality of RNA structures and thus the presented technique should be added to the set of prediction tools.
Graphical representations and cluster algorithms for critical points with fields
A two-replica graphical representation and associated cluster algorithm is
described that is applicable to ferromagnetic Ising systems with arbitrary
fields. Critical points are associated with the percolation threshold of the
graphical representation. Results from numerical simulations of the Ising model
in a staggered field are presented. The dynamic exponent for the algorithm is
measured to be less than 0.5. Comment: Revtex, 12 pages with 2 figure
Generalized Geometric Cluster Algorithm for Fluid Simulation
We present a detailed description of the generalized geometric cluster
algorithm for the efficient simulation of continuum fluids. The connection with
well-known cluster algorithms for lattice spin models is discussed, and an
explicit full cluster decomposition is derived for a particle configuration in
a fluid. We investigate a number of basic properties of the geometric cluster
algorithm, including the dependence of the cluster-size distribution on density
and temperature. Practical aspects of its implementation and possible
extensions are discussed. The capabilities and efficiency of our approach are
illustrated by means of two example studies. Comment: Accepted for publication in Phys. Rev. E. Follow-up to
cond-mat/041274
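To make the geometric cluster move concrete, here is a minimal sketch for the simplest case: hard disks in a periodic two-dimensional box, with point reflection about a random pivot. This is an assumed hard-core special case, not the generalized algorithm of the paper, which also handles soft interactions. All names (`cluster_move`, `overlaps`, `min_image`) and the example configuration are hypothetical, and the O(N^2) neighbor search is chosen for brevity rather than efficiency.

```python
import random


def min_image(dx, box):
    """Shortest periodic image of a coordinate difference."""
    return dx - box * round(dx / box)


def overlaps(p, q, rp, rq, box):
    """True if two disks with radii rp and rq overlap in a periodic box."""
    dx = min_image(p[0] - q[0], box)
    dy = min_image(p[1] - q[1], box)
    return dx * dx + dy * dy < (rp + rq) ** 2


def cluster_move(pos, radii, box, rng):
    """One rejection-free cluster move; updates pos in place.

    A seed disk is reflected through a random pivot. Every disk that the
    reflected disk overlaps joins the cluster and is reflected in turn.
    Returns the number of disks moved.
    """
    pivot = (rng.uniform(0, box), rng.uniform(0, box))
    seed = rng.randrange(len(pos))
    in_cluster = {seed}
    stack = [seed]
    while stack:
        j = stack.pop()
        x, y = pos[j]
        pos[j] = ((2 * pivot[0] - x) % box, (2 * pivot[1] - y) % box)
        for k in range(len(pos)):
            if k not in in_cluster and overlaps(pos[j], pos[k],
                                                radii[j], radii[k], box):
                in_cluster.add(k)
                stack.append(k)
    return len(in_cluster)


# Hypothetical example: 16 equal disks on a square grid in an 8 x 8 box.
rng = random.Random(0)
box = 8.0
pos = [(1.0 + 2 * i, 1.0 + 2 * j) for i in range(4) for j in range(4)]
radii = [0.5] * len(pos)
sizes = [cluster_move(pos, radii, box, rng) for _ in range(200)]
```

Because the reflection preserves distances between cluster members and every overlapped disk is drawn into the cluster, the configuration stays overlap-free without any rejection step. Recording `sizes` at different densities is one way to look at the cluster-size distribution discussed in the abstract. The `radii` list may hold different values, which covers binary mixtures.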
Numerical Solution of Hard-Core Mixtures
We study the equilibrium phase diagram of binary mixtures of hard spheres as
well as of parallel hard cubes. A superior cluster algorithm allows us to
establish and to access the demixed phase for both systems and to investigate
the subtle interplay between short-range depletion and long-range demixing. Comment: 4 pages, 2 figure
AuberGene - a sensitive genome alignment tool.
Motivation: The accumulation of genome sequences will only accelerate in the coming years. We aim to use this abundance of data to improve the quality of genomic alignments and devise a method which is capable of detecting regions evolving under weak or no evolutionary constraints. Results: We describe a genome alignment program, AuberGene, which explores the idea of transitivity of local alignments. Assessment of the program was done based on a 2 Mbp genomic region containing the CFTR gene of 13 species. In this region, we can identify 53% of human sequence sharing common ancestry with mouse, as compared with 44% found using the usual pairwise alignment. Between human and tetraodon 93 orthologous exons are found, as compared with 77 detected by the pairwise human-tetraodon comparison. AuberGene allows the user to (1) identify distant, previously undetected, conserved orthologous regions such as ORFs or regulatory regions; (2) identify neutrally evolving regions in related species which are often overlooked by other alignment programs; (3) recognize false orthologous genomic regions. The increased sensitivity of the method is not obtained at the cost of reduced specificity. Our results suggest that, over the CFTR region, human shares 10% more sequence with mouse than previously thought (∼50%, instead of 40% found with the pairwise alignment). © 2006 Oxford University Press
Monte Carlo Renormalization of the 3-D Ising model: Analyticity and Convergence
We review the assumptions on which the Monte Carlo renormalization technique
is based, in particular the analyticity of the block spin transformations. On
this basis, we select an optimized Kadanoff blocking rule in combination with
the simulation of a d=3 Ising model with reduced corrections to scaling. This
is achieved by including interactions with second and third neighbors. As a
consequence of the improved analyticity properties, this Monte Carlo
renormalization method yields a fast convergence and a high accuracy. The
results for the critical exponents are y_H=2.481(1) and y_T=1.585(3). Comment: RevTeX, 4 PostScript file
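For readers more used to the conventional critical exponents, the renormalization exponents y_T and y_H quoted above can be converted with the standard scaling relations. The sketch below assumes those relations and d = 3; the function name `conventional_exponents` is hypothetical, and the quoted uncertainties are not propagated.

```python
def conventional_exponents(y_t, y_h, d=3):
    """Convert renormalization exponents to conventional critical exponents.

    Uses the standard scaling relations for a d-dimensional system with one
    thermal exponent y_t and one magnetic exponent y_h.
    """
    return {
        "nu": 1.0 / y_t,                # correlation length
        "eta": d + 2 - 2 * y_h,         # anomalous dimension
        "alpha": 2 - d / y_t,           # specific heat
        "beta": (d - y_h) / y_t,        # order parameter
        "gamma": (2 * y_h - d) / y_t,   # susceptibility
        "delta": y_h / (d - y_h),       # critical isotherm
    }


# Central values quoted in the abstract.
exps = conventional_exponents(y_t=1.585, y_h=2.481)
```

With these inputs the correlation-length exponent comes out near 0.631 and the anomalous dimension near 0.038; the Rushbrooke relation alpha + 2 beta + gamma = 2 holds identically and serves as a sanity check on the conversion.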