Slow nucleic acid unzipping kinetics from sequence-defined barriers
Recent experiments on unzipping of RNA helix-loop structures by force have
shown that about 40-base molecules can undergo kinetic transitions between two
well-defined 'open' and 'closed' states, on a timescale of about 1 sec [Liphardt et
al., Science 292, 733-737 (2001)]. Using a simple dynamical model, we show that
these phenomena result from the slow kinetics of crossing large free energy
barriers which separate the open and closed conformations. The dependence of
barriers on sequence along the helix, and on the size of the loop(s) is
analyzed. Some DNA and RNA sequences that could show dynamics on different
time scales, or three (or more)-state unzipping, are proposed.
Comment: 8 pages Revtex, including 4 figures
Adaptive Cluster Expansion for Inferring Boltzmann Machines with Noisy Data
We introduce a procedure to infer the interactions among a set of binary
variables, based on their sampled frequencies and pairwise correlations. The
algorithm builds the clusters of variables contributing most to the entropy of
the inferred Ising model, and rejects the small contributions due to the
sampling noise. Our procedure successfully recovers benchmark Ising models even
at criticality and in the low temperature phase, and is applied to
neurobiological data.
Comment: Accepted for publication in Physical Review Letters (2011)
Large Pseudo-Counts and L2-Norm Penalties Are Necessary for the Mean-Field Inference of Ising and Potts Models
Mean field (MF) approximation offers a simple, fast way to infer direct
interactions between elements in a network of correlated variables, a common,
computationally challenging problem with practical applications in fields
ranging from physics and biology to the social sciences. However, MF methods
achieve their best performance with strong regularization, well beyond Bayesian
expectations, an empirical fact that is poorly understood. In this work, we
study the influence of pseudo-count and L2-norm regularization schemes on
the quality of inferred Ising or Potts interaction networks from correlation
data within the MF approximation. We argue, based on the analysis of small
systems, that the optimal value of the regularization strength remains finite
even if the sampling noise tends to zero, in order to correct for systematic
biases introduced by the MF approximation. Our claim is corroborated by
extensive numerical studies of diverse model systems and by the analytical
study of the m-component spin model, for large but finite m. Additionally,
we find that pseudo-count regularization is robust against sampling noise, and
often outperforms L2-norm regularization, particularly when the underlying
network of interactions is strongly heterogeneous. Much better performances are
generally obtained for the Ising model than for the Potts model, for which only
couplings incoming onto medium-frequency symbols are reliably inferred.
Comment: 25 pages, 17 figures
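To make the pseudo-count idea above concrete, here is a minimal sketch of naive mean-field inference for binary (0/1) variables, where the couplings are read off from the inverse of the regularized correlation matrix. It is an illustration under stated assumptions, not the authors' code: the function name `mean_field_couplings`, the parameter name `pseudo_count`, and the choice of mixing the data with a uniform distribution over binary configurations are all assumptions made for this example.

```python
import numpy as np

def mean_field_couplings(samples, pseudo_count=0.5):
    """Naive mean-field couplings J = -(C^-1) for 0/1 variables.

    samples: array of shape (n_samples, n_vars) with entries 0 or 1.
    pseudo_count: weight in (0, 1) given to a uniform prior over
    configurations (assumed form of the regularization).
    """
    x = np.asarray(samples, dtype=float)
    n_samples = x.shape[0]
    # Empirical one-point and two-point frequencies.
    f1 = x.mean(axis=0)
    f2 = x.T @ x / n_samples
    # Mix with the frequencies of independent, unbiased binary variables:
    # f_i = 1/2 and f_ij = 1/4 for i != j.
    a = pseudo_count
    f1 = (1 - a) * f1 + a * 0.5
    f2 = (1 - a) * f2 + a * 0.25
    # For 0/1 variables the diagonal two-point frequency equals f_i.
    np.fill_diagonal(f2, f1)
    # Connected correlation matrix and its inverse.
    c = f2 - np.outer(f1, f1)
    j = -np.linalg.inv(c)
    # Self-couplings carry no meaning in this sketch.
    np.fill_diagonal(j, 0.0)
    return j
```

With a strictly positive `pseudo_count`, the regularized correlation matrix stays invertible even when a variable never changes in the sample, which is one practical reason the regularization is needed before inverting.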
Optimal regularizations for data generation with probabilistic graphical models
Understanding the role of regularization is a central question in Statistical
Inference. Empirically, well-chosen regularization schemes often dramatically
improve the quality of the inferred models by avoiding overfitting of the
training data. We consider here the particular case of L2 and L1
regularizations in the Maximum A Posteriori (MAP) inference of generative
pairwise graphical models. Based on analytical calculations on Gaussian
multivariate distributions and numerical experiments on Gaussian and Potts
models we study the likelihoods of the training, test, and 'generated data'
(with the inferred models) sets as functions of the regularization strengths.
We show in particular that, at its maximum, the test likelihood and the
'generated' likelihood, which quantifies the quality of the generated samples,
have remarkably close values. The optimal value for the regularization strength
is found to be approximately equal to the inverse sum of the squared couplings
incoming on sites on the underlying network of interactions. Our results seem
largely independent of the structure of the true underlying interactions that
generated the data, of the regularization scheme considered, and are valid when
small fluctuations of the posterior distribution around the MAP estimator are
taken into account. Connections with empirical works on protein models learned
from homologous sequences are discussed.
Exponentially hard problems are sometimes polynomial, a large deviation analysis of search algorithms for the random Satisfiability problem, and its application to stop-and-restart resolutions
A large deviation analysis of the solving complexity of random
3-Satisfiability instances slightly below threshold is presented. While finding
a solution for such instances demands an exponential effort with high
probability, we show that an exponentially small fraction of resolutions
require a computation scaling linearly in the size of the instance only. This
exponentially small probability of easy resolutions is analytically calculated,
and the corresponding exponent shown to be smaller (in absolute value) than the
growth exponent of the typical resolution time. Our study therefore gives some
theoretical basis to heuristic stop-and-restart solving procedures, and
suggests a natural cut-off (the size of the instance) for the restart.
Comment: Revtex file, 4 figures
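The stop-and-restart idea can be sketched as a generic wrapper around any randomized solver: each run is stopped once it exceeds a fixed budget and is then restarted from scratch. This is only an illustration; `solve_with_restarts`, `try_once`, and `toy_run` are hypothetical names, and the toy solver below merely stands in for a real search procedure whose rare lucky runs finish within the budget.

```python
import random

def solve_with_restarts(try_once, budget, max_restarts, seed=0):
    """Run a randomized solver repeatedly, cutting each run at `budget`.

    try_once(rng, budget) must return a solution, or None if the run
    did not finish within the budget (assumed interface).
    Returns (solution or None, number of runs performed).
    """
    rng = random.Random(seed)
    for attempt in range(1, max_restarts + 1):
        result = try_once(rng, budget)
        if result is not None:
            return result, attempt
    return None, max_restarts

def toy_run(rng, budget):
    """Stand-in solver: a run is 'easy' with small probability."""
    return "solution" if rng.random() < 0.05 else None

if __name__ == "__main__":
    # Budget set to the instance size, following the cut-off suggested above.
    instance_size = 100
    print(solve_with_restarts(toy_run, budget=instance_size, max_restarts=1000))
```

The expected number of restarts is the inverse of the probability of an easy run, so the approach pays off when that probability decays more slowly than the typical resolution time grows.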
Solving satisfiability problems by fluctuations: The dynamics of stochastic local search algorithms
Stochastic local search algorithms are frequently used to numerically solve
hard combinatorial optimization or decision problems. We give numerical and
approximate analytical descriptions of the dynamics of such algorithms applied
to random satisfiability problems. We find two different dynamical regimes,
depending on the number of constraints per variable: For low constraintness,
the problems are solved efficiently, i.e. in linear time. For higher
constraintness, the solution times become exponential. We observe that the
dynamical behavior is characterized by a fast equilibration and fluctuations
around this equilibrium. If the algorithm runs long enough, an exponentially
rare fluctuation towards a solution appears.
Comment: 21 pages, 18 figures, revised version, to appear in PRE (2003)
The dynamics of proving uncolourability of large random graphs I. Symmetric Colouring Heuristic
We study the dynamics of a backtracking procedure capable of proving
uncolourability of graphs, and calculate its average running time T for sparse
random graphs, as a function of the average degree c and the number of vertices
N. The analysis is carried out by mapping the history of the search process
onto an out-of-equilibrium (multi-dimensional) surface growth problem. The
growth exponent of the average running time is quantitatively predicted, in
agreement with simulations.
Comment: 5 figures
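As an illustration of what a backtracking proof of uncolourability looks like, the sketch below explores colour assignments vertex by vertex and counts the nodes of the search tree, a simple proxy for running time. It uses a fixed vertex order, which is an assumption made for brevity and not the symmetric colouring heuristic analyzed in the paper; the name `colouring_search` is hypothetical.

```python
def colouring_search(adj, n_colours=3):
    """Backtracking search for a proper colouring.

    adj: list of neighbour lists, adj[v] = vertices adjacent to v.
    Returns (colouring or None, number of search-tree nodes visited).
    A None result means the whole tree was explored, which proves
    that no proper colouring with n_colours colours exists.
    """
    n = len(adj)
    colour = [None] * n
    nodes = 0

    def extend(v):
        nonlocal nodes
        nodes += 1
        if v == n:
            return True  # every vertex has been coloured
        for c in range(n_colours):
            # Colour c is allowed if no neighbour already uses it.
            if all(colour[u] != c for u in adj[v]):
                colour[v] = c
                if extend(v + 1):
                    return True
                colour[v] = None  # undo and try the next colour
        return False

    found = extend(0)
    return (colour if found else None), nodes
```

For example, calling it on the complete graph on four vertices with three colours returns `None` for the colouring, since the search exhausts every branch.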
Multifractal analysis of perceptron learning with errors
Random input patterns induce a partition of the coupling space of a
perceptron into cells labeled by their output sequences. Learning some data
with a maximal error rate leads to clusters of neighboring cells. By analyzing
the internal structure of these clusters with the formalism of multifractals,
we can handle different storage and generalization tasks for lazy students and
absent-minded teachers within one unified approach. The results also allow some
conclusions on the spatial distribution of cells.
Comment: 11 pages, RevTex, 3 eps figures, version to be published in Phys. Rev. E 01Jan9
Relaxation and Metastability in the RandomWalkSAT search procedure
An analysis of the average properties of a local search resolution procedure
for the satisfaction of random Boolean constraints is presented. Depending on
the ratio alpha of constraints per variable, resolution takes a time T_res
growing linearly (T_res \sim tau(alpha) N, alpha < alpha_d) or exponentially
(T_res \sim exp(N zeta(alpha)), alpha > alpha_d) with the size N of the
instance. The relaxation time tau(alpha) in the linear phase is calculated
through a systematic expansion scheme based on a quantum formulation of the
evolution operator. For alpha > alpha_d, the system is trapped in some
metastable state, and resolution occurs by escaping from this state through the
crossing of a large barrier. An annealed calculation of the height zeta(alpha)
of this barrier is proposed. The polynomial/exponential cross-over alpha_d is
not related to the onset of clustering among solutions.
Comment: 23 pages, 11 figures. A mistake in sec. IV.B has been corrected
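For readers unfamiliar with the procedure, the following is a minimal sketch of a random-walk local search for satisfiability: start from a random assignment, then repeatedly pick an unsatisfied clause at random and flip one of its variables chosen at random. The clause encoding (signed integers), the function name `random_walk_sat`, and the `max_flips` cut-off are assumptions made for this example.

```python
import random

def random_walk_sat(clauses, n_vars, max_flips, seed=0):
    """Random-walk local search on a CNF formula.

    clauses: list of tuples of nonzero ints; literal v means variable v
    is true, -v means variable v is false (variables are 1..n_vars).
    Returns (assignment as a list of bools, flips used), or
    (None, max_flips) if no solution was found within the cut-off.
    """
    rng = random.Random(seed)
    # Index 0 is unused so that variable v lives at assign[v].
    assign = [rng.random() < 0.5 for _ in range(n_vars + 1)]

    def satisfied(clause):
        return any(assign[abs(lit)] == (lit > 0) for lit in clause)

    for flips in range(max_flips):
        unsat = [c for c in clauses if not satisfied(c)]
        if not unsat:
            return assign[1:], flips
        # Pick a violated clause, then flip one of its variables.
        clause = rng.choice(unsat)
        var = abs(rng.choice(clause))
        assign[var] = not assign[var]
    if all(satisfied(c) for c in clauses):
        return assign[1:], max_flips
    return None, max_flips
```

Recording the number of unsatisfied clauses at each step of such a run is one way to observe the fast relaxation followed by fluctuations around a plateau that the abstract describes.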