Inferring processes underlying B-cell repertoire diversity
We quantify the VDJ recombination and somatic hypermutation processes in
human B-cells using probabilistic inference methods on high-throughput DNA
sequence repertoires of human B-cell receptor heavy chains. Our analysis
captures the statistical properties of the naive repertoire, first after its
initial generation via VDJ recombination and then after selection for
functionality. We also infer statistical properties of the somatic
hypermutation machinery (exclusive of subsequent effects of selection). Our
main results are the following: the B-cell repertoire is substantially more
diverse than T-cell repertoires, due to longer junctional insertions; sequences
that pass initial selection are distinguished by having a higher probability of
being generated in a VDJ recombination event; somatic hypermutations have a
non-uniform distribution along the V gene that is well explained by an
independent site model for the sequence context around the hypermutation site.
Comment: acknowledgement added
Three-helix-bundle Protein in a Ramachandran Model
We study the thermodynamic behavior of a model protein with 54 amino acids
that forms a three-helix bundle in its native state. The model contains three
types of amino acids and five to six atoms per amino acid and has the
Ramachandran torsional angles $\phi$, $\psi$ as its degrees of freedom. The
force field is based on hydrogen bonds and effective hydrophobicity forces. For
a suitable choice of the relative strength of these interactions, we find that
the three-helix-bundle protein undergoes an abrupt folding transition from an
expanded state to the native state. Also shown is that the corresponding one-
and two-helix segments are less stable than the three-helix sequence.
Comment: 15 pages, 7 figures
Cold and Warm Denaturation of Proteins
We introduce a simplified protein model where the water degrees of freedom
appear explicitly (although in an extremely simplified fashion). Using this
model we are able to recover both the warm and the cold protein denaturation
within a single framework, while addressing important issues about the
structure of model proteins
The Genetic Code as a Periodic Table: Algebraic Aspects
The systematics of indices of physico-chemical properties of codons and amino
acids across the genetic code are examined. Using a simple numerical labelling
scheme for nucleic acid bases, data can be fitted as low-order polynomials of
the 6 coordinates in the 64-dimensional codon weight space. The work confirms
and extends recent studies by Siemion of amino acid conformational parameters.
The connections between the present work, and recent studies of the genetic
code structure using dynamical symmetry algebras, are pointed out.
Comment: 26 pages LaTeX, 10 figures (4 ps, 6 TeX). Refereed version, small
changes to discussion (conclusion unaltered). Minor alterations to format of
figures and tables. To appear in BioSystems
Selection of sequence motifs and generative Hopfield-Potts models for protein families
Statistical models for families of evolutionary related proteins have
recently gained interest: in particular pairwise Potts models, as those
inferred by the Direct-Coupling Analysis, have been able to extract information
about the three-dimensional structure of folded proteins, and about the effect
of amino-acid substitutions in proteins. These models are typically requested
to reproduce the one- and two-point statistics of the amino-acid usage in a
protein family, i.e., to capture the so-called residue conservation and
covariation statistics of proteins of common evolutionary origin. Pairwise
Potts models are the maximum-entropy models achieving this. While being
successful, these models depend on huge numbers of ad hoc introduced
parameters, which have to be estimated from a finite amount of data and whose
biophysical interpretation remains unclear. Here we propose an approach to
parameter reduction, which is based on selecting collective sequence motifs. It
naturally leads to the formulation of statistical sequence models in terms of
Hopfield-Potts models. These models can be accurately inferred using a mapping
to restricted Boltzmann machines and persistent contrastive divergence. We show
that, when applied to protein data, even 20-40 patterns are sufficient to
obtain statistically close-to-generative models. The Hopfield patterns form
interpretable sequence motifs and may be used to clusterize amino-acid
sequences into functional sub-families. However, the distributed collective
nature of these motifs intrinsically limits the ability of Hopfield-Potts
models in predicting contact maps, showing the necessity of developing models
going beyond the Hopfield-Potts models discussed here.
Comment: 26 pages, 16 figures, to app. in PR
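
The parameter reduction described in this abstract can be illustrated with a minimal sketch. It assumes the common low-rank form in which each pairwise coupling is a sum over p patterns, J_ij(a,b) = (1/L) sum_mu xi^mu_i(a) xi^mu_j(b). The 1/L normalization, the omission of local fields, the random pattern values, and all function names are assumptions made for illustration; they are not taken from the paper. The sketch checks that the pairwise energy equals its low-rank rewriting and compares parameter counts.

```python
import random


def make_patterns(p, L, q, seed=0):
    """Hypothetical random patterns xi[mu][i][a]: p patterns, L sites, q states."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(q)] for _ in range(L)]
            for _ in range(p)]


def coupling(patterns, i, a, j, b):
    """Assumed Hopfield-Potts coupling J_ij(a,b) = (1/L) sum_mu xi_i(a) xi_j(b)."""
    L = len(patterns[0])
    return sum(xi[i][a] * xi[j][b] for xi in patterns) / L


def energy_pairwise(patterns, seq):
    """E(seq) = -sum_{i<j} J_ij(seq_i, seq_j), evaluated pair by pair."""
    L = len(seq)
    return -sum(coupling(patterns, i, seq[i], j, seq[j])
                for i in range(L) for j in range(i + 1, L))


def energy_lowrank(patterns, seq):
    """Same energy via pattern projections, using
    sum_{i<j} x_i x_j = ((sum_i x_i)^2 - sum_i x_i^2) / 2."""
    L = len(seq)
    total = 0.0
    for xi in patterns:
        proj = [xi[i][seq[i]] for i in range(L)]
        s = sum(proj)
        total += s * s - sum(x * x for x in proj)
    return -total / (2 * L)


def n_coupling_params_potts(L, q):
    """Number of coupling entries in a full pairwise Potts model (gauge ignored)."""
    return L * (L - 1) // 2 * q * q


def n_params_hopfield(p, L, q):
    """Number of pattern entries in the assumed Hopfield-Potts form."""
    return p * L * q
```

For example, with L = 10 sites and q = 21 states, the full pairwise model has 19845 coupling entries, while p = 3 patterns need only 630 pattern entries; the saving grows with sequence length because the first count scales as L^2 and the second as L.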
Thermodynamics of alpha- and beta-structure formation in proteins
An atomic protein model with a minimalistic potential is developed and then
tested on an alpha-helix and a beta-hairpin, using exactly the same parameters
for both peptides. We find that melting curves for these sequences to a good
approximation can be described by a simple two-state model, with parameters
that are in reasonable quantitative agreement with experimental data. Despite
the apparent two-state character of the melting curves, the energy
distributions are found to lack a clear bimodal shape, which is discussed in
some detail. We also perform a Monte Carlo-based kinetic study and find, in
accord with experimental data, that the alpha-helix forms faster than the
beta-hairpin.
Comment: 18 pages, 4 figures
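
The "simple two-state model" for melting curves mentioned in this abstract can be sketched as follows. This is the textbook two-state form, assuming a temperature-independent energy difference dE between the unfolded and native states and a midpoint temperature Tm; it is not necessarily the exact parametrization used in the paper, and the numerical values in the usage note are hypothetical.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol K)


def fraction_native(T, dE, Tm):
    """Native-state population in a two-state model.

    Assumptions (not from the source): dE = E_unfolded - E_native > 0 in
    kJ/mol is temperature independent, and the unfolding entropy is fixed
    by the midpoint condition dS = dE / Tm. The equilibrium constant
    K = [U]/[N] is then exp((dE / R) * (1/Tm - 1/T)).
    """
    K = math.exp((dE / R) * (1.0 / Tm - 1.0 / T))
    return 1.0 / (1.0 + K)


def melting_curve(temperatures, dE, Tm):
    """Evaluate the native fraction at each temperature (in kelvin)."""
    return [fraction_native(T, dE, Tm) for T in temperatures]
```

Calling `melting_curve([280.0, 300.0, 320.0], dE=50.0, Tm=300.0)` gives a native fraction above one half below Tm, exactly one half at Tm, and below one half above Tm. A larger dE makes the transition sharper.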