Improving Internal Peptide Dynamics in the Coarse-Grained MARTINI Model: Toward Large-Scale Simulations of Amyloid- and Elastin-like Peptides
We present an extension of the coarse-grained MARTINI model for proteins and apply this extension to amyloid- and elastin-like peptides.
Atomistic simulations of tetrapeptides, octapeptides, and longer peptides
in solution are used as a reference to parametrize a set of pseudodihedral
potentials that describe the internal flexibility of MARTINI peptides.
We assess the performance of the resulting model in reproducing various
structural properties computed from atomistic trajectories of peptides
in water. The addition of new dihedral angle potentials significantly improves agreement
with the contact maps computed from atomistic simulations.
We also address the question of which parameters derived from atomistic
trajectories are transferable between different lengths of peptides.
The modified coarse-grained model shows reasonable transferability
of parameters for the amyloid- and elastin-like peptides. In addition,
the improved coarse-grained model is also applied to investigate the
self-assembly of β-sheet-forming peptides on the microsecond
time scale. The octapeptides SNNFGAIL and (GV)4 are used
to examine peptide aggregation in different environments: in water
and at the water–octane interface. At the interface, peptide
adsorption occurs rapidly, and peptides spontaneously aggregate,
favoring stretched conformers resembling β-strands.
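A pseudodihedral potential of the kind parametrized here acts on the torsion angle defined by four consecutive backbone beads. As a minimal illustration (the function name and the atan2-based sign convention are our own sketch, not the authors' parametrization code), such a four-point dihedral angle can be computed with NumPy:

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Torsion angle (degrees) defined by four bead positions p0..p3."""
    b1 = p1 - p0
    b2 = p2 - p1
    b3 = p3 - p2
    n1 = np.cross(b1, b2)                      # normal of plane (p0, p1, p2)
    n2 = np.cross(b2, b3)                      # normal of plane (p1, p2, p3)
    m1 = np.cross(n1, b2 / np.linalg.norm(b2)) # in-plane axis for the sign
    x = np.dot(n1, n2)
    y = np.dot(m1, n2)
    return np.degrees(np.arctan2(y, x))        # quadrant-aware angle
```

In this convention a planar trans arrangement of the four beads gives ±180°, the usual reference point when tabulating a dihedral potential from atomistic trajectories.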
Inference of Co-Evolving Site Pairs: an Excellent Predictor of Contact Residue Pairs in Protein 3D Structures
Residue-residue interactions that fold a protein into a unique
three-dimensional structure and make it play a specific function impose
structural and functional constraints on each residue site. Selective
constraints on residue sites are recorded in amino acid orders in homologous
sequences and also in the evolutionary trace of amino acid substitutions. A
challenge is to extract direct dependences between residue sites by removing
indirect dependences through other residues within a protein or even through
other molecules. Recent attempts of disentangling direct from indirect
dependences of amino acid types between residue positions in multiple sequence
alignments have revealed that the strength of inferred residue pair couplings
is an excellent predictor of residue-residue proximity in folded structures.
Here, we report an alternative attempt of inferring co-evolving site pairs from
concurrent and compensatory substitutions between sites in each branch of a
phylogenetic tree. First, branch lengths of a phylogenetic tree inferred by the
neighbor-joining method are optimized as well as other parameters by maximizing
a likelihood of the tree in a mechanistic codon substitution model. Mean
changes of quantities, which are characteristic of concurrent and compensatory
substitutions, accompanied by substitutions at each site in each branch of the
tree are estimated with the likelihood of each substitution. Partial
correlation coefficients of the characteristic changes along branches between
sites are calculated and used to rank co-evolving site pairs. Accuracy of
contact prediction based on the present co-evolution score is comparable to
that achieved by a maximum entropy model of protein sequences for 15 protein
families taken from the Pfam release 26.0. Moreover, this excellent accuracy
indicates that compensatory substitutions are significant in protein evolution.
Comment: 17 pages, 4 figures, and 4 tables, with supplementary information of 5
figures
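The ranking step described above can be sketched compactly: given a branches-by-sites matrix of the characteristic substitution changes, partial correlation coefficients between all site pairs follow from the inverse of the covariance (precision) matrix. The sketch below is our own illustration under stated assumptions (synthetic input and a small ridge term for numerical stability), not the authors' implementation:

```python
import numpy as np

def partial_correlations(X):
    """X: (n_branches, n_sites) matrix of per-branch substitution scores.
    Returns the matrix of partial correlations between sites, obtained
    from the inverse of the (ridge-regularized) covariance matrix."""
    cov = np.cov(X, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])   # small ridge for invertibility
    prec = np.linalg.inv(cov)
    d = np.sqrt(np.diag(prec))
    pcorr = -prec / np.outer(d, d)       # pcorr_ij = -P_ij / sqrt(P_ii P_jj)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

def rank_pairs(pcorr):
    """Site pairs sorted by absolute partial correlation, strongest first."""
    n = pcorr.shape[0]
    pairs = [(abs(pcorr[i, j]), i, j)
             for i in range(n) for j in range(i + 1, n)]
    return sorted(pairs, reverse=True)
```

With scores that genuinely co-vary at two sites, those sites surface at the top of the ranking, which is the behavior the contact-prediction score relies on.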
Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints
BACKGROUND: Modern modelling techniques may potentially provide more accurate predictions of binary outcomes than classical techniques. We aimed to study the predictive performance of different modelling techniques in relation to the effective sample size ("data hungriness"). METHODS: We performed simulation studies based on three clinical cohorts: 1282 patients with head and neck cancer (46.9% 5-year survival), 1731 patients with traumatic brain injury (22.3% 6-month mortality) and 3181 patients with minor head injury (7.6% with CT scan abnormalities). We compared three relatively modern modelling techniques: support vector machines (SVM), neural nets (NN) and random forests (RF), and two classical techniques: logistic regression (LR) and classification and regression trees (CART). We created three large artificial databases with 20-fold, 10-fold and 6-fold replication of subjects, in which we generated dichotomous outcomes according to different underlying models. We applied each modelling technique to increasingly larger development parts (100 repetitions). The area under the ROC curve (AUC) indicated the performance of each model in the development part and in an independent validation part. Data hungriness was defined by plateauing of the AUC and small optimism (difference between the mean apparent AUC and the mean validated AUC < 0.01). RESULTS: We found that a stable AUC was reached by LR at approximately 20 to 50 events per variable, followed by the CART, SVM, NN and RF models. Optimism decreased with increasing sample sizes, with the same ranking of techniques. The RF, SVM and NN models showed instability and high optimism even with more than 200 events per variable. CONCLUSIONS: Modern modelling techniques such as SVM, NN and RF may need over 10 times as many events per variable as classical modelling techniques such as LR to achieve a stable AUC and small optimism.
This implies that such modern techniques should only be used in medical prediction problems if very large data sets are available. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2288-14-137) contains supplementary material, which is available to authorized users.
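The optimism criterion used above (apparent minus validated AUC) can be illustrated with a small simulation. The sketch below uses synthetic scikit-learn data rather than the study's clinical cohorts, and the helper function, sample sizes, and random seeds are our own assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def optimism(model, n_dev, X, y, X_val, y_val):
    """Apparent minus validated AUC for a model fit on the first n_dev rows."""
    m = model.fit(X[:n_dev], y[:n_dev])
    apparent = roc_auc_score(y[:n_dev], m.predict_proba(X[:n_dev])[:, 1])
    validated = roc_auc_score(y_val, m.predict_proba(X_val)[:, 1])
    return apparent - validated

# Synthetic dichotomous outcome; half for development, half for validation.
X, y = make_classification(n_samples=6000, n_features=10, random_state=0)
X_dev, y_dev, X_val, y_val = X[:3000], y[:3000], X[3000:], y[3000:]

results = {}
for n in (100, 500, 3000):
    results[n] = {
        "LR": optimism(LogisticRegression(max_iter=1000),
                       n, X_dev, y_dev, X_val, y_val),
        "RF": optimism(RandomForestClassifier(random_state=0),
                       n, X_dev, y_dev, X_val, y_val),
    }
    print(f"n={n:5d}  optimism LR={results[n]['LR']:.3f}"
          f"  RF={results[n]['RF']:.3f}")
```

In runs of this kind the random forest's apparent AUC on small development sets is near-perfect (it effectively memorizes the training data), so its optimism stays large until the sample is much bigger, mirroring the ranking reported in the abstract.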
Looking back at the 13th Rencontres d'Archéobotaniques 2018, "Carpology and Interdisciplinarity: Integrated Approaches": a report on the discussion
Statistical modelling
In this chapter, we present statistical modelling approaches for predictive tasks in business and science. Most prominent is the ubiquitous multiple linear regression approach, where coefficients are estimated using the ordinary least squares algorithm. There are many derivations and generalizations of that technique. In the form of logistic regression, it has been adapted to cope with binary classification problems. Various statistical survival models allow for modelling of time-to-event data. We will detail the many benefits and a few pitfalls of these techniques based on real-world examples. A primary focus will be on pointing out the added value that these statistical modelling tools yield over more black-box-type machine-learning algorithms. In our opinion, the added value predominantly stems from the often much easier interpretation of the model, the availability of tools that pin down the influence of the predictor variables in concise form, and finally from the options they provide for variable selection and residual analysis, allowing for user-friendly model development, refinement, and improvement.
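The coefficient-estimation step named above can be sketched in a few lines. This is a minimal illustration with synthetic data of our own devising (the variable names and true coefficients are not from the chapter): ordinary least squares solved directly with NumPy, after which the fitted coefficients carry the interpretable per-predictor effects the chapter emphasizes.

```python
import numpy as np

def ols_fit(X, y):
    """Ordinary least squares: least-squares solution of y ≈ X @ beta,
    where X already contains an intercept column of ones."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Hypothetical example: recover known coefficients from noisy data.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])        # intercept + two predictors
y = 2.0 + 1.5 * x1 - 0.5 * x2 + 0.1 * rng.normal(size=n)

beta = ols_fit(X, y)                             # ≈ [2.0, 1.5, -0.5]
```

Each entry of `beta` reads off directly as the expected change in the response per unit change in its predictor, which is exactly the kind of concise influence statement that is harder to extract from black-box models.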