
    RPBS: a web resource for structural bioinformatics

    RPBS (Ressource Parisienne en Bioinformatique Structurale) is a resource dedicated primarily to structural bioinformatics. It is the result of a joint effort by several teams to set up an interface that offers original and powerful methods in the field. As an illustration, we focus here on three such methods uniquely available at RPBS: AUTOMAT for sequence databank scanning, YAKUSA for structure databank scanning and WLOOP for homology loop modelling. The RPBS server and its specific services can be accessed online.

    Inference of Co-Evolving Site Pairs: an Excellent Predictor of Contact Residue Pairs in Protein 3D structures

    Residue-residue interactions that fold a protein into a unique three-dimensional structure and enable it to perform a specific function impose structural and functional constraints on each residue site. Selective constraints on residue sites are recorded in the amino acid order of homologous sequences and in the evolutionary trace of amino acid substitutions. A challenge is to extract direct dependences between residue sites by removing indirect dependences mediated by other residues within a protein, or even by other molecules. Recent attempts to disentangle direct from indirect dependences of amino acid types between residue positions in multiple sequence alignments have revealed that the strength of inferred residue-pair couplings is an excellent predictor of residue-residue proximity in folded structures. Here, we report an alternative approach that infers co-evolving site pairs from concurrent and compensatory substitutions between sites in each branch of a phylogenetic tree. First, the branch lengths of a phylogenetic tree inferred by the neighbor-joining method are optimized, together with other parameters, by maximizing the likelihood of the tree under a mechanistic codon substitution model. Mean changes in quantities characteristic of concurrent and compensatory substitutions, accompanying substitutions at each site in each branch of the tree, are estimated with the likelihood of each substitution. Partial correlation coefficients of these characteristic changes along branches between sites are calculated and used to rank co-evolving site pairs. The accuracy of contact prediction based on the present co-evolution score is comparable to that achieved by a maximum entropy model of protein sequences for 15 protein families taken from Pfam release 26.0. Moreover, this accuracy indicates that compensatory substitutions are significant in protein evolution.
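The ranking step described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration only: the function names and the use of a plain covariance matrix are assumptions for the sketch; the actual method derives likelihood-weighted characteristic changes on an optimized phylogenetic tree. The partial correlations are obtained from the (pseudo-)inverse of the covariance matrix of per-branch changes.

```python
import numpy as np

def partial_correlations(X):
    """X: (n_branches, n_sites) matrix of per-branch characteristic changes.
    Returns the (n_sites, n_sites) partial-correlation matrix, computed
    from the precision matrix (inverse covariance)."""
    cov = np.cov(X, rowvar=False)
    prec = np.linalg.pinv(cov)              # precision matrix
    d = np.sqrt(np.diag(prec))
    pcorr = -prec / np.outer(d, d)          # standard precision-to-partial-corr map
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

def rank_site_pairs(X):
    """Return site pairs sorted by |partial correlation|, strongest first,
    as candidate co-evolving (and hence contacting) pairs."""
    pcorr = partial_correlations(X)
    n = pcorr.shape[0]
    pairs = [(i, j, pcorr[i, j]) for i in range(n) for j in range(i + 1, n)]
    return sorted(pairs, key=lambda t: -abs(t[2]))
```

Using the precision matrix rather than raw correlations is what removes the indirect, transitive dependences between sites that the abstract emphasizes.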

    Improving Internal Peptide Dynamics in the Coarse-Grained MARTINI Model: Toward Large-Scale Simulations of Amyloid- and Elastin-like Peptides

    We present an extension of the coarse-grained MARTINI model for proteins and apply this extension to amyloid- and elastin-like peptides. Atomistic simulations of tetrapeptides, octapeptides, and longer peptides in solution are used as a reference to parametrize a set of pseudodihedral potentials that describe the internal flexibility of MARTINI peptides. We assess the performance of the resulting model in reproducing various structural properties computed from atomistic trajectories of peptides in water. The addition of new dihedral angle potentials significantly improves agreement with the contact maps computed from atomistic simulations. We also address the question of which parameters derived from atomistic trajectories are transferable between peptides of different lengths. The modified coarse-grained model shows reasonable transferability of parameters for the amyloid- and elastin-like peptides. In addition, the improved coarse-grained model is applied to investigate the self-assembly of β-sheet-forming peptides on the microsecond time scale. The octapeptides SNNFGAIL and (GV)4 are used to examine peptide aggregation in different environments: in water and at the water–octane interface. At the interface, peptide adsorption occurs rapidly, and peptides spontaneously aggregate in favor of stretched conformers resembling β-strands.
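The kind of term being parametrized can be illustrated with a short sketch. This is not the fitted MARTINI parameter set: the force constant, multiplicity, and phase below are placeholder values, and the functional form shown is the common cosine-series proper dihedral; it pairs the standard four-point dihedral angle with the potential evaluated on it.

```python
import math

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians) defined by four bead positions,
    via the atan2 formulation (numerically stable near 0 and pi)."""
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)
    norm_b1 = math.sqrt(dot(b1, b1))
    m1 = cross(n1, tuple(x / norm_b1 for x in b1))
    return math.atan2(dot(m1, n2), dot(n1, n2))

def pseudodihedral_energy(phi, k=5.0, n=1, phi0=0.0):
    """Cosine-series dihedral potential V = k*(1 + cos(n*phi - phi0)).
    k, n, phi0 are placeholders, not fitted MARTINI parameters."""
    return k * (1.0 + math.cos(n * phi - phi0))
```

In a coarse-grained backbone, one such term per quadruple of consecutive beads is what biases the chain toward extended (β-strand-like) or bent local conformations.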

    Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints

    BACKGROUND: Modern modelling techniques may potentially provide more accurate predictions of binary outcomes than classical techniques. We aimed to study the predictive performance of different modelling techniques in relation to the effective sample size (“data hungriness”). METHODS: We performed simulation studies based on three clinical cohorts: 1282 patients with head and neck cancer (46.9% 5-year survival), 1731 patients with traumatic brain injury (22.3% 6-month mortality) and 3181 patients with minor head injury (7.6% with CT scan abnormalities). We compared three relatively modern modelling techniques, support vector machines (SVM), neural nets (NN), and random forests (RF), with two classical techniques: logistic regression (LR) and classification and regression trees (CART). We created three large artificial databases with 20-fold, 10-fold and 6-fold replication of subjects, where we generated dichotomous outcomes according to different underlying models. We applied each modelling technique to increasingly larger development parts (100 repetitions). The area under the ROC curve (AUC) indicated the performance of each model in the development part and in an independent validation part. Data hungriness was defined by plateauing of the AUC and small optimism (difference between the mean apparent AUC and the mean validated AUC <0.01). RESULTS: We found that a stable AUC was reached by LR at approximately 20 to 50 events per variable, followed by CART, SVM, NN and RF models. Optimism decreased with increasing sample sizes, with the same ranking of techniques. The RF, SVM and NN models showed instability and high optimism even with >200 events per variable. CONCLUSIONS: Modern modelling techniques such as SVM, NN and RF may need over 10 times as many events per variable as classical modelling techniques such as LR to achieve a stable AUC and small optimism. This implies that such modern techniques should only be used in medical prediction problems if very large data sets are available. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2288-14-137) contains supplementary material, which is available to authorized users.
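The optimism measure used in the study can be sketched in a few lines. This is an illustrative reconstruction, not the study's code: logistic regression is fit by plain gradient descent so the example needs only NumPy, the simulated data and seed are invented, and the AUC is computed with the rank-sum (Mann–Whitney) identity.

```python
import numpy as np

def auc(y, scores):
    """AUC via the rank-sum (Mann-Whitney) statistic; assumes no tied scores."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Unpenalized logistic regression by batch gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

rng = np.random.default_rng(42)
n_feat = 5
w_true = rng.normal(size=n_feat)          # invented true effects

def simulate(n):
    X = rng.normal(size=(n, n_feat))
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)
    return X, y

X_dev, y_dev = simulate(500)              # development part
X_val, y_val = simulate(5000)             # independent validation part
w = fit_logistic(X_dev, y_dev)
apparent = auc(y_dev, X_dev @ w)          # AUC on the data the model saw
validated = auc(y_val, X_val @ w)         # AUC on independent data
optimism = apparent - validated           # the study's stability criterion: <0.01
```

Repeating this over increasing development-set sizes, and over more flexible learners in place of `fit_logistic`, traces exactly the learning curves whose plateau defines "data hungriness" in the abstract.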