    EvDTree: structure-dependent substitution profiles based on decision tree classification of 3D environments

    BACKGROUND: Structure-dependent substitution matrices increase the accuracy of sequence alignments when the 3D structure of one sequence is known, and are successful, e.g., in fold recognition. We propose a new automated method, EvDTree, based on a decision tree algorithm, for automatic derivation of amino acid substitution probabilities from a set of sequence-structure alignments. The main advantage over other approaches is an unbiased automatic selection of the most informative structural descriptors and associated values or thresholds. This feature allows automatic derivation of structure-dependent substitution scores for any specific set of structures, without the need to empirically determine the best descriptors and parameters. RESULTS: Decision trees for residue substitutions were constructed for each residue type from sequence-structure alignments extracted from the HOMSTRAD database. For each tree cluster, environment-dependent substitution profiles were derived. The resulting structure-dependent substitution scores were assessed using a criterion based on the mean rank of each observed substitution among all possible substitutions, and in sequence-structure alignments. The automatically built EvDTree substitution scores provide significantly better results than conventional matrices and similar or slightly better results than other structure-dependent matrices. EvDTree has been applied to small disulfide-rich proteins as a test case to automatically derive specific substitution scores, which provide better results than non-specific substitution scores. Analyses of the decision tree classifications provide useful information on the relative importance of different structural descriptors. CONCLUSIONS: We propose a fully automatic method for the classification of structural environments and inference of structure-dependent substitution profiles. We show that this approach is more accurate than existing methods for various applications. The easy adaptation of EvDTree to any specific data set opens the way for class-specific structure-dependent substitution scores, which can be used in threading-based remote homology searches.
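    The abstract does not state which split criterion EvDTree uses. As a minimal sketch, assuming an information-gain criterion (as in classic decision trees) and numeric structural descriptors, the hypothetical `best_split` function below picks, for one residue type, the descriptor and threshold whose split best separates the amino acids observed in aligned homologues. Descriptor names and data are illustrative only, not HOMSTRAD-derived.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of amino acid labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(samples, descriptors):
    """Return (descriptor, threshold, information_gain) for the best binary split.

    samples: list of (env, aa) pairs, where env maps descriptor name -> numeric
    value for the structure residue and aa is the residue observed at that
    position in the aligned homologue (assumed input format).
    """
    labels = [aa for _, aa in samples]
    n = len(samples)
    base = entropy(labels)
    best = (None, None, 0.0)
    for d in descriptors:
        values = sorted({env[d] for env, _ in samples})
        # Candidate thresholds: midpoints between consecutive distinct values,
        # so both sides of every split are non-empty.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [aa for env, aa in samples if env[d] <= t]
            right = [aa for env, aa in samples if env[d] > t]
            cond = len(left) / n * entropy(left) + len(right) / n * entropy(right)
            if base - cond > best[2]:
                best = (d, t, base - cond)
    return best

# Toy data for one residue type (hypothetical values).
samples = [
    ({"accessibility": 5, "helix": 1}, "I"),
    ({"accessibility": 10, "helix": 0}, "V"),
    ({"accessibility": 12, "helix": 1}, "L"),
    ({"accessibility": 60, "helix": 1}, "K"),
    ({"accessibility": 70, "helix": 0}, "E"),
    ({"accessibility": 80, "helix": 0}, "K"),
]
print(best_split(samples, ["accessibility", "helix"]))
```

    On this toy set, splitting on accessibility separates hydrophobic from charged substitutions, so it wins over the helix flag; a full tree would recurse on each side.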

    Detection and Architecture of Small Heat Shock Protein Monomers

    International audience. BACKGROUND: Small Heat Shock Proteins (sHSPs) are chaperone-like proteins involved in preventing the irreversible aggregation of misfolded proteins. Although many studies have already been conducted on sHSPs, the molecular mechanisms and structural properties of these proteins remain unclear. Here, we seek a better understanding of the architecture, organization and properties of the sHSP family through structural and functional annotations. We focused on the Alpha Crystallin Domain (ACD), a sandwich fold that is the hallmark of the sHSP family. METHODOLOGY/PRINCIPAL FINDINGS: We developed a new approach for detecting sHSPs and delineating ACDs based on an iterative Hidden Markov Model algorithm using a multiple alignment profile generated from structural data on ACD. Applying this procedure to the UniProt databank, we identified 4478 sequences as sHSPs, with very good coverage relative to the corresponding PROSITE and Pfam profiles. The ACD was then delimited and structurally annotated. We showed that taxonomic groups of sHSPs (animals, plants, bacteria) have unique features regarding the length of their ACD and, more specifically, the length of a large loop within the ACD. We detailed highly conserved residues and patterns specific to the whole family or to some groups of sHSPs. For 96% of the studied sHSPs, we identified in the C-terminal region a conserved I/V/L-X-I/V/L motif that acts as an anchor in the oligomerization process. The fragment from the end of the ACD to the end of this motif has a mean length of 14 residues and was named the C-terminal Anchoring Module (CAM). CONCLUSIONS/SIGNIFICANCE: This work annotates structural components of the ACD and quantifies properties of several thousand sHSPs. It gives a more accurate overview of the architecture of sHSP monomers.
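    The CAM definition lends itself to a short illustration. The sketch below assumes the ACD boundary is already known (the HMM-based delineation itself is not reproduced) and simply takes the first I/V/L-X-I/V/L match within a hypothetical 30-residue window after the ACD; `find_cam` and its parameters are illustrative names, not part of the published method.

```python
import re

# I/V/L, any residue, I/V/L
ANCHOR_MOTIF = re.compile(r"[IVL].[IVL]")

def find_cam(sequence, acd_end, window=30):
    """Locate the first I/V/L-X-I/V/L motif after the ACD.

    acd_end: 0-based index of the last ACD residue (assumed to come from a
    separate ACD delineation step). window: hypothetical search length.
    Returns (motif_start, motif_end, cam_length) with 0-based inclusive
    indices, or None if no motif is found.
    """
    offset = acd_end + 1
    match = ANCHOR_MOTIF.search(sequence, offset, offset + window)
    if match is None:
        return None
    start, end = match.start(), match.end() - 1
    # CAM: residues after the ACD up to and including the last motif residue.
    return start, end, end - acd_end

print(find_cam("A" * 10 + "EEKPAIPIK", acd_end=9))
```

    Taking the first match is a simplification; a real pipeline would likely restrict the motif position using the conservation patterns described above.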

    Combining expert, offline, transient and online knowledge for Monte-Carlo exploration

    National audience. We combine machine learning at four different time scales for Monte-Carlo tree exploration: – online regret, through the use of bandit algorithms and Monte-Carlo estimates; – transient learning, through the use of rapid action value estimates (RAVE), which are learnt online and used to accelerate the exploration, but are then gradually set aside as finer information becomes available; – offline learning, by data mining of game records; – expert knowledge accumulated over the ages, used as prior information. The resulting algorithm is stronger than each of its components taken separately. We also highlight an exploration-exploitation dilemma in Monte-Carlo tree exploration and obtain a very strong improvement by tuning the corresponding parameters.
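    A minimal sketch of how these time scales can be combined in the selection step, assuming a common formulation from the computer Go literature rather than the authors' exact one: expert/offline knowledge enters as virtual wins and visits, RAVE statistics are blended with Monte-Carlo estimates through a weight beta = sqrt(k / (3n + k)) that fades as the visit count n grows, and a UCB-style bonus handles the online bandit part. `MoveStats`, `k` and `c` are hypothetical names; `k` and `c` are the kind of parameters whose tuning the abstract refers to.

```python
import math
from dataclasses import dataclass

@dataclass
class MoveStats:
    wins: float = 0.0          # online Monte-Carlo wins
    visits: int = 0
    rave_wins: float = 0.0     # transient RAVE statistics
    rave_visits: int = 0
    prior_wins: float = 0.0    # expert/offline knowledge as virtual wins (assumption)
    prior_visits: int = 0

def blended_value(s, k=1000.0):
    """Mix Monte-Carlo and RAVE estimates; the RAVE weight decays with visits."""
    n = s.visits + s.prior_visits
    q = (s.wins + s.prior_wins) / n if n else 0.5
    q_rave = s.rave_wins / s.rave_visits if s.rave_visits else 0.5
    beta = math.sqrt(k / (3 * n + k))  # 1 when n == 0, tends to 0 as n grows
    return (1 - beta) * q + beta * q_rave

def select(children, parent_visits, c=0.3):
    """UCB-style bandit choice over child moves (online regret component)."""
    log_parent = math.log(max(parent_visits, 1))

    def score(s):
        n = s.visits + s.prior_visits
        if n == 0:
            return float("inf")  # always try unvisited moves first
        return blended_value(s) + c * math.sqrt(log_parent / n)

    return max(children, key=lambda move: score(children[move]))
```

    With few visits the RAVE estimate dominates; as visits accumulate the Monte-Carlo mean takes over, which mirrors the "transient" role described above.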

    A short survey on protein blocks.

    International audience. Protein structures are classically described in terms of secondary structures. Even if the regular secondary structures have a relevant physical meaning, their recognition from atomic coordinates has some important limitations, such as uncertainties in the assignment of the boundaries of helical and β-strand regions. Further, on average about 50% of all residues are assigned to an irregular state, i.e., the coil. Thus, different research teams have focused on abstracting the conformation of the protein backbone in short local stretches. Using different geometric measures, local stretches in protein structures are clustered into a chosen number of states, and a prototype representative of the local structures in each cluster is generally defined. These libraries of local structure prototypes are called "structural alphabets". We have developed a structural alphabet, named Protein Blocks, not only to approximate protein structures but also to predict them from sequence. Since its development, we and other teams have explored numerous new research fields using this structural alphabet. We review here some of the most interesting applications.
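    Assigning a structural alphabet letter amounts to a nearest-prototype search. The sketch below assumes a five-residue window described by 8 backbone dihedral angles (psi/phi pairs) compared with each prototype by an angular RMSD (RMSDA); the two prototypes are made-up values for illustration, not the published Protein Blocks.

```python
import math

def angle_diff(a, b):
    """Smallest signed difference a - b between two angles, in degrees."""
    d = (a - b) % 360.0
    return d - 360.0 if d > 180.0 else d

def rmsda(window, prototype):
    """Root mean square deviation on angular values (degrees)."""
    sq = [angle_diff(a, b) ** 2 for a, b in zip(window, prototype)]
    return math.sqrt(sum(sq) / len(sq))

def assign_block(window, prototypes):
    """Label of the prototype with the lowest RMSDA to an 8-angle window."""
    return min(prototypes, key=lambda label: rmsda(window, prototypes[label]))

# Hypothetical prototypes as (psi, phi) pairs, NOT the published Protein Blocks.
PROTOTYPES = {
    "helix_like": [-47.0, -57.0] * 4,
    "strand_like": [130.0, -120.0] * 4,
}

print(assign_block([-40.0, -60.0] * 4, PROTOTYPES))
```

    The periodic `angle_diff` matters: without it, 170° and -170° would look 340° apart instead of 20°.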

    Assignment of PolyProline II Conformation and Analysis of Sequence – Structure Relationship

    International audience. BACKGROUND: Secondary structures are elements of great importance in structural biology, biochemistry and bioinformatics. They are broadly composed of two repetitive structures, namely α-helices and β-sheets, apart from turns; the remaining residues are associated with coil. These repetitive secondary structures have specific and conserved biophysical and geometric properties. The PolyProline II (PPII) helix is yet another interesting repetitive structure, which is less frequent and not usually associated with stabilizing interactions. Recent studies have shown that PPII frequency is higher than expected, and that PPII helices could play an important role in protein-protein interactions. METHODOLOGY/PRINCIPAL FINDINGS: A major factor limiting the study of PPII is that it cannot be assigned by the most commonly used secondary structure assignment methods (SSAMs). The purpose of this work is to propose a PPII assignment methodology that can be defined within the framework of the DSSP secondary structure assignment. Given the ambiguity in PPII assignments by different methods, a consensus assignment strategy was used. To define the most consensual PPII assignment rule, three SSAMs that can assign PPII were compared and analyzed. The assignment rule was defined to maximize coverage of all assignments made by these SSAMs. Few constraints were added to the assignment, and only PPII helices at least 2 residues long are defined. CONCLUSIONS/SIGNIFICANCE: The simple rules designed in this study for characterizing PPII conformation lead to the assignment of 5% of all amino acid residues as PPII. Sequence-structure relationships associated with PPII, as defined by the different SSAMs, reveal a few striking differences. Amino acid preferences in the N- and C-cap regions were specifically studied, as were their solvent accessibility and contact patterns. The assignment of PPII can thus be coupled with DSSP, which opens a simple way for further analysis in this field.
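    The abstract does not give the exact consensus rule, so the following is only a sketch of a DSSP-coupled PPII assignment under stated assumptions: residues not already in a DSSP helix or strand state (H, G, I, E, B) whose (phi, psi) lie within a hypothetical 30° tolerance of the canonical PPII region around (-75°, +145°) are flagged, and runs shorter than 2 residues are discarded, matching the minimum length stated above. `assign_ppii` is a hypothetical helper.

```python
# DSSP states treated as already assigned (assumption for this sketch).
REGULAR = set("HGIEB")

def _diff(a, b):
    """Smallest signed angular difference a - b, in degrees."""
    d = (a - b) % 360.0
    return d - 360.0 if d > 180.0 else d

def assign_ppii(phi, psi, dssp, center=(-75.0, 145.0), tol=30.0, min_len=2):
    """Return a string with 'P' for PPII residues and '-' otherwise."""
    flags = [
        s not in REGULAR
        and abs(_diff(f, center[0])) <= tol
        and abs(_diff(p, center[1])) <= tol
        for f, p, s in zip(phi, psi, dssp)
    ]
    out = ["-"] * len(flags)
    i = 0
    while i < len(flags):
        if flags[i]:
            j = i
            while j < len(flags) and flags[j]:
                j += 1
            if j - i >= min_len:  # keep only runs of at least min_len residues
                out[i:j] = ["P"] * (j - i)
            i = j
        else:
            i += 1
    return "".join(out)

phi = [-60, -75, -70, -80, -120, -75, -57]
psi = [-45, 150, 140, 160, 130, 145, -47]
print(assign_ppii(phi, psi, "H---E-H"))
```

    The isolated residue at position 5 has PPII-like angles but is dropped by the minimum-length rule.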

    EvDTree: structure-dependent substitution matrices based on decision tree classification of 3D environments

    Introduction: Substitution matrices are commonly used in sequence alignment and homology searches. They are an essential component in detecting structural, functional and evolutionary relationships between protein sequences. Substitution matrices derived from structural superpositions of homologous protein pairs provide the best performance, and it has been shown that amino acid substitutions are indeed constrained by the structural environment, each environment displaying a distinct substitution pattern. One of the most reliable and popular tools for sequence-structure homology recognition, FUGUE, is based on environment-dependent matrices [1]. The FUGUE substitution matrices are derived from a classification into 64 empirically selected 3D environments. Here we use hierarchical clustering and decision tree algorithms to determine optimal classifications of 3D environments, leading to improved environment-dependent substitution matrices. Decision tree classifications appear rob
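    Environment-dependent matrices of this kind are typically expressed as log-odds scores computed separately for each environment class. The sketch below assumes the form S_env(a, b) = log2(P(b | a, env) / P(b)) with a simple add-one pseudocount; FUGUE's actual smoothing and normalization are not reproduced, and `env_log_odds` is a hypothetical helper.

```python
import math
from collections import Counter, defaultdict

def env_log_odds(alignments, pseudocount=1.0, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """Compute scores[env][(a, b)] = log2(P(b | a, env) / P(b)).

    alignments: iterable of (env, a, b), where a is a structure residue in
    environment class env and b the residue aligned to it in a homologue.
    """
    pair_counts = defaultdict(Counter)  # (env, a) -> counts of substituted b
    background = Counter()
    for env, a, b in alignments:
        pair_counts[(env, a)][b] += 1
        background[b] += 1
    total_bg = sum(background.values()) + pseudocount * len(alphabet)
    p_bg = {b: (background[b] + pseudocount) / total_bg for b in alphabet}
    scores = defaultdict(dict)
    for (env, a), counts in pair_counts.items():
        total = sum(counts.values()) + pseudocount * len(alphabet)
        for b in alphabet:
            p_cond = (counts[b] + pseudocount) / total
            scores[env][(a, b)] = math.log2(p_cond / p_bg[b])
    return scores

data = [("buried", "L", "I")] * 8 + [("buried", "L", "L")] * 2 + [("exposed", "L", "K")] * 10
s = env_log_odds(data)
print(s["buried"][("L", "I")], s["exposed"][("L", "I")])
```

    The same substitution L to I scores positively in the buried class and negatively in the exposed class, which is the kind of environment-specific signal the text describes.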

    Protein Peeling 3D: new tools for analyzing protein structures.

    International audience. We present an improved version of our Protein Peeling web server, dedicated to the analysis of protein structure architecture through the identification of protein units produced by an iterative splitting algorithm. New features include identification of structural domains, detection of unstructured terminal elements and evaluation of the stability of protein unit structures. AVAILABILITY: The website is free and open to all users, with no login requirement, at http://www.dsimb.inserm.fr/dsimb-tools/peeling3

    A bioinformatic web server to cut protein structures in terms of Protein Units.

    Analyzing the architecture and organization of protein structures is a major challenge for better understanding protein flexibility, folding, function and interactions with partners, and for designing new drugs. Protein structures are often described as series of alpha-helices and beta-sheets or, at a higher level, as an arrangement of protein domains. Given the lack of an intermediate view that could give a good understanding and description of protein structure architecture, we have proposed a novel intermediate view, the Protein Units (PUs), which constitute a new level of protein structure description between secondary structures and domains. A PU is defined as a compact sub-region of the 3D structure corresponding to one sequence fragment, with a high number of intra-PU contacts and a low number of inter-PU contacts. The methodology for obtaining PUs from protein structures is named Protein Peeling (PP). In this algorithm, the protein structure is described as a succession of Cα atoms. The distances between Cα atoms are translated into contact probabilities using a logistic function, and Protein Peeling uses only this contact probability matrix. An optimization procedure, based on the Matthews correlation coefficient (MCC) between contact probability sub-matrices, defines optimal cutting points that separate the region examined into two or three PUs. The process is iterated until the compactness of the resulting PUs reaches a given limit. An index assesses the compactness quality and relative independence of each PU. Protein Peeling is a tool to better understand and analyze the organization of protein structures. We have developed a dedicated bioinformatic web server, Protein Peeling 2 (PP2). Given the 3D coordinates of a protein, it automatically identifies protein units (PUs). The interface consists of a web page (HTML) and a common gateway interface (CGI). The user can set many parameters and upload a structure in PDB format to a Perl core. This core is a module that embeds all the information needed by two other programs (mainly coded in C for most of the computation, and in R for the analysis). Results are given both textually and graphically using the Jmol applet and PyMOL software. The server can be accessed at http://www.dsimb.inserm.fr/dsimb_tools/peeling/. Only one equivalent online methodology is available.
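    The contact-probability and cutting steps can be sketched as follows. The logistic parameters (d0 = 8 Å, delta = 1.5 Å) and the MCC-like partition index (A·B − C²)/((A + C)(B + C)), with A and B the intra-unit and C the inter-unit contact probability sums, are assumptions for illustration; the sketch only evaluates single cuts into two units, whereas Protein Peeling also considers three-unit splits and iterates.

```python
import math

def contact_probabilities(coords, d0=8.0, delta=1.5):
    """Logistic transform of C-alpha distances into contact probabilities.

    d0 and delta are hypothetical parameter values (in angstroms).
    """
    n = len(coords)
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = math.dist(coords[i], coords[j])
                p[i][j] = 1.0 / (1.0 + math.exp((d - d0) / delta))
    return p

def partition_index(p, cut):
    """Assumed MCC-like score for splitting residues [0, cut) and [cut, n)."""
    n = len(p)
    a = sum(p[i][j] for i in range(cut) for j in range(cut))
    b = sum(p[i][j] for i in range(cut, n) for j in range(cut, n))
    c = sum(p[i][j] for i in range(cut) for j in range(cut, n))
    return (a * b - c * c) / ((a + c) * (b + c))

def best_cut(p, min_size=3):
    """Cut position maximizing the partition index (units >= min_size)."""
    n = len(p)
    return max(range(min_size, n - min_size + 1), key=lambda k: partition_index(p, k))

# Two compact toy clusters of 5 "residues", 50 angstroms apart.
unit = [(0, 0, 0), (3.8, 0, 0), (0, 3.8, 0), (3.8, 3.8, 0), (1.9, 1.9, 3.0)]
coords = unit + [(x + 50, y, z) for x, y, z in unit]
print(best_cut(contact_probabilities(coords)))
```

    The index approaches 1 when inter-unit contacts vanish, so the cut between the two toy clusters is preferred over cuts that break a cluster apart.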