    DeepSF: deep convolutional neural network for mapping protein sequences to folds

    Motivation: Protein fold recognition is an important problem in structural bioinformatics. Almost all traditional fold recognition methods use sequence (homology) comparison to indirectly predict the fold of a target protein based on the fold of a template protein with known structure, which cannot explain the relationship between sequence and fold. Only a few methods have been developed to classify protein sequences into a small number of folds due to methodological limitations, and these are not generally useful in practice. Results: We develop a deep 1D-convolution neural network (DeepSF) to directly classify any protein sequence into one of 1195 known folds, which is useful for both fold recognition and the study of the sequence-structure relationship. Unlike traditional sequence alignment (comparison) based methods, our method automatically extracts fold-related features from a protein sequence of any length and maps it to the fold space. We train and test our method on datasets curated from SCOP1.75, yielding a classification accuracy of 80.4%. On the independent testing dataset curated from SCOP2.06, the classification accuracy is 77.0%. We compare our method with a top profile-profile alignment method, HHSearch, on hard template-based and template-free modeling targets of CASP9-12 in terms of fold recognition accuracy. The accuracy of our method is 14.5%-29.1% higher than HHSearch on template-free modeling targets and 4.5%-16.7% higher on hard template-based modeling targets for the top 1, 5, and 10 predicted folds. The hidden features extracted from the sequence by our method are robust against sequence mutation, insertion, deletion and truncation, and can be used for other protein pattern recognition problems such as protein clustering, comparison and ranking. Comment: 28 pages, 13 figures
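
The idea of mapping a sequence of any length to a fixed set of fold scores can be illustrated with a minimal sketch. This is not the DeepSF architecture: it assumes a single convolution layer, global max pooling, random untrained weights, and a toy number of folds, and the names `one_hot`, `conv1d_relu` and `classify` are hypothetical. It only shows why pooling over the sequence axis makes the output size independent of sequence length.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(sequence):
    """Encode a protein sequence as an (L, 20) one-hot matrix."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = np.zeros((len(sequence), len(AMINO_ACIDS)))
    for pos, aa in enumerate(sequence):
        encoded[pos, index[aa]] = 1.0
    return encoded


def conv1d_relu(x, kernels):
    """Valid 1D convolution along the sequence axis, followed by ReLU.

    x: (L, C_in); kernels: (width, C_in, C_out) -> (L - width + 1, C_out).
    """
    width = kernels.shape[0]
    windows = np.stack([x[i:i + width] for i in range(x.shape[0] - width + 1)])
    out = np.einsum("lwc,wco->lo", windows, kernels)
    return np.maximum(out, 0.0)


def classify(sequence, kernels, dense):
    """Return a probability vector over folds for a sequence of any length."""
    features = conv1d_relu(one_hot(sequence), kernels)
    pooled = features.max(axis=0)        # global max pooling: (C_out,)
    logits = pooled @ dense              # (n_folds,)
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


# Assumed toy dimensions: 8 filters of width 3, 5 folds (the paper uses 1195).
rng = np.random.default_rng(0)
kernels = rng.normal(size=(3, 20, 8))
dense = rng.normal(size=(8, 5))
short_probs = classify("ACDEFGHIK", kernels, dense)
long_probs = classify("ACDEFGHIKLMNPQRSTVWYACD", kernels, dense)
```

Both calls return a vector of the same length, one probability per fold, even though the input sequences differ in length.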

    Improved protein contact predictions with the MetaPSICOV2 server in CASP12

    In this paper, we present the results for the MetaPSICOV2 contact prediction server in the CASP12 community experiment (http://predictioncenter.org). Over the 35 assessed Free Modelling target domains the MetaPSICOV2 server achieved a mean precision of 43.27%, a substantial increase relative to the server's performance in the CASP11 experiment. In the following paper, we discuss improvements to the MetaPSICOV2 server, covering both changes to the neural network and attempts to integrate contact predictions on a domain basis into the prediction pipeline. We also discuss some limitations in the CASP12 assessment which may have overestimated the performance of our method
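
For readers unfamiliar with how contact prediction precision is typically computed, the following is a minimal sketch. It assumes precision is measured over the top-k highest-scoring residue pairs with a minimum sequence separation of 24 residues; the exact k and separation used in the CASP12 assessment are not stated above, and the function name `top_k_precision` is hypothetical.

```python
import numpy as np


def top_k_precision(pred, true_contacts, k, min_separation=24):
    """Fraction of the k highest-scoring predicted pairs that are true contacts.

    pred: (L, L) symmetric matrix of contact scores.
    true_contacts: (L, L) boolean matrix of observed contacts.
    Only pairs (i, j) with j - i >= min_separation are considered.
    """
    length = pred.shape[0]
    i_idx, j_idx = np.triu_indices(length, k=min_separation)
    scores = pred[i_idx, j_idx]
    if scores.size == 0:
        return 0.0
    order = np.argsort(-scores, kind="stable")[:k]
    hits = true_contacts[i_idx[order], j_idx[order]]
    return float(hits.mean())


# Toy example: two correct long-range predictions and one incorrect one.
L = 30
true_contacts = np.zeros((L, L), dtype=bool)
true_contacts[0, 25] = True
true_contacts[1, 28] = True
pred = np.zeros((L, L))
pred[0, 25] = 0.9
pred[1, 28] = 0.8
pred[2, 29] = 0.7   # not a true contact
pred[0, 1] = 1.0    # short-range pair, excluded by min_separation
precision_top3 = top_k_precision(pred, true_contacts, k=3)
precision_top2 = top_k_precision(pred, true_contacts, k=2)
```

A mean precision such as the one reported would then be the average of this per-domain value over all assessed domains.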

    A Database of Domain Definitions for Proteins with Complex Interdomain Geometry

    Protein structural domains are necessary for understanding evolution and protein folding, and may vary widely from functional and sequence-based domains. Although various structural domain databases exist, defining domains for some proteins is non-trivial, and definitions of their domain boundaries are not available. Here, we present a novel database of manually defined structural domains for a representative set of proteins from the SCOP “multi-domain proteins” class (http://prodata.swmed.edu/multidom/). We consider our domains as mobile evolutionary units, which may rearrange during protein evolution. Additionally, they may be visualized as structurally compact and possibly independently folding units. We also found that representing domains as evolutionary and folding units does not always lead to a unique domain definition. However, unlike existing databases, we retain and refine these “alternate” domain definitions after careful inspection of structural similarity, functional sites and automated domain definition methods. We provide domain definitions, including actual residue boundaries, for proteins that well-known databases like SCOP and CATH do not attempt to split. Our alternate domain definitions are suitable for sequence and structure searches by automated methods. Additionally, the database can be used for training and testing domain delineation algorithms. Since our domains represent structurally compact evolutionary units, the database may be useful for studying domain properties and evolution

    An extracellular steric seeding mechanism for Eph-ephrin signaling platform assembly

    Erythropoietin-producing hepatoma (Eph) receptors are cell-surface protein tyrosine kinases mediating cell-cell communication. Upon activation, they form signaling clusters. We report crystal structures of the full ectodomain of human EphA2 (eEphA2) both alone and in complex with the receptor-binding domain of the ligand ephrinA5 (ephrinA5 RBD). Unliganded eEphA2 forms linear arrays of staggered parallel receptors involving two patches of residues conserved across A-class Ephs. eEphA2-ephrinA5 RBD forms a more elaborate assembly, whose interfaces include the same conserved regions on eEphA2, but rearranged to accommodate ephrinA5 RBD. Cell-surface expression of mutant EphA2s showed that these interfaces are critical for localization at cell-cell contacts and activation-dependent degradation. Our results suggest a 'nucleation' mechanism whereby a limited number of ligand-receptor interactions 'seed' an arrangement of receptors which can propagate into extended signaling arrays

    Blind testing of cross-linking/mass spectrometry hybrid methods in CASP11

    Hybrid approaches combine computational methods with experimental data. The information contained in the experimental data can be leveraged to probe the structure of proteins otherwise elusive to computational methods. Compared with purely computational methods, the structures produced by hybrid methods exhibit some degree of experimental validation. In spite of these advantages, most hybrid methods have not yet been validated in blind tests, hampering their development. Here, we describe the first blind test of a specific cross-link based hybrid method in CASP. This blind test was coordinated by the CASP organizers and utilized a novel cross-linking/mass-spectrometry (CLMS) approach that is able to collect high-density CLMS data in a matter of days. This experimental protocol was developed in the Rappsilber laboratory. The approach exploits the chemistry of a highly reactive, photoactivatable cross-linker to produce an order of magnitude more cross-links than homobifunctional cross-linkers. The Rappsilber laboratory generated experimental CLMS data based on this protocol and submitted the data to the CASP organizers, who then released the data to the CASP11 prediction groups in a separate, CLMS-assisted modeling experiment. We did not observe a clear improvement of assisted models, presumably because the properties of the CLMS data (uncertainty in cross-link identification and residue-residue assignment, and uneven distribution over the protein) were largely unknown to the prediction groups and their approaches were not yet tailored to this kind of data. We also suggest modifications to the CLMS-CASP experiment and discuss the importance of rigorous blind testing in the development of hybrid methods. (C) 2016 The Authors. Proteins: Structure, Function, and Bioinformatics published by Wiley Periodicals, Inc

    EigenTHREADER: analogous protein fold recognition by efficient contact map threading

    Motivation: Protein fold recognition when appropriate, evolutionarily-related, structural templates can be identified is often trivial and may even be viewed as a solved problem. However, in cases where no homologous structural templates can be detected, fold recognition is a notoriously difficult problem (Moult et al., 2014). Here we present EigenTHREADER, a novel fold recognition method capable of identifying folds where no homologous structures can be identified. EigenTHREADER takes a query amino acid sequence, generates a map of intra-residue contacts, and then searches a library of contact maps of known structures. To allow the contact maps to be compared, we use eigenvector decomposition to resolve the principal eigenvectors; these can then be aligned using standard dynamic programming algorithms. The approach is similar to the Al-Eigen approach of Di Lena et al. (2010), but with improvements made both to speed and accuracy. With this search strategy, EigenTHREADER does not depend directly on sequence homology between the target protein and entries in the fold library to generate models. This in turn enables EigenTHREADER to correctly identify analogous folds where little or no sequence homology information is available. Results: EigenTHREADER outperforms well-established fold recognition methods such as pGenTHREADER and HHSearch in terms of True Positive Rate in the difficult task of analogous fold recognition. This should allow template-based modelling to be extended to many new protein families that were previously intractable to homology-based fold recognition methods. Availability and implementation: All code used to generate these results and the computational protocol can be downloaded from https://github.com/DanBuchan/eigen_scripts. EigenTHREADER, the benchmark code and the data this paper is based on can be downloaded from: http://bioinfadmin.cs.ucl.ac.uk/downloads/eigenTHREADER/
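
The two steps described above, eigenvector decomposition of a contact map followed by dynamic programming alignment, can be sketched as follows. This is not the EigenTHREADER implementation: the number of eigenvectors, the distance-based match score, the linear gap penalty, and the function names `eigen_profile` and `align` are all assumptions chosen for illustration.

```python
import numpy as np


def eigen_profile(contact_map, n_vectors=3):
    """Per-residue features from the leading eigenvectors of a symmetric map."""
    values, vectors = np.linalg.eigh(contact_map)
    order = np.argsort(-np.abs(values))[:n_vectors]
    profile = vectors[:, order] * np.sqrt(np.abs(values[order]))
    # Eigenvector signs are arbitrary; flip each column to a non-negative sum.
    signs = np.where(profile.sum(axis=0) < 0, -1.0, 1.0)
    return profile * signs


def align(profile_a, profile_b, gap=-0.5):
    """Global (Needleman-Wunsch) alignment of two eigenvector profiles.

    The match score is the negative Euclidean distance between residue
    feature vectors. Returns (score, list of aligned index pairs).
    """
    n, m = len(profile_a), len(profile_b)
    sim = -np.linalg.norm(profile_a[:, None, :] - profile_b[None, :, :], axis=2)
    score = np.zeros((n + 1, m + 1))
    score[1:, 0] = gap * np.arange(1, n + 1)
    score[0, 1:] = gap * np.arange(1, m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i, j] = max(score[i - 1, j - 1] + sim[i - 1, j - 1],
                              score[i - 1, j] + gap,
                              score[i, j - 1] + gap)
    # Trace back from the bottom-right corner, preferring matches.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if np.isclose(score[i, j], score[i - 1, j - 1] + sim[i - 1, j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif np.isclose(score[i, j], score[i - 1, j] + gap):
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return float(score[n, m]), pairs


# Toy check: a random symmetric binary map aligned against itself.
rng = np.random.default_rng(1)
upper = np.triu(rng.integers(0, 2, size=(12, 12)), k=1)
toy_map = (upper + upper.T).astype(float)
profile = eigen_profile(toy_map)
self_score, self_pairs = align(profile, profile)
```

In a threading setting, the query profile would be aligned against the profile of each library structure and the library entries ranked by alignment score.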

    Assembly and dynamics of the bacteriophage T4 homologous recombination machinery

    Homologous recombination (HR), a process involving the physical exchange of strands between homologous or nearly homologous DNA molecules, is critical for maintaining the genetic diversity and genome stability of species. Bacteriophage T4 is one of the classic systems for studies of homologous recombination. T4 uses HR for high-frequency genetic exchanges, for homology-directed DNA repair (HDR) processes including DNA double-strand break repair, and for the initiation of DNA replication (RDR). T4 recombination proteins are expressed at high levels during T4 infection in E. coli, and share strong sequence, structural, and/or functional conservation with their counterparts in cellular organisms. Biochemical studies of T4 recombination have provided key insights on DNA strand exchange mechanisms, on the structure and function of recombination proteins, and on the coordination of recombination and DNA synthesis activities during RDR and HDR. Recent years have seen the development of detailed biochemical models for the assembly and dynamics of presynaptic filaments in the T4 recombination system, for the atomic structure of T4 UvsX recombinase, and for the roles of DNA helicases in T4 recombination. The goal of this chapter is to review these recent advances and their implications for HR and HDR mechanisms in all organisms

    Mathematical modeling of microRNA-mediated mechanisms of translation repression

    MicroRNAs can affect protein translation through nine mechanistically different mechanisms, including repression of initiation and degradation of the transcript. There is a heated debate in the current literature about which mechanism dominates in living cells and in which situations. Worse, the same experimental systems dealing with the same pairs of mRNA and miRNA can provide ambiguous evidence about which mechanism of translation repression is actually observed in the experiment. We start by reviewing the current knowledge of the various mechanisms of miRNA action and suggest that mathematical modeling can help resolve some of the controversial interpretations. We describe three simple mathematical models of miRNA translation that can be used as tools for interpreting experimental data on the dynamics of protein synthesis. The most complex model developed by us includes all known mechanisms of miRNA action. It allowed us to study possible dynamical patterns corresponding to different miRNA-mediated mechanisms of translation repression and to suggest concrete recipes for determining the dominant mechanism of miRNA action in the form of kinetic signatures. Using computational experiments and systematizing existing evidence from the literature, we justify a hypothesis about the co-existence of distinct miRNA-mediated mechanisms of translation repression. The mechanism actually observed will be the one acting on or changing the limiting "place" of the translation process. The limiting place can vary from one experimental setting to another. This model explains the majority of existing controversies reported. Comment: 40 pages, 9 figures, 4 tables, 91 cited references. The analysis of kinetic signatures is updated according to the new model of coupled transcription, translation and degradation, and of miRNA-based regulation of this process published recently (arXiv:1204.5941). arXiv admin note: text overlap with arXiv:0911.179
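
The notion of a kinetic signature can be illustrated with a toy two-variable model. This is not one of the three models from the paper: it assumes mRNA M is produced at rate k_t and degraded at rate k_d, and protein P is synthesized at rate k_s per mRNA and degraded at rate k_p, so that dM/dt = k_t - k_d*M and dP/dt = k_s*M - k_p*P. All rate values and the function name `simulate` are hypothetical.

```python
def simulate(k_t, k_d, k_s, k_p, t_end=100.0, dt=0.01):
    """Integrate the toy model with explicit Euler steps from M = P = 0.

    Returns the mRNA and protein levels at time t_end.
    """
    m, p = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        dm = k_t - k_d * m
        dp = k_s * m - k_p * p
        m += dt * dm
        p += dt * dp
    return m, p


# Baseline, then two hypothetical miRNA effects of equal strength:
# halving the synthesis (initiation) rate versus doubling mRNA degradation.
m_base, p_base = simulate(k_t=1.0, k_d=0.2, k_s=2.0, k_p=0.5)
m_init, p_init = simulate(k_t=1.0, k_d=0.2, k_s=1.0, k_p=0.5)
m_degr, p_degr = simulate(k_t=1.0, k_d=0.4, k_s=2.0, k_p=0.5)
```

In this toy setting both perturbations halve the steady-state protein level, since P* = k_s*k_t/(k_d*k_p), but only the degradation mechanism lowers the steady-state mRNA level, M* = k_t/k_d. Measuring protein alone therefore cannot distinguish the two, whereas the pair of observables can.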