252 research outputs found
DeepSF: deep convolutional neural network for mapping protein sequences to folds
Motivation
Protein fold recognition is an important problem in structural
bioinformatics. Almost all traditional fold recognition methods use sequence
(homology) comparison to indirectly predict the fold of a target protein based
on the fold of a template protein with known structure, which cannot explain
the relationship between sequence and fold. Only a few methods have been
developed to classify protein sequences into a small number of folds due to
methodological limitations, which are not generally useful in practice.
Results
We develop a deep 1D-convolution neural network (DeepSF) to directly classify
any protein sequence into one of 1195 known folds, which is useful for both
fold recognition and the study of sequence-structure relationships. Different
from traditional sequence alignment (comparison) based methods, our method
automatically extracts fold-related features from a protein sequence of any
length and maps it to the fold space. We train and test our method on the
datasets curated from SCOP1.75, yielding a classification accuracy of 80.4%. On
the independent testing dataset curated from SCOP2.06, the classification
accuracy is 77.0%. We compare our method with a top profile-profile alignment
method, HHSearch, on hard template-based and template-free modeling targets of
CASP9-12 in terms of fold recognition accuracy. The accuracy of our method is
14.5%-29.1% higher than HHSearch on template-free modeling targets and
4.5%-16.7% higher on hard template-based modeling targets for top 1, 5, and 10
predicted folds. The hidden features extracted from sequence by our method are
robust against sequence mutation, insertion, deletion and truncation, and can
be used for other protein pattern recognition problems such as protein
clustering, comparison and ranking.
Comment: 28 pages, 13 figures
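The DeepSF abstract says that a sequence of any length is mapped to a fixed fold space. A minimal sketch of one way this can work is shown here: a 1D convolution followed by global max pooling gives a fixed-length vector regardless of sequence length, and a dense layer plus softmax turns it into probabilities over 1195 folds. The filter count, filter width, random untrained weights, and all function names are assumptions made for illustration. This is not the authors' architecture or code.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
NUM_FOLDS = 1195  # number of folds stated in the abstract


def one_hot(sequence):
    """Encode each residue as a 20-dimensional indicator vector."""
    return [[1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS] for aa in sequence]


def conv1d_max_pool(features, filters):
    """Slide each filter along the sequence, apply ReLU, keep the maximum.

    The output length equals the number of filters, not the sequence length.
    """
    width = len(filters[0])
    pooled = []
    for filt in filters:
        best = 0.0  # ReLU floor; also the result when the sequence is shorter than the filter
        for start in range(len(features) - width + 1):
            activation = sum(
                filt[k][c] * features[start + k][c]
                for k in range(width)
                for c in range(len(AMINO_ACIDS))
            )
            best = max(best, activation)
        pooled.append(best)
    return pooled


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_model(num_filters=8, width=5, seed=0):
    """Hypothetical untrained model: random filters and a random dense layer."""
    rng = random.Random(seed)
    filters = [
        [[rng.uniform(-1, 1) for _ in AMINO_ACIDS] for _ in range(width)]
        for _ in range(num_filters)
    ]
    dense = [[rng.uniform(-1, 1) for _ in range(num_filters)] for _ in range(NUM_FOLDS)]
    return filters, dense


def predict_fold_probs(sequence, model):
    """Map a sequence of any length to a probability vector over folds."""
    filters, dense = model
    pooled = conv1d_max_pool(one_hot(sequence), filters)
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in dense]
    return softmax(logits)
```

Calling `predict_fold_probs("MKTAYIAKQR", build_model())` and the same function on a much longer sequence both return a list of 1195 probabilities, which is the property the sketch is meant to show.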
Improved protein contact predictions with the MetaPSICOV2 server in CASP12
In this paper, we present the results for the MetaPSICOV2 contact prediction server in the CASP12 community experiment (http://predictioncenter.org). Over the 35 assessed Free Modelling target domains the MetaPSICOV2 server achieved a mean precision of 43.27%, a substantial increase relative to the server's performance in the CASP11 experiment. In the following paper, we discuss improvements to the MetaPSICOV2 server, covering both changes to the neural network and attempts to integrate contact predictions on a domain basis into the prediction pipeline. We also discuss some limitations in the CASP12 assessment which may have overestimated the performance of our method
A Database of Domain Definitions for Proteins with Complex Interdomain Geometry
Protein structural domains are necessary for understanding evolution and protein folding, and may vary widely from functional and sequence-based domains. Although various structural domain databases exist, defining domains for some proteins is non-trivial, and definitions of their domain boundaries are not available. Here, we present a novel database of manually defined structural domains for a representative set of proteins from the SCOP “multi-domain proteins” class (http://prodata.swmed.edu/multidom/). We consider our domains as mobile evolutionary units, which may rearrange during protein evolution. Additionally, they may be visualized as structurally compact and possibly independently folding units. We also found that representing domains as evolutionary and folding units does not always lead to a unique domain definition. However, unlike existing databases, we retain and refine these “alternate” domain definitions after careful inspection of structural similarity, functional sites and automated domain definition methods. We provide domain definitions, including actual residue boundaries, for proteins that well-known databases like SCOP and CATH do not attempt to split. Our alternate domain definitions are suitable for sequence and structure searches by automated methods. Additionally, the database can be used for training and testing domain delineation algorithms. Since our domains represent structurally compact evolutionary units, the database may be useful for studying domain properties and evolution
An extracellular steric seeding mechanism for Eph-ephrin signaling platform assembly
Erythropoietin-producing hepatoma (Eph) receptors are cell-surface protein tyrosine kinases mediating cell-cell communication. Upon activation, they form signaling clusters. We report crystal structures of the full ectodomain of human EphA2 (eEphA2) both alone and in complex with the receptor-binding domain of the ligand ephrinA5 (ephrinA5 RBD). Unliganded eEphA2 forms linear arrays of staggered parallel receptors involving two patches of residues conserved across A-class Ephs. eEphA2-ephrinA5 RBD forms a more elaborate assembly, whose interfaces include the same conserved regions on eEphA2, but rearranged to accommodate ephrinA5 RBD. Cell-surface expression of mutant EphA2s showed that these interfaces are critical for localization at cell-cell contacts and activation-dependent degradation. Our results suggest a 'nucleation' mechanism whereby a limited number of ligand-receptor interactions 'seed' an arrangement of receptors which can propagate into extended signaling arrays
Blind testing of cross-linking/mass spectrometry hybrid methods in CASP11
Hybrid approaches combine computational methods with experimental data. The information contained in the experimental data can be leveraged to probe the structure of proteins otherwise elusive to computational methods. Compared with computational methods, the structures produced by hybrid methods exhibit some degree of experimental validation. In spite of these advantages, most hybrid methods have not yet been validated in blind tests, hampering their development. Here, we describe the first blind test of a specific cross-link based hybrid method in CASP. This blind test was coordinated by the CASP organizers and utilized a novel, high-density cross-linking/mass-spectrometry (CLMS) approach that is able to collect high-density CLMS data in a matter of days. This experimental protocol was developed in the Rappsilber laboratory. This approach exploits the chemistry of a highly reactive, photoactivatable cross-linker to produce an order of magnitude more cross-links than homobifunctional cross-linkers. The Rappsilber laboratory generated experimental CLMS data based on this protocol and submitted the data to the CASP organizers, who then released this data to the CASP11 prediction groups in a separate, CLMS-assisted modeling experiment. We did not observe a clear improvement of assisted models, presumably because the properties of the CLMS data (uncertainty in cross-link identification and residue-residue assignment, and uneven distribution over the protein) were largely unknown to the prediction groups and their approaches were not yet tailored to this kind of data. We also suggest modifications to the CLMS-CASP experiment and discuss the importance of rigorous blind testing in the development of hybrid methods. (C) 2016 The Authors Proteins: Structure, Function, and Bioinformatics Published by Wiley Periodicals, Inc
EigenTHREADER: analogous protein fold recognition by efficient contact map threading
Motivation: Protein fold recognition when appropriate, evolutionarily-related, structural templates can be identified is often trivial and may even be viewed as a solved problem. However, in cases where no homologous structural templates can be detected, fold recognition is a notoriously difficult problem (Moult et al., 2014). Here we present EigenTHREADER, a novel fold recognition method capable of identifying folds where no homologous structures can be identified. EigenTHREADER takes a query amino acid sequence, generates a map of intra-residue contacts, and then searches a library of contact maps of known structures. To allow the contact maps to be compared, we use eigenvector decomposition to resolve the principal eigenvectors; these can then be aligned using standard dynamic programming algorithms. The approach is similar to the Al-Eigen approach of Di Lena et al. (2010), but with improvements made both to speed and accuracy. With this search strategy, EigenTHREADER does not depend directly on sequence homology between the target protein and entries in the fold library to generate models. This in turn enables EigenTHREADER to correctly identify analogous folds where little or no sequence homology information is available.
Results: EigenTHREADER outperforms well-established fold recognition methods such as pGenTHREADER and HHSearch in terms of True Positive Rate in the difficult task of analogous fold recognition. This should allow template-based modelling to be extended to many new protein families that were previously intractable to homology-based fold recognition methods.
Availability and implementation: All code used to generate these results and the computational protocol can be downloaded from https://github.com/DanBuchan/eigen_scripts. EigenTHREADER, the benchmark code and the data this paper is based on can be downloaded from: http://bioinfadmin.cs.ucl.ac.uk/downloads/eigenTHREADER/
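The EigenTHREADER abstract describes comparing contact maps by taking their principal eigenvectors and aligning them with dynamic programming. The following is a heavily simplified sketch of that idea, using only the single dominant eigenvector (found by power iteration) and a Needleman-Wunsch style global alignment score. The toy contact maps, the match score `0.5 - |x - y|`, the gap penalty, and all function names are assumptions chosen for illustration. This is not the EigenTHREADER or Al-Eigen implementation.

```python
import math


def chain_contacts(n, extra=()):
    """Toy contact map: self-contacts, sequential neighbours, optional long-range pairs."""
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 1.0  # self-contact on the diagonal keeps power iteration from oscillating
    for i in range(n - 1):
        m[i][i + 1] = m[i + 1][i] = 1.0
    for i, j in extra:
        m[i][j] = m[j][i] = 1.0
    return m


def principal_eigenvector(matrix, iterations=500):
    """Power iteration for the dominant eigenvector of a symmetric non-negative matrix."""
    n = len(matrix)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        nxt = [sum(row[j] * vec[j] for j in range(n)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return vec
        vec = [x / norm for x in nxt]
    return vec


def align_score(a, b, gap=-0.1):
    """Global alignment score of two eigenvector profiles (Needleman-Wunsch recurrence)."""
    rows, cols = len(a) + 1, len(b) + 1
    dp = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = i * gap
    for j in range(1, cols):
        dp[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            match = 0.5 - abs(a[i - 1] - b[j - 1])  # hypothetical similarity score
            dp[i][j] = max(
                dp[i - 1][j - 1] + match,  # align residue i with residue j
                dp[i - 1][j] + gap,        # gap in b
                dp[i][j - 1] + gap,        # gap in a
            )
    return dp[-1][-1]


query = principal_eigenvector(chain_contacts(8, extra=[(0, 7), (1, 6)]))
template = principal_eigenvector(chain_contacts(8, extra=[(0, 3)]))
print(align_score(query, query), align_score(query, template))
```

In a library search of this kind, the query profile would be scored against every template profile and the templates ranked by score. A fuller version would align several eigenvectors per map rather than one.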
Getting to the end of RNA: structural analysis of protein recognition of 5' and 3' termini.
Accepted version
Assembly and dynamics of the bacteriophage T4 homologous recombination machinery
Homologous recombination (HR), a process involving the physical exchange of strands between homologous or nearly homologous DNA molecules, is critical for maintaining the genetic diversity and genome stability of species. Bacteriophage T4 is one of the classic systems for studies of homologous recombination. T4 uses HR for high-frequency genetic exchanges, for homology-directed DNA repair (HDR) processes including DNA double-strand break repair, and for the initiation of DNA replication (RDR). T4 recombination proteins are expressed at high levels during T4 infection in E. coli, and share strong sequence, structural, and/or functional conservation with their counterparts in cellular organisms. Biochemical studies of T4 recombination have provided key insights on DNA strand exchange mechanisms, on the structure and function of recombination proteins, and on the coordination of recombination and DNA synthesis activities during RDR and HDR. Recent years have seen the development of detailed biochemical models for the assembly and dynamics of presynaptic filaments in the T4 recombination system, for the atomic structure of T4 UvsX recombinase, and for the roles of DNA helicases in T4 recombination. The goal of this chapter is to review these recent advances and their implications for HR and HDR mechanisms in all organisms
Investigation of the feasibility of elective irradiation to neck level Ib using intensity-modulated radiotherapy for patients with nasopharyngeal carcinoma: a retrospective analysis
Mathematical modeling of microRNA-mediated mechanisms of translation repression
MicroRNAs can affect protein translation through nine mechanistically
different mechanisms, including repression of initiation and degradation of the
transcript. There is a heated debate in the current literature about which
mechanism plays the dominant role in living cells, and in which situations.
Worse, the same experimental systems dealing with the same pairs of mRNA and miRNA
can provide ambiguous evidence about which is the actual mechanism of
translation repression observed in the experiment. We start by reviewing the
current knowledge of various mechanisms of miRNA action and suggest that
mathematical modeling can help resolve some of the controversial
interpretations. We describe three simple mathematical models of miRNA
translation that can be used as tools in interpreting the experimental data on
the dynamics of protein synthesis. The most complex model developed by us
includes all known mechanisms of miRNA action. It allowed us to study possible
dynamical patterns corresponding to different miRNA-mediated mechanisms of
translation repression and to suggest concrete recipes on determining the
dominant mechanism of miRNA action in the form of kinetic signatures. Using
computational experiments and systematizing existing evidence from the
literature, we justify a hypothesis about co-existence of distinct
miRNA-mediated mechanisms of translation repression. The actually observed
mechanism will be that acting on or changing the limiting "place" of the
translation process. The limiting place can vary from one experimental setting
to another. This model explains the majority of existing controversies
reported.
Comment: 40 pages, 9 figures, 4 tables, 91 cited references. The analysis of
kinetic signatures is updated according to the new model of coupled
transcription, translation and degradation, and of miRNA-based regulation of
this process published recently (arXiv:1204.5941). arXiv admin note: text
overlap with arXiv:0911.179