DeepConv-DTI: Prediction of drug-target interactions via deep learning with convolution on protein sequences
Identification of drug-target interactions (DTIs) plays a key role in drug
discovery. The high cost and labor-intensive nature of in vitro and in vivo
experiments have highlighted the importance of in silico-based DTI prediction
approaches. In several computational models, conventional protein descriptors
are shown to be not informative enough to predict accurate DTIs. Thus, in this
study, we employ a convolutional neural network (CNN) on raw protein sequences
to capture local residue patterns participating in DTIs. With CNN on protein
sequences, our model performs better than previous protein descriptor-based
models. In addition, our model performs better than the previous deep learning
model for massive prediction of DTIs. By examining the pooled convolution
results, we found that our model can detect binding sites of proteins for DTIs.
In conclusion, our prediction model for detecting local residue patterns of
target proteins successfully enriches the protein features of a raw protein
sequence, yielding better prediction results than previous approaches.
Comment: 26 pages, 7 figures
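The idea of convolving over a raw protein sequence and pooling the result can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the DeepConv-DTI implementation: the one-hot encoding, the single hand-set filter built by the hypothetical helper `motif_kernel`, and the toy sequence are all invented for the example, whereas a real model would learn many filters from data. The position returned by the global max pooling step shows how a pooled activation can be traced back to a local residue window.

```python
# Illustrative sketch only; the filter weights and toy sequence are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(sequence):
    """Encode each residue as a 20-dimensional one-hot vector."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = []
    for residue in sequence:
        vec = [0.0] * len(AMINO_ACIDS)
        vec[index[residue]] = 1.0
        encoded.append(vec)
    return encoded


def motif_kernel(motif):
    """Hypothetical hand-set filter that responds most strongly to one motif."""
    return one_hot(motif)


def conv_global_max(encoded, kernel):
    """Slide one filter along the sequence, apply ReLU, then global max pooling.

    Returns (max activation, start index of the window that produced it).
    """
    width = len(kernel)
    best_score, best_pos = float("-inf"), -1
    for start in range(len(encoded) - width + 1):
        score = sum(
            kernel[k][c] * encoded[start + k][c]
            for k in range(width)
            for c in range(len(AMINO_ACIDS))
        )
        score = max(score, 0.0)  # ReLU activation
        if score > best_score:
            best_score, best_pos = score, start
    return best_score, best_pos


# Toy usage: locate the window that best matches the motif "YIA".
score, position = conv_global_max(one_hot("MKTAYIAKQR"), motif_kernel("YIA"))
```

Here `position` is the start of the highest-scoring window, which is the kind of information one would inspect when checking whether pooled convolution results line up with known binding sites.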
Prediction of protein-protein interaction types using association rule based classification
This article has been made available through the Brunel Open Access Publishing Fund. Copyright © 2009 Park et al.
Background: Protein-protein interactions (PPI) can be classified according to their characteristics into, for example, obligate or transient interactions. The identification and characterization of these PPI types may help in the functional annotation of new protein complexes and in the prediction of protein interaction partners by knowledge-driven approaches.
Results: This work addresses pattern discovery at the interaction sites of four different interaction types in order to characterize them, and uses the discovered patterns to predict PPI types by Association Rule Based Classification (ARBC), which includes association rule generation and posterior classification. We incorporated domain information from protein complexes in SCOP proteins and identified 354 domain-interaction sites. Fourteen interface properties were calculated from amino acid and secondary structure composition and then used to generate a set of association rules characterizing these domain-interaction sites with the APRIORI algorithm. Our results on classifying PPI types from the discovered association rules show that the discriminative ability of the rules can significantly affect the predictive power of classification models. We also showed that classification accuracy can be improved by using structural domain information and secondary structure content.
Conclusion: The advantage of our approach is that biologically significant information can be extracted by interpreting the discovered association rules, which are understandable and interpretable. A web application based on our method can be found at http://bioinfo.ssu.ac.kr/~shpark/picasso/
SHP was supported by the Korea Research Foundation Grant funded by the Korean Government (KRF-2005-214-E00050). JAR has been supported by the Programme Alβan, the European Union Programme of High level Scholarships for Latin America, scholarship E04D034854CL. SK was supported by Soongsil University Research Fund.
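For readers unfamiliar with association rule generation, the following is a minimal Apriori-style sketch. It is not the authors' ARBC pipeline: the item names (`high_helix`, `hydrophobic`, and so on), the four toy records and the 0.5 support threshold are hypothetical values chosen only to show how frequent itemsets are grown level by level and how the confidence of a rule such as "hydrophobic implies obligate" is computed.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for itemsets meeting min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    while current:
        # Support = fraction of records that contain the whole candidate.
        support = {c: sum(1 for t in transactions if c <= t) / n for c in current}
        survivors = {c: s for c, s in support.items() if s >= min_support}
        frequent.update(survivors)
        keys = list(survivors)
        size = len(keys[0]) + 1 if keys else 0
        # Join step: merge surviving k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a, b in combinations(keys, 2) if len(a | b) == size}
        # Prune step: keep a candidate only if all its k-subsets are frequent.
        current = [
            c for c in candidates
            if all(frozenset(s) in survivors for s in combinations(c, size - 1))
        ]
    return frequent


def rule_confidence(frequent, antecedent, consequent):
    """Confidence of antecedent -> consequent: support(A and C) / support(A)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical toy records: discretised interface properties plus a PPI type label.
records = [
    frozenset({"high_helix", "hydrophobic", "obligate"}),
    frozenset({"high_helix", "hydrophobic", "obligate"}),
    frozenset({"high_helix", "polar", "transient"}),
    frozenset({"low_helix", "polar", "transient"}),
]
itemsets = frequent_itemsets(records, min_support=0.5)
confidence = rule_confidence(itemsets, ["hydrophobic"], ["obligate"])
```

In a rule-based classifier, rules whose consequent is a class label would then be ranked, for instance by confidence, and applied to new interaction sites; that ranking step is omitted here.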
Deep learning extends de novo protein modelling coverage of genomes using iteratively predicted structural constraints
The inapplicability of amino acid covariation methods to small protein
families has limited their use for structural annotation of whole genomes.
Recently, deep learning has shown promise in allowing accurate residue-residue
contact prediction even for shallow sequence alignments. Here we introduce
DMPfold, which uses deep learning to predict inter-atomic distance bounds, the
main chain hydrogen bond network, and torsion angles, which it uses to build
models in an iterative fashion. DMPfold produces more accurate models than two
popular methods for a test set of CASP12 domains, and works just as well for
transmembrane proteins. Applied to all Pfam domains without known structures,
confident models for 25% of these so-called dark families were produced in
under a week on a small 200 core cluster. DMPfold provides models for 16% of
human proteome UniProt entries without structures, generates accurate models
with fewer than 100 sequences in some cases, and is freely available.
Comment: JGG and SMK contributed equally to the work
Exact and efficient top-K inference for multi-target prediction by querying separable linear relational models
Many complex multi-target prediction problems that concern large target
spaces are characterised by a need for efficient prediction strategies that
avoid the computation of predictions for all targets explicitly. Examples of
such problems emerge in several subfields of machine learning, such as
collaborative filtering, multi-label classification, dyadic prediction and
biological network inference. In this article we analyse efficient and exact
algorithms for computing the top-K predictions in the above problem settings,
using a general class of models that we refer to as separable linear relational
models. We show how to use those inference algorithms, which are modifications
of well-known information retrieval methods, in a variety of machine learning
settings. Furthermore, we study the possibility of scoring items incompletely,
while still retaining an exact top-K retrieval. Experimental results in several
application domains reveal that the so-called threshold algorithm is very
scalable, often performing many orders of magnitude more efficiently than the
naive approach.
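The threshold algorithm mentioned above can be sketched as follows, assuming a separable linear model in which the score of a target is the dot product of a query vector and that target's feature vector. The function names, the toy feature matrix and the query are illustrative assumptions and are not taken from the article. The sketch walks one sorted list per feature dimension in parallel, fully scores each newly seen target, and stops once the K-th best score reaches the threshold, which is an upper bound on the score of any target not yet seen.

```python
import heapq


def build_index(targets):
    """For each feature dimension, target ids sorted by feature value, descending."""
    dims = len(targets[0])
    return [
        sorted(range(len(targets)), key=lambda j: -targets[j][d])
        for d in range(dims)
    ]


def top_k_threshold(query, targets, index, k):
    """Exact top-k retrieval; returns ([(score, target id), ...], targets scored)."""
    dims = len(query)
    # Traverse each list so that the contribution query[d] * value is non-increasing.
    orders = [index[d] if query[d] >= 0 else index[d][::-1] for d in range(dims)]
    seen = set()
    heap = []  # min-heap holding the best k (score, id) pairs found so far
    scored = 0
    for depth in range(len(targets)):
        threshold = 0.0
        for d in range(dims):
            j = orders[d][depth]
            threshold += query[d] * targets[j][d]
            if j not in seen:
                seen.add(j)
                scored += 1
                score = sum(q * t for q, t in zip(query, targets[j]))
                if len(heap) < k:
                    heapq.heappush(heap, (score, j))
                elif score > heap[0][0]:
                    heapq.heapreplace(heap, (score, j))
        # No unseen target can score above the threshold, so stopping is safe.
        if len(heap) == k and heap[0][0] >= threshold:
            break
    return sorted(heap, reverse=True), scored


# Toy usage with made-up two-dimensional target features.
toy_targets = [[0.9, 0.1], [0.8, 0.8], [0.1, 0.9], [0.2, 0.2], [0.5, 0.4]]
toy_query = [1.0, 0.5]
result, n_scored = top_k_threshold(toy_query, toy_targets, build_index(toy_targets), k=2)
```

The index is built once and reused across queries, so the saving over the naive approach comes from `n_scored` being smaller than the number of targets. On data where no target dominates, the early stop may trigger late and the benefit shrinks.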
Proteomic study of the membrane components of signalling cascades of Botrytis cinerea controlled by phosphorylation
Protein phosphorylation and membrane proteins play an important role in the infection of plants by phytopathogenic fungi, given their involvement in signal transduction cascades. Botrytis cinerea is a well-studied necrotrophic fungus taken as a model organism in fungal plant pathology, given its broad host range and adverse economic impact. To elucidate relevant events during infection, several proteomics analyses have been performed in B. cinerea, but they cover only 10% of the total proteins predicted in the genome database of this fungus. To increase coverage, we analysed by LC-MS/MS the first-reported overlapped proteome in phytopathogenic fungi, the “phosphomembranome” of B. cinerea, combining the two most important signal transduction subproteomes. Of the 1112 membrane-associated phosphoproteins identified, 64 and 243 were classified as exclusively identified or overexpressed under glucose and deproteinized tomato cell wall conditions, respectively. Seven proteins were found under both conditions, but these presented a specific phosphorylation pattern, so they were considered as exclusively identified or overexpressed proteins. From bioinformatics analysis, those differences in the composition of membrane-associated phosphoproteins were associated with various processes, including pyruvate metabolism, unfolded protein response, oxidative stress response, autophagy and cell death. Our results suggest these proteins play a significant role in the B. cinerea pathogenic cycle.
Annotating Protein Functional Residues by Coupling High-Throughput Fitness Profile and Homologous-Structure Analysis.
Identification and annotation of functional residues are fundamental questions in protein sequence analysis. Sequence and structure conservation provides valuable information to tackle these questions. It is, however, limited by the incomplete sampling of sequence space in natural evolution. Moreover, proteins often have multiple functions, with overlapping sequences that present challenges to accurate annotation of the exact functions of individual residues by conservation-based methods. Using the influenza A virus PB1 protein as an example, we developed a method to systematically identify and annotate functional residues. We used saturation mutagenesis and high-throughput sequencing to measure the replication capacity of single nucleotide mutations across the entire PB1 protein. After predicting protein stability upon mutations, we identified functional PB1 residues that are essential for viral replication. To further annotate the functional residues important to the canonical or noncanonical functions of viral RNA-dependent RNA polymerase (vRdRp), we performed a homologous-structure analysis with 16 different vRdRp structures. We achieved high sensitivity in annotating the known canonical polymerase functional residues. Moreover, we identified a cluster of noncanonical functional residues located in the loop region of the PB1 β-ribbon. We further demonstrated that these residues were important for PB1 protein nuclear import through the interaction with Ran-binding protein 5. In summary, we developed a systematic and sensitive method to identify and annotate functional residues that are not restrained by sequence conservation. Importantly, this method is generally applicable to other proteins about which homologous-structure information is available.
Importance: To fully comprehend the diverse functions of a protein, it is essential to understand the functionality of individual residues. Current methods are highly dependent on evolutionary sequence conservation, which is usually limited by sampling size. Sequence conservation-based methods are further confounded by structural constraints and multifunctionality of proteins. Here we present a method that can systematically identify and annotate functional residues of a given protein. We used a high-throughput functional profiling platform to identify essential residues. Coupling it with homologous-structure comparison, we were able to annotate multiple functions of proteins. We demonstrated the method with the PB1 protein of influenza A virus and identified novel functional residues in addition to its canonical function as an RNA-dependent RNA polymerase. Not limited to virology, this method is generally applicable to other proteins that can be functionally selected and about which homologous-structure information is available.