    Protein Domain Linker Prediction: A Direction for Detecting Protein – Protein Interactions

    Protein chains are generally long and consist of multiple domains. Domains are the basic elements of protein structure that can exist, evolve, and function independently. The accurate and reliable identification of protein domains and their interactions has important impacts in several areas of protein research. Accurate prediction of protein domains is a fundamental stage in both experimental and computational proteomics. It is an initial stage of protein tertiary structure prediction, which can give insight into how proteins work. Knowledge of domains is also useful for classifying proteins, understanding their structures, functions, and evolution, and predicting protein-protein interactions (PPI). However, predicting structural domains within proteins is a challenging task in computational biology. A promising direction for domain prediction is to detect inter-domain linkers and then predict the regions of the protein sequence in which the structural domains are located accordingly. Protein-protein interactions occur at almost every level of cell function. The identification of interactions among proteins and their associated domains provides a global picture of cellular functions and biological processes. It is also an essential step in the construction of PPI networks for humans and other organisms. PPI prediction has been considered a promising alternative to traditional drug design techniques. The identification of possible viral-host protein interactions can lead to a better understanding of infection mechanisms and, in turn, to the development of drugs and the optimization of treatments. In this work, a compact and accurate approach for inter-domain linker prediction is developed based solely on protein primary structure information. Then, inter-domain linker knowledge is used in predicting structural domains and detecting PPI.
The research in this dissertation can be summarized in three main contributions. The first is predicting protein inter-domain linker regions by introducing the concept of an amino acid compositional index and refining the prediction with the simulated annealing optimization technique. The second is identifying structural domains based on inter-domain linker knowledge. The inter-domain linker knowledge, represented by the compositional index, is enhanced by incorporating biological knowledge, represented by amino acid physicochemical properties, to develop a well-optimized Random Forest classifier for predicting novel domains and inter-domain linkers. In the third contribution, domain knowledge is used to predict protein-protein interactions. This is achieved by characterizing structural domains within protein sequences, analyzing their interactions, and predicting protein interactions based on their interacting domains. The experimental studies and the higher accuracy achieved support the proposed framework.
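The abstract does not define the compositional index, so the following is only a minimal sketch of one plausible form: a log-ratio of each amino acid's frequency in known linker segments versus known domain segments, smoothed over a sliding window. The function names, the pseudocount, the window size, and the toy training segments are all assumptions made for illustration and are not taken from the dissertation.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def compositional_index(linker_segments, domain_segments, pseudocount=1.0):
    """Assumed form of a compositional index: log(f_linker / f_domain) per residue.

    Positive values mean the residue is enriched in linker segments.
    The pseudocount keeps the ratio finite for unseen residues.
    """
    linker_counts = Counter("".join(linker_segments))
    domain_counts = Counter("".join(domain_segments))
    linker_total = sum(linker_counts[a] for a in ALPHABET) + pseudocount * len(ALPHABET)
    domain_total = sum(domain_counts[a] for a in ALPHABET) + pseudocount * len(ALPHABET)
    index = {}
    for a in ALPHABET:
        f_linker = (linker_counts[a] + pseudocount) / linker_total
        f_domain = (domain_counts[a] + pseudocount) / domain_total
        index[a] = math.log(f_linker / f_domain)
    return index


def window_profile(sequence, index, window=5):
    """Average the per-residue index over a centered window (truncated at the ends)."""
    half = window // 2
    scores = [index.get(a, 0.0) for a in sequence]
    profile = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        profile.append(sum(scores[lo:hi]) / (hi - lo))
    return profile


# Hypothetical toy training data, not real annotated segments.
toy_index = compositional_index(["PPGS"], ["LLVI"])
toy_profile = window_profile("LLVIPPGSLLVI", toy_index, window=3)
```

In a pipeline like the one described, a profile of this kind would be the input that an optimizer such as simulated annealing refines, for example by tuning the window size and a decision threshold; that step is not shown here.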

    Structural Prediction of Protein–Protein Interactions by Docking: Application to Biomedical Problems

    A huge amount of genetic information is available thanks to the recent advances in sequencing technologies and the larger computational capabilities, but the interpretation of such genetic data at phenotypic level remains elusive. One of the reasons is that proteins are not acting alone, but are specifically interacting with other proteins and biomolecules, forming intricate interaction networks that are essential for the majority of cell processes and pathological conditions. Thus, characterizing such interaction networks is an important step in understanding how information flows from gene to phenotype. Indeed, structural characterization of protein–protein interactions at atomic resolution has many applications in biomedicine, from diagnosis and vaccine design, to drug discovery. However, despite the advances of experimental structural determination, the number of interactions for which there is available structural data is still very small. In this context, a complementary approach is computational modeling of protein interactions by docking, which is usually composed of two major phases: (i) sampling of the possible binding modes between the interacting molecules and (ii) scoring for the identification of the correct orientations. In addition, prediction of interface and hot-spot residues is very useful in order to guide and interpret mutagenesis experiments, as well as to understand functional and mechanistic aspects of the interaction. Computational docking is already being applied to specific biomedical problems within the context of personalized medicine, for instance, helping to interpret pathological mutations involved in protein–protein interactions, or providing modeled structural data for drug discovery targeting protein–protein interactions.
Spanish Ministry of Economy grant number BIO2016-79960-R; D.B.B. is supported by a predoctoral fellowship from CONACyT; M.R. is supported by an FPI fellowship from the Severo Ochoa program. We are grateful to the Joint BSC-CRG-IRB Programme in Computational Biology.
Peer reviewed. Postprint (author's final draft).

    Predicting Secondary Structures, Contact Numbers, and Residue-wise Contact Orders of Native Protein Structure from Amino Acid Sequence by Critical Random Networks

    Prediction of one-dimensional protein structures such as secondary structures and contact numbers is useful for three-dimensional structure prediction and important for understanding the sequence-structure relationship. Here we present a new machine-learning method, critical random networks (CRNs), for predicting one-dimensional structures, and apply it, with position-specific scoring matrices, to the prediction of secondary structures (SS), contact numbers (CN), and residue-wise contact orders (RWCO). The present method achieves, on average, a $Q_3$ accuracy of 77.8% for SS and correlation coefficients of 0.726 and 0.601 for CN and RWCO, respectively. The accuracy of the SS prediction is comparable to other state-of-the-art methods, and that of the CN prediction is a significant improvement over previous methods. We give a detailed formulation of the critical random networks-based prediction scheme, and examine the context-dependence of prediction accuracies. In order to study the nonlinear and multi-body effects, we compare the CRNs-based method with a purely linear method based on position-specific scoring matrices. Although not superior to the CRNs-based method, the surprisingly good accuracy achieved by the linear method highlights the difficulty in extracting structural features of higher order from amino acid sequence beyond that provided by the position-specific scoring matrices.
    Comment: 20 pages, 1 figure, 5 tables; minor revision; accepted for publication in BIOPHYSIC
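For readers unfamiliar with the two evaluation measures quoted above, the following sketch shows how they are conventionally computed: the three-state accuracy as the fraction of residues whose predicted state (helix, strand, or coil) matches the observed one, and the Pearson correlation coefficient for real-valued quantities such as contact numbers. The function names and the toy inputs are illustrative assumptions; this is not the paper's evaluation code.

```python
import math


def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted 3-state label (H, E, C) matches the observed label."""
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy labels: one mismatch in eight residues.
toy_q3 = q3_accuracy("HHHEECCC", "HHHEEECC")
# Hypothetical toy contact numbers that are perfectly linearly related.
toy_r = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```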

    Enhancing the functional content of protein interaction networks

    Protein interaction networks are a promising type of data for studying complex biological systems. However, despite the rich information embedded in these networks, they face important data quality challenges of noise and incompleteness that adversely affect the results obtained from their analysis. Here, we explore the use of the concept of common neighborhood similarity (CNS), which is a form of local structure in networks, to address these issues. Although several CNS measures have been proposed in the literature, an understanding of their relative efficacies for the analysis of interaction networks has been lacking. We follow the framework of graph transformation to convert the given interaction network into a transformed network corresponding to a variety of CNS measures evaluated. The effectiveness of each measure is then estimated by comparing the quality of protein function predictions obtained from its corresponding transformed network with those from the original network. Using a large set of S. cerevisiae interactions, and a set of 136 GO terms, we find that several of the transformed networks produce more accurate predictions than those obtained from the original network. In particular, the HC.cont measure proposed here performs particularly well for this task. Further investigation reveals that the two major factors contributing to this improvement are the abilities of CNS measures, especially HC.cont, to prune out noisy edges and introduce new links between functionally related proteins.
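The graph-transformation idea can be illustrated with a short sketch. It uses the Jaccard coefficient of the two proteins' neighbor sets as a stand-in CNS measure; this is not the HC.cont measure, whose definition is not given in the abstract. The threshold value, the function names, and the toy network are assumptions for illustration only.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) interaction pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def jaccard_similarity(adjacency, u, v):
    """Jaccard coefficient of the neighbor sets of u and v, excluding u and v themselves."""
    neighbors_u = adjacency[u] - {v}
    neighbors_v = adjacency[v] - {u}
    union = neighbors_u | neighbors_v
    return len(neighbors_u & neighbors_v) / len(union) if union else 0.0


def transform_network(edges, threshold=0.3):
    """Score every node pair and keep those at or above the threshold.

    Existing edges with few shared neighbors are dropped, and pairs that were
    not connected but share many neighbors become new weighted edges.
    """
    adjacency = build_adjacency(edges)
    transformed = {}
    for u, v in combinations(sorted(adjacency), 2):
        score = jaccard_similarity(adjacency, u, v)
        if score >= threshold:
            transformed[(u, v)] = score
    return transformed


# Hypothetical toy network; node names are placeholders, not real proteins.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("A", "E"), ("B", "E")]
toy_transformed = transform_network(toy_edges)
```

In this toy network the original edge C-D is dropped because C and D share no other neighbors, while the previously unconnected pair C-E gains an edge because both interact with A and B.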

    Diffusion Component Analysis: Unraveling Functional Topology in Biological Networks

    Complex biological systems have been successfully modeled by biochemical and genetic interaction networks, typically gathered from high-throughput (HTP) data. These networks can be used to infer functional relationships between genes or proteins. Using the intuition that the topological role of a gene in a network relates to its biological function, local or diffusion based "guilt-by-association" and graph-theoretic methods have had success in inferring gene functions. Here we seek to improve function prediction by integrating diffusion-based methods with a novel dimensionality reduction technique to overcome the incomplete and noisy nature of network data. In this paper, we introduce diffusion component analysis (DCA), a framework that plugs in a diffusion model and learns a low-dimensional vector representation of each node to encode the topological properties of a network. As a proof of concept, we demonstrate DCA's substantial improvement over state-of-the-art diffusion-based approaches in predicting protein function from molecular interaction networks. Moreover, our DCA framework can integrate multiple networks from heterogeneous sources, consisting of genomic information, biochemical experiments and other resources, to even further improve function prediction. Yet another layer of performance gain is achieved by integrating the DCA framework with support vector machines that take our node vector representations as features. Overall, our DCA framework provides a novel representation of nodes in a network that can be used as a plug-in architecture to other machine learning algorithms to decipher topological properties of and obtain novel insights into interactomes.
    Comment: RECOMB 201
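As an illustration of the kind of diffusion model such a framework could plug in, the sketch below computes a random walk with restart from a single start node by power iteration. The choice of this particular diffusion model, the restart probability, the iteration count, and the toy network are assumptions; the abstract does not specify them, and the dimensionality reduction step that turns diffusion states into low-dimensional node vectors is not shown.

```python
def random_walk_with_restart(adjacency, start, restart=0.5, iterations=100):
    """Approximate the stationary distribution of a walk that, at each step,
    returns to `start` with probability `restart` and otherwise moves to a
    uniformly chosen neighbor.

    `adjacency` maps every node to the set of its neighbors.
    """
    nodes = sorted(adjacency)
    state = {node: 1.0 if node == start else 0.0 for node in nodes}
    for _ in range(iterations):
        next_state = {node: restart if node == start else 0.0 for node in nodes}
        for node in nodes:
            mass = (1.0 - restart) * state[node]
            if not adjacency[node]:
                # Isolated node: send its mass back to the start node.
                next_state[start] += mass
                continue
            share = mass / len(adjacency[node])
            for neighbor in adjacency[node]:
                next_state[neighbor] += share
        state = next_state
    return state


# Hypothetical toy path network A - B - C.
toy_network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
toy_state = random_walk_with_restart(toy_network, "A")
```

Running the diffusion once per start node gives each node a distribution over the whole network. A framework like the one described would then compress those distributions into low-dimensional vectors.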