441 research outputs found

    Sequence-based identification of interface residues by an integrative profile combining hydrophobic and evolutionary information

    Background. Protein-protein interactions play essential roles in determining protein function and in drug design. Numerous methods have been proposed to recognize their interaction sites; however, because of the high cost of experimental structure determination, only a small proportion of protein complexes have been resolved. It is therefore important to improve the prediction of protein interaction sites from primary sequence alone. Results. We propose a new way to construct an integrative profile for each residue in a protein by combining its hydrophobic and evolutionary information. A support vector machine (SVM) ensemble is then developed, in which the SVMs train on different pairs of positive (interface sites) and negative (non-interface sites) subsets. The subsets, which have roughly the same sizes, are grouped in order of the change in accessible surface area before and after complexation. A self-organizing map (SOM) is applied to group similar input vectors, making the identification of interface residues more accurate. An ensemble of ten SVMs improves MCC by around 8% and F1 by around 9% over an ensemble of three SVMs. As expected, SVM ensembles consistently perform better than individual SVMs. In addition, the model based on the integrative profiles outperforms models based on the sequence profile or the hydropathy scale alone. Because our method uses a small number of features to encode the input vectors, our model is simpler, faster and more accurate than the existing methods. Conclusions. The integrative profile combining hydrophobic and evolutionary information contributes most to protein-protein interaction prediction. The results show that the evolutionary context of a residue with respect to hydrophobicity improves the identification of protein interface residues. In addition, the ensemble of SVM classifiers improves prediction performance. Availability. Datasets and software are available at http://mail.ustc.edu.cn/~bigeagle/BMCBioinfo2010/index.htm.
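The ensemble scheme described in this abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: we assume both classes are sorted by accessible surface area (ASA) change and cut into k contiguous chunks, that the i-th positive chunk is paired with the i-th negative chunk, and that members vote by simple majority. The names `train_ensemble` and `threshold_stub` are hypothetical; `threshold_stub` is a toy stand-in for an SVM, and neither the SOM step nor the integrative profile itself is shown. MCC and F1 follow their standard definitions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# A residue sample: (feature vector, ASA change on complexation). This layout is an assumption.
Residue = Tuple[Vector, float]
Classifier = Callable[[Vector], int]
Trainer = Callable[[Sequence[Residue], Sequence[Residue]], Classifier]


def chunk(items: Sequence[Residue], k: int) -> List[List[Residue]]:
    """Split items into k contiguous chunks whose sizes differ by at most one."""
    if not 1 <= k <= len(items):
        raise ValueError("k must be between 1 and len(items)")
    n = len(items)
    return [list(items[i * n // k:(i + 1) * n // k]) for i in range(k)]


def train_ensemble(pos: Sequence[Residue], neg: Sequence[Residue], k: int, train: Trainer) -> Classifier:
    """Train k members, pairing ASA-ordered positive and negative chunks (assumed scheme)."""
    pos_chunks = chunk(sorted(pos, key=lambda r: r[1]), k)
    neg_chunks = chunk(sorted(neg, key=lambda r: r[1]), k)
    members = [train(p, n) for p, n in zip(pos_chunks, neg_chunks)]

    def predict(x: Vector) -> int:
        votes = sum(m(x) for m in members)
        return 1 if 2 * votes > len(members) else 0  # strict majority; ties -> non-interface
    return predict


def threshold_stub(pos: Sequence[Residue], neg: Sequence[Residue]) -> Classifier:
    """Hypothetical stand-in for an SVM: threshold feature 0 halfway between the class means."""
    mp = sum(r[0][0] for r in pos) / len(pos)
    mn = sum(r[0][0] for r in neg) / len(neg)
    cut = (mp + mn) / 2
    if mp >= mn:
        return lambda x: 1 if x[0] >= cut else 0
    return lambda x: 1 if x[0] < cut else 0


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels where 1 = interface residue."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal count is zero."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 score = 2TP / (2TP + FP + FN)."""
    tp, _, fp, fn = confusion(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```

Replacing `threshold_stub` with a real SVM trainer (for example, a wrapper around scikit-learn's `SVC`) would leave the partitioning and voting logic unchanged.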

    Experimental and Theoretical Studies of DNA-Macroion Interactions.

    The thesis uses experiments and simulations to examine the interactions of DNA molecules with proteins and protein-like nanoparticles, with applications to protein search for, and targeting of, DNA sequences, and to DNA complexation in chromatin and for gene delivery. Two topics are covered in depth. In the first topic, kinetic Monte Carlo simulations, one-dimensional reaction-diffusion equations, and analytical methods are used to determine the rate at which DNA-binding proteins (e.g., transcription factors) can find target sequences in long DNA molecules through a combination of sequence-dependent 1D diffusion and sequence-independent 3D diffusion. We quantify how thousands of "decoy sites", whose base-pair sequences resemble those of target sites, dramatically slow down the protein targeting process. We identify the conditions under which the targeting process can be sped up, including the effect of a "two-state" protein model that allows both rapid diffusion and accurate searching. In the second topic, we investigate how the surface charge density of a poly(amido amine) (PAMAM) dendrimer affects its ability to condense DNA, using light scattering, circular dichroism, and single-molecule imaging of dendrimer-DNA complexes combed onto surfaces and tethered to those surfaces under flow. This study is important not only for understanding how to condense dsDNA to facilitate its penetration of cell membranes in non-viral gene therapy, but also because PAMAM dendrimers are an ideal biomimic of DNA-binding proteins (e.g., histones). To describe DNA compaction by dendrimers, we develop a mesoscale model that combines the coarse-grained DNA model of de Pablo and coworkers, which resolves the DNA double-helix structure, with the coarse-grained dendrimer model of Muthukumar and coworkers. The predictions of the new model for the effects of dendrimer generation, dendrimer surface charge density, and salt concentration on the formation of dendrimer-DNA complexes are consistent with both experiments and potential-of-mean-force results from all-atom molecular dynamics simulations, while giving much more detail about the structure of the complex. Moreover, the model can be extended to other cationic macroion-DNA systems of great interest, such as polylysine, micelles, and colloidal particles.
    PhD, Chemical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/111572/1/shiyu_1.pd
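The target-search process in the first topic can be illustrated with a toy kinetic Monte Carlo sketch in Python. This is not the thesis's model. It assumes a lattice of `length` sites, 3D excursions of fixed duration `tau_3d` that end at a uniformly random site, and 1D sliding as an unbiased random walk of at most `slide_steps` steps, each taking `tau_1d` and clamped at the DNA ends. Decoy sites are assumed only to add a fixed residence time `decoy_dwell`. All names and parameter values are hypothetical.

```python
import random
from typing import FrozenSet


def search_time(length: int, target: int, slide_steps: int,
                tau_3d: float = 1.0, tau_1d: float = 0.01,
                decoys: FrozenSet[int] = frozenset(), decoy_dwell: float = 0.0,
                seed: int = 0) -> float:
    """Simulate one 3D/1D search and return the time at which the target site is first reached."""
    rng = random.Random(seed)
    t = 0.0
    while True:
        t += tau_3d                  # 3D excursion through solution
        pos = rng.randrange(length)  # rebind at a uniformly random site
        steps = 0
        while True:
            if pos == target:
                return t
            if pos in decoys:
                t += decoy_dwell     # extra residence time at a target-like site
            if steps == slide_steps:
                break                # unbind and start a new 3D excursion
            pos = min(max(pos + rng.choice((-1, 1)), 0), length - 1)  # clamped at the ends
            t += tau_1d
            steps += 1


def mean_search_time(n_trials: int, **kwargs) -> float:
    """Average search_time over n_trials runs seeded 0 .. n_trials - 1."""
    return sum(search_time(seed=s, **kwargs) for s in range(n_trials)) / n_trials
```

Because decoys consume no random numbers in this sketch, runs with and without decoys under the same seed follow the same trajectory, so the difference in their times isolates the delay the decoys add.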

    STRUCTURAL BIOINFORMATICS BASED METHOD FOR PREDICTING THE INITIAL ADSORBED PROTEIN ORIENTATION ON A SURFACE

    In any molecular simulation of protein-surface interaction, the initial orientation with which the protein interacts with the surface must be selected first, and this choice has been found to be critical in determining the bioactive state of the adsorbed protein. Various molecular simulation methods have been developed to identify the preferred orientation, but they are generally computationally expensive and time-consuming, especially for large molecules; this motivates the current study. The method for identifying a preferred orientation was implemented in MATLAB® and addresses this problem directly: it treats the protein as rigid and maps the number of solvent-accessible residues that would interact with the surface as a function of orientation, yielding a topography map that reveals the potential minimum-energy orientations for a given protein-surface interaction system. Orientation predictions have been made for a wide range of proteins (11-300 kDa) and surfaces (hydrophobic, hydrophilic, charged, and biological membranes), with total runtimes typically on the order of minutes. The results agree well with experimental and simulation results reported in the literature for both biological and man-made materials. Besides its intended use in supporting molecular simulations, the method can also be applied more generally to surface design, to control the bioactive state of adsorbed proteins and to selectively target and immobilize proteins in a controlled orientation.
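The orientation-mapping idea in this abstract can be sketched in Python rather than MATLAB. The following are assumptions, not details from the source: the surface is an infinite plane with unit normal n(θ, φ) pointing toward the protein; the rigid protein rests on the plane, touching it at its lowest residue along n; and a residue counts as interacting if it is solvent-accessible and lies within a hypothetical `cutoff` (in Å) of the plane. Rotating the protein about n leaves the count unchanged, so two angles suffice. Surface-specific weighting (hydrophobic, charged, and so on) is omitted.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def contact_count(coords: Sequence[Point], accessible: Sequence[bool],
                  theta: float, phi: float, cutoff: float = 5.0) -> int:
    """Count accessible residues within `cutoff` of a plane with normal n(theta, phi) touching the protein."""
    n = (math.sin(theta) * math.cos(phi), math.sin(theta) * math.sin(phi), math.cos(theta))
    heights = [x * n[0] + y * n[1] + z * n[2] for x, y, z in coords]
    floor = min(heights)  # the lowest residue of any kind touches the surface (rigid body)
    return sum(1 for h, acc in zip(heights, accessible) if acc and h - floor <= cutoff)


def orientation_map(coords: Sequence[Point], accessible: Sequence[bool],
                    n_theta: int = 18, n_phi: int = 36,
                    cutoff: float = 5.0) -> List[Tuple[float, float, int]]:
    """Scan a (theta, phi) grid and return (theta, phi, contact count) triples: the topography map."""
    grid = []
    for i in range(n_theta + 1):
        theta = math.pi * i / n_theta
        for j in range(n_phi):
            phi = 2 * math.pi * j / n_phi
            grid.append((theta, phi, contact_count(coords, accessible, theta, phi, cutoff)))
    return grid
```

Candidate orientations can then be read off the map, for example with `max(orientation_map(coords, accessible), key=lambda t: t[2])`; whether a high or low count is favorable depends on the surface chemistry, which this sketch does not model.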

    Protein Domain Linker Prediction: A Direction for Detecting Protein-Protein Interactions

    Protein chains are generally long and consist of multiple domains. Domains are the basic elements of protein structure that can exist, evolve and function independently. The accurate and reliable identification of protein domains and their interactions has important implications for several areas of protein research. Accurate prediction of protein domains is a fundamental step in both experimental and computational proteomics. Domain knowledge is an initial step toward protein tertiary structure prediction, which can give insight into how a protein works. Knowledge of domains is also useful for classifying proteins, understanding their structures, functions and evolution, and predicting protein-protein interactions (PPI). However, predicting structural domains within proteins is a challenging task in computational biology. A promising approach to domain prediction is to detect inter-domain linkers and then predict the regions of the protein sequence in which the structural domains are located. Protein-protein interactions occur at almost every level of cell function. Identifying the interactions among proteins and their associated domains provides a global picture of cellular functions and biological processes. It is also an essential step in constructing PPI networks for humans and other organisms. PPI prediction has been considered a promising alternative to traditional drug design techniques. Identifying possible viral-host protein interactions can lead to a better understanding of infection mechanisms and, in turn, to the development of drugs and the optimization of treatment. In this work, a compact and accurate approach for inter-domain linker prediction is developed based solely on protein primary structure information. Inter-domain linker knowledge is then used to predict structural domains and detect PPI.
    The research in this dissertation makes three main contributions. The first is predicting protein inter-domain linker regions by introducing the concept of an amino acid compositional index and refining the prediction with the simulated annealing optimization technique. The second is identifying structural domains based on inter-domain linker knowledge: the linker knowledge, represented by the compositional index, is enhanced by incorporating biological knowledge, represented by amino acid physicochemical properties, to develop a well-optimized Random Forest classifier for predicting novel domains and inter-domain linkers. The third uses domain knowledge to predict protein-protein interactions, by characterizing structural domains within protein sequences, analyzing their interactions, and predicting protein interactions from their interacting domains. The experimental studies and the higher accuracy achieved support the proposed framework.
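The compositional-index step of the first contribution can be sketched in Python. The abstract does not give the formula, so this sketch assumes that the index of amino acid a is the log ratio of its pseudocount-smoothed frequency in known linker segments to its frequency in domain segments, averaged over a sliding window. Residues whose windowed score exceeds a threshold are flagged as linker residues. The window size, threshold and pseudocount are hypothetical. The simulated annealing refinement described in the dissertation is not shown.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def compositional_index(linker_seqs: Sequence[str], domain_seqs: Sequence[str],
                        pseudocount: float = 1.0) -> Dict[str, float]:
    """Assumed index: log(freq in linkers / freq in domains), with add-pseudocount smoothing."""
    lc = Counter("".join(linker_seqs))
    dc = Counter("".join(domain_seqs))
    alphabet = set(lc) | set(dc)
    l_total = sum(lc.values()) + pseudocount * len(alphabet)
    d_total = sum(dc.values()) + pseudocount * len(alphabet)
    return {a: math.log(((lc[a] + pseudocount) / l_total) / ((dc[a] + pseudocount) / d_total))
            for a in alphabet}


def window_scores(seq: str, index: Dict[str, float], window: int = 5) -> List[float]:
    """Mean index over a centred window, truncated at the sequence ends; unseen residues score 0."""
    half = window // 2
    scores = []
    for i in range(len(seq)):
        segment = seq[max(0, i - half):i + half + 1]
        scores.append(sum(index.get(c, 0.0) for c in segment) / len(segment))
    return scores


def predict_linkers(seq: str, index: Dict[str, float],
                    window: int = 5, threshold: float = 0.0) -> List[bool]:
    """Flag residues whose windowed score exceeds the threshold as putative linker residues."""
    return [s > threshold for s in window_scores(seq, index, window)]
```

In this sketch, `window` and `threshold` are the natural parameters for an optimizer such as simulated annealing to tune against known linker annotations.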

    Prediction of protein long-range contacts using an ensemble of genetic algorithm classifiers with sequence profile centers

    Background. Prediction of long-range inter-residue contacts is an important topic in bioinformatics research. It helps in determining protein structures and understanding protein folding, and therefore advances the annotation of protein functions. Results. In this paper, we propose a novel ensemble of genetic algorithm classifiers (GaCs) to address the long-range contact prediction problem. Our method is based on a key idea called sequence profile centers (SPCs). Each SPC is the average of the sequence profiles of residue pairs belonging to the same class, either contact or non-contact. The GaCs train on multiple, different pairs of long-range contact data (positive data) and long-range non-contact data (negative data). The negative data sets, each roughly the same size as the positive ones, are constructed by random sampling from the original, imbalanced negative data. As a result, about 21.5% of long-range contacts are correctly predicted. We also found that the ensemble of GaCs improves accuracy by around 5.6% over a single GaC. Conclusions. Classifiers that use sequence profile centers may advance long-range contact prediction. With this approach, key structural features of proteins could be determined with high efficiency and accuracy. © 2010 Li and Chen; licensee BioMed Central Ltd.
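The sequence-profile-center idea and the balanced negative sampling described in this abstract can be sketched in Python. This is a simplified stand-in, not the GaC method: the genetic-algorithm training is omitted, each ensemble member assigns a residue pair to the nearer of its two centers, and members vote by majority. We assume that a pair (i, j) is represented by concatenating the profile rows of residues i and j. The variables `profile`, `contacts` and `non_contacts` and all function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Pair = Tuple[int, int]
Centers = Tuple[Vector, Vector]  # (contact SPC, non-contact SPC)


def pair_feature(profile: Sequence[Vector], i: int, j: int) -> Vector:
    """Represent residue pair (i, j) by concatenating their profile rows (assumed encoding)."""
    return list(profile[i]) + list(profile[j])


def center(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors: a sequence profile center."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_spc(profile: Sequence[Vector], contacts: Sequence[Pair],
              non_contacts: Sequence[Pair]) -> Centers:
    """Compute one SPC per class from the given residue pairs."""
    pos = center([pair_feature(profile, i, j) for i, j in contacts])
    neg = center([pair_feature(profile, i, j) for i, j in non_contacts])
    return pos, neg


def spc_ensemble(profile: Sequence[Vector], contacts: Sequence[Pair],
                 non_contacts: Sequence[Pair], n_members: int, seed: int = 0) -> List[Centers]:
    """Each member sees all contacts plus an equally sized random sample of non-contacts."""
    rng = random.Random(seed)
    return [train_spc(profile, contacts, rng.sample(list(non_contacts), len(contacts)))
            for _ in range(n_members)]


def ensemble_predict(members: Sequence[Centers], x: Vector) -> bool:
    """Majority vote; a member votes 'contact' when x is at least as close to its contact SPC."""
    votes = sum(1 for pos, neg in members if sq_dist(x, pos) <= sq_dist(x, neg))
    return 2 * votes > len(members)
```

Sampling a fresh negative subset for each member is what makes the members differ; with a single shared sample, every member would compute the same pair of centers.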

    A comprehensive review of computation-based metal-binding prediction approaches at the residue level

    Clear evidence has shown that metal ions strongly connect and delicately tune the dynamic homeostasis of living organisms. They have been shown to be associated with protein structure, stability, regulation, and function. Even small changes in the concentration of metal ions can shift their effects from naturally beneficial to harmful, and such shifts can lead to degenerative diseases, malignant tumors, and cancers. Accurate characterization and prediction of metalloproteins at the residue level promise informative clues for investigating the intrinsic mechanisms of protein-metal ion interactions. Compared with biophysical or biochemical wet-lab technologies, computational methods provide open web interfaces to high-resolution databases and high-throughput predictors for efficient investigation of metal-binding residues. This review surveys 18 public databases of metal-protein binding in detail. We collect a comprehensive set of 44 computation-based methods and classify them into four categories: learning-, docking-, template-, and meta-based methods. We analyze the benchmark datasets, assessment criteria, feature construction, and algorithms. We also compare several methods on two benchmark testing datasets and discuss currently available public predictive tools. Finally, we summarize the challenges and underlying limitations of current studies and propose several directions for the future development of related databases and methods.