    A structural classification of protein-protein interactions for detection of convergently evolved motifs and for prediction of protein binding sites on sequence level

    BACKGROUND: A long-standing challenge in the post-genomic era of bioinformatics is the prediction of protein-protein interactions and, ultimately, of protein functions. The problem is intrinsically harder when only amino acid sequences are available, but a solution is more universally applicable. So far, the problem of uncovering protein-protein interactions has been addressed in a variety of ways, both experimentally and computationally. MOTIVATION: The central question is how protein complexes with solved three-dimensional structures can be used to identify and classify protein binding sites, and how knowledge can be inferred from this classification so that interactions can be predicted for proteins without a solved structure. The underlying hypothesis is that protein binding sites are often restricted to a small number of residues, which are additionally often well conserved in order to maintain an interaction. Therefore, the signal-to-noise ratio in binding sites is expected to be higher than in other parts of the surface. This enables binding site detection in unknown proteins when homology-based annotation transfer fails. APPROACH: The problem is addressed by first investigating how geometrical aspects of domain-domain associations can lead to a rigorous structural classification of the multitude of protein interface types. The interface types are explored with respect to two questions: first, how do interface types with one-sided homology reveal convergently evolved motifs? Second, how can sequential descriptors for local structural features be derived from the interface type classification? Then, the use of sequential representations of binding sites to predict protein interactions is investigated. The underlying algorithms are based on machine learning techniques, in particular Hidden Markov Models. RESULTS: This work introduces a novel approach to a comprehensive geometrical classification of domain interfaces. Alternative structural domain associations are found for 40% of all family-family interactions. Evaluation of the classification algorithm on a hand-curated set of interfaces yielded a precision of 83% and a recall of 95%. For the first time, a systematic screen for convergently evolved motifs in 102,000 protein-protein interactions with structural information is carried out. Within this dataset, all cases of viral mimicry of human interface binding are identified. Finally, a library of 740 motif descriptors for binding site recognition, encoded as Hidden Markov Models, is generated and cross-validated. Tests for the significance of motifs are provided. The usefulness of descriptors for protein-ligand binding sites is demonstrated for the case of ATP binding, where a precision of 89% is achieved, outperforming comparable motifs from PROSITE. In particular, a novel descriptor for a P-loop variant has been used to identify ATP-binding sites in 60 protein sequences that had not previously been annotated by existing motif databases.
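
The motif descriptors above are encoded as Hidden Markov Models. As an illustration only (not the thesis's profile-HMM architecture or trained parameters), the sketch below scores a sequence under a small discrete HMM with the forward algorithm, working in log space to avoid numerical underflow. The function name and all state names and probabilities are hypothetical.

```python
import math


def _log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else -math.inf


def _logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_forward(seq, states, start_p, trans_p, emit_p):
    """Return log P(seq | HMM) computed with the forward algorithm.

    start_p[s]: initial probability of state s
    trans_p[r][s]: probability of moving from state r to state s
    emit_p[s][symbol]: probability that state s emits symbol (missing -> 0)
    """
    # alpha[s] = log P(seq[0..t], state at position t = s)
    alpha = {s: _log(start_p[s]) + _log(emit_p[s].get(seq[0], 0.0)) for s in states}
    for symbol in seq[1:]:
        alpha = {
            s: _logsumexp([alpha[r] + _log(trans_p[r][s]) for r in states])
            + _log(emit_p[s].get(symbol, 0.0))
            for s in states
        }
    return _logsumexp(list(alpha.values()))


# Hypothetical toy model: a "motif" state enriched in G and K, and a uniform background.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
STATES = ["M", "B"]
START = {"M": 0.5, "B": 0.5}
TRANS = {"M": {"M": 0.5, "B": 0.5}, "B": {"M": 0.5, "B": 0.5}}
EMIT = {"M": {"G": 0.5, "K": 0.5}, "B": {aa: 0.05 for aa in AMINO_ACIDS}}
```

In practice such a score is usually compared against a background-only model (a log-odds ratio), so that a high value indicates the sequence fits the motif better than chance.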

    PocketMatch: A new algorithm to compare binding sites in protein structures

    Background: Recognizing similarities and deriving relationships among protein molecules is a fundamental requirement in present-day biology. Similarities can be present at various levels, which can be detected by comparing protein sequences or their structural folds. In some cases, similarities that are obscure at these levels may be present only in the substructures of their binding sites. Inferring functional similarities between protein molecules by comparing their binding sites is still largely exploratory and not yet a routine protocol. One of the main reasons for this is the limited choice of analytical tools that can compare binding sites with high sensitivity. To benefit from the enormous amount of structural data that is rapidly accumulating, high-throughput tools that enable large-scale binding site comparison are essential.

Results: Here we present PocketMatch, a new algorithm for comparing binding sites in a frame-invariant manner. Each binding site is represented by 90 lists of sorted distances that capture the shape and chemical nature of the site. The sorted arrays are then aligned using an incremental alignment method and scored to obtain PMScores for pairs of sites. A comprehensive sensitivity analysis and an extensive validation of the algorithm have been carried out. Perturbation studies, in which the geometry of a given site was retained but the residue types were changed randomly, indicated that chance similarities were virtually non-existent. Our analysis also demonstrates that shape information alone is insufficient to discriminate between diverse binding sites unless combined with the chemical nature of the amino acids.

Conclusions: A new algorithm has been developed to compare binding sites in an accurate, efficient and high-throughput manner. Although the representation used is conceptually simple, we demonstrate that, together with the new alignment strategy, it is sufficient to enable binding site comparison with high sensitivity. A novel methodology is also presented for validating the algorithm's accuracy and sensitivity with respect to the geometry and chemical nature of the site. The method is also fast, taking about 1/250th of a second per comparison on a single processor. A parallel version for BlueGene has also been implemented.
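
The idea of comparing sites through sorted distance lists can be sketched as follows. This is a much-simplified illustration, not the published PocketMatch representation or PMScore: each site is a list of points tagged with a hypothetical chemical type, one sorted distance list is built per unordered pair of types, lists are matched greedily within an assumed 0.5 Å tolerance, and the score is the fraction of matched distances.

```python
from itertools import combinations
from math import dist

# Assumed matching tolerance in angstroms (illustrative, not taken from the paper).
TOLERANCE = 0.5


def distance_lists(site):
    """Group pairwise distances by unordered pair of type labels, each list sorted.

    site: list of (type_label, (x, y, z)); the labels are hypothetical classes.
    Distances do not depend on the coordinate frame, so the result is frame invariant.
    """
    lists = {}
    for (t1, p1), (t2, p2) in combinations(site, 2):
        key = tuple(sorted((t1, t2)))
        lists.setdefault(key, []).append(dist(p1, p2))
    for values in lists.values():
        values.sort()
    return lists


def count_matches(a, b, tol=TOLERANCE):
    """Greedily pair elements of two sorted lists that differ by at most tol."""
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches


def site_similarity(site_a, site_b, tol=TOLERANCE):
    """Fraction of matched distances, normalised by the larger site's distance count."""
    la, lb = distance_lists(site_a), distance_lists(site_b)
    matched = sum(
        count_matches(la.get(k, []), lb.get(k, []), tol) for k in la.keys() | lb.keys()
    )
    total = max(sum(map(len, la.values())), sum(map(len, lb.values())))
    return matched / total if total else 0.0
```

Because distances are only compared within the same type pair, two sites with identical geometry but different residue types score low, which mirrors the abstract's point that shape alone is insufficient without chemical information.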

    Kinome-wide interaction modelling using alignment-based and alignment-independent approaches for kinase description and linear and non-linear data analysis techniques

    Background: Protein kinases play crucial roles in cell growth, differentiation, and apoptosis. Abnormal function of protein kinases can lead to many serious diseases, such as cancer. Kinase inhibitors have potential for treating these diseases. However, current inhibitors interact with a broad variety of kinases and interfere with multiple vital cellular processes, which causes toxic effects. Bioinformatics approaches that can predict inhibitor-kinase interactions from the chemical properties of the inhibitors and the kinase macromolecules might aid in the design of more selective therapeutic agents with better efficacy and lower toxicity.

Results: We applied proteochemometric modelling to correlate the properties of 317 wild-type and mutated kinases and 38 inhibitors (12,046 inhibitor-kinase combinations) with the respective combination's interaction dissociation constant (Kd). We compared six approaches for describing protein kinases and several linear and non-linear correlation methods. The best-performing models encoded kinase sequences with amino acid physico-chemical z-scale descriptors and used support vector machines or partial least-squares projections to latent structures for the correlations. Modelling performance was estimated by double cross-validation. The best models showed high predictive ability: the squared correlation coefficient for new kinase-inhibitor pairs ranged from P² = 0.67 to 0.73, and for new kinases from P²_kin = 0.65 to 0.70. Models could also separate interacting from non-interacting inhibitor-kinase pairs with high sensitivity and specificity, with areas under the ROC curves of AUC = 0.92-0.93. We also investigated the relationship between the number of protein kinases in the dataset and the modelling results. Using only 10% of all data, a valid model was still obtained, with P² = 0.47, P²_kin = 0.42 and AUC = 0.83.

Conclusions: Our results strongly support the applicability of proteochemometrics to kinome-wide interaction modelling. Proteochemometrics might be used to speed up the identification and optimization of protein kinase-targeted and multi-targeted inhibitors.
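
Proteochemometric models describe each inhibitor-kinase pair by combining protein and ligand descriptors. The sketch below shows one common construction, assumed here rather than confirmed by the abstract: concatenate the kinase descriptors (for example, precomputed z-scale values), the inhibitor descriptors, and their pairwise cross-terms. It also computes P² under the usual definition of predictive squared correlation, 1 - PRESS/SS, on predictions for data held out of training. The function names are hypothetical.

```python
def proteochemometric_descriptors(kinase_desc, inhibitor_desc):
    """Concatenate kinase descriptors, inhibitor descriptors and their cross-terms.

    Cross-terms (all pairwise products) let a linear model such as PLS capture
    kinase-inhibitor complementarity, not only additive per-partner effects.
    """
    cross = [k * i for k in kinase_desc for i in inhibitor_desc]
    return list(kinase_desc) + list(inhibitor_desc) + cross


def predictive_r2(observed, predicted):
    """P^2 = 1 - PRESS / SS for predictions on held-out (external) data."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal in length")
    mean = sum(observed) / len(observed)
    press = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss = sum((y - mean) ** 2 for y in observed)
    if ss == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - press / ss
```

A model that predicts every held-out value as the mean of the observations gets P² = 0, so the reported values of 0.65-0.73 indicate substantial predictive power beyond that baseline.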

    QSAR model development for early stage screening of monoclonal antibody therapeutics to facilitate rapid developability

    PhD Thesis. Monoclonal antibodies (mAbs) and related therapeutics are highly desirable from a biopharmaceutical perspective because they are highly target-specific and well tolerated in humans. Nevertheless, several mAbs have been discontinued or withdrawn, either because they failed to demonstrate efficacy or because of adverse effects. With nearly 80% of drugs failing in clinical development, mainly owing to lack of efficacy or safety, there is an urgent need for a better understanding of biological activity, affinity, pharmacology, toxicity, immunogenicity and related properties, leading to early prediction of success or failure. In this study, a hybrid modelling framework was developed that enables early-stage screening of mAbs. The applicability of the experimental methods was first tested on chemical compounds to assess assay quality, after which the methods were used to assess potential off-target adverse effects of mAbs. Furthermore, hypersensitivity reactions were assessed using Skimune™, a non-artificial, human skin explant-based assay for the safety and efficacy assessment of novel compounds and drugs, developed by Alcyomics Ltd. The suitability of Skimune™ for assessing the immune-related adverse effects of aggregated mAbs was studied, with aggregation induced by a heat stress protocol. The aggregates were characterised by protein analysis techniques such as analytical ultracentrifugation, after which their immunogenicity was tested using the Skimune™ assay. Numerical features (descriptors) of mAbs were identified and generated using ProtDCal and EMBOSS Pepstat software, as well as various amino acid scales. Five independent and novel X-block datasets consisting of these descriptors were generated based on the physicochemical, electronic, thermodynamic and topological properties of amino acids: Domain, Window, Substructure, Single Amino Acid, and Running Sum.
This study describes the development of a hybrid QSAR-based model with a structured workflow, clear evaluation metrics and several optimisation steps, which could benefit broader and more generic PLS modelling. The results and observations from this study demonstrate that incremental improvements through the selection of datasets and variables help to further optimise these hybrid models. Furthermore, using hypersensitivity and cross-reactivity as responses and the physicochemical characteristics of mAbs as descriptors, the QSAR models generated for different applicability domains allow for rapid early-stage screening and developability assessment. These models were validated with an external test set comprising proprietary compounds from industrial partners, paving the way for enhanced developability that tackles manufacturing failures as well as attrition rates.
European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie actions grant agreement
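
The abstract names "Window" and "Running Sum" descriptor datasets without defining them. The sketch below assumes the natural readings, a sliding-window mean and a cumulative sum of an amino acid scale along the sequence, and uses the Kyte-Doolittle hydropathy scale as one example scale; the thesis's actual scales, window sizes and definitions may differ, and the function names are hypothetical.

```python
from itertools import accumulate

# Kyte-Doolittle hydropathy scale, used here only as one example amino acid scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def scale_values(sequence, scale=KYTE_DOOLITTLE):
    """Map each residue of a one-letter sequence to its scale value."""
    return [scale[aa] for aa in sequence.upper()]


def running_sum(sequence, scale=KYTE_DOOLITTLE):
    """Cumulative sum of scale values along the sequence (assumed 'Running Sum' form)."""
    return list(accumulate(scale_values(sequence, scale)))


def window_average(sequence, size, scale=KYTE_DOOLITTLE):
    """Mean scale value in each sliding window of `size` residues (assumed 'Window' form)."""
    if size < 1:
        raise ValueError("window size must be at least 1")
    values = scale_values(sequence, scale)
    return [sum(values[i:i + size]) / size for i in range(len(values) - size + 1)]
```

Each function turns a variable-length sequence into numbers that a PLS model can use; in practice the profiles would still need to be aligned or summarised to a fixed length per antibody before forming an X-block.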

    Pattern recognition methods for the prediction of chemical structures of fungal secondary metabolites

    Non-ribosomal peptide synthetases (NRPSs) are megasynthetases found predominantly in bacteria and fungi. They produce small peptides that serve numerous biological functions and play crucial ecological roles. Adenylation (A) domains of NRPSs catalyze the ATP-dependent activation of substrates bearing a carboxyl group. A-domain substrates include not only natural amino acids (D and L forms) but also non-proteinogenic amino acids. Because the substrate repertoire is large and specificity rules for fungi are not well established, predicting substrates for fungal A-domains is difficult. In bacteria, ten amino acid residues have been established as the NRPS code, which determines the specificity of A-domains. To study the relationship between fungal A-domains and their specificity, a cluster analysis of NRPS code residues was performed. NRPS code residues were encoded by physicochemical properties essential for binding small molecules, and the encoded residues were clustered. The cluster analysis showed similar NRPS codes in bacteria and fungi for substrates such as α-aminoadipic acid and tryptophan. Fungal NRPS codes for substrates such as tyrosine and proline did not cluster together with bacterial codes, which indicates an independent evolution of substrate specificity in fungi. This emphasizes the need for a fungus-specific prediction tool. Currently available A-domain substrate specificity prediction tools accurately identify substrates for bacteria but fail to provide correct predictions for fungi. A novel approach for fungal A-domain substrate specificity prediction is presented here. A neural network-based A-domain substrate specificity classifier (NNassc) was developed using Keras with the TensorFlow backend. NNassc was trained solely on fungal NRPS codes and combines physicochemical and structural features for specificity prediction. Internal and external validation datasets of experimentally verified NRPS codes were used to assess the performance of NNassc.
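
The cluster analysis described above can be sketched as follows. The abstract does not name the clustering method or the full property set, so this is an assumption-laden illustration: each 10-residue NRPS code is encoded with a single property (Kyte-Doolittle hydropathy), and codes are grouped by single-linkage clustering with a hypothetical distance threshold. The codes used in any example are hypothetical, not real A-domain codes.

```python
from math import dist

# Kyte-Doolittle hydropathy, standing in for the richer property set used in the work.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_code(code):
    """Encode a 10-residue NRPS code as a vector of per-residue property values."""
    if len(code) != 10:
        raise ValueError("NRPS codes are expected to have 10 residues")
    return [KYTE_DOOLITTLE[aa] for aa in code.upper()]


def single_linkage_clusters(codes, threshold):
    """Group codes whose encoded vectors are chained by Euclidean distances <= threshold."""
    vectors = [encode_code(c) for c in codes]
    parent = list(range(len(codes)))  # union-find forest over code indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(codes)):
        for j in range(i + 1, len(codes)):
            if dist(vectors[i], vectors[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i, code in enumerate(codes):
        groups.setdefault(find(i), []).append(code)
    return list(groups.values())
```

Single linkage joins two codes whenever any chain of close neighbours connects them; average or complete linkage would give tighter clusters, and the choice would matter for whether bacterial and fungal codes for the same substrate end up together.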

    Comparison of Protein Active Site Structures for Functional Annotation of Proteins and Drug Design

    Rapid and accurate functional assignment of novel proteins is increasingly important, given the completion of numerous genome sequencing projects and the rapidly expanding list of unannotated proteins. Traditionally, global primary-sequence and structure comparisons have been used to determine putative function. These approaches, however, do not emphasize similarities in active site configurations, which are fundamental to a protein’s activity and highly conserved relative to the more variable global structural features. The Comparison of Protein Active Site Structures (CPASS) database and software enable the comparison of experimentally identified ligand-binding sites to infer biological function and aid in drug discovery. The CPASS database comprises the ligand-defined active sites identified in the Protein Data Bank, and the CPASS program compares these ligand-defined active sites to determine sequence and structural similarity without maintaining sequence connectivity. CPASS can compare any set of ligand-defined protein active sites, irrespective of the identity of the bound ligand.
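
Comparing active sites "without maintaining sequence connectivity" means residues may be paired regardless of their order in the chain. The sketch below illustrates only that idea and is not the CPASS scoring scheme: it assumes the two sites are already superimposed, ignores residue numbers, and searches all residue assignments by brute force (practical only for small sites) for the largest number of same-type pairs within a hypothetical 1.5 Å cutoff.

```python
from itertools import permutations
from math import dist


def order_independent_score(site_a, site_b, cutoff=1.5):
    """Best fraction of residues in the smaller site matched, ignoring sequence order.

    Each site: list of (residue_number, residue_type, (x, y, z)), already superimposed.
    A residue pair counts as matched if the types agree and the positions lie within
    `cutoff` angstroms (an illustrative threshold). Residue numbers are never compared.
    """
    small, large = sorted((site_a, site_b), key=len)
    if not small:
        return 0.0
    best = 0
    # Try every way of assigning distinct residues of the larger site to the smaller one.
    for chosen in permutations(large, len(small)):
        score = sum(
            1
            for (_, type_a, xyz_a), (_, type_b, xyz_b) in zip(small, chosen)
            if type_a == type_b and dist(xyz_a, xyz_b) <= cutoff
        )
        best = max(best, score)
    return best / len(small)
```

Because residue numbers play no role, a catalytic triad formed by residues far apart in one sequence can still match the same spatial arrangement in an unrelated protein, which is the situation that sequence-order-dependent alignments miss.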