52 research outputs found

    Identification of ATP binding residues of a protein from its primary sequence

    Background: One of the major challenges in the post-genomic era is to provide functional annotations for the large number of proteins arising from genome sequencing projects. The function of many proteins depends on their interaction with small molecules or ligands. ATP is one such important ligand, playing a critical role as a coenzyme in the function of many proteins. A method is needed for identifying ATP-interacting residues in ATP-binding proteins (ABPs) in order to understand the mechanism of protein-ligand interaction. Results: We compared the amino acid composition of ATP-interacting and non-interacting regions of proteins and observed that certain residues are preferred for interaction with ATP. This study describes a few models developed for identifying ATP-interacting residues in a protein. All models were trained and tested on 168 non-redundant ABP chains. First, we developed a Support Vector Machine (SVM) based model using the primary sequence of proteins and obtained a maximum MCC of 0.33 with an accuracy of 66.25%. Second, another SVM-based model was developed using the position-specific scoring matrix (PSSM) generated by PSI-BLAST. This model performed significantly better (MCC 0.5) than the previous one, in which only the primary sequence of the proteins was used. Conclusion: This study demonstrates that it is possible to predict ATP-interacting residues in a protein with moderate accuracy using its sequence. Evolutionary information is important for the identification of ATP-interacting residues, as it provides more information than the primary sequence alone. This method will be useful for researchers studying ATP-binding proteins. Based on this study, a web server has been developed for predicting ATP-interacting residues in a protein (http://www.imtech.res.in/raghava/atpint/).
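The abstract above reports model quality as MCC together with accuracy. As a minimal sketch of how these two figures relate to confusion-matrix counts, the following uses the standard definitions; the function names and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a row or column of the matrix is empty.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all residues that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts for a residue-level binder / non-binder classifier.
counts = dict(tp=70, tn=62, fp=38, fn=30)
print(f"accuracy = {accuracy(**counts):.2%}, MCC = {mcc(**counts):.2f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when interacting residues are much rarer than non-interacting ones.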

    Prediction of GTP interacting residues, dipeptides and tripeptides in a protein from its evolutionary information

    Background: Guanosine triphosphate (GTP)-binding proteins play an important role in the regulation of G-proteins. Prediction of GTP-interacting residues in a protein is therefore one of the major challenges in computational biology. In this study, an attempt has been made to develop a computational method for predicting GTP-interacting residues in a protein with high accuracy (Acc), precision (Prec) and recall (Rc). Results: All the models developed in this study were trained and tested on a non-redundant (40% similarity) dataset using five-fold cross-validation. First, we developed neural-network-based models using single sequence and PSSM profile and achieved a maximum Matthews Correlation Coefficient (MCC) of 0.24 (Acc 61.30%) and 0.39 (Acc 68.88%), respectively. Second, we developed support vector machine (SVM) based models using single sequence and PSSM profile and achieved a maximum MCC of 0.37 (Prec 0.73, Rc 0.57, Acc 67.98%) and 0.55 (Prec 0.80, Rc 0.73, Acc 77.17%), respectively. In this work, we introduce for the first time the concept of predicting GTP-interacting dipeptides (two consecutive GTP-interacting residues) and tripeptides (three consecutive GTP-interacting residues). We developed an SVM-based model for predicting GTP-interacting dipeptides using the PSSM profile and achieved an MCC of 0.64 with precision 0.87, recall 0.74 and accuracy 81.37%. Similarly, an SVM-based model was developed for predicting GTP-interacting tripeptides using the PSSM profile and achieved an MCC of 0.70 with precision 0.93, recall 0.73 and accuracy 83.98%. Conclusion: These results show that the PSSM-based method performs better than the single-sequence-based method. The prediction models based on dipeptides or tripeptides are more accurate than the traditional model based on single residues. A web server, "GTPBinder" (http://www.imtech.res.in/raghava/gtpbinder/), based on the above models has been developed for predicting GTP-interacting residues in a protein.
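The abstract defines a GTP-interacting dipeptide as two consecutive interacting residues and a tripeptide as three. A minimal sketch of that definition is given below, assuming per-residue labels are available as a string of '1' (interacting) and '0' (non-interacting). The function name and label encoding are hypothetical, and the abstract does not say how the paper itself builds its training examples.

```python
def consecutive_runs(labels: str, k: int) -> list[int]:
    """Return 0-based start positions where k consecutive residues are all labelled '1'."""
    target = "1" * k
    return [i for i in range(len(labels) - k + 1) if labels[i:i + k] == target]


# Hypothetical per-residue annotation for a 10-residue stretch.
labels = "0110111001"
print(consecutive_runs(labels, 2))  # dipeptide starts  → [1, 4, 5]
print(consecutive_runs(labels, 3))  # tripeptide starts → [4]
```

Overlapping runs are reported separately, so a stretch of three interacting residues yields two dipeptides and one tripeptide.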

    ProGlycProt: a repository of experimentally characterized prokaryotic glycoproteins

    ProGlycProt (http://www.proglycprot.org/) is an open access, manually curated, comprehensive repository of bacterial and archaeal glycoproteins with at least one experimentally validated glycosite (glycosylated residue). To facilitate maximum information at one point, the database is arranged under two sections: (i) ProCGP, the main data section consisting of 95 entries with experimentally characterized glycosites, and (ii) ProUGP, a supplementary data section containing 245 entries with experimentally identified glycosylation but uncharacterized glycosites. Every entry in the database is fully cross-referenced and enriched with available published information about source organism, coding gene, protein, glycosites, glycosylation type, attached glycan, associated oligosaccharyl/glycosyl transferases (OSTs/GTs), supporting references, and applicable additional information. Interestingly, ProGlycProt contains as many as 174 entries for which information is unavailable or the characterized glycosites are unannotated in Swiss-Prot release 2011_07. The website supports a dedicated structure gallery of homology models and crystal structures of characterized glycoproteins in addition to two new tools developed in view of emerging information about prokaryotic sequons (conserved sequences of amino acids around glycosites) that are never or rarely seen in eukaryotic glycoproteins. ProGlycProt provides an extensive compilation of experimentally identified glycosites (334) and glycoproteins (340) of prokaryotes that could serve as an information resource for research and technology applications in glycobiology.

    Classification of Clinical Isolates of Klebsiella pneumoniae Based on Their in vitro Biofilm Forming Capabilities and Elucidation of the Biofilm Matrix Chemistry With Special Reference to the Protein Content

    Klebsiella pneumoniae is a human pathogen capable of forming biofilms on abiotic and biotic surfaces. The limited therapeutic options against Klebsiella pneumoniae are due to its innate capability to form biofilms and to harbor determinants of multidrug resistance. We used a newer approach to classify biofilm-producing Klebsiella pneumoniae isolates and subsequently evaluated the chemistry of its slime, more accurately its biofilm. We extracted and quantified polysaccharides and proteins from representative bacterial biofilms. The spatial distribution of sugars and proteins in the biofilm matrix was then investigated using confocal laser scanning microscopy (CLSM). Thereafter, the extracted matrix components were analyzed by Fourier transform infrared (FTIR) spectroscopy, nuclear magnetic resonance (NMR) spectroscopy, one-dimensional gel-based electrophoresis (SDS-PAGE), high performance liquid chromatography (HPLC), and MALDI MS/MS. In addition, total proteins, total sugars, uronates, and total acetyl content were quantified. The results suggest that sugars are not the only or the major constituent of its biofilms. The proteins were harvested and subjected to SDS-PAGE, which revealed various common and unique protein bands. The common band was excised and analyzed by HPLC. MALDI MS/MS results for this common protein band indicated the presence of different proteins within the biofilm. Fifty-five different proteins were identified, including both cytosolic and membrane proteins. About 22 proteins were related to protein synthesis and processing, while 15 were related to virulence. Similarly, 8 proteins were related to energy and metabolism and 4 to capsule and cell wall synthesis. These results will improve our understanding of Klebsiella biofilm composition and will help in designing better strategies for controlling its biofilm, such as techniques focused on weakening or targeting certain portions of the slime, which is the most common building block of the biofilm matrix.

    Analysis and prediction of cancerlectins using evolutionary and domain information

    Background: Predicting the function of a protein is one of the major challenges in the post-genomic era, in which a large number of protein sequences of unknown function are accumulating rapidly. Lectins are proteins that specifically recognize and bind to carbohydrate moieties present on either proteins or lipids. Cancerlectins are those lectins that play various important roles in tumor cell differentiation and metastasis. Although the two types of proteins are linked, there is still no computational method available that can distinguish cancerlectins from the large pool of non-cancerlectins. Hence, it is imperative to develop a method that can distinguish between cancerlectins and non-cancerlectins. Results: All the models developed in this study are based on a non-redundant dataset containing 178 cancerlectins and 226 non-cancerlectins in which no two sequences have more than 50% sequence similarity. We applied a similarity-search-based technique, i.e. BLAST, and achieved a maximum accuracy of 43.25%. Amino acid composition analysis showed that certain residues (e.g. leucine, proline) were preferred in cancerlectins whereas others (e.g. aspartic acid, asparagine) were preferred in non-cancerlectins. The PROSITE domain "Crystalline beta gamma" was found to be abundant in cancerlectins, whereas domains like "SUEL-type lectin domain" were found mainly in non-cancerlectins. An SVM-based model was developed to differentiate between cancerlectins and non-cancerlectins; using amino acid compositions, it achieved a maximum Matthews correlation coefficient (MCC) of 0.32 with an accuracy of 64.84%. A model based on dipeptide compositions achieved an MCC of 0.30 with an accuracy of 64.84%. Thereafter, we developed models based on split compositions (2 and 4 parts) and achieved MCC values of 0.31 and 0.32 with accuracies of 65.10% and 66.09%, respectively. An SVM model based on the Position Specific Scoring Matrix (PSSM), generated by PSI-BLAST, achieved an MCC of 0.36 with an accuracy of 68.34%. Finally, we integrated the PROSITE domain information with PSSM and developed an SVM model that achieved an MCC of 0.38 with 69.09% accuracy. Conclusion: BLAST was found to be inefficient at distinguishing between cancerlectins and non-cancerlectins. We analyzed the protein sequences of cancerlectins and non-cancerlectins and identified interesting patterns. We were able to identify PROSITE domains that are preferred in cancerlectins and non-cancerlectins, providing insights into the two types of proteins. The method developed in this study will be useful for researchers studying cancerlectins, lectins and cancer biology. The web server based on this study is available at http://www.imtech.res.in/raghava/cancer_pred/
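Two of the feature sets named in this abstract, amino acid composition and dipeptide composition, can be sketched as fixed-length vectors of 20 and 400 values. The code below is a minimal illustration using the usual textbook definitions. Expressing values as percentages and skipping non-standard residues are assumptions, and the function names are hypothetical rather than taken from the paper.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def aa_composition(seq: str) -> list[float]:
    """Percentage of each standard amino acid in the sequence (20 values)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    return [100.0 * counts[a] / total if total else 0.0 for a in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list[float]:
    """Percentage of each ordered residue pair among adjacent pairs (400 values)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts[d] for d in DIPEPTIDES)
    return [100.0 * counts[d] / total if total else 0.0 for d in DIPEPTIDES]
```

Either vector could then be passed to any standard classifier. A split composition, as mentioned in the abstract, would presumably apply `aa_composition` to each part of the sequence and concatenate the results, though the abstract does not spell this out.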

    RASSF1A uncouples Wnt from Hippo signalling and promotes YAP mediated differentiation via p73

    Transition from pluripotency to differentiation is a pivotal yet poorly understood developmental step. Here, we show that the tumour suppressor RASSF1A is a key player driving the early specification of cell fate. RASSF1A acts as a natural barrier to stem cell self-renewal and iPS cell generation, by switching YAP from an integral component in the β-catenin-TCF pluripotency network to a key factor that promotes differentiation. We demonstrate that epigenetic regulation of the Rassf1A promoter maintains stemness by allowing a quaternary association of YAP–TEAD and β-catenin–TCF3 complexes on the Oct4 distal enhancer. However, during differentiation, promoter demethylation allows GATA1-mediated RASSF1A expression which prevents YAP from contributing to the TEAD/β-catenin–TCF3 complex. Simultaneously, we find that RASSF1A promotes a YAP–p73 transcriptional programme that enables differentiation. Together, our findings demonstrate that RASSF1A mediates transcription factor selection of YAP in stem cells, thereby acting as a functional “switch” between pluripotency and initiation of differentiation.

    In silico Platform for Prediction of N-, O- and C-Glycosites in Eukaryotic Protein Sequences

    Glycosylation is one of the most abundant and important post-translational modifications of proteins. Glycosylated proteins (glycoproteins) are involved in various cellular biological functions such as protein folding, cell-cell interactions, cell recognition and host-pathogen interactions. A large number of eukaryotic glycoproteins also have therapeutic and potential technology applications. Therefore, characterization and analysis of glycosites (glycosylated residues) in these proteins is of great interest to biologists. To cater to these needs, a number of in silico tools have been developed over the years; however, a need for even better prediction tools remains. Therefore, in this study we have developed a new web server, GlycoEP, for more accurate prediction of N-linked, O-linked and C-linked glycosites in eukaryotic glycoproteins using two larger datasets, namely standard and advanced datasets. In the standard datasets, no two glycosylated proteins are more than 40% similar; the advanced datasets are highly non-redundant, with no two glycosite patterns (as defined in the methods) more than 60% similar. Further, based on our results with several algorithms developed using different machine-learning techniques, we found the Support Vector Machine (SVM) to be the optimum tool for developing glycosite prediction models. Accordingly, using our more stringent and non-redundant advanced datasets, the SVM-based models developed in this study achieved prediction accuracies of 84.26%, 86.87% and 91.43% with corresponding MCCs of 0.54, 0.20 and 0.78 for N-, O- and C-linked glycosites, respectively. The best performing models trained on the advanced datasets were then implemented as a user-friendly web server, GlycoEP (http://www.imtech.res.in/raghava/glycoep/). Additionally, this server provides prediction models developed on the standard datasets and allows users to scan sequons in input protein sequences.
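The abstract mentions scanning input sequences for sequons. A minimal sketch of such a scan is shown below, assuming the canonical eukaryotic N-glycosylation sequon N-X-S/T, where X is any residue except proline. The patterns GlycoEP actually uses are not given in the abstract, and the function name here is hypothetical.

```python
import re

# Canonical N-linked sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead lets overlapping candidates be reported.
N_SEQUON = re.compile(r"(?=N[^P][ST])")


def find_n_sequons(seq: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in N_SEQUON.finditer(seq.upper())]


# Toy sequence: N2 (NGS) and N9 (NAT) match; N6 is followed by Pro and does not.
print(find_n_sequons("MNGSANPTNATQ"))  # → [2, 9]
```

A sequon match marks only a candidate site. Whether the residue is actually glycosylated is the question the prediction models described above are meant to answer.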
    • ā€¦
    corecore