44 research outputs found
Prediction and classification of aminoacyl tRNA synthetases using PROSITE domains
<p>Abstract</p> <p>Background</p> <p>Aminoacyl tRNA synthetases (aaRSs) catalyse the first step of protein synthesis in all organisms. They are responsible for the precise attachment of amino acids to their cognate transfer RNAs. There are twenty different types of aaRSs, one for each amino acid. These aaRSs have been divided into two classes, each comprising ten enzymes. It is important to predict and classify aaRSs in order to understand protein synthesis.</p> <p>Results</p> <p>In this study, all models were developed on a non-redundant dataset containing 117 aaRSs and an equal number of non-aaRSs, in which no two sequences have more than 30% similarity. First, we applied the similarity search technique BLAST and achieved a maximum accuracy of 67.52%. We observed that 62% of tRNA synthetases contain one or more of the following four PROSITE domains: PS50862, PS00178, PS50860 and PS50861. An SVM-based model was developed to discriminate between aaRSs and non-aaRSs, and achieved a maximum MCC of 0.68 with an accuracy of 83.73% using selected dipeptide composition. We then developed a hybrid approach, in which the SVM model used selected dipeptide composition together with information on the four PROSITE domains, and achieved a maximum MCC of 0.72 with an accuracy of 85.49%. We further developed an SVM-based model for classifying the aaRSs into class-1 and class-2 using selected dipeptide composition and achieved an MCC of 0.79. We also observed that two domains (PS00178, PS50889) were preferred in class-1 and three domains (PS50862, PS50860, PS50861) in class-2. A hybrid method was developed using these domains as descriptors, along with selected dipeptide composition, and achieved an MCC of 0.87 with a sensitivity of 94.55% and an accuracy of 93.19%.
All models were evaluated using a five-fold cross-validation technique.</p> <p>Conclusions</p> <p>We have analyzed protein sequences of aaRSs (class-1 and class-2) and non-aaRSs and identified interesting patterns. The high accuracy achieved by our SVM models using selected dipeptide composition demonstrates that certain types of dipeptide are preferred in aaRSs. We were able to identify PROSITE domains that are preferred in aaRSs and their classes, providing interesting insights into tRNA synthetases. The method developed in this study will be useful for researchers studying aaRS enzymes and tRNA biology. A web server based on this study is available at <url>http://www.imtech.res.in/raghava/icaars/</url>.</p>
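The dipeptide composition used as an SVM input in the abstract above can be illustrated with a minimal sketch. The function name `dipeptide_composition`, the percentage scaling, and the choice to skip pairs containing non-standard residues are assumptions made here for illustration; the abstract does not describe the authors' exact feature code or how dipeptides were selected.

```python
from itertools import product

# The 20 standard amino acids, in alphabetical order of their one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def dipeptide_composition(sequence):
    """Return 400 dipeptide percentages, ordered AA, AC, ..., YY.

    Assumption: overlapping pairs are counted, and pairs containing a
    non-standard residue (e.g. X) are skipped rather than counted.
    """
    seq = sequence.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(counts)
    return [100.0 * counts[key] / total for key in sorted(counts)]
```

Each protein then becomes a fixed-length vector of 400 values regardless of its length, which is what makes the representation usable as input to a classifier such as an SVM.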
Correlation and prediction of gene expression level from amino acid and dipeptide composition of its protein
BACKGROUND: A large number of papers have been published on the analysis of microarray data, with particular emphasis on normalization of data, detection of differentially expressed genes, clustering of genes and regulatory networks. On the other hand, there are only a few studies that use expression data to examine the relation between expression level and the composition of the nucleotide/protein sequence. There is a need to understand why particular genes/proteins are expressed more in particular conditions. In this study, we analyze 3468 genes of Saccharomyces cerevisiae obtained from Holstege et al. (1998) to understand the relationship between expression level and amino acid composition. RESULTS: We compute the correlation between the expression of a gene and the amino acid composition of its protein. It was observed that some residues (like Ala, Gly, Arg and Val) have a significant positive correlation (r > 0.20) and some other residues (like Asp, Leu, Asn and Ser) have a negative correlation (r < -0.15) with the expression of genes. A significant negative correlation (r = -0.18) was also found between length and gene expression. These observations indicate a relationship between percent composition and gene expression level. Thus, attempts have been made to develop a Support Vector Machine (SVM) based method for predicting the expression level of genes from their protein sequences. In this method the SVM is trained with proteins whose gene expression data is known in a given condition. The trained SVM is then used to predict the gene expression of other proteins of the same organism in the same condition. A correlation coefficient of r = 0.70 was obtained between predicted and experimentally determined expression of genes using residue composition, which improves to r = 0.72 when dipeptide composition is used instead. The method was evaluated using a 5-fold cross-validation test.
We also demonstrate that amino acid composition information along with gene expression data can be used for improving the function classification of proteins. CONCLUSION: There is a correlation between gene expression and amino acid composition that can be used to predict the expression level of genes up to a certain extent. A web server based on the above strategy has been developed for calculating the correlation between amino acid composition and gene expression and for predicting expression level. This server will allow users to study the evolution from expression data.
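The correlation analysis described in the abstract above (residue percentage versus expression level across genes) can be sketched as follows. This is only an illustration of the Pearson correlation coefficient; the helper names `residue_percent` and `pearson_r` are hypothetical, and no real expression data is reproduced here.

```python
import math


def residue_percent(sequence, residue):
    """Percentage of a protein sequence made up of one residue type."""
    return 100.0 * sequence.count(residue) / len(sequence)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

In use, one list would hold, say, the alanine percentage of every protein and the other the measured expression level of the corresponding gene; the sign and size of `pearson_r` then indicate whether that residue tends to be enriched in highly expressed genes.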
Identification of conformational B-cell Epitopes in an antigen from its primary sequence
Background: One of the major challenges in the field of vaccine design is to predict conformational B-cell epitopes in an antigen. In the past, several methods have been developed for predicting conformational B-cell epitopes in an antigen from its tertiary structure. This is the first attempt in this area to predict conformational B-cell epitopes in an antigen from its amino acid sequence. Results: All support vector machine (SVM) models were trained and tested on 187 non-redundant protein chains consisting of 2261 antibody-interacting residues of B-cell epitopes. Models developed using binary profile of patterns (BPP) and physicochemical profile of patterns (PPP) achieved a maximum MCC of 0.22 and 0.17, respectively. In this study, for the first time, an SVM model has been developed using composition profile of patterns (CPP); it achieved a maximum MCC of 0.73 with an accuracy of 86.59%. We compared our CPP-based model with existing structure-based methods and observed that our sequence-based model is as good as the structure-based methods. Conclusion: This study demonstrates that prediction of conformational B-cell epitopes in an antigen is possible from its primary sequence. This study will be very useful in predicting conformational B-cell epitopes in antigens whose tertiary structures are not available. A web server, CBTOPE, has been developed for predicting B-cell epitopes: http://www.imtech.res.in/raghava/cbtope/
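Several abstracts in this listing report both accuracy and the Matthews correlation coefficient (MCC). As a reminder of how the two relate, here is a small sketch computing both from a binary confusion matrix. The function names are illustrative, and returning 0.0 when the MCC denominator is zero is a common convention adopted here as an assumption.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Assumption: returns 0.0 when any marginal total is zero, where the
    coefficient is otherwise undefined.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

Unlike accuracy, MCC stays near zero for a classifier that always predicts the majority class, which is why it is the more informative figure on imbalanced residue-level datasets.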
MHCBN 4.0: A database of MHC/TAP binding peptides and T-cell epitopes
<p>Abstract</p> <p>Background</p> <p>Many databases housing information about MHC binders and non-binders have been developed in the past to help the scientific community working in the fields of immunology, immunoinformatics and vaccine design. As the information about these MHC binding and non-binding peptides continues to grow with time, there is a need to keep the databases updated. To provide the immunological community with the most recent information, we need to maintain and update our database regularly. In this paper, we describe the updated version 4.0 of the database MHCBN.</p> <p>Findings</p> <p>MHCBN is a comprehensive database comprising over 25,857 peptide sequences (1053 TAP binding peptides) whose binding affinity with either MHC or TAP molecules has been assayed experimentally. It is a manually curated database in which entries are collected and compiled from published literature and existing public immunological databases. MHCBN has a number of web-based tools for the analysis and retrieval of information, such as mapping of antigenic regions, creation of allele-specific datasets, BLAST search, and listing of diseases associated with MHC alleles. Further, all entries are hyperlinked to major databases such as SWISS-PROT and PDB to provide information beyond the scope of MHCBN. The latest version 4.0 of MHCBN has 6080 more entries than the previously published version 1.1.</p> <p>Conclusion</p> <p>The update of the MHCBN database is meant to help immunologists understand the immune system and to provide them with the latest information. We feel that our database will complement the existing databases in serving the scientific community.</p>
Virtual Screening of potential drug-like inhibitors against Lysine/DAP pathway of Mycobacterium tuberculosis
Background: The explosive global spread of multidrug-resistant Mycobacterium tuberculosis (Mtb) is a catastrophe that creates an urgent need to design or develop novel, potent antitubercular agents. The Lysine/DAP biosynthetic pathway is a promising target due to its specific role in cell wall and amino acid biosynthesis. Here, we report the identification of potential antitubercular candidates targeting the Mtb dihydrodipicolinate synthase (DHDPS) enzyme of the pathway using virtual screening protocols. Results: In the present study, we generated three sets of drug-like molecules in order to screen potential inhibitors against the Mtb drug target DHDPS. The first set of compounds was a combinatorial library comprising analogues of pyruvate (a substrate of DHDPS). The second set consisted of pyruvate-like molecules, i.e. molecules structurally similar to pyruvate, obtained using a 3D flexible similarity search against the NCI and PubChem databases. The third set comprised 3847 anti-infective molecules obtained from PubChem. These compounds were filtered for drug-likeness using Lipinski's rule of five. Finally, the three sets of drug-like compounds, i.e. 4088 pyruvate analogues, 2640 pyruvate-like molecules and 1750 anti-infective molecules, were docked at the active site of Mtb DHDPS (PDB code 1XXX was used in the molecular docking calculations) to select inhibitors establishing favorable interactions. Conclusion: The above-mentioned virtual screening procedures helped in the identification of several potent candidates that may possess inhibitory activity against Mtb DHDPS. These novel scaffolds/candidates, which could have the potential to inhibit the Mtb DHDPS enzyme, represent promising starting points as lead compounds and should help shorten the experimental design of antituberculars.
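The drug-likeness filtering step mentioned in the abstract above can be sketched with the commonly cited Lipinski thresholds (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). This sketch assumes the descriptors have already been computed by some cheminformatics tool; the function names and the `max_violations` tolerance are assumptions, since the abstract does not say how strictly the rule was applied.

```python
def lipinski_violations(mol_weight, logp, h_donors, h_acceptors):
    """Count how many of Lipinski's four criteria a molecule violates."""
    checks = [
        mol_weight > 500,   # molecular weight above 500 Da
        logp > 5,           # octanol-water partition coefficient above 5
        h_donors > 5,       # more than 5 hydrogen-bond donors
        h_acceptors > 10,   # more than 10 hydrogen-bond acceptors
    ]
    return sum(checks)


def is_drug_like(mol_weight, logp, h_donors, h_acceptors, max_violations=1):
    """Assumption: a molecule passes if it breaks at most `max_violations` rules."""
    violations = lipinski_violations(mol_weight, logp, h_donors, h_acceptors)
    return violations <= max_violations
```

Setting `max_violations=0` gives the strictest reading of the rule; the default of one tolerated violation follows a widely used convention rather than anything stated in the paper.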
Bcipep: A database of B-cell epitopes
BACKGROUND: Bcipep is a database of experimentally determined linear B-cell epitopes of varying immunogenicity collected from literature and other publicly available databases. RESULTS: The current version of the Bcipep database contains 3031 entries that include 763 immunodominant, 1797 immunogenic and 471 null-immunogenic epitopes. It covers a wide range of pathogenic organisms such as viruses, bacteria, protozoa, and fungi. The database provides a set of tools for the analysis and extraction of data, including keyword search, peptide mapping and BLAST search. It also provides hyperlinks to various databases such as GenBank, PDB, SWISS-PROT and MHCBN. CONCLUSION: A comprehensive database of B-cell epitopes called Bcipep has been developed that covers information on epitopes from a wide range of pathogens. Bcipep will be a source of information for investigators involved in peptide-based vaccine design, disease diagnosis and research in allergy. It should also be a promising data source for the development and evaluation of methods for prediction of B-cell epitopes. The database is available at
KiDoQ: using docking based energy scores to develop ligand based model for predicting antibacterials
Background: Identification of novel drug targets and their inhibitors is a major challenge in the field of drug design and development. The diaminopimelic acid (DAP) pathway is a unique lysine biosynthetic pathway that is present in bacteria but absent in mammals. This pathway is vital for bacteria due to its critical role in cell wall biosynthesis. One of the essential enzymes of this pathway is dihydrodipicolinate synthase (DHDPS), considered to be crucial for bacterial survival. In view of its importance, the development and prediction of potent inhibitors against DHDPS may be valuable for designing effective drugs against bacteria in general. Results: This paper describes a methodology for predicting novel, potent inhibitors against DHDPS. Here, quantitative structure activity relationship (QSAR) models were trained and tested on 23 experimentally verified inhibitors of the enzyme with inhibition constants (Ki) in the range of 0.005-22 mM. These inhibitors were docked at the active site of DHDPS (1YXD) using AutoDock software, which resulted in 11 energy-based descriptors. For QSAR modeling, a Multiple Linear Regression (MLR) model was built using the best four energy-based descriptors, yielding correlation values R/q2 of 0.82/0.67 and an MAE of 2.43. Additionally, a Support Vector Machine (SVM) based model was developed with three crucial descriptors selected using an F-stepping remove-one approach, which improved the performance to R/q2 values of 0.93/0.80 and an MAE of 1.89. To validate the performance of the QSAR models, an external cross-validation procedure was adopted, which gave high training/testing correlation values (q2/r2) in the range of 0.78-0.83/0.93-0.95. Conclusions: Our results suggest that QSAR modeling of ligand-receptor binding interactions for DHDPS is a promising approach for the prediction of antibacterial agents.
To help experimentalists develop novel, potent inhibitors, a web server, "KiDoQ", has been developed at http://crdd.osdd.net/raghava/kidoq, which allows the prediction of the Ki value of a new ligand molecule against DHDPS.
Support Vector Machine-based method for predicting subcellular localization of mycobacterial proteins using evolutionary information and motifs
<p>Abstract</p> <p>Background</p> <p>In the past, a number of methods have been developed for predicting the subcellular location of eukaryotic, prokaryotic (Gram-negative and Gram-positive bacteria) and human proteins, but no method has been developed for mycobacterial proteins, which may represent a repertoire of potent immunogens of this dreaded pathogen. In this study, an attempt has been made to develop a method for predicting the subcellular location of mycobacterial proteins.</p> <p>Results</p> <p>The models were trained and tested on 852 mycobacterial proteins and evaluated using a five-fold cross-validation technique. First, an SVM (Support Vector Machine) model was developed using amino acid composition, and an overall accuracy of 82.51% was achieved with an average accuracy (mean of class-wise accuracy) of 68.47%. In order to utilize evolutionary information, an SVM model was developed using PSSM (Position-Specific Scoring Matrix) profiles obtained from PSI-BLAST (Position-Specific Iterated BLAST), and the overall accuracy achieved was 86.62% with an average accuracy of 73.71%. In addition, HMM (Hidden Markov Model), MEME/MAST (Multiple EM for Motif Elicitation/Motif Alignment and Search Tool) and hybrid models that combined two or more models were also developed. We achieved a maximum overall accuracy of 86.8% with an average accuracy of 89.00% using a combination of the PSSM-based SVM model and MEME/MAST. The performance of our method was compared with that of the existing methods developed for predicting subcellular locations of Gram-positive bacterial proteins.</p> <p>Conclusion</p> <p>A highly accurate method has been developed for predicting the subcellular location of mycobacterial proteins. This method also predicts a very important class of proteins, the membrane-attached proteins. This method will be useful in annotating newly sequenced or hypothetical mycobacterial proteins. Based on the above study, a freely accessible web server, TBpred (http://www.imtech.res.in/raghava/tbpred/), has been developed.</p>
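The abstract above distinguishes overall accuracy from average accuracy, defined there as the mean of class-wise accuracy. A minimal sketch of the two measures is shown below; the function names and the toy location labels are illustrative assumptions, not the authors' evaluation code.

```python
def overall_accuracy(true_labels, predicted_labels):
    """Fraction of all proteins whose location is predicted correctly."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def average_accuracy(true_labels, predicted_labels):
    """Mean of the per-class accuracies, so each class counts equally."""
    per_class = []
    for label in sorted(set(true_labels)):
        # Predictions for the proteins that truly belong to this class.
        preds = [p for t, p in zip(true_labels, predicted_labels) if t == label]
        per_class.append(sum(p == label for p in preds) / len(preds))
    return sum(per_class) / len(per_class)
```

With eight cytoplasmic and two membrane proteins, getting every cytoplasmic protein and one membrane protein right gives an overall accuracy of 0.9 but an average accuracy of 0.75, which is why the two figures can diverge on imbalanced location classes.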
AntiBP2: improved version of antibacterial peptide prediction
<p>Abstract</p> <p>Background</p> <p>Antibacterial peptides are one of the effector molecules of the innate immune system. Over the last few decades several antibacterial peptides have been successfully approved as drugs by the FDA, which has prompted interest in these peptides. In our recent study we analyzed 999 antibacterial peptides, which were collected from the Antibacterial Peptide Database (APD). We have also developed methods to predict and classify these antibacterial peptides using Support Vector Machine (SVM).</p> <p>Results</p> <p>During analysis we observed that certain residues are preferred over others in antibacterial peptides, particularly at the N and C termini. These observations, and the increased number of antibacterial peptides in APD, encouraged us to develop a new and more robust method for predicting antibacterial peptides in a protein from its amino acid sequence, or whether a given peptide has antibacterial properties. First, the binary patterns of the 15 N-terminal residues were used for predicting antibacterial peptides using SVM, achieving an accuracy of 85.46% with a Matthews correlation coefficient (MCC) of 0.705. Then we used the binary pattern of the 15 C-terminal residues and achieved an accuracy of 85.05% with an MCC of 0.701. Later we developed a prediction method by combining the N and C termini and achieved an accuracy of 91.64% with an MCC of 0.831. Finally, we developed an SVM-based model using the amino acid composition of the whole peptide and achieved 92.14% accuracy with an MCC of 0.843. In this study we used a five-fold cross-validation technique to develop all these models and tested their performance on an independent dataset. We further classified antibacterial peptides according to their sources and achieved an overall accuracy of 98.95%.
We also classified antibacterial peptides into their respective families and obtained satisfactory results.</p> <p>Conclusion</p> <p>Among antibacterial peptides, there is a preference for certain residues at the N and C termini, which helps to discriminate them from non-antibacterial peptides. The amino acid composition of antibacterial peptides helps to distinguish them from non-antibacterial peptides and to classify them further by source and family. AntiBP2 will be helpful in discovering efficacious antibacterial peptides, which we hope will be useful against antibiotic-resistant bacteria. We also developed a user-friendly web server for the biological community.</p>
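The binary-pattern encoding of terminal residues mentioned in the abstract above can be sketched as a one-hot vector of 20 values per position, giving 300 values for 15 residues. The function name `binary_pattern`, the zero-padding of peptides shorter than the window, and the handling of non-standard residues are assumptions made for this illustration; the abstract does not specify those details.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def binary_pattern(peptide, length=15, terminus="N"):
    """One-hot encode `length` residues from the N or C terminus.

    Assumptions: positions beyond the end of a short peptide, and
    non-standard residues, are left as all-zero blocks.
    """
    seq = peptide.upper()
    if terminus == "N":
        window = seq[:length]
    elif terminus == "C":
        window = seq[-length:]
    else:
        raise ValueError("terminus must be 'N' or 'C'")
    vector = [0] * (length * len(AMINO_ACIDS))
    for pos, aa in enumerate(window):
        if aa in INDEX:
            vector[pos * len(AMINO_ACIDS) + INDEX[aa]] = 1
    return vector
```

Concatenating the N-terminal and C-terminal vectors of one peptide gives a 600-value input, which is one plausible way to realise the combined N and C terminus model described in the abstract.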
Identification of DNA-binding proteins using support vector machines and evolutionary profiles
<p>Abstract</p> <p>Background</p> <p>Identification of DNA-binding proteins is one of the major challenges in the field of genome annotation, as these proteins play a crucial role in gene regulation. In this paper, we developed various SVM modules for predicting DNA-binding domains and proteins. All models were trained and tested on multiple datasets of non-redundant proteins.</p> <p>Results</p> <p>SVM models have been developed on DNAaset, which consists of 1153 DNA-binding and an equal number of non-DNA-binding proteins, and achieved maximum accuracies of 72.42% and 71.59% using amino acid and dipeptide compositions, respectively. The performance of the SVM model improved from 72.42% to 74.22% when evolutionary information in the form of PSSM profiles was used as input instead of amino acid composition. In addition, SVM models have been developed on DNAset, which consists of 146 DNA-binding and 250 non-binding chains/domains, and achieved maximum accuracies of 79.80% and 86.62% using amino acid composition and PSSM profiles, respectively. The SVM models developed in this study perform better than existing methods on a blind dataset.</p> <p>Conclusion</p> <p>A highly accurate method has been developed for predicting DNA-binding proteins using SVM and PSSM profiles. This is the first study in which evolutionary information in the form of PSSM profiles has been used successfully for predicting DNA-binding proteins. A web server, DNAbinder, has been developed for identifying DNA-binding proteins and domains from query amino acid sequences <url>http://www.imtech.res.in/raghava/dnabinder/</url>.</p>