
    Predicting zinc binding at the proteome level

    BACKGROUND: Metalloproteins are proteins capable of binding one or more metal ions, which may be required for their biological function, for regulation of their activities or for structural purposes. Metal-binding properties remain difficult to predict as well as to investigate experimentally at the whole-proteome level. Consequently, the current knowledge about metalloproteins is only partial. RESULTS: The present work reports on the development of a machine learning method for the prediction of the zinc-binding state of pairs of nearby amino-acids, using predictors based on support vector machines. The predictor was trained using chains containing zinc-binding sites and non-metalloproteins in order to provide positive and negative examples. Results based on strong non-redundancy tests prove that (1) zinc-binding residues can be predicted and (2) modelling the correlation between the binding state of nearby residues significantly improves performance. The trained predictor was then applied to the human proteome. The present results were in good agreement with the outcomes of previous, highly manually curated, efforts for the identification of human zinc-binding proteins. Some unprecedented zinc-binding sites could be identified, and were further validated through structural modelling. The software implementing the predictor is freely available at: CONCLUSION: The proposed approach constitutes a highly automated tool for the identification of metalloproteins, which provides results of comparable quality with respect to highly manually refined predictions. The ability to model correlations between pairwise residues allows it to obtain a significant improvement over standard 1D based approaches. In addition, the method permits the identification of unprecedented metal sites, providing important hints for the work of experimentalists
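The pairwise formulation described above can be illustrated with a minimal sketch of the step that enumerates candidate residue pairs before any classification. The abstract does not state which residue types or what sequence separation were used, so the residue set `CANDIDATE_RESIDUES`, the parameter `max_separation`, and the function name are all assumptions made for illustration. In a full pipeline, each pair would then be encoded as features and scored by a classifier such as a support vector machine.

```python
from itertools import combinations

# Assumption: residue types often found coordinating zinc (Cys, His, Asp, Glu).
# The abstract does not list the residue types the authors considered.
CANDIDATE_RESIDUES = set("CHDE")


def nearby_candidate_pairs(sequence, max_separation=7):
    """Return index pairs (i, j), i < j, of candidate residues that lie
    at most max_separation positions apart in the sequence."""
    positions = [i for i, aa in enumerate(sequence) if aa in CANDIDATE_RESIDUES]
    return [
        (i, j)
        for i, j in combinations(positions, 2)
        if j - i <= max_separation
    ]


# Hypothetical toy sequence; indices are 0-based.
pairs = nearby_candidate_pairs("MKCPECGKSF", max_separation=4)
print(pairs)
```

Treating pairs rather than single residues as the unit of prediction is what lets a model capture the correlation between the binding states of neighbouring residues.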

    Proteomic profile of KSR1-regulated signalling in response to genotoxic agents in breast cancer

    Kinase suppressor of Ras 1 (KSR1) has been implicated in tumorigenesis in multiple cancers, including skin, pancreatic and lung carcinomas. However, our recent study revealed a role of KSR1 as a tumour suppressor in breast cancer, the expression of which is potentially correlated with chemotherapy response. Here, we aimed to further elucidate the KSR1-regulated signalling in response to genotoxic agents in breast cancer. Stable isotope labelling by amino acids in cell culture (SILAC) coupled to high-resolution mass spectrometry (MS) was implemented to globally characterise cellular protein levels induced by KSR1 in the presence of doxorubicin or etoposide. The acquired proteomic signature was compared and GO-STRING analysis was subsequently performed to illustrate the activated functional signalling networks. Furthermore, the clinical associations of KSR1 with identified targets and their relevance in chemotherapy response were examined in breast cancer patients. We reveal a comprehensive repertoire of thousands of proteins identified in each dataset and compare the unique proteomic profiles as well as functional connections modulated by KSR1 after doxorubicin (Doxo-KSR1) or etoposide (Etop-KSR1) stimulus. From the up-regulated top hits, several proteins, including STAT1, ISG15 and TAP1 are also found to be positively associated with KSR1 expression in patient samples. Moreover, high KSR1 expression, as well as high abundance of these proteins, is correlated with better survival in breast cancer patients who underwent chemotherapy. In aggregate, our data exemplify a broad functional network conferred by KSR1 with genotoxic agents and highlight its implication in predicting chemotherapy response in breast cancer

    DNA-binding protein prediction using plant specific support vector machines: validation and application of a new genome annotation tool

    There are currently 151 plants with draft genomes available, but levels of functional annotation for putative protein products are low. Therefore, accurate computational predictions are essential to annotate genomes in the first instance, and to provide focus for the more costly and time-consuming functional assays that follow. DNA-binding proteins are an important class of proteins that require annotation, but current computational methods are not applicable for genome-wide predictions in plant species. Here, we explore the use of species- and lineage-specific models for the prediction of DNA-binding proteins in plants. We show that a species-specific support vector machine model based on Arabidopsis sequence data is more accurate (accuracy 81%) than a generic model (74%), and based on this we develop a plant-specific model for predicting DNA-binding proteins. We apply this model to the tomato proteome and demonstrate its ability to perform accurate high-throughput prediction of DNA-binding proteins. In doing so, we have annotated 36 currently uncharacterised proteins by assigning a putative DNA-binding function. Our model is publicly available and we propose it be used in combination with existing tools to help increase annotation levels of DNA-binding proteins encoded in plant genomes.

    Peptide vocabulary analysis reveals ultra-conservation and homonymity in protein sequences

    A new algorithm is presented for vocabulary analysis (word detection) in texts of human origin. It performs at 60%–70% overall accuracy and greater than 80% accuracy for longer words, and approximately 85% sensitivity on Alice in Wonderland, a considerable improvement on previous methods. When applied to protein sequences, it detects short sequences analogous to words in human texts, i.e. intolerant to changes in spelling (mutation), and relatively context-independent in their meaning (function). Some of these are homonyms of up to 7 amino acids, which can assume different structures in different proteins. Others are ultra-conserved stretches of up to 18 amino acids within proteins of less than 40% overall identity, reflecting extreme constraint or convergent evolution. Different species are found to have qualitatively different major peptide vocabularies, e.g. some are dominated by large gene families, while others are rich in simple repeats or dominated by internally repetitive proteins. This suggests the possibility of a peptide vocabulary signature, analogous to genome signatures in DNA. Homonyms may be useful in detecting convergent evolution and positive selection in protein evolution. Ultra-conserved words may be useful in identifying structures intolerant to substitution over long periods of evolutionary time.

    Yeast optimizes metal utilization based on metabolic network and enzyme kinetics

    Metal ions are vital to metabolism, as they can act as cofactors of enzymes and thus modulate individual enzymatic reactions. Although many enzymes have been reported to interact with metal ions, the quantitative relationships between metal ions and metabolism are lacking. Here, we reconstructed a genome-scale metabolic model of the yeast Saccharomyces cerevisiae, named CofactorYeast, that accounts for proteome constraints and enzyme cofactors such as metal ions. The model is able to estimate abundances of metal ions bound to enzymes in cells under various conditions, which are comparable to measured metal ion contents in biomass. In addition, the model predicts distinct metabolic flux distributions in response to reduced levels of various metal ions in the medium. Specifically, the model reproduces changes upon iron deficiency in metabolic and gene expression levels, which could be interpreted by optimization principles (i.e., yeast optimizes iron utilization based on metabolic network and enzyme kinetics rather than preferentially targeting iron to specific enzymes or pathways). Finally, we show the potential of using the model for understanding cell factories that harbor heterologous iron-containing enzymes to synthesize high-value compounds such as p-coumaric acid. Overall, the model demonstrates the dependence of enzymes on metal ions and links metal ions to metabolism on a genome scale.

    Functional Diversity and Structural Disorder in the Human Ubiquitination Pathway

    The ubiquitin-proteasome system plays a central role in cellular regulation and protein quality control (PQC). The system is built as a pyramid of increasing complexity, with two E1 (ubiquitin activating), few dozen E2 (ubiquitin conjugating) and several hundred E3 (ubiquitin ligase) enzymes. By collecting and analyzing E3 sequences from the KEGG BRITE database and literature, we assembled a coherent dataset of 563 human E3s and analyzed their various physical features. We found an increase in structural disorder of the system with multiple disorder predictors (IUPred - E1: 5.97%, E2: 17.74%, E3: 20.03%). E3s that can bind E2 and substrate simultaneously (single subunit E3, ssE3) have significantly higher disorder (22.98%) than E3s in which E2 binding (multi RING-finger, mRF, 0.62%), scaffolding (6.01%) and substrate binding (adaptor/substrate recognition subunits, 17.33%) functions are separated. In ssE3s, the disorder was localized in the substrate/adaptor binding domains, whereas the E2-binding RING/HECT-domains were structured. To demonstrate the involvement of disorder in E3 function, we applied normal modes and molecular dynamics analyses to show how a disordered and highly flexible linker in human CBL (an E3 that acts as a regulator of several tyrosine kinase-mediated signalling pathways) facilitates long-range conformational changes bringing substrate and E2-binding domains towards each other and thus assisting in ubiquitin transfer. E3s with multiple interaction partners (as evidenced by data in STRING) also possess elevated levels of disorder (hubs, 22.90% vs. non-hubs, 18.36%). Furthermore, a search in PDB uncovered 21 distinct human E3 interactions, in 7 of which the disordered region of E3s undergoes induced folding (or mutual induced folding) in the presence of the partner. 
    In conclusion, our data highlight the primary role of structural disorder in the functions of E3 ligases, which manifests itself in the substrate/adaptor binding functions as well as in the mechanism of ubiquitin transfer by long-range conformational transitions. © 2013 Bhowmick et al.
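The disorder percentages quoted above are per-protein or per-class summaries of residue-level predictions. A minimal sketch of one common way to obtain such a figure is shown here: count the residues whose predicted disorder score exceeds a cutoff. The abstract does not say how the percentages were computed, so the 0.5 cutoff (the conventional one for IUPred-style scores), the function name, and the example scores are assumptions for illustration only.

```python
def percent_disordered(scores, threshold=0.5):
    """Percentage of residues whose disorder score exceeds the threshold.

    scores: per-residue disorder scores in [0, 1].
    threshold: assumed cutoff above which a residue counts as disordered.
    """
    if not scores:
        return 0.0
    disordered = sum(1 for s in scores if s > threshold)
    return 100.0 * disordered / len(scores)


# Hypothetical per-residue scores for a five-residue toy protein.
example_scores = [0.1, 0.2, 0.7, 0.9, 0.4]
print(percent_disordered(example_scores))
```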

    Binding Site Prediction for Protein-Protein Interactions and Novel Motif Discovery using Re-occurring Polypeptide Sequences

    Background: While there are many methods for predicting protein-protein interaction, very few can determine the specific site of interaction on each protein. Characterization of the specific sequence regions mediating interaction (binding sites) is crucial for an understanding of cellular pathways. Experimental methods often report false binding sites due to experimental limitations, while computational methods tend to require data which is not available at the proteome scale. Here we present PIPE-Sites, a novel method of protein-specific binding site prediction based on pairs of re-occurring polypeptide sequences, which have been previously shown to accurately predict protein-protein interactions. PIPE-Sites operates at high specificity and requires only the sequences of query proteins and a database of known binary interactions with no binding site data, making it applicable to binding site prediction at the proteome scale. Results: PIPE-Sites was evaluated using a dataset of 265 yeast and 423 human interacting protein pairs with experimentally determined binding sites. We found that PIPE-Sites predictions were closer to the confirmed binding site than those of two existing binding site prediction methods based on domain-domain interactions, when applied to the same dataset. Finally, we applied PIPE-Sites to two datasets of 2,347 yeast and 14,438 human novel interacting protein pairs predicted to interact with high confidence. An analysis of the predicted interaction sites revealed a number of protein subsequences which are highly re-occurring in binding sites and which may represent novel binding motifs. Conclusions: PIPE-Sites is an accurate method for predicting protein binding sites and is applicable at the proteome scale. Thus, PIPE-Sites could be useful for exhaustive analysis of protein binding patterns in whole proteomes as well as discovery of novel binding motifs. PIPE-Sites is available online a
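The idea of locating a binding site from pairs of re-occurring polypeptide sequences can be sketched as follows. This is a simplified illustration, not the PIPE-Sites algorithm itself: it uses exact substring matching where a real method would use sequence similarity, and the window size `w`, the function names, and the toy sequences are all hypothetical. For every window of protein `a` and every window of protein `b`, it counts how many known interacting pairs contain one window in one partner and the other window in the other partner; the highest-scoring cell suggests where the two query proteins might interact.

```python
def windows(seq, w):
    """All contiguous fragments of length w in seq."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]


def cooccurrence_matrix(a, b, known_pairs, w=3):
    """score[i][j] = number of known interacting pairs (x, y) in which
    window i of a occurs in one partner and window j of b in the other."""
    wa, wb = windows(a, w), windows(b, w)
    score = [[0] * len(wb) for _ in wa]
    for i, fa in enumerate(wa):
        for j, fb in enumerate(wb):
            for x, y in known_pairs:
                if (fa in x and fb in y) or (fa in y and fb in x):
                    score[i][j] += 1
    return score


def best_window_pair(score):
    """Return (i, j, count) for the highest-scoring cell of the matrix."""
    count, i, j = max(
        (v, i, j) for i, row in enumerate(score) for j, v in enumerate(row)
    )
    return i, j, count


# Hypothetical toy data: two query proteins and three known interacting pairs.
query_a, query_b = "GGKLMGG", "TTPQRTT"
known = [("XXKLMXX", "YYPQRYY"), ("PQRWW", "WWKLM"), ("GGKAA", "CCC")]
matrix = cooccurrence_matrix(query_a, query_b, known)
print(best_window_pair(matrix))
```

Note that this sketch needs only sequences and a list of known binary interactions, with no binding site annotations, which is the property the abstract highlights as making the approach applicable to whole proteomes.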

    Differential Proteomic Analysis of Human Saliva using Tandem Mass Tags Quantification for Gastric Cancer Detection

    Novel biomarkers and non-invasive diagnostic methods are urgently needed for the screening of gastric cancer to reduce its high mortality. We employed a quantitative proteomics approach to develop discriminatory biomarker signatures from human saliva for the detection of gastric cancer. Salivary proteins were analyzed and compared between gastric cancer patients and matched control subjects by using tandem mass tags (TMT) technology. More than 500 proteins were identified with quantification, and 48 of them showed significantly different expression (p < 0.05) between normal controls and gastric cancer patients, including 7 up-regulated proteins and 41 down-regulated proteins. Five proteins were selected for initial verification by ELISA and three were successfully verified, namely cystatin B (CSTB), triosephosphate isomerase (TPI1), and deleted in malignant brain tumors 1 protein (DMBT1). All three proteins could significantly differentiate gastric cancer patients from normal control subjects (p < 0.05). The combination of these three biomarkers could reach 85% sensitivity and 80% specificity for the detection of gastric cancer with accuracy of 0.93. This study provides the proof of concept of salivary biomarkers for the non-invasive detection of gastric cancer. It is highly encouraging to turn these biomarkers into an applicable clinical test after large scale validation.
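For readers less familiar with the reported metrics, the following minimal sketch shows how sensitivity and specificity are derived from a confusion matrix. The counts are hypothetical, chosen only so that the results match the 85% and 80% figures quoted above; they are not taken from the study.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of actual cases that the test flags as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of actual controls that the test flags as negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical cohort: 20 patients (17 detected) and 20 controls (16 cleared).
tp, fn = 17, 3
tn, fp = 16, 4
print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```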