70 research outputs found

    NSort/DB: an intra-nuclear compartment protein database

    Distinct substructures within the nucleus are associated with a wide variety of important nuclear processes. Structures such as chromatin and nuclear pores have specific roles, while others such as Cajal bodies are more functionally varied. Understanding the roles of these membraneless intra-nuclear compartments requires extensive data sets covering nuclear and compartment-associated proteins. NSort/DB is a database providing access to intra- or sub-nuclear compartment associations for the mouse nuclear proteome. Based on resources ranging from large-scale curated data sets to detailed experiments, this data set provides a high-quality set of annotations of non-exclusive association of nuclear proteins with structures such as promyelocytic leukaemia bodies and chromatin. The database is searchable by protein identifier or compartment, and has a documented web service API. The search interface, web service and data download are all freely available online at http://www.nsort.org/db/. Availability of this data set will enable systematic analyses of the protein complements of nuclear compartments, improving our understanding of the diverse functional repertoire of these structures

    Assessing protein similarity with Gene Ontology and its use in subnuclear localization prediction

    BACKGROUND: The completion of the various genome sequencing projects has resulted in the accumulation of a massive amount of gene sequence information. This calls for a large-scale computational method for predicting protein localization from sequence. The localization of a protein can provide valuable information about its molecular function, as well as the biological pathway in which it participates. Predicting the localization of a protein at the subnuclear level is a challenging task. In our previous work we proposed an SVM-based system using protein sequence information for this prediction task. In this work, we assess protein similarity with Gene Ontology (GO) and then improve the performance of the system by adding a nearest-neighbor classifier module that uses a similarity measure derived from the GO annotation terms of protein sequences. RESULTS: The performance of the new system was compared with that of our previous system using a set of proteins residing within 6 localizations collected from the Nuclear Protein Database (NPD). The overall MCC (accuracy) is elevated from 0.284 (50.0%) to 0.519 (66.5%) for single-localization proteins in leave-one-out cross-validation; and from 0.420 (65.2%) to 0.541 (65.2%) for an independent set of multi-localization proteins. The new system is available at . CONCLUSION: The prediction of protein subnuclear localizations can be strongly influenced by how the similarity of a pair of proteins is defined from different similarity measures of GO terms. Using the sum of similarity scores over the matched GO term pairs for two proteins as the similarity definition produced the best predictive outcome. Substantial improvement in predicting protein subnuclear localizations has been achieved by combining Gene Ontology with sequence information
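
    The abstract above reports both accuracy and the Matthews correlation coefficient (MCC). As a minimal sketch of why the two can diverge, the following computes both from the confusion counts of a single class treated one-vs-rest. The function names and the counts are hypothetical and are not taken from the paper, and this listing does not state how the authors aggregate MCC over the 6 localizations.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions among all predictions."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary (one-vs-rest) task.

    Returns 0.0 when any marginal is empty, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for one rare localization (illustration only).
counts = dict(tp=5, tn=85, fp=5, fn=5)
print(f"accuracy = {accuracy(**counts):.3f}")
print(f"MCC      = {mcc(**counts):.3f}")
```

    With a rare class, accuracy is dominated by the true negatives, while MCC stays modest because only half of the positive predictions are correct. This is one reason a paper might report both figures.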

    Towards defining the nuclear proteome

    Direct evidence is reported for 2,568 mammalian proteins within the nuclear proteome, constituting at least 14% of the entire proteome

    The proteins of intra-nuclear bodies: a data-driven analysis of sequence, interaction and expression

    BACKGROUND: Cajal bodies, nucleoli, PML nuclear bodies, and nuclear speckles are morphologically distinct intra-nuclear structures that dynamically respond to cellular cues. Such nuclear bodies are hypothesized to play important regulatory roles, e.g. by sequestering and releasing transcription factors in a timely manner. While the nucleolus and nuclear speckles have received more attention experimentally, the PML nuclear body and the Cajal body are still incompletely characterized in terms of their roles and protein complement. RESULTS: By collating recent experimentally verified data, we find that almost 1000 proteins in the mouse nuclear proteome are known to associate with one or more of the nuclear bodies. Their gene ontology terms highlight their regulatory roles: splicing is confirmed to be a core activity of speckles, and PML nuclear bodies house a range of proteins involved in DNA repair. We train support-vector machines to show that nuclear proteins contain discriminative sequence features that can be used to identify their intra-nuclear body associations. Prediction accuracy is highest for nucleoli and nuclear speckles. The trained models are also used to estimate the full protein complement of each nuclear body. Protein interactions are found primarily to link proteins in the nuclear speckles with proteins from other compartments. Cell cycle expression data provide support for increased activity in nucleoli, nuclear speckles and PML nuclear bodies, especially during S and G2 phases. CONCLUSIONS: The large-scale analysis of the mouse nuclear proteome sheds light on the functional organization of physically embodied intra-nuclear compartments. We observe partial support for the hypothesis that the physical organization of the nucleus mirrors functional modularity. However, we are unable to unambiguously identify proteins' intra-nuclear destination, suggesting that critical drivers behind intra-nuclear translocation are yet to be identified.

    An SVM-based system for predicting protein subnuclear localizations

    BACKGROUND: The large gap between the number of protein sequences in databases and the number of functionally characterized proteins calls for the development of a fast computational tool for the prediction of subnuclear and subcellular localizations generally applicable to protein sequences. The information on localization may reveal the molecular function of novel proteins, in addition to providing insight on the biological pathways in which they function. The bulk of past work has been focused on protein subcellular localizations. Furthermore, no specific tool has been dedicated to prediction at the subnuclear level, despite its high importance. In order to design a suitable predictive system, the key is the extraction of subtle sequence signals that can discriminate among proteins with different subnuclear localizations. RESULTS: New kernel functions used in a support vector machine (SVM) learning model are introduced for the measurement of sequence similarity. The k-peptide vectors are first mapped by a matrix of high-scored pairs of k-peptides which are measured by BLOSUM62 scores. The kernels, measuring the similarity for sequences, are then defined on the mapped vectors. By combining these new encoding methods, a multi-class classification system for the prediction of protein subnuclear localizations is established for the first time. The performance of the system is evaluated with a set of proteins collected in the Nuclear Protein Database (NPD). The overall accuracy of prediction for 6 localizations is about 50% (vs. random prediction 16.7%) for single-localization proteins in the leave-one-out cross-validation; and 65% for an independent set of multi-localization proteins. This integrated system can be accessed at . CONCLUSION: The integrated system benefits from the combination of predictions from several SVMs based on selected encoding methods. Finally, the predictive power of the system is expected to improve as more proteins with known subnuclear localizations become available
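
    The k-peptide encoding mentioned above can be illustrated with a much simpler stand-in. The sketch below counts overlapping k-peptides and takes the dot product of two count vectors (a plain spectrum-style kernel). It deliberately omits the BLOSUM62-based mapping through high-scored k-peptide pairs described in the abstract, so it is not the authors' kernel. All function names and the example sequences are hypothetical.

```python
from collections import Counter


def k_peptide_counts(sequence, k=2):
    """Count every overlapping k-peptide (k-mer) in a protein sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def spectrum_kernel(seq_a, seq_b, k=2):
    """Dot product of two k-peptide count vectors.

    Simplified stand-in: no BLOSUM62 mapping is applied here.
    """
    counts_a = k_peptide_counts(seq_a, k)
    counts_b = k_peptide_counts(seq_b, k)
    # Counter returns 0 for k-peptides that are absent from seq_b.
    return sum(count * counts_b[pep] for pep, count in counts_a.items())


def random_baseline(n_classes):
    """Expected accuracy of uniform random guessing over n_classes."""
    return 1.0 / n_classes


# Toy sequences (hypothetical, for illustration only).
print(spectrum_kernel("MKKL", "KKLM", k=2))
print(f"{random_baseline(6):.1%}")
```

    The second function call reproduces the 16.7% random-prediction figure quoted for 6 localizations, assuming classes are guessed uniformly.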

    A manually curated network of the PML nuclear body interactome reveals an important role for PML-NBs in SUMOylation dynamics

    Promyelocytic Leukaemia Protein nuclear bodies (PML-NBs) are dynamic nuclear protein aggregates. To gain insight into PML-NB function, reductionist and high-throughput techniques have been employed to identify PML-NB proteins. Here we present a manually curated network of the PML-NB interactome based on extensive literature review including database information. By compiling 'the PML-ome', we highlight the presence of interactors in the Small Ubiquitin-Like Modifier (SUMO) conjugation pathway. Additionally, we show an enrichment of SUMOylatable proteins in the PML-NBs through an in-house prediction algorithm. Therefore, based on the PML network, we hypothesize that PML-NBs may function as a nuclear SUMOylation hotspot

    In vitro nuclear interactome of the HIV-1 Tat protein

    BACKGROUND: One facet of the complexity underlying the biology of HIV-1 resides not only in its limited number of viral proteins, but in the extensive repertoire of cellular proteins they interact with and their higher-order assembly. HIV-1 encodes the regulatory protein Tat (86–101 aa), which is essential for HIV-1 replication and primarily orchestrates HIV-1 provirus transcriptional regulation. Previous studies have demonstrated that Tat function is highly dependent on specific interactions with a range of cellular proteins. However, they can only partially account for the intricate molecular mechanisms underlying the dynamics of proviral gene expression. To obtain a comprehensive nuclear interaction map of Tat in T-cells, we have designed a proteomic strategy based on affinity chromatography coupled with mass spectrometry. RESULTS: Our approach resulted in the identification of a total of 183 candidates as Tat nuclear partners, 90% of which have not been previously characterised. Subsequently we applied in silico analysis to validate and characterise our dataset, which revealed that the Tat nuclear interactome exhibits unique signature(s). First, motif composition analysis highlighted that our dataset is enriched for domains mediating protein, RNA and DNA interactions, and helicase and ATPase activities. Secondly, functional classification and network reconstruction clearly depicted Tat as a polyvalent protein adaptor and positioned Tat at the nexus of a densely interconnected interaction network involved in a range of biological processes which included gene expression regulation, RNA biogenesis, chromatin structure, chromosome organisation, DNA replication and nuclear architecture. CONCLUSION: We have completed the in vitro Tat nuclear interactome and have highlighted its modular network properties and particularly those involved in the coordination of gene expression by Tat. Ultimately, the highly specialised set of molecular interactions identified will provide a framework to further advance our understanding of the mechanisms of HIV-1 proviral gene silencing and activation.

    Prediction of protein submitochondria locations by hybridizing pseudo-amino acid composition with various physicochemical features of segmented sequence

    BACKGROUND: Knowing the submitochondria localization of a mitochondrial protein is an important step towards understanding its function. We develop a method based on an extended version of pseudo-amino acid composition to predict protein localization within mitochondria. This work goes one step further than predicting protein subcellular location. We also try to predict the membrane protein type for mitochondrial inner membrane proteins. RESULTS: Using leave-one-out cross-validation, the prediction accuracy is 85.5% for inner membrane, 94.5% for matrix and 51.2% for outer membrane. The overall prediction accuracy for submitochondria location prediction is 85.2%. For proteins predicted to localize at the inner membrane, the accuracy is 94.6% for membrane protein type prediction. CONCLUSION: Our method is effective for predicting protein submitochondria location. However, even with our method or the methods at the subcellular level, the prediction of protein submitochondria location is still a challenging problem. The online service SubMito is now available at