26 research outputs found

    An SVM-based system for predicting protein subnuclear localizations

    BACKGROUND: The large gap between the number of protein sequences in databases and the number of functionally characterized proteins calls for the development of a fast computational tool for the prediction of subnuclear and subcellular localizations generally applicable to protein sequences. The information on localization may reveal the molecular function of novel proteins, in addition to providing insight on the biological pathways in which they function. The bulk of past work has been focused on protein subcellular localizations. Furthermore, no specific tool has been dedicated to prediction at the subnuclear level, despite its high importance. In order to design a suitable predictive system, the extraction of subtle sequence signals that can discriminate among proteins with different subnuclear localizations is the key. RESULTS: New kernel functions used in a support vector machine (SVM) learning model are introduced for the measurement of sequence similarity. The k-peptide vectors are first mapped by a matrix of high-scored pairs of k-peptides which are measured by BLOSUM62 scores. The kernels, measuring the similarity for sequences, are then defined on the mapped vectors. By combining these new encoding methods, a multi-class classification system for the prediction of protein subnuclear localizations is established for the first time. The performance of the system is evaluated with a set of proteins collected in the Nuclear Protein Database (NPD). The overall accuracy of prediction for 6 localizations is about 50% (vs. random prediction 16.7%) for single localization proteins in the leave-one-out cross-validation; and 65% for an independent set of multi-localization proteins. This integrated system can be accessed at . CONCLUSION: The integrated system benefits from the combination of predictions from several SVMs based on selected encoding methods. 
Finally, the predictive power of the system is expected to improve as more proteins with known subnuclear localizations become available.
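The leave-one-out protocol and the random baseline quoted in this abstract can be illustrated with a minimal sketch. The classifier below is a stand-in 1-nearest-neighbour rule on toy one-dimensional data, not the SVM system described in the abstract; the names `leave_one_out_accuracy`, `nearest_neighbour`, and `random_baseline` are hypothetical.

```python
def leave_one_out_accuracy(samples, labels, fit_predict):
    """Hold out each sample once, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if fit_predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)


def nearest_neighbour(train_x, train_y, query):
    """Stand-in classifier (assumption): label of the closest training point."""
    best = min(range(len(train_x)), key=lambda j: abs(train_x[j] - query))
    return train_y[best]


def random_baseline(n_classes):
    """Expected accuracy of uniform random guessing over n_classes labels."""
    return 1.0 / n_classes


# Toy data: two well-separated groups on a line.
toy_x = [0.0, 0.1, 1.0, 1.1]
toy_y = ["a", "a", "b", "b"]
toy_accuracy = leave_one_out_accuracy(toy_x, toy_y, nearest_neighbour)
six_class_baseline_percent = round(100 * random_baseline(6), 1)
```

With 6 localizations, uniform guessing gives 1/6, which rounds to the 16.7% baseline stated in the abstract.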

    Abundant copy-number loss of CYCLOPS and STOP genes in gastric adenocarcinoma

    Background: Gastric cancer, a leading cause of cancer death worldwide, has been little studied compared with other cancers that impose similar health burdens. Our goal is to assess genomic copy-number loss and its possible functional consequences and therapeutic implications across a large series of gastric adenocarcinomas. Methods: We used high-density single-nucleotide polymorphism microarrays to determine patterns of copy-number loss and allelic imbalance in 74 gastric adenocarcinomas. We investigated whether suppressor of tumorigenesis and/or proliferation (STOP) genes are associated with genomic copy-number loss. We also analyzed the extent to which copy-number loss affects Copy-number alterations Yielding Cancer Liabilities Owing to Partial losS (CYCLOPS) genes, which may be attractive targets for therapeutic inhibition when partially deleted. Results: The proportion of the genome subject to copy-number loss varies considerably from tumor to tumor, with a median of 5.5 % and a mean of 12 % (range 0–58.5 %). On average, 91 STOP genes were subject to copy-number loss per tumor (median 35, range 0–452), and STOP genes tended to have lower copy number than the rest of the genes. Furthermore, on average, 1.6 CYCLOPS genes per tumor were both subject to copy-number loss and downregulated, and 51.4 % of the tumors had at least one such gene. Conclusions: The enrichment of STOP genes in regions of copy-number loss indicates that their deletion may contribute to gastric carcinogenesis. Furthermore, the presence of several deleted and downregulated CYCLOPS genes in some tumors suggests potential therapeutic targets in these tumors. Funding: Singapore. Ministry of Health (Duke-NUS Signature Research Programs); Singapore. Agency for Science, Technology and Research; Singapore-MIT Allianc

    Assessing protein similarity with Gene Ontology and its use in subnuclear localization prediction

    BACKGROUND: The completion of the various genome sequencing projects has resulted in the accumulation of a massive amount of gene sequence information. This calls for a large-scale computational method for predicting protein localization from sequence. The localization of a protein can provide valuable information about its molecular function, as well as the biological pathway in which it participates. Predicting the localization of a protein at the subnuclear level is a challenging task. In our previous work we proposed an SVM-based system using protein sequence information for this prediction task. In this work, we assess protein similarity with Gene Ontology (GO) and then improve the performance of the system by adding a nearest neighbor classifier module that uses a similarity measure derived from the GO annotation terms for protein sequences. RESULTS: The performance of the new system proposed here was compared with our previous system using a set of proteins residing in 6 localizations collected from the Nuclear Protein Database (NPD). The overall MCC (accuracy) is elevated from 0.284 (50.0%) to 0.519 (66.5%) for single-localization proteins in leave-one-out cross-validation; and from 0.420 (65.2%) to 0.541 (65.2%) for an independent set of multi-localization proteins. The new system is available at . CONCLUSION: The prediction of protein subnuclear localizations can be strongly influenced by the definition of similarity for a pair of proteins, which in turn depends on the similarity measure chosen for GO terms. Using the sum of similarity scores over the matched GO term pairs for two proteins as the similarity definition produced the best predictive outcome. Substantial improvement in predicting protein subnuclear localizations has been achieved by combining Gene Ontology with sequence information.
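As a rough illustration of the "sum of similarity scores over the matched GO term pairs" idea combined with a nearest neighbor classifier, here is a minimal sketch. The abstract does not define the term-level similarity or the matching rule, so both are assumptions here: terms are compared by Jaccard overlap of their ancestor sets, and each term of the first protein is matched to its best-scoring term in the second. The term identifiers, the `ancestors` table, and the labels `loc_A` and `loc_B` are made up for the example.

```python
def term_similarity(t1, t2, ancestors):
    """Assumed term similarity: Jaccard overlap of ancestor sets (terms included)."""
    a = ancestors[t1] | {t1}
    b = ancestors[t2] | {t2}
    return len(a & b) / len(a | b)


def protein_similarity(terms_a, terms_b, ancestors):
    """Sum, over terms of protein A, of the score of the best-matching term in B."""
    if not terms_a or not terms_b:
        return 0.0
    return sum(
        max(term_similarity(t, u, ancestors) for u in terms_b)
        for t in terms_a
    )


def nearest_neighbour_label(query_terms, training, ancestors):
    """Return the label of the training protein most similar to the query."""
    best_terms, best_label = max(
        training,
        key=lambda item: protein_similarity(query_terms, item[0], ancestors),
    )
    return best_label


# Toy ontology (hypothetical): root -> a -> a1 and root -> b -> b1.
ancestors = {
    "root": set(),
    "a": {"root"},
    "a1": {"a", "root"},
    "b": {"root"},
    "b1": {"b", "root"},
}
training = [({"a1"}, "loc_A"), ({"b1"}, "loc_B")]
predicted = nearest_neighbour_label({"a"}, training, ancestors)
```

Note that this best-match sum is not symmetric in its two arguments; a symmetric variant would average the two directions.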

    A New Kernel Based on High-Scored Pairs of Tri-peptides and Its Application in Prediction of Protein Subcellular Localization

    Abstract. A new kernel has been developed for vectors derived from a coding scheme of the tri-peptide composition for protein sequences. This kernel defines the sequence similarity through a mapping that transforms a tri-peptide coding vector into a new vector based on a matrix formed by the high BLOSUM scores associated with pairs of tri-peptides. In conjunction with the use of support vector machines, the effectiveness of the new kernel is evaluated against the conventional coding schemes of k-peptide (k ≤ 3) for the prediction of subcellular localizations of proteins in Gram-negative bacteria. It is demonstrated that the new method outperforms all the other methods in a 5-fold cross-validation. Keywords: protein subcellular localization, Gram-negative bacteria, BLOSUM matrix, kernel, support vector machine.
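The mapping described in this abstract can be sketched as follows, under loud assumptions. The abstract does not give the exact matrix construction, so this sketch keeps a pair score only when it reaches a threshold, maps each composition vector x to Mx, and takes the dot product of the mapped vectors. A toy match/mismatch score stands in for BLOSUM, and a two-letter alphabet with k = 2 stands in for tri-peptides over 20 amino acids. All function names are hypothetical.

```python
from itertools import product


def all_peptides(alphabet, k):
    """Enumerate every k-peptide over the alphabet in a fixed order."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def composition(seq, index, k):
    """Relative frequency of each k-peptide in seq (the coding vector)."""
    vec = [0.0] * len(index)
    windows = len(seq) - k + 1
    for i in range(windows):
        pos = index.get(seq[i:i + k])
        if pos is not None:
            vec[pos] += 1.0 / windows
    return vec


def toy_score(a, b):
    """Stand-in for a BLOSUM entry (assumption): +2 for a match, -1 otherwise."""
    return 2 if a == b else -1


def high_score_matrix(peps, threshold):
    """Keep the summed residue score of a peptide pair only if it is high enough."""
    matrix = []
    for p in peps:
        row = []
        for q in peps:
            score = sum(toy_score(a, b) for a, b in zip(p, q))
            row.append(score if score >= threshold else 0)
        matrix.append(row)
    return matrix


def map_vector(matrix, vec):
    """Map a coding vector through the high-score matrix."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def mapped_kernel(x, y, matrix):
    """Dot product of the mapped vectors, i.e. x^T M^T M y."""
    mx, my = map_vector(matrix, x), map_vector(matrix, y)
    return sum(a * b for a, b in zip(mx, my))


peps = all_peptides("AB", 2)
index = {p: i for i, p in enumerate(peps)}
M = high_score_matrix(peps, threshold=1)
x = composition("AAB", index, 2)
y = composition("BBA", index, 2)
plain = sum(a * b for a, b in zip(x, y))
mapped = mapped_kernel(x, y, M)
```

In this toy case the two sequences share no dipeptide, so the plain dot product of their coding vectors is zero, while the mapped kernel still registers similarity through related peptide pairs. Because the kernel has the form x^T M^T M y, it is positive semidefinite for any choice of M.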

    Lipidomics identifies a requirement for peroxisomal function during influenza virus replication

    Influenza virus acquires a host-derived lipid envelope during budding, yet a convergent view on the role of host lipid metabolism during infection is lacking. Using a mass spectrometry-based lipidomics approach, we provide a systems-scale perspective on membrane lipid dynamics of infected human lung epithelial cells and purified influenza virions. We reveal enrichment of the minor peroxisome-derived ether-linked phosphatidylcholines relative to bulk ester-linked phosphatidylcholines in virions as a unique pathogenicity-dependent signature for influenza not found in other enveloped viruses. Strikingly, pharmacological and genetic interference with peroxisomal and ether lipid metabolism impaired influenza virus production. Further integration of our lipidomics results with published genomics and proteomics data corroborated altered peroxisomal lipid metabolism as a hallmark of influenza virus infection in vitro and in vivo. Influenza virus may therefore tailor peroxisomal and particularly ether lipid metabolism for efficient replication.