    Functional Implications of Structural Predictions for Alternative Splice Proteins Expressed in Her2/neu–Induced Breast Cancers

    Alternative splicing allows a single gene to generate multiple mRNA transcripts, which can be translated into functionally diverse proteins. However, experimentally determined structures of protein splice isoforms are rare, and homology modeling methods are poor at predicting atomic-level structural differences between isoforms because of their high sequence identity. Here we exploit the state-of-the-art structure prediction method I-TASSER to analyze the structural and functional consequences of alternative splicing of proteins differentially expressed in a breast cancer model. We first successfully benchmarked the I-TASSER pipeline for structure modeling on all seven pairs of protein splice isoforms known to have experimentally solved structures. We then modeled three cancer-related variant pairs reported to have opposite functions. In each pair, we observed structural differences in regions where the presence or absence of a motif can directly influence the distinctive functions of the variants. Finally, we applied the method to five splice variants overexpressed in mouse Her2/neu mammary tumors: anxa6, calu, cdc42, ptbp1, and tax1bp3. Despite >75% sequence identity between the variants, structural differences were observed in biologically important regions of these protein pairs. These results demonstrate the feasibility of integrating proteomic analysis with structure-based conformational predictions of differentially expressed alternative splice variants in cancers and other conditions.

    Robust performance of our algorithm in predicting functions from RNA-seq data.

    We carried out five-fold cross-validation to test the performance of our algorithm. For each function, each gene is assigned the maximum prediction value among all of its isoforms, under the assumption that at least one of its isoforms carries out the function. Because the number of known genes annotated to each GO term systematically affects prediction performance, we grouped the terms into five groups according to GO term size. Panels (A)–(D) show the distributions (10th, 25th, 50th, 75th, and 90th percentiles) of the AUCs, the AUPRCs, the precisions at 1% recall, and the precisions at 10% recall, respectively.
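    A minimal sketch of the gene-level aggregation and two of the reported metrics, assuming isoform scores arrive as a plain dictionary from some upstream classifier. The function names (`gene_scores`, `auc`, `precision_at_recall`) and the toy data are hypothetical; the five-fold split and the GO-term-size grouping are omitted for brevity.

```python
def gene_scores(isoform_scores, isoform_to_gene):
    """Give each gene the maximum prediction score among its isoforms."""
    best = {}
    for isoform, score in isoform_scores.items():
        gene = isoform_to_gene[isoform]
        if gene not in best or score > best[gene]:
            best[gene] = score
    return best


def auc(scores, labels):
    """ROC AUC as the fraction of (positive, negative) gene pairs ranked correctly; ties count 0.5."""
    pos = [scores[g] for g in scores if labels[g]]
    neg = [scores[g] for g in scores if not labels[g]]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative gene")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_at_recall(scores, labels, target_recall):
    """Precision at the first rank cutoff whose recall reaches target_recall."""
    total_pos = sum(1 for g in scores if labels[g])
    if total_pos == 0:
        raise ValueError("no positive genes")
    ranked = sorted(scores, key=scores.get, reverse=True)
    true_pos = 0
    for k, gene in enumerate(ranked, start=1):
        if labels[gene]:
            true_pos += 1
        if true_pos / total_pos >= target_recall:
            return true_pos / k
    return 0.0


# Toy example: isoforms "g1.a" and "g1.b" belong to gene g1, and so on.
iso_scores = {"g1.a": 0.9, "g1.b": 0.2, "g2.a": 0.4, "g3.a": 0.7, "g3.b": 0.1}
iso_to_gene = {"g1.a": "g1", "g1.b": "g1", "g2.a": "g2", "g3.a": "g3", "g3.b": "g3"}
labels = {"g1": True, "g2": True, "g3": False}

scores = gene_scores(iso_scores, iso_to_gene)      # {'g1': 0.9, 'g2': 0.4, 'g3': 0.7}
print(auc(scores, labels))                         # 0.5
print(precision_at_recall(scores, labels, 0.5))    # 1.0
```

    Taking the maximum over isoforms mirrors the stated assumption that one functional isoform is enough for the gene to carry out the function; a mean would instead penalize genes with many non-functional isoforms.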

    Functional Networks of Highest-Connected Splice Isoforms: From The Chromosome 17 Human Proteome Project

    Alternative splicing allows a single gene to produce multiple transcript-level splice isoforms whose translated proteins may differ in expression and function. Identifying the major functional or canonical isoform is important for understanding gene and protein functions. Identification and characterization of splice isoforms is a stated goal of the HUPO Human Proteome Project and of neXtProt. Multiple efforts have catalogued splice isoforms as “dominant”, “principal”, or “major” isoforms based on expression or evolutionary traits. In contrast, we recently proposed highest connected isoforms (HCIs), the isoforms with the strongest interactions in a functional network, as a new class of canonical isoforms, and showed that in the mouse they have significantly higher (differential) transcript-level expression than non-highest connected isoforms (NCIs) across tissues and cell lines. HCIs and their expression behavior in humans remain unexplored. Here we identified HCIs for 6157 multi-isoform genes using a human isoform network that we constructed by integrating a large compendium of heterogeneous genomic data. We present examples of transcript isoform pairs of ABCC3, RBM34, ERBB2, and ANXA7. We found that functional networks of isoforms of the same gene can differ substantially. Interestingly, differential expression between HCIs and NCIs was also observed in humans, in an independent set of 940 RNA-seq samples across multiple tissues, including heart, kidney, and liver. Using proteomic data from normal human retina and placenta, we showed that HCIs are a promising indicator of expressed protein isoforms, as exemplified by NDUFB6 and M6PR. Furthermore, we found that a significant percentage (20%, p = 0.0003) of human and mouse HCIs are homologues, suggesting their conservation between species. 
The HCIs we identified expand the repertoire of canonical isoforms and are expected to facilitate the study of main protein products, gene regulation, and possibly evolution. The network is available through our web server as a rich resource for investigating isoform functional relationships (http://guanlab.ccmb.med.umich.edu/hisonet). All MS/MS data are available at the ProteomeXchange Web site (http://www.proteomexchange.org) under identifiers PXD001242 (retina) and PXD000754 (placenta).
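    One way to illustrate the HCI idea, under the assumption that an isoform's connectivity is its weighted degree (the sum of its edge weights) in an undirected functional network; the scoring actually used for the human isoform network may differ. `highest_connected_isoforms` and the toy network below are hypothetical.

```python
from collections import defaultdict


def highest_connected_isoforms(edges, isoform_to_gene):
    """Return {gene: isoform}, choosing for each multi-isoform gene the isoform
    with the largest weighted degree.

    edges: iterable of (isoform_a, isoform_b, weight), treated as undirected.
    """
    degree = defaultdict(float)
    for a, b, weight in edges:
        degree[a] += weight
        degree[b] += weight

    isoforms_by_gene = defaultdict(list)
    for isoform, gene in isoform_to_gene.items():
        isoforms_by_gene[gene].append(isoform)

    hcis = {}
    for gene, isoforms in isoforms_by_gene.items():
        if len(isoforms) < 2:
            continue  # single-isoform genes have no HCI/NCI distinction
        # max() keeps the first isoform on ties; isolated isoforms have degree 0.
        hcis[gene] = max(isoforms, key=lambda iso: degree[iso])
    return hcis


iso_to_gene = {"A1": "A", "A2": "A", "B1": "B", "C1": "C", "C2": "C"}
edges = [
    ("A1", "B1", 0.9), ("A2", "B1", 0.3), ("A2", "C1", 0.4),
    ("C2", "B1", 0.8), ("C1", "A1", 0.2),
]
hcis = highest_connected_isoforms(edges, iso_to_gene)
print(hcis)  # {'A': 'A1', 'C': 'C2'}
# Every other isoform of a multi-isoform gene is an NCI.
ncis = [iso for iso, g in iso_to_gene.items() if g in hcis and iso != hcis[g]]
print(ncis)  # ['A2', 'C1']
```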

    Performance comparison of different formulations of the SVM-MIL algorithm in predicting isoform functions.

    A. The histogram shows the score distribution of the instances in the positive bags and the negative bags in the training set. The threshold choices in mi-SVM are based on the distribution of scores of negative genes. The first threshold is equal to the mode of the distribution of scores of negative instances in the training set. The second threshold is equal to the 75th percentile of scores of the negative instances in the training set. The third threshold is equal to the maximum score of negative instances in the training set. B. This panel illustrates how different thresholds and formulations divide the isoforms in a positive bag into positive, negative, and neutral classes. The three thresholds in mi-SVM represent different degrees of strictness in assigning labels. The first threshold is the least strict and assigns most of the isoforms of positive genes as positive, whereas the third threshold is the strictest and in general leaves only one positive instance in every positive bag. In the MI-SVM formulation, only one isoform per positive gene is assigned as positive, and the other isoforms are dropped (i.e. the neutral class). C. Performance comparison of the three threshold choices for the mi-SVM formulation, the MI-SVM formulation, and the MI-SVM formulation with random witness selection. The plot shows that the mi-SVM formulation with threshold 2 performs best in terms of AUC.
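    The threshold choices and bag-labeling rules in this legend can be sketched as follows. This assumes the histogram mode is the centre of the most populated equal-width bin and the 75th percentile uses linear interpolation. In the mi-SVM branch, isoforms below the threshold are labeled negative and the top-scoring isoform is always kept positive; that follows the standard mi-SVM constraint rather than a rule stated in the legend. All names are hypothetical.

```python
import statistics


def histogram_mode(values, bins=10):
    """Approximate mode: centre of the most populated equal-width histogram bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1  # max value goes in last bin
    best = max(range(bins), key=counts.__getitem__)
    return lo + (best + 0.5) * width


def negative_thresholds(neg_scores, bins=10):
    """The three mi-SVM thresholds derived from negative-instance scores."""
    return {
        1: histogram_mode(neg_scores, bins),  # least strict
        2: statistics.quantiles(neg_scores, n=4, method="inclusive")[2],  # 75th percentile
        3: max(neg_scores),  # strictest
    }


def label_positive_bag(bag_scores, threshold=None):
    """Label the isoforms of one positive gene.

    threshold is None: MI-SVM style, only the top-scoring isoform (the witness)
    is positive and the rest are dropped as neutral.
    otherwise: mi-SVM style, isoforms scoring above threshold are positive, the
    rest negative, and the witness stays positive so each bag keeps one positive.
    """
    witness = max(bag_scores, key=bag_scores.get)
    other = "neutral" if threshold is None else "negative"
    labels = {}
    for isoform, score in bag_scores.items():
        above = threshold is not None and score > threshold
        labels[isoform] = "positive" if isoform == witness or above else other
    return labels


neg = [0.0, 1.5, 1.5, 1.2, 3.0, 4.0]
th = negative_thresholds(neg, bins=4)
print(th)  # {1: 1.5, 2: 2.625, 3: 4.0}
bag = {"x.1": 3.0, "x.2": 2.0, "x.3": 0.5}
print(label_positive_bag(bag, th[2]))  # x.1 positive; x.2 and x.3 negative
print(label_positive_bag(bag))         # x.1 positive; x.2 and x.3 neutral
```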

    Distinct Splice Variants and Pathway Enrichment in the Cell-Line Models of Aggressive Human Breast Cancer Subtypes

    This study was conducted as part of the Chromosome-Centric Human Proteome Project (C-HPP) of the Human Proteome Organization. The United States team of C-HPP is focused on characterizing the protein-coding genes of chromosome 17. Despite its small size, chromosome 17 is rich in protein-coding genes; it contains many cancer-associated genes, including BRCA1, ERBB2 (Her2/neu), and TP53. The goal of this study was to examine the splice variants expressed in three ERBB2-expressing breast cancer cell-line models of hormone-receptor-negative breast cancer by integrating RNA-Seq and proteomic mass spectrometry data. The cell lines represent distinct phenotypic subtypes: SKBR3 (ERBB2+ (overexpression)/ER–/PR–; adenocarcinoma), SUM190 (ERBB2+ (overexpression)/ER–/PR–; inflammatory breast cancer), and SUM149 (ERBB2 (low expression)/ER–/PR–; inflammatory breast cancer). We identified more than one splice variant for 1167 genes expressed in at least one of the three cancer cell lines. We found multiple variants of genes in the signaling pathways downstream of ERBB2, along with variants specific to one cancer cell line compared with the other two cancer cell lines and with normal mammary cells. The overall transcript profiles based on read counts indicated greater similarity between SKBR3 and SUM190. The top-ranking Gene Ontology and BioCarta pathways for the cell-line-specific variants pointed to distinct key mechanisms, including amino sugar metabolism, caspase activity, and endocytosis in SKBR3; different aspects of metabolism, especially of lipids, in SUM190; and cell-to-cell adhesion, integrin and ERK1/ERK2 signaling, and translational control in SUM149. The analyses indicated an enrichment of electron transport chain processes in the ERBB2-overexpressing cell-line models and an association of nucleotide binding, RNA splicing, and translation processes with the inflammatory breast cancer (IBC) models, SUM190 and SUM149. 
Detailed experimental studies of the distinct variants identified in each of these three breast cancer cell-line models may open opportunities for drug target discovery and help unveil their specific roles in cancer progression and metastasis.
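    The abstract does not say which statistic ranked the Gene Ontology and BioCarta pathways. A common choice is a one-sided hypergeometric test; the sketch below uses it purely as an illustrative assumption, maps variants to genes beforehand, uses hypothetical names and toy gene sets, and applies no multiple-testing correction.

```python
from math import comb


def hypergeom_sf(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric: n genes drawn from N, K of them in the pathway."""
    top = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return hits / comb(N, n)


def rank_pathways(specific_genes, pathways, background):
    """Rank pathways by enrichment p-value among cell-line-specific genes (smallest first)."""
    background = set(background)
    specific = set(specific_genes) & background
    N, n = len(background), len(specific)
    results = []
    for name, members in pathways.items():
        members = set(members) & background
        k = len(members & specific)  # pathway genes among the cell-line-specific set
        results.append((name, hypergeom_sf(k, n, len(members), N)))
    return sorted(results, key=lambda item: item[1])


background = [f"g{i}" for i in range(10)]
specific = ["g0", "g1", "g2"]
pathways = {"P1": ["g0", "g1", "g2", "g3"], "P2": ["g7", "g8"]}
ranked = rank_pathways(specific, pathways, background)
print(ranked)  # P1 first (p = 4/120, about 0.033), then P2 (p = 1.0)
```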
