4 research outputs found

    Proteome-wide Analysis of Non-synonymous Single-nucleotide Variations in Active Sites of Human Proteins

    An enzyme's active site is essential to normal protein activity, such that any disruption at this site may lead to dysfunction and disease. Non-synonymous single-nucleotide variations (nsSNVs), which alter the amino acid sequence through a one-nucleotide substitution in the codon, are one type of genomic change that can alter the active site. When this occurs, enzyme activity is assumed to vary because the site is critical to normal protein function. We integrate nsSNV data and active site annotations from curated resources to identify all active-site-impacting nsSNVs in the human genome, and we search for all pathways associated with this dataset to assess the likely consequences. We find 934 unique nsSNVs that alter the active sites of 559 proteins. Comparing the list of nsSNV-impacted active site residues to the list of all possible proteomic active site residues shows an over-representation of arginine and an under-representation of cysteine, phenylalanine and tyrosine, implying a potential bias for or against variation of these residues at the active site. Clustering analysis shows an abundance of hydrolases and transferases. Pathway and functional analysis shows several pathways over- or under-represented in the active site nsSNV dataset, with the most significantly affected pathways involved in carbohydrate metabolism. We provide a table of 32 variation-substrate/product pairs that can be used in targeted metabolomics experiments to assay for the presence, and quantify the effects, of specific variations. Additionally, we find a significant prevalence of aspartic acid to histidine variation in 8 proteins associated with 9 diseases, including glycogen storage diseases, lacrimo-auriculo-dento-digital syndrome, Parkinson's disease and several cancers.
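    The residue over-representation comparison described in this abstract can be illustrated with a one-sided hypergeometric test. This is a minimal sketch only: the abstract does not say which statistical test the authors used, and the function names and the counts in the usage lines are hypothetical placeholders, not values from the paper.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Return P(X >= k) for X ~ Hypergeometric.

    N: all annotated active site residues (population size)
    K: residues of the amino acid of interest in that population
    n: active site residues hit by an nsSNV (sample size)
    k: residues of the amino acid of interest among those hit
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def fold_enrichment(k, N, K, n):
    """Observed fraction in the sample divided by the background fraction."""
    return (k / n) / (K / N)


# Hypothetical toy counts, chosen only to show the call shape.
p_value = hypergeom_sf(k=12, N=1000, K=60, n=100)
ratio = fold_enrichment(k=12, N=1000, K=60, n=100)
print(f"fold enrichment = {ratio:.2f}, one-sided p = {p_value:.4g}")
```

    A small p-value together with a fold enrichment above 1 would point to over-representation; testing under-representation would use the lower tail instead, and testing many residues at once would call for a multiple-testing correction.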

    Comprehensive Analysis of Glycosyltransferases in Cancer Identified from a Multifaceted Genomics Approach

    Despite accumulating evidence supporting a role for glycosylation in cancer progression and prognosis, the complexity of the human glycome, and of its study, poses many challenges to gaining a comprehensive understanding of glycosylation-related events in cancer. In this study, a multifaceted genomics approach was applied to analyze the potential impact of differential expression of glycosyltransferases involved in various glycosylation pathways. An enzyme list was first compiled and curated from numerous resources to create a consensus list of glycosyltransferases. These enzymes were then analyzed for differential expression in cancer and examined for enrichment among cancer samples. Finally, these results were integrated with experimental evidence from other types of analyses, including similarity of expression patterns across orthologous genes in mice, expression of miRNAs expected to target these genes, scRNA expression of the same genes, and automatically mined literature relationships for these genes in human disease. Top genes identified by cross-referencing analyses were examined with respect to functional impact, and high-value glycan residues were identified. Relevant findings have been made publicly available through OncoMX at data.oncomx.org, developed in part within the scope of this project. Scripts (available on GitHub) and the overarching pipeline defined herein can be used as a framework for similarly analyzing other groups of enzymes for impact across diverse evidence types in cancer. This work is expected to improve the overall understanding of the role of glycosylation in cancer by transparently defining the space of glycosyltransferase enzymes, and by harmonizing variable experimental data to enable improved generation of data-driven cancer biomarker hypotheses.

    Loss and gain of N-linked glycosylation sequons due to single-nucleotide variation in cancer.

    Despite the availability of sequence site-specific information resulting from years of sequencing and sequence feature curation, there have been few efforts to integrate and annotate this information. In this study, we update the number of human N-linked glycosylation sequons (NLGs), and we investigate the cancer-relatedness of glycosylation-impacting somatic nonsynonymous single-nucleotide variation (nsSNV) by mapping human NLGs to cancer variation data and reporting the expected loss or gain of glycosylation sequons. We find that 75.8% of all human proteins have at least one NLG, for a total of 59,341 unique NLGs (both predicted and experimentally validated). Only 27.4% of all NLGs are experimentally validated sites, on 4,412 glycoproteins. With respect to cancer, 8,895 somatic-only nsSNVs abolish NLGs in 5,204 proteins and 12,939 somatic-only nsSNVs create NLGs in 7,356 proteins in cancer samples. nsSNVs causing loss of 24 NLGs on 23 glycoproteins and nsSNVs creating 41 NLGs on 40 glycoproteins are identified in three or more cancers. Of all identified cancer somatic variants causing potential loss or gain of glycosylation, only 36 have previously known disease associations. Although this work is computational, it builds on existing genomics and glycobiology research to promote the identification and ranking of potential cancer nsSNV biomarkers for experimental validation.
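    The loss-or-gain mapping described in this abstract can be sketched as a before/after comparison of sequon positions around a single amino acid substitution. The sketch assumes the canonical N-X-S/T motif (X being any residue except proline); the abstract does not state the exact motif definition or pipeline the authors used, and the function names and example peptides are hypothetical.

```python
import re

# Canonical N-linked sequon: Asn, any residue except Pro, then Ser or Thr.
# The lookahead keeps matches zero-width so overlapping sequons are all found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(seq):
    """Return the 1-based positions of the Asn in every N-X-S/T sequon."""
    return {m.start() + 1 for m in SEQUON.finditer(seq)}


def sequon_changes(seq, pos, alt):
    """Apply one substitution at 1-based `pos` and report (lost, gained) sequons."""
    if not 1 <= pos <= len(seq):
        raise ValueError("position outside the sequence")
    mutated = seq[:pos - 1] + alt + seq[pos:]
    before = sequon_positions(seq)
    after = sequon_positions(mutated)
    return sorted(before - after), sorted(after - before)


# Hypothetical peptides, for illustration only.
lost, gained = sequon_changes("AANGTAA", 5, "A")   # T5A removes the Ser/Thr
print("lost:", lost, "gained:", gained)
lost, gained = sequon_changes("ANPSA", 3, "G")     # P3G removes the blocking Pro
print("lost:", lost, "gained:", gained)
```

    Comparing the full position sets, rather than inspecting only the mutated residue, catches substitutions at any of the three motif positions, including removal or introduction of a blocking proline.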