
    A framework for application of metabolic modeling in yeast to predict the effects of nsSNV in human orthologs

    Background: We have previously suggested a method for proteome-wide analysis of variation at functional residues, in which we identified the set of all human genes with nonsynonymous single nucleotide variation (nsSNV) in the active site residues of the corresponding proteins. Thirty-four of these proteins were shown to have a 1:1:1 enzyme:pathway:reaction relationship, making them ideal candidates for laboratory validation through the creation and observation of specific yeast active site knock-outs and downstream targeted metabolomics experiments. Here we present the next step in the workflow toward using yeast metabolic modeling to predict human metabolic behavior resulting from nsSNV. Results: For the previously identified candidate proteins, we used the reciprocal best BLAST hits method, followed by manual alignment and pathway comparison, to identify 6 human proteins with yeast orthologs suitable for flux balance analysis (FBA). Five of these proteins are known to be associated with diseases, including ribose-5-phosphate isomerase deficiency, myopathy with lactic acidosis and sideroblastic anemia, anemia due to disorders of glutathione metabolism, and two porphyrias; based on the work described herein, we suspect that the sixth enzyme has disease associations that are not yet classified or understood. Conclusions: Preliminary findings using the Yeast 7.0 FBA model show lack of growth for only one enzyme, but augmenting the Yeast 7.0 biomass function to better simulate knockout of certain genes suggested physiological relevance of variations in three additional proteins. Thus, we suggest the following four proteins for laboratory validation: delta-aminolevulinic acid dehydratase, ferrochelatase, ribose-5-phosphate isomerase and mitochondrial tyrosyl-tRNA synthetase. This study indicates that the predictive ability of this method will improve as more advanced, comprehensive models are developed.
    Moreover, these findings will be useful in developing simple downstream biochemical or mass-spectrometric assays to corroborate these predictions and to detect the presence of certain known nsSNVs with deleterious outcomes. The results may also be useful in predicting as-yet-unknown outcomes of active site nsSNVs for enzymes that are not yet well classified or annotated.
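
The reciprocal best BLAST hits step can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes each BLAST run has already been parsed into hypothetical (query, subject, bit score) tuples, and it uses made-up protein identifiers (H1, Y1, ...). A pair is kept only when each protein is the other's top-scoring hit.

```python
from typing import Dict, List, Tuple

# Hypothetical simplified hit record: (query id, subject id, bit score),
# e.g. parsed from BLAST tabular output; this field choice is an assumption.
Hit = Tuple[str, str, float]


def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Map each query to its highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: List[Hit], b_vs_a: List[Hit]) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where each protein is the other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy scores for hypothetical human (H*) and yeast (Y*) proteins.
human_vs_yeast = [("H1", "Y1", 250.0), ("H1", "Y2", 90.0),
                  ("H2", "Y2", 120.0), ("H2", "Y3", 130.0)]
yeast_vs_human = [("Y1", "H1", 245.0), ("Y2", "H2", 118.0), ("Y3", "H1", 140.0)]
print(reciprocal_best_hits(human_vs_yeast, yeast_vs_human))  # [('H1', 'Y1')]
```

In this toy input, H2's best hit is Y3, but Y3's best hit is H1, so H2 gets no ortholog call. Pairs that survive this filter would still need the manual alignment and pathway comparison described above.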

    BioMuta and BioXpress: mutation and expression knowledgebases for cancer biomarker discovery

    Single-nucleotide variation and gene expression data from disease samples are important resources for biomarker discovery. Many databases have been built to host these data and make them available to the community, but such databases are frequently limited in scope and/or content. BioMuta, a database of cancer-associated single-nucleotide variations, and BioXpress, a database of cancer-associated differentially expressed genes and microRNAs, differ from other disease-associated variation and expression databases primarily in that they aggregate data across many studies into a single source with a unified representation and annotation of functional attributes. Early versions of these resources were initiated with pilot funding for specific research applications, but newly awarded funds have enabled hardening of these databases to production-level quality and will allow sustained development of these resources for the next few years. Because both resources were developed using a similar methodology of integration, curation, unification, and annotation, we present BioMuta and BioXpress as allied databases that will facilitate a more comprehensive view of gene associations in cancer. BioMuta and BioXpress are hosted on the High-performance Integrated Virtual Environment (HIVE) server at the George Washington University at https://hive.biochemistry.gwu.edu/biomuta and https://hive.biochemistry.gwu.edu/bioxpress, respectively.
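
The idea of aggregating many studies into one unified representation can be sketched as normalizing each study's records to a shared key and then merging them. This is a hypothetical illustration only. The field names, the (gene, position, reference, alternate) key, and the toy records are assumptions, not the BioMuta schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical unified key for a protein-level variant:
# (gene symbol, residue position, reference amino acid, alternate amino acid).
Key = Tuple[str, int, str, str]


def normalize(record: Dict[str, str]) -> Key:
    """Coerce one source record into the shared representation.

    The field names "gene", "pos", "ref" and "alt" are hypothetical.
    """
    return (
        record["gene"].strip().upper(),
        int(record["pos"]),
        record["ref"].strip().upper(),
        record["alt"].strip().upper(),
    )


def aggregate(studies: Dict[str, Iterable[Dict[str, str]]]) -> Dict[Key, Set[str]]:
    """Merge records from many studies, tracking which studies report each variant."""
    merged: Dict[Key, Set[str]] = defaultdict(set)
    for study, records in studies.items():
        for record in records:
            merged[normalize(record)].add(study)
    return dict(merged)


# Toy records: the same variant is written differently by two studies.
studies = {
    "study_1": [{"gene": "gene_a", "pos": "175", "ref": "r", "alt": "h"}],
    "study_2": [{"gene": "GENE_A ", "pos": "175", "ref": "R", "alt": "H"},
                {"gene": "GENE_B", "pos": "12", "ref": "G", "alt": "D"}],
}
print(aggregate(studies)[("GENE_A", 175, "R", "H")] == {"study_1", "study_2"})  # True
```

Keeping the set of supporting studies per key preserves provenance after the merge. A production resource would also need curation and functional annotation beyond this normalization.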

    Human germline and pan-cancer variomes and their distinct functional profiles

    Identification of non-synonymous single nucleotide variations (nsSNVs) has increased exponentially due to advances in next-generation sequencing technologies. The functional impacts of these variations have been difficult to ascertain because the corresponding knowledge about sequence functional sites is highly fragmented. Mapping variations to sequence functional features can help us better understand their pathophysiological roles. In this study, we investigated the effect of nsSNVs on more than 17 common types of post-translational modification (PTM) sites, active sites and binding sites. Out of 1 705 285 distinct nsSNVs on 259 216 functional sites, we identified 38 549 variations that significantly affect 10 major functional sites. Furthermore, we found distinct patterns of site disruption due to germline and somatic nsSNVs. Pan-cancer analysis across 12 different cancer types identified 51 genes with 106 nsSNV-affected functional sites found in 3 or more cancer types. Of these 51 genes, 13 overlap with previously identified Significantly Mutated Genes (Nature. 2013 Oct 17;502(7471)). Sixty-two mutations in these 13 genes, affecting functional sites such as DNA-binding, ATP-binding and various PTM sites, occur across several cancers and can be prioritized for additional validation and investigation.
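
The two core steps can be sketched in a few lines. The first step maps nsSNVs onto annotated functional sites. The second finds genes whose site-affecting variants recur in at least 3 cancer types. This is a minimal sketch under assumptions. The record layouts, protein IDs (P1, P2), and cancer labels are hypothetical, and the statistical significance testing used in the study is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical record layouts (not the study's actual schema):
#   sites:    (protein id, residue position) -> functional site type
#   variants: (protein id, residue position, ref aa, alt aa, cancer type)
Sites = Dict[Tuple[str, int], str]
Variant = Tuple[str, int, str, str, str]


def site_affecting(variants: List[Variant], sites: Sites) -> List[Tuple[Variant, str]]:
    """Keep amino-acid-changing variants whose residue is an annotated functional site."""
    return [
        (v, sites[(v[0], v[1])])
        for v in variants
        if v[2] != v[3] and (v[0], v[1]) in sites
    ]


def recurrent_proteins(hits: List[Tuple[Variant, str]],
                       min_cancer_types: int = 3) -> Dict[str, Set[str]]:
    """Proteins whose site-affecting variants occur in at least `min_cancer_types` cancer types."""
    cancers: Dict[str, Set[str]] = defaultdict(set)
    for (protein, _pos, _ref, _alt, cancer), _site_type in hits:
        cancers[protein].add(cancer)
    return {p: c for p, c in cancers.items() if len(c) >= min_cancer_types}


sites = {("P1", 10): "phosphorylation", ("P1", 55): "ATP binding", ("P2", 7): "active site"}
variants = [
    ("P1", 10, "S", "A", "cancer_a"),
    ("P1", 55, "G", "D", "cancer_b"),
    ("P1", 10, "S", "F", "cancer_c"),
    ("P2", 7, "H", "H", "cancer_a"),   # same residue: not nonsynonymous
    ("P2", 8, "K", "E", "cancer_a"),   # residue 8 is not an annotated site
]
hits = site_affecting(variants, sites)
print(len(hits), sorted(recurrent_proteins(hits)))  # 3 ['P1']
```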

    BioXpress: an integrated RNA-seq-derived gene expression database for pan-cancer analysis

    BioXpress is a gene expression and cancer association database in which expression levels are mapped to genes using RNA-seq data obtained from The Cancer Genome Atlas, the International Cancer Genome Consortium, Expression Atlas and publications. The BioXpress database includes expression data from 64 cancer types, 6361 patients and 17 469 genes, 9513 of which display differential expression between tumor and normal samples. In addition to data retrieved directly from RNA-seq data repositories, manual biocuration of publications supplements the cancer association annotations in the database. All cancer types are mapped to Disease Ontology terms to facilitate uniform pan-cancer analysis. The BioXpress database can be searched by HUGO Gene Nomenclature Committee gene symbol or UniProtKB/RefSeq accession, or alternatively queried by cancer type with specified significance filters. This interface, along with pre-computed downloadable files containing differentially expressed genes in multiple cancers, enables straightforward retrieval and display of a broad set of cancer-related genes.
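
A query by cancer type with significance filters can be sketched over a downloaded table of differential-expression records. This is hypothetical throughout. The `DERecord` fields, the thresholds (adjusted p ≤ 0.05, |log2 fold change| ≥ 1), the pseudocount, and the toy values are illustrative assumptions, not the BioXpress file format or its statistical method.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class DERecord:
    gene: str          # HGNC gene symbol (toy placeholders below)
    cancer_type: str   # cancer type label, e.g. a Disease Ontology term
    log2fc: float      # log2(tumor / normal) expression
    adj_p: float       # multiple-testing-adjusted p-value


def log2_fold_change(tumor_mean: float, normal_mean: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of tumor to normal expression; the pseudocount avoids division by zero."""
    return math.log2((tumor_mean + pseudocount) / (normal_mean + pseudocount))


def query_by_cancer(records: List[DERecord], cancer_type: str,
                    max_adj_p: float = 0.05, min_abs_log2fc: float = 1.0) -> List[DERecord]:
    """Keep records for one cancer type that pass both significance filters."""
    return [
        r for r in records
        if r.cancer_type == cancer_type
        and r.adj_p <= max_adj_p
        and abs(r.log2fc) >= min_abs_log2fc
    ]


records = [
    DERecord("GENE_A", "cancer_x", log2_fold_change(15.0, 3.0), 0.001),  # log2(16/4) = 2.0
    DERecord("GENE_B", "cancer_x", 0.5, 0.001),   # fold change too small
    DERecord("GENE_C", "cancer_x", -1.5, 0.2),    # not significant
    DERecord("GENE_D", "cancer_y", 3.0, 0.01),    # different cancer type
]
print([r.gene for r in query_by_cancer(records, "cancer_x")])  # ['GENE_A']
```

Filtering on both adjusted p-value and absolute fold change keeps genes that are both statistically and practically different. It also retains down-regulated genes, since the filter uses the absolute value of the fold change.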

    Proteome-wide Analysis of Non-synonymous Single-nucleotide Variations in Active Sites of Human Proteins

    An enzyme's active site is essential to normal protein activity, such that any disruption at this site may lead to dysfunction and disease. Non-synonymous single nucleotide variations (nsSNVs), which alter the amino acid sequence through a single-nucleotide substitution in a codon, are one type of genomic change that can alter the active site. When this occurs, enzyme activity is assumed to change because of the site's importance to normal protein function. We integrate nsSNV data and active site annotations from curated resources to identify all active-site-impacting nsSNVs in the human genome, and we search for all pathways associated with this dataset to assess the likely consequences. We find 934 unique nsSNVs that alter the active sites of 559 proteins. Comparing the nsSNV-impacted active site residues with all possible proteomic active site residues shows an over-representation of arginine and an under-representation of cysteine, phenylalanine and tyrosine, implying a potential bias for or against variation of these residues at the active site. Clustering analysis shows an abundance of hydrolases and transferases. Pathway and functional analysis shows several pathways over- or under-represented in the active site nsSNV dataset, with the most significantly affected pathways involved in carbohydrate metabolism. We provide a table of 32 variation-substrate/product pairs that can be used in targeted metabolomics experiments to assay for the presence and quantify the effects of specific variations. Additionally, we find a significant prevalence of aspartic acid to histidine variation in 8 proteins associated with 9 diseases, including glycogen storage diseases, lacrimo-auriculo-dento-digital syndrome, Parkinson's disease and several cancers.
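
The abstract does not state which statistical test underlies the residue over- and under-representation result. As one plausible sketch, a hypergeometric test compares the number of nsSNV-hit active-site residues of a given type against the proteome-wide active-site background. The counts in the usage line are toy values, not the study's data.

```python
from math import comb
from typing import Tuple


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) when drawing n of N residues without replacement, K of which are the residue type."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def residue_enrichment(k: int, N: int, K: int, n: int) -> Tuple[float, float, float]:
    """Return (observed/expected ratio, P(X >= k), P(X <= k)).

    N: all annotated active-site residues in the proteome (background)
    K: background residues of the type of interest (e.g. arginine); must be > 0
    n: active-site residues hit by nsSNVs
    k: hit residues of the type of interest
    """
    expected = n * K / N
    support_lo, support_hi = max(0, n - (N - K)), min(n, K)
    p_over = sum(hypergeom_pmf(i, N, K, n) for i in range(k, support_hi + 1))
    p_under = sum(hypergeom_pmf(i, N, K, n) for i in range(support_lo, k + 1))
    return k / expected, p_over, p_under


# Toy numbers only: 3 of 3 hit residues are of a type making up 4 of 10 background residues.
ratio, p_over, p_under = residue_enrichment(3, 10, 4, 3)
print(round(ratio, 3), round(p_over, 4))  # 2.5 0.0333
```

A small upper-tail probability suggests over-representation, and a small lower-tail probability suggests under-representation. When many residue types are tested, the p-values would also need a multiple-testing correction.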

    Comprehensive Analysis of Glycosyltransferases in Cancer Identified from a Multifaceted Genomics Approach

    Despite accumulating evidence supporting a role for glycosylation in cancer progression and prognosis, the complexity of the human glycome and of its study poses many challenges to gaining a comprehensive understanding of glycosylation-related events in cancer. In this study, a multifaceted genomics approach was applied to analyze the potential impact of differential expression of glycosyltransferases involved in various glycosylation pathways. A list of enzymes was first compiled and curated from numerous resources to create a consensus list of glycosyltransferases. These enzymes were then analyzed for differential expression in cancer and examined for enrichment among cancer samples. Finally, these results were integrated with experimental evidence from other types of analyses, including similarity of expression patterns across orthologous genes in mice, expression of miRNAs expected to target these genes, scRNA expression of the same genes, and automatically mined literature relationships for these genes in human disease. Top genes identified by cross-referencing analyses were examined with respect to functional impact, and high-value glycan residues were identified. Relevant findings have been made publicly available through OncoMX (data.oncomx.org), a resource developed in part within the scope of this project. Scripts (available on GitHub) and the overarching pipeline defined herein can be used as a framework for similarly analyzing other groups of enzymes for impact across diverse evidence types in cancer. This work is expected to improve the overall understanding of the role of glycosylation in cancer by transparently defining the space of glycosyltransferase enzymes and by harmonizing variable experimental data to enable improved generation of data-driven cancer biomarker hypotheses.
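
The two integration steps can be sketched with set operations. The first step builds a consensus enzyme list from several resources. The second ranks candidates by how many independent evidence types support them. This is a hypothetical sketch, not the project's published scripts. The resource names, the "at least 2 resources" cutoff, the evidence labels, and the gene IDs are all assumptions.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def consensus(sources: Dict[str, Set[str]], min_sources: int = 2) -> Set[str]:
    """Genes listed by at least `min_sources` of the input resources."""
    counts = Counter(gene for genes in sources.values() for gene in genes)
    return {gene for gene, n in counts.items() if n >= min_sources}


def rank_by_evidence(candidates: Set[str], evidence: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Score each candidate by how many evidence types support it, highest first."""
    scores = {gene: sum(gene in hits for hits in evidence.values()) for gene in candidates}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical resources and evidence sets for toy genes G1..G4.
sources = {"resource_1": {"G1", "G2", "G3"}, "resource_2": {"G2", "G3"}, "resource_3": {"G3", "G4"}}
evidence = {
    "differential_expression": {"G2", "G3"},
    "mouse_ortholog_expression": {"G3"},
    "mirna_targeting": {"G2", "G3"},
    "scrna_expression": {"G3"},
    "literature_mining": {"G3", "G4"},
}
print(rank_by_evidence(consensus(sources), evidence))  # [('G3', 5), ('G2', 2)]
```

Counting supporting evidence types is the simplest possible cross-referencing score. A real pipeline might weight evidence types differently or require specific combinations of them.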