
    Pan-Cancer Identification and Prioritization of Cancer-Associated Alternatively Spliced and Differentially Expressed Genes: A Biomarker Discovery Application

    Tumour cells arise through aberrant expression of genes and the proteins they encode. This may result from a direct change to the DNA sequence or from perturbations in the machinery responsible for the production or activity of proteins, such as gene splicing. With the advent of massively parallel RNA sequencing (RNA-seq), large-scale exploration of changes at the transcriptional and post-transcriptional splicing stages has the potential to reveal the landscape of gene expression changes across human cancers. Aberrantly expressed genes in cancer can serve as molecular biomarkers that discriminate tumour from normal cells and, if their products are localized to the cell surface, can be used as targets for antibody-based cancer therapy. In the current study, I devised an analysis pipeline to identify and rank such events in human cancer RNA-seq datasets. Using this pipeline, I conducted a pan-cancer analysis of RNA-seq data from more than 7,000 patients across 24 cancer types, generated by The Cancer Genome Atlas (TCGA). I identified abnormally expressed and alternatively spliced genes that appear to be cancer-associated when compared with a large compendium of transcriptomes from non-diseased tissues gathered from the Genotype-Tissue Expression (GTEx) project and TCGA. The analysis revealed 1,503 putative tumour-associated abnormally expressed genes and 1,142 novel cancer-associated splice variants occurring in 694 genes. To rank the identified candidate genes, I performed an extensive literature search and studied known therapeutic antibody targets to collect the characteristics of an ideal antibody target in cancer. I then developed an R package, Prize, based on the Analytic Hierarchy Process (AHP). AHP is a multiple-criteria decision-making method that allows a user to prioritize a list of elements based on a set of user-defined criteria and numerical scores expressing the importance of each criterion for achieving the goal.
    I built an AHP model that captures the properties of a cancer biomarker target and used it to rank and prioritize the genes. Using this model, Prize placed known tumour biomarker targets among the top 25 ranked genes, along with other novel candidates.
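
The AHP weighting step described above can be illustrated with a short sketch. This is not the Prize package's API or its exact computation; it is a minimal, hypothetical Python example assuming three made-up criteria (`surface_localisation`, `tumour_specificity`, `expression_level`), a made-up pairwise comparison matrix on Saaty's 1 to 9 scale, and two hypothetical genes rated from 0 to 1 on each criterion. Criterion weights come from the principal eigenvector of the pairwise matrix (approximated here by power iteration), the consistency ratio checks how coherent the pairwise judgements are, and each candidate's score is the weighted sum of its ratings, which resembles the ratings (absolute measurement) variant of AHP rather than a full pairwise comparison of the alternatives.

```python
from typing import Dict, List, Tuple

# Saaty's random consistency index (RI) for matrices of size 1..10.
RANDOM_INDEX = [0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45, 1.49]


def ahp_weights(matrix: List[List[float]], iterations: int = 100) -> List[float]:
    """Approximate the principal eigenvector of a positive reciprocal matrix
    by power iteration, normalised so the weights sum to 1."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        weights = [value / total for value in product]
    return weights


def consistency_ratio(matrix: List[List[float]], weights: List[float]) -> float:
    """Saaty's consistency ratio CR = CI / RI, with CI = (lambda_max - n) / (n - 1)."""
    n = len(matrix)
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(product[i] / weights[i] for i in range(n)) / n
    return ((lambda_max - n) / (n - 1)) / RANDOM_INDEX[n - 1]


def rank_candidates(ratings: Dict[str, List[float]],
                    weights: List[float]) -> List[Tuple[str, float]]:
    """Score each candidate as the weighted sum of its per-criterion ratings, best first."""
    scores = {name: sum(w * r for w, r in zip(weights, values))
              for name, values in ratings.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical criteria, in the order used by the matrix rows and columns.
criteria = ["surface_localisation", "tumour_specificity", "expression_level"]

# Hypothetical judgements: entry [i][j] states how much more important criterion i
# is than criterion j; the matrix is reciprocal ([j][i] == 1 / [i][j]).
pairwise = [
    [1.0, 2.0, 4.0],
    [1 / 2, 1.0, 2.0],
    [1 / 4, 1 / 2, 1.0],
]

# Hypothetical genes rated 0..1 on each criterion (same order as `criteria`).
candidates = {
    "GENE_A": [0.9, 0.2, 0.5],
    "GENE_B": [0.3, 0.9, 0.9],
}

weights = ahp_weights(pairwise)
print(dict(zip(criteria, (round(w, 3) for w in weights))))
print("CR:", round(consistency_ratio(pairwise, weights), 3))
for gene, score in rank_candidates(candidates, weights):
    print(gene, round(score, 3))
```

With a perfectly consistent matrix such as this one, the weights are 4/7, 2/7 and 1/7 and the consistency ratio is zero; in practice a ratio below about 0.1 is conventionally considered acceptable.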

    Bioinformatics Analysis Identify Novel OB Fold Protein Coding Genes in C. elegans

    Background
    The C. elegans genome has been extensively annotated by the WormBase consortium using state-of-the-art bioinformatics pipelines, functional genomics and manual curation. As a result, identifying novel genes in silico in this model organism is becoming more challenging and requires new approaches. The oligonucleotide/oligosaccharide-binding (OB) fold is a highly divergent protein family in which sequences that share the same fold have very little sequence identity (5–25%). Sequence-based annotation alone may therefore be insufficient to identify all members of this family. In C. elegans, the number of reported OB-fold proteins is remarkably low (n = 46) compared with other evolutionarily related eukaryotes, such as the yeast S. cerevisiae (n = 344) or the fruit fly D. melanogaster (n = 84). Gene loss during evolution, or differences in the level of annotation of this protein family, may explain these discrepancies.
    Methodology/Principal Findings
    This study examines whether novel OB-fold coding genes exist in the worm. We developed a bioinformatics approach that applies the most sensitive sequence-sequence, sequence-profile and profile-profile similarity search methods, followed by 3D-structure prediction as a filtering step to eliminate false-positive candidate sequences. We predicted 18 OB-fold-containing coding genes that, remarkably, have been only partially characterized in C. elegans.
    Conclusions/Significance
    This study raises the possibility that the annotation of highly divergent protein fold families can be improved in C. elegans. The WormBase consortium could implement similar strategies for large-scale analysis when new versions of the genome sequence of C. elegans, or of other evolutionarily related species, are released. The approach is of general interest to the scientific community because it can be used to annotate any genome.
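
To make the 5–25% figure concrete, here is a small, hypothetical Python helper that computes percent identity from a pairwise alignment. It is not part of the authors' pipeline; the toy aligned sequences are invented, and it assumes one common convention (identical residues divided by aligned columns that contain no gap). Other tools divide by the full alignment length or by the length of the shorter sequence, so reported values can differ.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues among aligned columns where neither
    sequence has a gap (one of several conventions in common use)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences carry a residue.
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != gap and y != gap
    ]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


# Toy, invented alignment: 4 identical residues over 7 gap-free columns.
print(round(percent_identity("MKV-LIAG", "MRVELLSG"), 1))
```

Sequences that share a fold but sit in the low range quoted above overlap with the identity that unrelated sequences can reach by chance, which is why sequence identity alone is a weak signal for this family.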

    Tools used in this study.


    Functional analysis of novel OB-fold protein-coding genes.

    * Refers to predicted functions. Homologues and paralogues refer to human genes.

    Superimposition of the novel OB-fold 3D models on their templates.

    Light blue: predicted 3D models; wheat: PDB templates. Labels of the form (.XXXX.)-nxxx give the protein name followed by the PDB code of the template.

    Discovery pipeline for novel OB-fold protein-coding genes.

    The pipeline contains three discovery modules (DIMs): SeqDIM, the sequence alignment discovery module; StrucDIM, the 3D structure prediction discovery module; and FuncDIM, the functional prediction discovery module.
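
The three-module design can be pictured as a chain of filters. The Python sketch below is hypothetical throughout: the `Candidate` fields, the method names used as dictionary keys, the E-value and structure-score thresholds, and the example genes are assumptions for illustration, not values or code from the study. It keeps a sequence if any of the sequence-sequence, sequence-profile or profile-profile searches reports a hit at or below the E-value cutoff (SeqDIM), discards candidates whose predicted 3D model does not support an OB fold (StrucDIM), and collects predicted functions for the survivors (FuncDIM).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Candidate:
    gene_id: str
    # Hypothetical: best E-value reported by each search method for this sequence.
    evalues: Dict[str, float]
    # Hypothetical: confidence (0..1) that the predicted 3D model adopts an OB fold.
    structure_score: float
    predicted_functions: List[str] = field(default_factory=list)


def seq_dim(candidates: List[Candidate], max_evalue: float = 1e-3) -> List[Candidate]:
    """Keep candidates hit by at least one search method at or below the E-value cutoff."""
    return [c for c in candidates if c.evalues and min(c.evalues.values()) <= max_evalue]


def struc_dim(candidates: List[Candidate], min_score: float = 0.5) -> List[Candidate]:
    """Drop likely false positives whose predicted structure does not support an OB fold."""
    return [c for c in candidates if c.structure_score >= min_score]


def func_dim(candidates: List[Candidate]) -> Dict[str, List[str]]:
    """Report predicted functions for the surviving candidates."""
    return {c.gene_id: c.predicted_functions or ["unknown"] for c in candidates}


def run_pipeline(candidates: List[Candidate]) -> Dict[str, List[str]]:
    """Apply the three hypothetical modules in order."""
    return func_dim(struc_dim(seq_dim(candidates)))


# Invented inputs: gene_b fails the sequence step, gene_c fails the structure step.
candidates = [
    Candidate("gene_a", {"seq_seq": 0.1, "profile_profile": 1e-5}, 0.8, ["nucleic acid binding"]),
    Candidate("gene_b", {"seq_seq": 1.0}, 0.9),
    Candidate("gene_c", {"seq_profile": 1e-4}, 0.2),
]
print(run_pipeline(candidates))
```

Ordering the cheap sequence filter first and the costlier structure prediction second mirrors the described design, in which 3D-structure prediction serves to remove false positives among sequence-level hits.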