22 research outputs found

    Characteristics and clustering of human ribosomal protein genes

    BACKGROUND: The ribosome is a central player in the translation system, which in mammals consists of four RNA species and 79 ribosomal proteins (RPs). The control mechanisms of gene expression and the functions of RPs are believed to be identical. Most RP genes have common promoters and were therefore assumed to have a unified gene expression control mechanism. RESULTS: We systematically analyzed the homogeneity and heterogeneity of RP genes on the basis of their expression profiles, promoter structures, encoded amino acid compositions, and codon compositions. The results revealed that (1) most RP genes are coordinately expressed at the mRNA level, with higher signals in the spleen, lymph node dissection (LND), and fetal brain. However, 17 genes, including the P protein genes (RPLP0, RPLP1, RPLP2), are expressed in a tissue-specific manner. (2) Most promoters have GC boxes and possible binding sites for nuclear respiratory factor 2, Yin and Yang 1, and/or activator protein 1. However, they do not have canonical TATA boxes. (3) Analysis of the amino acid composition of the encoded proteins indicated a high lysine and arginine content. (4) The major RP genes exhibit a characteristic synonymous codon composition with high rates of G or C in the third-codon position and a high content of AAG, CAG, ATC, GAG, CAC, and CTG. CONCLUSION: Eleven of the RP genes are still identified as being unique and did not exhibit at least some of the above characteristics, indicating that they may have unknown functions not present in other RP genes. Furthermore, we found sequences conserved between human and mouse genes around the transcription start sites and in the intronic regions. This study suggests certain overall trends and characteristic features of human RP genes
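The codon-level measures in result (4) can be illustrated with a minimal sketch. It assumes an in-frame coding sequence given as a plain string; the function names and the toy sequence are hypothetical and are not taken from the study.

```python
from collections import Counter


def codons(cds):
    """Split an in-frame coding sequence into complete codons.

    Trailing bases that do not form a full codon are dropped.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def gc3_fraction(cds):
    """Return the fraction of codons whose third base is G or C."""
    codon_list = codons(cds)
    if not codon_list:
        return 0.0
    return sum(codon[2] in "GC" for codon in codon_list) / len(codon_list)


def codon_counts(cds):
    """Count how often each codon occurs in the sequence."""
    return Counter(codons(cds))


# Made-up sequence for illustration only, not a real RP gene.
toy_cds = "ATGAAGCAGATCGAGCACCTGTAA"
print(gc3_fraction(toy_cds))
print(codon_counts(toy_cds).most_common(3))
```

Comparing `gc3_fraction` and the `codon_counts` tallies across genes is one simple way to see whether a gene follows or departs from the majority pattern described in the abstract.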

    Discovering cancer-associated transcripts by RNA sequencing

    High-throughput sequencing of poly-adenylated RNA (RNA-Seq) in human cancers shows remarkable potential to identify uncharacterized aspects of tumor biology, including gene fusions with therapeutic significance and disease markers such as long non-coding RNA (lncRNA) species. However, the analysis of RNA-Seq data places unprecedented demands upon computational infrastructures and algorithms, requiring novel bioinformatics approaches. To meet these demands, we present two new open-source software packages - ChimeraScan and AssemblyLine - designed to detect gene fusion events and novel lncRNAs, respectively. RNA-Seq studies utilizing ChimeraScan led to discoveries of new families of recurrent gene fusions in breast cancers and solitary fibrous tumors. Further, ChimeraScan was one of the key components of the repertoire of computational tools utilized in data analysis for MI-ONCOSEQ, a clinical sequencing initiative to identify potentially informative and actionable mutations in cancer patients’ tumors. AssemblyLine, by contrast, reassembles RNA sequencing data into full-length transcripts ab initio. In head-to-head analyses AssemblyLine compared favorably to existing ab initio approaches and unveiled abundant novel lncRNAs, including antisense and intronic lncRNAs disregarded by previous studies. Moreover, we used AssemblyLine to define the prostate cancer transcriptome from a large patient cohort and discovered myriad lncRNAs, including 121 prostate cancer-associated transcripts (PCATs) that could potentially serve as novel disease markers. Functional studies of two PCATs - PCAT-1 and SChLAP1 - revealed cancer-promoting roles for these lncRNAs. PCAT1, a lncRNA expressed from chromosome 8q24, promotes cell proliferation and represses the tumor suppressor BRCA2. SChLAP1, located in a chromosome 2q31 ‘gene desert’, independently predicts poor patient outcomes, including metastasis and cancer-specific mortality. 
Mechanistically, SChLAP1 antagonizes the genome-wide localization and regulatory functions of the SWI/SNF chromatin-modifying complex. Collectively, this work demonstrates the utility of ChimeraScan and AssemblyLine as open-source bioinformatics tools. Our applications of ChimeraScan and AssemblyLine led to the discovery of new classes of recurrent and clinically informative gene fusions, and established a prominent role for lncRNAs in coordinating aggressive prostate cancer, respectively. We expect that the methods and findings described herein will establish a precedent for RNA-Seq-based studies in cancer biology and assist the research community at large in making similar discoveries.
    PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/120814/1/mkiyer_1.pd

    Multi-omics characterization of pancreatic neuroendocrine neoplasms

    Pancreatic neuroendocrine neoplasms (PNENs) are biologically and clinically heterogeneous neoplasms in which pathogenic alterations are often indiscernible. Treatments for PNENs are insufficient in part due to lack of alternatives once current options are exhausted. Despite previous efforts to characterize PNENs at the molecular level, there remains a lack of molecular subgroups and molecular features with clinical utility for PNENs. In this work, I describe the identification and characterization of four molecularly distinct subgroups from primary PNEN specimens using whole-exome sequencing, RNA-sequencing and global proteome profiling. A Proliferative subgroup with molecular features of proliferating cells was associated with an inferior overall survival probability. A PDX1-high subgroup consisted of PNENs demonstrating genetic and transcriptomic indications of NRAS or HRAS activation. An Alpha cell-like subgroup, enriched in PNENs with deleterious MEN1 and DAXX mutations, bore transcriptomic similarity to pancreatic α-cells and harbored proteomic cues of dysregulated metabolism involving glutamine and arginine. Lastly, a Stromal/Mesenchymal subgroup exhibited increased expression and activation of the Hippo signaling pathway effectors YAP1 and WWTR1 that are of emerging interest as potentially actionable targets in other cancer types. Whole-genome and whole-transcriptome analysis of PNEN metastases identified novel molecular events likely contributing to pathogenesis, including one case presumably driven by MYCN amplification. In agreement with the findings in primary PNENs, four of the metastatic PNENs displayed a substantial Alpha cell-like subgroup signature and all harboured concurrent mutations in MEN1 and DAXX. Collectively, the identified subgroups present a potential stratification scheme that facilitates the identification of therapeutic vulnerabilities amidst PNEN heterogeneity to improve the effective management of PNENs

    Integrative computational approaches to study protein-nucleic acid interactions

    Interactions between proteins and nucleic acid molecules are central to cellular regulation and homeostasis. To study them, I employ a wide range of computational analysis methods to integrate genomic data from many types of experiment. This thesis has three parts. In the first part, I explore the patterns of indels created by CRISPR-Cas9 genome editing. By thorough characterisation of the precision of editing at thousands of genomic target sites, we identify simple sequence rules that can help predict these outcomes. Furthermore, we examine the role of the structural chromatin context in fine-tuning Cas9-DNA interactions. In the second part, I explore methods to study protein-RNA interactions. I use comparative computational analyses to assess both the data quality of, and data analysis methods for, different crosslinking and immunoprecipitation (CLIP) technologies. I then develop new methods to analyse data generated by hybrid individual-nucleotide resolution CLIP (hiCLIP). By tailoring computational solutions to an understanding of experimental conditions, I improve the overall sensitivity of hiCLIP, and ultimately feed back to drive ongoing experimental development. In the third part, I focus on the Staufen family of double-stranded RNA binding proteins and use hiCLIP data to define transcriptome-wide atlases of RNA duplexes bound by these proteins, both in a cell line and in rat brain tissue. Through integration with other data sets, both publicly available and newly generated, I derive insights into their function in RNA metabolism and into how these interactions change during the course of mammalian brain development, with putative roles in ribonucleoprotein complex formation. In summary, I present a range of tailored computational methods and analyses developed to understand interactions between proteins and nucleic acids, aiming to link these interactions to functional outcomes.

    Plant-parasitic nematodes: from genomics to functional analysis of parasitism genes

    Nematodes (roundworms) belong to the largest phylum on earth. The numerous species inhabit practically all ecological niches, including plants. Plant-parasitic species live on plant roots, causing substantial damage to the plant and hampering its development. As such, they cause very large economic losses in crop production. We used a molecular approach to analyze the plant-parasitic nematode Radopholus similis by generating expressed sequence tags (ESTs). The most striking discovery was tags corresponding to a Wolbachia-like endosymbiont, which was subsequently located in the ovaries of R. similis. Numerous tags corresponding to parasitism genes with potential roles in, amongst other things, host localisation, detoxification, cell wall modification, and even putative host transcriptional reprogramming were identified. In addition, a tool to explore all available nematode EST data is presented in this study. The ‘nematode EST exploration tool’ (NEXT) (http://zion.ugent.be/joachim/next) extends the usefulness of these data by extracting and storing temporal and spatial information of all publicly available nematode EST libraries. Some members of the transthyretin-like gene family of R. similis were characterized. All stages except developing embryos express the analyzed genes, and expression is localized to the ventral nerve cord and tissues surrounding the vulva. The predicted secondary structure is suggestive of a binding capacity for an as yet unknown ligand. Further, the annotation of the complete mitochondrial (mt) genome of R. similis is reported. The mt genome has the expected gene content, but shows many aberrant features such as a considerably smaller 16S rRNA with reduced structures, two large repeat regions, the lack of stop codons in many genes, and a unique codon reassignment of UAA from Stop to Tyrosine. The aberrant features in the mt genome could be related to this codon reassignment, but results are ambiguous and require further research.
The last part of the study reports on the response of the plant to nematode infection. Signaling of two plant hormones involved in plant defense is measured during early phases of parasitism. In addition, the role of flavonoid compounds produced by the plant is analyzed by infection tests on several mutants.

    From condition-specific interactions towards the differential complexome of proteins

    While capturing the transcriptomic state of a cell is a comparatively simple effort with modern sequencing techniques, mapping protein interactomes and complexomes in a sample-specific manner is currently not feasible on a large scale. To understand crucial biological processes, however, knowledge of the physical interplay between proteins can be more interesting than their mere expression. In this thesis, we present and demonstrate four software tools that unlock the cellular wiring in a condition-specific manner and promise a deeper understanding of what happens upon cell fate transitions. PPIXpress makes it possible to exploit the abundance of existing expression data to generate specific interactomes, which can even consider alternative splicing events when protein isoforms can be related to the presence of causative protein domain interactions of an underlying model. As an addition to this work, we developed the convenient differential analysis tool PPICompare to determine rewiring events and their causes within the inferred interaction networks between grouped samples. Furthermore, we present a new implementation of the combinatorial protein complex prediction algorithm DACO that features a significantly reduced runtime. This improvement facilitates an application of the method to a large number of samples, and the resulting sample-specific complexes can ultimately be assessed quantitatively with our novel differential protein complex analysis tool CompleXChange.
    The transcriptome of a cell is comparatively easy to capture with modern sequencing techniques. Determining protein interactions and complexes, by contrast, is currently not possible on a large scale. To understand important biological processes, however, the interplay of proteins can be considerably more interesting than their expression alone. In this work, we present four software tools that make it possible to examine such interactions in a state-specific manner and thus promise a deeper understanding of what happens in the cell when changes occur. PPIXpress makes it possible to use existing expression data to determine the active interactions in a biological context. When protein variants can be linked to interactions of protein domains, even alternative splicing can be taken into account. As a complement, we developed the convenient differential analysis tool PPICompare, which can determine changes in the interactome and their causes between grouped samples. In addition, we present a new implementation of the protein complex prediction algorithm DACO that has a markedly reduced runtime. This improvement enables the method to be applied to a large number of samples. The sample-specific complexes determined in this way can finally be assessed quantitatively with our novel differential analysis tool CompleXChange.
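The general idea of deriving a condition-specific interactome from expression data, and then comparing two conditions, can be sketched as follows. This is a deliberately simplified illustration and not the algorithm of PPIXpress or PPICompare: it works at the protein level only, ignores isoforms and domain interactions, and uses a plain set difference instead of a statistical comparison between sample groups. All names, the threshold, and the toy data are assumptions.

```python
def condition_specific_network(reference_edges, expression, threshold=1.0):
    """Keep only reference interactions whose two partners are both expressed.

    reference_edges: iterable of (protein_a, protein_b) pairs
    expression: dict mapping protein name to an abundance value
    threshold: hypothetical cutoff at or above which a protein counts as expressed
    """
    expressed = {p for p, level in expression.items() if level >= threshold}
    return {frozenset(edge) for edge in reference_edges if set(edge) <= expressed}


def rewired_edges(network_a, network_b):
    """Report interactions present in only one of the two networks."""
    return {"lost": network_a - network_b, "gained": network_b - network_a}


# Toy reference network and expression values, for illustration only.
reference = [("A", "B"), ("B", "C"), ("C", "D")]
sample_1 = {"A": 5.0, "B": 3.0, "C": 0.2, "D": 4.0}
sample_2 = {"A": 0.1, "B": 3.0, "C": 2.0, "D": 4.0}

net_1 = condition_specific_network(reference, sample_1)
net_2 = condition_specific_network(reference, sample_2)
changes = rewired_edges(net_1, net_2)
print(sorted(sorted(edge) for edge in changes["lost"]))
print(sorted(sorted(edge) for edge in changes["gained"]))
```

Edges are stored as `frozenset` objects so that an interaction is treated as undirected and can be compared across networks with ordinary set operations.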