1,065 research outputs found

    Cyclin D1-mediated microRNA expression signature predicts breast cancer outcome

    Background: Genetic classification of breast cancer based on the coding mRNA suggests the evolution of distinct subtypes. Whether the non-coding genome is altered concordantly with the coding genome and the mechanism by which the cell cycle directly controls the non-coding genome is poorly understood. Methods: Herein, the miRNA signature maintained by endogenous cyclin D1 in human breast cancer cells was defined. In order to determine the clinical significance of the cyclin D1-mediated miRNA signature, we defined a miRNA expression superset from 459 breast cancer samples. We compared the coding and non-coding genome of breast cancer subtypes. Results: Hierarchical clustering of human breast cancers defined four distinct miRNA clusters (G1-G4) associated with distinguishable relapse-free survival by Kaplan-Meier analysis. The cyclin D1-regulated miRNA signature included several oncomirs, was conserved in multiple breast cancer cell lines, was associated with the G2 tumor miRNA cluster, ERα+ status, better outcome and activation of the Wnt pathway. The coding and non-coding genome were discordant within breast cancer subtypes. Seed elements for cyclin D1-regulated miRNA were identified in 63 genes of the Wnt signaling pathway including DKK. Cyclin D1 restrained DKK1 via the 3′UTR. In vivo studies using inducible transgenics confirmed cyclin D1 induces Wnt-dependent gene expression. Conclusion: The non-coding genome defines breast cancer subtypes that are discordant with their coding genome subtype suggesting distinct evolutionary drivers within the tumors. Cyclin D1 orchestrates expression of a miRNA signature that induces Wnt/β-catenin signaling, therefore cyclin D1 serves both upstream and downstream of Wnt/β-catenin signaling

    The non-coding genome in Autism Spectrum Disorders

    Autism Spectrum Disorders (ASD) are a group of neurodevelopmental disorders (NDDs) characterized by difficulties in social interaction and communication, repetitive behavior, and restricted interests. While ASD have been proven to have a strong genetic component, current research largely focuses on coding regions of the genome. However, non-coding DNA, which makes up ∼99% of the human genome, has recently been recognized as an important contributor to the high heritability of ASD, and novel sequencing technologies have been a milestone in opening up new directions for the study of the gene regulatory networks embedded within the non-coding regions. Here, we summarize current progress on the contribution of non-coding alterations to the pathogenesis of ASD and provide an overview of existing methods allowing for the study of their functional relevance, discussing potential ways of unraveling ASD's “missing heritability”

    Learning the Non-Coding Genome

    The interpretation of the non-coding genome still constitutes a major challenge in the application of whole-genome sequencing. For example, disease and trait-associated variants represent a tiny minority of all known genetic variations, but millions of putatively neutral sites can be identified. In this context, machine learning (ML) methods for predicting disease-associated non-coding variants are faced with a chicken-and-egg problem – such variants cannot be easily found without ML, but ML cannot be applied efficiently until a sufficient number of instances have been found. Recent ML-based methods for variant prediction do not adopt specific imbalance-aware learning techniques to deal with the imbalanced data that naturally arise in several genome-wide variant scoring problems, resulting in relatively poor performance with reduced sensitivity and precision. In this work, I present an ML algorithm, called hyperSMURF, that adopts imbalance-aware learning strategies based on resampling techniques and a hyper-ensemble approach and is able to handle extremely imbalanced datasets. It outperforms previous methods in the context of non-coding variants associated with Mendelian diseases or complex diseases. I show that imbalance-aware ML is a key issue for the design of robust and accurate prediction algorithms. Open-source implementations of hyperSMURF are available in R and Java, such that it can be applied effectively in other scientific projects to discover disease-associated variants out of millions of neutral sites from whole-genome sequencing. In addition, the algorithm was used to create a new pathogenicity score for regulatory Mendelian mutations (ReMM score), which is significantly better than other commonly used scores at ranking regulatory variants from rare genetic disorders. The score is integrated in Genomiser, an analysis framework that goes beyond scoring the relevance of variation in the non-coding genome. 
The tool is able to associate regulatory variants with specific Mendelian diseases. Genomiser scores variants through pathogenicity scores, like the ReMM score for non-coding variants, and combines them with allele frequency, regulatory sequences, chromosomal topological domains, and phenotypic relevance to discover variants associated with specific Mendelian disorders. Overall, Genomiser is able to identify causal regulatory variants, allowing effective detection and discovery of regulatory variants in Mendelian disease.
In genome sequencing, interpreting the non-coding regions of the genome remains a significant challenge. Compared with the frequent, mostly neutral genetic variants, variants associated with diseases or other traits are a tiny minority. Machine learning (ML) methods for predicting non-coding, disease-associated variants therefore face a chicken-and-egg problem – such variants are hard to find without ML, but ML usually succeeds only once a sufficient number of examples has been found. The most recent ML methods for variant prediction do not integrate special techniques for handling this imbalance, which leads to relatively poor performance with reduced sensitivity, because the underlying genome-wide variant scoring problems are imbalanced. In this work, I present hyperSMURF, an algorithm that uses methods for learning from data with extreme differences between class sizes, based on resampling techniques and a hyper-ensemble. For non-coding variants associated with Mendelian or complex diseases, it outperforms previous methods. I show that ML with techniques developed explicitly for highly imbalanced data is a key concept for robust and accurate prediction in this field. HyperSMURF is open source, implemented in R and Java, and can therefore readily be used in other scientific projects to find disease-associated variants among millions of neutral variants in genome sequencing. In addition, the algorithm was used to develop a new scoring function for regulatory Mendelian mutations (ReMM score). It is significantly better than other scores at identifying regulatory variants in rare genetic disorders. The ReMM score is integrated into the Genomiser analysis framework, which scores not only coding but also relevant non-coding genomic variants and can then assign them to a disease. To do so, Genomiser uses scoring functions and combines them with allele frequencies, the spatial structure of chromosomes, and the phenotypic relevance of variants to known syndromes. This makes Genomiser an efficient tool for discovering new regulatory variants in Mendelian diseases
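
The imbalance-aware strategy described in the hyperSMURF abstract (resampling plus a hyper-ensemble) can be illustrated with a minimal sketch. This is not the published R or Java implementation. It assumes a simplified scheme in which the majority class is partitioned, each partition is undersampled, the minority class is oversampled by interpolating between random pairs of points (a crude stand-in for SMOTE), and a nearest-centroid scorer stands in for the base learner. All function names and parameters are hypothetical.

```python
import random
from statistics import mean


def partition(items, n_parts, rng):
    """Shuffle the majority class and split it into n_parts disjoint subsets."""
    shuffled = items[:]
    rng.shuffle(shuffled)
    return [shuffled[i::n_parts] for i in range(n_parts)]


def oversample(minority, factor, rng):
    """Add len(minority) * factor synthetic points by interpolating random pairs.

    Simplified stand-in for SMOTE, which interpolates towards nearest neighbours.
    """
    synthetic = []
    for _ in range(len(minority) * factor):
        a, b = rng.choice(minority), rng.choice(minority)
        t = rng.random()
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return minority + synthetic


def undersample(majority, size, rng):
    """Draw at most `size` majority examples without replacement."""
    return rng.sample(majority, min(size, len(majority)))


def centroid(points):
    return [mean(col) for col in zip(*points)]


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def train_base(pos, neg):
    """Hypothetical base learner: score in [0, 1], higher means closer to positives."""
    cp, cn = centroid(pos), centroid(neg)

    def score(x):
        dp, dn = sq_dist(x, cp), sq_dist(x, cn)
        return dn / (dp + dn) if dp + dn > 0 else 0.5

    return score


def train_hyper_ensemble(pos, neg, n_parts=5, over_factor=2, under_ratio=3, seed=0):
    """Train one base learner per majority partition and average their scores."""
    rng = random.Random(seed)
    models = []
    for part in partition(neg, n_parts, rng):
        pos_aug = oversample(pos, over_factor, rng)
        neg_sub = undersample(part, under_ratio * len(pos_aug), rng)
        models.append(train_base(pos_aug, neg_sub))
    return lambda x: mean(m(x) for m in models)
```

Each base learner sees a far more balanced training set than the raw data, while the ensemble as a whole still draws on every majority partition. That is the intuition behind combining resampling with an ensemble on extremely imbalanced data.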

    The non-coding genome in early human development-Recent advancements

    Not that long ago, the human genome was discovered to be mainly non-coding, that is, comprised of DNA sequences that do not code for proteins. The initial paradigm that non-coding is also non-functional was soon overturned and today the work to uncover the functions of non-coding DNA and RNA in human early embryogenesis has commenced. Early human development is characterized by large-scale changes in genomic activity and the transcriptome that are partly driven by the coordinated activation and repression of repetitive DNA elements scattered across the genome. Here we provide examples of recent novel discoveries of non-coding DNA and RNA interactions and mechanisms that ensure accurate non-coding activity during human maternal-to-zygotic transition and lineage segregation. These include studies on small and long non-coding RNAs, transposable element regulation, and RNA tailing in human oocytes and early embryos. High-throughput approaches to dissect the non-coding regulatory networks governing early human development are a foundation for functional studies of specific genomic elements and molecules that have only begun and will provide a wider understanding of early human embryogenesis and causes of infertility. Peer reviewed

    Orion: Detecting regions of the human non-coding genome that are intolerant to variation using population genetics

    There is broad agreement that genetic mutations occurring outside of the protein-coding regions play a key role in human disease. Despite this consensus, we are not yet capable of discerning which portions of non-coding sequence are important in the context of human disease. Here, we present Orion, an approach that detects regions of the non-coding genome that are depleted of variation, suggesting that the regions are intolerant of mutations and subject to purifying selection in the human lineage. We show that Orion is highly correlated with known intolerant regions as well as regions that harbor putatively pathogenic variation. This approach provides a mechanism to identify pathogenic variation in the human non-coding genome and will have immediate utility in the diagnostic interpretation of patient genomes and in large case control studies using whole-genome sequences

    Comparative transcriptomics of pathogenic and non-pathogenic Listeria species

    Comparative RNA-seq analysis of two related pathogenic and non-pathogenic bacterial strains reveals a hidden layer of divergence in the non-coding genome as well as conserved, widespread regulatory structures called ‘Excludons’, which mediate regulation through long non-coding antisense RNAs

    Non-coding genome contributions to the development and evolution of mammalian organs

    Protein-coding sequences cover only 1-2% of a typical mammalian genome. The remaining non-coding space hides thousands of genomic elements, some of which act via their DNA sequence while others are transcribed into non-coding RNAs. Many well-characterized non-coding elements are involved in the regulation of other genes, a process essential for the emergence of different cell types and organs during development. Changes in the expression of conserved genes during development are in turn thought to facilitate evolutionary innovation in form and function. Thus, non-coding genomic elements are hypothesized to play important roles in developmental and evolutionary processes. However, challenges related to the identification and characterization of these elements, in particular in non-model organisms, have limited the study of their overall contributions to mammalian organ development and evolution. During my dissertation work, I addressed this gap by studying two major classes of non-coding elements, long non-coding RNAs (lncRNAs) and cis-regulatory elements (CREs). In the first part of my thesis, I analyzed the expression profiles of lncRNAs during the development of seven major organs in six mammals and a bird. I showed that, unlike protein-coding genes, only a small fraction of lncRNAs is expressed in reproducibly dynamic patterns during organ development. These lncRNAs are enriched for a series of features associated with functional relevance, including increased evolutionary conservation and regulatory complexity, highlighting them as candidates for further molecular characterization. I then associated these lncRNAs with specific genes and functions based on their spatiotemporal expression profiles. My analyses also revealed differences in lncRNA contributions across organs and developmental stages, identifying a developmental transition from broadly expressed and conserved lncRNAs towards an increasing number of lineage- and organ-specific lncRNAs. 
Following up on these global analyses, I then focused on a newly-identified lncRNA in the marsupial opossum, Female Specific on chromosome X (FSX). The broad and likely autonomous female-specific expression of FSX suggests a role in marsupial X-chromosome inactivation (XCI). I showed that FSX shares many expression and sequence features with another lncRNA, RSX — a known regulator of XCI in marsupials. Comparisons to other marsupials revealed that both RSX and FSX emerged in the common marsupial ancestor and have since been preserved in marsupial genomes, while their broad and female-specific expression has been retained for at least 76 million years of evolution. Taken together, my analyses highlighted FSX as a novel candidate for regulating marsupial XCI. In the third part of this work, I shifted my focus to CREs and their cell type-specific activities in the developing mouse cerebellum. After annotating cerebellar cell types and states based on single-cell chromatin accessibility data, I identified putative CREs and characterized their spatiotemporal activity across cell types and developmental stages. Focusing on progenitor cells, I described temporal changes in CRE activity that are shared between early germinal zones, supporting a model of cell fate induction through common developmental cues. By examining chromatin accessibility dynamics during neuronal differentiation, I revealed a gradual divergence in the regulatory programs of major cerebellar neuron types. In the final part, I explored the evolutionary histories of CREs and their potential contributions to gene expression changes between species. By comparing mouse CREs to vertebrate genomes and chromatin accessibility profiles from the marsupial opossum, I identified a temporal decrease in CRE conservation, which is shared across cerebellar cell types. However, I also found differences in constraint between cell types, with microglia having the fastest evolving CREs in the mouse cerebellum. 
Finally, I used deep learning models to study the regulatory grammar of cerebellar cell types in human and mouse, showing that the sequence rules determining CRE activity are conserved across mammals. I then used these models to retrace the evolutionary changes leading to divergent CRE activity between species. Collectively, my PhD work provides insights into the evolutionary dynamics of non-coding genes and regulatory elements, the processes associated with their conservation, and their contributions to the development and evolution of mammalian cell types and organs