13,351 research outputs found

    Gene expression regulation in allopolyploid fish

    Plants, invertebrates and even lower vertebrates are known to cope with hybridization and polyploidy very successfully, overcoming the genetic constraints those phenomena bring. However, (allo)polyploidy in animals has been strongly neglected, so this matter remains largely unexplored. In that sense, the general goal of this thesis was to expand the limited existing knowledge on the topic, taking a significant step forward from the scarce information available on the regulation of gene expression in animal allopolyploids. The starting point of this work was the hypothesis that global dosage compensation by allele copy silencing occurs in the Squalius alburnoides complex. The main goals were to elucidate the gene expression processes and mechanisms operating in S. alburnoides and to determine whether they are a particular feature of this complex or occur more widely among allopolyploids. The first step was to exclude ploidy mosaicism, a phenomenon described here for the first time in S. alburnoides, as the source of the allele-specific expression differences previously found. Although it was corroborated that S. alburnoides triploids are affected by a significant down-regulation of gene expression, this does not correspond to an exact genome-wide functional diploidization. Instead, a certain flexibility of expression within a range of mRNA amounts per locus was observed. That feature might be key to the mechanisms that allow lower vertebrates to endure and maintain ploidy changes so effectively. The down-regulation of gene expression in triploid S. alburnoides was also found not to depend on allele copy silencing, contrary to earlier speculation. Extreme homoeolog expression bias, including the complete silencing of alleles, was found to affect a significant percentage of genes in S. alburnoides, as in laboratory-produced triploid hybrid Oryzias latipes. 
However, the incidence of homoeolog expression bias was not significantly affected by the ploidy level of the individuals, and the allelic silencing rate was similar between diploids and triploids. Additionally, the hypothesis that down-regulation of gene expression is mediated by massive methylation in triploid hybrid genomes was not supported, either for S. alburnoides or for P. formosa.

    Computational models of gene expression regulation

    Over the last several decades, much effort has gone into elucidating the genetic and epigenetic defects that result in various diseases. Gene regulation, i.e., the process by which genes are turned on and off in the right place and at the right time, is a central question for researchers. Thanks to discoveries in this field, our understanding of the interactions between proteins and DNA and among proteins, as well as of the dynamics of chromatin structure under different conditions, has advanced substantially. Although much has been achieved, many aspects of gene regulation remain unknown. For instance, proteins called transcription factors (TFs) recognize and bind to specific regions of DNA and recruit the transcriptional machinery, which is essential for gene regulation. As more than 2000 TFs have been identified in the human genome, it is important to study where they bind and which genes they target. Computational approaches are particularly important because biological experiments are often very expensive and cannot be performed for all TFs. In 2016, a competition named the DREAM Challenge was held to encourage researchers to develop novel computational tools for predicting the binding sites of several TFs. The first chapter of this thesis describes our machine learning approach to this challenge within the scope of the competition. Using ensembles of random forest classifiers, we formulated our framework so that it benefits from the tissue specificity inherent in the data, leading to better generalization. Our models were also tailored to spot cofactors involved in the binding of the TFs of interest. Comparing the important TFs suggested by our computational models with protein-protein association networks revealed that the models preferentially select motifs of TFs that are potential interaction partners in those networks. 
Another important aspect beyond predicting TF binding is linking epigenomic data, such as histone modification (HM) data, with gene expression. We concentrated in particular on predicting expression in a subset of genes called bidirectional genes: pairs of genes located close to each other on opposite strands of DNA. As sequencing technologies advance, more such bidirectional configurations are being detected. This indicates that understanding gene regulatory mechanisms would benefit from accounting for such promoter architectures. In the second and third chapters, we focused on genes with bidirectional promoter architectures, using high-resolution epigenomic signatures and single-cell RNA-seq data to dissect the complex epigenetic architecture at these promoters. Using single-cell RNA-seq data as the estimate of gene expression, we were able to generate a hypothetical model for gene regulation at bidirectional promoters. We showed that bidirectional promoters can be categorized into three architecture types with distinct characteristics. Each of these categories corresponds to a unique gene expression profile at the single-cell level. The single-cell RNA-seq data proved to be a powerful means of studying gene regulation. Therefore, in the last chapter, we proposed a novel approach for predicting gene expression at the single-cell level using cis-regulatory motifs as well as epigenetic features. To achieve this, we designed a tree-guided multi-task learning framework that treats each cell as a task. Through this framework we were able to explain single-cell gene expression values using either TF binding affinities or TF ChIP-seq data measured at specific genomic regions. This allowed us to identify distinct TFs that show cell-type-specific regulation in induced pluripotent stem cells. 
Our approach is not limited to TFs; it can take any type of data that could help explain gene expression at the single-cell level. We believe that our findings can be used in the discovery and development of drugs that regulate the presence of TFs or other regulatory factors that drive cells into abnormal states, in order to prevent or cure diseases.

    The context of gene expression regulation

    Recent advances in sequencing technologies have uncovered a world of RNAs that do not code for proteins, known as non-protein-coding RNAs, that play important roles in gene regulation. Along with histone modifications and transcription factors, non-coding RNA is part of a layer of transcriptional control on top of the DNA code. This layer of components and their interactions specifically enables (or disables) the modulation of three-dimensional folding of chromatin to create a context for transcriptional regulation that underlies cell-specific transcription. In this perspective, we propose a structural and functional hierarchy in which the DNA code, proteins and non-coding RNAs act as context creators to fold chromosomes and regulate genes.

    Gene expression regulation in pneumoviruses

    Members of the Pneumoviridae virus family are responsible for severe respiratory tract disease in their hosts. Human respiratory syncytial virus (hRSV) is responsible for over 200,000 deaths worldwide each year, and bovine respiratory syncytial virus (bRSV) causes major economic loss to the cattle industry worldwide. The current model for gene expression in all nonsegmented negative-sense single-stranded RNA viruses is that mRNA is generated in a polar gradient, with decreasing levels of mRNA transcribed from genes further along the genome from the 3′ end. With the exception of translation of ORF-2, located on the bicistronic M2 mRNA, translation of Pneumoviridae mRNAs is thought to be regulated through the levels of mRNA abundance. Translation of M2 ORF-2 has been characterised as being regulated by the non-canonical mechanism of coupled translation termination/initiation in pneumonia virus of mice (PVM), hRSV and avian metapneumovirus (APV). This mechanism relies on a proportion of the elongating ribosomes translating the upstream M2 ORF-1, terminating, and reinitiating translation at M2 ORF-2. Although the initiation site for M2 ORF-2 in bRSV is similar to that of other members of this family that use coupled translation, the mechanism in bRSV has not been characterised. Using ribosome profiling to analyse steady-state viral mRNA abundance and viral translation in both hRSV- and bRSV-infected cells, it was observed that for certain viral mRNAs, levels of mRNA abundance did not follow the standard polar transcription model. This was characterised by an increase in mRNA abundance between an mRNA's respective gene and its upstream neighbour. The increase was observed in the same group of mRNAs in both viruses, suggesting that factors other than the polar transcription gradient influence levels of viral mRNA abundance. 
It was also observed that proportional levels of translation did not match the respective proportional levels of mRNA abundance for certain viral mRNAs in both viruses. This suggests that translation of viral mRNAs is not primarily controlled by mRNA abundance and that other translational regulatory factors influence levels of translation. The mechanism of bRSV M2 ORF-2 translation was also characterised using reporter plasmid assays. The mechanism used to initiate translation of M2 ORF-2 was found not to be the coupled translation termination/initiation used by other members of this family. Instead, translation of M2 ORF-2 was observed to use an internal initiation mechanism located inside M2 ORF-1. The mechanism of coupled translation termination/initiation used for translation of PVM M2 ORF-2 was also further characterised. Translation of M2 ORF-2 was observed to rely on upstream sequence within M2 ORF-1. A predicted mRNA secondary structure was identified in this region, and its disruption inhibited translation of M2 ORF-2. This was similar to the mechanism of coupled translation used in hRSV, suggesting that the mechanism used by this family relies on an mRNA secondary structure located upstream of the initiation site.
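
The polar transcription model described in this abstract can be illustrated with a minimal sketch. It is not the authors' analysis: the gene labels, the attenuation fraction and the "observed" values are all hypothetical, chosen only to show how a departure from a monotonic gradient could be flagged.

```python
# Illustrative sketch only: gene labels and all numbers are hypothetical.

GENES = ["g1", "g2", "g3", "g4", "g5"]  # ordered from the 3' end of the genome


def polar_gradient(genes, first_level=1.0, attenuation=0.2):
    """Expected relative mRNA level per gene, assuming a fixed fraction
    (`attenuation`, an assumed value) is lost at every gene junction."""
    return {g: first_level * (1.0 - attenuation) ** i for i, g in enumerate(genes)}


def gradient_violations(genes, levels):
    """Genes whose mRNA level exceeds that of their upstream neighbour."""
    return [down for up, down in zip(genes, genes[1:]) if levels[down] > levels[up]]


expected = polar_gradient(GENES)

# Invented "observation": g4 is more abundant than its upstream neighbour g3.
observed = dict(expected)
observed["g4"] = observed["g3"] * 1.5

print(gradient_violations(GENES, expected))  # []
print(gradient_violations(GENES, observed))  # ['g4']
```

Under a strict polar gradient the list of violations is empty; any gene reported by `gradient_violations` is one whose abundance rises relative to the gene immediately upstream, which is the kind of deviation the abstract reports.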

    Small RNA Profile in Moso Bamboo Root and Leaf Obtained by High Definition Adapters

    Moso bamboo (Phyllostachys heterocycla cv. pubescens L.) is an economically important fast-growing tree. To gain a better understanding of gene expression regulation in this important species, we used next-generation sequencing to profile small RNAs in leaves and roots of young seedlings. Since standard kits for producing cDNA from small RNAs are biased towards certain small RNAs, we used High Definition adapters, which reduce ligation bias. We identified and experimentally validated five new microRNAs and a few other small non-coding RNAs that were not microRNAs. The biological implications of microRNA expression levels and the targets of the microRNAs are discussed.

    Nuclear speckles: dynamic hubs of gene expression regulation

    Complex, multistep biochemical reactions that routinely take place in our cells require high concentrations of enzymes, substrates, and other structural components to proceed efficiently, and typically require chemical environments that can inhibit other reactions in their immediate vicinity. Eukaryotic cells solve these problems by confining such reactions to diffusion-restricted compartments within the cell, called organelles, that can be separated from their environment by a lipid membrane, or to membrane-less compartments that form through liquid–liquid phase separation (LLPS). One of the most easily noticeable and earliest discovered organelles is the nucleus, which harbors the genetic material of the cell and where transcription by RNA polymerases produces most of the messenger RNAs and a plethora of noncoding RNAs, which in turn are required for translation of mRNAs in the cytoplasm. The interior of the nucleus is not a uniform soup of biomolecules but rather consists of a variety of membrane-less bodies, such as the nucleolus, nuclear speckles (NS), paraspeckles, Cajal bodies, histone locus bodies, and more. In this review, we focus on NS, with an emphasis on recent developments, including our own findings about the formation of NS by two large IDR-rich proteins, SON and SRRM2.

    RNA Polymerase II Phosphorylation and Gene Expression Regulation

    RNA polymerases (RNAPs) are among the most important cellular enzymes. They are present in all living organisms, from Bacteria and Archaea to Eukarya, and are responsible for DNA-dependent transcription. Although Bacteria and Archaea have only one RNAP, Eukarya possess three RNAPs in animals (I, II and III) and five in plants (the additional two being IV and V). All of the RNAPs are evolutionarily related and have common structural and functional properties. This work was supported by a grant from the Spanish Ministerio de Ciencia e Innovación (BFU 2009-07179) to OC. AG was supported by a fellowship from the Junta de Castilla y León. The IBFG acknowledges support from the “Ramón Areces Foundation”. Peer reviewed.

    Differential timing of gene expression regulation between leptocephali of the two Anguilla eel species in the Sargasso Sea

    The unique life-history characteristics of North Atlantic catadromous eels have long intrigued evolutionary biologists, especially with respect to mechanisms that could explain their persistence as two ecologically very similar but reproductively and geographically distinct species. Differential developmental schedules during young larval stages have commonly been hypothesized to represent such a key mechanism. We performed a comparative analysis of gene expression by means of microarray experiments with American and European eel leptocephali collected in the Sargasso Sea in order to test the alternative hypotheses of (1) differential timing of gene expression regulation during early development versus (2) species-specific differences in expression of particular genes. Our results provide much stronger support for the former hypothesis, since no gene showed consistent significant differences in expression levels between the two species. In contrast, 146 genes showed differential timing of expression between species, although the observed expression level differences between the species were generally small. Consequently, species-specific gene expression regulation seems to play a minor role in species differentiation. Overall, these results show that the basis of the early developmental divergence between the American and European eel is probably influenced by differences in the timing of gene expression regulation for genes involved in a large array of biological functions.

    (Im)Perfect robustness and adaptation of metabolic networks subject to metabolic and gene-expression regulation: marrying control engineering with metabolic control analysis

    Background: Metabolic control analysis (MCA) and supply–demand theory have led to appreciable understanding of the systems properties of metabolic networks that are subject exclusively to metabolic regulation. Supply–demand theory has not yet considered gene-expression regulation explicitly, whilst a variant of MCA, Hierarchical Control Analysis (HCA), has done so. Existing analyses based on control engineering approaches have not been very explicit about whether metabolic or gene-expression regulation would be involved, but have designed different ways in which regulation could be organized, with the potential of making adaptation perfect. Results: This study integrates control engineering with classical MCA augmented by supply–demand theory and HCA. Because gene-expression regulation involves time integration, it is identified as a natural instantiation of the ‘integral control’ (or near-integral control) known in control engineering. This study then focuses on robustness against, and adaptation to, perturbations of process activities in the network, which could result from environmental perturbations, mutations or slow noise. It is shown, however, that this type of ‘integral control’ should rarely be expected to lead to ‘perfect adaptation’: although gene-expression regulation increases the robustness of important metabolite concentrations, it rarely makes them infinitely robust. For perfect adaptation to occur, the protein degradation reactions should be zero order in the concentration of the protein, which may be rare biologically for cells growing steadily. Conclusions: The proposed new framework, which integrates the methodologies of control engineering with metabolic and hierarchical control analysis, improves the understanding of biological systems that are regulated both metabolically and by gene expression. 
In particular, the new approach enables one to address the issue of whether the intracellular biochemical networks that have been and are being identified by genomics and systems biology correspond to the ‘perfect’ regulatory structures designed by control engineering vis-à-vis optimal functions such as robustness. To the extent that they do not, the analyses suggest how they may become so, and this in turn should facilitate synthetic biology and metabolic engineering.
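
The claim that perfect adaptation requires zero-order protein degradation can be made concrete with a toy simulation. This is a minimal sketch under assumed dynamics, not the model from the paper: a metabolite `x` is supplied at a constant rate and consumed by an enzyme `e` whose synthesis is induced by `x`; the rate laws and the parameters `a` and `d` are hypothetical.

```python
# Toy model with assumed rate laws; not taken from the paper.
#   dx/dt = supply - e * x          (metabolite consumed by enzyme e)
#   de/dt = a * x - degradation     (enzyme synthesis induced by x)
# degradation = d      (zero order: independent of e)
# degradation = d * e  (first order: proportional to e)


def simulate(supply, zero_order, a=1.0, d=1.0, dt=0.01, steps=20000):
    """Integrate the toy model with forward Euler and return the final (x, e)."""
    x, e = 1.0, 1.0
    for _ in range(steps):
        degradation = d if zero_order else d * e
        dx = supply - e * x
        de = a * x - degradation
        x += dt * dx
        e += dt * de
    return x, e


for zero_order in (True, False):
    label = "zero-order" if zero_order else "first-order"
    for supply in (1.0, 2.0):
        x, e = simulate(supply, zero_order)
        print(f"{label:11s} supply={supply:.1f}  x={x:.4f}  e={e:.4f}")
```

With zero-order degradation the steady state requires `a * x = d`, so `x` returns to `d / a` whatever the supply and only `e` changes: the enzyme level integrates the error, which is integral control. With first-order degradation the steady state satisfies `x = sqrt(supply * d / a)` in this toy model, so the metabolite concentration shifts with the perturbation and adaptation is imperfect.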

    Long non-coding RNA expression profiling in the NCI60 cancer cell line panel using high-throughput RT-qPCR

    Long non-coding RNAs (lncRNAs) form a new class of RNA molecules implicated in various aspects of the regulation of protein-coding gene expression. To study lncRNAs in cancer, we generated expression profiles for 1707 human lncRNAs in the NCI60 cancer cell line panel using a high-throughput nanowell RT-qPCR platform. We describe how qPCR assays were designed and validated and provide processed and normalized expression data for further analysis. Data quality is demonstrated by matching the lncRNA expression profiles with phenotypic and genomic characteristics of the cancer cell lines. This data set can be integrated with publicly available omics and pharmacological data sets to uncover novel associations between lncRNA expression and mRNA expression, miRNA expression, DNA copy number, protein-coding gene mutation status or drug response.