144 research outputs found

    CORENup: a combination of convolutional and recurrent deep neural networks for nucleosome positioning identification

    Background: Nucleosomes package DNA in the nucleus of eukaryotic cells and regulate its transcription. Several studies indicate that nucleosome positioning is determined by the combined effects of several factors, including DNA sequence organization. Notably, nucleosomes have been successfully identified on a genomic scale by computational methods that use DNA sequence as input data. Results: In this work, we propose CORENup, a deep learning model for nucleosome identification. CORENup takes a one-hot encoded DNA sequence as input and combines, in parallel, a fully convolutional neural network and a recurrent layer. These two parallel branches are designed to capture both non-periodic and periodic features of the DNA string. A dense layer combines their outputs to give the final classification. Conclusions: Results computed on public datasets from different organisms show that CORENup is a state-of-the-art deep neural network method for nucleosome positioning identification. The comparisons were carried out on two groups of datasets currently adopted by the best-performing methods, and CORENup showed top performance both in classification metrics and in elapsed computation time.
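The two-branch architecture described above can be illustrated with a toy forward pass. This is a minimal, dependency-free sketch, not CORENup's implementation: it assumes an A/C/G/T column order for the one-hot matrix, uses a single hand-set filter with ReLU and global max pooling as the convolutional branch, and uses a plain single-unit tanh (Elman-style) recurrence in place of whatever recurrent cell CORENup actually uses. All weights are hypothetical and untrained.

```python
import math
from typing import List, Sequence

ALPHABET = "ACGT"  # assumed column order of the one-hot encoding
Matrix = List[List[float]]


def one_hot(seq: str) -> Matrix:
    """Encode DNA as an L x 4 matrix; unknown bases (e.g. N) become all-zero rows."""
    return [[1.0 if base == b else 0.0 for b in ALPHABET] for base in seq.upper()]


def conv_branch(x: Matrix, kernel: Matrix) -> float:
    """Slide a k x 4 filter along the sequence; starting from 0.0 makes max() act as ReLU + global max pooling."""
    k = len(kernel)
    best = 0.0
    for i in range(len(x) - k + 1):
        s = sum(kernel[j][c] * x[i + j][c] for j in range(k) for c in range(4))
        best = max(best, s)
    return best


def recurrent_branch(x: Matrix, w_in: Sequence[float], w_rec: float) -> float:
    """Single-unit recurrence h_t = tanh(w_in . x_t + w_rec * h_{t-1}); returns the last state."""
    h = 0.0
    for row in x:
        h = math.tanh(sum(w * v for w, v in zip(w_in, row)) + w_rec * h)
    return h


def classify(seq: str, kernel: Matrix, w_in: Sequence[float], w_rec: float,
             w_dense: Sequence[float], bias: float) -> float:
    """Concatenate both branch outputs and apply a sigmoid dense layer (nucleosome probability)."""
    x = one_hot(seq)
    features = [conv_branch(x, kernel), recurrent_branch(x, w_in, w_rec)]
    z = sum(w * f for w, f in zip(w_dense, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights: the filter responds to the dinucleotide "GC".
gc_filter = [[0, 0, 1, 0], [0, 1, 0, 0]]
p = classify("AGCTTGCA", gc_filter, w_in=[0.1, -0.1, 0.2, -0.2], w_rec=0.5,
             w_dense=[1.0, 0.5], bias=-1.0)
```

Max pooling makes the convolutional feature report whether a pattern occurs anywhere in the sequence, while the recurrent state is carried base by base, which is one way order-dependent signals can enter the final dense layer.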

    Systematic clustering of transcription start site landscapes

    Genome-wide, high-throughput methods for transcription start site (TSS) detection have shown that most promoters have an array of neighboring TSSs, some of which are used more than others, forming a distribution of initiation propensities. TSS distributions (TSSDs) vary widely between promoters, and earlier studies have shown that TSSDs have biological implications for both regulation and function. However, no systematic study has explored how many types of TSSDs, and by extension core promoters, exist, or which biological features distinguish them. In this study, we developed a new non-parametric dissimilarity measure and clustering approach to explore the similarities and stabilities of clusters of TSSDs. Previous studies have used arbitrary thresholds to arrive at two general classes: broad and sharp. We demonstrated that, in addition to the previous broad/sharp dichotomy, an additional category of promoters exists. Unlike typical TATA-driven sharp TSSDs, where the TSS position can vary by a few nucleotides, in this category virtually all TSSs originate from the same genomic position. These promoters lack the epigenetic signatures of typical mRNA promoters, and a substantial subset of them map upstream of ribosomal protein pseudogenes. We present evidence that these are likely mapping errors, which have confounded earlier analyses, caused by the high similarity of ribosomal gene promoters in combination with the known G-addition bias in CAGE libraries. Thus, previous two-class separations of promoters based on TSS distributions are justified, but ultra-sharp TSS distributions will confound downstream analyses if they are not removed. This work was supported by a grant from the Novo Nordisk Foundation, http://www.novonordiskfonden.dk/. The European Research Council (http://erc.europa.eu/) has provided financial support to Dr. Sandelin under the EU 7th Framework Programme (FP7/2007-2013)/ERC grant agreement 204135.
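The abstract does not spell out its dissimilarity measure, so the sketch below swaps in a standard non-parametric one, the one-dimensional earth mover's (Wasserstein-1) distance between two TSS distributions, plus a simple dominant-position test for the ultra-sharp class. Both are illustrations under stated assumptions: the count vectors are assumed to be aligned to a common coordinate (e.g. relative positions within a tag cluster), and the 0.95 cutoff in `is_ultra_sharp` is a hypothetical threshold, not a value from the study.

```python
from itertools import accumulate
from typing import List, Sequence


def normalize(counts: Sequence[float]) -> List[float]:
    """Turn CAGE tag counts per position into a probability distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("distribution must contain at least one tag")
    return [c / total for c in counts]


def emd_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """Earth mover's distance between two aligned, 1-bp-resolution TSS distributions.

    On a unit grid this equals the sum of absolute CDF differences. Shorter inputs
    are zero-padded on the right (assumption: both start at the same coordinate).
    """
    n = max(len(a), len(b))
    pa = normalize(list(a) + [0.0] * (n - len(a)))
    pb = normalize(list(b) + [0.0] * (n - len(b)))
    return sum(abs(x - y) for x, y in zip(accumulate(pa), accumulate(pb)))


def dominant_fraction(counts: Sequence[float]) -> float:
    """Fraction of all tags that fall on the single most-used TSS position."""
    return max(normalize(counts))


def is_ultra_sharp(counts: Sequence[float], threshold: float = 0.95) -> bool:
    """Flag TSSDs where virtually all TSSs come from one position (hypothetical cutoff)."""
    return dominant_fraction(counts) >= threshold


# Pairwise distances like this one could feed any clustering method.
sharp = [0, 1, 98, 1, 0]
broad = [10, 20, 25, 20, 10]
print(emd_1d(sharp, broad), is_ultra_sharp(sharp), is_ultra_sharp(broad))
```

Because EMD grows with how far tag mass has to move along the genome, two sharp peaks a few nucleotides apart are scored as close, while a sharp and a broad distribution are scored as far apart.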

    Computational analysis of nucleosome positioning datasets

    Chromatin is a complex of DNA and histone proteins that constitutes the elemental material of eukaryotic chromosomes. The basic repeating subunit of chromatin, the nucleosome core particle, consists of approximately 146 base pairs (bp) of DNA wrapped around an octamer of core histones. Core particles are joined by linker DNA of variable length to form chains of nucleosomes that are folded into higher-order structures. The specific distribution of nucleosomes along the DNA fibre is known to influence this folding process. Furthermore, at the local level, nucleosome positioning can control access to DNA sequence motifs and thus plays a fundamental role in regulating gene expression. Despite considerable experimental effort, neither the folding process nor the mechanisms of gene regulation are currently well understood. Monomer extension (ME) is an established in vitro experimental technique that maps the positions adopted by reconstituted core histone octamers on a defined DNA sequence. It provides quantitative, high-resolution positioning information over long continuous stretches of DNA sequence. This technique has been used to map several genes: the globin genes (8 kbp), the beta-lactoglobulin gene (10 kbp) and various imprinting genes (4 kbp). This study explores and analyses this unique dataset, utilising computational and stochastic techniques, to gain insight into the potential influence of nucleosome positioning on the structure and function of chromatin. The first section of this thesis expands upon prior analyses, explores general features of the dataset using common bioinformatics tools, and attempts to relate the quantitative positioning information from ME to data from other commonly used competitive reconstitution protocols. 
Finally, evidence of a correlation between the in vitro ME dataset and in vivo nucleosome positions for the beta-lactoglobulin gene region is presented. The second section presents the development of a novel method for analysing ME maps using Monte Carlo simulation. The goal was to use the ME datasets to simulate a higher-order chromatin fibre, taking advantage of their long-range and quantitative nature. The Monte Carlo simulations have allowed new insights to be gleaned from the datasets. Analysis of the beta-lactoglobulin positioning map indicates the potential for discrete disruption of nucleosomal organisation, at specific physiological nucleosome densities, over regions found to have unusual chromatin structure in vivo. This suggests a correspondence between the quantitative histone octamer positioning information in vitro and the positioning of nucleosomes in vivo. Further, the simulations demonstrate that histone density-dependent changes in nucleosomal organisation, in both the beta-lactoglobulin and globin positioning maps, often occur in regions involved in gene regulation. This implies that irregular chromatin structures may form over certain biologically significant regions. Taken together, these studies lend weight to the hypothesis that nucleosome positioning information encoded within DNA plays a fundamental role in directing chromatin structure in vivo.
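The abstract does not describe the Monte Carlo scheme itself, so the following is only a generic Metropolis sketch of the idea: nucleosomes with a 146 bp footprint are placed on a positioning map without overlapping, and random moves are accepted according to how they change the total positioning score. The per-position `score` array (standing in for ME-derived positioning strengths), the step size, the temperature, and the evenly packed starting state are all hypothetical choices; the number of nucleosomes `n_nuc` plays the role of histone density.

```python
import math
import random
from typing import List, Sequence

CORE_BP = 146  # DNA footprint of one core particle (from the text)


def valid(pos: int, others: Sequence[int], n_sites: int) -> bool:
    """A start position is valid if it lies on the map and overlaps no other octamer."""
    return 0 <= pos < n_sites and all(abs(pos - q) >= CORE_BP for q in others)


def simulate(score: Sequence[float], n_nuc: int, steps: int = 5000,
             temperature: float = 1.0, max_step: int = 10, seed: int = 0) -> List[int]:
    """Metropolis sampling of non-overlapping nucleosome start positions.

    score[p] is a hypothetical positioning strength for an octamer starting at p;
    the energy of a configuration is the negative sum of its scores.
    """
    if n_nuc < 1 or (n_nuc - 1) * CORE_BP >= len(score):
        raise ValueError("too many nucleosomes for this map")
    rng = random.Random(seed)
    positions = [i * CORE_BP for i in range(n_nuc)]  # evenly packed starting state
    for _ in range(steps):
        i = rng.randrange(n_nuc)
        new = positions[i] + rng.randint(-max_step, max_step)
        others = positions[:i] + positions[i + 1:]
        if not valid(new, others, len(score)):
            continue  # reject moves off the map or into a neighbour
        delta = score[positions[i]] - score[new]  # energy change of the proposed move
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            positions[i] = new
    return sorted(positions)


# Hypothetical map: flat except for one strongly positioning site at 500.
toy_score = [0.0] * 1000
toy_score[500] = 5.0
print(simulate(toy_score, n_nuc=4))
```

Repeating such runs over a range of `n_nuc` values and averaging occupancy would give a density-dependent occupancy profile, which is the kind of readout the abstract discusses.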

    Assembling pieces of the centromere epigenetics puzzle

    The centromere is a key region for cell division, where the kinetochore assembles, recognizes and attaches to microtubules so that each sister chromatid can segregate to a different daughter cell. Centromeric chromatin is a unique, rigid chromatin state promoted by the presence of the histone H3 variant CENP-A, in which epigenetic histone modifications of both heterochromatin and euchromatin states, together with associated protein elements, are present. Although DNA sequence is not regarded as important for establishing centromeric chromatin, it has become clear that this structure results from a highly regulated epigenetic event that leads to the recruitment and stability of kinetochore proteins. We describe an integrative model for the epigenetic processes that shape the regional chromatin interactions indispensable for the recruitment and stability of kinetochore proteins. If these chromatin regions are altered, chromosomal instability is promoted, although segregation may still take place.

    Identification of co-regulated candidate genes by promoter analysis

    EThOS - Electronic Theses Online Service, United Kingdom

    Functional Chromatin Extraction: A method to study DNA accessibility in higher-order structures of chromatin

    Inside the cell nucleus, DNA is compacted into chromatin through its assembly with histones and other proteins. The first layer of DNA packaging is the nucleosome core particle, which consists of DNA wrapped around a histone octamer. Nucleosome core particles are spaced by linker DNA, forming a “beads-on-a-string” structure. According to the textbook model, nucleosome arrays are further folded into regular, distinct higher-order structures of chromatin. Since all DNA-dependent processes require access to the DNA template, the organization and folding of chromatin into higher-order structures is thought to regulate genome activity. This thesis investigates how chromatin is structurally organized inside the nucleus to modulate DNA accessibility. A high-throughput approach, called Functional Chromatin Extraction, was developed to analyze DNA accessibility in native chromatin. To this end, chromatin was digested at different intensities with the endonuclease MNase directly inside the nuclei of living cells. DNA accessibility was assessed on both a global and a local scale from the differential release rates of nucleosomes from partially digested (low-MNase) and fully digested (high-MNase) chromatin. Thorough analysis of the extracted nucleosomal DNA revealed that AT-rich nucleosomes are prone to over-degradation into sub-nucleosomal fragments in high-MNase. Consequently, nucleosomes from GC-rich regions are overrepresented in high-MNase. In contrast, low-MNase yields a homogeneous nucleosome distribution that is not affected by DNA sequence, giving an accurate representation of the global nucleosome landscape. Surprisingly, after correcting for the sequence preferences of MNase, no differentially accessible chromatin domains could be identified. Euchromatin and heterochromatin exhibit similar accessibilities, suggesting that DNA in heterochromatin is generally available to small molecules such as transcription factors. 
Nevertheless, active regulatory sites, such as promoter and enhancer elements, show increased accessibility compared with other regions of the genome and are occupied by fragile nucleosomes, showing that DNA accessibility is modulated locally to regulate gene expression. In summary, the results of this study indicate that chromatin forms an accessible and dynamic polymer and that domains of higher-order chromatin structure do not exist in human cells. In a second chapter, this thesis focuses on the chromatin architecture of adenoviruses and its dynamic changes during early infection. Similar to eukaryotic genomes, adenoviral DNA in incoming virions is mainly associated with the structural protein VII (pVII), forming a nucleoprotein complex. However, little is known about adenoviral chromatin organization and how it relates to viral gene activation during infection. Functional Chromatin Extraction combined with transcriptome sequencing was applied during early infection of human cells. Assessment of the viral DNA organization into pVII complexes showed a defined and functional DNA packaging into nucleosome-like arrays. The chromatin structure of invading viruses correlates with the spatiotemporal activation of viral genes, showing an open chromatin conformation with lower pVII densities at early gene loci. Investigation of dynamic chromatin changes within the first four hours of infection revealed viral chromatin decondensation and nucleosome assembly preferentially at early gene loci. Remarkably, nucleosomes replace pVII molecules directly at the +1 site of early genes, thereby resembling the structure of active host promoters. The time-resolved analysis demonstrated that remodeling of the viral chromatin precedes transcriptional activation and is a prerequisite for generating a transcription-competent template.
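The abstract gives no formula for its accessibility readout, so the sketch below shows one plausible way to compare the two digestion conditions: library-size-normalized nucleosome counts per genomic bin in low- and high-MNase, summarized as a log2 ratio, plus a GC-content helper for checking the sequence bias described above. Reading a positive log2(low/high) as "more readily released, hence more accessible" is an assumption of this sketch, as are the bin layout and the pseudocount.

```python
import math
from typing import List, Sequence


def gc_fraction(seq: str) -> float:
    """GC content of a nucleosomal fragment (to inspect the AT/GC bias of high-MNase)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def log2_release_ratio(low: Sequence[int], high: Sequence[int],
                       pseudocount: float = 1.0) -> List[float]:
    """Per-bin log2 of normalized low-MNase over high-MNase nucleosome counts."""
    if len(low) != len(high):
        raise ValueError("both libraries must use the same bins")
    low_total, high_total = sum(low), sum(high)
    if low_total == 0 or high_total == 0:
        raise ValueError("empty library")
    ratios = []
    for lo, hi in zip(low, high):
        lo_norm = (lo + pseudocount) / low_total  # library-size normalization
        hi_norm = (hi + pseudocount) / high_total
        ratios.append(math.log2(lo_norm / hi_norm))
    return ratios
```

Under this convention, bins holding fragile nucleosomes, such as active promoters, would be expected to score high, because those nucleosomes tend to be lost from the high-MNase library.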

    Elastic Network Models in Biology: From Protein Mode Spectra to Chromatin Dynamics

    Biomacromolecules perform their functions by accessing conformations energetically favored by their structure-encoded equilibrium dynamics. Elastic network model (ENM) analysis has been widely used to decompose the equilibrium dynamics of a given molecule into a spectrum of modes of motion, which separates robust, global motions from local fluctuations. The scalability and flexibility of ENMs allow us to efficiently analyze the spectral dynamics of large systems or to perform comparative analyses of large datasets of structures. In this thesis, I showed how ENMs can be adapted (1) to analyze protein superfamilies that share similar tertiary structures but may differ in their sequence and functional dynamics, (2) to analyze chromatin dynamics using contact data from Hi-C experiments, and (3) to perform a comparative analysis of genome topology across different cell lines. The first study showed that protein family members share conserved, highly cooperative (global) modes of motion. A low-to-intermediate frequency spectral regime was shown to have maximal impact on the functional differentiation of families into subfamilies. The second study demonstrated that the Gaussian Network Model (GNM) can accurately model chromosomal mobility and couplings between genomic loci at multiple scales: it can quantify spatial fluctuations in the positions of gene loci, detect large genomic compartments and smaller topologically associating domains (TADs) that undergo en bloc movements, and identify dynamically coupled distal regions along the chromosomes. The third study revealed close similarities in chromosomal dynamics across different cell lines on a global scale, but notable cell-specific variations in the spatial fluctuations of genomic loci. It also called attention to the intrinsic spatial dynamics of chromatin as a determinant of cell differentiation. 
Together, these studies provide a comprehensive view of the versatility and utility of ENMs in analyzing the spatial dynamics of biomolecules, from individual proteins to chromatin as a whole.
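The core GNM calculation can be sketched in a few lines. The code below builds a Kirchhoff (graph Laplacian) matrix from a symmetric contact matrix, such as normalized Hi-C counts, takes its pseudoinverse, and reads each locus's mean-square fluctuation off the diagonal (up to the constant 3kT/γ, omitted here) and normalized cross-correlations off the diagonal. It is a dependency-free illustration, not the thesis code: it assumes the contact graph is connected and uses the identity Γ+ = (Γ + J/n)^-1 - J/n, where J is the all-ones matrix, instead of an eigendecomposition. A real chromosome-scale analysis would use a linear algebra library.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def kirchhoff(contacts: Sequence[Sequence[float]]) -> Matrix:
    """Gamma_ij = -w_ij for i != j; Gamma_ii = sum of row i's contact weights."""
    n = len(contacts)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                gamma[i][j] = -float(contacts[i][j])
                gamma[i][i] += float(contacts[i][j])
    return gamma


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0.0:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    return [row[n:] for row in a]


def gnm_pseudoinverse(gamma: Matrix) -> Matrix:
    """Pseudoinverse of a connected graph Laplacian via (Gamma + J/n)^-1 - J/n."""
    n = len(gamma)
    shifted = [[gamma[i][j] + 1.0 / n for j in range(n)] for i in range(n)]
    inv = invert(shifted)
    return [[inv[i][j] - 1.0 / n for j in range(n)] for i in range(n)]


def msf_and_correlation(contacts: Sequence[Sequence[float]]) -> Tuple[List[float], Matrix]:
    """Mean-square fluctuations (diagonal) and normalized cross-correlations of loci."""
    g_plus = gnm_pseudoinverse(kirchhoff(contacts))
    n = len(g_plus)
    msf = [g_plus[i][i] for i in range(n)]
    corr = [[g_plus[i][j] / (msf[i] * msf[j]) ** 0.5 for j in range(n)] for i in range(n)]
    return msf, corr


# Toy example: three loci in a chain, each contacting only its neighbours.
msf, corr = msf_and_correlation([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
```

In the toy chain, the two end loci fluctuate more than the middle one (about 0.556 versus 0.222 in these units) and move in an anticorrelated way (correlation about -0.8), the qualitative behaviour expected of a chain held together only by neighbouring contacts.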

    DISSECTION OF THE MECHANISMS CONTROLLING HIGH CONSTITUTIVE ACTIVITY OF HOUSEKEEPING AND TISSUE-SPECIFIC CIS-REGULATORY ELEMENTS

    Genetic information is identical across the cells of an organism, but the mechanisms by which different cell types achieve specialized functions while interpreting the same set of instructions are not completely understood. It is now increasingly accepted that combinations of different genomic elements, both promoters and enhancers, favor the recruitment of different transcription factors (TFs), which in turn promotes the assembly of different pre-initiation complexes, ensuring heterogeneous transcriptional outputs across tissues. Nevertheless, the cis-regulatory elements and transcriptional rules that control and maintain the expression of constitutively active genes are still poorly characterized. Specifically, it is unknown whether the constitutive activity of promoters and enhancers relies on entirely distinct or shared regulators. By dissecting the cis-regulatory repertoire of macrophages, we found that the ELF subfamily of ETS proteins selectively bound within 60 bp of the transcription start sites of highly active housekeeping genes. ELFs also bound constitutively active, but not poised, macrophage-specific enhancers and promoters. Multiple lines of evidence suggest a role for ELFs in promoting constitutive transcription: ELF sites enabled transcriptional activation by endogenous and minimal synthetic promoters; ELF recruitment was stabilized by the transcriptional machinery; and ELF proteins mediated the recruitment of transcriptional and chromatin regulators to core promoters. These data indicate that a distinct subfamily of ETS proteins imparts high transcriptional activity to a broad range of housekeeping and tissue-specific cis-regulatory elements, which is consistent with the role of an ETS family ancestor in core promoter regulation in a lower eukaryote.
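The positional finding, ELF binding within 60 bp of the TSS, suggests a simple sequence check. The sketch below scans both strands of a promoter sequence for a motif and reports hits whose full match lies inside a window around a given TSS. The default motif `GGAA` is the generic ETS core, used here as a simplifying assumption rather than the ELF-specific site the study characterized, and reading "within 60 bp" as a symmetric ±60 bp window is likewise an assumption. Real analyses would rely on position weight matrices and ChIP-seq peaks rather than exact string matches.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def motif_hits_near_tss(seq: str, tss: int, motif: str = "GGAA",
                        window: int = 60) -> List[Tuple[int, str]]:
    """Return (offset from TSS, strand) for exact motif matches inside tss +/- window.

    Offsets are 0-based, refer to the leftmost base of the match on the + strand,
    and are negative upstream of the TSS (assuming a + strand gene). A palindromic
    motif is reported once per strand.
    """
    seq, motif = seq.upper(), motif.upper()
    lo = max(0, tss - window)
    hi = min(len(seq), tss + window + 1)
    region = seq[lo:hi]
    hits = []
    for strand, pattern in (("+", motif), ("-", revcomp(motif))):
        start = region.find(pattern)
        while start != -1:
            hits.append((lo + start - tss, strand))
            start = region.find(pattern, start + 1)
    return sorted(hits)
```

Counting such hits across promoters grouped by activity would be one crude way to ask whether ETS-like sites are concentrated near the TSSs of highly active housekeeping genes.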