    Global and specific responses of the histone acetylome to systematic perturbation

    Regulation of histone acetylation is fundamental to the utilization of eukaryotic genomes in chromatin. Aberrant acetylation contributes to disease and can be clinically combated by inhibiting the responsible enzymes. Our knowledge of the histone acetylation system is patchy because we have so far lacked the methodology to describe acetylation patterns and their genesis by integrated enzyme activities. We devised a generally applicable, mass spectrometry-based strategy to precisely and accurately quantify combinatorial modification motifs. This was applied to generate a comprehensive inventory of acetylation motifs on histones H3 and H4 in Drosophila cells. Systematic depletion of known or suspected acetyltransferases and deacetylases revealed specific alterations of histone acetylation signatures, established enzyme-substrate relationships, and unveiled extensive crosstalk between neighboring modifications. Unexpectedly, overall histone acetylation levels remained remarkably constant upon depletion of individual acetyltransferases. Conceivably, the acetylation level is adjusted to maintain the global charge neutralization of chromatin and the stability of nuclei.

    Development of Computational Techniques for Regulatory DNA Motif Identification Based on Big Biological Data

    Accurate identification of regulatory DNA motifs (motifs) plays a fundamental role in elucidating transcriptional regulatory mechanisms in a cell and can strongly support the construction of regulatory networks for both prokaryotic and eukaryotic organisms. Next-generation sequencing techniques generate huge amounts of biological data for motif identification. Specifically, chromatin immunoprecipitation followed by high-throughput DNA sequencing (ChIP-seq) enables researchers to identify motifs on a genome scale. Recently, technological improvements have allowed DNA structural information to be obtained in a high-throughput manner, providing four DNA shape features. DNA shape has been found to complement genomic sequence in predicting transcription factor (TF)-DNA binding specificity with traditional machine learning models. Recent studies have demonstrated that deep learning (DL), especially the convolutional neural network (CNN), can identify motifs directly from DNA sequence. Although numerous algorithms and tools have been proposed and developed in this field, (1) the lack of intuitive and integrative web servers impedes the effective use of emerging algorithms and tools; (2) DNA shape has not been integrated with DL; and (3) existing DL models still suffer from high false-positive and false-negative rates in motif identification. This thesis begins by developing an integrated web server for motif identification based on DNA sequences either supplied by users or drawn from built-in databases. This web server supports further motif-related analysis as well as Cytoscape-like network interpretation and visualization. We then proposed a DL framework for identifying both sequence and shape motifs from ChIP-seq data using a binomial distribution strategy. This framework accepts as input different combinations of DNA sequence and DNA shape.
Finally, we developed a gated convolutional neural network (GCNN) to capture dependencies between motifs in long DNA sequences. Results show that our web server provides more comprehensive motif analysis functionality than existing web servers. The DL framework can identify motifs using an optimized threshold and reveals the strong predictive power of DNA shape for TF-DNA binding specificity. The identified sequence and shape motifs can help interpret TF-DNA binding mechanisms. Additionally, GCNN improves TF-DNA binding specificity prediction over CNN on most of the datasets.
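    The abstract does not describe the GCNN architecture. A widely used form of gated convolution, the gated linear unit (GLU), multiplies one convolution of the input by the sigmoid of a second convolution, so the second convolution acts as a learned gate on the first. The pure-Python sketch below assumes that form and one-hot DNA input (channels A, C, G, T); the function names and the hand-set kernel values are hypothetical, chosen for illustration rather than learned. Feeding DNA shape features in as extra input channels would be a further assumption not shown here.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as a list of 4-element vectors (A, C, G, T); other letters such as N become all zeros."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def conv1d(x, kernel, bias=0.0):
    """'Valid' 1D cross-correlation (as in CNN layers) of x (L x C) with kernel (w x C)."""
    w = len(kernel)
    out = []
    for i in range(len(x) - w + 1):
        s = bias
        for j in range(w):
            for c in range(len(kernel[j])):
                s += x[i + j][c] * kernel[j][c]
        out.append(s)
    return out


def sigmoid(z):
    """Numerically stable logistic function (avoids overflow in math.exp for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gated_conv1d(x, kernel_a, kernel_b, bias_a=0.0, bias_b=0.0):
    """GLU-style gated convolution: conv_a(x) * sigmoid(conv_b(x)), position by position."""
    a = conv1d(x, kernel_a, bias_a)
    b = conv1d(x, kernel_b, bias_b)
    return [ai * sigmoid(bi) for ai, bi in zip(a, b)]


if __name__ == "__main__":
    x = one_hot("ACGTA")
    cg_kernel = [[0, 1, 0, 0], [0, 0, 1, 0]]  # hypothetical filter matching the dinucleotide CG
    zero_gate = [[0, 0, 0, 0], [0, 0, 0, 0]]  # hypothetical gate filter: sigmoid(0) = 0.5 everywhere
    print(gated_conv1d(x, cg_kernel, zero_gate))
```

    Here the first kernel responds only to CG, and the all-zero gate kernel yields a constant gate of 0.5, so the run prints [0.0, 1.0, 0.0, 0.0]. In a trained model both kernels are learned, which lets the gate pass or suppress a motif match depending on its sequence context.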

    Systematic analysis of lysine acetyltransferases

    The word landscape of the non-coding segments of the Arabidopsis thaliana genome

    Background: Genome sequences can be conceptualized as arrangements of motifs or words. The frequencies and positional distributions of these words within particular non-coding genomic segments provide important insights into how the words function in processes such as mRNA stability and regulation of gene expression.
    Results: Using an enumerative word discovery approach, we investigated the frequencies and positional distributions of all 65,536 different 8-letter words in the genome of Arabidopsis thaliana. Focusing on promoter regions, introns, and 3' and 5' untranslated regions (3'UTRs and 5'UTRs), we compared word frequencies in these segments to genome-wide frequencies. The statistically interesting words in each segment were clustered with similar words to generate motif logos. We investigated whether words were clustered at particular locations or were distributed randomly within each genomic segment, and we classified the words using gene expression information from public repositories. Finally, we investigated whether particular sets of words appeared together more frequently than others.
    Conclusion: Our studies provide a detailed view of the word composition of several segments of the non-coding portion of the Arabidopsis genome. Each segment contains a unique word-based signature. The respective signatures consist of the sets of enriched words, 'unwords', and word pairs within a segment, as well as the preferential locations and functional classifications for the signature words. Additionally, the positional distributions of enriched words within the segments highlight possible functional elements, and the co-associations of words in promoter regions likely represent the formation of higher order regulatory modules. This work is an important step toward fully cataloguing the functional elements of the Arabidopsis genome.
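    The enumerative approach in the Results can be sketched as counting every overlapping k-letter word in a segment and comparing its relative frequency with the genome-wide one. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, only one strand is scanned, and a pseudocount-smoothed frequency ratio stands in for whatever statistical test the authors actually used, which the abstract does not specify.

```python
from collections import Counter
from itertools import product

DNA = "ACGT"


def all_words(k: int) -> list[str]:
    """Enumerate every possible k-letter DNA word (4**k of them; 65,536 for k = 8)."""
    return ["".join(p) for p in product(DNA, repeat=k)]


def count_words(seq: str, k: int) -> Counter:
    """Count overlapping k-letter words on one strand, skipping windows with non-ACGT letters (e.g. N)."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if all(base in DNA for base in word):
            counts[word] += 1
    return counts


def enrichment(segment: Counter, genome: Counter, k: int, pseudocount: float = 1.0) -> dict[str, float]:
    """Ratio of each word's relative frequency in a segment to its genome-wide relative frequency.

    The pseudocount (an assumption of this sketch) keeps words absent from one set
    from producing a division by zero; values > 1 mean over-representation in the segment.
    """
    words = all_words(k)
    seg_total = sum(segment.values()) + pseudocount * len(words)
    gen_total = sum(genome.values()) + pseudocount * len(words)
    return {
        w: ((segment[w] + pseudocount) / seg_total) / ((genome[w] + pseudocount) / gen_total)
        for w in words
    }
```

    For example, enrichment(count_words(promoters, 8), count_words(genome, 8), 8), with promoters and genome as hypothetical input strings, scores all 65,536 eight-letter words; sorting by score lists the most over- and under-represented words in that segment.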

    A Bayesian system for modeling promoter structure: A case study of histone promoters

    Ph.D. (Doctor of Philosophy)

    Computational analysis of transcriptional regulation in metazoans

    This HDR thesis presents my work on transcriptional regulation in metazoans (animals). As a computational biologist, I pursue both the development of new bioinformatics tools and contributions to a better understanding of biological questions. The first part focuses on transcription factors, with a study of the evolution of the Hox and ParaHox gene families across metazoans, for which I developed HoxPred, a bioinformatics tool that automatically classifies these genes into their homology groups. Transcription factors regulate their target genes by binding to short cis-regulatory elements in DNA. The second part of this thesis introduces the prediction of these cis-regulatory elements in genomic sequences, and my contributions to the development of user-friendly computational tools (the RSAT software suite and TRAP). The third part covers the detection of these cis-regulatory elements using high-throughput sequencing experiments such as ChIP-seq or ChIP-exo. The bioinformatics developments include reusable pipelines to process these datasets and novel motif analysis tools adapted to such large datasets (RSAT peak-motifs and ExoProfiler). As all these approaches are generic, I naturally apply them to diverse biological questions, in close collaboration with experimental groups. In particular, this third part presents the studies uncovering new DNA sequences that drive or prevent binding of the glucocorticoid receptor. Finally, my research perspectives are introduced, especially further developments within the RSAT suite enabling cross-species conservation analyses, and new collaborations with experimental teams, notably to tackle epigenomic remodelling during osteoporosis.
