Interpretable Machine Learning Methods for Prediction and Analysis of Genome Regulation in 3D
With the development of chromosome conformation capture-based techniques, we now know that chromatin is packed in three-dimensional (3D) space inside the cell nucleus. Changes in the 3D chromatin architecture have already been implicated in diseases such as cancer. Thus, a better understanding of this 3D conformation is of interest to help enhance our comprehension of the complex, multipronged regulatory mechanisms of the genome. The work described in this dissertation largely focuses on the development and application of interpretable machine learning methods for the prediction and analysis of long-range genomic interactions output from chromatin interaction experiments. In the first part, we demonstrate that the genetic sequence information at the genomic loci is predictive of the long-range interactions of a particular locus of interest (LoI). For example, the genetic sequence information at and around enhancers can help predict whether they interact with a promoter region of interest. This is achieved by building string kernel-based support vector classifiers together with two novel, intuitive visualization methods. These models suggest a potential general role of short tandem repeat motifs in 3D genome organization. However, the insights gained from these models are still coarse-grained. To address this, we devised a machine learning method, called CoMIK for Conformal Multi-Instance Kernels, capable of providing more fine-grained insights. When comparing sequences of variable length in the supervised learning setting, CoMIK can not only identify the features important for classification but also locate them within the sequence. Such precise identification of important segments of the whole sequence can help in gaining de novo insights into any role played by the intervening chromatin in long-range interactions.
Although CoMIK primarily uses only genetic sequence information, it can also simultaneously utilize other information modalities, such as the numerous functional genomics data, if available. The second part describes our pipeline, pHDee, for easy manipulation of large amounts of 3D genomics data. We used the pipeline to analyze HiChIP experimental data for studying the 3D architectural changes in Ewing sarcoma (EWS), which is a rare cancer affecting adolescents. In particular, HiChIP data for two experimental conditions, doxycycline-treated and untreated, and for primary tumor samples are analyzed. We demonstrate that pHDee facilitates processing and easy integration of large amounts of 3D genomics data analysis together with other data-intensive bioinformatics analyses.
DNaseI hypersensitivity at gene-poor, FSH dystrophy-linked 4q35.2
A subtelomeric region, 4q35.2, is implicated in facioscapulohumeral muscular dystrophy (FSHD), a dominant disease thought to involve local pathogenic changes in chromatin. FSHD patients have too few copies of a tandem 3.3-kb repeat (D4Z4) at 4q35.2. No phenotype is associated with having few copies of an almost identical repeat at 10q26.3. Standard expression analyses have not given definitive answers as to the genes involved. To investigate the pathogenic effects of short D4Z4 arrays on gene expression in the very gene-poor 4q35.2 and to find chromatin landmarks there for transcription control, unannotated genes and chromatin structure, we mapped DNaseI-hypersensitive (DH) sites in FSHD and control myoblasts. Using custom tiling arrays (DNase-chip), we found unexpectedly many DH sites in the two large gene deserts in this 4-Mb region. One site was seen preferentially in FSHD myoblasts. Several others were mapped >0.7 Mb from genes known to be active in the muscle lineage and were also observed in cultured fibroblasts, but not in lymphoid, myeloid or hepatic cells. Their selective occurrence in cells derived from mesoderm suggests functionality. Our findings indicate that the gene desert regions of 4q35.2 may have functional significance, possibly also to FSHD, despite their paucity of known genes
The non-coding genome in Autism Spectrum Disorders
Autism Spectrum Disorders (ASD) are a group of neurodevelopmental disorders (NDDs) characterized by difficulties in social interaction and communication, repetitive behavior, and restricted interests. While ASD have been shown to have a strong genetic component, current research largely focuses on coding regions of the genome. However, non-coding DNA, which makes up ∼99% of the human genome, has recently been recognized as an important contributor to the high heritability of ASD, and novel sequencing technologies have been a milestone in opening up new directions for the study of the gene regulatory networks embedded within the non-coding regions. Here, we summarize current progress on the contribution of non-coding alterations to the pathogenesis of ASD and provide an overview of existing methods allowing for the study of their functional relevance, discussing potential ways of unraveling ASD's “missing heritability”.
Thamodaran. P
Most genes are biallelically expressed, but imprinted genes exhibit monoallelic expression based on their parental origin. The control of genomic imprinting differs between flowering plants and mammals: for instance, in plants, imprinted genes are specifically activated by demethylation rather than targeted for silencing, and imprinted gene expression occurs in the endosperm. Imprinting also displays sexual dimorphism, such as differential timing of imprint establishment and an RNA-based silencing mechanism in paternally repressed imprinted genes. Within imprinted regions, the unusual occurrence and distribution of various types of repetitive elements may act as genomic imprinting signatures. Imprinting regulation at many loci probably involves insulator protein-dependent and higher-order chromatin interactions and/or non-coding RNA-mediated mechanisms. Placenta-specific imprinting, however, involves repressive histone modifications and non-coding RNAs. The higher-order chromatin interaction involves differentially methylated domains (DMDs) exhibiting sex-specific methylation that act as scaffolds for imprinting and regulate allele-specific imprinted gene expression. The paternally methylated differentially methylated regions (DMRs) contain fewer CpGs than the maternally methylated DMRs. The non-coding RNA-mediated mechanisms include C/D RNA and microRNA, which are involved in RNA-guided post-transcriptional RNA modifications and RNA-mediated gene silencing, respectively. The maintenance and reprogramming of imprinting are not significantly affected by reduced expression of Dicer1, and the evolution of imprinting might be related to the acquisition of DNMT3L (de novo methyltransferase 3L) by a common ancestor of eutherians and marsupials. The common features among diverse imprinting control elements and the evolutionary significance of imprinting still need to be identified.
RNA, the Epicenter of Genetic Information
The origin story and emergence of molecular biology is muddled. The early triumphs in bacterial genetics and the complexity of animal and plant genomes complicate an intricate history. This book documents the many advances, as well as the prejudices and founder fallacies. It highlights the premature relegation of RNA to simply an intermediate between gene and protein, the underestimation of the amount of information required to program the development of multicellular organisms, and the dawning realization that RNA is the cornerstone of cell biology, development, brain function and probably evolution itself. Key personalities, their hubris as well as prescient predictions are richly illustrated with quotes, archival material, photographs, diagrams and references to bring the people, ideas and discoveries to life, from the conceptual cradles of molecular biology to the current revolution in the understanding of genetic information. Key Features Documents the confused early history of DNA, RNA and proteins - a transformative history of molecular biology like no other. Integrates the influences of biochemistry and genetics on the landscape of molecular biology. Chronicles the important discoveries, preconceptions and misconceptions that retarded or misdirected progress. Highlights major pioneers and contributors to molecular biology, with a focus on RNA and noncoding DNA. Summarizes the mounting evidence for the central roles of non-protein-coding RNA in cell and developmental biology. Provides a thought-provoking retrospective and forward-looking perspective for advanced students and professional researchers
The evolutionary genomics of CTCF binding and functional signatures in mouse.
Genetic differences within and between species predominantly lie in the noncoding sequence of the regulatory regions of the genome, whose function and significance remain poorly understood. Despite significant progress in the field of genomics, rapid advances in sequencing methods and the subsequent explosion of genomic data, our understanding of the role of the non-coding genetic sequence in the regulation of tissue- and species-specific gene expression is still lagging behind, limiting our comprehension of the evolutionary mechanisms and pressures that shape those expression profiles, and their involvement in health and disease.
The CTCF protein demarcates mammalian genomes into discrete transcriptionally active domains, providing the platform for the complex spatial and temporal regulatory processing of genetic information that governs biological processes. In this thesis, I investigate the dynamics and functional implications of evolutionarily novel CTCF binding sites in two mouse subspecies of the genus Mus, Mus musculus domesticus and Mus musculus castaneus, separated by a short evolutionary time of only one million years. The project investigated the subspecies-specific binding of CTCF in terms of repeat content, evolution, functional impact and involvement in chromatin conformation. The key findings of this investigation are: (1) the incorporation of young CTCF sites into the non-coding genome via the action of transposable elements is followed rapidly by the exhibition of various characteristics of biological function; (2) unlike other tissue-specific transcription factors, allele-specific CTCF occupancy is affected by cis- and trans-acting regulatory mechanisms that exhibit similar functional characteristics; (3) CTCF evolutionary dynamics support the maintenance of pre-existing structures and functions and provide a template for novel ones.
In summary, this thesis discusses the evolutionary dynamics of CTCF genomic occupancy and functional signatures in short evolutionary time, and illustrates how either novel species-specific CTCF sites, or common sites with newly-acquired genotypic variants, integrate into existing genomic architecture and begin to exert their effects.
Exon-phase symmetry and intrinsic structural disorder promote modular evolution in the human genome
A key signature of module exchange in the genome is phase symmetry of exons, suggestive of exon shuffling events that occurred without disrupting translation reading frame. At the protein level, intrinsic structural disorder may be another key element because disordered regions often serve as functional elements that can be effectively integrated into a protein structure. Therefore, we asked whether exon-phase symmetry in the human genome and structural disorder in the human proteome are connected, signalling such evolutionary mechanisms in the assembly of multi-exon genes. We found an elevated level of structural disorder of regions encoded by symmetric exons and a preferred symmetry of exons encoding for mostly disordered regions (>70% predicted disorder). Alternatively spliced symmetric exons tend to correspond to the most disordered regions. The genes of mostly disordered proteins (>70% predicted disorder) tend to be assembled from symmetric exons, which often arise by internal tandem duplications. Preponderance of certain types of short motifs (e.g. SH3-binding motif) and domains (e.g. high-mobility group domains) suggests that certain disordered modules have been particularly effective in exon-shuffling events. Our observations suggest that structural disorder has facilitated modular assembly of complex genes in evolution of the human genome. © 2013 The Author(s)
Organization of chromosome ends in the rice blast fungus, Magnaporthe oryzae
Eukaryotic pathogens of humans often evade the immune system by switching the expression of surface proteins encoded by subtelomeric gene families. To determine if plant pathogenic fungi use a similar mechanism to avoid host defenses, we sequenced the 14 chromosome ends of the rice blast pathogen, Magnaporthe oryzae. One telomere is directly joined to ribosomal RNA-encoding genes, at the end of the ∼2 Mb rDNA array. Two are attached to chromosome-unique sequences, and the remainder adjoin a distinct subtelomere region, consisting of a telomere-linked RecQ-helicase (TLH) gene flanked by several blocks of tandem repeats. Unlike other microbes, M.oryzae exhibits very little gene amplification in the subtelomere regions—out of 261 predicted genes found within 100 kb of the telomeres, only four were present at more than one chromosome end. Therefore, it seems unlikely that M.oryzae uses switching mechanisms to evade host defenses. Instead, the M.oryzae telomeres have undergone frequent terminal truncation, and there is evidence of extensive ectopic recombination among transposons in these regions. We propose that the M.oryzae chromosome termini play more subtle roles in host adaptation by promoting the loss of terminally-positioned genes that tend to trigger host defenses
Role of siRNA pathway in epigenetic modifications of the Drosophila melanogaster X chromosome
Eukaryotic genomes are organized into large domains of coordinated regulation. The role of small RNAs in the formation of these domains is largely unexplored. An extraordinary example of domain-wide regulation is X chromosome compensation in Drosophila melanogaster males. This process occurs by hypertranscription of genes on the single male X chromosome. Extensive research in this field has shown that the Male Specific Lethal (MSL) complex binds X-linked genes and modifies chromatin to increase expression. The components of this complex, and their actions on chromatin, are well studied. In contrast, the mechanism that results in exclusive recruitment to the X chromosome is not understood. Our research focuses on the process by which male flies selectively modulate expression from their single X chromosome. Prior studies in the lab have found that the siRNAs produced from repetitive sequences on the X chromosome, and the repeat DNA itself, participate in dosage compensation in flies. Interestingly, the siRNA pathway contributes to X-localization of the MSL complex. The basis of enhanced localization is unknown, and no RNAi components have been found to interact directly with the MSL complex. This suggests that siRNA influences X-recognition by an indirect and novel mechanism. I found evidence that chromatin around these repeats is modulated by the siRNA pathway. I demonstrated that FLAG-tagged Argonaute2 protein localizes at these repeats. I show that numerous Argonaute2-interacting proteins show evidence of participation in compensation. One of these, Su(var)3-9, deposits H3K9me2 in and near the repeats. When a repeat-containing transgene is inserted on an autosome, H3K9me2 is enriched in surrounding chromatin, an effect that is enhanced by ectopic production of cognate siRNA.
In accord with the idea that these repeats contribute to recruitment of dosage compensation, genes as much as 100 kb from the autosomal insertion increase in expression upon expression of ectopic siRNA. My studies demonstrate that chromatin around a group of X-enriched sequences is modulated by siRNA, and support the idea that siRNA contributes to the elevated expression that characterizes the compensated male X chromosome. This study advances our understanding of the mechanism of X recognition by showing a direct relationship between siRNA-directed chromatin modification and a class of repetitive elements that helps mark the X chromosome.