Genetic sequence-based prediction of long-range chromatin interactions suggests a potential role of short tandem repeat sequences in genome organization

Author: A Malaspina
A Sanyal
A Tanay
BE Boser
C Elkan
C Widmer
E de Wit
E Lieberman-Aiden
F Ay
G Rätsch
G Rätsch
H Hamada
J Dekker
J Dekker
J Dostie
J Harrow
JO Yáñez-Cuna
JR Dixon
JR Hughes
KJ Brookes
L Jacob
M Simonis
MJ Fullwood
MJ Zeitz
N Cope
N Heidari
N Varoquaux
Nico Pfeifer
P Meinicke
P Vogt
R Edgar
S Ramamoorthy
Sarvesh Nikumbh
SSP Rao
T Evgeniou
T Evgeniou
T Lingner
TD Schneider
WA Bickmore
Z Zhao
Publication venue: 'Springer Science and Business Media LLC'

Interpretable Machine Learning Methods for Prediction and Analysis of Genome Regulation in 3D

Author: Nikumbh Sarvesh
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2019

With the development of chromosome conformation capture-based techniques, we now know that chromatin is packed in three-dimensional (3D) space inside the cell nucleus. Changes in the 3D chromatin architecture have already been implicated in diseases such as cancer. Thus, a better understanding of this 3D conformation is of interest to help enhance our comprehension of the complex, multipronged regulatory mechanisms of the genome. The work described in this dissertation largely focuses on the development and application of interpretable machine learning methods for prediction and analysis of long-range genomic interactions output from chromatin interaction experiments. In the first part, we demonstrate that the genetic sequence information at the genomic loci is predictive of the long-range interactions of a particular locus of interest (LoI). For example, the genetic sequence information at and around enhancers can help predict whether they interact with a promoter region of interest. This is achieved by building string kernel-based support vector classifiers together with two novel, intuitive visualization methods. These models suggest a potential general role of short tandem repeat motifs in 3D genome organization. However, the insights gained from these models are still coarse-grained. To address this, we devised a machine learning method, called CoMIK (Conformal Multi-Instance Kernels), capable of providing more fine-grained insights. When comparing sequences of variable length in the supervised learning setting, CoMIK can not only identify the features important for classification but also locate them within the sequence. Such precise identification of important segments of the whole sequence can help in gaining de novo insights into any role played by the intervening chromatin in long-range interactions. Although CoMIK primarily uses only genetic sequence information, it can also simultaneously utilize other information modalities, such as the numerous functional genomics data, if available. The second part describes our pipeline, pHDee, for easy manipulation of large amounts of 3D genomics data. We used the pipeline to analyze HiChIP experimental data for studying the 3D architectural changes in Ewing sarcoma (EWS), a rare cancer affecting adolescents. In particular, HiChIP data for two experimental conditions, doxycycline-treated and untreated, and for primary tumor samples are analyzed. We demonstrate that pHDee facilitates processing and easy integration of large amounts of 3D genomics data analysis together with other data-intensive bioinformatics analyses.

Universaar

MPG.PuRe

DNaseI hypersensitivity at gene-poor, FSH dystrophy-linked 4q35.2

Author: Alan P. Boyle
Alexiadis
Bagheri-Fam
Barski
Benko
Benson
Blackledge
Bosnakovski
Boyle
Buzhov
Caron
Clapp
Costantini
Crawford
Crawford
de Greef
Deak
Dixit
Ehrlich
Gabellini
Gabellini
Gabriels
Gelfand
Gregory E. Crawford
Gross
Guelen
Hillier
Hou
Hou
Janet Sowden
Jeong
Jiang
Jiang
Kadauke
Kikin
Kim
Koji Tsumagari
Laoudj-Chenivesse
Lemmers
Lemmers
Lingyun Song
Lyle
Masny
Masui
McCann
Melanie Ehrlich
Murmann
Nobrega
Nowbakht
Oliver
Osborne
Ottaviani
Ottaviani
Ovcharenko
Pauler
Phillips
Rabi Tawil
Scacheri
Snider
Stein
Sun
Tanoue
Terrence S. Furey
Tian
Tsumagari
van der Maarel
van Deutekom
van Deutekom
van Geel
van Overveld
Venter
Wang
Winokur
Winokur
Xi
Xueqing Xu
Yang
Zeng
Zuker
Publication venue: Oxford University Press
Publication date: 01/01/2009

A subtelomeric region, 4q35.2, is implicated in facioscapulohumeral muscular dystrophy (FSHD), a dominant disease thought to involve local pathogenic changes in chromatin. FSHD patients have too few copies of a tandem 3.3-kb repeat (D4Z4) at 4q35.2. No phenotype is associated with having few copies of an almost identical repeat at 10q26.3. Standard expression analyses have not given definitive answers as to the genes involved. To investigate the pathogenic effects of short D4Z4 arrays on gene expression in the very gene-poor 4q35.2 and to find chromatin landmarks there for transcription control, unannotated genes and chromatin structure, we mapped DNaseI-hypersensitive (DH) sites in FSHD and control myoblasts. Using custom tiling arrays (DNase-chip), we found unexpectedly many DH sites in the two large gene deserts in this 4-Mb region. One site was seen preferentially in FSHD myoblasts. Several others were mapped >0.7 Mb from genes known to be active in the muscle lineage and were also observed in cultured fibroblasts, but not in lymphoid, myeloid or hepatic cells. Their selective occurrence in cells derived from mesoderm suggests functionality. Our findings indicate that the gene desert regions of 4q35.2 may have functional significance, possibly also to FSHD, despite their paucity of known genes

CiteSeerX

Crossref

PubMed Central

The non-coding genome in Autism Spectrum Disorders

Author: Carracedo Álvarez Ángel María
Domínguez Alonso Sara
Rodríguez Fontenla María Cristina
Publication venue: 'Elsevier BV'
Publication date: 01/01/2023

Autism Spectrum Disorders (ASD) are a group of neurodevelopmental disorders (NDDs) characterized by difficulties in social interaction and communication, repetitive behavior, and restricted interests. While ASD have been proven to have a strong genetic component, current research largely focuses on coding regions of the genome. However, non-coding DNA, which makes up ∼99% of the human genome, has recently been recognized as an important contributor to the high heritability of ASD, and novel sequencing technologies have been a milestone in opening up new directions for the study of the gene regulatory networks embedded within the non-coding regions. Here, we summarize current progress on the contribution of non-coding alterations to the pathogenesis of ASD and provide an overview of existing methods allowing for the study of their functional relevance, discussing potential ways of unraveling ASD's “missing heritability”.

Repositorio Institucional da Universidade de Santiago de Compostela

Thamodaran. P

Author: Muniswamy K
Thamodaran P
Publication venue: 'Academic Journals'
Publication date: 27/08/2013

Most genes are biallelically expressed, but imprinted genes exhibit monoallelic expression based on their parental origin. Genomic imprinting is controlled differently in flowering plants and mammals: in plants, for instance, imprinted genes are specifically activated by demethylation rather than targeted for silencing, and imprinted gene expression occurs in the endosperm. Imprinting also displays sexual dimorphism, such as differential timing of imprint establishment and an RNA-based silencing mechanism in paternally repressed imprinted genes. Within imprinted regions, the unusual occurrence and distribution of various types of repetitive elements may act as genomic imprinting signatures. Imprinting regulation at many loci probably involves insulator protein-dependent and higher-order chromatin interactions, and/or non-coding RNA-mediated mechanisms. However, placenta-specific imprinting involves repressive histone modifications and non-coding RNAs. The higher-order chromatin interaction involves differentially methylated domains (DMDs) exhibiting sex-specific methylation that act as a scaffold for imprinting and regulate allele-specific imprinted gene expression. The paternally methylated differentially methylated regions (DMRs) contain fewer CpGs than the maternally methylated DMRs. The non-coding RNA-mediated mechanisms include C/D RNAs and microRNAs, which are involved in RNA-guided post-transcriptional RNA modifications and RNA-mediated gene silencing, respectively. The maintenance and reprogramming of imprinting are not significantly affected by reduced expression of Dicer1, and the evolution of imprinting might be related to the acquisition of DNMT3L (de novo methyltransferase 3L) by a common ancestor of eutherians and marsupials. The common features among diverse imprinting control elements and the evolutionary significance of imprinting need to be identified.

KRISHI Publications and Data Repository

RNA, the Epicenter of Genetic Information

Author: Amaral Paulo
Mattick John
Publication venue: 'Informa UK Limited'
Publication date: 22/07/2022

The origin story and emergence of molecular biology is muddled. The early triumphs in bacterial genetics and the complexity of animal and plant genomes complicate an intricate history. This book documents the many advances, as well as the prejudices and founder fallacies. It highlights the premature relegation of RNA to simply an intermediate between gene and protein, the underestimation of the amount of information required to program the development of multicellular organisms, and the dawning realization that RNA is the cornerstone of cell biology, development, brain function and probably evolution itself. Key personalities, their hubris as well as prescient predictions are richly illustrated with quotes, archival material, photographs, diagrams and references to bring the people, ideas and discoveries to life, from the conceptual cradles of molecular biology to the current revolution in the understanding of genetic information. Key features: Documents the confused early history of DNA, RNA and proteins, a transformative history of molecular biology like no other. Integrates the influences of biochemistry and genetics on the landscape of molecular biology. Chronicles the important discoveries, preconceptions and misconceptions that retarded or misdirected progress. Highlights major pioneers and contributors to molecular biology, with a focus on RNA and noncoding DNA. Summarizes the mounting evidence for the central roles of non-protein-coding RNA in cell and developmental biology. Provides a thought-provoking retrospective and forward-looking perspective for advanced students and professional researchers.

Directory of Open Access Books (DOAB)

The evolutionary genomics of CTCF binding and functional signatures in mouse.

Author: Azazi Dhoyazan Mohammed Ali
Publication venue: University of Cambridge
Publication date: 24/02/2020

Genetic differences within and between species predominantly lie in the noncoding sequence of the regulatory regions of the genome, whose function and significance remain poorly understood. Despite significant progress in the field of genomics, the rapid progress in sequencing methods and the subsequent explosion of genomic data, our understanding of the role of the non-coding genetic sequence in the regulation of tissue- and species-specific gene expression is still lagging behind, limiting our comprehension of the evolutionary mechanisms and pressures that shape those expression profiles, and their involvement in health and disease. The CTCF protein demarcates mammalian genomes into discrete transcriptionally active domains, providing the platform for complex spatial and temporal regulatory processing of genetic information that governs biological processes. In this thesis, I investigate the dynamics and functional implications of evolutionarily novel CTCF binding sites in two Mus genus mouse subspecies, Mus musculus domesticus and Mus musculus castaneus, separated by a short evolutionary time of only one million years. The project investigated the subspecies-specific binding of CTCF in terms of the repeat content, evolution, functional impact and involvement in chromatin conformation. The key findings of this investigation are: (1) the incorporation of young CTCF sites into the non-coding genome via the action of transposable elements is followed rapidly by the exhibition of various characteristics of biological function; (2) unlike other tissue-specific transcription factors, allele-specific CTCF occupancy is affected by cis- and trans-acting regulatory mechanisms that exhibit similar functional characteristics; (3) CTCF evolutionary dynamics support the maintenance of pre-existing structures and functions and provide a template for novel ones. 
In summary, this thesis discusses the evolutionary dynamics of CTCF genomic occupancy and functional signatures in short evolutionary time, and illustrates how either novel species-specific CTCF sites, or common sites with newly-acquired genotypic variants integrate into existing genomic architecture and begin to exert their effects

Apollo (Cambridge)

Exon-phase symmetry and intrinsic structural disorder promote modular evolution in the human genome

Author: Adams
Balazs
Buljan
Burra
Corvelo
Daughdrill
Davey
Davey
Diella
Dosztanyi
Dosztanyi
Dyson
Eva Schad
Fedorov
Fisher
Fujita
Fuxreiter
Fuxreiter
Gilbert
Greaser
Grover
Hernandez
Kaessmann
Kalmar
Kaplon
Kato
Kawasaki
Kiss
Kiss
Kovacs
Lajos Kalmar
Lee
Li
Long
Meszaros
Mittag
Modrek
Mosca
Oliver
Pancsa
Patthy
Patthy
Patthy
Pentony
Peter Tompa
Punta
Romero
Sarkar
Seet
Sire
Tompa
Tompa
Tompa
Tompa
Tompa
Tompa
Uversky
Van Roey
Vucetic
Ward
Weatheritt
Weatheritt
Zhang
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2013

A key signature of module exchange in the genome is phase symmetry of exons, suggestive of exon shuffling events that occurred without disrupting translation reading frame. At the protein level, intrinsic structural disorder may be another key element because disordered regions often serve as functional elements that can be effectively integrated into a protein structure. Therefore, we asked whether exon-phase symmetry in the human genome and structural disorder in the human proteome are connected, signalling such evolutionary mechanisms in the assembly of multi-exon genes. We found an elevated level of structural disorder of regions encoded by symmetric exons and a preferred symmetry of exons encoding for mostly disordered regions (>70% predicted disorder). Alternatively spliced symmetric exons tend to correspond to the most disordered regions. The genes of mostly disordered proteins (>70% predicted disorder) tend to be assembled from symmetric exons, which often arise by internal tandem duplications. Preponderance of certain types of short motifs (e.g. SH3-binding motif) and domains (e.g. high-mobility group domains) suggests that certain disordered modules have been particularly effective in exon-shuffling events. Our observations suggest that structural disorder has facilitated modular assembly of complex genes in evolution of the human genome. © 2013 The Author(s)

Crossref

Repository of the Academy's Library

Organization of chromosome ends in the rice blast fungus, Magnaporthe oryzae

Author: Adam
Altschul
Andrulis
Aparicio
Arkhipova
Bachrati
Bailey
Barry
Barry
Bass
Bateman
Baur
Bendtsen
Benson
Berriman
Bhattacharyya
Biessmann
Bok
Bonman
Britten
Broun
Brown
Butler
Carlson
Cathryn Rehmeyer
Chan
Chao
Charron
Chuck Staben
Cooper
Copenhaver
Corcoran
Couch
De Las Penas
Dean
Dernburg
Dioh
Donelson
Dore
Doug Brown
Duraisingh
Ellis
Ewing
Ewing
Fajkus
Farman
Farman
Farman
Farman
Flint
Freitas-Junior
Freitas-Junior
Gancelo
Gao
Gardiner
Gilson
Gonzalez
Gordon
Gotta
Gottschling
Hebert
Hecht
Hernandez-Rivas
Hernandez-Rivas
Inglis
Kang
Keely
Krauskopf
Laroche
Leech
Levis
Li
Liti
Louis
Louis
Luo
Mandell
Mark Farman
McKnight
Mefford
Meyne
Motoaki Kusaba
Moxon
Nakamura
Nakamura
Nakayashiki
Naumov
Naumov
Naumov
Naumov
Naumov
Nitta
Noutoshi
Orbach
Pace
Panaccione
Pays
Pedley
Penton
Perez-Gonzalez
Peyret
Proctor
Pryde
Pryde
Rachidi
Ralph Dean
Ransom
Regad
Richards
Riethman
Robinson
Royle
Salamov
Sanchez-Alonso
Schaffzin
Scherf
Scherf
Shore
Skinner
Tanaka
Teunissen
Trask
Tudzynski
Tudzynski
Valent
van Brabant
van Steensel
Vanhamme
Viswanathan
Wada
Wallrath
Walmsley
Weixi Li
Wicky
Wolpert
Yang
Yu
Yu
Yun-Sik Kim
Zakian
Zauner
Zou
Publication venue: Oxford University Press
Publication date: 01/01/2006

Eukaryotic pathogens of humans often evade the immune system by switching the expression of surface proteins encoded by subtelomeric gene families. To determine if plant pathogenic fungi use a similar mechanism to avoid host defenses, we sequenced the 14 chromosome ends of the rice blast pathogen, Magnaporthe oryzae. One telomere is directly joined to ribosomal RNA-encoding genes, at the end of the ∼2 Mb rDNA array. Two are attached to chromosome-unique sequences, and the remainder adjoin a distinct subtelomere region, consisting of a telomere-linked RecQ-helicase (TLH) gene flanked by several blocks of tandem repeats. Unlike other microbes, M. oryzae exhibits very little gene amplification in the subtelomere regions: out of 261 predicted genes found within 100 kb of the telomeres, only four were present at more than one chromosome end. Therefore, it seems unlikely that M. oryzae uses switching mechanisms to evade host defenses. Instead, the M. oryzae telomeres have undergone frequent terminal truncation, and there is evidence of extensive ectopic recombination among transposons in these regions. We propose that the M. oryzae chromosome termini play more subtle roles in host adaptation by promoting the loss of terminally positioned genes that tend to trigger host defenses.

CiteSeerX

Crossref

PubMed Central

Role Of Sirna Pathway In Epigenetic Modifications Of The Drosophila Melanogaster X Chromosome

Author: Deshpande Nikita
Publication venue: DigitalCommons@WayneState
Publication date: 01/01/2018

Eukaryotic genomes are organized into large domains of coordinated regulation. The role of small RNAs in the formation of these domains is largely unexplored. An extraordinary example of domain-wide regulation is X chromosome compensation in Drosophila melanogaster males. This process occurs by hypertranscription of genes on the single male X chromosome. Extensive research in this field has shown that the Male Specific Lethal (MSL) complex binds X-linked genes and modifies chromatin to increase expression. The components of this complex, and their actions on chromatin, are well studied. In contrast, the mechanism that results in exclusive recruitment to the X chromosome is not understood. Our research focuses on the process by which male flies selectively modulate expression from their single X chromosome. Prior studies in the lab have found that the siRNAs produced from repetitive sequences on the X chromosome, and the repeat DNA itself, participate in dosage compensation in flies. Interestingly, the siRNA pathway contributes to X-localization of the MSL complex. The basis of enhanced localization is unknown, and no RNAi components have been found to interact directly with the MSL complex. This suggests that siRNA influences X-recognition by an indirect and novel mechanism. I found evidence that chromatin around these repeats is modulated by the siRNA pathway. I demonstrated that FLAG-tagged Argonaute2 protein localizes at these repeats. I show that numerous Argonaute2-interacting proteins show evidence of participation in compensation. One of these, Su(var)3-9, deposits H3K9me2 in and near the repeats. When a repeat-containing transgene is inserted on an autosome, H3K9me2 is enriched in surrounding chromatin, an effect that is enhanced by ectopic production of cognate siRNA. In accord with the idea that these repeats contribute to recruitment of dosage compensation, genes as much as 100 kb from the autosomal insertion increase in expression upon expression of ectopic siRNA. My studies demonstrate that chromatin around a group of X-enriched sequences is modulated by siRNA, and support the idea that siRNA contributes to the elevated expression that characterizes the compensated male X chromosome. This study advances our understanding of the mechanism of X recognition by showing a direct relationship between siRNA-directed chromatin modification and a class of repetitive elements that helps mark the X chromosome.

Digital Commons@Wayne State University