The role of mutation rate variation and genetic diversity in the architecture of human disease
Background
We have investigated the role that the mutation rate and the structure of genetic variation at a locus play in determining whether a gene is involved in disease. We predict that both the mutation rate and genetic diversity should be higher in genes associated with disease, unless all genes that could cause disease have already been identified.
Results
Consistent with our predictions, we find that genes associated with Mendelian and complex disease are substantially longer than non-disease genes. However, we find that both Mendelian and complex disease genes are found in regions of the genome with relatively low mutation rates, as inferred from intron divergence between humans and chimpanzees, and they are predicted to have similar rates of non-synonymous mutation as other genes. Finally, we find that disease genes are in regions of significantly elevated genetic diversity, even when variation in the rate of mutation is controlled for. Nevertheless, the effect is small.
Conclusions
Our results suggest that gene length contributes to whether a gene is associated with disease. However, the mutation rate and the genetic architecture of the locus appear to play only a minor role in determining whether a gene is associated with disease.
eFORGE v2.0: updated analysis of cell type-specific signal in epigenomic data
SUMMARY: The Illumina Infinium EPIC BeadChip is a new high-throughput array for DNA methylation analysis, extending the earlier 450k array by over 400,000 new sites. Previously, a method named eFORGE was developed to provide insights into cell type-specific and cell composition effects for 450k data. Here, we present a significantly updated and improved version of eFORGE that can analyse both EPIC and 450k array data. New features include analysis of chromatin states, TF motifs and DNase I footprints, providing tools for EWAS interpretation and epigenome editing. AVAILABILITY: eFORGE v2.0 is implemented as a web tool available from https://eforge.altiusinstitute.org and https://eforge-tf.altiusinstitute.org/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Genetic, environmental and stochastic factors in monozygotic twin discordance with a focus on epigenetic differences
PMCID: PMC3566971. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The Hellenic type of nondeletional hereditary persistence of fetal hemoglobin results from a novel mutation (g.-109G>T) in the HBG2 gene promoter
Nondeletional hereditary persistence of fetal hemoglobin (nd-HPFH), a rare hereditary condition resulting in elevated levels of fetal hemoglobin (Hb F) in adults, is associated with promoter mutations in the human fetal globin (HBG1 and HBG2) genes. In this paper, we report a novel type of nd-HPFH due to a HBG2 gene promoter mutation (HBG2:g.-109G>T). This mutation, located at the 3′ end of the HBG2 distal CCAAT box, was initially identified in an adult female subject of Central Greek origin and results in elevated Hb F levels (4.1%) and significantly increased Gγ-globin chain production (79.2%). Family studies and DNA analysis revealed that the HBG2:g.-109G>T mutation is also found in the family members in compound heterozygosity with the HBG2:g.-158C>T single nucleotide polymorphism or the silent HBB:g.-101C>T β-thalassemia mutation, resulting in the latter case in significantly elevated Hb F levels (14.3%). Electrophoretic mobility shift analysis revealed that the HBG2:g.-109G>T mutation abolishes a transcription factor binding site, consistent with previous observations using DNA footprinting analysis, suggesting that guanine at position HBG2/1:g.-109 is critical for NF-E3 binding. These data suggest that the HBG2:g.-109G>T mutation has a functional role in increasing HBG2 transcription and is responsible for the HPFH phenotype observed in our index cases.
An Integrated Model of Multiple-Condition ChIP-Seq Data Reveals Predeterminants of Cdx2 Binding
Regulatory proteins can bind to different sets of genomic targets in various cell types or conditions. To reliably characterize such condition-specific regulatory binding we introduce MultiGPS, an integrated machine learning approach for the analysis of multiple related ChIP-seq experiments. MultiGPS is based on a generalized Expectation Maximization framework that shares information across multiple experiments for binding event discovery. We demonstrate that our framework enables the simultaneous modeling of sparse condition-specific binding changes, sequence dependence, and replicate-specific noise sources. MultiGPS encourages consistency in reported binding event locations across multiple-condition ChIP-seq datasets and provides accurate estimation of ChIP enrichment levels at each event. MultiGPS's multi-experiment modeling approach thus provides a reliable platform for detecting differential binding enrichment across experimental conditions. We demonstrate the advantages of MultiGPS with an analysis of Cdx2 binding in three distinct developmental contexts. By accurately characterizing condition-specific Cdx2 binding, MultiGPS enables novel insight into the mechanistic basis of Cdx2 site selectivity. Specifically, the condition-specific Cdx2 sites characterized by MultiGPS are highly associated with pre-existing genomic context, suggesting that such sites are pre-determined by cell-specific regulatory architecture. However, MultiGPS-defined condition-independent sites are not predicted by pre-existing regulatory signals, suggesting that Cdx2 can bind to a subset of locations regardless of genomic environment. A summary of this paper appears in the proceedings of the RECOMB 2014 conference, April 2–5. Funding: National Science Foundation (U.S.) (Graduate Research Fellowship under Grant 0645960); National Institutes of Health (U.S.) (grant P01 NS055923); Pennsylvania State University. Center for Eukaryotic Gene Regulation.
Predicting Human Nucleosome Occupancy from Primary Sequence
Nucleosomes are the fundamental repeating unit of chromatin and comprise the structural building blocks of the living eukaryotic genome. Micrococcal nuclease (MNase) has long been used to delineate nucleosomal organization. Microarray-based nucleosome mapping experiments in yeast chromatin have revealed regularly-spaced translational phasing of nucleosomes. These data have been used to train computational models of sequence-directed nucleosome positioning, which have identified ubiquitous strong intrinsic nucleosome positioning signals. Here, we successfully apply this approach to nucleosome positioning experiments from human chromatin. The predictions made by the human-trained and yeast-trained models are strongly correlated, suggesting a shared mechanism for sequence-based determination of nucleosome occupancy. In addition, we observed striking complementarity between classifiers trained on experimental data from weakly versus heavily digested MNase samples. In the former case, the resulting model accurately identifies nucleosome-forming sequences; in the latter, the classifier excels at identifying nucleosome-free regions. Using this model we are able to identify several characteristics of nucleosome-forming and nucleosome-disfavoring sequences. First, by combining results from each classifier applied de novo across the human ENCODE regions, the classifier reveals distinct sequence composition and periodicity features of nucleosome-forming and nucleosome-disfavoring sequences. Short runs of dinucleotide repeats appear as a hallmark of nucleosome-disfavoring sequences, while nucleosome-forming sequences contain short periodic runs of GC base pairs. Second, we show that nucleosome phasing is most frequently predicted flanking nucleosome-free regions. The results suggest that the major mechanism of nucleosome positioning in vivo is boundary-event-driven and affirm the classical statistical positioning theory of nucleosome organization.
Chromatin loop anchors are associated with genome instability in cancer and recombination hotspots in the germline
BACKGROUND: Chromatin loops form a basic unit of interphase nuclear organization, with chromatin loop anchor points providing contacts between regulatory regions and promoters. However, the mutational landscape at these anchor points remains under-studied. Here, we describe the unusual patterns of somatic mutations and germline variation associated with loop anchor points and explore the underlying features influencing these patterns. RESULTS: Analyses of whole genome sequencing datasets reveal that anchor points are strongly depleted for single nucleotide variants (SNVs) in tumours. Despite low SNV rates in their genomic neighbourhood, anchor points emerge as sites of evolutionary innovation, showing enrichment for structural variant (SV) breakpoints and a peak of SNVs at focal CTCF sites within the anchor points. Both CTCF-bound and non-CTCF anchor points harbour an excess of SV breakpoints in multiple tumour types and are prone to double-strand breaks in cell lines. Common fragile sites, which are hotspots for genome instability, also show elevated numbers of intersecting loop anchor points. Recurrently disrupted anchor points are enriched for genes with functions in cell cycle transitions and regions associated with predisposition to cancer. We also discover a novel class of CTCF-bound anchor points which overlap meiotic recombination hotspots and are enriched for the core PRDM9 binding motif, suggesting that the anchor points have been foci for diversity generated during recent human evolution. CONCLUSIONS: We suggest that the unusual chromatin environment at loop anchor points underlies the elevated rates of variation observed, marking them as sites of regulatory importance but also genomic fragility.
Genome-wide binding of the CRISPR endonuclease Cas9 in mammalian cells
Bacterial type II CRISPR-Cas9 systems have been widely adapted for RNA-guided genome editing and transcription regulation in eukaryotic cells, yet their in vivo target specificity is poorly understood. Here we mapped genome-wide binding sites of a catalytically inactive Cas9 (dCas9) from Streptococcus pyogenes loaded with single guide RNAs (sgRNAs) in mouse embryonic stem cells (mESCs). Each of the four sgRNAs we tested targets dCas9 to between tens and thousands of genomic sites, frequently characterized by a 5-nucleotide seed region in the sgRNA and an NGG protospacer adjacent motif (PAM). Chromatin inaccessibility decreases dCas9 binding to other sites with matching seed sequences; thus 70% of off-target sites are associated with genes. Targeted sequencing of 295 dCas9 binding sites in mESCs transfected with catalytically active Cas9 identified only one site mutated above background levels. We propose a two-state model for Cas9 binding and cleavage, in which a seed match triggers binding but extensive pairing with target DNA is required for cleavage. Funding: National Institutes of Health (U.S.) (Grant RO1-GM34277); National Institutes of Health (U.S.) (Grant R01-CA133404); National Cancer Institute (U.S.) (Grant PO1-CA42063); National Cancer Institute (U.S.) (Cancer Center Support (Core) Grant P30-CA14051); National Institutes of Health (U.S.) (Director's Pioneer Award 1DP1-MH100706); Damon Runyon Cancer Research Foundation; Kinship Foundation. Searle Scholars Program; Simons Foundation.
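The seed-plus-PAM pattern described in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' analysis pipeline: the function name `find_seed_pam_sites` is hypothetical, the guide is written as DNA in protospacer orientation, the seed is assumed to be the PAM-proximal (3′) end of the guide, and only the forward strand is scanned. Chromatin accessibility, which the abstract reports as a further determinant of binding, is not modeled.

```python
import re


def find_seed_pam_sites(genome: str, guide: str, seed_len: int = 5) -> list[int]:
    """Return 0-based forward-strand positions where the guide's seed is
    immediately followed by an NGG PAM.

    Assumptions (illustrative only): the guide is given as DNA in
    protospacer orientation, and the seed is its last `seed_len` bases.
    """
    seed = guide[-seed_len:].upper()  # PAM-proximal end of the guide
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(f"(?=({re.escape(seed)}[ACGT]GG))")
    return [match.start() for match in pattern.finditer(genome.upper())]


# Toy sequence: two seed+PAM matches, plus one seed match without a PAM.
toy_genome = "TTACGTAAGGCCACGTATGGACGTACAG"
toy_guide = "GGGGGGGGGGGGGGGACGTA"  # hypothetical 20-nt guide; seed is ACGTA
print(find_seed_pam_sites(toy_genome, toy_guide))
```

A full scan would also search the reverse complement and could filter candidates by an accessibility track; both are left out here to keep the sketch short.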