16 research outputs found

HOT or not: Examining the basis of high-occupancy target regions

Author: Akalin A.
Franke V.
Uyar B.
Wreczycka K.
Wurmus R.
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 31/07/2017

High-occupancy target (HOT) regions are segments of the genome with an unusually high number of transcription factor binding sites. These regions are observed in multiple species and are thought to have biological importance due to high transcription factor occupancy. Furthermore, they coincide with housekeeping gene promoters, and the associated genes are stably expressed across multiple cell types. Despite these features, HOT regions are defined solely using ChIP-seq experiments and have been shown to lack canonical motifs for the transcription factors that are thought to be bound there. Although ChIP-seq experiments are the gold standard for finding genome-wide binding sites of a protein, they are not noise-free. Here, we show that HOT regions are likely to be ChIP-seq artifacts and that they are similar to previously proposed “hyper-ChIPable” regions. Using ChIP-seq data sets for knocked-out transcription factors, we demonstrate the presence of false positive signals on HOT regions. We observe sequence characteristics and genomic features that are discriminatory of HOT regions, such as GC/CpG-rich k-mers and enrichment of RNA-DNA hybrids (R-loops) and DNA tertiary structures (G-quadruplex DNA). The artificial ChIP-seq enrichment on HOT regions could be associated with these discriminatory features. Furthermore, we propose strategies to deal with such artifacts in future ChIP-seq studies.

MDC Repository

Strategies for analyzing bisulfite sequencing data

Author: Akalin A.
Assenov Y.
Gosdschan A.
Gruening B.
Wreczycka K.
Yusuf D.
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 09/08/2017

DNA methylation is one of the main epigenetic modifications in the eukaryotic genome and has been shown to play a role in cell-type-specific regulation of gene expression, and therefore cell-type identity. Bisulfite sequencing is the gold standard for measuring methylation over the genomes of interest. Here, we review several techniques used for the analysis of high-throughput bisulfite sequencing. We introduce specialized short-read alignment techniques as well as pre/post-alignment quality check methods to ensure data quality. Furthermore, we discuss subsequent analysis steps after alignment. We introduce various differential methylation methods and compare their performance using simulated and real bisulfite sequencing datasets. We also discuss the methods used to segment methylomes in order to pinpoint regulatory regions. We introduce annotation methods that can be used for further classification of regions returned by segmentation or differential methylation methods. Lastly, we review software packages that implement strategies to efficiently deal with large bisulfite sequencing datasets locally and also discuss online analysis workflows that do not require any prior programming skills. The analysis strategies described in this review will guide researchers at any level to the best practices of bisulfite sequencing analysis.

MDC Repository

Strategies for analyzing bisulfite sequencing data

Author: Akalin A.
Assenov Y.
Gosdschan A.
Grüning B.
Wreczycka K.
Yusuf D.
Publication venue: 'Elsevier BV'
Publication date: 10/11/2017

DNA methylation is one of the main epigenetic modifications in the eukaryotic genome; it has been shown to play a role in cell-type-specific regulation of gene expression, and therefore cell-type identity. Bisulfite sequencing is the gold standard for measuring methylation over the genomes of interest. Here, we review several techniques used for the analysis of high-throughput bisulfite sequencing. We introduce specialized short-read alignment techniques as well as pre/post-alignment quality check methods to ensure data quality. Furthermore, we discuss subsequent analysis steps after alignment. We introduce various differential methylation methods and compare their performance using simulated and real bisulfite sequencing datasets. We also discuss the methods used to segment methylomes in order to pinpoint regulatory regions. We introduce annotation methods that can be used for further classification of regions returned by segmentation and differential methylation methods. Finally, we review software packages that implement strategies to efficiently deal with large bisulfite sequencing datasets locally, and we discuss online analysis workflows that do not require any prior programming skills. The analysis strategies described in this review will guide researchers at any level to the best practices of bisulfite sequencing analysis.

MDC Repository

HOT or not: examining the basis of high-occupancy target regions

Author: Akalin A.
Bulut S.
Franke V.
Tursun B.
Uyar B.
Wreczycka K.
Wurmus R.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 20/06/2019

High-occupancy target (HOT) regions are segments of the genome with an unusually high number of transcription factor binding sites. These regions are observed in multiple species and are thought to have biological importance due to high transcription factor occupancy. Furthermore, they coincide with housekeeping gene promoters, and consequently the associated genes are stably expressed across multiple cell types. Despite these features, HOT regions are defined solely using ChIP-seq experiments and have been shown to lack canonical motifs for the transcription factors that are thought to be bound there. Although ChIP-seq experiments are the gold standard for finding genome-wide binding sites of a protein, they are not noise-free. Here, we show that HOT regions are likely to be ChIP-seq artifacts and that they are similar to previously proposed 'hyper-ChIPable' regions. Using ChIP-seq data sets for knocked-out transcription factors, we demonstrate the presence of false positive signals on HOT regions. We observe sequence characteristics and genomic features that are discriminatory of HOT regions, such as GC/CpG-rich k-mers and enrichment of RNA-DNA hybrids (R-loops) and DNA tertiary structures (G-quadruplex DNA). The artificial ChIP-seq enrichment on HOT regions could be associated with these discriminatory features. Furthermore, we propose strategies to deal with such artifacts in future ChIP-seq studies.

Crossref

MDC Repository

Reproducible genomics analysis pipelines with GNU Guix

Author: Akalin A.
Franke V.
Gosdschan A.
Osberg B.
Ronen J.
Uyar B.
Wreczycka K.
Wurmus R.
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 21/04/2018

In bioinformatics, as well as other computationally intensive research fields, there is a need for workflows that can reliably produce consistent output, independent of the software environment or configuration settings of the machine on which they are executed. Indeed, this is essential for controlled comparison between different observations or for the wider dissemination of workflows. Providing this type of reproducibility, however, is often complicated by the need to accommodate the myriad dependencies included in a larger body of software, each of which generally comes in various versions. Moreover, in many fields (bioinformatics being a prime example), these versions are subject to continual change due to rapidly evolving technologies, further complicating problems related to reproducibility. Here, we propose a principled approach for building analysis pipelines and managing their dependencies. As a case study to demonstrate the utility of our approach, we present a set of highly reproducible pipelines for the analysis of RNA-seq, ChIP-seq, Bisulfite-seq, and single-cell RNA-seq. All pipelines process raw experimental data and generate reports containing publication-ready plots and figures, with interactive report elements and standard observables. Users may install these highly reproducible packages and apply them to their own datasets without any special computational expertise beyond the use of the command line. We hope such a toolkit will provide immediate benefit to laboratory workers wishing to process their own data sets or bioinformaticians seeking to automate all, or parts of, their analyses. In the long term, we hope our approach to reproducibility will serve as a blueprint for reproducible workflows in other areas. Our pipelines, along with their corresponding documentation and sample reports, are available at http://bioinformatics.mdc-berlin.de/pig

MDC Repository

PiGx: reproducible genomics analysis pipelines with GNU Guix

Author: Akalin A.
Franke V.
Gosdschan A.
Osberg B.
Ronen J.
Uyar B.
Wreczycka K.
Wurmus R.
Publication venue: 'Oxford University Press (OUP)'
Publication date: 21/04/2018

In bioinformatics, as well as other computationally intensive research fields, there is a need for workflows that can reliably produce consistent output, from known sources, independent of the software environment or configuration settings of the machine on which they are executed. Indeed, this is essential for controlled comparison between different observations or for the wider dissemination of workflows. Providing this type of reproducibility and traceability, however, is often complicated by the need to accommodate the myriad dependencies included in a larger body of software, each of which generally comes in various versions. Moreover, in many fields (bioinformatics being a prime example), these versions are subject to continual change due to rapidly evolving technologies, further complicating problems related to reproducibility. Here, we propose a principled approach for building analysis pipelines and managing their dependencies with GNU Guix. As a case study to demonstrate the utility of our approach, we present a set of highly reproducible pipelines called PiGx for the analysis of RNA-seq, ChIP-seq, Bisulfite-seq, and single-cell RNA-seq. All pipelines process raw experimental data and generate reports containing publication-ready plots and figures, with interactive report elements and standard observables. Users may install these highly reproducible packages and apply them to their own datasets without any special computational expertise beyond the use of the command line. We hope such a toolkit will provide immediate benefit to laboratory workers wishing to process their own data sets or bioinformaticians seeking to automate all, or parts of, their analyses. In the long term, we hope our approach to reproducibility will serve as a blueprint for reproducible workflows in other areas. Our pipelines, along with their corresponding documentation and sample reports, are available at http://bioinformatics.mdc-berlin.de/pigx

MDC Repository

Cardiovascular disease biomarkers derived from circulating cell-free DNA methylation

Author: Akalin A.
Arnal H.G.
Blume A.
Cuadrat R.R.C.
Ebenal V.
Gündüz I.B.
Haghikia A.
Hartung J.
Jakobs K.
Kratzer A.
Landmesser Ulf
Leistner D.M.
Mauno T.
Meteva D.
Moobed M.
Osberg B.
Rathgeber A.C.
Seppelt C.
Wreczycka K.
Publication venue: Oxford University Press
Publication date: 01/06/2023

Acute coronary syndrome (ACS) remains a major cause of mortality worldwide. The syndrome occurs when blood flow to the heart muscle is decreased or blocked, causing muscle tissues to die or malfunction. There are three main types of ACS: non-ST-elevation myocardial infarction, ST-elevation myocardial infarction, and unstable angina. The treatment depends on the type of ACS, which is determined by a combination of clinical findings, such as electrocardiogram results and plasma biomarkers. Circulating cell-free DNA (ccfDNA) has been proposed as an additional marker for ACS, since damaged tissues can release DNA into the bloodstream. We used ccfDNA methylation profiles to differentiate between the ACS types and provided computational tools to repeat similar analyses for other diseases. We leveraged the cell-type specificity of DNA methylation to deconvolute the ccfDNA cell types of origin and to find methylation-based biomarkers that stratify patients. We identified hundreds of methylation markers associated with ACS types and validated them in an independent cohort. Many such markers were associated with genes involved in cardiovascular conditions and inflammation. ccfDNA methylation showed promise as a non-invasive diagnostic for acute coronary events. These methods are not limited to acute events and may be used for chronic cardiovascular diseases as well.

MDC Repository

Occupancy maps of 208 chromatin-associated proteins in one human cell type

Author: A Jolma
A Liberzon
A Mathelier
A Mortazavi
A Sandelin
A Subramanian
A Visel
Ali Mortazavi
Andrew A. Hardigan
AP Boyle
AR Oliphant
AR Quinlan
B Wei
Barbara J. Wold
C Fletez-Brant
C Moorman
C. Luke Messer
Camden S. Jansen
Candice J. Coppola
CY McLean
D Hnisz
D Mellacheruvu
D Panne
D Savic
Daniel Savic
DS Johnson
E Hervouet
E Morgunova
E Wingender
E. Christopher Partridge
EC Partridge
EL Huttlin
EM Mendenhall
Emma C. Dean
Eric M. Mendenhall
F Pedregosa
F Ramírez
G Robertson
H Li
H Shin
H Wickham
I Dror
JB Black
Jeremy W. Prokop
JM Vaquerizas
JS Dasen
K Günther
K Wreczycka
Kimberly M. Newberry
KS Zaret
L Teytelman
Laurel A. Brandsmeier
LTM Dao
M Conacci-Sorrell
M Ghandi
M Iwafuchi-Doi
M Teng
M Uhlén
Mark Mackiewicz
MB Gerstein
MN Wright
MS Kowalczyk
MT Weirauch
N Faherty
N Yosef
P Machanick
PV Kharchenko
R Andersson
R Cowper-Sal·lari
R Worsley Hunt
RI Sherwood
Richard M. Myers
RK Dale
Ryne C. Ramaker
S Gupta
SA Lambert
Sarah K. Meadows
Say-Tar Goh
SG Landt
Shan Jiang
Surya B. Chhetri
TL Bailey
TS Mikkelsen
V Busskamp
W Ma
WA Whyte
WI Choi
WJR Longabaugh
X Chen
Y Zhang
Z Gu
Z Liang
Publication venue: Nature Publishing Group
Publication date: 01/07/2020

Transcription factors are DNA-binding proteins that have key roles in gene regulation. Genome-wide occupancy maps of transcriptional regulators are important for understanding gene regulation and its effects on diverse biological processes. However, only a minority of the more than 1,600 transcription factors encoded in the human genome has been assayed. Here we present, as part of the ENCODE (Encyclopedia of DNA Elements) project, data and analyses from chromatin immunoprecipitation followed by high-throughput sequencing (ChIP–seq) experiments using the human HepG2 cell line for 208 chromatin-associated proteins (CAPs). These comprise 171 transcription factors and 37 transcriptional cofactors and chromatin regulator proteins, and represent nearly one-quarter of CAPs expressed in HepG2 cells. The binding profiles of these CAPs form major groups associated predominantly with promoters or enhancers, or with both. We confirm and expand the current catalogue of DNA sequence motifs for transcription factors, and describe motifs that correspond to other transcription factors that are co-enriched with the primary ChIP target. For example, FOX family motifs are enriched in ChIP–seq peaks of 37 other CAPs. We show that motif content and occupancy patterns can distinguish between promoters and enhancers. This catalogue reveals high-occupancy target regions at which many CAPs associate, although each contains motifs for only a minority of the numerous associated transcription factors. These analyses provide a more complete overview of the gene regulatory networks that define this cell type, and demonstrate the usefulness of the large-scale production efforts of the ENCODE Consortium

Crossref

eScholarship - University of California

Caltech Authors