105 research outputs found
Deep sequencing approaches for the analysis of prokaryotic transcriptional boundaries and dynamics
The identification of the protein-coding regions of a genome is straightforward due to the universality of start and stop codons. However, the boundaries of the transcribed regions, conditional operon structures, non-coding RNAs and the dynamics of transcription, such as pausing of elongation, are non-trivial to identify, even in the comparatively simple genomes of prokaryotes. Traditional methods for the study of these areas, such as tiling arrays, are noisy, labour-intensive and lack the resolution required for densely-packed bacterial genomes. Recently, deep sequencing has become increasingly popular for the study of the transcriptome due to its lower costs, higher accuracy and single nucleotide resolution. These methods have revolutionised our understanding of prokaryotic transcriptional dynamics. Here, we review the deep sequencing and data analysis techniques that are available for the study of transcription in prokaryotes, and discuss the bioinformatic considerations of these analyses.
Identification of active regulatory regions from DNA methylation data
We have recently shown that transcription factor binding leads to defined reduction in DNA methylation, allowing for the identification of active regulatory regions from high-resolution methylomes. Here, we present MethylSeekR, a computational tool to accurately identify such footprints from bisulfite-sequencing data. Applying our method to a large number of published human methylomes, we demonstrate its broad applicability and generalize our previous findings from a neuronal differentiation system to many cell types and tissues. MethylSeekR is available as an R package at www.bioconductor.or
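As a rough illustration of what identifying reduced-methylation footprints from per-CpG methylation levels can look like, the sketch below reports runs of consecutive CpGs whose methylation falls below a cutoff. This is a simplified threshold scan written for this listing, not MethylSeekR's actual algorithm or API; the function name, the 0.5 cutoff and the minimum run length are all assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple


def low_methylation_runs(
    positions: Sequence[int],
    meth: Sequence[float],
    cutoff: float = 0.5,   # assumed threshold, not taken from the abstract
    min_cpgs: int = 3,     # assumed minimum number of CpGs per region
) -> List[Tuple[int, int, int]]:
    """Return (start, end, n_cpgs) for each maximal run of consecutive
    CpGs whose methylation level is below `cutoff`."""
    if len(positions) != len(meth):
        raise ValueError("positions and meth must have the same length")

    runs: List[Tuple[int, int, int]] = []
    start_idx = None  # index of the first CpG in the current run, if any

    for i, level in enumerate(meth):
        if level < cutoff:
            if start_idx is None:
                start_idx = i
        elif start_idx is not None:
            # The run ended at the previous CpG; keep it if long enough.
            n = i - start_idx
            if n >= min_cpgs:
                runs.append((positions[start_idx], positions[i - 1], n))
            start_idx = None

    # Close a run that extends to the last CpG.
    if start_idx is not None:
        n = len(meth) - start_idx
        if n >= min_cpgs:
            runs.append((positions[start_idx], positions[-1], n))

    return runs


# Toy data: CpG coordinates and methylation fractions (made up).
toy_positions = [100, 150, 200, 250, 300, 350, 400, 450]
toy_meth = [0.9, 0.2, 0.1, 0.3, 0.8, 0.1, 0.2, 0.95]
toy_regions = low_methylation_runs(toy_positions, toy_meth)
```

Real bisulfite-sequencing data would additionally need coverage filtering and smoothing before any thresholding; those steps are left out here.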
QuasR: quantification and annotation of short reads in R
Summary: QuasR is a package for the integrated analysis of high-throughput sequencing data in R, covering all steps from read preprocessing, alignment and quality control to quantification. QuasR supports different experiment types (including RNA-seq, ChIP-seq and Bis-seq) and analysis variants (e.g. paired-end, stranded, spliced and allele-specific), and is integrated in Bioconductor so that its output can be directly processed for statistical analysis and visualization. Availability and implementation: QuasR is implemented in R and C/C++. Source code and binaries for major platforms (Linux, OS X and MS Windows) are available from Bioconductor (www.bioconductor.org/packages/release/bioc/html/QuasR.html). The package includes a 'vignette' with step-by-step examples for typical work flows. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
Twin methodology in epigenetic studies
Since the final decades of the last century, twin studies have made a remarkable contribution to the genetics of human complex traits and diseases. With the recent rapid development in modern biotechnology of high-throughput genetic and genomic analyses, twin modelling is expanding from analysis of diseases to molecular phenotypes in functional genomics, especially in epigenetics, a thriving field of research that concerns the environmental regulation of gene expression through DNA methylation, histone modification, microRNA and long non-coding RNA expression, etc. The application of the twin method to molecular phenotypes offers new opportunities to study the genetic (nature) and environmental (nurture) contributions to epigenetic regulation of gene activity during developmental, ageing and disease processes. Besides the classical twin model, the case co-twin design using identical twins discordant for a trait or disease is becoming a popular and powerful design for epigenome-wide association study in linking environmental exposure to differential epigenetic regulation and to disease status while controlling for individual genetic make-up. It can be expected that novel uses of twin methods in epigenetic studies are going to help with efficiently unravelling the genetic and environmental basis of epigenomics in human complex diseases.
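For readers unfamiliar with the classical twin model mentioned in this abstract, one common textbook way to turn twin correlations into variance components is Falconer's formula. The sketch below applies it to a single phenotype, which could for instance be the methylation level at one CpG site. The abstract itself does not specify an estimator, so the choice of Falconer's formula, the function name and the example correlations are assumptions for illustration only.

```python
from typing import Dict


def falconer_estimates(r_mz: float, r_dz: float) -> Dict[str, float]:
    """Estimate variance components from twin-pair correlations.

    r_mz: phenotypic correlation between monozygotic co-twins
    r_dz: phenotypic correlation between dizygotic co-twins
    """
    h2 = 2.0 * (r_mz - r_dz)   # additive genetic share (heritability)
    c2 = 2.0 * r_dz - r_mz     # shared (common) environment share
    e2 = 1.0 - r_mz            # non-shared environment plus measurement error
    return {"h2": h2, "c2": c2, "e2": e2}


# Made-up correlations for one hypothetical methylation site.
example = falconer_estimates(r_mz=0.8, r_dz=0.5)
```

The three components sum to one by construction. Estimates outside the range 0 to 1 indicate that the simple model's assumptions do not hold for that trait, which is one reason structural equation models are usually preferred in practice.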
Genetic, environmental and stochastic factors in monozygotic twin discordance with a focus on epigenetic differences
Genetic-epidemiological studies on monozygotic (MZ) twins have been used for decades to tease out the relative contributions of genes and the environment to a trait. Phenotypic discordance in MZ twins has traditionally been ascribed to non-shared environmental factors acting after birth; however, recent data indicate that this explanation is far too simple. In this paper, we review other reasons for discordance, including differences in the in utero environment, genetic mosaicism, and stochastic factors, focusing particularly on epigenetic discordance. Epigenetic differences are gaining increasing recognition. Although it is clear that in specific cases epigenetic alterations provide a causal factor in disease etiology, the overall significance of epigenetics in twin discordance remains unclear. It is also challenging to determine the causality and relative contributions of environmental, genetic, and stochastic factors to epigenetic variability. Epigenomic profiling studies have recently shed more light on the dynamics of temporal methylation change and methylome heritability, yet have not given a definite answer regarding their relevance to disease, because of limitations in establishing causality. Here, we explore the subject of epigenetics as another component in human phenotypic variability and its links to disease, focusing particularly on evidence from MZ twin studies.
Tissue of origin determines cancer-associated CpG island promoter hypermethylation patterns
ABSTRACT: BACKGROUND: Aberrant CpG island promoter DNA hypermethylation is frequently observed in cancer and is believed to contribute to tumor progression by silencing the expression of tumor suppressor genes. Previously, we observed that promoter hypermethylation in breast cancer reflects cell lineage rather than tumor progression and occurs at genes that are already repressed in a lineage-specific manner. To investigate the generality of our observation we analyzed the methylation profiles of 1,154 cancers from 7 different tissue types. RESULTS: We find that 1,009 genes are prone to hypermethylation in these 7 types of cancer. Nearly half of these genes varied in their susceptibility to hypermethylation between different cancer types. We show that the expression status of hypermethylation-prone genes in the originator tissue determines their propensity to become hypermethylated in cancer; specifically, genes that are normally repressed in a tissue are prone to hypermethylation in cancers derived from that tissue. We also show that the promoter regions of hypermethylation-prone genes are depleted of repetitive elements and that DNA sequence around the same promoters is evolutionarily conserved. We propose that these two characteristics reflect tissue-specific gene promoter architecture regulating the expression of these hypermethylation-prone genes in normal tissues. CONCLUSIONS: As aberrantly hypermethylated genes are already repressed in pre-cancerous tissue, we suggest that their hypermethylation does not directly contribute to cancer development via silencing. Instead, aberrant hypermethylation reflects developmental history and the perturbation of epigenetic mechanisms maintaining these repressed promoters in a hypomethylated state in normal cells.
Nucleosome repositioning links DNA (de)methylation and differential CTCF binding during stem cell development
During differentiation of embryonic stem cells, chromatin reorganizes to establish cell type-specific expression programs. Here, we have dissected the linkages between DNA methylation (5mC), hydroxymethylation (5hmC), nucleosome repositioning, and binding of the transcription factor CTCF during this process. By integrating MNase-seq and ChIP-seq experiments in mouse embryonic stem cells (ESC) and their differentiated counterparts with biophysical modeling, we found that the interplay between these factors depends on their genomic context. The mostly unmethylated CpG islands have reduced nucleosome occupancy and are enriched in cell type-independent binding sites for CTCF. The few remaining methylated CpG dinucleotides are preferentially associated with nucleosomes. In contrast, outside of CpG islands most CpGs are methylated, and the average methylation density oscillates so that it is highest in the linker region between nucleosomes. Outside CpG islands, binding of TET1, an enzyme that converts 5mC to 5hmC, is associated with labile, MNase-sensitive nucleosomes. Such nucleosomes are poised for eviction in ESCs and become stably bound in differentiated cells where the TET1 and 5hmC levels go down. This process regulates a class of CTCF binding sites outside CpG islands that are occupied by CTCF in ESCs but lose the protein during differentiation. We rationalize this cell type-dependent targeting of CTCF with a quantitative biophysical model of competitive binding with the histone octamer, depending on the TET1, 5hmC, and 5mC state.
Read Annotation Pipeline for High-Throughput Sequencing Data
Mapping reads to a reference sequence is a common step when analyzing allele effects in high-throughput sequencing data. The choice of reference is critical because its effect on quantitative sequence analysis is non-negligible. Recent studies suggest that aligning to a single standard reference sequence, as is common practice, can lead to an underlying bias depending on the genetic distances of the target sequences from the reference. To avoid this bias, researchers have resorted to using modified reference sequences. Even with this improvement, various limitations and problems remain unsolved, including reduced mapping ratios, shifts in read mappings, and the selection of which variants to include to remove biases. To address these issues, we propose a novel and generic multi-alignment pipeline. Our pipeline integrates the genomic variations from known or suspected founders into separate reference sequences and performs alignments to each one. By mapping reads to multiple reference sequences and merging them afterward, we are able to rescue more reads and diminish the bias caused by using a single common reference. Moreover, the genomic origin of each read is determined and annotated during the merging process, providing a better source of information to assess differential expression than simple allele queries at known variant positions. Using RNA-seq of a diallel cross, we compare our pipeline with the single reference pipeline and demonstrate its advantages: more aligned reads and a higher percentage of reads with assigned origins.
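The merge step described in this abstract (align to several founder references, then combine the results and annotate each read's origin) can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes each alignment is summarised by a mismatch count, that fewer mismatches means a better alignment, and that a tie between references leaves the origin ambiguous. The function name, the input layout and the founder labels are hypothetical.

```python
from typing import Dict, List, Tuple


def merge_alignments(
    per_reference: Dict[str, Dict[str, int]],
) -> Dict[str, Tuple[str, int]]:
    """Merge per-reference alignments into one record per read.

    per_reference maps reference name -> {read_id: mismatch count}.
    A read missing from a reference's dict did not align to it.
    Returns {read_id: (origin, mismatches)}, where origin is the single
    best reference or "ambiguous" when several references tie.
    """
    best: Dict[str, Tuple[List[str], int]] = {}

    for ref_name, hits in per_reference.items():
        for read_id, mismatches in hits.items():
            current = best.get(read_id)
            if current is None or mismatches < current[1]:
                # First hit for this read, or a strictly better one.
                best[read_id] = ([ref_name], mismatches)
            elif mismatches == current[1]:
                # Equally good hit on another reference: record the tie.
                current[0].append(ref_name)

    return {
        read_id: (refs[0] if len(refs) == 1 else "ambiguous", mismatches)
        for read_id, (refs, mismatches) in best.items()
    }


# Toy input: two hypothetical founder references and four reads.
toy_alignments = {
    "founderA": {"r1": 0, "r2": 2, "r3": 1},
    "founderB": {"r1": 1, "r2": 2, "r4": 0},
}
merged = merge_alignments(toy_alignments)
```

In this toy input, reads `r3` and `r4` each align to only one reference, so a single-reference run would lose one of them; merging keeps both, which is the read-rescue effect the abstract describes. A real implementation would work on alignment records (for example BAM files) and would also have to reconcile coordinates between references.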
