18 research outputs found

    Differential methylation analysis of reduced representation bisulfite sequencing experiments using edgeR [version 2; referees: 2 approved, 1 approved with reservations]

    Cytosine methylation is an important DNA epigenetic modification. In vertebrates, methylation occurs at CpG sites, which are dinucleotides where a cytosine is immediately followed by a guanine in the DNA sequence from 5' to 3'. When located in the promoter region of a gene, DNA methylation is often associated with transcriptional silencing of the gene. Aberrant DNA methylation is associated with the development of various diseases such as cancer. Bisulfite sequencing (BS-seq) is the current "gold-standard" technology for high-resolution profiling of DNA methylation. Reduced representation bisulfite sequencing (RRBS) is an efficient form of BS-seq that targets CpG-rich DNA regions in order to save sequencing costs. A typical bioinformatics aim is to identify CpGs that are differentially methylated (DM) between experimental conditions. This workflow demonstrates that differential methylation analysis of RRBS data can be conducted using software and methodology originally developed for RNA-seq data. The RNA-seq pipeline is adapted to methylation by adding extra columns to the design matrix to account for read coverage at each CpG, after which the RRBS and RNA-seq pipelines are almost identical. This approach is statistically natural and gives analysts access to a rich collection of analysis tools including generalized linear models, gene set testing and pathway analysis. The article presents a complete start-to-finish case study analysis of RRBS profiles of different cell populations from the mouse mammary gland using the Bioconductor package edgeR. We show that lineage-committed cells are typically hyper-methylated compared to progenitor cells and that this is true on all the autosomes but not the sex chromosomes. We demonstrate a strong negative correlation between methylation of promoter regions and gene expression as measured by RNA-seq for the same cell types, showing that methylation is a regulatory mechanism involved in epithelial lineage commitment.
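The design-matrix adaptation described in this abstract can be illustrated with a minimal sketch. It assumes each sample contributes two count libraries, methylated (Me) and unmethylated (Un), in that order; the function name `expand_design_for_methylation` and the toy two-group design are hypothetical and are not taken from the article. edgeR ships its own helper for this step (`modelMatrixMeth`), and the Python below is not that function, only an illustration of the idea: one indicator column per sample absorbs read coverage, and the treatment columns act on the methylated rows only.

```python
def expand_design_for_methylation(treatment_design):
    """Expand an n-by-p treatment design into a 2n-by-(n+p) design.

    Rows of the result are ordered sample1-Me, sample1-Un, sample2-Me, ...
    (an assumed layout for this sketch).
    """
    n_samples = len(treatment_design)
    rows = []
    for i, treatment_row in enumerate(treatment_design):
        # Sample indicator shared by the Me and Un rows of sample i;
        # its coefficient accounts for read coverage at the CpG.
        sample_indicator = [1 if j == i else 0 for j in range(n_samples)]
        # Methylated row: treatment effects are switched on.
        rows.append(sample_indicator + list(treatment_row))
        # Unmethylated row: treatment effects are switched off.
        rows.append(sample_indicator + [0] * len(treatment_row))
    return rows


# Toy example (hypothetical): two groups with two samples each.
treatment_design = [[1, 0], [1, 0], [0, 1], [0, 1]]
design = expand_design_for_methylation(treatment_design)
for row in design:
    print(row)
```

With this layout, a contrast between the two treatment columns compares methylation levels between groups, while total coverage per sample is handled by the sample columns.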

    Differential expression analysis of complex RNA-seq experiments

    © 2013 Dr. Yunshun Chen. As the cost of DNA sequencing decreases, sequencing technologies become more and more attractive to many researchers as platforms for studying gene expression. Although there exist many different combinations of technologies and protocols, we use the term `RNA-Seq' to denote the very broad class of experiments in which gene expression is studied by sequencing RNA. A very common goal in analysing RNA-Seq experiments is to identify genes that are differentially expressed across specified conditions in a designed experiment. This has proven to be very challenging for statisticians, since it is a high-dimensional multiple testing problem in which one or more tests are performed for each of tens of thousands of genes. RNA-Seq data take the form of integer counts, and there are a few issues that we need to address. Firstly, it is important to model the variance of the RNA-Seq count data correctly. Secondly, there is a need to estimate the variability of gene counts, accounting for all sources of variation, for any complex experimental design. Thirdly, the variation between biological replicates needs to be estimated as reliably as possible from a very small number of replicate libraries, and different genes should be allowed different degrees of biological variation. Finally, a powerful testing method is required for the purpose of detecting as many differentially expressed genes as possible while controlling the false discovery rate. These issues will be fully discussed in the thesis. In short, the mixture structure of biological variation and measurement error in the RNA-Seq counts implies a quadratic mean-variance relationship, which motivates negative binomial models for the variance.
    Generalized linear models are used as a flexible statistical framework for the analysis of read counts from RNA-Seq gene expression studies, which provides the ability to analyse complex experiments involving multiple treatment conditions and blocking variables while still taking full account of biological variation. Biological variation between RNA samples is estimated separately from the technical variation associated with sequencing technologies using an adjusted profile likelihood approach. Also, the high-dimensional nature of sequencing data allows information to be borrowed from the ensemble of genes, which can assist in inference about each gene individually. A novel weighted likelihood empirical Bayes method is proposed to allow each gene to have its own specific variability while learning from the others. Finally, hypothesis testing that accounts for the variability of estimated parameters is discussed, and new methods for testing for differential expression are proposed. The analysis pipeline proposed in this thesis is implemented in the edgeR package of the Bioconductor project. The software implementation and the computational algorithms designed for computational efficiency are also discussed in this thesis. The methods developed here can be applied to count data arising from DNA-Seq applications, including ChIP-Seq for epigenetic marks and DNA methylation analyses.
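The quadratic mean-variance relationship mentioned in this abstract is the standard negative binomial form, which can be written out as follows. The notation is an assumption introduced here, not taken from the thesis: $y_{gi}$ is the read count for gene $g$ in library $i$, $\mu_{gi}$ is its expected value, and $\phi_g$ is the gene-specific dispersion.

```latex
\operatorname{var}(y_{gi}) = \mu_{gi} + \phi_g \, \mu_{gi}^{2},
\qquad
\mathrm{CV}^{2}(y_{gi}) = \frac{1}{\mu_{gi}} + \phi_g
```

Under this reading, the linear term corresponds to Poisson-like measurement error and the quadratic term to biological variation, so $\sqrt{\phi_g}$ can be interpreted as the biological coefficient of variation that remains however deeply the libraries are sequenced.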

    Differential methylation analysis of reduced representation bisulfite sequencing experiments using edgeR [version 1; referees: 2 approved, 1 approved with reservations]

    Studies in epigenetics have shown that DNA methylation is a key factor in regulating gene expression. Aberrant DNA methylation is often associated with DNA instability, which could lead to the development of diseases such as cancer. DNA methylation typically occurs in a CpG context. When located in a gene promoter, DNA methylation often acts to repress transcription and gene expression. The most commonly used technology for studying DNA methylation is bisulfite sequencing (BS-seq), which can be used to measure genome-wide methylation levels at single-nucleotide resolution. Notably, BS-seq can also be combined with enrichment strategies, such as reduced representation bisulfite sequencing (RRBS), to target CpG-rich regions in order to save per-sample costs. A typical DNA methylation analysis involves identifying differentially methylated regions (DMRs) between different experimental conditions. Many statistical methods have been developed for finding DMRs in BS-seq data. In this workflow, we propose a novel approach for detecting DMRs using edgeR. By providing a complete analysis of RRBS profiles of epithelial populations in the mouse mammary gland, we demonstrate that differential methylation analyses can be fitted into the existing pipelines specifically designed for RNA-seq differential expression studies. In addition, the edgeR generalized linear model framework offers great flexibility for complex experimental designs, while still accounting for biological variability. The analysis approach illustrated in this article can be applied to any BS-seq data that include some replication, but it is especially appropriate for RRBS data with small numbers of biological replicates.

    Electrophoto-catalytic decoupled radical relay enables highly efficient and enantioselective benzylic C-H functionalization

    Asymmetric sp3 C-H functionalization has been demonstrated to substantially expedite target molecule synthesis, with applications spanning from feedstock upgrading to late-stage modification of complex molecules. Herein, we report a highly efficient and sustainable method for enantioselective benzylic C-H cyanation by merging electrophoto- and copper-catalysis. A novel catalytic system allows one to independently regulate the hydrogen atom transfer step for benzylic radical formation and the speciation of Cu(II)/Cu(I) to effectively capture the transient radical intermediate, by tuning the electronic properties of an anthraquinone-type photocatalyst and simply modulating the applied current, respectively. Such decoupled radical relay catalysis enables a unified approach for enantioselective benzylic C-H cyanation of alkylarenes with a wide range of electronic properties, from electron-poor to highly electron-rich, most of which are much less reactive or even unreactive under existing methods that rely on coupled radical relay. Moreover, the current protocol is also amenable to late-stage functionalization of bioactive molecules, including natural products and drugs.

    Lung Basal Stem Cells Rapidly Repair DNA Damage Using the Error-Prone Nonhomologous End-Joining Pathway

    Lung squamous cell carcinoma (SqCC), the second most common subtype of lung cancer, is strongly associated with tobacco smoking and exhibits genomic instability. The cellular origins and molecular processes that contribute to SqCC formation are largely unexplored. Here we show that human basal stem cells (BSCs) isolated from heavy smokers proliferate extensively, whereas their alveolar progenitor cell counterparts have limited colony-forming capacity. We demonstrate that this difference arises in part because of the ability of BSCs to repair their DNA more efficiently than alveolar cells following ionizing radiation or chemical-induced DNA damage. Analysis of mice harbouring a mutation in the DNA-dependent protein kinase catalytic subunit (DNA-PKcs), a key enzyme in DNA damage repair by nonhomologous end joining (NHEJ), indicated that BSCs preferentially repair their DNA by this error-prone process. Interestingly, polyploidy, a phenomenon associated with genetically unstable cells, was only observed in the human BSC subset. Expression signature analysis indicated that BSCs are the likely cells of origin of human SqCC and that high levels of NHEJ genes in SqCC are correlated with increasing genomic instability. Hence, our results favour a model in which heavy smoking promotes proliferation of BSCs, and their predilection for error-prone NHEJ could lead to the high mutagenic burden that culminates in SqCC. Targeting DNA repair processes may therefore have a role in the prevention and therapy of SqCC.

    Human and mouse BSCs express markers of nonhomologous end joining.

    <p>(A) Immunohistochemistry for RAD51, an early marker of homologous recombination, on WT mouse trachea and lung 1 h post γ-irradiation (6 Gy). The insert is a positive control, a mammary tumour from a MMTV-cre;Brca1<sup>fl/fl</sup>p53<sup>+/-</sup> mouse. Black arrows indicate RAD51-positive nuclei. Representative images from <i>n</i> = 3 mice at each time point. Scale bar = 100 μm. (B) Expression of key genes in the NHEJ repair pathway in human BSCs and AT2 cells. <i>n</i> = 3 patients (a 64-y-old male exsmoker, an 83-y-old male exsmoker, and a 53-y-old male current smoker). RPKM, reads per kilobase per million mapped reads. Paired <i>t</i> test. (C) Immunofluorescence staining of phospho-DNA-PKcs and T1α in human airways and alveoli of three patients. Patient 1, a 56-y-old male smoker; patient 2, a 69-y-old female exsmoker; patient 3, a 70-y-old male smoker. Inset, isotype control. Scale bar = 20 μm. (D) Immunofluorescence staining of phospho-DNA-PKcs and T1α in trachea and lung of WT mice following IR (6 Gy). Representative images of one of <i>n</i> = 3 mice at each time point. Inset, isotype control. Scale bar = 20 μm. The underlying data for panel B can be found in the <a href="http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.2000731#pbio.2000731.s009" target="_blank">S1 Data</a> file.</p>

    BSCs use error-prone nonhomologous end joining to repair DNA double-strand breaks.

    <p>(A) Immunofluorescence staining of γH2AX and T1α in the lungs and tracheas of WT and SCID<sup><i>prkdc</i></sup> mice that are nonirradiated or 1, 4 or 8 h post irradiation (6 Gy). Representative images of <i>n</i> = 3 mice at each time point. Arrows indicate γH2AX<sup>+</sup> T1α<sup>+</sup> BSCs. Scale bar = 20 μm. (B) Representative FACS plots showing the expression of γH2AX in EpCAM<sup>+</sup> lung epithelial cells and T1α<sup>+</sup> tracheal BSCs in WT and SCID<sup><i>prkdc</i></sup> mice 0, 4, and 7 h following IR (6 Gy). The timing corresponds to the number of hours between time of irradiation and generation of single-cell suspension for FACS analysis. (C) Percentage of γH2AX-positive cells in WT and SCID<sup><i>prkdc</i></sup> mice in EpCAM<sup>+</sup> lung epithelial cells and T1α<sup>+</sup> tracheal BSCs 0, 4, and 7 h following irradiation. <i>n</i> = 6 animals per group. Student’s <i>t</i> test. The timing corresponds to the number of hours between time of irradiation and generation of single-cell suspension for FACS analysis. (D) Immunofluorescence staining of cleaved caspase 3 (CC3), T1α, and 4′,6-diamidino-2-phenylindole (DAPI) in WT and SCID<sup><i>prkdc</i></sup> tracheas that are nonirradiated or 4, 24, or 96 h post irradiation (6 Gy). Representative images of <i>n</i> = 3 mice at each time point. Arrows indicate CC3<sup>+</sup> T1α<sup>+</sup> BSCs. Scale bar = 20 μm. (E) FACS detection of cells in subG1 in tracheal BSCs (T1α<sup>+</sup>) cells isolated from WT or SCID<sup><i>prkdc</i></sup> mice 24 h post irradiation (6 Gy). <i>n</i> = 7 mice for WT mice and <i>n</i> = 12 for SCID<sup><i>prkdc</i></sup> mice. Student’s <i>t</i> test. The underlying data for panels C and E can be found in the <a href="http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.2000731#pbio.2000731.s009" target="_blank">S1 Data</a> file.</p>

    Human lung BSCs are the putative cells of origin of lung squamous cell carcinoma.

    <p>(A) Representative histogram of intracellular DAPI staining of BSCs and AT2 cells isolated from a 56-y-old male exsmoker patient. Gates indicate 2N, 4N, and polyploid cells. (B) Proportion of polyploid cells in the BSC and AT2 subsets. <i>n</i> = 3 patients (a 69-y-old female exsmoker, a 56-y-old male exsmoker, and a 70-y-old female exsmoker). Paired <i>t</i> test. (C) Boxplots of human lung BSC expression scores by lung tumour subtypes (ADC, adenocarcinoma; SCLC, small cell lung cancer; SqCC, squamous cell carcinoma). The width of each box indicates the sample size. (D) Barcode plot showing strong correlation of the human lung BSC expression signature with that of SqCCs (ROAST <i>p</i> = 0.0001). Genes are sorted left to right from most up- to most down-regulated in SqCC relative to all other cancer subtypes. Positive BSC signature genes are marked with vertical red bars, and negative signature genes are marked in blue. Variable-height bars show log-fold-change strength for each signature gene. (E) Fold changes in the expression of genes frequently altered in lung SqCC between human BSCs and other human lung epithelial cell types. <i>n</i> = 3 patients (a 64-y-old male exsmoker, an 83-y-old male exsmoker, and a 53-y-old male current smoker). (F) Violin plots showing expression levels of <i>PRKDC</i> and <i>XRCC6</i> in normal lung tissue (<i>n</i> = 54), lung ADCs (<i>n</i> = 125), and lung SqCCs (<i>n</i> = 224) from The Cancer Genome Atlas (TCGA). Violin bodies show log2 counts per million (log CPM) expression values as smoothed densities. All pairwise <i>p</i>-values are <10<sup>−6</sup> by moderated <i>t</i> tests. (G) Proportion of genome altered versus <i>PRKDC</i> and <i>XRCC6</i> expression levels in TCGA lung SqCC data (<i>n</i> = 179). Expression levels are split into quartiles. Significance was determined by Student’s <i>t</i> tests. The underlying data for panels A, B, C, D, E, and G can be found in the <a href="http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.2000731#pbio.2000731.s009" target="_blank">S1 Data</a> file.</p>