110 research outputs found
An evaluation of processing methods for HumanMethylation450 BeadChip data
Background: Illumina's HumanMethylation450 arrays provide the most cost-effective means of high-throughput DNA methylation analysis. As with other types of microarray platforms, technical artifacts are a concern, including background fluorescence, dye-bias from the use of two color channels, bias caused by type I/II probe design, and batch effects. Several approaches and pipelines have been developed, either targeting a single issue or designed to address multiple biases through a combination of methods. We evaluate the effect of combining separate approaches to improve signal processing. Results: In this study, nine processing methods, including both within- and between-array methods, are applied and compared in four datasets. For technical replicates, we found both within- and between-array methods did a comparable job in reducing variance across replicates. For evaluating biological differences, within-array processing always improved differential DNA methylation signal detection over no processing, and always benefitted from performing background correction first. Combinations of within-array procedures were always among the best performing methods, with a slight advantage appearing for the between-array method Funnorm when batch effects explained more variation in the data than the methylation alterations between cases and controls. However, when this occurred, RUVm, a new batch correction method, noticeably improved reproducibility of differential methylation results over any of the signal-processing methods alone. Conclusions: The comparisons in our study provide valuable insights into preprocessing HumanMethylation450 BeadChip data. We found the within-array combination of Noob + BMIQ always improved signal sensitivity, and when combined with the RUVm batch-correction method, outperformed all other approaches in performing differential DNA methylation analysis. The effect of the data processing method, in any given data set, was a function of both the signal and noise
Mutational signatures in colon cancer.
Objective: Recently, many tumor sequencing studies have inferred and reported on mutational signatures, short nucleotide patterns at which particular somatic base substitutions appear more often. A number of signatures reflect biological processes in the patient and factors associated with cancer risk. Our goal is to infer mutational signatures appearing in colon cancer, a cancer for which environmental risk factors vary by cancer subtype, and compare the signatures to those in adult stem cells from normal colon. We also compare the mutational signatures to others in the literature. Results: We apply a probabilistic mutation signature model to somatic mutations previously reported for six adult normal colon stem cells and 431 colon adenocarcinomas. We infer six mutational signatures in colon cancer, four being specific to tumors with hypermutation. Just two signatures explained the majority of mutations in the small number of normal aging colon samples. All six signatures are independently identified in a series of 295 Chinese colorectal cancers
Non-specific filtering of beta-distributed data.
Background: Non-specific feature selection is a dimension reduction procedure performed prior to cluster analysis of high dimensional molecular data. Not all measured features are expected to show biological variation, so only the most varying are selected for analysis. In DNA methylation studies, DNA methylation is measured as a proportion, bounded between 0 and 1, with variance a function of the mean. Filtering on standard deviation biases the selection of probes to those with mean values near 0.5. We explore the effect this has on clustering, and develop alternate filter methods that utilize a variance stabilizing transformation for Beta distributed data and do not share this bias. Results: We compared results for 11 different non-specific filters on eight Infinium HumanMethylation data sets, selected to span a variety of biological conditions. We found that for data sets having a small fraction of samples showing abnormal methylation of a subset of normally unmethylated CpGs, a characteristic of the CpG island methylator phenotype in cancer, a novel filter statistic that utilized a variance-stabilizing transformation for Beta distributed data outperformed the common filter of using standard deviation of the DNA methylation proportion, or its log-transformed M-value, in its ability to detect the cancer subtype in a cluster analysis. However, the standard deviation filter always performed among the best for distinguishing subgroups of normal tissue. The novel filter and standard deviation filter tended to favour features in different genome contexts; for the same data set, the novel filter always selected more features from CpG island promoters and the standard deviation filter always selected more features from non-CpG island intergenic regions. Interestingly, despite selecting largely non-overlapping sets of features, the two filters did find sample subsets that overlapped for some real data sets. Conclusions: We found two different filter statistics that tended to prioritize features with different characteristics; each performed well for identifying clusters of cancer and non-cancer tissue, and identifying a cancer CpG island hypermethylation phenotype. Since cluster analysis is for discovery, we would suggest trying both filters on any new data sets, evaluating the overlap of features selected and clusters discovered
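The mean-variance dependence described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' filter statistic, which the abstract does not specify; it assumes the arcsine square-root transform, a standard variance-stabilizing transformation for proportions, and the function names, the precision value, and the sample size are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def beta_sd(mean, precision):
    """Theoretical SD of Beta(a, b) with a = mean * precision, b = (1 - mean) * precision."""
    return math.sqrt(mean * (1.0 - mean) / (precision + 1.0))


def arcsine_sqrt(x):
    """Arcsine square-root transform (assumed here as the variance-stabilizing transform)."""
    return math.asin(math.sqrt(x))


def simulate_sds(mean, precision, n=20000, seed=0):
    """Return (SD of raw proportions, SD after arcsine-sqrt) for simulated Beta draws."""
    rng = random.Random(seed)
    a = mean * precision
    b = (1.0 - mean) * precision
    draws = [rng.betavariate(a, b) for _ in range(n)]
    raw_sd = statistics.pstdev(draws)
    vst_sd = statistics.pstdev([arcsine_sqrt(x) for x in draws])
    return raw_sd, vst_sd


if __name__ == "__main__":
    # Same precision, different means: the raw SD depends on the mean,
    # while the transformed SD stays roughly constant.
    for m in (0.1, 0.5, 0.9):
        raw_sd, vst_sd = simulate_sds(m, precision=200)
        print(f"mean={m:.1f} raw_sd={raw_sd:.4f} vst_sd={vst_sd:.4f}")
```

With equal precision, the raw standard deviation is largest for features with mean near 0.5, which is why a standard-deviation filter favours them; after the transform the spread is approximately independent of the mean, so a filter built on it would not share that bias.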
Identifying susceptibility genes by using joint tests of association and linkage and accounting for epistasis
Simulated Genetic Analysis Workshop 14 data were analyzed by jointly testing linkage and association and by accounting for epistasis using a candidate gene approach. Our group was unblinded to the "answers." The 48 single-nucleotide polymorphisms (SNPs) within the six disease loci were analyzed in addition to five SNPs from each of two non-disease-related loci. Affected sib-parent data were extracted from the first 10 replicates for populations Aipotu, Kaarangar, and Danacaa, and analyzed separately for each replicate. We developed a likelihood for testing association and/or linkage using data from affected sib pairs and their parents. Identical-by-descent (IBD) allele sharing between sibs was explicitly modeled using a conditional logistic regression approach and incorporating a covariate that represents expected IBD allele sharing given the genotypes of the sibs and their parents. Interactions were accounted for by performing likelihood ratio tests in stages determined by the highest order interaction term in the model. In the first stage, main effects were tested independently, and in subsequent stages, multilocus effects were tested conditional on significant marginal effects. A reduction in the number of tests performed was achieved by prescreening gene combinations with a goodness-of-fit chi-square statistic that depended on mating-type frequencies. SNP-specific joint effects of linkage and association were identified for loci D1, D2, D3, and D4 in multiple replicates. The strongest effect was for SNP B03T3056, which had a median p-value of 1.98 × 10⁻³⁴. No two- or three-locus effects were found in more than one replicate
Modeling measurement error in tumor characterization studies
Background: Etiologic studies of cancer increasingly use molecular features such as gene expression, DNA methylation and sequence mutation to subclassify the cancer type. In large population-based studies, the tumor tissues available for study are archival specimens that provide variable amounts of amplifiable DNA for molecular analysis. As molecular features measured from small amounts of tumor DNA are inherently noisy, we propose a novel approach to improve statistical efficiency when comparing groups of samples. We illustrate the phenomenon using the MethyLight technology, applying our proposed analysis to compare MLH1 DNA methylation levels in males and females studied in the Colon Cancer Family Registry. Results: We introduce two methods for computing empirical weights to model heteroscedasticity that is caused by sampling variable quantities of DNA for molecular analysis. In a simulation study, we show that using these weights in a linear regression model is more powerful for identifying differentially methylated loci than standard regression analysis. The increase in power depends on the underlying relationship between variation in outcome measure and input DNA quantity in the study samples. Conclusions: Tumor characteristics measured from small amounts of tumor DNA are inherently noisy. We propose a statistical analysis that accounts for the measurement error due to sampling variation of the molecular feature and show how it can improve the power to detect differential characteristics between patient groups.
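The general idea of using weights in a linear regression to handle heteroscedasticity can be sketched as a weighted least-squares fit with a single predictor. This is a minimal illustration only: the abstract does not describe how the two empirical weighting methods are computed, so the weights here are simply supplied by the caller (for example, larger for samples with more input DNA, which is an assumption), and the function name is hypothetical.

```python
def weighted_linear_fit(x, y, w):
    """Weighted least squares for y = intercept + slope * x.

    Each observation i contributes w[i] * (y[i] - intercept - slope * x[i]) ** 2
    to the objective, so noisier observations can be given smaller weights.
    """
    if not (len(x) == len(y) == len(w)):
        raise ValueError("x, y and w must have the same length")
    total_w = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


if __name__ == "__main__":
    # Toy data (hypothetical): x is a group indicator, the last sample is a noisy outlier.
    x = [0, 0, 1, 1]
    y = [1.0, 1.0, 3.0, 13.0]
    print(weighted_linear_fit(x, y, [1, 1, 1, 1]))    # ordinary least squares
    print(weighted_linear_fit(x, y, [1, 1, 1, 0.05])) # outlier down-weighted
```

Setting all weights equal reproduces ordinary least squares, so the two calls show how down-weighting a noisy sample changes the estimated group difference.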
Transcriptomic profiling of primary alveolar epithelial cell differentiation in human and rat
Cell-type specific gene regulation is a key to gaining a full understanding of how the distinct phenotypes of differentiated cells are achieved and maintained. Here we examined how changes in transcriptional activation during alveolar epithelial cell (AEC) differentiation determine phenotype. We performed transcriptomic profiling using in vitro differentiation of human and rat primary AEC. This model recapitulates in vitro an in vivo process in which AEC transition from alveolar type 2 (AT2) cells to alveolar type 1 (AT1) cells during normal maintenance and regeneration following lung injury. Here we describe in detail the quality control, preprocessing, and normalization of microarray data presented within the associated study (Marconett et al., 2013). We also include R code for reproducibility of the referenced data and easily accessible processed data tables
Particulate Matter, DNA Methylation in Nitric Oxide Synthase, and Childhood Respiratory Disease
Background: Air pollutants have been associated with childhood asthma and wheeze. Epigenetic regulation of nitric oxide synthase, the gene responsible for nitric oxide production, may be affected by air pollutants and contribute to the pathogenesis of asthma and wheeze. Objective: Our goal was to investigate the association between air pollutants, DNA methylation, and respiratory outcomes in children. Methods: Given residential address and buccal sample collection date, we estimated 7-day, 1-month, 6-month, and 1-year cumulative average PM2.5 and PM10 (particulate matter ≤ 2.5 and ≤ 10 µm aerodynamic diameter, respectively) exposures for 940 participants in the Children’s Health Study. Methylation of 12 CpG sites in three NOS (nitric oxide synthase) genes was measured using a bisulfite-polymerase chain reaction Pyrosequencing assay. Beta regression models were used to estimate associations between air pollutants, percent DNA methylation, and respiratory outcomes. Results: A 5-µg/m³ increase in PM2.5 was associated with a 0.20% [95% confidence interval (CI): –0.32, –0.07] to 1.0% (95% CI: –1.61, –0.56) lower DNA methylation at NOS2A position 1, 0.06% (95% CI: –0.18, 0.06) to 0.58% (95% CI: –1.13, –0.02) lower methylation at position 2, and 0.34% (95% CI: –0.57, –0.11) to 0.89% (95% CI: –1.57, –0.21) lower methylation at position 3, depending on the length of exposure and CpG locus. One-year PM2.5 exposure was associated with 0.33% (95% CI: 0.01, 0.65) higher average DNA methylation of 4 loci in the NOS2A CpG island. A 5-µg/m³ increase in 7-day and 1-year PM2.5 was associated with 0.6% (95% CI: 0.13, 0.99) and 2.8% (95% CI: 1.77, 3.75) higher NOS3 DNA methylation. No associations were observed for NOS1. PM10 showed similar but weaker associations with DNA methylation in these genes. Conclusions: PM2.5 exposure was associated with percent DNA methylation of several CpG loci in NOS genes, suggesting an epigenetic mechanism through which these pollutants may alter production of nitric oxide
Using DNA Methylation Patterns to Infer Tumor Ancestry
Background: Exactly how human tumors grow is uncertain because serial observations are impractical. One approach to reconstruct the histories of individual human cancers is to analyze the current genomic variation between their cells. The greater the variations, on average, the greater the time since the last clonal evolution cycle ("a molecular clock hypothesis"). Here we analyze passenger DNA methylation patterns from opposite sides of 12 primary human colorectal cancers (CRCs) to evaluate whether the variation (pairwise distances between epialleles) is consistent with a single clonal expansion after transformation. Methodology/Principal Findings: Data from 12 primary CRCs are compared to epigenomic data simulated under a single clonal expansion for a variety of possible growth scenarios. We find that for many different growth rates, a single clonal expansion can explain the population variation in 11 out of 12 CRCs. In eight CRCs, the cells from different glands are all equally distantly related, and cells sampled from the same tumor half appear no more closely related than cells sampled from opposite tumor halves. In these tumors, growth appears consistent with a single "symmetric" clonal expansion. In three CRCs, the variation in epigenetic distances was different between sides, but this asymmetry could be explained by a single clonal expansion with one region of a tumor having undergone more cell division than the other. The variation in one CRC was complex and inconsistent with a simple single clonal expansion
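The "pairwise distances between epialleles" mentioned in this abstract can be sketched as Hamming distances between binary methylation patterns. The abstract does not state the exact distance measure or data encoding, so the sketch below assumes each epiallele is a string of 0/1 calls at the same CpG sites; the function names and the toy patterns are hypothetical.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of CpG sites at which two equal-length epialleles differ."""
    if len(a) != len(b):
        raise ValueError("epialleles must cover the same CpG sites")
    return sum(x != y for x, y in zip(a, b))


def mean_within(group):
    """Mean pairwise distance among epialleles sampled from one tumor side."""
    pairs = list(combinations(group, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def mean_between(group_a, group_b):
    """Mean pairwise distance between epialleles from opposite tumor sides."""
    pairs = list(product(group_a, group_b))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy patterns (hypothetical): 1 = methylated, 0 = unmethylated.
    left = ["00110", "01110", "00100"]
    right = ["00111", "10110", "00010"]
    print(mean_within(left), mean_within(right), mean_between(left, right))
```

Comparing the within-side and between-side means is one simple way to ask whether cells from the same tumor half look more closely related than cells from opposite halves.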
Development and evaluation of new mask protocols for gene expression profiling in humans and chimpanzees
Abstract
Background
Cross-species gene expression analyses using oligonucleotide microarrays designed to evaluate a single species can provide spurious results due to mismatches between the interrogated transcriptome and arrayed probes. Based on the most recent human and chimpanzee genome assemblies, we developed updated and accessible probe masking methods that allow human Affymetrix oligonucleotide microarrays to be used for robust genome-wide expression analyses in both species. In this process, only data from oligonucleotide probes predicted to have robust hybridization sensitivity and specificity for both transcriptomes are retained for analysis.
Results
To characterize the utility of this resource, we applied our mask protocols to existing expression data from brains, livers, hearts, testes, and kidneys derived from both species and determined the effects probe numbers have on expression scores of specific transcripts. In all five tissues, probe sets with decreasing numbers of probes showed non-linear trends towards increased variation in expression scores. The relationships between expression variation and probe number in brain data closely matched those observed in simulated expression data sets subjected to random probe masking. However, there is evidence that additional factors affect the observed relationships between gene expression scores and probe number in tissues such as liver and kidney. In parallel, we observed that decreasing the number of probes within probe sets led to linear increases in both gained and lost inferences of differential cross-species expression in all five tissues, which will affect the interpretation of expression data subject to masking.
Conclusion
We introduce a readily implemented and updated resource for human and chimpanzee transcriptome analysis through a commonly used microarray platform. Based on empirical observations derived from the analysis of five distinct data sets, we provide novel guidelines for the interpretation of masked data that take the number of probes present in a given probe set into consideration. These guidelines are applicable to other customized applications that involve masking data from specific subsets of probes