Removing batch effects for prediction problems with frozen surrogate variable analysis
Batch effects are responsible for the failure of promising genomic prognostic
signatures, major ambiguities in published genomic results, and retractions
of widely-publicized findings. Batch effect corrections have been developed to
remove these artifacts, but they are designed to be used in population
studies. However, genomic technologies are beginning to be used in clinical
applications where samples are analyzed one at a time for diagnostic,
prognostic, and predictive applications. There are currently no batch
correction methods that have been developed specifically for prediction. In
this paper, we propose a new method called frozen surrogate variable analysis
(fSVA) that borrows strength from a training set for individual sample batch
correction. We show that fSVA improves prediction accuracy in simulations and
in public genomic studies. fSVA is available as part of the sva Bioconductor
package.
An evaluation of processing methods for HumanMethylation450 BeadChip data
Background: Illumina's HumanMethylation450 arrays provide the most cost-effective means of high-throughput DNA methylation analysis. As with other types of microarray platforms, technical artifacts are a concern, including background fluorescence, dye-bias from the use of two color channels, bias caused by type I/II probe design, and batch effects. Several approaches and pipelines have been developed, either targeting a single issue or designed to address multiple biases through a combination of methods. We evaluate the effect of combining separate approaches to improve signal processing. Results: In this study nine processing methods, including both within- and between-array methods, are applied and compared in four datasets. For technical replicates, we found both within- and between-array methods did a comparable job in reducing variance across replicates. For evaluating biological differences, within-array processing always improved differential DNA methylation signal detection over no processing, and always benefitted from performing background correction first. Combinations of within-array procedures were always among the best performing methods, with a slight advantage appearing for the between-array method Funnorm when batch effects explained more variation in the data than the methylation alterations between cases and controls. However, when this occurred, RUVm, a new batch correction method, noticeably improved reproducibility of differential methylation results over any of the signal-processing methods alone. Conclusions: The comparisons in our study provide valuable insights in preprocessing HumanMethylation450 BeadChip data. We found the within-array combination of Noob + BMIQ always improved signal sensitivity, and when combined with the RUVm batch-correction method, outperformed all other approaches in performing differential DNA methylation analysis. The effect of the data processing method, in any given data set, was a function of both the signal and noise.
Cell-type deconvolution in epigenome-wide association studies: a review and recommendations
A major challenge faced by epigenome-wide association studies (EWAS) is cell-type heterogeneity. As many EWAS have already demonstrated, adjusting for changes in cell-type composition can be critical when analyzing and interpreting findings from such studies. Because of their importance, a great number of different statistical algorithms, which adjust for cell-type composition, have been proposed. Some of the methods are ‘reference based’ in that they require a priori defined reference DNA methylation profiles of cell types that are present in the tissue of interest, while other algorithms are ‘reference free.’ At present, however, it is unclear how best to adjust for cell-type heterogeneity, as this may also largely depend on the type of tissue and phenotype being considered. Here, we provide a critical review of the major existing algorithms for correcting cell-type composition in the context of Illumina Infinium Methylation Beadarrays, with the aim of providing useful recommendations to the EWAS community.
Comparison of pre-processing methodologies for Illumina 450k methylation array data in familial analyses
Neuroconductor: an R platform for medical imaging analysis
Neuroconductor (https://neuroconductor.org) is an open-source platform for rapid testing and dissemination of reproducible computational imaging software. The goals of the project are to: (i) provide a centralized repository of R software dedicated to image analysis, (ii) disseminate software updates quickly, (iii) train a large, diverse community of scientists using detailed tutorials and short courses, (iv) increase software quality via automatic and manual quality controls, and (v) promote reproducibility of image data analysis. Based on the programming language R (https://www.r-project.org/), Neuroconductor starts with 51 inter-operable packages that cover multiple areas of imaging including visualization, data processing and storage, and statistical inference. Neuroconductor accepts new R package submissions, which are subject to a formal review and continuous automated testing. We provide a description of the purpose of Neuroconductor and the user and developer experience.
Genetic Influences on Brain Gene Expression in Rats Selected for Tameness and Aggression
Inter-individual differences in many behaviors are partly due to genetic
differences, but the identification of the genes and variants that influence
behavior remains challenging. Here, we studied an F2 intercross of two outbred
lines of rats selected for tame and aggressive behavior towards humans for more
than 64 generations. By using a mapping approach that is able to identify
genetic loci segregating within the lines, we identified four times more loci
influencing tameness and aggression than by an approach that assumes fixation
of causative alleles, suggesting that many causative loci were not driven to
fixation by the selection. We used RNA sequencing in 150 F2 animals to identify
hundreds of loci that influence brain gene expression. Several of these loci
colocalize with tameness loci and may reflect the same genetic variants.
Through analyses of correlations between allele effects on behavior and gene
expression, differential expression between the tame and aggressive rat
selection lines, and correlations between gene expression and tameness in F2
animals, we identify the genes Gltscr2, Lgi4, Zfp40 and Slc17a7 as candidate
contributors to the strikingly different behavior of the tame and aggressive
animals.