
Differential expression analysis with global network adjustment

Author: A Antonellis
A Zellner
AE Hoerl
AI Su
D Bates
DB Dahl
E Choy
EJ Cosgrove
H Zou
J Friedman
J Ruan
J Schoumans
J Wettenhall
Jannine D Cody
Jonathan A Gelfond
Joseph G Ibrahim
JT Leek
M Gustafsson
M Newton
Mayetri Gupta
Ming-Hui Chen
R Development Core Team
R Tibshirani
RJ Prill
S Pounds
SC Smith
SM Siepka
T Barrett
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Background: Large-scale chromosomal deletions or other non-specific perturbations of the transcriptome can alter the expression of hundreds or thousands of genes, and it is of biological interest to understand which genes are most profoundly affected. We present a method for predicting a gene’s expression as a function of other genes thereby accounting for the effect of transcriptional regulation that confounds the identification of genes differentially expressed relative to a regulatory network. The challenge in constructing such models is that the number of possible regulator transcripts within a global network is on the order of thousands, and the number of biological samples is typically on the order of 10. Nevertheless, there are large gene expression databases that can be used to construct networks that could be helpful in modeling transcriptional regulation in smaller experiments. Results: We demonstrate a type of penalized regression model that can be estimated from large gene expression databases, and then applied to smaller experiments. The ridge parameter is selected by minimizing the cross-validation error of the predictions in the independent out-sample. This tends to increase the model stability and leads to a much greater degree of parameter shrinkage, but the resulting biased estimation is mitigated by a second round of regression. Nevertheless, the proposed computationally efficient “over-shrinkage” method outperforms previously used LASSO-based techniques. In two independent datasets, we find that the median proportion of explained variability in expression is approximately 25%, and this results in a substantial increase in the signal-to-noise ratio allowing more powerful inferences on differential gene expression leading to biologically intuitive findings. We also show that a large proportion of gene dependencies are conditional on the biological state, which would be impossible with standard differential expression methods. 
Conclusions: By adjusting for the effects of the global network on individual genes, both the sensitivity and reliability of differential expression measures are greatly improved.
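The penalty-selection step described in the abstract above can be illustrated with a minimal sketch. It assumes a plain ridge regression without an intercept, fitted on a large "database" sample, with the penalty chosen to minimize squared prediction error on an independent out-sample. The function names (`solve`, `ridge_fit`, `mse`, `select_lambda`), the penalty grid, and the toy data are all hypothetical; this is not the authors' implementation, and it omits the second round of regression that the abstract says mitigates the shrinkage bias.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ridge_fit(X, y, lam):
    """Ridge coefficients solving (X'X + lam * I) beta = X'y (no intercept)."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def mse(X, y, beta):
    """Mean squared prediction error of the linear predictor X beta."""
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(X, y)) / len(y)


def select_lambda(X_db, y_db, X_out, y_out, grid):
    """Fit on the database sample; return the penalty with the lowest out-sample error."""
    return min(grid, key=lambda lam: mse(X_out, y_out, ridge_fit(X_db, y_db, lam)))


# Toy data (hypothetical): rows are samples, columns are regulator transcripts.
X_db, y_db = [[1, 0], [0, 1], [1, 1]], [1, 2, 3]
X_out, y_out = [[1, 0], [0, 1]], [0.875, 1.375]
best_lam = select_lambda(X_db, y_db, X_out, y_out, [0.0, 1.0, 10.0])
```

Choosing the penalty on an independent sample rather than by within-sample cross-validation is the design point the abstract emphasizes; in this sketch the only difference is which data are passed as `X_out` and `y_out`.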

Carolina Digital Repository

Enlighten

Criteria for the use of omics-based predictors in clinical trials.

Author: A Dupuy
AM Molinaro
B Freidlin
Barbara A. Conley
David A. Eberhard
Deborah J. Shuman
HM Moore
J Subramanian
James H. Doroshow
James V. Tricoli
Jeremy M. G. Taylor
Jill P. Mesirov
JT Leek
Kelly Y. Kim
KK Dobbin
L Shi
Lisa M. McShane
LM McShane
Margaret M. Cavenagh
Mei-Yin C. Polley
P. Mickey Williams
R Simon
Richard M. Simon
RM Simon
Tracy G. Lively
William L. Bigbee
Publication venue: eScholarship, University of California
Publication date: 01/10/2013

The US National Cancer Institute (NCI), in collaboration with scientists representing multiple areas of expertise relevant to 'omics'-based test development, has developed a checklist of criteria that can be used to determine the readiness of omics-based tests for guiding patient care in clinical trials. The checklist criteria cover issues relating to specimens, assays, mathematical modelling, clinical trial design, and ethical, legal and regulatory aspects. Funding bodies and journals are encouraged to consider the checklist, which they may find useful for assessing study quality and evidence strength. The checklist will be used to evaluate proposals for NCI-sponsored clinical trials in which omics tests will be used to guide therapy

eScholarship - University of California

Complex trait subtypes identification using transcriptome profiling reveals an interaction between two QTL affecting adiposity in chicken

Author: A Ghazalpour
CD Friguet C
Colette Désert
David Causeur
EE Schadt
ES Lander
FC Causeur D
G Le Mignon
Guillaume Le Mignon
JM Elsen
JT Leek
M Kirst
MA Groenen
MB Elsen JM
MC Filangi O
ML Wayne
N Hubner
Olivier Demeure
Olivier Filangi
P Le Roy
Pascale Le Roy
R DeCook
R Kustra
S Ponsuksili
Sandrine Lagarrigue
VK Mootha
Y Blum
YHY Benjamini
Yuna Blum
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Integrative genomics approaches that combine genotyping and transcriptome profiling in segregating populations have been developed to dissect complex traits. The most common approach is to identify genes whose eQTL colocalize with QTL of interest, providing new functional hypotheses about the causative mutation. Another approach is to define subtypes for a complex trait using transcriptome profiles and then perform QTL mapping using some of these subtypes. This approach can refine some QTL and reveal new ones. In this paper we introduce Factor Analysis for Multiple Testing (FAMT) to define subtypes more accurately and reveal interactions between QTL affecting the same trait. The data are hepatic transcriptome profiles for 45 half-sib male chickens from a sire known to be heterozygous for a QTL affecting abdominal fatness (AF) in the distal region of chromosome 5, around 168 cM. Results: Using this methodology, which accounts for hidden dependence structure among phenotypes, we identified 688 genes that are significantly correlated with the AF trait and distinguished 5 subtypes for the AF trait, which are not observed with gene lists obtained by classical approaches. After exclusion of one of the two lean bird subtypes, linkage analysis revealed a previously undetected QTL on chromosome 5 around 100 cM. Interestingly, the animals of this subtype presented the same q paternal haplotype at the 168 cM QTL. This result strongly suggests that the two QTL interact. In other words, the "q configuration" at the 168 cM QTL could hide the existence of the QTL in the proximal region at 100 cM. We further show that the proximal QTL interacts with the one previously detected in the distal region of chromosome 5. Conclusion: Our results demonstrate that stratifying a genetic population by molecular phenotypes, followed by QTL analysis on various subtypes, can lead to the identification of novel and interacting QTL.

Edinburgh Research Explorer

ProdInra

HAL-Rennes 1

Batch effect correction for genome-wide methylation data with Illumina Infinium platform

Author: A Etcheverry
AE Teschendorff
AH Sims
BA Walker
BH Mecham
BM Bolstad
C Chen
CG Bell
Christopher J Klein
E Eisenberg
High Seng Chai
HM Byun
J Liu
J Luo
J Staaf
Jean-Pierre A Kocher
JT Bell
JT Leek
JY Park
K Kerkel
Krishna V Donkena
M Benito
M Bibikova
M Ko
N Vasiljevic
O Alter
P Du
PW Laird
R Chari
S Sun
Terry M Therneau
Vesna D Garovic
VK Rakyan
WE Johnson
Wendy M White
X Wang
Y Kobayashi
Yanhong Wu
Zhifu Sun
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Genome-wide methylation profiling has led to more comprehensive insights into gene regulation mechanisms and potential therapeutic targets. The Illumina Human Methylation BeadChip is one of the most commonly used genome-wide methylation platforms. As in other microarray experiments, methylation data are susceptible to various technical artifacts, particularly batch effects. To date, little attention has been given to normalization and batch effect correction for this kind of data. Methods: We evaluated three common normalization approaches and investigated their performance in batch effect removal using three datasets with different degrees of batch effects generated from the HumanMethylation27 platform: quantile normalization at average β value (QNβ); two-step quantile normalization at probe signals implemented in the "lumi" package of R (lumi); and quantile normalization of A and B signals separately (ABnorm). Subsequent Empirical Bayes (EB) batch adjustment was also evaluated. Results: Each normalization removed a portion of the batch effects, and their effectiveness differed depending on the severity of batch effects in a dataset. For the dataset with minor batch effects (Dataset 1), normalization alone appeared adequate and "lumi" showed the best performance. However, all methods left substantial batch effects intact in the datasets with obvious batch effects, and further correction was necessary. Without any correction, 50 and 66 percent of CpGs were associated with batch effects in Datasets 2 and 3, respectively. After QNβ, lumi or ABnorm, the proportion of CpGs associated with batch effects was reduced to 24, 32, and 26 percent for Dataset 2, and to 37, 46, and 35 percent for Dataset 3, respectively. Additional EB correction effectively removed these remaining non-biological effects. More importantly, the two-step procedure almost tripled the number of CpGs associated with the outcome of interest for the two datasets.
Conclusion: Genome-wide methylation data from the Infinium Methylation BeadChip can be susceptible to batch effects with profound impacts on downstream analyses and conclusions. Normalization can remove some but not all batch effects. EB correction along with normalization is recommended for effective batch effect removal.
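As an illustration of the normalization step evaluated in the abstract above, the following is a minimal sketch of generic quantile normalization across samples. It is not the "lumi" two-step procedure, the QNβ or ABnorm variants, or the Empirical Bayes adjustment; the function name `quantile_normalize` and the toy values are hypothetical, and ties are broken by original position rather than averaged.

```python
def quantile_normalize(columns):
    """Force every sample (column) to share the same empirical distribution.

    columns: list of equal-length lists, one list of probe values per sample.
    Returns a new list of columns in the original probe order.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean across samples at each rank.
    ref = [sum(sc[i] for sc in sorted_cols) / len(columns) for i in range(n)]
    out = []
    for col in columns:
        # Indices of this sample's probes ordered from smallest to largest value.
        order = sorted(range(n), key=lambda i: col[i])
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = ref[rank]  # replace each value by the reference value at its rank
        out.append(normalized)
    return out


# Two hypothetical samples measured on three probes.
samples = [[5, 2, 3], [4, 1, 6]]
normalized = quantile_normalize(samples)
# normalized == [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalization both samples contain the same set of values (1.5, 3.5, 5.5), differing only in which probe holds which rank. This removes distributional differences between samples, but, as the abstract notes, it does not by itself remove probe-specific batch effects.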

Batch effect confounding leads to strong bias in performance estimates obtained by cross-validation.

Author: A Brazma
A Liaw
A Sadanandam
AH Sims
AL Boulesteix
AM Molinaro
C Ambroise
C Chen
C Cortes
C Lazar
Charlotte Soneson
E Budinska
E Van Cutsem
H Zou
HS Parker
IA Wood
J Gagnon-Bartsch
J Luo
JH Kim
JM Akey
JT Leek
L Breiman
L Shi
M Benito
M Lukk
Mauro Delorenzi
MD Radmacher
MK Kerr
O Alter
PO Brown
R Edgar
R Simon
R Tibshirani
S Varma
Sarah Gerster
Shu-Dong Zhang
W Johnson
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2014

BACKGROUND: With the large amount of biological data that is currently publicly available, many investigators combine multiple data sets to increase the sample size and potentially also the power of their analyses. However, technical differences ("batch effects") as well as differences in sample composition between the data sets may significantly affect the ability to draw generalizable conclusions from such studies. FOCUS: The current study focuses on the construction of classifiers, and the use of cross-validation to estimate their performance. In particular, we investigate the impact of batch effects and differences in sample composition between batches on the accuracy of the classification performance estimate obtained via cross-validation. The focus on estimation bias is a main difference compared to previous studies, which have mostly focused on the predictive performance and how it relates to the presence of batch effects. DATA: We work on simulated data sets. To have realistic intensity distributions, we use real gene expression data as the basis for our simulation. Random samples from this expression matrix are selected and assigned to group 1 (e.g., 'control') or group 2 (e.g., 'treated'). We introduce batch effects and select some features to be differentially expressed between the two groups. We consider several scenarios for our study, most importantly different levels of confounding between groups and batch effects. METHODS: We focus on well-known classifiers: logistic regression, Support Vector Machines (SVM), k-nearest neighbors (kNN) and Random Forests (RF). Feature selection is performed with the Wilcoxon test or the lasso. Parameter tuning and feature selection, as well as the estimation of the prediction performance of each classifier, is performed within a nested cross-validation scheme. The estimated classification performance is then compared to what is obtained when applying the classifier to independent data

Serveur académique lausannois

Public Library of Science (PLOS)

FigShare

Scalable Transcriptome Preparation for Massive Parallel Sequencing

Author: A Mortazavi
Beata Werne
CW Fuller
D Klevebring
D Ramskold
DR Bentley
E Farias-Hesson
E Pettersson
Ellen Sherwood
Henrik Stranneheim
J Shendure
Joakim Lundeberg
JT Leek
ML Metzker
NJ Lennon
PL Auer
S Lundin
Vladimir Brusic
Publication venue: Public Library of Science
Publication date: 01/01/2010

Background: The tremendous output of massive parallel sequencing technologies requires automated robust and scalable sample preparation methods to fully exploit the new sequence capacity. Methodology: In this study, a method for automated library preparation of RNA prior to massively parallel sequencing is presented. The automated protocol uses precipitation onto carboxylic acid paramagnetic beads for purification and size selection of both RNA and DNA. The automated sample preparation was compared to the standard manual sample preparation. Conclusion/Significance: The automated procedure was used to generate libraries for gene expression profiling on the Illumina HiSeq 2000 platform with the capacity of 12 samples per preparation with a significantly improved throughput compared to the standard manual preparation. The data analysis shows consistent gene expression profiles in terms of sensitivity and quantification of gene expression between the two library preparation methods

CiteSeerX

Differences in smoking associated DNA methylation patterns in South Asians and Europeans

Author: BR Joubert
C Ge
EA Houseman
EJ Jensen
ES Wan
H Chavan
HB Fraser
HR Elliott
JR Gibbs
JT Bell
JT Leek
K Spencer
L Breiman
LP Breitling
N Touleimat
NS Shenker
P Du
PM Dietz
RA Philibert
RT Barfield
S Davis
S Zeilinger
SS Lim
T Tillin
TH Jafar
WE Johnson
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Background: DNA methylation is strongly associated with smoking status at multiple sites across the genome. Studies have largely been restricted to individuals of European origin, yet the greatest increase in smoking is occurring in low-income countries, such as those of the Indian subcontinent. We determined whether there are differences between South Asians and Europeans in smoking-related loci, and whether a smoking score, combining all smoking-related DNA methylation scores, could differentiate smokers from non-smokers. Results: Illumina HM450k BeadChip arrays were run on 192 samples from the Southall And Brent REvisited (SABRE) cohort. Differential methylation in smokers was identified in 29 individual CpG sites at 18 unique loci. Interaction between smoking status and ethnic group was identified at the AHRR locus. Ethnic differences in DNA methylation were identified in non-smokers at two further loci, 6p21.33 and GNG12. With the exception of GFI1 and MYO1G, these differences were largely unaffected by adjustment for cell composition. A smoking score based on methylation profile was constructed. Current smokers were identified with 100% sensitivity and 97% specificity in Europeans and with 80% sensitivity and 95% specificity in South Asians. Conclusions: Differences between ethnic groups were identified in both single CpG sites and the combined smoking score. The smoking score is a valuable tool for identification of true current smoking behaviour. Explanations for ethnic differences in DNA methylation in association with smoking may provide valuable clues to disease pathways. Funding: Wellcome Trust Enhancement grant; Medical Research Council; Diabetes UK; the British Heart Foundation.
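The sensitivity and specificity figures quoted in the abstract above are computed from a score, a cut-off, and the true smoking status. The sketch below shows that calculation only; it assumes that a higher score indicates smoking and that a single threshold is applied. The function name, the threshold, and the toy scores are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(scores, is_smoker, threshold):
    """Classify score >= threshold as 'smoker' and compare with the true status.

    Returns (sensitivity, specificity):
      sensitivity = true positives / all true smokers
      specificity = true negatives / all true non-smokers
    """
    tp = fn = tn = fp = 0
    for score, truth in zip(scores, is_smoker):
        predicted = score >= threshold
        if truth and predicted:
            tp += 1
        elif truth and not predicted:
            fn += 1
        elif not truth and not predicted:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores for three smokers and two non-smokers.
scores = [0.9, 0.8, 0.3, 0.2, 0.6]
is_smoker = [True, True, True, False, False]
sens, spec = sensitivity_specificity(scores, is_smoker, threshold=0.5)
# sens is 2/3 (one smoker scored below the cut-off); spec is 1/2 (one non-smoker scored above it)
```

Computing the two rates separately within each ethnic group, as the abstract reports, amounts to calling the function once per group with that group's scores and labels.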

UCL Discovery

UDORA - University of Derby Online Research Archive

Open Research Exeter

Explore Bristol Research

Increasing consistency of disease biomarker prediction across datasets

Author: A Achiron
A Kuhn
AE Teschendorff
AR Abbas
B Zheng
C Riveros
Christos Hatzis
D Arasappan
DD Kang
E Kotelnikova
F Gilli
F Zhang
GC Tseng
H Choi
HJ Eysenck
I Borozan
I Kupershmidt
J ichi Satoh
JA Gagnon-Bartsch
JK Choi
JR Stevens
JT Leek
KS Gandhi
L Ein-Dor
M Gurevich
M Hecker
M Kapushesky
M Zhang
Maria D. Chikina
MM Goldenberg
R Bomprezzi
R Breitling
RA Irizarry
RC Axtell
S Chakraborty
S Lu
SE Bushnell
Stuart C. Sealfon
T Manoli
V Annibali
WI McDonald
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 16/04/2014

Microarray studies with human subjects often have limited sample sizes which hampers the ability to detect reliable biomarkers associated with disease and motivates the need to aggregate data across studies. However, human gene expression measurements may be influenced by many non-random factors such as genetics, sample preparations, and tissue heterogeneity. These factors can contribute to a lack of agreement among related studies, limiting the utility of their aggregation. We show that it is feasible to carry out an automatic correction of individual datasets to reduce the effect of such 'latent variables' (without prior knowledge of the variables) in such a way that datasets addressing the same condition show better agreement once each is corrected. We build our approach on the method of surrogate variable analysis but we demonstrate that the original algorithm is unsuitable for the analysis of human tissue samples that are mixtures of different cell types. We propose a modification to SVA that is crucial to obtaining the improvement in agreement that we observe. We develop our method on a compendium of multiple sclerosis data and verify it on an independent compendium of Parkinson's disease datasets. In both cases, we show that our method is able to improve agreement across varying study designs, platforms, and tissues. This approach has the potential for wide applicability to any field where lack of inter-study agreement has been a concern. © 2014 Chikina, Sealfon

Public Library of Science (PLOS)

D-Scholarship@Pitt

FigShare

Altered DNA methylation associated with a translocation linked to major mental illness

Author: A Corvin
A Rampino
AE Jaffe
AK Smith
B Jin
C Dalman
C Montano
CJ Zepeda-Mendoza
CL Hyde
DF Callen
DH Blackwood
E Berra
E Eden
E Hannon
E Walton
EA Houseman
EL Dempster
JK Millar
JL Rapoport
JT Leek
JW Smoller
LM Butcher
M Byrne
M Lemire
MA Carless
MJ Aryee
N Cai
N Teroganova
NJ Brandon
OM Doyle
P Sklar
PA Thomson
R Pidsley
R Schmidt-Kastner
S Falcon
S Ripke
TE Graber
TJ Morris
TW Mühleisen
WA Bickmore
X Zhou
Y Benjamini
YA Chen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/03/2018

Recent work has highlighted a possible role for altered epigenetic modifications, including differential DNA methylation, in susceptibility to psychiatric illness. Here, we investigate blood-based DNA methylation in a large family where a balanced translocation between chromosomes 1 and 11 shows genome-wide significant linkage to psychiatric illness. Genome-wide DNA methylation was profiled in whole-blood-derived DNA from 41 individuals using the Infinium HumanMethylation450 BeadChip (Illumina Inc., San Diego, CA). We found significant differences in DNA methylation when translocation carriers (n = 17) were compared to related non-carriers (n = 24) at 13 loci. All but one of the 13 significant differentially methylated positions (DMPs) mapped to the regions surrounding the translocation breakpoints. Methylation levels of five DMPs were associated with genotype at SNPs in linkage disequilibrium with the translocation. Two of the five genes harbouring significant DMPs, DISC1 and DUSP10, have been previously shown to be differentially methylated in schizophrenia. Gene Ontology analysis revealed enrichment for terms relating to neuronal function and neurodevelopment among the genes harbouring the most significant DMPs. Differentially methylated region (DMR) analysis highlighted a number of genes from the MHC region, which has been implicated in psychiatric illness previously through genetic studies. We show that inheritance of a translocation linked to major mental illness is associated with differential DNA methylation at loci implicated in neuronal development/function and in psychiatric illness. As genomic rearrangements are over-represented in individuals with psychiatric illness, such analyses may be valuable more widely in the study of these conditions

Cold Spring Harbor Laboratory Institutional Repository