44 research outputs found

Filtering, FDR and power

Author: Boer Judith M
Menezes Renée X
van Iterson Maarten
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: In high-dimensional data analysis such as differential gene expression analysis, people often use filtering methods like fold-change or variance filters in an attempt to reduce the multiple testing penalty and improve power. However, filtering may introduce a bias on the multiple testing correction. The precise amount of bias depends on many quantities, such as fraction of probes filtered out, filter statistic and test statistic used. Results: We show that a biased multiple testing correction results if non-differentially expressed probes are not filtered out with equal probability from the entire range of p-values. We illustrate our results using both a simulation study and an experimental dataset, where the FDR is shown to be biased mostly by filters that are associated with the hypothesis being tested, such as the fold change. Filters that induce little bias on the FDR yield less additional power of detecting differentially expressed genes. Finally, we propose a statistical test that can be used in practice to determine whether any chosen filter introduces bias on the FDR estimate used, given a general experimental setup. Conclusions: Filtering out of probes must be used with care as it may bias the multiple testing correction. Researchers can use our test for FDR bias to guide their choice of filter and amount of filtering in practice.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Leiden University Scholary Publications

Erasmus University Digital Repository
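
The "Filtering, FDR and power" abstract above describes filtering probes before a multiple testing correction. The sketch below illustrates only the mechanics, assuming the Benjamini-Hochberg adjustment as the FDR method (the abstract does not say which correction the authors use). The function names, the p-values and the filter statistic are hypothetical, made up for illustration. It shows why filtering is tempting (fewer tests, smaller adjusted p-values); it does not implement the authors' proposed test for FDR bias.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def filter_then_adjust(pvalues, filter_stat, threshold):
    """Keep probes whose filter statistic reaches the threshold, then adjust.

    Returns a dict mapping the original probe index to its adjusted p-value.
    """
    kept = [i for i, s in enumerate(filter_stat) if s >= threshold]
    return dict(zip(kept, bh_adjust([pvalues[i] for i in kept])))


# Hypothetical toy data: six probes with p-values and a variance-like filter.
pvalues = [0.001, 0.02, 0.04, 0.30, 0.60, 0.90]
filter_stat = [2.1, 1.4, 0.3, 1.9, 0.2, 0.1]

unfiltered = bh_adjust(pvalues)
filtered = filter_then_adjust(pvalues, filter_stat, threshold=1.0)

n_unfiltered = sum(p <= 0.05 for p in unfiltered)
n_filtered = sum(p <= 0.05 for p in filtered.values())
print(n_unfiltered, n_filtered)
```

In this toy case the filter keeps three of six probes, and one additional probe passes the 5% threshold after filtering. Whether that gain is legitimate depends on the condition stated in the abstract: non-differentially expressed probes must be removed with equal probability across the whole range of p-values.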

Testing for association between RNA-Seq and high-dimensional data

Author: Jonker Marianne A.
Menezes Renée X.
Rauschenberger Armin
van de Wiel Mark A.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

Background: Testing for association between RNA-Seq and other genomic data is challenging due to high variability of the former and high dimensionality of the latter. Results: Using the negative binomial distribution and a random-effects model, we develop an omnibus test that overcomes both difficulties. It may be conceptualised as a test of overall significance in regression analysis, where the response variable is overdispersed and the number of explanatory variables exceeds the sample size. Conclusions: The proposed test can detect genetic and epigenetic alterations that affect gene expression. It can examine complex regulatory mechanisms of gene expression. The R package globalSeq is available from Bioconductor

Crossref

Springer - Publisher Connector

PubMed Central

Radboud Repository

Open Repository and Bibliography - Luxembourg

Sparse classification with paired covariates

Author: Ciocănea-Teodorescu Iuliana
Jonker Marianne A
Menezes Renée X
Rauschenberger Armin
van de Wiel Mark A
Publication venue: Advances in Data Analysis and Classification
Publication date: 01/01/2020

Funder: Department of Epidemiology and Biostatistics, Amsterdam UMC, VU University Amsterdam

This paper introduces the paired lasso: a generalisation of the lasso for paired covariate settings. Our aim is to predict a single response from two high-dimensional covariate sets. We assume a one-to-one correspondence between the covariate sets, with each covariate in one set forming a pair with a covariate in the other set. Paired covariates arise, for example, when two transformations of the same data are available. It is often unknown which of the two covariate sets leads to better predictions, or whether the two covariate sets complement each other. The paired lasso addresses this problem by weighting the covariates to improve the selection from the covariate sets and the covariate pairs. It thereby combines information from both covariate sets and accounts for the paired structure. We tested the paired lasso on more than 2000 classification problems with experimental genomics data, and found that for estimating sparse but predictive models, the paired lasso outperforms the standard and the adaptive lasso. The R package is available from CRAN.

Radboud Repository

Apollo (Cambridge)

Open Repository and Bibliography - Luxembourg

Can subtle changes in gene expression be consistently detected with different microarray platforms?

Author: 't Hoen Peter AC
Ariyurek Yavuz
Boer Judith M
de Hollander Mattias
de Menezes Renée X
den Dunnen Johan T
Kuiper Rowan
Pedotti Paola
Schenk Geert J
van Ommen Gertjan JB
Vossen Rolf HAM
Vreugdenhil Erno
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: The comparability of gene expression data generated with different microarray platforms is still a matter of concern. Here we address the performance and the overlap in the detection of differentially expressed genes for five different microarray platforms in a challenging biological context where differences in gene expression are few and subtle. Results: Gene expression profiles in the hippocampus of five wild-type and five transgenic δC-doublecortin-like kinase mice were evaluated with five microarray platforms: Applied Biosystems, Affymetrix, Agilent, Illumina, LGTC home-spotted arrays. Using a fixed false discovery rate of 10% we detected surprising differences between the number of differentially expressed genes per platform. Four genes were selected by ABI, 130 by Affymetrix, 3,051 by Agilent, 54 by Illumina, and 13 by LGTC. Two genes were found significantly differentially expressed by all platforms and the four genes identified by the ABI platform were found by at least three other platforms. Quantitative RT-PCR analysis confirmed 20 out of 28 of the genes detected by two or more platforms and 8 out of 15 of the genes detected by Agilent only. We observed improved correlations between platforms when ranking the genes based on the significance level than with a fixed statistical cut-off. We demonstrate significant overlap in the affected gene sets identified by the different platforms, although biological processes were represented by only partially overlapping sets of genes. Aberrances in GABA-ergic signalling in the transgenic mice were consistently found by all platforms. Conclusion: The different microarray platforms give partially complementary views on biological processes affected. Our data indicate that when analyzing samples with only subtle differences in gene expression the use of two different platforms might be more attractive than increasing the number of replicates. 
Commercial two-color platforms seem to have higher power for finding differentially expressed genes between groups with small differences in expression

Crossref

AIR Universita degli studi di Milano

Springer - Publisher Connector

PubMed Central

Erasmus University Digital Repository

Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms

Author: Erno Vreugdenhil
Gert-Jan B. van Ommen
Helene H. Thygesen
Johan T. den Dunnen
Judith M. Boer
Peter A. C. 't Hoen
Renée X. de Menezes
Rolf H. A. M. Vossen
Yavuz Ariyurek
Publication venue: Oxford University Press

The hippocampal expression profiles of wild-type mice and mice transgenic for δC-doublecortin-like kinase were compared with Solexa/Illumina deep sequencing technology and five different microarray platforms. With Illumina's digital gene expression assay, we obtained ∼2.4 million sequence tags per sample, their abundance spanning four orders of magnitude. Results were highly reproducible, even across laboratories. With a dedicated Bayesian model, we found differential expression of 3179 transcripts with an estimated false-discovery rate of 8.5%. This is a much higher figure than found for microarrays. The overlap in differentially expressed transcripts found with deep sequencing and microarrays was most significant for Affymetrix. The changes in expression observed by deep sequencing were larger than observed by microarrays or quantitative PCR. Relevant processes such as calmodulin-dependent protein kinase activity and vesicle transport along microtubules were found affected by deep sequencing but not by microarrays. While undetectable by microarrays, antisense transcription was found for 51% of all genes and alternative polyadenylation for 47%. We conclude that deep sequencing provides a major advance in robustness, comparability and richness of expression profiling data and is expected to boost collaborative, comparative and integrative genomics studies

Crossref

PubMed Central

Integrated analysis of DNA copy number and gene expression microarray data using gene sets

Author: Gert-Jan B van Ommen
Judith M Boer
Marten Boetzer
Melle Sieswerda
Renée X Menezes
Publication venue: BioMed Central
Publication date: 29/06/2009

Background: Genes that play an important role in tumorigenesis are expected to show association between DNA copy number and RNA expression. Optimal power to find such associations can only be achieved if analysing copy number and gene expression jointly. Furthermore, some copy number changes extend over larger chromosomal regions affecting the expression levels of multiple resident genes.

Crossref

Springer - Publisher Connector

PubMed Central

Erasmus University Digital Repository

Testing for association between RNA-Seq and high-dimensional data

Author: Armin Rauschenberger
Marianne A. Jonker
Mark A. van de Wiel
Renée X. Menezes
Publication venue: Springer Science and Business Media LLC

Crossref

Mechanisms that clear mutations drive field cancerization in mammary tissue

Oncogenic mutations are abundant in the tissues of healthy individuals, but rarely form tumours [1–3]. Yet, the underlying protection mechanisms are largely unknown. To resolve these mechanisms in mouse mammary tissue, we use lineage tracing to map the fate of wild-type and Brca1−/−;Trp53−/− cells, and find that both follow a similar pattern of loss and spread within ducts. Clonal analysis reveals that ducts consist of small repetitive units of self-renewing cells that give rise to short-lived descendants. This offers a first layer of protection as any descendants, including oncogenic mutant cells, are constantly lost, thereby limiting the spread of mutations to a single stem cell-descendant unit. Local tissue remodelling during consecutive oestrous cycles leads to the cooperative and stochastic loss and replacement of self-renewing cells. This process provides a second layer of protection, leading to the elimination of most mutant clones while enabling the minority that by chance survive to expand beyond the stem cell-descendant unit. This leads to fields of mutant cells spanning large parts of the epithelial network, predisposing it for transformation. Eventually, clone expansion becomes restrained by the geometry of the ducts, providing a third layer of protection. Together, these mechanisms act to eliminate most cells that acquire somatic mutations at the expense of driving the accelerated expansion of a minority of cells, which can colonize large areas, leading to field cancerization.

Utrecht University Repository

Quasi-variances

Author: De Menezes Renée X.
Firth David
Publication venue: Oxford University Press (OUP)
Publication date: 01/03/2004

In statistical models of dependence, the effect of a categorical variable is typically described by contrasts among parameters. For reporting such effects, quasi‐variances provide an economical and intuitive method which permits approximate inference on any contrast by subsequent readers. Applications include generalised linear models, generalised additive models and hazard models. The present paper exposes the generality of quasi‐variances, emphasises the need to control relative errors of approximation, gives simple methods for obtaining quasi‐variances and bounds on the approximation error involved, and explores the domain of accuracy of the method. Conditions are identified under which the quasi‐variance approximation is exact, and numerical work indicates high accuracy in a variety of settings

Warwick Research Archives Portal Repository
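
The "Quasi-variances" abstract above refers to reporting one number per factor level so that readers can approximate the variance of any contrast. A minimal sketch of that idea follows, assuming the usual convention that quasi-variances q_i are chosen so that var(b_i - b_j) is approximately q_i + q_j. It covers only the three-level case, where three contrast variances and three unknowns can be solved exactly by simple algebra; it is not the general fitting method of the paper, and the exact solution is not guaranteed to be positive. The function names and the covariance matrix are hypothetical.

```python
def contrast_variance(cov, i, j):
    """Variance of the contrast b_i - b_j from a covariance matrix."""
    return cov[i][i] + cov[j][j] - 2.0 * cov[i][j]


def quasi_variances_three_levels(cov):
    """Solve q_i + q_j = var(b_i - b_j) exactly for a three-level factor."""
    v01 = contrast_variance(cov, 0, 1)
    v02 = contrast_variance(cov, 0, 2)
    v12 = contrast_variance(cov, 1, 2)
    q0 = (v01 + v02 - v12) / 2.0
    q1 = (v01 + v12 - v02) / 2.0
    q2 = (v02 + v12 - v01) / 2.0
    return [q0, q1, q2]


# Hypothetical covariance matrix of level effects under treatment contrasts:
# the reference level (index 0) is fixed at zero, so its row and column are 0.
cov = [
    [0.0, 0.0, 0.0],
    [0.0, 0.5, 0.2],
    [0.0, 0.2, 0.6],
]

q = quasi_variances_three_levels(cov)
print(q)

# A reader holding only q can recover every contrast variance by addition.
recovered_v12 = q[1] + q[2]
```

With more than three levels there are more contrasts than quasi-variances, so an exact solution generally does not exist, which is why the abstract stresses controlling the relative error of the approximation.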