
Use of pre-transformation to cope with outlying values in important candidate genes

By Anne-Laure Boulesteix, Vincent Guillemot and Willi Sauerbrei

Abstract

Outlying values in predictors often strongly affect the results of statistical analyses in high-dimensional settings. Although they occur frequently with most high-throughput techniques, the problem is often ignored in the literature. We suggest using a very simple transformation, proposed earlier in a different context by Royston and Sauerbrei, as an intermediate step between array normalization and high-level statistical analysis. This straightforward univariate transformation identifies extreme values and considerably reduces the influence of outlying values in all subsequent steps of the analysis, without eliminating the affected observation or feature. The use of the transformation and its effects are demonstrated for diverse univariate and multivariate statistical analyses using nine publicly available microarray data sets.
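To make the idea of a univariate pre-transformation concrete, here is a minimal sketch of one transformation of this type, applied to a single feature (e.g. one gene across arrays). It is an illustration under stated assumptions, not the exact Royston and Sauerbrei formula: it assumes the feature is standardized with the sample mean and standard deviation, then passed through ε(z) = ln((Φ(z) + δ) / (1 − Φ(z) + δ)), where Φ is the standard normal CDF. The function name `bounded_transform` and the default δ = 0.2 are hypothetical choices made here.

```python
import math
import statistics


def normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), computed via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bounded_transform(values, delta=0.2):
    """Hypothetical sketch of a pre-transformation that tames outlying values.

    Assumed form (not taken from the paper): standardize with the sample
    mean and SD, then apply eps(z) = ln((Phi(z) + delta) / (1 - Phi(z) + delta)).
    The result is monotone in the input, equals 0 at the mean, and is
    bounded by +/- ln((1 + delta) / delta), so no single extreme value can
    dominate later analysis steps. No observation is removed.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # needs at least two values
    out = []
    for x in values:
        p = normal_cdf((x - mean) / sd)
        out.append(math.log((p + delta) / (1.0 - p + delta)))
    return out


if __name__ == "__main__":
    # Made-up expression values for one gene; the last array is outlying.
    expression = [5.1, 5.3, 4.9, 5.0, 5.2, 12.0]
    print(bounded_transform(expression))
```

Because the transformation is monotone, ranks within the feature are preserved while the outlying value is pulled toward the bulk of the data. The parameter δ sets how hard the tails are compressed: the output is bounded by ±ln((1 + δ)/δ), so a smaller δ permits a wider range.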

Topics: Technical Reports, ddc:510
Year: 2010
DOI identifier: 10.1002/bimj.201000189
OAI identifier: oai:epub.ub.uni-muenchen.de:11501
Provided by: Open Access LMU

Citations

  1. (1992). A bootstrap resampling procedure for model building: Application to the Cox regression model.
  2. (2007). Accurate Ranking of Differentially Expressed Genes by a Distribution-Free Shrinkage Approach.
  3. (2008). Adapting prediction error estimates for biased complexity selection in high-dimensional bootstrap samples.
  4. (2009). affy. Bioconductor R package version 1.24.2: http://www.bioconductor.org/
  5. (2005). An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival.
  6. (2007). Boosting algorithms: regularization, prediction and model fitting (with discussion).
  7. (2007). Cancer outlier differential gene expression detection.
  8. (2009). Detecting outlier samples in microarray data.
  9. (2002). Gene expression correlates of clinical prostate cancer behavior.
  10. (2006). Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis.
  11. (2005). Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts.
  12. (2005). Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer.
  13. (2009). Gene-expression profiling of peripheral blood mononuclear cells in sepsis.
  14. (2005). Genes that mediate breast cancer metastasis to lung.
  15. (2010). High-dimensional Cox models: the choice of penalty as part of the model building process.
  16. (2007). Improving the robustness of fractional polynomial models by preliminary covariate transformation: A pragmatic approach.
  17. (2010). Increasing stability and interpretability of gene expression signatures.
  18. (2010). L1 penalized estimation in the Cox proportional hazards model.
  19. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments.
  20. (2006). Model-based boosting in high dimensions.
  21. (2008). Molecular markers of early Parkinson's disease based on gene expression in blood.
  22. On observations relating to several quantities.
  23. (2006). On the statistical assessment of classifiers using DNA microarray data.
  24. (2007). Outlier sums for differential gene expression analysis.
  25. (2005). Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer.
  26. (2003). Robust regression and outlier detection.
  27. (2009). Stability and aggregation of ranked gene lists.
  28. (2008). Stability of gene contributions and identification of outliers in multivariate analysis of microarray data.
  29. (1986). Statistical methods for assessing agreement between two methods of clinical measurement.
  30. (2010). Testing the additional predictive value of high-dimensional molecular data.
