Simple and Effective Visual Models for Gene Expression Cancer Diagnostics

Authors: Bratko Ivan; Leban Gregor; Mramor Minca; Zupan Blaz
Publication date: 01/01/2005

In this paper we show that diagnostic classes in cancer gene expression data sets, which most often include thousands of features (genes), may be effectively separated with simple two-dimensional plots such as the scatterplot and the radviz graph. The principal innovation proposed in the paper is a method called VizRank, which is able to score and identify the best among possibly millions of candidate projections for visualization. Compared with techniques widely applied in cancer genomics in recent years, including neural networks, support vector machines and various ensemble-based approaches, VizRank is fast and finds visualization models that can be easily examined and interpreted by domain experts. Our experiments on a number of gene expression data sets show that VizRank was always able to find data visualizations with a small number of genes (two to seven) and excellent class separation. In addition to providing grounds for gene expression cancer diagnosis, VizRank and its visualizations also identify small sets of relevant genes, uncover interesting gene interactions and point to outliers and potential misclassifications in cancer data sets.
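The idea of scoring and ranking candidate projections can be illustrated with a minimal sketch. The abstract does not say how VizRank scores a projection, so the leave-one-out k-nearest-neighbour accuracy used here is an assumption standing in for the real score, and the function names and toy data are hypothetical.

```python
from itertools import combinations
from math import dist


def loo_knn_accuracy(points, labels, k=3):
    """Leave-one-out k-NN accuracy of labelled 2-D points."""
    correct = 0
    for i, p in enumerate(points):
        # Sort all other points by distance to p and keep the k nearest.
        neighbours = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )[:k]
        votes = [labels[j] for j in neighbours]
        predicted = max(set(votes), key=votes.count)
        correct += predicted == labels[i]
    return correct / len(points)


def rank_scatterplots(rows, labels, k=3):
    """Score every feature pair; return (score, (i, j)) tuples, best first."""
    n_features = len(rows[0])
    scored = []
    for i, j in combinations(range(n_features), 2):
        projection = [(row[i], row[j]) for row in rows]
        scored.append((loo_knn_accuracy(projection, labels, k), (i, j)))
    return sorted(scored, reverse=True)


# Made-up toy data: features 0 and 2 carry the class signal, 1 and 3 are noise.
rows = [
    (0.1, 0.9, 0.2, 0.7), (0.2, 0.1, 0.1, 0.2),
    (0.0, 0.5, 0.3, 0.6), (0.3, 0.4, 0.0, 0.1),
    (0.9, 0.8, 1.0, 0.3), (1.0, 0.2, 0.8, 0.8),
    (0.8, 0.6, 0.9, 0.1), (1.1, 0.3, 1.1, 0.5),
]
labels = ["normal"] * 4 + ["tumor"] * 4

ranking = rank_scatterplots(rows, labels)
for score, pair in ranking:
    print(pair, round(score, 2))
```

An exhaustive search over pairs grows quadratically with the number of genes, so a practical tool would need a heuristic to decide which projections to try first; this sketch leaves that out.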

ePrints.FRI

Optimal classifier selection and negative bias in error rate estimation: An empirical study on high-dimensional prediction

Authors: Boulesteix Anne-Laure; Strobl Carolin
Publication date: 01/01/2009

In biometric practice, researchers often apply a large number of different methods in a "trial-and-error" strategy to get as much as possible out of their data and, due to publication pressure or pressure from the consulting customer, present only the most favorable results. This strategy may induce a substantial optimistic bias in prediction error estimation, which is quantitatively assessed in the present manuscript. The focus of our work is on class prediction based on high-dimensional data (e.g. microarray data), since such analyses are particularly exposed to this kind of bias. In our study we consider a total of 124 variants of classifiers (possibly including variable selection or tuning steps) within a cross-validation evaluation scheme. The classifiers are applied to original and modified real microarray data sets, some of which are obtained by randomly permuting the class labels to mimic non-informative predictors while preserving their correlation structure. We then assess the minimal misclassification rate over the different variants of classifiers in order to quantify the bias arising when the optimal classifier is selected a posteriori in a data-driven manner. The bias resulting from the parameter tuning (including gene selection parameters as a special case) and the bias resulting from the choice of the classification method are examined both separately and jointly. We conclude that the strategy of presenting only the optimal result is not acceptable, and suggest alternative approaches for properly reporting classification accuracy.
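The selection effect described in this abstract can be reproduced with a small simulation. This is a simplified sketch, not the study's protocol: each of the 124 "variants" is replaced by pure random guessing on non-informative labels, there is no cross-validation, and the sample size of 40 is a hypothetical choice.

```python
import random


def min_error_over_variants(n_samples=40, n_variants=124, seed=0):
    """Return (minimal error, average error) over signal-free classifier variants."""
    rng = random.Random(seed)
    # Balanced labels, shuffled: plays the role of permuted class labels.
    labels = [0, 1] * (n_samples // 2)
    rng.shuffle(labels)
    error_rates = []
    for _ in range(n_variants):
        # A variant with no real signal: it guesses each label at random.
        predictions = [rng.randint(0, 1) for _ in range(n_samples)]
        errors = sum(p != y for p, y in zip(predictions, labels))
        error_rates.append(errors / n_samples)
    return min(error_rates), sum(error_rates) / n_variants


best, average = min_error_over_variants()
print(f"average error over variants: {average:.3f}")
print(f"error of the variant picked a posteriori: {best:.3f}")
```

The average error stays near the chance level of 0.5, while the minimum over variants falls well below it even though no variant has any predictive ability. Reporting only that minimum is the optimistic bias the abstract quantifies.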

Springer - Publisher Connector

Directory of Open Access Journals

Open Access LMU

PubMed Central

Increasing stability and interpretability of gene expression signatures

Authors: Haury Anne-Claire; Jacob Laurent; Vert Jean-Philippe
Publication date: 18/01/2010

Motivation: Molecular signatures for diagnosis or prognosis estimated from large-scale gene expression data often lack robustness and stability, rendering their biological interpretation challenging. Increasing the signature's interpretability and stability across perturbations of a given dataset and, if possible, across datasets, is urgently needed to ease the discovery of important biological processes and, eventually, new drug targets. Results: We propose a new method to construct signatures with increased stability and easier interpretability. The method uses a gene network as side information and enforces a large connectivity among the genes in the signature, leading to signatures typically made of genes clustered in a few subnetworks. It combines the recently proposed graph Lasso procedure with a stability selection procedure. We evaluate its relevance for the estimation of a prognostic signature in breast cancer, and highlight in particular the increase in interpretability and stability of the signature.
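The stability selection step mentioned in this abstract can be sketched as repeated subsampling with a count of how often each feature is selected. The base selector below is a plain class-mean-difference filter, swapped in for the graph Lasso, and no gene network is used; the data, parameter values and function names are all hypothetical.

```python
import random
from collections import Counter


def top_k_features(rows, labels, k):
    """Indices of the k features with the largest absolute class-mean gap."""
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]

    def gap(j):
        mean_pos = sum(r[j] for r in pos) / len(pos)
        mean_neg = sum(r[j] for r in neg) / len(neg)
        return abs(mean_pos - mean_neg)

    return sorted(range(len(rows[0])), key=gap, reverse=True)[:k]


def selection_frequencies(rows, labels, k=2, n_rounds=100, seed=0):
    """Fraction of random half-samples in which each feature is selected."""
    rng = random.Random(seed)
    counts = Counter()
    valid_rounds = 0
    for _ in range(n_rounds):
        idx = rng.sample(range(len(rows)), len(rows) // 2)
        sub_labels = [labels[i] for i in idx]
        if len(set(sub_labels)) < 2:
            continue  # skip subsamples that miss one class entirely
        valid_rounds += 1
        counts.update(top_k_features([rows[i] for i in idx], sub_labels, k))
    return {j: counts[j] / max(valid_rounds, 1) for j in range(len(rows[0]))}


# Synthetic data: features 0 and 1 are shifted in class 1, the rest are noise.
data_rng = random.Random(1)
labels = [0] * 20 + [1] * 20
rows = [
    [data_rng.gauss(3.0 * y if j < 2 else 0.0, 1.0) for j in range(6)]
    for y in labels
]

freqs = selection_frequencies(rows, labels)
print({j: round(f, 2) for j, f in freqs.items()})
```

Features that are selected in most subsamples form the stable part of the signature; a frequency threshold (a tuning choice not given in the abstract) would turn these frequencies into a final gene list.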

arXiv.org e-Print Archive

HAL-MINES ParisTech

Machine Learning and Integrative Analysis of Biomedical Big Data.

Authors: Choi Howard; Chung Neo Christopher; Mirza Bilal; Ping Peipei; Wang Jie; Wang Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

Multidisciplinary Digital Publishing Institute

Ezid

Directory of Open Access Journals

eScholarship - University of California

PLS dimension reduction for classification of microarray data

Author: Boulesteix Anne-Laure
Publication date: 01/01/2004

PLS dimension reduction is known to give good prediction accuracy in the context of classification with high-dimensional microarray data. In this paper, PLS is compared with some of the best state-of-the-art classification methods. In addition, a simple procedure to choose the number of components is suggested. The connection between PLS dimension reduction and gene selection is examined and a property of the first PLS component for binary classification is proven. PLS can also be used as a visualization tool for high-dimensional data in the classification framework. The whole study is based on 9 real microarray cancer data sets.
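A minimal sketch of the first PLS component for a binary response may help make the dimension reduction concrete. It uses the standard PLS1 construction, in which the first weight vector is proportional to the covariance between each centred feature and the centred response; with a 0/1 response this is proportional to the difference between class means. Whether that is the property proven in the paper is not stated in the abstract, and the toy data are made up.

```python
from math import sqrt


def first_pls_component(rows, labels):
    """Return (unit weight vector, sample scores) of the first PLS component."""
    n, p = len(rows), len(rows[0])
    col_means = [sum(r[j] for r in rows) / n for j in range(p)]
    y_mean = sum(labels) / n
    centred = [[r[j] - col_means[j] for j in range(p)] for r in rows]
    # Weight of feature j: its covariance with the response, up to scale.
    w = [
        sum(centred[i][j] * (labels[i] - y_mean) for i in range(n))
        for j in range(p)
    ]
    norm = sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    # Score of each sample: projection of its centred row onto w.
    scores = [sum(x * v for x, v in zip(row, w)) for row in centred]
    return w, scores


# Made-up toy data: features 0 and 2 differ between classes, feature 1 does not.
rows = [
    (0.1, 0.9, 0.2), (0.2, 0.1, 0.1), (0.0, 0.5, 0.3), (0.3, 0.4, 0.0),
    (0.9, 0.8, 1.0), (1.0, 0.2, 0.8), (0.8, 0.6, 0.9), (1.1, 0.3, 1.1),
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights, scores = first_pls_component(rows, labels)
print([round(v, 3) for v in weights])
print([round(s, 3) for s in scores])
```

The weights double as a rough gene ranking, which is one way to read the connection between PLS and gene selection, and the scores give a one-dimensional view of the samples that can be plotted directly.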

CiteSeerX

Open Access LMU

PUEPro : A Computational Pipeline for Prediction of Urine Excretory Proteins

Authors: Chen Xin; Du Wei; Liang Yanchun; Pang Wei; Wang Yan; Xu Ying; Zhang Chi
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

This work is supported by the National Natural Science Foundation of China (Grant Nos. 81320108025, 61402194, 61572227), Development Project of Jilin Province of China (20140101180JC) and China Postdoctoral Science Foundation (2014T70291). Postprint

Aberdeen University Research

Heriot Watt Pure

Development of a multivariable risk model integrating urinary cell DNA methylation and cell-free RNA data for the detection of significant prostate cancer

Authors: Bapat Bharati; Brewer Daniel; Clark Jeremy; Connell Shea P; Cooper Colin; Hurst Rachel; Mills Robert; O'Reilly Eve; Perry Antoinette; Tuzova Alexandra; Webb Martyn; Zhao Fang
Publication venue: Wiley
Publication date: 15/05/2020

Background: Prostate cancer exhibits severe clinical heterogeneity and there is a critical need for clinically implementable tools able to precisely and noninvasively identify patients that can either be safely removed from treatment pathways or those requiring further follow up. Our objectives were to develop a multivariable risk prediction model through the integration of clinical, urine-derived cell-free messenger RNA (cf-RNA) and urine cell DNA methylation data capable of noninvasively detecting significant prostate cancer in biopsy naïve patients. Methods: Post-digital rectal examination urine samples previously analyzed separately for both cellular methylation and cf-RNA expression within the Movember GAP1 urine biomarker cohort were selected for a fully integrated analysis (n = 207). A robust feature selection framework, based on bootstrap resampling and permutation, was utilized to find the optimal combination of clinical and urinary markers in a random forest model, deemed ExoMeth. Out-of-bag predictions from ExoMeth were used for diagnostic evaluation in men with a clinical suspicion of prostate cancer (PSA ≥ 4 ng/mL, adverse digital rectal examination, age, or lower urinary tract symptoms). Results: As ExoMeth risk score (range, 0-1) increased, the likelihood of high-grade disease being detected on biopsy was significantly greater (odds ratio = 2.04 per 0.1 ExoMeth increase, 95% confidence interval [CI]: 1.78-2.35). On an initial TRUS biopsy, ExoMeth accurately predicted the presence of Gleason score ≥3 + 4, area under the receiver-operator characteristic curve (AUC) = 0.89 (95% CI: 0.84-0.93) and was additionally capable of detecting any cancer on biopsy, AUC = 0.91 (95% CI: 0.87-0.95). Application of ExoMeth provided a net benefit over current standards of care and has the potential to reduce unnecessary biopsies by 66% when a risk threshold of 0.25 is accepted. 
Conclusion: Integration of urinary biomarkers across multiple assay methods has greater diagnostic ability than either method in isolation, providing superior predictive ability of biopsy outcomes. ExoMeth represents a more holistic view of urinary biomarkers and has the potential to result in substantial changes to how patients suspected of harboring prostate cancer are diagnosed.
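The reported odds ratio of 2.04 per 0.1 increase in ExoMeth score can be turned into odds ratios for larger score differences with a short calculation. This assumes the odds ratio comes from a model in which the log-odds are linear in the score, which the abstract does not state; the 0.2 baseline probability is a hypothetical value chosen only for illustration.

```python
def odds_ratio_for_increase(or_per_step, step, increase):
    """Odds ratio implied by `increase`, given the odds ratio per `step`."""
    # Under a log-linear model, odds ratios multiply across equal steps.
    return or_per_step ** (increase / step)


def shift_probability(p, odds_ratio):
    """Apply an odds ratio to a baseline probability and return the new one."""
    odds = p / (1 - p) * odds_ratio
    return odds / (1 + odds)


# Reported value: odds ratio 2.04 per 0.1 increase in ExoMeth score.
or_03 = odds_ratio_for_increase(2.04, 0.1, 0.3)
print(f"odds ratio for a 0.3 increase: {or_03:.2f}")

# Hypothetical baseline probability of 0.2, shifted by one 0.1 step.
p_after = shift_probability(0.2, 2.04)
print(f"probability after one step from 0.2: {p_after:.3f}")
```

The conversion to a probability depends entirely on the assumed baseline, so it should be read as an illustration of scale and not as a clinical estimate.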

Crossref

University of East Anglia digital repository