Search CORE

Boosting the concordance index for survival data - a unified framework to derive and evaluate biomarker combinations

Author: Mayr Andreas
Schmid Matthias
Publication venue: Public Library of Science (PLoS)
Publication date: 25/10/2013

The development of molecular signatures for the prediction of time-to-event outcomes is a methodologically challenging task in bioinformatics and biostatistics. Although there are numerous approaches for the derivation of marker combinations and their evaluation, the underlying methodology often suffers from the problem that different optimization criteria are mixed during the feature selection, estimation and evaluation steps. This might result in marker combinations that are suboptimal with respect to the evaluation criterion of interest. To address this issue, we propose a unified framework to derive and evaluate biomarker combinations. Our approach is based on the concordance index for time-to-event data, a non-parametric measure of the discriminatory power of a prediction rule. Specifically, we propose a component-wise boosting algorithm that results in linear biomarker combinations that are optimal with respect to a smoothed version of the concordance index. We investigate the performance of our algorithm in a large-scale simulation study and in two molecular data sets for the prediction of survival in breast cancer patients. Our numerical results show that the new approach is not only methodologically sound but can also lead to higher discriminatory power than traditional approaches for the derivation of gene signatures.
Comment: revised manuscript; added simulation study, additional results

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Directory of Open Access Journals

Open Access LMU

PubMed Central

FigShare
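A minimal Python sketch of the two criteria named in the abstract above: a Harrell-type concordance index for right-censored data, and a smoothed version in which the indicator 1(risk_i > risk_j) is replaced by a sigmoid so that the criterion becomes differentiable and can serve as a boosting objective. Higher risk scores are taken to mean shorter survival. The sigmoid smoothing, the bandwidth `sigma`, the function names and the toy data are illustrative assumptions; the estimator used in the paper may additionally apply inverse-probability-of-censoring weights, which this sketch omits.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def _comparable_pairs(times, events):
    # A pair (i, j) is comparable when subject i has an observed event
    # (events[i] == 1) strictly before the observed time of subject j.
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                yield i, j


def c_index(times, events, risk):
    """Harrell-type C-index: share of comparable pairs in which the subject
    with the earlier event has the higher risk score (ties count 1/2)."""
    score, count = 0.0, 0
    for i, j in _comparable_pairs(times, events):
        count += 1
        if risk[i] > risk[j]:
            score += 1.0
        elif risk[i] == risk[j]:
            score += 0.5
    if count == 0:
        raise ValueError("no comparable pairs")
    return score / count


def smoothed_c_index(times, events, risk, sigma=0.1):
    """Same pairs, but 1(risk_i > risk_j) is replaced by a sigmoid with
    bandwidth sigma, so the criterion is differentiable in the scores."""
    score, count = 0.0, 0
    for i, j in _comparable_pairs(times, events):
        count += 1
        score += _sigmoid((risk[i] - risk[j]) / sigma)
    if count == 0:
        raise ValueError("no comparable pairs")
    return score / count


# Hypothetical toy data: risk scores decrease as survival time increases.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.6, 0.3, 0.1]
print(c_index(times, events, risk))           # 1.0
print(smoothed_c_index(times, events, risk))  # slightly below 1.0 (about 0.979)
```

Because the smoothed criterion is differentiable, a boosting algorithm can use its gradient with respect to the risk scores as pseudo-residuals. A smaller `sigma` tracks the unsmoothed C-index more closely but yields a less smooth objective.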

Estimation and Regularization Techniques for Regression Models with Multidimensional Prediction Functions

Author: Hothorn Torsten
Pfahlberg Annette
Potapov Sergej
Schmid Matthias
Publication date: 24/11/2008

Boosting is one of the most important methods for fitting regression models and building prediction rules from high-dimensional data. A notable feature of boosting is that the technique has a built-in mechanism for shrinking coefficient estimates and selecting variables. This regularization mechanism makes boosting a suitable method for analyzing data characterized by small sample sizes and large numbers of predictors. We extend the existing methodology by developing a boosting method for prediction functions with multiple components. Such multidimensional functions occur in many types of statistical models, for example in count data models and in models involving outcome variables with a mixture distribution. As will be demonstrated, the new algorithm is suitable both for estimating the prediction function and for regularizing the estimates. In addition, nuisance parameters can be estimated simultaneously with the prediction function.

Open Access LMU
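The built-in shrinkage and variable selection mentioned in the abstract above can be seen in a minimal component-wise L2Boosting sketch for an ordinary one-dimensional linear predictor with squared-error loss. It does not implement the paper's extension to multidimensional prediction functions or the estimation of nuisance parameters; the function name, the step length `nu` and the toy data are illustrative assumptions.

```python
def componentwise_l2_boost(X, y, n_steps=100, nu=0.1):
    """Component-wise L2Boosting for a linear predictor.

    X: list of n rows, each a list of p predictor values (assumed centred).
    y: list of n responses (assumed centred, so no intercept is fitted).
    In every step, each predictor is fitted to the current residuals by
    least squares, only the best-fitting one is updated, and the update is
    damped by the step length nu. Stopping early shrinks the coefficients,
    and predictors that are never selected keep a coefficient of exactly 0.
    """
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    col_ss = [sum(X[i][k] ** 2 for i in range(n)) for k in range(p)]
    for _ in range(n_steps):
        best_k, best_b, best_rss = None, 0.0, float("inf")
        for k in range(p):
            if col_ss[k] == 0:
                continue  # all-zero column: nothing to fit
            b = sum(X[i][k] * resid[i] for i in range(n)) / col_ss[k]
            rss = sum((resid[i] - b * X[i][k]) ** 2 for i in range(n))
            if rss < best_rss:
                best_k, best_b, best_rss = k, b, rss
        if best_k is None:
            break
        beta[best_k] += nu * best_b
        for i in range(n):
            resid[i] -= nu * best_b * X[i][best_k]
    return beta


# Hypothetical data: three orthogonal, centred predictors; y depends on x0 only.
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [2.0, -2.0, 2.0, -2.0]
print(componentwise_l2_boost(X, y, n_steps=10))   # about [1.30, 0.0, 0.0]
print(componentwise_l2_boost(X, y, n_steps=200))  # close to [2.0, 0.0, 0.0]
```

With only 10 steps the coefficient of `x0` stays well below its least-squares value of 2, which is the shrinkage effect; the number of steps therefore acts as the tuning parameter, typically chosen by cross-validation or an information criterion.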

Study of microRNAs-21/221 as potential breast cancer biomarkers in Egyptian women

Author: El Masry Maha Rafik
Mohareb Fady R.
Motawi Tarek Mohamed Kamal
Sadik Nermin Abdel Hamid
Shaker Olfat Gamil
Publication venue: Elsevier BV
Publication date: 29/01/2016

microRNAs (miRNAs) play an important role in cancer prognosis. They are small molecules, approximately 17–25 nucleotides in length, and their high stability in human serum supports their use as novel diagnostic biomarkers of cancer and other pathological conditions. In this study, we analyzed the expression patterns of miR-21 and miR-221 in serum from a total of 100 Egyptian women: patients with breast cancer (n = 50), patients with fibroadenoma (n = 25), and healthy controls (n = 25). Using microarray-based expression profiling followed by real-time polymerase chain reaction validation, we compared the levels of the two circulating miRNAs across these three groups. The small nucleolar RNA SNORD68 was chosen as the housekeeping endogenous control. Serum levels of both miR-21 and miR-221 were significantly higher in breast cancer patients than in healthy controls and fibroadenoma patients. Receiver operating characteristic (ROC) curve analysis revealed that miR-21 has greater potential for discriminating between breast cancer patients and the control group, while miR-221 has greater potential for discriminating between breast cancer and fibroadenoma patients. Classification models using k-nearest neighbor (kNN), naïve Bayes (NB), and random forests (RF) were developed using the expression levels of both miR-21 and miR-221. The best classification performance was achieved by the NB models, with 91% correct classification. Furthermore, relative miR-221 expression was associated with histological tumor grade. We conclude that both miR-21 and miR-221 can be used to differentiate between breast cancer patients and healthy controls, but that the diagnostic accuracy of serum miR-21 is superior to that of miR-221 for breast cancer prediction, whereas miR-221 has more diagnostic power in discriminating between breast cancer and fibroadenoma patients. The overexpression of miR-221 has been associated with breast cancer grade. We also demonstrated that the combined expression of miR-21 and miR-221 can be successfully applied as breast cancer biomarkers.

Cranfield CERES
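The abstract above names two quantitative steps without giving formulas: normalizing qPCR measurements against the endogenous control, and summarizing discrimination by ROC analysis. The sketch below assumes the common 2^-ΔΔCt method for relative expression (the abstract does not state which method was used) and computes the ROC area under the curve as the probability that a randomly chosen case has a higher value than a randomly chosen control, which is the Mann-Whitney form of the AUC. Function names and all numbers are hypothetical.

```python
def relative_expression(ct_target, ct_ref, ct_target_cal, ct_ref_cal):
    """Fold change by the 2^-ddCt method (assumed, not stated in the abstract).

    ct_target / ct_ref: Ct of the miRNA and of the endogenous control
    (e.g. SNORD68) in the sample; *_cal: the same in a calibrator sample.
    """
    delta_sample = ct_target - ct_ref
    delta_cal = ct_target_cal - ct_ref_cal
    return 2.0 ** -(delta_sample - delta_cal)


def roc_auc(cases, controls):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form):
    P(case > control) + 0.5 * P(case == control)."""
    wins = 0.0
    for a in cases:
        for b in controls:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical values, for illustration only.
print(relative_expression(25.0, 20.0, 28.0, 20.0))  # 8.0
print(roc_auc([3.0, 4.0, 5.0], [1.0, 2.0, 3.0]))     # 0.944...
```

In the first call the miRNA sits 3 cycles closer to the control in the sample than in the calibrator, which corresponds to an 8-fold higher relative expression.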

Over-optimism in bioinformatics: an illustration

Author: Anne-Laure Boulesteix
Arthur Tenenhaus
Korbinian Strimmer
Monika Jelizarow
Vincent Guillemot
Publication date: 03/05/2010

In statistical bioinformatics research, different optimization mechanisms potentially lead to "over-optimism" in published papers. The present empirical study illustrates these mechanisms through a concrete example from an active research field. The investigated sources of over-optimism include the optimization of the data sets, of the settings, of the competing methods and, most importantly, of the method's characteristics. We consider a "promising" new classification algorithm that turns out to yield disappointing results in terms of error rate, namely linear discriminant analysis incorporating prior knowledge on gene functional groups through an appropriate shrinkage of the within-group covariance matrix. We quantitatively demonstrate that this disappointing method can artificially seem superior to existing approaches if we "fish for significance". We conclude that, if the improvement of a quantitative criterion such as the error rate is the main contribution of a paper, the superiority of new algorithms should be validated using "fresh" validation data sets.

HAL-CentraleSupelec

Open Access LMU

HAL Descartes

The University of Manchester - Institutional Repository

HAL-CEA

HAL-Rennes 1
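The "fishing for significance" mechanism described above can be reproduced with a small simulation. Here a number of method variants all have the same true error rate; reporting the best error observed among them is systematically optimistic, while re-evaluating the selected variant on fresh data recovers the true rate. The number of variants, the test-set size, the error rate and the function name are arbitrary assumptions, not values from the paper.

```python
import random


def simulate_over_optimism(n_variants=20, n_test=50, true_error=0.3,
                           n_trials=1000, seed=1):
    """Return (mean reported error, mean error on fresh data).

    Every variant has the same true error rate, so any apparent winner
    is selected purely by chance.
    """
    rng = random.Random(seed)

    def observed_error():
        # Error rate measured on a test set of n_test independent cases.
        return sum(rng.random() < true_error for _ in range(n_test)) / n_test

    reported, fresh = 0.0, 0.0
    for _ in range(n_trials):
        # Evaluate every variant and report the smallest observed error.
        reported += min(observed_error() for _ in range(n_variants))
        # Validate the chosen variant on a fresh, independent test set.
        fresh += observed_error()
    return reported / n_trials, fresh / n_trials


reported, fresh = simulate_over_optimism()
print(f"reported best error: {reported:.3f}, fresh-data error: {fresh:.3f}")
```

The reported error comes out far below the true 0.3, while the fresh-data estimate stays close to it, which is the gap the authors recommend closing with independent validation data.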

Evaluation of the current knowledge limitations in breast cancer research: a gap analysis

Author: A Cox
A Howell
A Kamb
A Maurice
A Moyer
A Renwick
A Rodger
Adrian Harris
AH Eliassen
AH Sims
Alastair Thompson
Angela Cox
Anthony Howell
C Julian-Reynier
C Kuperwasser
CA Purdie
CD Wagner
CH Kroenke
Charles Streuli
CJ Watson
CM Perou
D Sachdev
DF Easton
Diana Harcourt
DJ Britton
DJ Hunter
DM Abd El-Rehim
DM Harcourt
DS Main
E Amir
E Katz
E Tiligada
G Bon
GP Gui
GW Sledge
H Goulding
HE Huang
Ingunn Holen
International Agency for Research on Cancer Expert Group
IO Ellis
J Bogaerts
J Brennan
J Brett
J Corner
J Debnath
J Polanowska
J Stingl
J Teulière
JJ Dignam
JK Camoriano
JL Jones
JM Bartlett
JM Dixon
JM Gee
JT Scott
Julia Gee
JW Jonker
K Polyak
K Rennstam
KA Green
KE Sleeman
Keith Brennan
KJ Fowler
KL Schwertfeger
KR Brennan
KW Kinzler
L Cortesi
L Fallowfield
L Hennighausen
L Larue
L Lostumbo
LL Northouse
LS Freedman
M Adams
M Harvie
M Shackleton
M Sidani
M Tanner
M Zhang
MD Holmes
MD Sternlicht
Michael Steel
Michelle Harvie
MJ Piccart-Gebhart
ML Asselin-Labat
ML McNeely
MM Ip
MO Leach
N Barnes
N Rahman
N Sarwar
NG Howlett
P Dent
P Schofield
PA Kenny
R DasGupta
R Doll
R Leake
R Scully
R Serra
RA Walker
RB Clarke
Robert Nicholson
RT Chlebowski
S Anand
S Hiscox
S Seal
S Stylianou
SA Bingham
SE Moody
SJ Cleator
SK Sharan
SN Stacey
SR Johnston
T Sheard
T Sjöblom
T Sorlie
The Breast Cancer Association Consortium
The CHEK2 Breast Cancer Case-Control Consortium
V Aranda
V Speirs
VA McCormack
VG Vogel
WD Foulkes
X Yu
YL Low
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2008

BACKGROUND A gap analysis was conducted to determine which areas of breast cancer research, if targeted by researchers and funding bodies, could produce the greatest impact on patients. METHODS Fifty-six Breast Cancer Campaign grant holders and prominent UK breast cancer researchers participated in a gap analysis of current breast cancer research. Before, during and following the meeting, groups in seven key research areas participated in cycles of presentation, literature review and discussion. Summary papers were prepared by each group and collated into this position paper highlighting the research gaps, with recommendations for action. RESULTS Gaps were identified in all seven themes. General barriers to progress were lack of financial and practical resources, and poor collaboration between disciplines. Critical gaps in each theme included: (1) genetics (knowledge of genetic changes, their effects and interactions); (2) initiation of breast cancer (how developmental signalling pathways cause ductal elongation and branching at the cellular level and influence stem cell dynamics, and how their disruption initiates tumour formation); (3) progression of breast cancer (deciphering the intracellular and extracellular regulators of early progression, tumour growth, angiogenesis and metastasis); (4) therapies and targets (understanding who develops advanced disease); (5) disease markers (incorporating intelligent trial design into all studies to ensure new treatments are tested in patient groups stratified using biomarkers); (6) prevention (strategies to prevent oestrogen-receptor negative tumours and the long-term effects of chemoprevention for oestrogen-receptor positive tumours); (7) psychosocial aspects of cancer (the use of appropriate psychosocial interventions, and the personal impact of all stages of the disease among patients from a range of ethnic and demographic backgrounds). 
CONCLUSION Through recommendations to address these gaps with future research, the long-term benefits to patients will include: better estimation of risk in families with breast cancer and strategies to reduce risk; better prediction of drug response and patient prognosis; improved tailoring of treatments to patient subgroups and development of new therapeutic approaches; earlier initiation of treatment; more effective use of resources for screening populations; and an enhanced experience for people with or at risk of breast cancer and their families. The challenge to funding bodies and researchers in all disciplines is to focus on these gaps and to drive advances in knowledge into improvements in patient care.

Crossref

Online Research @ Cardiff

Springer - Publisher Connector

PubMed Central

UWE Bristol Research Repository

Oxford University Research Archive

The University of Manchester - Institutional Repository

University of Dundee Online Publications

White Rose Research Online

The Cure: Making a game of gene selection for breast cancer survival prediction

Author: Good Benjamin M.
Griffith Obi L.
Loguercio Salvatore
Nanis Max
Su Andrew I.
Wu Chunlei
Publication venue: JMIR Publications Inc.
Publication date: 01/01/2014

Motivation: Molecular signatures for predicting breast cancer prognosis could greatly improve care through personalization of treatment. Computational analyses of genome-wide expression datasets have identified such signatures, but these signatures leave much to be desired in terms of accuracy, reproducibility and biological interpretability. Methods that take advantage of structured prior knowledge (e.g. protein interaction networks) show promise in helping to define better signatures, but most knowledge remains unstructured. Crowdsourcing via scientific discovery games is an emerging methodology that has the potential to tap into human intelligence at scales and in modes previously unheard of. Here, we developed and evaluated a game called The Cure on the task of gene selection for breast cancer survival prediction. Our central hypothesis was that knowledge linking expression patterns of specific genes to breast cancer outcomes could be captured from game players. We envisioned capturing knowledge both from the players' prior experience and from their ability to interpret text related to candidate genes presented to them in the context of the game. Results: Between its launch in Sept. 2012 and Sept. 2013, The Cure attracted more than 1,000 registered players who collectively played nearly 10,000 games. Gene sets assembled through aggregation of the collected data clearly demonstrated the accumulation of relevant expert knowledge. In terms of predictive accuracy, these gene sets provided comparable performance to gene sets generated using other methods, including those used in commercial tests. The Cure is available at http://genegames.org/cure

arXiv.org e-Print Archive

Digital Commons@Becker

PubMed Central
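The abstract above does not specify how gene selections were aggregated across games. One simple possibility, shown here purely as an assumption, is to count how many games selected each gene and rank genes by that count. The function name and the gene symbols in the example are hypothetical inputs.

```python
from collections import Counter


def aggregate_gene_sets(games, top_n=None):
    """Rank genes by the number of games in which they were selected.

    games: iterable of gene-symbol collections, one per completed game.
    Each gene counts at most once per game; ties are broken alphabetically
    so the ranking is deterministic.
    """
    counts = Counter()
    for genes in games:
        counts.update(set(genes))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked if top_n is None else ranked[:top_n]


games = [["ESR1", "ERBB2"], ["ESR1", "MKI67"], ["ERBB2", "ESR1", "ESR1"]]
print(aggregate_gene_sets(games, top_n=2))  # [('ESR1', 3), ('ERBB2', 2)]
```

Counting each gene once per game keeps a single player who repeats a gene within one hand from inflating its score.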

Identification of an Efficient Gene Expression Panel for Glioblastoma Classification.

Author: Coppola Giovanni
Crisman Thomas J
Gao Fuying
Kawaguchi Riki
Kornblum Harley I
Laks Dan R
Zelaya Ivette
Zhao Yining
Publication venue: eScholarship, University of California
Publication date: 01/01/2016

We present here a novel genetic algorithm-based random forest (GARF) modeling technique that enables a reduction in the complexity of large gene disease signatures to highly accurate, greatly simplified gene panels. When applied to 803 glioblastoma multiforme samples, this method allowed the 840-gene Verhaak et al. gene panel (the standard in the field) to be reduced to a 48-gene classifier, while retaining 90.91% classification accuracy, and outperforming the best available alternative methods. Additionally, using this approach we produced a 32-gene panel which allows for better consistency between RNA-seq and microarray-based classifications, improving cross-platform classification retention from 69.67% to 86.07%. A webpage producing these classifications is available at http://simplegbm.semel.ucla.edu

Directory of Open Access Journals

PubMed Central

eScholarship - University of California
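The GARF idea summarized above (searching over gene subsets with a genetic algorithm and scoring each subset by classification accuracy) can be sketched in plain Python. To stay self-contained, this sketch replaces the random forest with a leave-one-out nearest-centroid classifier and adds a small per-gene penalty to favour compact panels; both substitutions, the parameter values and the function names are assumptions, not the authors' implementation.

```python
import random


def loo_centroid_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier that uses only
    the gene indices in `genes`. Stand-in for the random forest in GARF.
    Requires at least two samples per class."""
    if not genes:
        return 0.0
    labels = sorted(set(y))
    correct = 0
    for i in range(len(y)):
        centroids = {}
        for c in labels:
            rows = [X[j] for j in range(len(y)) if j != i and y[j] == c]
            centroids[c] = [sum(r[g] for r in rows) / len(rows) for g in genes]

        def dist(c):
            return sum((X[i][g] - m) ** 2 for g, m in zip(genes, centroids[c]))

        if min(labels, key=dist) == y[i]:
            correct += 1
    return correct / len(y)


def ga_select_genes(X, y, pop_size=20, generations=30, penalty=0.01,
                    mutation_rate=0.05, seed=0):
    """Genetic algorithm over binary gene masks (needs >= 2 genes).
    Fitness = accuracy - penalty * panel size."""
    rng = random.Random(seed)
    p = len(X[0])

    def fitness(mask):
        genes = [g for g in range(p) if mask[g]]
        return loo_centroid_accuracy(X, y, genes) - penalty * len(genes)

    population = [[rng.random() < 0.5 for _ in range(p)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: the better half survives unchanged.
        elite = sorted(population, key=fitness, reverse=True)[:pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, p)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not bit) if rng.random() < mutation_rate else bit
                     for bit in child]         # bit-flip mutation
            children.append(child)
        population = elite + children
    best = max(population, key=fitness)
    return [g for g in range(p) if best[g]]


# Hypothetical data: gene 0 separates the two classes, genes 1-4 are noise.
data_rng = random.Random(42)
y = [0] * 6 + [1] * 6
X = [[(5.0 if label else 0.0) + data_rng.gauss(0, 0.5)]
     + [data_rng.gauss(0, 1) for _ in range(4)] for label in y]
print(ga_select_genes(X, y))  # selected gene indices; expected to include 0
```

The per-gene penalty is what pushes the search towards smaller panels; without it, any superset of an accurate panel would score as well as the panel itself.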