9,038 research outputs found
Boosting the concordance index for survival data - a unified framework to derive and evaluate biomarker combinations
The development of molecular signatures for the prediction of time-to-event
outcomes is a methodologically challenging task in bioinformatics and
biostatistics. Although there are numerous approaches for the derivation of
marker combinations and their evaluation, the underlying methodology often
suffers from the problem that different optimization criteria are mixed during
the feature selection, estimation and evaluation steps. This might result in
marker combinations that are only suboptimal regarding the evaluation criterion
of interest. To address this issue, we propose a unified framework to derive
and evaluate biomarker combinations. Our approach is based on the concordance
index for time-to-event data, which is a non-parametric measure to quantify the
discriminatory power of a prediction rule. Specifically, we propose a
component-wise boosting algorithm that results in linear biomarker combinations
that are optimal with respect to a smoothed version of the concordance index.
We investigate the performance of our algorithm in a large-scale simulation
study and in two molecular data sets for the prediction of survival in breast
cancer patients. Our numerical results show that the new approach is not only
methodologically sound but can also lead to a higher discriminatory power than
traditional approaches for the derivation of gene signatures.
Comment: revised manuscript - added simulation study, additional results
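As a rough illustration of the quantity discussed in the abstract above, the following Python sketch computes a concordance index for right-censored data by simple pair counting, together with a sigmoid-smoothed surrogate of it. The pair-counting rule, the sigmoid smoothing, the parameter `sigma`, and all function names are assumptions made for illustration; the abstract does not state which estimator or smoothing the authors use.

```python
import math


def comparable_pairs(times, events):
    """Yield index pairs (i, j) where subject i had an observed event before time j."""
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                yield i, j


def concordance_index(times, events, scores):
    """Fraction of comparable pairs ranked correctly.

    Assumption: a higher score means higher risk, i.e. shorter survival.
    Tied scores count as half a concordant pair.
    """
    concordant = 0.0
    comparable = 0
    for i, j in comparable_pairs(times, events):
        comparable += 1
        if scores[i] > scores[j]:
            concordant += 1.0
        elif scores[i] == scores[j]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def smoothed_concordance(times, events, scores, sigma=0.1):
    """Differentiable surrogate: the 0/1 indicator is replaced by a sigmoid."""
    total = 0.0
    comparable = 0
    for i, j in comparable_pairs(times, events):
        comparable += 1
        total += 1.0 / (1.0 + math.exp(-(scores[i] - scores[j]) / sigma))
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return total / comparable
```

Because the smoothed version is differentiable in the scores, it could in principle serve as the objective of a gradient-based procedure such as boosting; smaller values of `sigma` bring it closer to the pair-counting index.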
Improving average ranking precision in user searches for biomedical research datasets
The availability of research datasets is a keystone for health and life science
study reproducibility and scientific progress. Due to the heterogeneity and
complexity of these data, a main challenge to be overcome by research data
management systems is to provide users with the best answers for their search
queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we
investigate a novel ranking pipeline to improve the search of datasets used in
biomedical experiments. Our system comprises a query expansion model based on
word embeddings, a similarity measure algorithm that takes into consideration
the relevance of the query terms, and a dataset categorisation method that
boosts the rank of datasets matching query constraints. The system was
evaluated using a corpus with 800k datasets and 21 annotated user queries. Our
system provides competitive results when compared to the other challenge
participants. In the official run, it achieved the highest infAP among the
participants, +22.3% higher than the median infAP of the participants'
best submissions. Overall, it ranks in the top 2 if an aggregated metric using
the best official measures per participant is considered. The query expansion
method had a positive impact on the system's performance, improving on our
baseline by up to +5.0% and +3.4% for the infAP and infNDCG metrics, respectively.
Our similarity measure algorithm seems to be robust, in particular compared to
the Divergence From Randomness framework, showing smaller performance variations
under different training conditions. Finally, the result categorization did not
have a significant impact on the system's performance. We believe that our
solution could be used to enhance biomedical dataset management systems. In
particular, the use of data-driven query expansion methods could be an
alternative to the complexity of biomedical terminologies.
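The embedding-based query expansion mentioned in the abstract above can be sketched roughly as follows. This is a minimal illustration only: the toy two-dimensional vectors, the cosine-similarity neighbour rule, the parameter `k`, and the names `expand_query` and `TOY_EMBEDDINGS` are all hypothetical, and the abstract does not describe the actual expansion model, similarity measure, or categorisation step.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(terms, embeddings, k=1):
    """Append the k nearest embedding neighbours of each known query term."""
    expanded = list(terms)
    for term in terms:
        if term not in embeddings:
            continue  # terms without a vector are kept but not expanded
        neighbours = sorted(
            (w for w in embeddings if w not in expanded),
            key=lambda w: cosine(embeddings[term], embeddings[w]),
            reverse=True,
        )
        expanded.extend(neighbours[:k])
    return expanded


# Hypothetical toy vectors; a real system would load trained word embeddings.
TOY_EMBEDDINGS = {
    "cancer": [1.0, 0.1],
    "tumor": [0.9, 0.2],
    "mouse": [0.1, 1.0],
    "rat": [0.2, 0.9],
}
```

With these toy vectors, `expand_query(["cancer"], TOY_EMBEDDINGS)` adds "tumor" to the query, since it is the nearest neighbour of "cancer" under cosine similarity.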
Boosting Correlation Based Penalization in Generalized Linear Models
In high-dimensional regression problems, penalization techniques are a useful tool for estimation and variable selection. We
propose a novel penalization technique that aims at the grouping effect, which encourages strongly correlated predictors to be in
or out of the model together. The proposed penalty uses the correlation between predictors explicitly. We consider a simple
version that does not select variables and a boosted version that is able to reduce the number of variables in the model. Both
methods are derived within the framework of generalized linear models. The performance is evaluated by simulations and by use of
real-world data sets.
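To make the idea of a penalty that uses predictor correlations explicitly more concrete, here is a small Python sketch of one possible form. The formula is an assumption chosen for illustration, since the abstract does not give the penalty: for each pair of coefficients it penalizes their difference more strongly when the predictors are positively correlated, and their sum more strongly when they are negatively correlated. The function name `correlation_penalty` is hypothetical.

```python
def correlation_penalty(beta, corr):
    """Illustrative pairwise penalty weighted by predictor correlations.

    beta: list of coefficients.
    corr: square matrix (list of lists) of pairwise correlations, |rho| < 1
          off the diagonal.
    """
    penalty = 0.0
    p = len(beta)
    for i in range(p):
        for j in range(i + 1, p):
            rho = corr[i][j]
            if abs(rho) >= 1.0:
                raise ValueError("perfectly correlated predictors are not supported")
            # Differences are expensive when rho is near +1,
            # sums are expensive when rho is near -1.
            penalty += (beta[i] - beta[j]) ** 2 / (1.0 - rho)
            penalty += (beta[i] + beta[j]) ** 2 / (1.0 + rho)
    return penalty
```

Under this assumed form, two predictors with correlation 0.9 incur a much smaller penalty when their coefficients are equal than when they have opposite signs, which is one way to obtain the grouping effect described above.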
Feature Augmentation via Nonparametrics and Selection (FANS) in High Dimensional Classification
We propose a high dimensional classification method that involves
nonparametric feature augmentation. Knowing that marginal density ratios are
the most powerful univariate classifiers, we use the ratio estimates to
transform the original feature measurements. Subsequently, penalized logistic
regression is invoked, taking as input the newly transformed or augmented
features. This procedure trains models equipped with local complexity and
global simplicity, thereby avoiding the curse of dimensionality while creating
a flexible nonlinear decision boundary. The resulting method is called Feature
Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by
generalizing the Naive Bayes model, writing the log ratio of joint densities as
a linear combination of those of marginal densities. It is related to
generalized additive models, but has better interpretability and computability.
Risk bounds are developed for FANS. In numerical analysis, FANS is compared
with competing methods, so as to provide a guideline on its best application
domain. Real data analysis demonstrates that FANS performs very competitively
on benchmark email spam and gene expression data sets. Moreover, FANS is
implemented by an extremely fast algorithm through parallel computing.
Comment: 30 pages, 2 figures
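The feature augmentation step described in the abstract above (replacing each feature by an estimate of its marginal density ratio) might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the Gaussian kernel density estimate, the fixed `bandwidth`, the stabilizing constant `eps`, and the function names are all choices made here, and the subsequent penalized logistic regression is omitted.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a one-dimensional Gaussian kernel density estimate as a function."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

    return density


def fit_log_ratio_transform(X, y, bandwidth=0.5, eps=1e-12):
    """Fit per-feature log marginal density ratios, log f1(x) - log f0(x).

    X: list of rows (lists of floats); y: list of 0/1 class labels.
    Returns a function mapping a row to its augmented features.
    """
    n_features = len(X[0])
    transforms = []
    for j in range(n_features):
        f1 = gaussian_kde([row[j] for row, label in zip(X, y) if label == 1], bandwidth)
        f0 = gaussian_kde([row[j] for row, label in zip(X, y) if label == 0], bandwidth)
        # Bind f1 and f0 as defaults so each lambda keeps its own estimates.
        transforms.append(
            lambda x, f1=f1, f0=f0: math.log(f1(x) + eps) - math.log(f0(x) + eps)
        )

    def transform(row):
        return [t(v) for t, v in zip(transforms, row)]

    return transform
```

The transformed rows would then be passed to a penalized logistic regression. Each augmented feature is positive where class 1 is locally more likely than class 0 and negative where the reverse holds.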
Predicting a local recurrence after breast-conserving therapy by gene expression profiling
INTRODUCTION: To tailor local treatment in breast cancer patients there is a need for predicting ipsilateral recurrences after breast-conserving therapy. After adequate treatment (excision with free margins and radiotherapy), young age and incompletely excised extensive intraductal component are predictors for local recurrence, but many local recurrences still cannot be predicted. Here we have used gene expression profiling by microarray analysis to identify gene expression profiles that can help to predict local recurrence in individual patients. METHODS: By using previously established gene expression profiles with proven value in predicting metastasis-free and overall survival (wound-response signature, 70-gene prognosis profile and hypoxia-induced profile) and training towards an optimal prediction of local recurrences in a training series, we establish a classifier for local recurrence after breast-conserving therapy. RESULTS: Validation of the different gene lists shows that the wound-response signature is able to separate patients with a high (29%) or low (5%) risk of a local recurrence at 10 years (sensitivity 87.5%, specificity 75%). In multivariable analysis the classifier is an independent predictor for local recurrence. CONCLUSION: Our findings indicate that gene expression profiling can identify subgroups of patients at increased risk of developing a local recurrence after breast-conserving therapy.
Neuroblastoma patient outcomes, tumor differentiation, and ERK activation are correlated with expression levels of the ubiquitin ligase UBE4B.
Background: UBE4B is an E3/E4 ubiquitin ligase whose gene is located in chromosome 1p36.22. We analyzed the associations of UBE4B gene and protein expression with neuroblastoma patient outcomes and with tumor prognostic features and histology. Methods: We evaluated the association of UBE4B gene expression with neuroblastoma patient outcomes using the R2 Platform. We screened neuroblastoma tumor samples for UBE4B protein expression using immunohistochemistry. FISH for UBE4B and 1p36 deletion was performed on tumor samples. We then evaluated UBE4B expression for associations with prognostic factors and with levels of phosphorylated ERK in neuroblastoma tumors and cell lines. Results: Low UBE4B gene expression is associated with poor outcomes in patients with neuroblastoma and with worse outcomes in all patient subgroups. UBE4B protein expression was associated with neuroblastoma tumor differentiation, and decreased UBE4B protein levels were associated with high-risk features. UBE4B protein levels were also associated with levels of phosphorylated ERK. Conclusions: We have demonstrated associations between UBE4B gene expression and neuroblastoma patient outcomes and prognostic features. Reduced UBE4B protein expression in neuroblastoma tumors was associated with high-risk features, a lack of differentiation, and with ERK activation. These results suggest UBE4B may contribute to the poor prognosis of neuroblastoma tumors with 1p36 deletions and that UBE4B expression may mediate neuroblastoma differentiation.
Stable Feature Selection for Biomarker Discovery
Feature selection techniques have been used as the workhorse in biomarker
discovery applications for a long time. Surprisingly, the stability of feature
selection with respect to sampling variations has long been under-considered.
Only recently has this issue begun to receive more attention.
In this article, we review existing stable feature selection methods for
biomarker discovery using a generic hierarchical framework. We have two
objectives: (1) providing an overview of this new yet fast-growing topic for
convenient reference; (2) categorizing existing methods under an expandable
framework for future research and development.
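One simple way to quantify the stability of feature selection under sampling variation is to run the selection on several resampled data sets and average the pairwise overlap of the selected feature sets. The Python sketch below uses the Jaccard index for the overlap; this particular measure and the function names are assumptions made for illustration, since the abstract does not name a specific stability measure.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two feature sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def selection_stability(subsets):
    """Average pairwise Jaccard index over feature subsets from resampled runs."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two selected subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A value near 1 indicates that nearly the same features are selected regardless of the sample drawn, while a value near 0 indicates that the selected biomarkers depend heavily on the particular sample.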