
    Boosting the concordance index for survival data - a unified framework to derive and evaluate biomarker combinations

    The development of molecular signatures for the prediction of time-to-event outcomes is a methodologically challenging task in bioinformatics and biostatistics. Although there are numerous approaches for the derivation of marker combinations and their evaluation, the underlying methodology often suffers from the problem that different optimization criteria are mixed during the feature selection, estimation and evaluation steps. This might result in marker combinations that are suboptimal with respect to the evaluation criterion of interest. To address this issue, we propose a unified framework to derive and evaluate biomarker combinations. Our approach is based on the concordance index for time-to-event data, which is a non-parametric measure to quantify the discriminatory power of a prediction rule. Specifically, we propose a component-wise boosting algorithm that results in linear biomarker combinations that are optimal with respect to a smoothed version of the concordance index. We investigate the performance of our algorithm in a large-scale simulation study and in two molecular data sets for the prediction of survival in breast cancer patients. Our numerical results show that the new approach is not only methodologically sound but can also lead to a higher discriminatory power than traditional approaches for the derivation of gene signatures. Comment: revised manuscript - added simulation study, additional results
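To make the concordance index in the abstract above concrete, here is a minimal sketch in Python. It assumes Harrell's pairwise estimator for right-censored data and a sigmoid as the smoothing function; the abstract names neither, so both choices, along with the function names `harrell_c` and `smoothed_c` and the parameter `sigma`, are illustrative assumptions rather than the authors' exact estimator. The boosting step itself is not shown.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def _comparable_pairs(times, events):
    """Yield (i, j) where subject i had an observed event before time j."""
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                yield i, j


def harrell_c(times, events, scores):
    """Share of comparable pairs in which the earlier event has the higher risk score."""
    concordant, comparable = 0.0, 0
    for i, j in _comparable_pairs(times, events):
        comparable += 1
        if scores[i] > scores[j]:
            concordant += 1.0
        elif scores[i] == scores[j]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def smoothed_c(times, events, scores, sigma=0.1):
    """Same pairs, but the hard indicator is replaced by a sigmoid (assumed smoother)."""
    total, comparable = 0.0, 0
    for i, j in _comparable_pairs(times, events):
        comparable += 1
        total += _sigmoid((scores[i] - scores[j]) / sigma)
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return total / comparable
```

The smoothed variant is differentiable in the scores, which is what makes a gradient-based procedure such as boosting applicable; as `sigma` shrinks, it approaches the hard pairwise count. A risk score that ranks all comparable pairs correctly gives `harrell_c` a value of 1.0, and a constant score gives 0.5.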

    Improving average ranking precision in user searches for biomedical research datasets

    Availability of research datasets is a keystone for health and life science study reproducibility and scientific progress. Due to the heterogeneity and complexity of these data, a main challenge to be overcome by research data management systems is to provide users with the best answers for their search queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we investigate a novel ranking pipeline to improve the search of datasets used in biomedical experiments. Our system comprises a query expansion model based on word embeddings, a similarity measure algorithm that takes into consideration the relevance of the query terms, and a dataset categorisation method that boosts the rank of datasets matching query constraints. The system was evaluated using a corpus with 800k datasets and 21 annotated user queries. Our system provides competitive results when compared to the other challenge participants. In the official run, it achieved the highest infAP among the participants, being +22.3% higher than the median infAP of the participants' best submissions. Overall, it is ranked in the top 2 if an aggregated metric using the best official measures per participant is considered. The query expansion method showed a positive impact on the system's performance, increasing our baseline by up to +5.0% and +3.4% for the infAP and infNDCG metrics, respectively. Our similarity measure algorithm seems to be robust, in particular compared to the Divergence From Randomness framework, having smaller performance variations under different training conditions. Finally, the result categorization did not have a significant impact on the system's performance. We believe that our solution could be used to enhance biomedical dataset management systems. In particular, the use of data-driven query expansion methods could be an alternative to the complexity of biomedical terminologies.

    Boosting Correlation Based Penalization in Generalized Linear Models

    In high-dimensional regression problems, penalization techniques are a useful tool for estimation and variable selection. We propose a novel penalization technique that aims at the grouping effect, which encourages strongly correlated predictors to be in or out of the model together. The proposed penalty uses the correlation between predictors explicitly. We consider a simple version that does not select variables and a boosted version which is able to reduce the number of variables in the model. Both methods are derived within the framework of generalized linear models. The performance is evaluated by simulations and by use of real-world data sets.

    Feature Augmentation via Nonparametrics and Selection (FANS) in High Dimensional Classification

    We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing. Comment: 30 pages, 2 figures
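The feature-augmentation step described in the abstract above can be sketched roughly as follows. This is an illustration only: it assumes binary labels (0 and 1), a Gaussian kernel density estimate with a fixed bandwidth, and a small constant `eps` to keep the logarithms finite. The names `gaussian_kde` and `fit_log_ratio_transform` are hypothetical, and the penalized logistic regression that the abstract applies to the transformed features, as well as any sample splitting the authors may use, is left out.

```python
import math


def gaussian_kde(sample, bandwidth):
    """Return a function that estimates the density of `sample` with a Gaussian kernel."""
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample) / norm

    return density


def fit_log_ratio_transform(X, y, bandwidth=0.5, eps=1e-12):
    """Estimate, per feature, the class-1 and class-0 marginal densities.

    Returns a function mapping a raw feature row to its estimated
    marginal log density ratios log f_j(x_j) - log g_j(x_j).
    """
    estimators = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        estimators.append((gaussian_kde(pos, bandwidth), gaussian_kde(neg, bandwidth)))

    def transform(row):
        # eps is an assumed guard against log(0) far away from the training data
        return [
            math.log(f(value) + eps) - math.log(g(value) + eps)
            for value, (f, g) in zip(row, estimators)
        ]

    return transform


# Toy data (made up for illustration): one feature, two well-separated classes.
X_toy = [[0.0], [0.2], [-0.1], [3.0], [3.1], [2.9]]
y_toy = [0, 0, 0, 1, 1, 1]
transform = fit_log_ratio_transform(X_toy, y_toy)
```

A transformed value above zero means the estimated class-1 density exceeds the class-0 density at that point, so each augmented feature already acts as a univariate classifier; a sparse linear model on top of these features then combines them.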

    Predicting a local recurrence after breast-conserving therapy by gene expression profiling

    INTRODUCTION: To tailor local treatment in breast cancer patients there is a need for predicting ipsilateral recurrences after breast-conserving therapy. After adequate treatment (excision with free margins and radiotherapy), young age and an incompletely excised extensive intraductal component are predictors for local recurrence, but many local recurrences still cannot be predicted. Here we have used gene expression profiling by microarray analysis to identify gene expression profiles that can help to predict local recurrence in individual patients. METHODS: By using previously established gene expression profiles with proven value in predicting metastasis-free and overall survival (wound-response signature, 70-gene prognosis profile and hypoxia-induced profile) and training towards an optimal prediction of local recurrences in a training series, we establish a classifier for local recurrence after breast-conserving therapy. RESULTS: Validation of the different gene lists shows that the wound-response signature is able to separate patients with a high (29%) or low (5%) risk of a local recurrence at 10 years (sensitivity 87.5%, specificity 75%). In multivariable analysis the classifier is an independent predictor for local recurrence. CONCLUSION: Our findings indicate that gene expression profiling can identify subgroups of patients at increased risk of developing a local recurrence after breast-conserving therapy.

    Neuroblastoma patient outcomes, tumor differentiation, and ERK activation are correlated with expression levels of the ubiquitin ligase UBE4B.

    Background: UBE4B is an E3/E4 ubiquitin ligase whose gene is located in chromosome 1p36.22. We analyzed the associations of UBE4B gene and protein expression with neuroblastoma patient outcomes and with tumor prognostic features and histology. Methods: We evaluated the association of UBE4B gene expression with neuroblastoma patient outcomes using the R2 Platform. We screened neuroblastoma tumor samples for UBE4B protein expression using immunohistochemistry. FISH for UBE4B and 1p36 deletion was performed on tumor samples. We then evaluated UBE4B expression for associations with prognostic factors and with levels of phosphorylated ERK in neuroblastoma tumors and cell lines. Results: Low UBE4B gene expression is associated with poor outcomes in patients with neuroblastoma and with worse outcomes in all patient subgroups. UBE4B protein expression was associated with neuroblastoma tumor differentiation, and decreased UBE4B protein levels were associated with high-risk features. UBE4B protein levels were also associated with levels of phosphorylated ERK. Conclusions: We have demonstrated associations between UBE4B gene expression and neuroblastoma patient outcomes and prognostic features. Reduced UBE4B protein expression in neuroblastoma tumors was associated with high-risk features, a lack of differentiation, and with ERK activation. These results suggest UBE4B may contribute to the poor prognosis of neuroblastoma tumors with 1p36 deletions and that UBE4B expression may mediate neuroblastoma differentiation.

    Stable Feature Selection for Biomarker Discovery

    Feature selection techniques have long been the workhorse of biomarker discovery applications. Surprisingly, the stability of feature selection with respect to sampling variations has long been under-considered. Only recently has this issue received increasing attention. In this article, we review existing stable feature selection methods for biomarker discovery using a generic hierarchical framework. We have two objectives: (1) providing an overview of this new yet fast-growing topic for convenient reference; (2) categorizing existing methods under an expandable framework for future research and development.