Every which way? On predicting tumor evolution using cancer progression models
Successful prediction of the likely paths of tumor progression is valuable for diagnostic,
prognostic, and treatment purposes. Cancer progression models (CPMs) use cross-sectional samples to identify restrictions in the order of accumulation of driver mutations and
thus CPMs encode the paths of tumor progression. Here we analyze the performance of
four CPMs to examine whether they can be used to predict the true distribution of paths of
tumor progression and to estimate evolutionary unpredictability. Employing simulations, we
show that if fitness landscapes are single peaked (have a single fitness maximum) there is
good agreement between true and predicted distributions of paths of tumor progression
when sample sizes are large, but performance is poor with the much smaller sample sizes
that are currently common. Under multi-peaked fitness landscapes (i.e., those with multiple fitness maxima), performance is poor and improves only slightly with sample size. In all
cases, detection regime (when tumors are sampled) is a key determinant of performance.
Estimates of evolutionary unpredictability from the best performing CPM, among the four
examined, tend to overestimate the true unpredictability and the bias is affected by detection
regime; CPMs could be useful for estimating upper bounds to the true evolutionary unpredictability. Analysis of twenty-two cancer data sets shows low evolutionary unpredictability
for several of the data sets. But most of the predictions of paths of tumor progression are
very unreliable, and unreliability increases with the number of features analyzed. Our results
indicate that CPMs could be valuable tools for predicting cancer progression but that, currently, obtaining useful predictions of paths of tumor progression from CPMs is dubious, and
emphasize the need for methodological work that can account for the probably multi-peaked
fitness landscapes in cancer.
Work partially supported by BFU2015-67302-R (MINECO/FEDER, EU) to RDU. CV supported by PEJD-2016-BMD-2116 from Comunidad de Madrid to RD
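As a rough illustration of what comparing true and predicted distributions of paths might involve, the sketch below scores a distribution over paths with Shannon entropy and compares two distributions with the Jensen-Shannon divergence. The abstract does not say which measures the authors use, so both measures are assumptions made here for illustration, and the mutation names (A, B, C) and probabilities are hypothetical.

```python
import math

def shannon_entropy(dist):
    """Shannon entropy (in bits) of a path distribution {path: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence (in bits) between two path distributions."""
    paths = set(p) | set(q)
    # Mixture distribution: average probability of each path under p and q.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in paths}

    def kl_to_mixture(a):
        # Kullback-Leibler divergence from a to the mixture m.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)

# Hypothetical example: each path is an order of accumulation of
# driver mutations A, B, C (names and probabilities are made up).
true_paths = {("A", "B", "C"): 0.7, ("B", "A", "C"): 0.3}
predicted_paths = {
    ("A", "B", "C"): 0.4,
    ("B", "A", "C"): 0.3,
    ("A", "C", "B"): 0.3,
}

h_true = shannon_entropy(true_paths)
h_pred = shannon_entropy(predicted_paths)
jsd = js_divergence(true_paths, predicted_paths)

print(f"entropy of true paths:      {h_true:.3f} bits")
print(f"entropy of predicted paths: {h_pred:.3f} bits")
print(f"JS divergence:              {jsd:.3f} bits")
```

In this made-up example the predicted distribution spreads probability over an extra path, so its entropy is higher than that of the true distribution. This is the kind of overestimation the abstract describes.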
Automated analysis of quantitative image data using isomorphic functional mixed models, with application to proteomics data
Image data are increasingly encountered and are of growing importance in many
areas of science. Much of these data are quantitative image data, which are
characterized by intensities that represent some measurement of interest in the
scanned images. The data typically consist of multiple images on the same
domain and the goal of the research is to combine the quantitative information
across images to make inference about populations or interventions. In this
paper we present a unified analysis framework for the analysis of quantitative
image data using a Bayesian functional mixed model approach. This framework is
flexible enough to handle complex, irregular images with many local features,
and can model the simultaneous effects of multiple factors on the image
intensities and account for the correlation between images induced by the
design. We introduce a general isomorphic modeling approach to fitting the
functional mixed model, of which the wavelet-based functional mixed model is
one special case. With suitable modeling choices, this approach leads to
efficient calculations and can result in flexible modeling and adaptive
smoothing of the salient features in the data. The proposed method has the
following advantages: it can be run automatically, it produces inferential
plots indicating which regions of the image are associated with each factor, it
simultaneously considers the practical and statistical significance of
findings, and it controls the false discovery rate.
Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/10-AOAS407 by the Institute of
Mathematical Statistics (http://www.imstat.org
Wavelet feature extraction and genetic algorithm for biomarker detection in colorectal cancer data
Biomarkers that predict patient survival can play an important role in medical diagnosis and treatment. Selecting the significant biomarkers from hundreds of protein markers is a key step in survival analysis. In this paper, a novel method is proposed to detect prognostic biomarkers of survival in colorectal cancer patients using wavelet analysis, a genetic algorithm, and a Bayes classifier. The one-dimensional discrete wavelet transform (DWT) is normally used to reduce the dimensionality of biomedical data. In this study, the one-dimensional continuous wavelet transform (CWT) is proposed to extract the features of colorectal cancer data. The one-dimensional CWT cannot reduce the dimensionality of the data, but it captures features that the DWT misses and is therefore complementary to the DWT. A genetic algorithm was run on the extracted wavelet coefficients to select the optimized features, using a Bayes classifier to build its fitness function. The corresponding protein markers were located based on the positions of the optimized features. Kaplan-Meier curves and the Cox regression model were used to evaluate the performance of the selected biomarkers. Experiments were conducted on a colorectal cancer dataset, and several significant biomarkers were detected. A new protein biomarker, CD46, was found to be significantly associated with survival time.
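The dimensionality contrast between the DWT and the CWT mentioned in this abstract can be sketched in plain Python. The abstract does not name the wavelets used, so the Haar wavelet (for the DWT) and the Ricker wavelet (for the CWT) are assumptions, and the signal values are hypothetical.

```python
import math

def haar_dwt(signal):
    """One level of the Haar DWT: (approximation, detail), each half as long."""
    s = 1 / math.sqrt(2)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail

def ricker(t, scale):
    """Ricker (Mexican hat) wavelet shape at offset t, up to a constant factor."""
    x = t / scale
    return (1 - x * x) * math.exp(-x * x / 2) / math.sqrt(scale)

def cwt(signal, scales):
    """Direct-sum CWT: one row of len(signal) coefficients per scale."""
    n = len(signal)
    return [
        [sum(signal[k] * ricker(k - pos, s) for k in range(n)) for pos in range(n)]
        for s in scales
    ]

# Hypothetical intensity profile with 8 samples (values are made up).
signal = [1.0, 1.2, 0.9, 4.0, 4.2, 1.1, 1.0, 0.8]

approx, detail = haar_dwt(signal)
coeffs = cwt(signal, scales=[1, 2, 4])

print("input length:        ", len(signal))
print("DWT approximation:   ", len(approx), "coefficients")
print("CWT coefficient grid:", len(coeffs), "scales x", len(coeffs[0]), "positions")
```

The DWT approximation halves the number of values, while the CWT returns a full-length row for every scale. That redundancy is why a separate selection step, such as the genetic algorithm described in the abstract, is needed to pick features from the CWT coefficients.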
Data Mining in Healthcare: A Survey of Techniques and Algorithms with its Limitations and Challenges
The large amount of data in the healthcare industry is a key resource to be processed and analyzed for knowledge extraction. Knowledge discovery in databases (KDD) is the process of turning low-level data into high-level knowledge, and data mining is a core component of the KDD process. Data mining techniques are used in healthcare management to improve the quality and decrease the cost of healthcare services. Data mining algorithms are needed in almost every step of the KDD process, from domain understanding to knowledge evaluation. It is therefore necessary to identify and evaluate the most common data mining algorithms implemented in modern healthcare services. Algorithms with very high accuracy are needed, because medical diagnosis is a significant yet obscure task that must be carried out precisely and efficiently.
- …