Every which way? On predicting tumor evolution using cancer progression models
Successful prediction of the likely paths of tumor progression is valuable for diagnostic,
prognostic, and treatment purposes. Cancer progression models (CPMs) use cross-sectional samples to identify restrictions in the order of accumulation of driver mutations and
thus CPMs encode the paths of tumor progression. Here we analyze the performance of
four CPMs to examine whether they can be used to predict the true distribution of paths of
tumor progression and to estimate evolutionary unpredictability. Employing simulations we
show that if fitness landscapes are single peaked (have a single fitness maximum) there is
good agreement between true and predicted distributions of paths of tumor progression
when sample sizes are large, but performance is poor with the currently common much
smaller sample sizes. Under multi-peaked fitness landscapes (i.e., those with multiple fitness maxima), performance is poor and improves only slightly with sample size. In all
cases, detection regime (when tumors are sampled) is a key determinant of performance.
Estimates of evolutionary unpredictability from the best performing CPM, among the four
examined, tend to overestimate the true unpredictability and the bias is affected by detection
regime; CPMs could be useful for estimating upper bounds to the true evolutionary unpredictability. Analysis of twenty-two cancer data sets shows low evolutionary unpredictability
for several of the data sets. But most of the predictions of paths of tumor progression are
very unreliable, and unreliability increases with the number of features analyzed. Our results
indicate that CPMs could be valuable tools for predicting cancer progression but that, currently, obtaining useful predictions of paths of tumor progression from CPMs is dubious, and
emphasize the need for methodological work that can account for the probably multi-peaked
fitness landscapes in cancer.
Work partially supported by BFU2015-67302-R (MINECO/FEDER, EU) to RDU. CV supported by PEJD-2016-BMD-2116 from Comunidad de Madrid to RD
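The abstract above compares true and predicted distributions of paths and speaks of evolutionary unpredictability without giving formulas. As a minimal sketch, assuming unpredictability is measured as the Shannon entropy of the distribution over paths and that agreement between two distributions is measured with the Jensen-Shannon divergence (both are assumptions made here for illustration, as are the example paths and probabilities), the two quantities could be computed like this:

```python
import math

def shannon_entropy(path_probs):
    """Shannon entropy (in nats) of a distribution over paths.

    Assumption: this stands in for 'evolutionary unpredictability';
    the abstract does not state the exact definition.
    """
    return -sum(p * math.log(p) for p in path_probs.values() if p > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in base 2, bounded between 0 and 1.

    Assumption: used here as one way to compare a true and a predicted
    distribution of paths. Paths missing from one side get probability 0.
    """
    total = 0.0
    for path in set(p) | set(q):
        pi = p.get(path, 0.0)
        qi = q.get(path, 0.0)
        mi = 0.5 * (pi + qi)  # mixture distribution
        if pi > 0:
            total += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            total += 0.5 * qi * math.log2(qi / mi)
    return total

# Hypothetical example: each path is an order of accumulation of mutations.
true_paths = {("A", "B", "C"): 0.7, ("B", "A", "C"): 0.3}
predicted_paths = {
    ("A", "B", "C"): 0.5,
    ("B", "A", "C"): 0.25,
    ("A", "C", "B"): 0.25,
}

h_true = shannon_entropy(true_paths)
h_pred = shannon_entropy(predicted_paths)
jsd = js_divergence(true_paths, predicted_paths)
```

In this made-up example the predicted distribution spreads probability over more paths than the true one, so its entropy is higher, which mirrors the kind of overestimation the abstract describes.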
Dynamic Bayesian networks in molecular plant science: inferring gene regulatory networks from multiple gene expression time series
To understand the processes of growth and biomass production in plants, we ultimately need to elucidate the structure of the underlying regulatory networks at the molecular level. The advent of high-throughput postgenomic technologies has spurred substantial interest in reverse engineering these networks from data, and several techniques from machine learning and multivariate statistics have recently been proposed. The present article discusses the problem of inferring gene regulatory networks from gene expression time series, and we focus our exposition on the methodology of Bayesian networks. We describe dynamic Bayesian networks and explain their advantages over other statistical methods. We introduce a novel information sharing scheme, which allows us to infer gene regulatory networks from multiple sources of gene expression data more accurately. We illustrate and test this method on a set of synthetic data, using three different measures to quantify the network reconstruction accuracy. The main application of our method is related to the problem of circadian regulation in plants, where we aim to reconstruct the regulatory networks of nine circadian genes in Arabidopsis thaliana from four gene expression time series obtained under different experimental conditions
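To make the idea of a dynamic Bayesian network concrete, here is a minimal sketch, assuming a first-order model in which a gene's expression at time t depends linearly on a single parent gene at time t-1, and assuming that candidate parents are scored with BIC. Pooling the lagged pairs from all time series, as done below, is the crudest possible way to combine several data sources; it is not the information sharing scheme the abstract introduces. All gene names, values, and function names are hypothetical.

```python
import math

def lagged_pairs(series_list, parent, child):
    """Collect (parent at t-1, child at t) pairs from several time series.

    Each series is a dict mapping a gene name to its list of values.
    """
    xs, ys = [], []
    for series in series_list:
        xs.extend(series[parent][:-1])
        ys.extend(series[child][1:])
    return xs, ys

def bic_linear(xs, ys):
    """BIC of the Gaussian linear model y = a + b * x (lower is better)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    sigma2 = max(rss / n, 1e-12)  # guard against log(0) on a perfect fit
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    return -2 * loglik + 3 * math.log(n)  # three parameters: a, b, sigma2

def best_parent(series_list, child, candidates):
    """Pick the single lagged parent with the lowest BIC."""
    return min(
        candidates,
        key=lambda g: bic_linear(*lagged_pairs(series_list, g, child)),
    )

# Two hypothetical time series in which g2 copies g1 with a one-step lag.
series_a = {"g1": [1, 3, 2, 5, 4, 6], "g2": [0, 1, 3, 2, 5, 4], "g3": [2, 2, 1, 3, 1, 2]}
series_b = {"g1": [2, 4, 1, 3], "g2": [9, 2, 4, 1], "g3": [1, 0, 1, 0]}

chosen = best_parent([series_a, series_b], "g2", ["g1", "g2", "g3"])
```

A full treatment would search over sets of parents and place a prior over network structures; the single-parent restriction here only keeps the sketch short.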
Inferring gene regression networks with model trees
Background: Novel strategies are required in order to handle the huge amount of data produced by microarray
technologies. To infer gene regulatory networks, the first step is to find direct regulatory relationships between
genes, building the so-called gene co-expression networks. These are typically generated using correlation statistics
as pairwise similarity measures. Correlation-based methods are useful for determining whether two
genes have a strong global similarity, but they do not detect local similarities.
Results: We propose model trees as a method to identify gene interaction networks. While correlation-based
methods analyze each pair of genes, in our approach we generate a single regression tree for each gene from the
remaining genes. Finally, a graph from all the relationships among output and input genes is built taking into
account whether the pair of genes is statistically significant. For this reason we apply a statistical procedure to
control the false discovery rate. The performance of our approach, named REGNET, is experimentally tested on two
well-known data sets: Saccharomyces cerevisiae and E. coli. First, the biological coherence of the results is
tested. Second, the E. coli transcriptional network (in the Regulon database) is used as a control to compare the
results to those of a correlation-based method. This experiment shows that REGNET performs more accurately at
detecting true gene associations than the Pearson and Spearman zeroth and first-order correlation-based methods.
Conclusions: REGNET generates gene association networks from gene expression data, and differs from
correlation-based methods in that the relationship between one gene and others is calculated simultaneously.
Model trees are very useful techniques to estimate the numerical values for the target genes by linear regression
functions. They are very often more precise than linear regression models because they can fit different
linear regressions to separate areas of the search space, favoring the inference of localized similarities over a more global
similarity. Furthermore, experimental results show the good performance of REGNET.
Ministerio de Ciencia e Innovación TIN2011-68084-C02-00; Ministerio de Ciencia e Innovación PCI2006-A7-0575; Junta de Andalucía P07-TIC-02611; Junta de Andalucía TIC-20
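The REGNET abstract says that candidate gene pairs are filtered with a procedure that controls the false discovery rate, without naming the procedure. As a minimal sketch, assuming the Benjamini-Hochberg step-up procedure and assuming each candidate edge already comes with a p-value (the edge names and p-values below are hypothetical), the filtering step could look like this:

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the set of keys whose p-values pass Benjamini-Hochberg.

    p_values maps a candidate edge (any hashable key) to its p-value.
    Assumption: BH is used here for illustration; the abstract only says
    that a procedure controlling the false discovery rate is applied.
    """
    ranked = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ranked)
    cutoff = 0
    # Find the largest rank k with p_(k) <= alpha * k / m.
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= alpha * rank / m:
            cutoff = rank
    # Keep every edge ranked at or below that cutoff.
    return {edge for edge, _ in ranked[:cutoff]}

# Hypothetical p-values for candidate gene-gene edges.
edge_p_values = {
    ("g1", "g2"): 0.001,
    ("g1", "g3"): 0.012,
    ("g2", "g3"): 0.04,
    ("g3", "g4"): 0.3,
}

kept_edges = benjamini_hochberg(edge_p_values, alpha=0.05)
```

With four tests at alpha = 0.05, the per-rank thresholds are 0.0125, 0.025, 0.0375 and 0.05, so only the first two edges are kept and would enter the final graph.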
- …