4 research outputs found
Parallel Implementation of Efficient Search Schemes for the Inference of Cancer Progression Models
The emergence and development of cancer is a consequence of the accumulation
over time of genomic mutations involving a specific set of genes, which
provides the cancer clones with a functional selective advantage. In this work,
we model the order of accumulation of such mutations during the progression,
which eventually leads to the disease, by means of probabilistic graphic
models, i.e., Bayesian Networks (BNs). We investigate how to perform the task
of learning the structure of such BNs, according to experimental evidence,
adopting a global optimization meta-heuristics. In particular, in this work we
rely on Genetic Algorithms, and to strongly reduce the execution time of the
inference -- which can also involve multiple repetitions to collect
statistically significant assessments of the data -- we distribute the
calculations using both multi-threading and a multi-node architecture. The
results show that our approach is characterized by good accuracy and
specificity; we also demonstrate its feasibility, thanks to a 84x reduction of
the overall execution time with respect to a traditional sequential
implementation
Efficient computational strategies to learn the structure of probabilistic graphical models of cumulative phenomena
Structural learning of Bayesian Networks (BNs) is a NP-hard problem, which is
further complicated by many theoretical issues, such as the I-equivalence among
different structures. In this work, we focus on a specific subclass of BNs,
named Suppes-Bayes Causal Networks (SBCNs), which include specific structural
constraints based on Suppes' probabilistic causation to efficiently model
cumulative phenomena. Here we compare the performance, via extensive
simulations, of various state-of-the-art search strategies, such as local
search techniques and Genetic Algorithms, as well as of distinct regularization
methods. The assessment is performed on a large number of simulated datasets
from topologies with distinct levels of complexity, various sample size and
different rates of errors in the data. Among the main results, we show that the
introduction of Suppes' constraints dramatically improve the inference
accuracy, by reducing the solution space and providing a temporal ordering on
the variables. We also report on trade-offs among different search techniques
that can be efficiently employed in distinct experimental settings. This
manuscript is an extended version of the paper "Structural Learning of
Probabilistic Graphical Models of Cumulative Phenomena" presented at the 2018
International Conference on Computational Science
Multi-objective optimization to explicitly account for model complexity when learning Bayesian Networks
Bayesian Networks have been widely used in the last decades in many fields,
to describe statistical dependencies among random variables. In general,
learning the structure of such models is a problem with considerable
theoretical interest that still poses many challenges. On the one hand, this is
a well-known NP-complete problem, which is practically hardened by the huge
search space of possible solutions. On the other hand, the phenomenon of
I-equivalence, i.e., different graphical structures underpinning the same set
of statistical dependencies, may lead to multimodal fitness landscapes further
hindering maximum likelihood approaches to solve the task. Despite all these
difficulties, greedy search methods based on a likelihood score coupled with a
regularization term to account for model complexity, have been shown to be
surprisingly effective in practice. In this paper, we consider the formulation
of the task of learning the structure of Bayesian Networks as an optimization
problem based on a likelihood score. Nevertheless, our approach do not adjust
this score by means of any of the complexity terms proposed in the literature;
instead, it accounts directly for the complexity of the discovered solutions by
exploiting a multi-objective optimization procedure. To this extent, we adopt
NSGA-II and define the first objective function to be the likelihood of a
solution and the second to be the number of selected arcs. We thoroughly
analyze the behavior of our method on a wide set of simulated data, and we
discuss the performance considering the goodness of the inferred solutions both
in terms of their objective functions and with respect to the retrieved
structure. Our results show that NSGA-II can converge to solutions
characterized by better likelihood and less arcs than classic approaches,
although paradoxically frequently characterized by a lower similarity to the
target network
Learning mutational graphs of individual tumour evolution from single-cell and multi-region sequencing data
Background. A large number of algorithms is being developed to reconstruct
evolutionary models of individual tumours from genome sequencing data. Most
methods can analyze multiple samples collected either through bulk multi-region
sequencing experiments or the sequencing of individual cancer cells. However,
rarely the same method can support both data types.
Results. We introduce TRaIT, a computational framework to infer mutational
graphs that model the accumulation of multiple types of somatic alterations
driving tumour evolution. Compared to other tools, TRaIT supports multi-region
and single-cell sequencing data within the same statistical framework, and
delivers expressive models that capture many complex evolutionary phenomena.
TRaIT improves accuracy, robustness to data-specific errors and computational
complexity compared to competing methods.
Conclusions. We show that the application of TRaIT to single-cell and
multi-region cancer datasets can produce accurate and reliable models of
single-tumour evolution, quantify the extent of intra-tumour heterogeneity and
generate new testable experimental hypotheses