VaiPhy: a Variational Inference Based Algorithm for Phylogeny
Phylogenetics is a classical methodology in computational biology that today
has become highly relevant for medical investigation of single-cell data, e.g.,
in the context of cancer development. The exponential size of the tree space
is, unfortunately, a substantial obstacle for Bayesian phylogenetic inference
using Markov chain Monte Carlo based methods since these rely on local
operations. Although more recent variational inference (VI) based methods
offer speed improvements, they rely on expensive auto-differentiation
operations for learning the variational parameters. We propose VaiPhy, a
remarkably fast VI based algorithm for approximate posterior inference in an
augmented tree space. VaiPhy produces marginal log-likelihood estimates on par
with the state-of-the-art methods on real data and is considerably faster since
it does not require auto-differentiation. Instead, VaiPhy combines coordinate
ascent update equations with two novel sampling schemes: (i) SLANTIS, a
proposal distribution for tree topologies in the augmented tree space, and (ii)
the JC sampler, to the best of our knowledge, the first-ever scheme for
sampling branch lengths directly from the popular Jukes-Cantor model. We
compare VaiPhy in terms of density estimation and runtime. Additionally, we
evaluate the reproducibility of the baselines. We provide our code on GitHub:
\url{https://github.com/Lagergren-Lab/VaiPhy}. Comment: NeurIPS-22 conference paper.
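The JC sampler itself is specified in the paper rather than in this abstract. As illustrative context only, the sketch below shows how a branch length could be drawn under the standard Jukes-Cantor (JC69) substitution model for a single branch, using a flat prior and a grid-based inverse-CDF draw. The function names, the prior, and the discretisation are assumptions made here for illustration; this is not VaiPhy's actual JC sampler.

import numpy as np

def jc69_log_likelihood(b, n_same, n_diff):
    # JC69 transition probabilities across a branch of length b
    # (expected substitutions per site):
    #   P(same nucleotide at both ends)      = 1/4 + 3/4 * exp(-4b/3)
    #   P(one specific different nucleotide) = 1/4 - 1/4 * exp(-4b/3)
    p_same = 0.25 + 0.75 * np.exp(-4.0 * b / 3.0)
    p_diff = 0.25 - 0.25 * np.exp(-4.0 * b / 3.0)
    return n_same * np.log(p_same) + n_diff * np.log(p_diff)

def sample_branch_length(n_same, n_diff, b_max=5.0, n_grid=2000, rng=None):
    # Draw one branch length from the flat-prior posterior by normalising the
    # JC69 likelihood over a grid of candidate lengths (discretised inverse CDF).
    rng = np.random.default_rng() if rng is None else rng
    grid = np.linspace(1e-6, b_max, n_grid)
    log_post = jc69_log_likelihood(grid, n_same, n_diff)
    weights = np.exp(log_post - log_post.max())
    weights /= weights.sum()
    return rng.choice(grid, p=weights)

# Example: two aligned sequences of 100 sites, differing at 12 of them.
print(sample_branch_length(n_same=88, n_diff=12))

The grid approximation above is only meant to make the Jukes-Cantor quantities concrete; the paper's JC sampler draws branch lengths directly from the model without such a discretisation, and the exact scheme is given in the paper and repository.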
Novel likelihood-based inference techniques for sequential data with medical and biological applications
The probabilistic approach is crucial in modern machine learning, as it provides transparency and quantification of uncertainty. This thesis is concerned with probabilistic building blocks, i.e., probabilistic graphical models (PGMs), followed by the application of standard deterministic approximate inference, i.e., Expectation-Maximization (EM) and Variational Inference (VI). The contributions concern improvements to parameter learning with EM and, most importantly, novel probabilistic models and new VI methodology for phylogenetic inference. Firstly, this thesis improves upon the vanilla EM algorithm for hidden Markov models (HMMs) and mixtures of HMMs (MHMMs). The proposed constrained EM algorithm for HMMs compensates for the lack of long-range context in HMMs. The two other proposed novel regularized EM algorithms provide better local optima for parameter learning of MHMMs, particularly in cancer analysis. The novel EM algorithms are merely modifications of the standard EM algorithm that do not add any extra complexity, unlike other modifications targeting the context and poor-local-optima issues. Secondly, this thesis introduces one novel and one augmented PGM together with VI frameworks for robust and fast Bayesian inference. The first method, CopyMix, uses a single-phase framework to simultaneously provide clonal decomposition and copy number profiling of single-cell cancer data. Thus, in contrast to previous approaches, it does not pursue the two objectives in a sequential and ad hoc fashion, which is prone to introducing artifacts and errors. The second method provides an augmented PGM with a faster framework for phylogenetic inference; specifically, a novel natural gradient-based VI algorithm is devised. Regarding the cancer analysis, this thesis concludes that CopyMix is superior to MHMMs, even though the two novel EM algorithms proposed in this thesis partially improve the performance of clonal tumor decomposition. The empirical support presented throughout this thesis confirms that the proposed likelihood-based methods and optimization tools provide opportunities for better analysis algorithms, particularly suited for cancer research.
VaiPhy: a Variational Inference Based Algorithm for Phylogeny
Phylogenetics is a classical methodology in computational biology that today has become highly relevant for medical investigation of single-cell data, e.g., in the context of cancer development. The exponential size of the tree space is unfortunately a formidable obstacle for current Bayesian phylogenetic inference using Markov chain Monte Carlo based methods since these rely on local operations. Although more recent variational inference (VI) based methods offer speed improvements, they rely on expensive auto-differentiation operations for learning the variational parameters. We propose VaiPhy, a remarkably fast VI based algorithm for approximate posterior inference in an augmented tree space. VaiPhy produces marginal log-likelihood estimates on par with the state-of-the-art methods on real data, and is considerably faster since it does not require auto-differentiation. Instead, VaiPhy combines coordinate ascent update equations with two novel sampling schemes: (i) SLANTIS, a proposal distribution for tree topologies in the augmented tree space, and (ii) the JC sampler, to the best of our knowledge the first-ever scheme for sampling branch lengths directly from the popular Jukes-Cantor model. We compare VaiPhy in terms of density estimation and runtime. Additionally, we evaluate the reproducibility of the baselines. We provide our code on GitHub: https://github.com/Lagergren-Lab/VaiPhy.