4 research outputs found

    A LASSO-based approach to sample sites for phylogenetic tree search

    Motivation: In recent years, full-genome sequences have become increasingly available and as a result many modern phylogenetic analyses are based on very long sequences, often with over 100 000 sites. Phylogenetic reconstructions of large-scale alignments are challenging for likelihood-based phylogenetic inference programs and usually require using a powerful computer cluster. Current tools for alignment trimming prior to phylogenetic analysis do not promise a significant reduction in the alignment size and are claimed to have a negative effect on the accuracy of the obtained tree.

    Results: Here, we propose an artificial-intelligence-based approach, which provides a means to select the optimal subset of sites and a formula by which one can compute the log-likelihood of the entire data based on this subset. Our approach is based on training a regularized Lasso-regression model that optimizes the log-likelihood prediction accuracy while putting a constraint on the number of sites used for the approximation. We show that computing the likelihood based on 5% of the sites already provides an accurate approximation of the tree likelihood based on the entire data. Furthermore, we show that using this Lasso-based approximation during a tree search decreased running time substantially while retaining the same tree-search performance.
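The idea of predicting a tree's full log-likelihood from a sparse, weighted subset of sites can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the data layout (one row per tree, one column per site), the toy data generator, the penalty value, and all function names are assumptions made for illustration, and the solver is a plain coordinate-descent Lasso written with the standard library only.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the Lasso shrinkage step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso(X, y, lam, n_sweeps=200):
    """Coordinate-descent Lasso minimizing (1/2n)||y - Xw - b||^2 + lam*||w||_1.

    X[i][j]: log-likelihood of site j under tree i (assumed layout).
    y[i]:    log-likelihood of the full alignment under tree i.
    Returns (weights, intercept); sites with weight 0.0 are not selected.
    """
    n, p = len(X), len(X[0])
    col_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_mean[j] for j in range(p)] for row in X]  # centered sites
    resid = [yi - y_mean for yi in y]  # residual while all weights are zero
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of site j with the residual, adding back its own term.
            rho = sum(Xc[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * w[j]
            new_w = soft_threshold(rho, lam) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * Xc[i][j]
                w[j] = new_w
    intercept = y_mean - sum(w[j] * col_mean[j] for j in range(p))
    return w, intercept


def approx_log_likelihood(site_lls, w, intercept):
    """Approximate a tree's total log-likelihood from the selected sites only."""
    return intercept + sum(wj * site_lls[j] for j, wj in enumerate(w) if wj != 0.0)


def make_toy_data(n_trees=30, n_sites=40, n_classes=4, seed=0):
    """Synthetic per-site log-likelihoods; sites in a class behave alike."""
    rng = random.Random(seed)
    effects = [[rng.gauss(0.0, 1.0) for _ in range(n_classes)] for _ in range(n_trees)]
    X = [[-5.0 + effects[i][j % n_classes] + rng.gauss(0.0, 0.05)
          for j in range(n_sites)] for i in range(n_trees)]
    y = [sum(row) for row in X]  # full log-likelihood is the sum over all sites
    return X, y
```

Calling `fit_lasso(*make_toy_data(), lam=0.5)` returns one weight per site; the nonzero entries play the role of the retained sites, and raising `lam` tightens the constraint on how many sites are used.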

    A machine-learning based alternative to phylogenetic bootstrap

    A data-driven approach to estimate branch support values with a probabilistic interpretation.

    COVID‐19 pandemic‐related lockdown: response time is more important than its strictness

    The rapid spread of SARS‐CoV‐2 and its threat to health systems worldwide have led governments to take acute actions to enforce social distancing. Previous studies used complex epidemiological models to quantify the effect of lockdown policies on infection rates. However, these rely on prior assumptions or on official regulations. Here, we use country‐specific reports of daily mobility from people's cellular usage to model social distancing. Our data‐driven model enabled the extraction of lockdown characteristics which were crossed with observed mortality rates to show that: (i) the time at which social distancing was initiated is highly correlated with the number of deaths, r² = 0.64, while the lockdown strictness or its duration is not as informative; (ii) a delay of 7.49 days in initiating social distancing would double the number of deaths; and (iii) the immediate response has a prolonged effect on COVID‐19 death toll.
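Finding (ii) can be read as a rough rule of thumb. The sketch below assumes, beyond what the abstract states, that the doubling per 7.49 days of delay compounds exponentially for other delay lengths; `death_multiplier` is a hypothetical helper name, not something taken from the study.

```python
DOUBLING_DELAY_DAYS = 7.49  # delay reported in the abstract to double deaths


def death_multiplier(delay_days, doubling_delay=DOUBLING_DELAY_DAYS):
    """Factor by which deaths grow when distancing starts delay_days later.

    ASSUMPTION: the reported doubling extrapolates exponentially, so the
    factor is 2 ** (delay / doubling_delay). The abstract only states the
    single point (7.49 days, factor 2).
    """
    return 2.0 ** (delay_days / doubling_delay)
```

Under that assumption, a delay of twice 7.49 days corresponds to a factor of four, and no delay corresponds to a factor of one.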

    An Approximate Bayesian Computation Approach for Modeling Genome Rearrangements

    The inference of genome rearrangement events has been extensively studied, as they play a major role in molecular evolution. However, probabilistic evolutionary models that explicitly imitate the evolutionary dynamics of such events, as well as methods to infer model parameters, are yet to be fully utilized. Here, we developed a probabilistic approach to infer genome rearrangement rate parameters using an Approximate Bayesian Computation (ABC) framework. We developed two genome rearrangement models: a basic model, which accounts for changes in gene order, and a more sophisticated one, which also accounts for changes in chromosome number. We characterized the ABC inference accuracy using simulations and applied our methodology to both prokaryotic and eukaryotic empirical datasets. Knowledge of genome-rearrangement rates can help elucidate their role in evolution as well as help simulate genomes with evolutionary dynamics that reflect empirical genomes.
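The general shape of ABC inference for a rearrangement rate can be sketched with a rejection sampler. Everything specific here is an assumption for illustration and is not the models from the paper: the genome is a single chromosome of unsigned genes, the only event is an inversion, the number of inversions is Poisson with the rate as its mean, the prior is uniform, and the summary statistic is the breakpoint count.

```python
import math
import random


def sample_poisson(rng, mean):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_genome(rng, n_genes, rate):
    """Apply a Poisson(rate) number of random inversions to the order 0..n-1."""
    genome = list(range(n_genes))
    for _ in range(sample_poisson(rng, rate)):
        i, j = sorted(rng.sample(range(n_genes + 1), 2))
        genome[i:j] = genome[i:j][::-1]  # reverse the chosen segment
    return genome


def count_breakpoints(genome):
    """Adjacent gene pairs that are not neighbours in the ancestral order."""
    return sum(1 for a, b in zip(genome, genome[1:]) if abs(a - b) != 1)


def abc_rejection(observed_stat, n_genes, n_draws=2000, max_rate=20.0,
                  tol=1, seed=0):
    """Keep prior draws whose simulated summary statistic is near the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(0.0, max_rate)  # draw from the (assumed) uniform prior
        stat = count_breakpoints(simulate_genome(rng, n_genes, rate))
        if abs(stat - observed_stat) <= tol:
            accepted.append(rate)
    return accepted
```

The accepted rates form an approximate posterior sample; for example, `abc_rejection(8, 50)` collects the rates whose simulated 50-gene genomes show 7 to 9 breakpoints. Tightening `tol` makes the approximation sharper at the cost of fewer accepted draws.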