24 research outputs found

    Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting

    Full text link
    Phylogenetic networks are necessary to represent the tree of life expanded by edges to represent events such as horizontal gene transfers, hybridizations or gene flow. Not all species follow the paradigm of vertical inheritance of their genetic material. While a great deal of research has flourished into the inference of phylogenetic trees, statistical methods to infer phylogenetic networks are still limited and under development. The main disadvantage of existing methods is a lack of scalability. Here, we present a statistical method to infer phylogenetic networks from multi-locus genetic data in a pseudolikelihood framework. Our model accounts for incomplete lineage sorting through the coalescent model, and for horizontal inheritance of genes through reticulation nodes in the network. Computation of the pseudolikelihood is fast and simple, and it avoids the burdensome calculation of the full likelihood which can be intractable with many species. Moreover, estimation at the quartet-level has the added computational benefit that it is easily parallelizable. Simulation studies comparing our method to a full likelihood approach show that our pseudolikelihood approach is much faster without compromising accuracy. We applied our method to reconstruct the evolutionary relationships among swordtails and platyfishes (Xiphophorus: Poeciliidae), which are characterized by widespread hybridizations.
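
The quartet-level pseudolikelihood described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the tree case (no reticulation), it uses the standard coalescent result that a 4-taxon tree with internal branch length t (in coalescent units) has expected concordance factors 1 - (2/3)e^(-t) for the matching quartet and (1/3)e^(-t) for each of the other two, and the function names and example counts are hypothetical. The multinomial constant is dropped because it does not depend on the parameters.

```python
import math


def quartet_cfs_on_tree(t):
    """Expected concordance factors (major, minor, minor) for a 4-taxon tree.

    t is the internal branch length in coalescent units (assumed t >= 0).
    """
    minor = math.exp(-t) / 3.0
    return (1.0 - 2.0 * minor, minor, minor)


def log_pseudolikelihood(observed_counts, branch_lengths):
    """Sum of per-quartet multinomial log-likelihoods (constant term omitted).

    observed_counts: one (n_major, n_minor1, n_minor2) tuple of gene-tree
    counts per 4-taxon subset; branch_lengths: the matching internal branch
    length for each subset. Both inputs are hypothetical illustrations.
    """
    total = 0.0
    for counts, t in zip(observed_counts, branch_lengths):
        expected = quartet_cfs_on_tree(t)
        total += sum(n * math.log(cf) for n, cf in zip(counts, expected))
    return total


# Two 4-taxon subsets with made-up gene-tree counts and branch lengths.
counts = [(70, 16, 14), (40, 31, 29)]
lengths = [1.0, 0.2]
print(log_pseudolikelihood(counts, lengths))
```

Each quartet contributes an independent term to the sum, which is why the computation splits naturally across workers.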

    Sparse Gaussian chain graphs with the spike-and-slab LASSO: Algorithms and asymptotics

    Full text link
    The Gaussian chain graph model simultaneously parametrizes (i) the direct effects of $p$ predictors on $q$ correlated outcomes and (ii) the residual partial covariance between pairs of outcomes. We introduce a new method for fitting sparse Gaussian chain graph models with spike-and-slab LASSO (SSL) priors. We develop an Expectation-Conditional Maximization algorithm to obtain sparse estimates of the $p \times q$ matrix of direct effects and the $q \times q$ residual precision matrix. Our algorithm iteratively solves a sequence of penalized maximum likelihood problems with self-adaptive penalties that gradually filter out negligible regression coefficients and partial covariances. Because it adaptively penalizes model parameters, our method is seen to outperform fixed-penalty competitors on simulated data. We establish the posterior concentration rate for our model, buttressing our method's excellent empirical performance with strong theoretical guarantees. We use our method to reanalyze a dataset from a study of the effects of diet and residence type on the composition of the gut microbiome of elderly adults.
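
The "self-adaptive penalties" mentioned in this abstract can be illustrated with a small sketch of the usual spike-and-slab LASSO penalty for a single coefficient. This assumes the standard SSL form (a mixture of two Laplace densities, a sharp spike and a diffuse slab), which the abstract does not spell out. The function names and the values of theta, lam_spike and lam_slab are hypothetical, and this is not the authors' ECM algorithm.

```python
import math


def laplace_density(beta, lam):
    """Laplace (double-exponential) density with rate lam, evaluated at beta."""
    return 0.5 * lam * math.exp(-lam * abs(beta))


def slab_probability(beta, theta, lam_spike, lam_slab):
    """Conditional probability that beta was drawn from the slab component.

    theta is the prior slab weight; lam_spike > lam_slab is assumed.
    """
    slab = theta * laplace_density(beta, lam_slab)
    spike = (1.0 - theta) * laplace_density(beta, lam_spike)
    return slab / (slab + spike)


def adaptive_penalty(beta, theta, lam_spike, lam_slab):
    """Coefficient-specific L1 penalty: a mix of the spike and slab rates."""
    p_slab = slab_probability(beta, theta, lam_spike, lam_slab)
    return lam_slab * p_slab + lam_spike * (1.0 - p_slab)


# Small coefficients get close to the spike rate, large ones the slab rate.
for beta in (0.0, 0.1, 0.5, 2.0):
    print(beta, adaptive_penalty(beta, theta=0.5, lam_spike=20.0, lam_slab=1.0))
```

Under these assumptions a coefficient near zero is penalized almost at the spike rate and is pushed further toward zero, while a large coefficient is penalized at roughly the slab rate and is left nearly unshrunk.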

    PhyloNetworks: A package for phylogenetic networks

    No full text
    PhyloNetworks is a Julia package for the inference, manipulation, visualization, and use of phylogenetic networks in an interactive environment. Inference of phylogenetic networks is done with maximum pseudolikelihood from gene trees or multi-locus sequences (SNaQ), with possible bootstrap analysis. PhyloNetworks is the first software providing tools to summarize a set of networks (from a bootstrap or posterior sample) with measures of tree edge support, hybrid edge support, and hybrid node support. Networks can be used for phylogenetic comparative analysis of continuous traits, to estimate ancestral states or do a phylogenetic regression. The software is available in open source and with documentation at https://github.com/crsl4/PhyloNetworks.jl

    Networks with <i>k</i> = 4 nodes in the reticulation cycle and identical unrooted topologies.

    No full text
    <p>They differ in their hybrid position (left: good diamond, right: bad diamond I). If <i>D</i><sub>2</sub> is not sampled (<i>n</i> = 4), only for <i>i</i> = 1, 2 are identifiable and the 2 networks are not distinguishable from each other.</p>

    Data from: Bayesian species delimitation combining multiple genes and traits in a unified framework

    No full text
    Delimitation of species based exclusively on genetic data has been advocated despite a critical knowledge gap: how might such approaches fail because they rely on genetic data alone, and would their accuracy be improved by using multiple data-types? We provide here the requisite framework for addressing these key questions. Because both phenotypic and molecular data can be analyzed in a common Bayesian framework with our program iBPP, we can compare the accuracy of delimited taxa based on genetic data alone versus when integrated with phenotypic data. We can also evaluate how the integration of phenotypic data might improve species delimitation when divergence occurs with gene flow and/or is selectively driven. These two realities of the speciation process are ignored by currently available genetic approaches. Our model accommodates phenotypic characters that exhibit different degrees of divergence, allowing for both neutral traits and traits under selection. We found a greater accuracy of estimated species boundaries with the integration of phenotypic and genetic data, with a strong beneficial influence of phenotypic data from traits under selection when the speciation process involves gene flow. Our results highlight the benefits of multiple data-types, but also draw into question the rationale of species delimitation based exclusively on genetic data.

    Example of a 4-taxon semi-directed network (left), with known direction of both hybrid edges but unspecified position of the root.

    No full text
    <p>The root can be placed on the internal edges with length <i>t</i><sub>2</sub>, <i>t</i><sub>3</sub>, <i>t</i><sub>4</sub>, or on the external edges to C or D. The quartet CFs on this network are weighted averages of CFs under 4 trees with weights as shown (right).</p>
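
The weighted-average statement in this caption can be sketched generically. The actual weights appear only in the figure and are not reproduced here, so the weights and per-tree CF vectors below are hypothetical placeholders, as is the function name.

```python
def network_quartet_cfs(tree_cfs, weights):
    """Average per-tree quartet CF vectors using the given mixture weights.

    tree_cfs: list of (cf1, cf2, cf3) tuples, one per displayed tree.
    weights: matching list of non-negative weights that sum to 1.
    """
    if len(tree_cfs) != len(weights):
        raise ValueError("need exactly one weight per tree")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return tuple(
        sum(w * cfs[i] for w, cfs in zip(weights, tree_cfs))
        for i in range(3)
    )


# Hypothetical values: four trees, each with its own CF vector and weight.
tree_cfs = [
    (0.80, 0.10, 0.10),
    (0.60, 0.20, 0.20),
    (0.20, 0.60, 0.20),
    (0.10, 0.80, 0.10),
]
weights = [0.4, 0.3, 0.2, 0.1]
print(network_quartet_cfs(tree_cfs, weights))
```

Because each per-tree vector sums to 1 and the weights sum to 1, the averaged vector is again a valid set of three concordance factors.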

    Example of rooted and semi-directed phylogenetic networks with <i>h</i> = 2 hybridization events and <i>n</i> = 7 sampled taxa.

    No full text
    <p>Inheritance probabilities <i>γ</i> represent the proportion of genes contributed by each parental population to a given hybrid node. Left: rooted network modelling several biological processes. Taxon F is a hybrid between two non-sampled taxa Y and Z with <i>γ</i><sub>2</sub> ≈ 0.50, and the lineage ancestral to taxa C and D has received genes introgressed from a non-sampled taxon X, for which <i>γ</i><sub>1</sub> ≈ 0.10. An alternative process at this event could be the horizontal transfer of only a handful of genes, corresponding to a very small fraction <i>γ</i><sub>1</sub> ≈ 0.001. Center: semi-directed network for the biological scenario just described. Although the root location is unknown, its position is constrained by the direction of hybrid edges (directed by arrows). For example, C, G or E cannot be outgroups. Right: rooted network obtained from the semi-directed network (center) by placing the root on the hybrid edge that leads to taxon F (labeled by 1 − <i>γ</i><sub>2</sub>).</p>

    Accuracy of SNaQ in simulations using true gene trees or sequence alignments.

    No full text
    <p>Even when the semi-directed topology was not recovered, the unrooted topology was estimated correctly for most replicates using 30 loci or more and <i>h</i> ≤ 2.</p>

    Networks with <i>k</i> nodes in a hybridization cycle: <i>k</i> = 2, 3, 4 and 5 from left to right.

    No full text
    <p>When <i>k</i> = 3, parameters are not identifiable. A good triangle corresponds to <i>n</i><sub>1</sub>, <i>n</i><sub>2</sub>, <i>n</i><sub>3</sub> ≥ 2, in which case setting <i>t</i><sub>12</sub> = 0 makes the other parameters identifiable. When <i>k</i> = 4, parameters are not all identifiable for the bad diamond I (<i>n</i><sub>0</sub> = <i>n</i><sub>2</sub> = <i>n</i><sub>3</sub> = 1 but <i>n</i><sub>1</sub> ≥ 2) and for the bad diamond II (<i>n</i><sub>0</sub> = <i>n</i><sub>1</sub> = <i>n</i><sub>2</sub> = 1 but <i>n</i><sub>3</sub> ≥ 2).</p>
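
The conditions listed in this caption can be written as a small lookup, shown here as a sketch. It encodes only the cases the caption names; the function name, the dictionary layout (cycle node index mapped to its taxon count n_i) and the returned labels are hypothetical, and cycles with other values of k are reported as not covered rather than guessed.

```python
def classify_cycle(k, n):
    """Label a hybridization cycle using the cases named in the caption.

    k: number of nodes in the cycle.
    n: dict mapping a cycle node index i to its taxon count n_i
       (indices 1..3 for k = 3, indices 0..3 for k = 4).
    """
    if k == 3:
        if all(n[i] >= 2 for i in (1, 2, 3)):
            # Other parameters become identifiable once t12 is set to 0.
            return "good triangle"
        return "triangle, parameters not identifiable"
    if k == 4:
        if n[0] == n[2] == n[3] == 1 and n[1] >= 2:
            return "bad diamond I"
        if n[0] == n[1] == n[2] == 1 and n[3] >= 2:
            return "bad diamond II"
        return "diamond, not flagged as bad by the caption"
    return "not covered by the caption"


print(classify_cycle(3, {1: 2, 2: 3, 3: 2}))
print(classify_cycle(4, {0: 1, 1: 2, 2: 1, 3: 1}))
print(classify_cycle(4, {0: 1, 1: 1, 2: 1, 3: 4}))
```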