Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting
Phylogenetic networks are necessary to represent the tree of life expanded by
edges for events such as horizontal gene transfer, hybridization or gene flow:
not all species follow the paradigm of vertical inheritance of their genetic
material. While research on the inference of phylogenetic trees has flourished,
statistical methods to infer phylogenetic networks are still limited and under
development. The main disadvantage of existing methods is a lack of
scalability. Here, we present a statistical method to infer phylogenetic
networks from multi-locus genetic data in a pseudolikelihood framework. Our
model accounts for incomplete lineage sorting through the coalescent model, and
for horizontal inheritance of genes through reticulation nodes in the network.
Computation of the pseudolikelihood is fast and simple, and it avoids the
burdensome calculation of the full likelihood, which can be intractable with
many species. Moreover, estimation at the quartet level has the added
computational benefit that it is easily parallelizable. Simulation studies
comparing our method to a full likelihood approach show that our
pseudolikelihood approach is much faster without compromising accuracy. We
applied our method to reconstruct the evolutionary relationships among
swordtails and platyfishes (family Poeciliidae), a group characterized by
widespread hybridization.
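Because the pseudolikelihood is built from 4-taxon sets, it is a sum of independent quartet terms. The Python sketch below assumes (our assumption, not necessarily SNaQ's exact objective) that the counts of the three gene-tree resolutions of each quartet are multinomial given the expected concordance factors (CFs) under the network, and drops the multinomial constant. The function name and the counts are hypothetical.

```python
import math


def log_pseudolikelihood(observed_counts, expected_cfs):
    """Quartet-based composite log-likelihood (multinomial constant dropped).

    observed_counts: one (n1, n2, n3) tuple per 4-taxon set, counting gene
        trees that show each of the three quartet resolutions.
    expected_cfs: one (cf1, cf2, cf3) tuple per 4-taxon set, the expected
        concordance factors under a candidate network.
    """
    if len(observed_counts) != len(expected_cfs):
        raise ValueError("need one expected CF triple per quartet")
    total = 0.0
    for counts, cfs in zip(observed_counts, expected_cfs):
        # Each quartet term is computed on its own, so this loop could be
        # split across workers without changing the result.
        for n, cf in zip(counts, cfs):
            if n > 0:
                total += n * math.log(cf)
    return total


observed = [(70, 15, 15), (40, 30, 30)]          # hypothetical gene-tree counts
fitted = [(0.7, 0.15, 0.15), (0.4, 0.3, 0.3)]    # CFs matching the observed proportions
flat = [(1 / 3, 1 / 3, 1 / 3)] * 2               # CFs of a star-like quartet
print(log_pseudolikelihood(observed, fitted) > log_pseudolikelihood(observed, flat))
```

Summing per-quartet terms is what keeps each evaluation cheap: no joint likelihood over all taxa is ever formed.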
Bayesian species delimitation combining multiple genes and traits in a unified framework
Peer Reviewed: http://deepblue.lib.umich.edu/bitstream/2027.42/110547/1/evo12582.pd
Sparse Gaussian chain graphs with the spike-and-slab LASSO: Algorithms and asymptotics
The Gaussian chain graph model simultaneously parametrizes (i) the direct
effects of predictors on correlated outcomes and (ii) the residual
partial covariance between pairs of outcomes. We introduce a new method for
fitting sparse Gaussian chain graph models with spike-and-slab LASSO (SSL)
priors. We develop an Expectation-Conditional Maximization algorithm to obtain
sparse estimates of the matrix of direct effects and the residual precision
matrix. Our algorithm iteratively solves a sequence of penalized maximum
likelihood problems with self-adaptive penalties that gradually filter out
negligible regression coefficients and partial covariances. Because it
adaptively penalizes model parameters, our method outperforms fixed-penalty
competitors on simulated data. We establish the posterior concentration rate
for our model, complementing our method's strong empirical performance with
theoretical guarantees. We use our method to reanalyze a dataset from a study
of the effects of diet and residence type on the composition of the gut
microbiome of elderly adults.
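The idea of a self-adaptive penalty can be illustrated with the spike-and-slab LASSO penalty in its commonly stated form: the mixture prior θ·Laplace(λ₁) + (1 − θ)·Laplace(λ₀), with λ₀ much larger than λ₁, induces a coefficient-specific penalty λ*(β) = p*(β)·λ₁ + (1 − p*(β))·λ₀, where p*(β) is the conditional probability that β came from the slab. The Python sketch below is only a toy single-coefficient fixed-point iteration with soft thresholding, not the paper's ECM algorithm for the chain graph; the function names, the parameter values and the `scale` factor are hypothetical.

```python
import math


def ssl_inclusion_prob(beta, lam0, lam1, theta):
    """p*(beta): probability that beta comes from the slab Laplace(lam1)
    rather than the spike Laplace(lam0), assuming lam0 > lam1."""
    # Ratio form of theta*psi1 / (theta*psi1 + (1-theta)*psi0); the exponent
    # is never positive, so it cannot overflow and avoids 0/0 underflow.
    odds_spike = ((1 - theta) / theta) * (lam0 / lam1) * math.exp(-(lam0 - lam1) * abs(beta))
    return 1.0 / (1.0 + odds_spike)


def ssl_penalty(beta, lam0, lam1, theta):
    """Adaptive penalty: close to lam0 for small |beta|, close to lam1 for large |beta|."""
    p = ssl_inclusion_prob(beta, lam0, lam1, theta)
    return p * lam1 + (1 - p) * lam0


def soft_threshold(z, t):
    """Lasso-style shrinkage of z toward zero by t."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def adaptive_shrink(z, lam0=20.0, lam1=1.0, theta=0.5, scale=0.1, n_iter=50):
    """Toy iteration beta <- S(z, scale * lambda*(beta)) for one coefficient.

    'scale' is a hypothetical stand-in for the noise-variance/sample-size
    factor that would appear in a real penalized likelihood update.
    """
    beta = z
    for _ in range(n_iter):
        beta = soft_threshold(z, scale * ssl_penalty(beta, lam0, lam1, theta))
    return beta


print(adaptive_shrink(0.1), adaptive_shrink(2.0))
```

A small signal (z = 0.1) meets the large spike penalty and is set exactly to zero, while a large one (z = 2.0) is shrunk only by the small slab penalty, ending near 1.9. This is the sense in which the penalties "filter out" negligible coefficients without heavily biasing large ones.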
PhyloNetworks: A package for phylogenetic networks
PhyloNetworks is a Julia package for the inference, manipulation, visualization, and use of phylogenetic networks in an interactive environment. Phylogenetic networks are inferred by maximum pseudolikelihood from gene trees or multi-locus sequences (SNaQ), with optional bootstrap analysis. PhyloNetworks is the first software providing tools to summarize a set of networks (from a bootstrap or posterior sample) with measures of tree edge support, hybrid edge support, and hybrid node support. Networks can be used for phylogenetic comparative analysis of continuous traits, to estimate ancestral states or to perform phylogenetic regression. The software is open source and documented at https://github.com/crsl4/PhyloNetworks.jl
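The simplest of these summaries, tree edge support, is the fraction of sampled networks that contain a given bipartition of the taxa. The Python sketch below (not PhyloNetworks code; function names and data are hypothetical) covers only tree-like splits; hybrid edge and hybrid node support need more bookkeeping and are not shown. Each split is oriented to the side that excludes a fixed reference taxon, so that {A,B | C,D,E} and {C,D,E | A,B} count as the same edge.

```python
from collections import Counter


def canonical_split(side, taxa):
    """Orient a bipartition so the returned side excludes a fixed reference taxon."""
    side = frozenset(side)
    ref = min(taxa)  # any fixed taxon works as the reference
    return frozenset(taxa) - side if ref in side else side


def split_support(replicates, taxa):
    """Fraction of replicates (e.g. bootstrap trees) containing each split.

    replicates: list of replicates, each a list of taxon sets giving one
        side of every tree edge in that replicate.
    """
    counts = Counter()
    for splits in replicates:
        # A set comprehension so a split listed twice in one replicate counts once.
        counts.update({canonical_split(s, taxa) for s in splits})
    return {split: c / len(replicates) for split, c in counts.items()}


taxa = {"A", "B", "C", "D", "E"}
replicates = [
    [{"A", "B"}, {"A", "B", "C"}],
    [{"A", "B"}, {"A", "B", "D"}],
    [{"A", "C"}, {"A", "B", "C"}],
]
for split, support in split_support(replicates, taxa).items():
    print(sorted(split), round(support, 3))
```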
Networks with k = 4 nodes in the reticulation cycle and identical unrooted topologies.
They differ in their hybrid position (left: good diamond, right: bad diamond I). If D₂ is not sampled (n = 4), only for i = 1, 2 are identifiable and the two networks are not distinguishable from each other.
Example of a 4-taxon semi-directed network (left), with known direction of both hybrid edges but unspecified position of the root.
The root can be placed on the internal edges with lengths t₂, t₃ and t₄, or on the external edges to C or D. The quartet concordance factors (CFs) on this network are weighted averages of the CFs under 4 trees, with weights as shown (right).
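On a species tree under the coalescent, a quartet whose internal branch has length t (in coalescent units) has expected CF 1 − (2/3)e^(−t) for the resolution matching the tree and (1/3)e^(−t) for each of the other two. The Python sketch below combines such tree CFs into network CFs as a weighted average, as the caption describes; the two trees, their weights and branch lengths are hypothetical and are not the four trees or weights of the figure.

```python
import math


def tree_quartet_cfs(major, t):
    """Expected CFs of the three resolutions of a 4-taxon set on a species tree.

    major: index (0, 1 or 2) of the resolution displayed by the tree.
    t: internal branch length of the quartet, in coalescent units.
    """
    minor = math.exp(-t) / 3.0
    cfs = [minor, minor, minor]
    cfs[major] = 1.0 - 2.0 * minor
    return cfs


def network_quartet_cfs(trees):
    """Weighted average of tree CFs.

    trees: list of (weight, major, t) tuples; weights are assumed to sum to 1.
    """
    total = [0.0, 0.0, 0.0]
    for weight, major, t in trees:
        for j, cf in enumerate(tree_quartet_cfs(major, t)):
            total[j] += weight * cf
    return total


# Two hypothetical trees displaying different resolutions, weights 0.7 and 0.3.
print(network_quartet_cfs([(0.7, 0, 1.0), (0.3, 1, 0.5)]))  # ≈ [0.589, 0.265, 0.146]
```

The mixture raises the CF of the second resolution well above that of the third, an asymmetry that a single tree (whose two minor CFs are always equal) cannot produce.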
Example of rooted and semi-directed phylogenetic networks with h = 2 hybridization events and n = 7 sampled taxa.
Inheritance probabilities γ represent the proportion of genes contributed by each parental population to a given hybrid node. Left: rooted network modelling several biological processes. Taxon F is a hybrid between two non-sampled taxa Y and Z with γ₂ ≈ 0.50, and the lineage ancestral to taxa C and D has received genes introgressed from a non-sampled taxon X, for which γ₁ ≈ 0.10. An alternative process at this event could be the horizontal transfer of only a handful of genes, corresponding to a very small fraction γ₁ ≈ 0.001. Center: semi-directed network for the biological scenario just described. Although the root location is unknown, its position is constrained by the direction of hybrid edges (directed by arrows). For example, C, G or E cannot be outgroups. Right: rooted network obtained from the semi-directed network (center) by placing the root on the hybrid edge that leads to taxon F (labeled by 1 − γ₂).
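One way to read γ as "the proportion of genes" is to ignore incomplete lineage sorting: each gene then follows one displayed tree, obtained by picking one parent at every hybrid node, with probability γ for one hybrid edge and 1 − γ for the other. The Python sketch below assumes those choices are independent across hybrid nodes and enumerates all 2^h combinations (distinct combinations may yield the same displayed tree). The function name is hypothetical; with the caption's approximate values γ₁ ≈ 0.10 and γ₂ ≈ 0.50 it gives weights 0.45, 0.45, 0.05 and 0.05.

```python
from itertools import product
from math import prod


def displayed_tree_weights(gammas):
    """Weight of each combination of parent choices at the hybrid nodes.

    gammas[i] is the inheritance probability of the minor hybrid edge at
    hybrid node i. In a choice tuple, 1 means the minor edge (probability
    gamma) and 0 the major edge (probability 1 - gamma). Assumes no
    incomplete lineage sorting and independent choices across hybrid nodes.
    """
    weights = {}
    for choice in product((0, 1), repeat=len(gammas)):
        weights[choice] = prod(g if c else 1.0 - g for g, c in zip(gammas, choice))
    return weights


# Approximate values from the caption: gamma_1 ≈ 0.10, gamma_2 ≈ 0.50.
print(displayed_tree_weights([0.10, 0.50]))
```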
Data from: Bayesian species delimitation combining multiple genes and traits in a unified framework
Delimitation of species based exclusively on genetic data has been advocated despite a critical knowledge gap: how might such approaches fail because they rely on genetic data alone, and would their accuracy be improved by using multiple data types? We provide here the requisite framework for addressing these key questions. Because both phenotypic and molecular data can be analyzed in a common Bayesian framework with our program iBPP, we can compare the accuracy of delimited taxa based on genetic data alone versus when integrated with phenotypic data. We can also evaluate how the integration of phenotypic data might improve species delimitation when divergence occurs with gene flow and/or is selectively driven. These two realities of the speciation process are ignored by currently available genetic approaches. Our model accommodates phenotypic characters that exhibit different degrees of divergence, allowing for both neutral traits and traits under selection. We found greater accuracy of estimated species boundaries with the integration of phenotypic and genetic data, with a strong beneficial influence of phenotypic data from traits under selection when the speciation process involves gene flow. Our results highlight the benefits of multiple data types, but also call into question the rationale of species delimitation based exclusively on genetic data.
Perl script to simulate data and analyze them with iBPP
This Perl script can be used to reproduce all simulations in the article "Bayesian species delimitation combining multiple genes and traits in a unified framework".
Performance (average computing time per replicate) of SNaQ and PhyloNet.
Simulations used true gene trees on networks with n = 6, 10 or 15 taxa and h = 1, 2 or 3. Each replicate consisted of 10 independent runs with full optimization of branch lengths and inheritance probabilities for each run. Pie charts display accuracy (black: probability of recovering the true network). With n = 10 and 300 or more loci, or with n = 15, PhyloNet was too slow to run.