Phase transition in the sample complexity of likelihood-based phylogeny inference
Reconstructing evolutionary trees from molecular sequence data is a
fundamental problem in computational biology. Stochastic models of sequence
evolution are closely related to spin systems that have been extensively
studied in statistical physics, and that connection has led to important
insights into the theoretical properties of phylogenetic reconstruction
algorithms as well as to the development of new inference methods. Here, we study
maximum likelihood, a classical statistical technique that is perhaps the most
widely used method in phylogenetic practice because of its superior empirical
accuracy.
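For a fixed tree, the likelihood that maximum likelihood maximizes is commonly evaluated with Felsenstein's pruning algorithm, a standard textbook technique and not the method of this paper. The pure-Python sketch below assumes a symmetric model on r states in which each edge carries a parameter theta with transition matrix theta*I + (1 - theta)/r * J (J the all-ones matrix), a uniform root distribution, and independent sites. The tree encoding and the function names `transition`, `partial_likelihood` and `log_likelihood` are hypothetical, and the search over topologies and edge parameters that an actual maximum likelihood method performs is omitted.

```python
import math

def transition(theta, r):
    # Assumed symmetric-model parametrization: P[a][b] = theta*[a == b] + (1 - theta)/r.
    return [[theta * (a == b) + (1 - theta) / r for b in range(r)] for a in range(r)]

def partial_likelihood(node, site, tree, leaves, r):
    # Returns L[a] = Pr(observed states at `site` below `node` | `node` is in state a).
    if node in leaves:
        observed = leaves[node][site]
        return [1.0 if a == observed else 0.0 for a in range(r)]
    L = [1.0] * r
    for child, theta in tree[node]:
        P = transition(theta, r)
        child_L = partial_likelihood(child, site, tree, leaves, r)
        for a in range(r):
            # Sum over the unobserved state b of the child.
            L[a] *= sum(P[a][b] * child_L[b] for b in range(r))
    return L

def log_likelihood(root, tree, leaves, r):
    # Sites are assumed i.i.d., so the log-likelihood is a sum over sites.
    n_sites = len(next(iter(leaves.values())))
    total = 0.0
    for site in range(n_sites):
        L = partial_likelihood(root, site, tree, leaves, r)
        total += math.log(sum(L) / r)  # uniform distribution at the root
    return total

# Hypothetical example: a root with two leaves, theta = 0.5 on both edges, r = 2.
tree = {"root": [("A", 0.5), ("B", 0.5)]}
leaves = {"A": [0], "B": [0]}
print(math.exp(log_likelihood("root", tree, leaves, r=2)))
```

In this two-leaf example the single site has likelihood (1/2)(0.75^2 + 0.25^2) = 0.3125, up to floating-point rounding. A maximum likelihood method would wrap `log_likelihood` in an optimization over trees and edge parameters.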
At the theoretical level, except for its consistency, that is, the guarantee
of eventual correct reconstruction as the size of the input data grows, much
remains to be understood about the statistical properties of maximum likelihood
in this context. In particular, the best bounds on the sample complexity or
sequence-length requirement of maximum likelihood, that is, the amount of data
required for correct reconstruction, are exponential in the number n of
tips, far from the known lower bounds based on information-theoretic arguments.
Here we close the gap by proving a new upper bound on the sequence-length
requirement of maximum likelihood that matches up to constants the known lower
bound for some standard models of evolution.
More specifically, for the r-state symmetric model of sequence evolution on
a binary phylogeny with bounded edge lengths, we show that the sequence-length
requirement grows logarithmically in the number of tips n when the expected amount of mutation
per edge is below what is known as the Kesten-Stigum threshold. In general, the
sequence-length requirement is polynomial in n. Our results moreover imply
that the maximum likelihood estimator can be computed efficiently on randomly
generated data, provided the sequences are as long as stated above.
Comment: To appear in Probability Theory and Related Fields
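The Kesten-Stigum threshold mentioned above can be made concrete for a symmetric model on r states. Assuming the edge transition matrix has the form theta*I + (1 - theta)/r * J (an assumed parametrization in which smaller theta means more mutation), its eigenvalues are 1 and theta, the latter with multiplicity r - 1, because J sends every vector with zero entry sum to 0. On a binary tree the Kesten-Stigum condition then reads 2*theta^2 > 1, that is, theta > 1/sqrt(2). The pure-Python sketch below checks the eigenvalue claim on one vector and evaluates the condition; all function names are hypothetical.

```python
import math

def symmetric_transition(theta, r):
    # Assumed parametrization: keep the state with weight theta, otherwise resample uniformly.
    return [[theta * (a == b) + (1 - theta) / r for b in range(r)] for a in range(r)]

def mat_vec(P, v):
    # Plain matrix-vector product.
    return [sum(P[a][b] * v[b] for b in range(len(v))) for a in range(len(P))]

def ks_threshold(branching=2):
    # Kesten-Stigum: branching * theta^2 > 1, i.e. theta > 1/sqrt(branching).
    return 1.0 / math.sqrt(branching)

def satisfies_ks_condition(theta, branching=2):
    # True when theta is large enough (mutation per edge small enough) for the KS condition.
    return branching * theta ** 2 > 1

theta, r = 0.8, 4
P = symmetric_transition(theta, r)
v = [1.0, -1.0, 0.0, 0.0]  # entries sum to zero, so J v = 0 and P v = theta * v
print(mat_vec(P, v), ks_threshold(), satisfies_ks_condition(theta))
```

With theta = 0.8 the condition holds (2 * 0.64 = 1.28 > 1), while theta = 0.6 gives 0.72 and falls on the other side of the threshold.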
Turán numbers of Multiple Paths and Equibipartite Trees
The Turán number of a graph H, ex(n; H), is the maximum number of edges in
any graph on n vertices that does not contain H as a subgraph. Let P_l denote
a path on l vertices, and kP_l denote k vertex-disjoint copies of P_l. We
determine ex(n; kP_3) for sufficiently large n, confirming a
conjecture of Gorgol. Further, we determine ex(n; kP_l) for arbitrary l and
n sufficiently large relative to k and l. We provide some background on the
famous Erdős-Sós conjecture and, conditional on its truth, we determine
ex(n; H) when H is an equibipartite forest, for sufficiently large n.
Comment: 17 pages, 13 figures; Updated to incorporate referee's suggestions;
minor structural change
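The definition of ex(n; kP_l) can be checked directly for very small n by exhaustive search over all graphs on n labelled vertices. The pure-Python sketch below does this with backtracking to test for k vertex-disjoint paths on l vertices each. It is an illustration only, not the method of the paper: its running time is exponential in n(n - 1)/2, so it says nothing about the regime of sufficiently large n that the results address. The function names `paths_on`, `contains_k_paths` and `ex_kPl` are hypothetical.

```python
from itertools import combinations

def paths_on(adj, l, avail):
    # Yield each path on l vertices (as a vertex tuple) that uses only vertices in `avail`.
    def extend(path):
        if len(path) == l:
            yield tuple(path)
            return
        for w in adj[path[-1]]:
            if w in avail and w not in path:
                path.append(w)
                yield from extend(path)
                path.pop()
    for v in avail:
        yield from extend([v])

def contains_k_paths(adj, k, l, avail):
    # Backtracking: choose one path, remove its vertices, then look for k - 1 more.
    if k == 0:
        return True
    return any(contains_k_paths(adj, k - 1, l, avail - set(p))
               for p in paths_on(adj, l, avail))

def ex_kPl(n, k, l):
    # Brute-force Turan number: most edges in a graph on n vertices with no copy of kP_l.
    pairs = list(combinations(range(n), 2))
    best = 0
    for mask in range(1 << len(pairs)):
        edges = bin(mask).count("1")
        if edges <= best:
            continue  # this graph cannot beat the current maximum
        adj = {v: set() for v in range(n)}
        for i, (u, v) in enumerate(pairs):
            if mask >> i & 1:
                adj[u].add(v)
                adj[v].add(u)
        if not contains_k_paths(adj, k, l, set(range(n))):
            best = edges
    return best

print(ex_kPl(5, 2, 2))
```

For example, `ex_kPl(5, 2, 2)` should return 4: a graph with no two disjoint edges is a star or a triangle, and the star K_{1,4} has 4 edges. Similarly, a P_3-free graph is a matching, so `ex_kPl(n, 1, 3)` equals floor(n/2).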
Cohen-Macaulay Properties of Square-Free Monomial Ideals
In this paper we study simplicial complexes as higher-dimensional graphs in
order to produce algebraic statements about their facet ideals. We introduce a
large class of square-free monomial ideals with Cohen-Macaulay quotients, and a
criterion for the Cohen-Macaulayness of facet ideals of simplicial trees. Along
the way, we generalize several concepts from graph theory to simplicial
complexes.
Comment: 28 pages, 17 figures
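The facet ideal of a simplicial complex with facets F_1, ..., F_q is generated by the square-free monomials x_{F_i}, each the product of the variables indexed by the vertices of F_i. The pure-Python sketch below lists those generators and tests, by brute force over subcollections of facets, whether a small complex is a simplicial tree. It uses the definition of a simplicial tree as we understand it (an assumption to check against the paper): a facet F is a leaf if it is the only facet, or if some other facet G satisfies F ∩ H ⊆ G for every facet H ≠ F; a connected complex is a tree if every nonempty subcollection of its facets has a leaf. All function names are hypothetical.

```python
from itertools import combinations

def facet_ideal_generators(facets):
    # One square-free monomial x_F = product of x_v over v in F for each facet F.
    return ["*".join(f"x{v}" for v in sorted(F)) for F in facets]

def is_leaf(F, facets):
    # Assumed leaf definition: F is alone, or some other facet G contains F & H for all H != F.
    others = [H for H in facets if H != F]
    if not others:
        return True
    return any(all((F & H) <= G for H in others) for G in others)

def is_connected(facets):
    # Facets are linked when they share a vertex; depth-first search over facets.
    if not facets:
        return True
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j, H in enumerate(facets):
            if j not in seen and facets[i] & H:
                seen.add(j)
                stack.append(j)
    return len(seen) == len(facets)

def is_simplicial_tree(facets):
    # Brute force: connected, and every nonempty subcollection of facets has a leaf.
    facets = [frozenset(F) for F in facets]
    if not is_connected(facets):
        return False
    for size in range(1, len(facets) + 1):
        for sub in combinations(facets, size):
            if not any(is_leaf(F, list(sub)) for F in sub):
                return False
    return True

print(facet_ideal_generators([{1, 2, 3}, {2, 4}]), is_simplicial_tree([{1, 2, 3}, {2, 4}]))
```

For graphs (all facets of size 2) this reduces to the usual notion: a path passes the test, while the triangle with edges {1,2}, {2,3}, {1,3} fails because no edge of it is a leaf.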
Odd-Cycle-Free Facet Complexes and the König property
We use the definition of a simplicial cycle to define an odd-cycle-free facet
complex (hypergraph). These are facet complexes that do not contain any cycles
of odd length. We show that, with the exception of one class, all such facet
complexes satisfy the König property. This new family of complexes includes the
family of balanced hypergraphs, which are known to satisfy the König property.
These facet complexes are, however, not Mengerian; we give an example to
demonstrate this fact.
Comment: 12 pages, 11 figures
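A hypergraph has the König property when its matching number (the maximum number of pairwise disjoint edges) equals its covering number (the minimum number of vertices meeting every edge). For small examples both quantities can be computed by exhaustive search, as in the pure-Python sketch below; it runs in exponential time and is meant only to illustrate the definition. The function names are hypothetical.

```python
from itertools import combinations

def matching_number(edges):
    # Largest number of pairwise disjoint edges, found by trying the largest sizes first.
    edges = [frozenset(e) for e in edges]
    for k in range(len(edges), 0, -1):
        for chosen in combinations(edges, k):
            if all(not (a & b) for a, b in combinations(chosen, 2)):
                return k
    return 0

def cover_number(edges):
    # Smallest set of vertices that meets every edge, found by trying the smallest sizes first.
    edges = [frozenset(e) for e in edges]
    vertices = sorted(set().union(*edges))
    for k in range(len(vertices) + 1):
        for cover in combinations(vertices, k):
            c = set(cover)
            if all(e & c for e in edges):
                return k
    return len(vertices)

def has_konig_property(edges):
    return matching_number(edges) == cover_number(edges)

triangle = [{1, 2}, {2, 3}, {1, 3}]
square = [{1, 2}, {2, 3}, {3, 4}, {4, 1}]
print(has_konig_property(triangle), has_konig_property(square))
```

The triangle, an odd cycle, has matching number 1 and covering number 2, so it fails the property; the 4-cycle has both numbers equal to 2 and satisfies it.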