On the variational distance of two trees
A widely studied model for generating sequences is to ``evolve'' them on a
tree according to a symmetric Markov process. We prove that model trees tend to
be maximally ``far apart'' in terms of variational distance.
Comment: Published at http://dx.doi.org/10.1214/105051606000000196 in the
Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute
of Mathematical Statistics (http://www.imstat.org
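As a small illustration of the quantity in the abstract above, the sketch below computes the variational distance between two discrete distributions, assuming the standard definition (half the sum of absolute differences of probabilities). The function name and the dictionary representation are choices made here for illustration and do not come from the paper.

```python
def variational_distance(p, q):
    """Total variation distance between two discrete distributions.

    p and q map outcomes to probabilities; missing outcomes count as 0.
    Assumes the standard definition: 0.5 * sum over x of |p(x) - q(x)|.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)


if __name__ == "__main__":
    # Two hypothetical distributions over site patterns.
    p = {"AA": 0.5, "AC": 0.5}
    q = {"AA": 0.25, "AC": 0.75}
    print(variational_distance(p, q))
```

The distance is 0 for identical distributions and reaches its maximum of 1 when the two distributions have disjoint supports, which is the sense in which two distributions can be maximally far apart.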
A new approach to nonrepetitive sequences
A sequence is nonrepetitive if it does not contain two adjacent identical
blocks. The remarkable construction of Thue asserts that 3 symbols are enough
to build an arbitrarily long nonrepetitive sequence. It is still not settled
whether the following extension holds: for every sequence of 3-element sets
$L_1,\ldots,L_n$ there exists a nonrepetitive sequence $s_1,\ldots,s_n$ with
$s_i\in L_i$. Applying the probabilistic method one can prove that this is true
for sufficiently large sets $L_i$. We present an elementary proof that sets of
size 4 suffice (confirming the best known bound). The argument is a simple
counting with Catalan numbers involved. Our approach is inspired by a new
algorithmic proof of the Lov\'{a}sz Local Lemma due to Moser and Tardos and its
interpretations by Fortnow and Tao. The presented method has further
applications to nonrepetitive games and nonrepetitive colorings of graphs.
Comment: 5 pages, no figures. arXiv admin note: substantial text overlap with
arXiv:1103.381
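To make the definitions in the abstract above concrete, the following sketch checks whether a sequence is nonrepetitive and searches for a nonrepetitive sequence that takes its i-th symbol from the i-th list. The search is plain backtracking, chosen here for simplicity; it is not the counting argument or the algorithm of the paper, and the function names are illustrative assumptions.

```python
def has_repetition(seq):
    """Return True if seq contains two adjacent identical blocks."""
    n = len(seq)
    for start in range(n):
        for length in range(1, (n - start) // 2 + 1):
            first = seq[start:start + length]
            second = seq[start + length:start + 2 * length]
            if first == second:
                return True
    return False


def choose_from_lists(lists):
    """Backtracking search for a nonrepetitive sequence s with s[i] in lists[i].

    Returns the sequence as a list, or None if no such sequence exists.
    """
    chosen = []

    def ends_with_repetition():
        # Only repetitions ending at the last symbol can be new.
        n = len(chosen)
        return any(
            chosen[n - 2 * k:n - k] == chosen[n - k:]
            for k in range(1, n // 2 + 1)
        )

    def extend(i):
        if i == len(lists):
            return True
        for symbol in sorted(lists[i]):
            chosen.append(symbol)
            if not ends_with_repetition() and extend(i + 1):
                return True
            chosen.pop()
        return False

    return chosen if extend(0) else None


if __name__ == "__main__":
    # Hypothetical 4-element lists, one per position.
    lists = [{0, 1, 2, 3}, {1, 2, 3, 4}, {0, 2, 4, 5}] * 4
    print(choose_from_lists(lists))
```

Because a prefix of a nonrepetitive sequence is itself nonrepetitive, it is enough to test only the repetitions that end at the newest symbol after each extension.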
Circular Networks from Distorted Metrics
Trees have long been used as a graphical representation of species
relationships. However, complex evolutionary events, such as genetic
reassortments or hybrid speciations, which occur commonly in viruses, bacteria
and plants, do not fit into this elementary framework. Alternatively, various
network representations have been developed. Circular networks are a natural
generalization of leaf-labeled trees interpreted as split systems, that is,
collections of bipartitions over leaf labels corresponding to current species.
Although such networks do not explicitly model specific evolutionary events of
interest, their straightforward visualization and fast reconstruction have made
them a popular exploratory tool to detect network-like evolution in genetic
datasets.
Standard reconstruction methods for circular networks, such as Neighbor-Net,
rely on an associated metric on the species set. Such a metric is first
estimated from DNA sequences, which leads to a key difficulty: distantly
related sequences produce statistically unreliable estimates. This is
problematic for Neighbor-Net as it is based on the popular tree reconstruction
method Neighbor-Joining, whose sensitivity to distance estimation errors is
well established theoretically. In the tree case, more robust reconstruction
methods have been developed using the notion of a distorted metric, which
captures the dependence of the error in the distance through a radius of
accuracy. Here we design the first circular network reconstruction method based
on distorted metrics. Our method is computationally efficient. Moreover, the
analysis of its radius of accuracy highlights the important role played by the
maximum incompatibility, a measure of the extent to which the network differs
from a tree.
Comment: Submitte
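The difficulty with distantly related sequences can be illustrated with a simple distance estimator. The sketch below uses the Jukes-Cantor correction, which is an assumption made here for illustration (the abstract does not name a substitution model), and the function name is hypothetical. As the fraction of mismatched sites approaches 3/4, the estimate grows without bound, so small sampling fluctuations translate into large errors.

```python
import math


def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor distance estimate between two aligned DNA sequences.

    Uses d = -(3/4) * ln(1 - (4/3) * p), where p is the fraction of
    mismatched sites. Returns math.inf when p >= 3/4, where the
    formula is undefined (the sequences look saturated).
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    p = mismatches / len(seq_a)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # The same increase in p costs more distance the further apart
    # the sequences already are.
    print(jc69_distance("AAAA", "AAAC"))
    print(jc69_distance("AAAA", "AACC"))
```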
Alignment-free phylogenetic reconstruction: Sample complexity via a branching process analysis
We present an efficient phylogenetic reconstruction algorithm allowing
insertions and deletions which provably achieves a sequence-length requirement
(or sample complexity) growing polynomially in the number of taxa. Our
algorithm is distance-based, that is, it relies on pairwise sequence
comparisons. More importantly, our approach largely bypasses the difficult
problem of multiple sequence alignment.
Comment: Published at http://dx.doi.org/10.1214/12-AAP852 in the Annals of
Applied Probability (http://www.imstat.org/aap/) by the Institute of
Mathematical Statistics (http://www.imstat.org