On Two Measures of Distance Between Fully-Labelled Trees
The last decade brought a significant increase in the amount of data and a variety of new inference methods for reconstructing the detailed evolutionary history of various cancers. This creates a need for efficient procedures for comparing rooted trees that represent the evolution of mutations in tumor phylogenies. Motivated by this need, Bernardini et al. [CPM 2019] recently introduced the notion of rearrangement distance for fully-labelled trees. This notion originates from two operations: one that permutes the labels of the nodes, and another that affects the topology of the tree. Each operation alone defines a distance that can be computed in polynomial time, while the actual rearrangement distance, which combines the two, was proven to be NP-hard to compute.
We answer two open questions left unanswered by the previous work. First, what is the complexity of computing the permutation distance? Second, is there a constant-factor approximation algorithm for estimating the rearrangement distance between two arbitrary trees? We answer the first one by showing, via a two-way reduction, that calculating the permutation distance between two trees on n nodes is equivalent, up to polylogarithmic factors, to finding the largest cardinality matching in a sparse bipartite graph. In particular, by plugging in the algorithm of Liu and Sidford [ArXiv 2020], we obtain an Õ(n^{4/3+o(1)}) time algorithm for computing the permutation distance between two trees on n nodes. Then we answer the second question positively, and design a linear-time constant-factor approximation algorithm that does not need any assumption on the trees.
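The matching primitive that the abstract above reduces to can be illustrated with a minimal sketch. This is not the two-way reduction from the paper, nor the Liu and Sidford algorithm: it is the textbook augmenting-path method (often called Kuhn's algorithm), which runs in O(V·E) time. The function name `max_bipartite_matching` and the adjacency-dictionary input format are assumptions made for illustration only.

```python
from typing import Dict, List, Set


def max_bipartite_matching(adj: Dict[int, List[int]]) -> Dict[int, int]:
    """Return a maximum-cardinality matching as {right_vertex: left_vertex}.

    `adj` maps each left vertex to the right vertices it is adjacent to.
    This is the simple augmenting-path method, not a fast sparse-graph
    algorithm; it is meant only to show what is being computed.
    """
    match_right: Dict[int, int] = {}  # right vertex -> matched left vertex

    def try_augment(u: int, seen: Set[int]) -> bool:
        # Try to match left vertex u, re-routing earlier matches if needed.
        for v in adj.get(u, []):
            if v in seen:
                continue
            seen.add(v)
            # v is free, or its current partner can be moved elsewhere.
            if v not in match_right or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    for u in adj:
        try_augment(u, set())
    return match_right


if __name__ == "__main__":
    # Hypothetical toy graph: left vertices 0..2, right vertices 0..2.
    graph = {0: [0, 1], 1: [0], 2: [1, 2]}
    matching = max_bipartite_matching(graph)
    print(len(matching), matching)
```

The size of the returned dictionary is the cardinality of the matching. For large sparse inputs, a faster method such as Hopcroft-Karp or the algorithm cited in the abstract would replace this routine.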
Ordered increasing k-trees: Introduction and analysis of a preferential attachment network model
We introduce a random graph model based on k-trees, which can be generated by
applying a probabilistic preferential attachment rule, but which also has a
simple combinatorial description. We carry out a precise distributional
analysis of important parameters for the network model such as the degree, the
local clustering coefficient and the number of descendants of the nodes and
root-to-node distances. We obtain results not only for random nodes but, in
particular, also a precise description of the behaviour of parameters for
the j-th inserted node in a random k-tree of size n, where j = j(n) might grow
with n. The approach presented is not restricted to this specific k-tree model,
but can also be applied to other evolving k-tree models.
Comment: 12 pages, 2 figures
On the accuracy of language trees
Historical linguistics aims at inferring the most likely language
phylogenetic tree starting from information concerning the evolutionary
relatedness of languages. The available information typically consists of lists of
homologous (lexical, phonological, syntactic) features or characters for many
different languages.
From this perspective the reconstruction of language trees is an example of
an inverse problem: starting from present, incomplete and often noisy,
information, one aims at inferring the most likely past evolutionary history. A
fundamental issue in inverse problems is the evaluation of the inference made.
A standard way of dealing with this question is to generate data with
artificial models in order to have full access to the evolutionary process one
is going to infer. This procedure presents an intrinsic limitation: when
dealing with real data sets, one typically does not know which model of
evolution is the most suitable for them. A possible way out is to compare
algorithmic inference with expert classifications. This is the point of view we
take here by conducting a thorough survey of the accuracy of reconstruction
methods as compared with the Ethnologue expert classifications. We focus in
particular on state-of-the-art distance-based methods for phylogeny
reconstruction using worldwide linguistic databases.
In order to assess the accuracy of the inferred trees we introduce and
characterize two generalizations of standard definitions of distances between
trees. Based on these scores we quantify the relative performances of the
distance-based algorithms considered. Further we quantify how the completeness
and the coverage of the available databases affect the accuracy of the
reconstruction. Finally we draw some conclusions about where the accuracy of
the reconstructions in historical linguistics stands and about the leading
directions to improve it.
Comment: 36 pages, 14 figures
Regenerative tree growth: structural results and convergence
We introduce regenerative tree growth processes as consistent families of
random trees with n labelled leaves, n>=1, with a regenerative property at
branch points. This framework includes growth processes for exchangeably
labelled Markov branching trees, as well as non-exchangeable models such as the
alpha-theta model, the alpha-gamma model and all restricted exchangeable models
previously studied. Our main structural result is a representation of the
growth rule by a sigma-finite dislocation measure kappa on the set of
partitions of the natural numbers extending Bertoin's notion of exchangeable
dislocation measures from the setting of homogeneous fragmentations. We use
this representation to establish necessary and sufficient conditions on the
growth rule under which we can apply results by Haas and Miermont for
unlabelled and not necessarily consistent trees to establish self-similar
random trees and residual mass processes as scaling limits. While previous
studies exploited some form of exchangeability, our scaling limit results here
only require a regularity condition on the convergence of asymptotic
frequencies under kappa, in addition to a regular variation condition.
Comment: 23 pages, new title, restructured, presentation improved