Generating Compact Tree Ensembles via Annealing
Tree ensembles are flexible predictive models that can capture relevant
variables and to some extent their interactions in a compact and interpretable
manner. Most algorithms for obtaining tree ensembles are based on versions of
boosting or Random Forest. Previous work showed that boosting algorithms
exhibit a cyclic behavior of selecting the same tree again and again due to the
way the loss is optimized. At the same time, Random Forest is not based on loss
optimization and obtains a more complex and less interpretable model. In this
paper we present a novel method for obtaining compact tree ensembles by growing
a large pool of trees in parallel with many independent boosting threads and
then selecting a small subset and updating their leaf weights by loss
optimization. We allow for the trees in the initial pool to have different
depths which further helps with generalization. Experiments on real datasets
show that the obtained model usually has a smaller loss than boosting, which is
also reflected in a lower misclassification error on the test set.
Comment: Comparison with Random Forest included in the results section.
Exact reconciliation of undated trees
Reconciliation methods aim at recovering macro evolutionary events and at
localizing them in the species history, by observing discrepancies between gene
family trees and species trees. In this article we introduce an Integer Linear
Programming (ILP) approach for the NP-hard problem of computing a most
parsimonious time-consistent reconciliation of a gene tree with a species tree
when dating information on speciations is not available. The ILP formulation,
which builds upon the DTL model, returns a most parsimonious reconciliation
ranging over all possible datings of the nodes of the species tree. By studying
its performance on plausible simulated data we conclude that the ILP approach
is significantly faster than a brute force search through the space of all
possible species tree datings. Although the ILP formulation is currently
limited to small trees, we believe that it is an important proof-of-concept
which opens the door to the possibility of developing an exact, parsimony based
approach to dating species trees. The software (ILPEACE) is freely available
for download.
Inducing Compact but Accurate Tree-Substitution Grammars
Tree substitution grammars (TSGs) are a compelling alternative to context-free grammars for modelling syntax. However, many popular techniques for estimating weighted TSGs (under the moniker of Data Oriented Parsing) suffer from the problems of inconsistency and over-fitting. We present a theoretically principled model which solves these problems using a Bayesian non-parametric formulation. Our model learns compact and simple grammars, uncovering latent linguistic structures (e.g., verb subcategorisation), and in doing so far out-performs a standard PCFG.
Extracting few representative reconciliations with Host-Switches (Extended Abstract)
Phylogenetic tree reconciliation is the approach commonly used to investigate the coevolution of sets of organisms such as hosts and symbionts. Given a phylogenetic tree for each such set, respectively denoted by H and S, together with a mapping φ of the leaves of S to the leaves of H, a reconciliation is a mapping ρ of the internal vertices of S to the vertices of H which extends φ with some constraints.
Given a cost for each reconciliation, a huge number of most parsimonious ones are possible, even exponential in the dimension of the trees. Without further information, any biological interpretation of the underlying coevolution would require that all optimal solutions are enumerated and examined. The latter is however impossible without providing some sort of high level view of the situation. One approach would be to extract a small number of representatives, based on some notion of similarity or of equivalence between the reconciliations.
In this paper, we define two equivalence relations that allow one to identify many reconciliations with a single one, thereby reducing their number. Extensive experiments indicate that the number of output solutions greatly decreases in general. By how much clearly depends on the constraints that are given as input.
PHYLOGENETIC RELATIONSHIPS AMONG WEST INDIAN XENODONTINE SNAKES (SERPENTES; COLUBRIDAE) WITH COMMENTS ON THE PHYLOGENY OF SOME MAINLAND XENODONTINES
The evolutionary relationships of the West Indian (W. I.) xenodontine snake assemblage have been considered as either monophyletic or paraphyletic. Allozyme data from protein electrophoresis were used to estimate the phylogeny of the W. I. xenodontine snakes. Forty-two species from 25 genera (mainland and W. I. taxa) were examined. The phylogenetic relationships were estimated using parsimony analyses with successive approximation weighting on the data coded two ways: (1) the allele as the character and (2) the locus as the character. The most parsimonious trees from both coding methods indicated a non-monophyletic W. I. xenodontine assemblage. Three W. I. groups were recovered in both coding methods: (1) Jamaican Arrhyton and Darlingtonia, (2) Uromacer and the Cuban Arrhyton, and (3) Alsophis, Ialtris, and the South American Alsophis elegans. The relationships of Hypsirhynchus, Antillophis and Arrhyton exiguum were unstable. Nomenclatural changes are recommended for Darlingtonia, Arrhyton, Ialtris and Alsophis.
Molecular and morphological phylogenetics of the digitate-tubered clade within subtribe Orchidinae s.s. (Orchidaceae: Orchideae)
The digitate-tubered clade (Dactylorhiza s.l. plus Gymnadenia s.l.) within subtribe Orchidinae is an important element of the North-temperate orchid flora and has become a model system for studying the genetic and epigenetic consequences of organism-wide ploidy change. Here, we integrate morphological phylogenetics with Sanger sequencing of nrITS and the plastid region trnL-F in order to explore phylogenetic relationships and phenotypic character evolution within the clade. The resulting morphological phylogenies are strongly incongruent with the molecular phylogenies, instead reconstructing through parsimony the genus-level boundaries recognised by traditional 20th Century taxonomy. They raise fresh doubts concerning whether Pseudorchis is sister to Platanthera or to Dactylorhiza plus Gymnadenia. Constraining the morphological matrix to the topology derived from ITS sequences increased tree length by 20%, adding considerably to the already exceptional level of phenotypic homoplasy. Both molecular and morphological trees agree that D. viridis and D. iberica are the earliest-diverging species within Dactylorhiza (emphasising the redundancy of the former genus Coeloglossum). Morphology and ITS both suggest that the former genus Nigritella is nested within (and thus part of) Gymnadenia, the Pyrenean endemic 'N.' gabasiana apparently forming a molecular bridge between the two radically contrasting core phenotypes. Comparatively short subtending molecular branches plus widespread (though sporadic) hybridisation indicate that Dactylorhiza and Gymnadenia approximate the minimum level of molecular divergence acceptable in sister genera. They share similar tuber morphologies and base chromosome numbers, and both genera are unusually prone to polyploid speciation. Another prominent feature of multiple speciation events within Gymnadenia is floral paedomorphosis.
The 'traditional' morphological and candidate-gene approaches to phylogeny reconstruction are critically appraised.
Peer reviewed. Final published version.