4,974 research outputs found

    Generating Compact Tree Ensembles via Annealing

    Tree ensembles are flexible predictive models that can capture relevant variables and, to some extent, their interactions in a compact and interpretable manner. Most algorithms for obtaining tree ensembles are based on versions of boosting or Random Forest. Previous work showed that boosting algorithms exhibit a cyclic behavior of selecting the same tree again and again due to the way the loss is optimized. At the same time, Random Forest is not based on loss optimization and obtains a more complex and less interpretable model. In this paper we present a novel method for obtaining compact tree ensembles by growing a large pool of trees in parallel with many independent boosting threads and then selecting a small subset and updating their leaf weights by loss optimization. We allow the trees in the initial pool to have different depths, which further helps with generalization. Experiments on real datasets show that the obtained model usually has a smaller loss than boosting, which is also reflected in a lower misclassification error on the test set. Comment: Comparison with Random Forest included in the results section.

    Exact reconciliation of undated trees

    Reconciliation methods aim at recovering macroevolutionary events and at localizing them in the species history, by observing discrepancies between gene family trees and species trees. In this article we introduce an Integer Linear Programming (ILP) approach for the NP-hard problem of computing a most parsimonious time-consistent reconciliation of a gene tree with a species tree when dating information on speciations is not available. The ILP formulation, which builds upon the DTL model, returns a most parsimonious reconciliation ranging over all possible datings of the nodes of the species tree. By studying its performance on plausible simulated data we conclude that the ILP approach is significantly faster than a brute-force search through the space of all possible species tree datings. Although the ILP formulation is currently limited to small trees, we believe that it is an important proof-of-concept which opens the door to the possibility of developing an exact, parsimony-based approach to dating species trees. The software (ILPEACE) is freely available for download.

    Inducing Compact but Accurate Tree-Substitution Grammars

    Tree substitution grammars (TSGs) are a compelling alternative to context-free grammars for modelling syntax. However, many popular techniques for estimating weighted TSGs (under the moniker of Data Oriented Parsing) suffer from the problems of inconsistency and over-fitting. We present a theoretically principled model which solves these problems using a Bayesian non-parametric formulation. Our model learns compact and simple grammars, uncovering latent linguistic structures (e.g., verb subcategorisation), and in doing so far out-performs a standard PCFG.

    Extracting few representative reconciliations with Host-Switches (Extended Abstract)

    Phylogenetic tree reconciliation is the approach commonly used to investigate the coevolution of sets of organisms such as hosts and symbionts. Given a phylogenetic tree for each such set, respectively denoted by H and S, together with a mapping φ of the leaves of S to the leaves of H, a reconciliation is a mapping ρ of the internal vertices of S to the vertices of H which extends φ with some constraints. Given a cost for each reconciliation, a huge number of most parsimonious ones are possible, even exponential in the dimension of the trees. Without further information, any biological interpretation of the underlying coevolution would require that all optimal solutions are enumerated and examined. The latter is however impossible without providing some sort of high level view of the situation. One approach would be to extract a small number of representatives, based on some notion of similarity or of equivalence between the reconciliations. In this paper, we define two equivalence relations that allow one to identify many reconciliations with a single one, thereby reducing their number. Extensive experiments indicate that the number of output solutions greatly decreases in general. By how much clearly depends on the constraints that are given as input.

    PHYLOGENETIC RELATIONSHIPS AMONG WEST INDIAN XENODONTINE SNAKES (SERPENTES; COLUBRIDAE) WITH COMMENTS ON THE PHYLOGENY OF SOME MAINLAND XENODONTINES

    The West Indian (W. I.) xenodontine snake assemblage has been considered either monophyletic or paraphyletic. Allozyme data from protein electrophoresis were used to estimate the phylogeny of the W. I. xenodontine snakes. Forty-two species from 25 genera (mainland and W. I. taxa) were examined. The phylogenetic relationships were estimated using parsimony analyses with successive approximation weighting on the data coded two ways: (1) the allele as the character and (2) the locus as the character. The most parsimonious trees from both coding methods indicated a non-monophyletic W. I. xenodontine assemblage. Three W. I. groups were recovered in both coding methods: (1) Jamaican Arrhyton and Darlingtonia, (2) Uromacer and the Cuban Arrhyton, and (3) Alsophis, Ialtris, and the South American Alsophis elegans. The relationships of Hypsirhynchus, Antillophis and Arrhyton exiguum were unstable. Nomenclatural changes are recommended for Darlingtonia, Arrhyton, Ialtris and Alsophis.

    Molecular and morphological phylogenetics of the digitate-tubered clade within subtribe Orchidinae s.s. (Orchidaceae: Orchideae)

    The digitate-tubered clade (Dactylorhiza s.l. plus Gymnadenia s.l.) within subtribe Orchidinae is an important element of the North-temperate orchid flora and has become a model system for studying the genetic and epigenetic consequences of organism-wide ploidy change. Here, we integrate morphological phylogenetics with Sanger sequencing of nrITS and the plastid region trnL-F in order to explore phylogenetic relationships and phenotypic character evolution within the clade. The resulting morphological phylogenies are strongly incongruent with the molecular phylogenies, instead reconstructing through parsimony the genus-level boundaries recognised by traditional 20th Century taxonomy. They raise fresh doubts concerning whether Pseudorchis is sister to Platanthera or to Dactylorhiza plus Gymnadenia. Constraining the morphological matrix to the topology derived from ITS sequences increased tree length by 20%, adding considerably to the already exceptional level of phenotypic homoplasy. Both molecular and morphological trees agree that D. viridis and D. iberica are the earliest-diverging species within Dactylorhiza (emphasising the redundancy of the former genus Coeloglossum). Morphology and ITS both suggest that the former genus Nigritella is nested within (and thus part of) Gymnadenia, the Pyrenean endemic 'N.' gabasiana apparently forming a molecular bridge between the two radically contrasting core phenotypes. Comparatively short subtending molecular branches plus widespread (though sporadic) hybridisation indicate that Dactylorhiza and Gymnadenia approximate the minimum level of molecular divergence acceptable in sister genera. They share similar tuber morphologies and base chromosome numbers, and both genera are unusually prone to polyploid speciation. Another prominent feature of multiple speciation events within Gymnadenia is floral paedomorphosis.
    The 'traditional' morphological and candidate-gene approaches to phylogeny reconstruction are critically appraised. Peer reviewed. Final published version.