97 research outputs found

    Accuracy bounds for ensembles under 0-1 loss

    This paper is an attempt to increase the understanding of the behavior of ensembles for discrete variables in a quantitative way. A set of tight upper and lower bounds for the accuracy of an ensemble is presented for wide classes of ensemble algorithms, including bagging and boosting. The ensemble accuracy is expressed in terms of the accuracies of the members of the ensemble. Since those bounds represent best- and worst-case behavior only, we study typical behavior as well and discuss its properties. A parameterised bound is presented which describes ensemble behavior as a mixture of dependent base classifier and independent base classifier areas. Some empirical results are presented to support our conclusions.
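
    As a rough illustration of how ensemble accuracy can be written in terms of member accuracies, the sketch below covers only the textbook fully independent case: a majority vote over n binary classifiers, each correct with the same probability p. This is not the paper's bound. The function name and the equal-accuracy and independence assumptions are introduced here purely for illustration.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Accuracy of a majority vote over n classifiers on a binary task.

    Assumptions (illustrative, not taken from the paper): every classifier
    is correct with the same probability p, errors are independent, and n
    is odd so that ties cannot occur.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    k_min = n // 2 + 1  # smallest number of correct votes that wins
    # Sum the binomial probabilities of k_min..n correct votes.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


if __name__ == "__main__":
    for n in (1, 3, 5, 11):
        print(n, round(majority_vote_accuracy(0.6, n), 4))
```

    Under these assumptions the vote improves on a single member whenever p exceeds 0.5. Dependence between members, which the abstract treats explicitly, can move the ensemble accuracy above or below this value.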

    Efficient algorithms for conditional independence inference

    The topic of the paper is computer testing of (probabilistic) conditional independence (CI) implications by an algebraic method of structural imsets. The basic idea is to transform (sets of) CI statements into certain integral vectors and to verify by computer the corresponding algebraic relation between the vectors, called the independence implication. We interpret the previous methods for computer testing of this implication from the point of view of polyhedral geometry. However, the main contribution of the paper is a new method, based on linear programming (LP). The new method overcomes the limitation of former methods on the number of variables involved. We recall and describe the theoretical basis for all four methods involved in our computational experiments, whose aim was to compare the efficiency of the algorithms. The experiments show that the LP method is clearly the fastest one. As an example of a possible application of such algorithms, we show that testing inclusion of Bayesian network structures, or whether a CI statement is encoded in an acyclic directed graph, can be done by the algebraic method.

    Phylogeography by diffusion on a sphere: whole world phylogeography

    Bayesian belief networks: from construction to inference

    In everyday life, reasoning with uncertainty is more common than reasoning without it. Bayesian belief networks offer a mathematically sound formalism for representing uncertainty and reasoning with it efficiently. A Bayesian belief network consists of two parts. The first part is a directed graph without cycles: the network structure. For every variable we want to reason about, there is a node in the graph. We will therefore use the terms node and variable interchangeably. Figure 0.1 shows a simple belief network for a small medical domain containing the age of a patient (a), the need for glasses (g), whether vision improves when the patient blinks (v), and whether the patient has complaints about his vision (s). If there is a direct dependency between two nodes, these nodes are connected by an arrow. Intuitively, the direction of the arrow indicates a causal influence. For example, in Figure 0.1 the arrow from a to g expresses that age is an indication that the patient needs glasses.

    StarBEAST2 Brings Faster Species Tree Inference and Accurate Estimates of Substitution Rates

    Fully Bayesian multispecies coalescent (MSC) methods like *BEAST estimate species trees from multiple sequence alignments. Today thousands of genes can be sequenced for a given study, but using that many genes with *BEAST is intractably slow. An alternative is to use heuristic methods which compromise accuracy or completeness in return for speed. A common heuristic is concatenation, which assumes that the evolutionary history of each gene tree is identical to the species tree. This is an inconsistent estimator of species tree topology, a worse estimator of divergence times, and induces spurious substitution rate variation when incomplete lineage sorting is present. Another class of heuristics directly motivated by the MSC avoids many of the pitfalls of concatenation but cannot be used to estimate divergence times. To enable fuller use of available data and more accurate inference of species tree topologies, divergence times, and substitution rates, we have developed a new version of *BEAST called StarBEAST2. To improve convergence rates we add analytical integration of population sizes, novel MCMC operators, and other optimizations. Computational performance improved by 13.5× and 13.8×, respectively, when analyzing two empirical data sets, and by an average of 33.1× across 30 simulated data sets. To enable accurate estimates of per-species substitution rates, we introduce species tree relaxed clocks, and show that StarBEAST2 is a more powerful and robust estimator of rate variation than concatenation. StarBEAST2 is available through the BEAUti package manager in BEAST 2.4 and above. This work was supported by a Rutherford Discovery Fellowship awarded to A.J.D. by the Royal Society of New Zealand. H.A.O. was supported by an Australian Laureate Fellowship awarded to Craig Moritz by the Australian Research Council (FL110100104).