    Minimal spanning forests

    Minimal spanning forests on infinite graphs are weak limits of minimal spanning trees from finite subgraphs. These limits can be taken with free or wired boundary conditions and are denoted FMSF (free minimal spanning forest) and WMSF (wired minimal spanning forest), respectively. The WMSF is also the union of the trees that arise from invasion percolation started at all vertices. We show that on any Cayley graph where critical percolation has no infinite clusters, all the component trees in the WMSF have one end a.s. In $\mathbb{Z}^d$ this was proved by Alexander [Ann. Probab. 23 (1995) 87–104], but a different method is needed for the nonamenable case. We also prove that the WMSF components are "thin" in a different sense: on any graph, each component tree in the WMSF has $p_{\mathrm{c}}=1$ a.s., where $p_{\mathrm{c}}$ denotes the critical probability for having an infinite cluster in Bernoulli percolation. On the other hand, the FMSF is shown to be "thick": on any connected graph, the union of the FMSF and independent Bernoulli percolation (with arbitrarily small parameter) is a.s. connected. In conjunction with a recent result of Gaboriau, this implies that in any Cayley graph, the expected degree of the FMSF is at least the expected degree of the FSF (the weak limit of uniform spanning trees). We also show that the number of infinite clusters for Bernoulli($p_{\mathrm{u}}$) percolation is at most the number of components of the FMSF, where $p_{\mathrm{u}}$ denotes the critical probability for having a unique infinite cluster. Finally, an example is given to show that the minimal spanning tree measure does not have negative associations.

    Comment: Published at http://dx.doi.org/10.1214/009117906000000269 in the Annals of Probability (http://www.imstat.org/aop/) by the Institute of Mathematical Statistics (http://www.imstat.org
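
    The link between invasion percolation and minimal spanning trees can be seen on a finite graph: with i.i.d. continuous edge weights the minimal spanning tree is unique, and invasion percolation run until every vertex is invaded is exactly Prim's algorithm, so it recovers the same tree that Kruskal's algorithm builds. The sketch below is a finite-volume illustration only, assuming Uniform[0,1] weights on a 6×6 grid (both hypothetical choices); the helper names `grid_edges`, `kruskal_mst` and `invasion_percolation` are illustrative. It does not construct the infinite-volume FMSF or WMSF, which are weak limits, and it does not model wired boundary conditions.

```python
import heapq
import random


def grid_edges(n):
    """Edges of the n-by-n grid graph, a finite subgraph of Z^2."""
    edges = []
    for x in range(n):
        for y in range(n):
            if x + 1 < n:
                edges.append(((x, y), (x + 1, y)))
            if y + 1 < n:
                edges.append(((x, y), (x, y + 1)))
    return edges


def kruskal_mst(vertices, weighted_edges):
    """Minimal spanning tree via Kruskal's algorithm with union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    tree = set()
    for w, u, v in sorted(weighted_edges):  # cheapest edges first
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components: keep it
            parent[ru] = rv
            tree.add(frozenset((u, v)))
    return tree


def invasion_percolation(start, adjacency):
    """Grow a cluster from `start`, always invading the cheapest boundary edge.
    Run to completion on a finite connected graph, this is Prim's algorithm."""
    invaded = {start}
    boundary = [(w, start, v) for v, w in adjacency[start]]
    heapq.heapify(boundary)
    tree = set()
    while boundary:
        w, u, v = heapq.heappop(boundary)
        if v in invaded:  # stale edge: both endpoints already invaded
            continue
        invaded.add(v)
        tree.add(frozenset((u, v)))
        for x, wx in adjacency[v]:
            if x not in invaded:
                heapq.heappush(boundary, (wx, v, x))
    return tree


rng = random.Random(0)
n = 6
vertices = [(x, y) for x in range(n) for y in range(n)]
weighted = [(rng.random(), u, v) for u, v in grid_edges(n)]  # i.i.d. Uniform[0,1]
adjacency = {v: [] for v in vertices}
for w, u, v in weighted:
    adjacency[u].append((v, w))
    adjacency[v].append((u, w))

mst = kruskal_mst(vertices, weighted)
for start in [(0, 0), (3, 2), (5, 5)]:
    # Invasion from any start vertex yields the same (unique) MST.
    assert invasion_percolation(start, adjacency) == mst
print(len(mst))  # 35, i.e. n*n - 1 edges
```

    On an infinite graph the invasion from a single vertex never finishes, which is why the WMSF is described as the union of the invasion trees over all starting vertices rather than as the output of one run.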

    Overview of Random Forest Methodology and Practical Guidance with Emphasis on Computational Biology and Bioinformatics

    The Random Forest (RF) algorithm by Leo Breiman has become a standard data analysis tool in bioinformatics. It has shown excellent performance in settings where the number of variables is much larger than the number of observations, can cope with complex interaction structures as well as highly correlated variables, and returns measures of variable importance. This paper synthesizes ten years of RF development with emphasis on applications to bioinformatics and computational biology. Special attention is given to practical aspects such as the selection of parameters, available RF implementations, and important pitfalls and biases of RF and its variable importance measures (VIMs). The paper surveys recent developments of the methodology relevant to bioinformatics, as well as some representative examples of RF applications in this context and possible directions for future research.