Minimal spanning forests
Minimal spanning forests on infinite graphs are weak limits of minimal
spanning trees from finite subgraphs. These limits can be taken with free or
wired boundary conditions and are denoted FMSF (free minimal spanning forest)
and WMSF (wired minimal spanning forest), respectively. The WMSF is also the
union of the trees that arise from invasion percolation started at all
vertices. We show that on any Cayley graph where critical percolation has no
infinite clusters, all the component trees in the WMSF have one end a.s. In
$\mathbb{Z}^d$, this was proved by Alexander [Ann. Probab. 23 (1995) 87--104],
but a different method is needed for the nonamenable case. We also prove that
the WMSF components are ``thin'' in a different sense, namely, on any graph,
each component tree in the WMSF has $p_{\mathrm{c}} = 1$ a.s., where
$p_{\mathrm{c}}$ denotes the critical probability for having an infinite
cluster in Bernoulli percolation. On the other hand, the FMSF is shown to be
``thick'': on any connected graph, the union of the FMSF and independent
Bernoulli percolation (with arbitrarily small parameter) is a.s. connected. In
conjunction with a recent result of Gaboriau, this implies that in any Cayley
graph, the expected degree of the FMSF is at least the expected degree of the
FSF (the weak limit of uniform spanning trees). We also show that the number of
infinite clusters for Bernoulli($p_{\mathrm{u}}$) percolation is at most the
number of components of the FMSF, where $p_{\mathrm{u}}$ denotes the critical
probability for having a unique infinite cluster. Finally, an example is given
to show that the minimal spanning tree measure does not have negative
associations.
Comment: Published at http://dx.doi.org/10.1214/009117906000000269 in the
Annals of Probability (http://www.imstat.org/aop/) by the Institute of
Mathematical Statistics (http://www.imstat.org
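The abstract defines the FMSF and WMSF as weak limits of minimal spanning trees on finite subgraphs and relates the WMSF to invasion percolation. The sketch below illustrates only the finite-graph ingredients, under stated assumptions: a hypothetical n x n grid with i.i.d. Uniform[0,1] edge weights, Kruskal's algorithm for the minimal spanning tree, and invasion percolation from one vertex (repeatedly invade the lightest edge on the cluster boundary). On a finite connected graph with distinct weights, invasion percolation run to completion is exactly Prim's algorithm, so every invaded edge lies in the unique MST; this is a finite analogue only. The names `grid_graph`, `kruskal_mst` and `invasion_percolation` are hypothetical, and nothing here takes infinite-volume limits or imposes free/wired boundary conditions.

```python
import heapq
import random


def grid_graph(n, seed=0):
    """Hypothetical test graph: n x n grid, edges (weight, u, v) with i.i.d. Uniform[0,1] weights."""
    rng = random.Random(seed)
    vertices = [(i, j) for i in range(n) for j in range(n)]
    edges = []
    for i in range(n):
        for j in range(n):
            if i + 1 < n:
                edges.append((rng.random(), (i, j), (i + 1, j)))
            if j + 1 < n:
                edges.append((rng.random(), (i, j), (i, j + 1)))
    return vertices, edges


def kruskal_mst(vertices, edges):
    """Minimal spanning tree (forest, if disconnected) via Kruskal with union-find."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    tree = set()
    for w, u, v in sorted(edges):  # lightest edges first
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two different components: keep it
            parent[ru] = rv
            tree.add((w, u, v))
    return tree


def invasion_percolation(vertices, edges, root, steps=None):
    """Invade the lightest boundary edge, starting from root; stop after `steps` edges if given."""
    adj = {v: [] for v in vertices}
    for e in edges:
        adj[e[1]].append(e)
        adj[e[2]].append(e)
    invaded = {root}
    heap = list(adj[root])
    heapq.heapify(heap)
    cluster = set()
    while heap and (steps is None or len(cluster) < steps):
        w, u, v = heapq.heappop(heap)
        new = v if u in invaded else u
        if new in invaded:
            continue  # both endpoints already invaded: not a boundary edge any more
        invaded.add(new)
        cluster.add((w, u, v))
        for e in adj[new]:
            heapq.heappush(heap, e)
    return cluster
```

For example, with `V, E = grid_graph(6, seed=3)`, running `invasion_percolation(V, E, (0, 0))` to completion returns the same edge set as `kruskal_mst(V, E)`, and any truncated invasion cluster is a subset of it.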
Overview of Random Forest Methodology and Practical Guidance with Emphasis on Computational Biology and Bioinformatics
The Random Forest (RF) algorithm by Leo Breiman has become a standard data analysis tool in bioinformatics. It has shown excellent performance in settings where the number of variables is much larger than the number of observations, can cope with complex interaction structures as well as highly correlated variables, and returns measures of variable importance. This paper synthesizes ten years of RF development with emphasis on applications to bioinformatics and computational biology. Special attention is given to practical aspects such as the selection of parameters, available RF implementations, and important pitfalls and biases of RF and its variable importance measures (VIMs). The paper surveys recent developments of the methodology relevant to bioinformatics as well as some representative examples of RF applications in this context and possible directions for future research.
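To make the parameters and VIMs mentioned above concrete, here is a minimal, standard-library-only sketch of a random forest classifier. It is illustrative only and not any of the implementations the paper surveys. Each tree is grown on a bootstrap sample, each node searches a random subset of `mtry` features for the Gini-optimal threshold, and predictions are a majority vote. `permutation_importance` measures the accuracy drop after shuffling one feature column. Breiman's permutation VIM is computed on out-of-bag samples, whereas this sketch uses the training rows for brevity. All function and parameter names are hypothetical; `mtry` merely mirrors the name used by the R `randomForest` package.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def best_split(X, y, features):
    """Return (score, feature, threshold) minimizing weighted Gini over the given features."""
    best, n = None, len(y)
    for f in features:
        for t in sorted(set(row[f] for row in X)):
            left = [y[i] for i in range(n) if X[i][f] <= t]
            right = [y[i] for i in range(n) if X[i][f] > t]
            if not left or not right:
                continue  # degenerate split
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, mtry, max_depth, rng, depth=0):
    """Grow a CART-style tree; leaves are class labels, internal nodes are (f, t, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    features = rng.sample(range(len(X[0])), mtry)  # random feature subset at this node
    split = best_split(X, y, features)
    if split is None:
        return majority
    _, f, t = split
    li = [i for i in range(len(y)) if X[i][f] <= t]
    ri = [i for i in range(len(y)) if X[i][f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], mtry, max_depth, rng, depth + 1),
            build_tree([X[i] for i in ri], [y[i] for i in ri], mtry, max_depth, rng, depth + 1))


def predict_tree(node, row):
    while isinstance(node, tuple):  # descend until a leaf label is reached
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, mtry=1, max_depth=5, seed=0):
    """Bagging: each tree is grown on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    n, forest = len(y), []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], mtry, max_depth, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over the trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]


def permutation_importance(forest, X, y, feature, seed=0):
    """Accuracy drop after shuffling one feature column (computed on X, not out-of-bag)."""
    def accuracy(rows):
        return sum(predict_forest(forest, r) == label for r, label in zip(rows, y)) / len(y)
    col = [row[feature] for row in X]
    random.Random(seed).shuffle(col)
    Xp = [list(row) for row in X]
    for i, v in enumerate(col):
        Xp[i][feature] = v
    return accuracy(X) - accuracy(Xp)
```

On a toy dataset where only the first feature determines the label, the shuffled informative column should produce a clearly larger importance than a pure-noise column. Class labels must not themselves be tuples, since tuples mark internal nodes here.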