Best-scored Random Forest Density Estimation
This paper presents a new nonparametric density estimation method called best-scored random forest density estimation, whose effectiveness is supported by both solid theoretical analysis and strong experimental performance. The term "best-scored" refers to selecting the density tree with the best estimation performance out of a certain number of purely random density tree candidates; the selected tree is then called the best-scored random density tree. In this manner, the ensemble of these selected trees, namely the best-scored random density forest, can achieve even better estimation results than simply aggregating trees without selection. From the theoretical
perspective, by decomposing the error term into two parts, we are able to carry out the following analysis. First, we establish the consistency of the best-scored random density trees under the L1-norm. Second, we provide their convergence rates under the L1-norm for three different tail assumptions, respectively. Third, we present the convergence rates under the L∞-norm. Last but not least, we extend the above convergence rate analysis to the best-scored random density forest. When
conducting comparative experiments with other state-of-the-art density estimation approaches on both synthetic and real data sets, we find that our algorithm not only has significant advantages in estimation accuracy over other methods, but also stronger resistance to the curse of dimensionality.
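The selection mechanism described in the abstract can be sketched as follows. This is a minimal one-dimensional illustration, not the paper's implementation: candidate partitions are drawn purely at random, each candidate is scored on held-out data, the best candidate is kept, and the forest averages several such best-scored trees. All function names, the held-out log-density score, and the parameter choices are hypothetical.

```python
import numpy as np

def random_partition(n_cuts, rng):
    """A purely random 1-D partition of [0, 1): uniform random cut points."""
    cuts = np.sort(rng.uniform(0.0, 1.0, n_cuts))
    return np.concatenate(([0.0], cuts, [1.0]))  # cell edges

def fit_density(edges, train):
    """Piecewise-constant density: cell count / (n * cell width)."""
    counts, _ = np.histogram(train, bins=edges)
    return counts / (len(train) * np.diff(edges))

def log_score(edges, dens, valid):
    """Held-out average log-density; higher means a better tree."""
    idx = np.clip(np.searchsorted(edges, valid, side="right") - 1, 0, len(dens) - 1)
    return np.mean(np.log(dens[idx] + 1e-12))

def best_scored_tree(train, valid, n_candidates, n_cuts, rng):
    """Keep the best of several purely random density tree candidates."""
    scored = []
    for _ in range(n_candidates):
        edges = random_partition(n_cuts, rng)
        dens = fit_density(edges, train)
        scored.append((log_score(edges, dens, valid), edges, dens))
    return max(scored, key=lambda t: t[0])[1:]

def forest_density(x, trees):
    """Forest estimate: average of the selected best-scored trees."""
    est = np.zeros_like(x)
    for edges, dens in trees:
        idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, len(dens) - 1)
        est += dens[idx]
    return est / len(trees)

rng = np.random.default_rng(0)
data = rng.beta(2.0, 5.0, 2000)          # toy sample supported on (0, 1)
train, valid = data[:1500], data[1500:]
trees = [best_scored_tree(train, valid, n_candidates=20, n_cuts=15, rng=rng)
         for _ in range(10)]
grid = np.linspace(0.005, 0.995, 200)
estimate = forest_density(grid, trees)
```

Averaging the selected trees smooths the jagged piecewise-constant estimates of the individual random partitions, which mirrors the abstract's claim that the forest improves on any single tree.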
Density-based Clustering with Best-scored Random Forest
The single-level density-based approach has long been acknowledged as a conceptually and mathematically convincing clustering method. In this paper, we propose an algorithm called the "best-scored clustering forest" that can find the optimal level and determine the corresponding clusters. The terminology "best-scored" means selecting the random tree with the best empirical performance out of a certain number of purely random tree candidates. From the theoretical perspective, we first show that the consistency of our proposed algorithm can be guaranteed. Moreover, under certain mild restrictions on the underlying density functions and target clusters, even fast convergence rates can be achieved. Last but not least, comparisons with other state-of-the-art clustering methods in numerical experiments demonstrate the accuracy of our algorithm on both synthetic data and several benchmark real data sets.
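As background for the single-level idea this abstract builds on, the sketch below extracts clusters as connected components of a density level set {x : f(x) ≥ λ} on a one-dimensional grid. It illustrates only the single-level principle, not the best-scored clustering forest itself; the levels, the grid, and the toy bimodal density are assumptions for the demo.

```python
import numpy as np

def level_set_clusters(grid, dens, lam):
    """Clusters at level lam: maximal runs of consecutive grid points
    where the density estimate sits at or above the level."""
    clusters, current = [], []
    for x, d in zip(grid, dens):
        if d >= lam:
            current.append(x)
        elif current:
            clusters.append((current[0], current[-1]))
            current = []
    if current:
        clusters.append((current[0], current[-1]))
    return clusters

# toy bimodal density: Gaussian bumps at -2 and +2, normalized on the grid
grid = np.linspace(-4.0, 4.0, 801)
dens = np.exp(-0.5 * (grid + 2.0) ** 2) + np.exp(-0.5 * (grid - 2.0) ** 2)
dens /= dens.sum() * (grid[1] - grid[0])

low = level_set_clusters(grid, dens, 0.03)   # level below the valley
high = level_set_clusters(grid, dens, 0.12)  # level above the valley
```

Cutting below the valley between the two modes merges them into one cluster, while cutting above it separates them; choosing the level that reveals the true cluster structure is exactly the problem the best-scored clustering forest addresses.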