Best-scored Random Forest Density Estimation
This paper presents a new nonparametric density estimation method called best-scored random forest density estimation, whose effectiveness is supported by both solid theoretical analysis and strong experimental performance. The term "best-scored" refers to selecting the density tree with the best estimation performance out of a certain number of purely random density tree candidates; the selected tree is then called the best-scored random density tree. In this manner, the ensemble of these selected trees, namely the best-scored random density forest, can achieve even better estimation results than simply aggregating trees without selection. From the theoretical
perspective, by decomposing the error term into two parts, we are able to carry out the following analysis. First, we establish the consistency of the best-scored random density trees under the L1-norm. Second, we provide their convergence rates under the L1-norm for three different tail assumptions, respectively. Third, we present the convergence rates under the L∞-norm. Last but not least, we extend the above convergence rate analysis to the best-scored random density forest. When
conducting comparative experiments with other state-of-the-art density estimation approaches on both synthetic and real data sets, we find that our algorithm not only has significant advantages in estimation accuracy over other methods, but also stronger resistance to the curse of dimensionality.
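The selection mechanism described in the abstract can be sketched as follows. This is a minimal one-dimensional illustration, not the paper's implementation: candidate partitions are drawn purely at random, each candidate is scored on held-out data, the best candidate is kept, and the forest averages several such best-scored trees. All function names, the held-out log-density score, and the parameter choices are hypothetical.

```python
import numpy as np

def random_partition(n_cuts, rng):
    """A purely random 1-D partition of [0, 1): uniform random cut points."""
    cuts = np.sort(rng.uniform(0.0, 1.0, n_cuts))
    return np.concatenate(([0.0], cuts, [1.0]))  # cell edges

def fit_density(edges, train):
    """Piecewise-constant density: cell count / (n * cell width)."""
    counts, _ = np.histogram(train, bins=edges)
    return counts / (len(train) * np.diff(edges))

def log_score(edges, dens, valid):
    """Held-out average log-density; higher means a better tree."""
    idx = np.clip(np.searchsorted(edges, valid, side="right") - 1, 0, len(dens) - 1)
    return np.mean(np.log(dens[idx] + 1e-12))

def best_scored_tree(train, valid, n_candidates, n_cuts, rng):
    """Keep the best of several purely random density tree candidates."""
    scored = []
    for _ in range(n_candidates):
        edges = random_partition(n_cuts, rng)
        dens = fit_density(edges, train)
        scored.append((log_score(edges, dens, valid), edges, dens))
    return max(scored, key=lambda t: t[0])[1:]

def forest_density(x, trees):
    """Forest estimate: average of the selected best-scored trees."""
    est = np.zeros_like(x)
    for edges, dens in trees:
        idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, len(dens) - 1)
        est += dens[idx]
    return est / len(trees)

rng = np.random.default_rng(0)
data = rng.beta(2.0, 5.0, 2000)          # toy sample supported on (0, 1)
train, valid = data[:1500], data[1500:]
trees = [best_scored_tree(train, valid, n_candidates=20, n_cuts=15, rng=rng)
         for _ in range(10)]
grid = np.linspace(0.005, 0.995, 200)
estimate = forest_density(grid, trees)
```

Averaging the selected trees smooths the jagged piecewise-constant estimates of the individual random partitions, which mirrors the abstract's claim that the forest improves on any single tree.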
Density-based Clustering with Best-scored Random Forest
The single-level density-based approach has long been acknowledged as a conceptually and mathematically convincing clustering method. In this paper, we propose an algorithm called the "best-scored clustering forest" that can find the optimal level and determine the corresponding clusters. The terminology "best-scored" means selecting the random tree with the best empirical performance out of a certain number of purely random tree candidates. From the theoretical perspective, we first show that the consistency of our proposed algorithm can be guaranteed. Moreover, under certain mild restrictions on the underlying density functions and target clusters, even fast convergence rates can be achieved. Last but not least, comparisons with other state-of-the-art clustering methods in numerical experiments demonstrate the accuracy of our algorithm on both synthetic data and several benchmark real data sets.
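As background for the single-level idea this abstract builds on, the sketch below extracts clusters as connected components of a density level set {x : f(x) ≥ λ} on a one-dimensional grid. It illustrates only the single-level principle, not the best-scored clustering forest itself; the levels, the grid, and the toy bimodal density are assumptions for the demo.

```python
import numpy as np

def level_set_clusters(grid, dens, lam):
    """Clusters at level lam: maximal runs of consecutive grid points
    where the density estimate sits at or above the level."""
    clusters, current = [], []
    for x, d in zip(grid, dens):
        if d >= lam:
            current.append(x)
        elif current:
            clusters.append((current[0], current[-1]))
            current = []
    if current:
        clusters.append((current[0], current[-1]))
    return clusters

# toy bimodal density: Gaussian bumps at -2 and +2, normalized on the grid
grid = np.linspace(-4.0, 4.0, 801)
dens = np.exp(-0.5 * (grid + 2.0) ** 2) + np.exp(-0.5 * (grid - 2.0) ** 2)
dens /= dens.sum() * (grid[1] - grid[0])

low = level_set_clusters(grid, dens, 0.03)   # level below the valley
high = level_set_clusters(grid, dens, 0.12)  # level above the valley
```

Cutting below the valley between the two modes merges them into one cluster, while cutting above it separates them; choosing the level that reveals the true cluster structure is exactly the problem the best-scored clustering forest addresses.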