
    Learning the attribute selection measures for decision tree

    Decision trees are among the most widely used methods for classification. However, the main influence on decision tree classification performance is the attribute selection problem. The paper considers a number of different attribute selection measures and experimentally examines their behavior in classification. The results show that the choice of measure does not affect classification accuracy, but it significantly influences the size of the tree. The main effect of the new attribute selection measures, which are based on normal gain and distance, is that they generate smaller trees than traditional attribute selection measures. © 2013 SPIE
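To make the idea of an attribute selection measure concrete, here is a minimal sketch of two widely used ones, information gain and gain ratio. These are assumptions chosen for illustration; the abstract does not name the traditional measures it compares against, and the paper's own measures based on normal gain and distance are not reproduced here. All function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in class entropy obtained by splitting on one attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the class labels inside each branch of the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalised by the entropy of the split itself."""
    split_info = entropy(values)
    if split_info == 0:
        return 0.0  # attribute takes a single value, so it cannot split
    return information_gain(values, labels) / split_info


# Toy data (illustrative only): attribute "a" separates the classes, "b" does not.
labels = ["yes", "yes", "no", "no"]
attr_a = ["x", "x", "y", "y"]
attr_b = ["p", "q", "p", "q"]
best = max([("a", attr_a), ("b", attr_b)],
           key=lambda item: information_gain(item[1], labels))[0]
```

A tree learner evaluates such a measure for every candidate attribute at each node and splits on the highest-scoring one, which is why the choice of measure can change the shape and size of the resulting tree.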

    Alternating model trees

    Model tree induction is a popular method for tackling regression problems requiring interpretable models. Model trees are decision trees with multiple linear regression models at the leaf nodes. In this paper, we propose a method for growing alternating model trees, a form of option tree for regression problems. The motivation is that alternating decision trees achieve high accuracy in classification problems because they represent an ensemble classifier as a single tree structure. As in alternating decision trees for classification, our alternating model trees for regression contain splitter and prediction nodes, but we use simple linear regression functions as opposed to constant predictors at the prediction nodes. Moreover, additive regression using forward stagewise modeling is applied to grow the tree rather than a boosting algorithm. The size of the tree is determined using cross-validation. Our empirical results show that alternating model trees achieve significantly lower squared error than standard model trees on several regression datasets.
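The forward stagewise idea mentioned in this abstract can be sketched as follows. This is not the paper's algorithm: it omits splitter nodes and tree structure entirely and only shows how simple (one-attribute) linear regressions can be added one at a time to fit the current residuals. The function names, the `shrinkage` parameter, and the fixed iteration count are assumptions for illustration.

```python
def fit_simple_lr(x, r):
    """Least-squares fit of r ~ slope * x + intercept for one attribute."""
    n = len(x)
    mean_x = sum(x) / n
    mean_r = sum(r) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        return 0.0, mean_r  # constant attribute: fall back to the mean
    slope = sum((a - mean_x) * (b - mean_r) for a, b in zip(x, r)) / sxx
    return slope, mean_r - slope * mean_x


def forward_stagewise(X, y, n_iter=50, shrinkage=0.5):
    """Additively fit simple linear regressions to the running residuals."""
    base = sum(y) / len(y)
    residuals = [v - base for v in y]
    terms = []
    for _ in range(n_iter):
        best = None
        for j in range(len(X[0])):
            col = [row[j] for row in X]
            slope, intercept = fit_simple_lr(col, residuals)
            sse = sum((r - (slope * a + intercept)) ** 2
                      for a, r in zip(col, residuals))
            if best is None or sse < best[0]:
                best = (sse, j, slope, intercept)
        _, j, slope, intercept = best
        # Take only a fraction of the best fit (hypothetical shrinkage step).
        slope *= shrinkage
        intercept *= shrinkage
        terms.append((j, slope, intercept))
        residuals = [r - (slope * row[j] + intercept)
                     for row, r in zip(X, residuals)]
    return base, terms


def predict(model, row):
    """Sum the base value and every stagewise term for one instance."""
    base, terms = model
    return base + sum(s * row[j] + b for j, s, b in terms)


# Toy data (illustrative only): y = 2 * x0 + 1, x1 is irrelevant.
X = [[0, 1], [1, 0], [2, 1], [3, 0]]
y = [1, 3, 5, 7]
model = forward_stagewise(X, y)
```

In practice the number of iterations would be chosen by cross-validation, in the same spirit as the abstract describes for the size of the tree.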

    Stock Picking via Nonsymmetrically Pruned Binary Decision Trees

    Stock picking is a field of financial analysis that is of particular interest to many professional investors and researchers. In this study, stock picking is implemented via binary classification trees. Optimal tree size is believed to be the crucial factor in the forecasting performance of the trees. While there exists a standard method of tree pruning, which is based on the cost-complexity tradeoff and used in the majority of studies employing binary decision trees, this paper introduces a novel methodology of nonsymmetric tree pruning called Best Node Strategy (BNS). An important property of BNS is proven that provides an easy way to implement the search for the optimal tree size in practice. BNS is compared with the traditional pruning approach by composing two recursive portfolios out of XETRA DAX stocks. Performance forecasts for each of the stocks are provided by the constructed decision trees. It is shown that BNS clearly outperforms the traditional approach according to the backtesting results and the Diebold-Mariano test for statistical significance of the performance difference between two forecasting methods.
    Keywords: decision tree, stock picking, pruning, earnings forecasting, data mining

    Vacancy localization in the square dimer model

    We study the classical dimer model on a square lattice with a single vacancy by developing a graph-theoretic classification of the set of all configurations which extends the spanning tree formulation of close-packed dimers. With this formalism, we can address the question of the possible motion of the vacancy induced by dimer slidings. We find a probability 57/4 - 10√2 for the vacancy to be strictly jammed in an infinite system. More generally, the size distribution of the domain accessible to the vacancy is characterized by a power law decay with exponent 9/8. On a finite system, the probability that a vacancy in the bulk can reach the boundary falls off as a power law of the system size with exponent 1/4. The resultant weak localization of vacancies still allows for unbounded diffusion, characterized by a diffusion exponent that we relate to that of diffusion on spanning trees. We also implement numerical simulations of the model with both free and periodic boundary conditions.
    Comment: 35 pages, 24 figures. Improved version with one added figure (figure 9), a shift s->s+1 in the definition of the tree size, and minor correction

    Klasifikasi Citra Dengan Pohon Keputusan (Image Classification with Decision Trees)

    Image classification can be done using text attributes that accompany the image, such as file name, size, or creator. Image classification can also be done based on the visual content of the image. In this research, we implement an image classification model based on visual content. The classification uses a decision tree method adapted from the C4.5 algorithm. The decision variables used in the decision tree generation process are visual features of the image, i.e. color moment order-1, color moment order-2, color moment order-3, entropy, energy, contrast, and homogeneity. The result of this research is an application that can classify images based on knowledge of previous classification cases.

    Key structural features of Boreal forests may be detected directly using L-moments from airborne lidar data

    This article introduces a novel methodology for automated classification of forest areas from airborne laser scanning (ALS) datasets based on two direct and simple rules: L-coefficient of variation Lcv=0.5 and L-skewness Lskew=0, thresholds based on descriptors of the mathematical properties of ALS height distributions. We observed that, while Lcv>0.5 may represent forests with large tree size inequality, Lskew>0 can be an indicator for areas lacking a closed dominant canopy. Lcv=0.5 discriminated forests with trees of approximately equal sizes (even tree size classes) from those with large tree size inequality (uneven tree size classes) with kappa κ = 0.48 and overall accuracy OA = 92.4%, while Lskew=0 segregated oligophotic and euphotic zones with κ = 0.56 and OA = 84.6%. We showed that a supervised classification could only marginally improve some of these accuracy results. The rule-based approach presents a simple method for detecting structural properties key to tree competition and potential for natural regeneration. The study was carried out with low-density datasets from the national program on ALS surveying of Finland, which shows potential for replication with the ALS datasets typically acquired at nation-wide scales. Since the presented method was based on deductive mathematical rules for describing distributions, it stands out from inductive supervised and unsupervised classification methods which are more commonly used in remote sensing. Therefore, it presents an opportunity for deducing physical relations which could partly eliminate the need for supporting ALS applications with field plot data for training and modelling, at least in Boreal forest ecosystems.
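The two rules in this abstract can be illustrated with a short sketch that computes sample L-moments from a list of heights and applies the Lcv=0.5 and Lskew=0 thresholds. The estimator shown is the standard unbiased one based on probability-weighted moments, which is an assumption: the abstract does not say which estimator the authors used. The function names, the output labels, and the toy heights are hypothetical.

```python
def l_moment_ratios(heights):
    """Return (Lcv, Lskew) from the first three sample L-moments."""
    x = sorted(heights)
    n = len(x)
    if n < 3:
        raise ValueError("at least three values are needed")
    # Unbiased probability-weighted moments b0, b1, b2 (rank i is 1-based).
    b0 = sum(x) / n
    b1 = sum((i - 1) / (n - 1) * v for i, v in enumerate(x, start=1)) / n
    b2 = sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * v
             for i, v in enumerate(x, start=1)) / n
    l1 = b0                      # L-location (the mean)
    l2 = 2 * b1 - b0             # L-scale
    l3 = 6 * b2 - 6 * b1 + b0    # third L-moment
    return l2 / l1, l3 / l2


def classify_area(heights, lcv_threshold=0.5, lskew_threshold=0.0):
    """Apply the two threshold rules quoted in the abstract."""
    lcv, lskew = l_moment_ratios(heights)
    size_class = "uneven tree sizes" if lcv > lcv_threshold else "even tree sizes"
    canopy = "no closed dominant canopy" if lskew > lskew_threshold else "closed dominant canopy"
    return size_class, canopy


# Toy heights in metres (illustrative only, not data from the study).
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
lcv, lskew = l_moment_ratios(symmetric)
```

Because the rules are fixed thresholds on distribution descriptors, no training data enters the classification step, which is the contrast with supervised methods that the abstract draws.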