    Decision Stream: Cultivating Deep Decision Trees

    Various modifications of decision trees have been extensively used in recent years due to their high efficiency and interpretability. Tree node splitting based on relevant feature selection is a key step of decision tree learning and, at the same time, its major shortcoming: recursive node partitioning leads to a geometric reduction of the amount of data in the leaf nodes, which causes excessive model complexity and overfitting. In this paper, we present a novel architecture, the Decision Stream, aimed at overcoming this problem. Instead of building a tree structure during learning, we propose merging nodes from different branches based on their similarity, estimated with two-sample test statistics, which leads to a deep directed acyclic graph of decision rules that can consist of hundreds of levels. To evaluate the proposed solution, we test it on several common machine learning problems: credit scoring, Twitter sentiment analysis, aircraft flight control, MNIST and CIFAR image classification, and synthetic data classification and regression. Our experimental results show that the proposed approach significantly outperforms standard decision tree learning methods on both regression and classification tasks, yielding a prediction error decrease of up to 35%.
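
    A minimal sketch of the leaf-merging idea described above, assuming a Kolmogorov-Smirnov statistic as the two-sample test and an illustrative p-value threshold; the paper's exact statistic and merging schedule may differ.

    import numpy as np
    from itertools import combinations
    from scipy.stats import ks_2samp  # one possible two-sample test (assumption)

    def merge_similar_leaves(leaf_targets, p_threshold=0.05):
        # leaf_targets: dict mapping leaf id -> array of target values in that leaf.
        # Leaves whose target distributions cannot be distinguished by the
        # two-sample test are assigned to the same group, turning the tree
        # into a directed acyclic graph of decision rules.
        group_of = {leaf: leaf for leaf in leaf_targets}
        for a, b in combinations(leaf_targets, 2):
            _, p_value = ks_2samp(leaf_targets[a], leaf_targets[b])
            if p_value > p_threshold:          # statistically similar -> merge
                group_of[b] = group_of[a]
        return group_of

    # Toy usage: leaves 0 and 1 share a distribution, leaf 2 does not.
    rng = np.random.default_rng(0)
    leaves = {0: rng.normal(0, 1, 200), 1: rng.normal(0, 1, 200), 2: rng.normal(5, 1, 200)}
    print(merge_similar_leaves(leaves))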

    LEARNING HYPERPLANES THAT CAPTURE THE GEOMETRIC STRUCTURE OF CLASS REGIONS

    Most decision tree algorithms rely on impurity measures to evaluate the goodness of hyperplanes at each node while learning a decision tree in a top-down fashion. These impurity measures are not differentiable with respect to the hyperplane parameters, so decision tree learning algorithms based on them must resort to search techniques to find the best hyperplane at every node. Moreover, impurity measures do not properly capture the geometric structure of the data. Motivated by this, a two-class algorithm for learning oblique decision trees is proposed in this paper; it evaluates hyperplanes in a way that takes the (linear) geometric structure of the data into consideration. At each node of the decision tree, the algorithm finds the clustering hyperplanes for both classes by solving a generalized eigenvalue problem. The data is then split based on an angle bisector, and the left and right sub-trees of the node are learned recursively. Since, in general, there are two angle bisectors, the better one is selected according to an impurity measure, the Gini index. The algorithm thus combines the ideas of linear tendencies in the data and purity of nodes to find better decision trees, which leads to small decision trees and better performance.
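
    A minimal sketch of the node-splitting step described above, assuming the clustering hyperplane of one class minimises the ratio of its squared distances to the two classes via a generalized eigenvalue problem; the regularization term and the normalization used for the angle bisectors are illustrative.

    import numpy as np
    from scipy.linalg import eig

    def clustering_hyperplane(A, B, reg=1e-6):
        # Hyperplane [w, b] close to the points in A and far from the points in B:
        # minimise ||A w + b||^2 / ||B w + b||^2 via the generalized eigenvalue
        # problem G v = lambda H v, keeping the eigenvector of the smallest eigenvalue.
        A1 = np.hstack([A, np.ones((A.shape[0], 1))])   # augment with a bias column
        B1 = np.hstack([B, np.ones((B.shape[0], 1))])
        G = A1.T @ A1 + reg * np.eye(A1.shape[1])        # small ridge term (assumption)
        H = B1.T @ B1 + reg * np.eye(B1.shape[1])
        eigvals, eigvecs = eig(G, H)
        return np.real(eigvecs[:, np.argmin(np.real(eigvals))])

    def angle_bisectors(h1, h2):
        # The two angle bisectors of the (unit-normalised) clustering hyperplanes;
        # the node would keep the one with the lower Gini impurity.
        h1 = h1 / np.linalg.norm(h1[:-1])
        h2 = h2 / np.linalg.norm(h2[:-1])
        return h1 + h2, h1 - h2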

    PHOTOGRAMMETRIC POINT CLOUD CLASSIFICATION BASED ON GEOMETRIC AND RADIOMETRIC DATA INTEGRATION

    The extraction of information from a point cloud is usually done after applying classification methods based on the geometric characteristics of the objects. However, photogrammetric point clouds can be classified using radiometric information combined with geometric information to minimize possible classification issues. With this in mind, this work proposes an approach for classifying photogrammetric point clouds generated by matching aerial images acquired by a Remotely Piloted Aircraft System (RPAS). The proposed approach consists of a supervised classification method based on a decision tree. Three data sets were used: one to define which attributes allow discrimination between the classes and to define the thresholds, and two for validation. Initially, several attributes were extracted from a training sample, and the mean and standard deviation of the attributes of each class were used to guide the definition of the decision tree. The defined decision tree was then applied to the other two point clouds to validate the approach and to assess thematic accuracy. The quantitative analyses of the classifications, based on the kappa coefficient of agreement and applied to both validation areas, reached values higher than 0.93.
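
    A minimal sketch of the evaluation pipeline described above, with randomly generated stand-in attributes (the actual work uses geometric and radiometric attributes whose thresholds are set from per-class means and standard deviations); attribute count, class count, and tree depth are assumptions.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import cohen_kappa_score

    rng = np.random.default_rng(42)
    n_points = 1000
    # Stand-in attributes, e.g. height, planarity, and radiometric means per point.
    X_train = rng.random((n_points, 5))
    y_train = rng.integers(0, 4, n_points)       # stand-in class labels
    X_valid = rng.random((n_points, 5))
    y_valid = rng.integers(0, 4, n_points)

    # The paper fixes thresholds by hand from per-class statistics; here a
    # learned tree stands in for that manually defined decision tree.
    tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_train, y_train)
    kappa = cohen_kappa_score(y_valid, tree.predict(X_valid))
    # On random data kappa is near zero; the paper reports > 0.93 on its validation areas.
    print(f"kappa on validation area: {kappa:.2f}")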

    Generalized Sparse Convolutional Neural Networks for Semantic Segmentation of Point Clouds Derived from Tri-Stereo Satellite Imagery

    We studied the applicability of point clouds derived from tri-stereo satellite imagery to semantic segmentation with generalized sparse convolutional neural networks, using an Austrian study area as an example. In particular, we examined whether the distorted geometric information, in addition to color, influences the performance of segmenting clutter, roads, buildings, trees, and vehicles. To this end, we trained a fully convolutional neural network that uses generalized sparse convolution once solely on 3D geometric information (i.e., the 3D point cloud derived by dense image matching) and twice on 3D geometric as well as color information; in the first of these two experiments we did not use class weights, whereas in the second we did. We compared the results with a fully convolutional neural network trained on a 2D orthophoto, and with a decision tree trained once on hand-crafted 3D geometric features and once on hand-crafted 3D geometric as well as color features. The decision tree using hand-crafted features has been successfully applied to aerial laser scanning data in the literature; hence, we compared our main technique of interest, a representation learning technique, with another representation learning technique and with a non-representation learning technique. Our study area is located in Waldviertel, a hilly region in Lower Austria covered mainly by forests, agriculture, and grasslands. Our classes of interest are heavily unbalanced; however, we did not use any data augmentation techniques to counter overfitting. For our study area, we report that adding color to the geometric information improves the performance of the Generalized Sparse Convolutional Neural Network (GSCNN) only on the dominant class, which leads to a higher overall performance in our case. We also found that training the network with median class weighting partially reverts the effect of adding color, and the network then also starts to learn the classes with lower occurrences. The fully convolutional neural network trained on the 2D orthophoto generally outperforms the other two, with a kappa score of over 90% and an average per-class accuracy of 61%; however, the decision tree trained on colors and hand-crafted geometric features has a 2% higher accuracy for roads.
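
    A minimal sketch of class weighting as it might be plugged into training the network above, assuming "median class weighting" means median-frequency balancing (weight = median class frequency divided by the class's frequency); the class order and counts are illustrative.

    import numpy as np
    import torch

    def median_frequency_weights(labels, num_classes):
        # Weight of a class = median class frequency / frequency of that class,
        # so rare classes get weights above 1 and the dominant class below 1.
        counts = np.bincount(labels, minlength=num_classes).astype(float)
        freq = counts / counts.sum()
        weights = np.median(freq[freq > 0]) / np.maximum(freq, 1e-12)
        return torch.tensor(weights, dtype=torch.float32)

    # Illustrative per-point labels for clutter, roads, buildings, trees, vehicles.
    labels = np.array([0] * 9000 + [1] * 500 + [2] * 300 + [3] * 150 + [4] * 50)
    weights = median_frequency_weights(labels, num_classes=5)
    criterion = torch.nn.CrossEntropyLoss(weight=weights)  # used in the segmentation loss
    print(weights)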

    Learning non-Higgsable gauge groups in 4D F-theory

    We apply machine learning techniques to solve a specific classification problem in 4D F-theory. For a divisor D on a given complex threefold base, we want to read out the non-Higgsable gauge group on it using local geometric information near D. The input features are the triple intersection numbers among divisors near D, and the output label is the non-Higgsable gauge group. We use decision trees to solve this problem and achieve 85%-98% out-of-sample accuracy for different classes of divisors, where the data sets are generated from toric threefold bases without (4,6) curves. We have explicitly generated a large number of analytic rules directly from the decision tree and proved a small number of them. As a cross-check, we applied these decision trees to bases with (4,6) curves as well and achieved high accuracies. Additionally, we have trained a decision tree to distinguish toric (4,6) curves. Finally, we present an application of these analytic rules to construct local base configurations with interesting gauge groups such as SU(3). Comment: 50 pages, 18 figures, 20 tables
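
    A minimal sketch of the classification setup described above with synthetic data: feature vectors standing in for triple intersection numbers near a divisor, gauge-group labels, a decision tree fit, and the raw threshold rules from which analytic rules could be read off. Feature count, value ranges, and the label set are assumptions.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(1)
    n_samples, n_features = 500, 10              # illustrative sizes
    X = rng.integers(-6, 7, size=(n_samples, n_features))   # stand-in intersection numbers
    y = rng.choice(["none", "SU(2)", "SU(3)", "G2", "E8"], size=n_samples)  # assumed label set

    tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
    # The threshold rules of the fitted tree are the starting point for the
    # kind of analytic rules mentioned in the abstract.
    print(export_text(tree, feature_names=[f"n{i}" for i in range(n_features)]))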

    A Decision-Tree-Based Balanced Bagging Model for Imbalanced-Class Datasets

    Classification algorithms are used very frequently alongside human needs, but previous studies have often run into obstacles when applying them. One of the most frequently encountered problems is the case of an imbalanced dataset. This study therefore proposes an ensemble method to address it; one well-known ensemble algorithm is bagging, and a balanced-bagging implementation is used to improve the capability of the bagging algorithm. The study compares three different classification models on five datasets with different imbalance ratios (IR), evaluated using balanced accuracy, geometric mean, and area under the curve (AUC). The first model performs classification using a Decision Tree (without Bagging), the second uses a Decision Tree (with Bagging), and the third uses a Decision Tree (with Balanced Bagging). Applying bagging and balanced bagging to the Decision Tree classifier improves balanced accuracy, geometric mean, and AUC. Overall, the Decision Tree + Balanced Bagging model yields the best performance on all datasets used.
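
    A minimal sketch of the three-model comparison described above, using scikit-learn and imbalanced-learn on a synthetic imbalanced dataset; the study itself uses five datasets with different imbalance ratios, so the data, estimator counts, and split here are illustrative.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import BaggingClassifier
    from sklearn.metrics import balanced_accuracy_score, roc_auc_score
    from imblearn.ensemble import BalancedBaggingClassifier   # default base learner is a decision tree
    from imblearn.metrics import geometric_mean_score

    # Synthetic two-class data with a 95:5 imbalance ratio.
    X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

    models = {
        "Decision Tree": DecisionTreeClassifier(random_state=0),
        "Decision Tree + Bagging": BaggingClassifier(n_estimators=50, random_state=0),
        "Decision Tree + Balanced Bagging": BalancedBaggingClassifier(n_estimators=50, random_state=0),
    }
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        proba = model.predict_proba(X_te)[:, 1]
        print(name,
              f"balanced accuracy={balanced_accuracy_score(y_te, pred):.3f}",
              f"g-mean={geometric_mean_score(y_te, pred):.3f}",
              f"AUC={roc_auc_score(y_te, proba):.3f}")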