10,631 research outputs found

    An Introduction to Recursive Partitioning: Rationale, Application and Characteristics of Classification and Regression Trees, Bagging and Random Forests

    Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Random forests in particular, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine and bioinformatics within the past few years. High-dimensional problems are common not only in genetics but also in some areas of psychological research, where only a few subjects can be measured due to time or cost constraints, yet a large amount of data is generated for each subject. Random forests have been shown to achieve high prediction accuracy in such applications and to provide descriptive variable importance measures reflecting the impact of each variable in both main effects and interactions. The aim of this work is to introduce the principles of the standard recursive partitioning methods as well as recent methodological improvements, to illustrate their usage for low- and high-dimensional data exploration, and to point out limitations of the methods and potential pitfalls in their practical application. Application of the methods is illustrated using freely available implementations in the R system for statistical computing.
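
The core idea of recursive partitioning can be sketched in a few lines. The following is a minimal illustration in Python, not the R implementations the abstract refers to: a regression tree on a single predictor that repeatedly picks the threshold minimizing the summed squared error of the two child nodes. The names `best_split`, `build_tree`, `predict` and the stopping parameters are assumptions made for this sketch.

```python
def sse(ys):
    """Sum of squared deviations from the mean (0.0 for an empty node)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(xs, ys):
    """Return (threshold, cost) of the best binary split, or None if xs is constant."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold halfway between neighbours
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        cost = sse(left) + sse(right)
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


def build_tree(xs, ys, min_split=4, max_depth=3, depth=0):
    """Grow a tree: a leaf is a float (node mean), a split is (threshold, left, right)."""
    split = best_split(xs, ys)
    if (depth >= max_depth or len(ys) < min_split
            or split is None or split[1] >= sse(ys)):
        return sum(ys) / len(ys)
    t = split[0]
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return (t,
            build_tree([x for x, _ in left], [y for _, y in left],
                       min_split, max_depth, depth + 1),
            build_tree([x for x, _ in right], [y for _, y in right],
                       min_split, max_depth, depth + 1))


def predict(tree, x):
    """Follow the splits until a leaf (a plain float) is reached."""
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree


# Toy step-shaped data, invented for illustration.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 1, 1, 1, 5, 5, 5, 5]
tree = build_tree(xs, ys)
```

Bagging and random forests build many such trees on bootstrap samples (and, for random forests, on random subsets of predictors) and average their predictions; that step is omitted here to keep the sketch short.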

    Decision Stream: Cultivating Deep Decision Trees

    Various modifications of decision trees have been used extensively in recent years due to their high efficiency and interpretability. Tree node splitting based on relevant feature selection is a key step of decision tree learning and, at the same time, its major shortcoming: recursive node partitioning leads to a geometric reduction of the amount of data in the leaf nodes, which causes excessive model complexity and overfitting. In this paper, we present a novel architecture, the Decision Stream, aimed at overcoming this problem. Instead of building a tree structure during the learning process, we propose merging nodes from different branches based on their similarity, estimated with two-sample test statistics, which generates a deep directed acyclic graph of decision rules that can consist of hundreds of levels. To evaluate the proposed solution, we test it on several common machine learning problems: credit scoring, Twitter sentiment analysis, aircraft flight control, MNIST and CIFAR image classification, and synthetic data classification and regression. Our experimental results reveal that the proposed approach significantly outperforms standard decision tree learning methods on both regression and classification tasks, yielding a prediction error decrease of up to 35%.
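
The node-merging step can be illustrated with a small sketch. The abstract does not say which two-sample statistic is used, so the code below assumes Welch's t statistic as a stand-in, and the greedy merging loop, the function names and the threshold of 2.0 are likewise assumptions made for illustration rather than the authors' algorithm.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic for the target values of two nodes."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0.0:
        # Both samples are constant: identical means count as "no difference".
        return 0.0 if diff == 0 else float("inf")
    return diff / se


def merge_similar_nodes(nodes, threshold=2.0):
    """Greedily merge nodes whose samples are not clearly different.

    `nodes` is a list of lists of target values, one list per leaf node.
    A node joins the first existing group whose |t| is below `threshold`;
    otherwise it starts a new group.
    """
    merged = []
    for samples in nodes:
        for group in merged:
            if abs(welch_t(group, samples)) < threshold:
                group.extend(samples)
                break
        else:
            merged.append(list(samples))
    return merged


# Three hypothetical leaves from different branches; the first and third
# hold similar target values and end up in one merged node.
leaves = [
    [1.0, 1.2, 0.9, 1.1],
    [5.0, 5.2, 4.9, 5.1],
    [1.05, 0.95, 1.15, 1.0],
]
groups = merge_similar_nodes(leaves)
```

Merging leaves from different branches gives the merged node more data than either leaf had alone, which is the effect the abstract credits with countering the geometric shrinkage of leaf samples.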

    Novel proposal for prediction of CO2 course and occupancy recognition in Intelligent Buildings within IoT

    Many direct and indirect methods, processes, and sensors available on the market today are used to monitor the occupancy of selected Intelligent Building (IB) premises and the living activities of IB residents. Recognizing the occupancy of individual spaces in an IB allows the building to be automated optimally while saving energy. This article proposes a novel method of indirect occupancy monitoring using CO2, temperature, and relative humidity, obtained from standard operating measurements with the KNX (Konnex (standard EN 50090, ISO/IEC 14543)) technology, to monitor laboratory room occupancy in an intelligent building within the Internet of Things (IoT). The article further describes the design and creation of a software (SW) tool that connects the KNX technology to the IoT IBM Watson platform in real time, using the Message Queuing Telemetry Transport (MQTT) protocol to store and visualize the measured values, with data storage in a CouchDB-type database. As part of the proposed occupancy determination method, the course of CO2 concentration was predicted from the measured temperature and relative humidity values using Linear Regression, Neural Networks, and Random Tree (in IBM SPSS Modeler) with an accuracy higher than 90%. To increase the accuracy of the prediction, the newly designed method suppresses additive noise in the predicted CO2 signal using the least mean squares (LMS) algorithm in an adaptive filtering (AF) method. In selected experiments, the prediction accuracy with LMS adaptive filtering was better than 95%. Web of Science 1223 art. no. 454
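
For readers unfamiliar with LMS adaptive filtering, here is a minimal sketch of the generic LMS weight update in Python. It is not the authors' pipeline: the synthetic input signal, the two-tap target filter, the step size `mu` and the function name `lms_filter` are all assumptions chosen so that the example is self-contained and noise-free.

```python
import random


def lms_filter(x, d, order=2, mu=0.1):
    """Adapt FIR weights w so that w . [x[n], x[n-1], ...] tracks d[n].

    Returns the final weights, the filter outputs and the error signal.
    """
    w = [0.0] * order
    outputs, errors = [], []
    for n in range(len(x)):
        # Most recent `order` input samples, zero-padded at the start.
        window = [x[n - k] if n - k >= 0 else 0.0 for k in range(order)]
        y = sum(wk * xk for wk, xk in zip(w, window))
        e = d[n] - y
        # LMS update: move the weights along the instantaneous error gradient.
        w = [wk + mu * e * xk for wk, xk in zip(w, window)]
        outputs.append(y)
        errors.append(e)
    return w, outputs, errors


# Synthetic demonstration: the desired signal is the input passed through
# a known two-tap filter, so the adapted weights should approach true_w.
rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(2000)]
true_w = [0.5, -0.3]
d = [true_w[0] * x[n] + (true_w[1] * x[n - 1] if n > 0 else 0.0)
     for n in range(len(x))]
w, y, e = lms_filter(x, d)
```

In a noise-suppression setting such as the one described, the filter output or the error signal (depending on how the reference input is chosen) serves as the cleaned signal; the step size trades convergence speed against stability.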

    Optimization in a Simulation Setting: Use of Function Approximation in Debt Strategy Analysis

    The stochastic simulation model suggested by Bolder (2003) for the analysis of the federal government's debt-management strategy provides a wide variety of useful information. It does not, however, assist in determining an optimal debt-management strategy for the government in its current form. Including optimization in the debt-strategy model would be useful, since it could substantially broaden the range of policy questions that can be addressed. Finding such an optimal strategy is nonetheless complicated by two challenges. First, performing optimization with traditional techniques in a simulation setting is computationally intractable. Second, it is necessary to define precisely what one means by an "optimal" debt strategy. The authors detail a possible approach for addressing these two challenges. They address the first challenge by approximating the numerically computed objective function using a function-approximation technique. They consider the use of ordinary least squares, kernel regression, multivariate adaptive regression splines, and projection-pursuit regressions as approximation algorithms. The second challenge is addressed by proposing a wide range of possible government objective functions and examining them in the context of an illustrative example. The authors' view is that the approach permits debt and fiscal managers to address a number of policy questions that could not be fully addressed with the current stochastic simulation engine. Debt management; Econometric and statistical methods; Fiscal policy; Financial markets
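
The function-approximation idea can be sketched as follows: evaluate an expensive simulation at a handful of strategies, fit a cheap surrogate to those evaluations, and optimize the surrogate instead. The Python example below uses Gaussian-kernel (Nadaraya-Watson) regression, one of the techniques the abstract lists. The scalar strategy parameter, the quadratic `toy_simulation` standing in for the debt model, and the bandwidth are hypothetical choices for illustration only.

```python
from math import exp


def kernel_regression(xs, ys, bandwidth):
    """Return a Nadaraya-Watson estimator with a Gaussian kernel."""
    def estimate(x):
        weights = [exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in xs]
        return sum(w * y for w, y in zip(weights, ys)) / sum(weights)
    return estimate


def approximate_optimum(simulate, strategies, candidates, bandwidth=0.1):
    """Fit a surrogate to simulated costs and minimize it over `candidates`."""
    costs = [simulate(s) for s in strategies]  # the expensive step
    surrogate = kernel_regression(strategies, costs, bandwidth)
    return min(candidates, key=surrogate)      # cheap search on the surrogate


def toy_simulation(s):
    """Hypothetical stand-in for a simulation run: cost is lowest at s = 0.3."""
    return (s - 0.3) ** 2


strategies = [i / 20 for i in range(21)]    # 21 simulated strategies
candidates = [i / 100 for i in range(101)]  # finer grid searched on the surrogate
best = approximate_optimum(toy_simulation, strategies, candidates)
```

The simulation is run only at the coarse set of strategies; every other evaluation during the search hits the surrogate, which is what makes optimization feasible when each simulation run is costly.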