
    An Introduction to Recursive Partitioning: Rationale, Application and Characteristics of Classification and Regression Trees, Bagging and Random Forests

    Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Random forests in particular, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine and bioinformatics within the past few years. High-dimensional problems are common not only in genetics but also in some areas of psychological research, where only a few subjects can be measured due to time or cost constraints, yet a large amount of data is generated for each subject. Random forests have been shown to achieve high prediction accuracy in such applications and to provide descriptive variable importance measures reflecting the impact of each variable in both main effects and interactions. The aim of this work is to introduce the principles of the standard recursive partitioning methods as well as recent methodological improvements, to illustrate their usage for low- and high-dimensional data exploration, and to point out limitations of the methods and potential pitfalls in their practical application. Application of the methods is illustrated using freely available implementations in the R system for statistical computing.
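
The bagging idea named in the title above can be illustrated with a minimal sketch. It is not taken from the paper, which uses R implementations; it uses only the Python standard library, a single predictor, and one-split regression trees ("stumps"). All function names (`fit_stump`, `fit_bagged_stumps`, `predict_bagged`) and the toy data are hypothetical and chosen for illustration only.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single predictor.

    Returns (threshold, left_mean, right_mean), choosing the split
    that minimises the summed squared error of the two child nodes.
    """
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # Every x in this sample is identical: predict a constant.
        constant = mean(ys)
        return (float("inf"), constant, constant)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_bagged_stumps(xs, ys, n_trees=25, seed=0):
    """Bagging: fit each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_bagged(stumps, x):
    """Average the predictions of all stumps in the ensemble."""
    return mean(predict_stump(s, x) for s in stumps)


# Toy data (illustrative only): a step from 1.0 to 9.0 at x = 4.5.
xs = list(range(10))
ys = [1.0] * 5 + [9.0] * 5
ensemble = fit_bagged_stumps(xs, ys)
low, high = predict_bagged(ensemble, 0), predict_bagged(ensemble, 9)
```

A random forest extends this scheme by growing deeper trees and considering only a random subset of predictors at each split; that step is omitted here because the sketch has a single predictor.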

    Developing Decision Tree Models to Create a Predictive Blockage Likelihood Model for Real-World Wastewater Networks

    To reduce blockages on wastewater networks, and with them costs and customer and environmental impact, water and sewerage companies are conducting greater levels of proactive maintenance. Effective prioritisation of this maintenance requires an accurate model of blockage likelihood. This paper presents the development of a model that provides a blockage likelihood level, and its verification using unseen data. The model builds on previous decision tree models constructed using the asset and historical incident data from the wastewater network of Dŵr Cymru Welsh Water. It has been developed here using the geographical grouping of sewers and the application of ensemble techniques, with the results illustrating the potential benefits which can be derived from these techniques. The work has been conducted as part of a Knowledge Transfer Partnership (KTP) with funding provided by Innovate UK and Dŵr Cymru Welsh Water (DCWW), working in collaboration with the University of Exeter's Centre for Water Systems (CWS).

    Predicting the energy output of wind farms based on weather data: important variables and their correlation

    Pre-print available at: http://arxiv.org/abs/1109.1922. Wind energy plays an increasing role in the supply of energy worldwide. The energy output of a wind farm is highly dependent on the weather conditions present at its site. If the output can be predicted more accurately, energy suppliers can coordinate the collaborative production of different energy sources more efficiently to avoid costly overproduction. In this paper, we take a computer science perspective on energy prediction based on weather data and analyze the important parameters as well as their correlation with the energy output. To deal with the interaction of the different parameters, we use symbolic regression based on the genetic programming tool DataModeler. Our studies are carried out on publicly available weather and energy data for a wind farm in Australia. We report on the correlation of the different variables with the energy output. The model obtained for energy prediction gives a very reliable prediction of the energy output for newly supplied weather data. © 2012 Elsevier Ltd. Ekaterina Vladislavleva, Tobias Friedrich, Frank Neumann, Markus Wagne

    Improved prediction of clay soil expansion using machine learning algorithms and meta-heuristic dichotomous ensemble classifiers

    Soil swelling-related disasters are considered among the most devastating geo-hazards in modern history. Hence, proper determination of a soil's ability to expand is vital for achieving secure and safe ground for infrastructure. Accordingly, this study provides a novel and intelligent approach that enables an improved estimation of swelling by using kernelised machines (Bayesian linear regression (BLR), Bayes point machine (BPM), support vector machine (SVM) and deep-support vector machine (D-SVM)); multiple linear regressor (REG), logistic regressor (LR) and artificial neural network (ANN); and tree-based algorithms such as decision forest (RDF) and boosted trees (BDT). In addition, and for the first time, meta-heuristic classifiers incorporating the techniques of voting (VE) and stacking (SE) were utilised. Different independent scenarios of combinations of explanatory features that influence soil swelling behaviour were investigated. Preliminary results indicated BLR as possessing the highest amount of deviation from the predictor variable (the actual swell-strain). REG and BLR performed slightly better than ANN, while the meta-heuristic learners (VE and SE) produced the best overall performance (greatest R² value of 0.94 and RMSE of 0.06% exhibited by VE). CEC, plasticity index and moisture content were the features considered to have the highest level of importance. Kernelised binary classifiers (SVM, D-SVM and BPM) gave better accuracy (average accuracy and recall rate of 0.93 and 0.60) compared to ANN, LR and RDF. A sensitivity-driven diagnostic test indicated that the meta-heuristic models' best performance occurred when ML training was conducted using the k-fold validation technique. Finally, it is recommended that the concepts developed herein be deployed during the preliminary phases of a geotechnical or geological site characterisation by using the best-performing meta-heuristic models via their background coding resource.