An Introduction to Recursive Partitioning: Rationale, Application and Characteristics of Classification and Regression Trees, Bagging and Random Forests

Author: Malley James
Strobl Carolin
Tutz Gerhard
Publication date: 01/04/2009

Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Random forests in particular, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine and bioinformatics within the past few years. High-dimensional problems are common not only in genetics, but also in some areas of psychological research, where only a few subjects can be measured due to time or cost constraints, yet a large amount of data is generated for each subject. Random forests have been shown to achieve high prediction accuracy in such applications, and provide descriptive variable importance measures reflecting the impact of each variable in both main effects and interactions. The aim of this work is to introduce the principles of the standard recursive partitioning methods as well as recent methodological improvements, to illustrate their usage for low- and high-dimensional data exploration, and to point out limitations of the methods and potential pitfalls in their practical application. Application of the methods is illustrated using freely available implementations in the R system for statistical computing.

Crossref

Open Access LMU

PubMed Central

Developing Decision Tree Models to Create a Predictive Blockage Likelihood Model for Real-World Wastewater Networks

Author: Bailey J
Djordjevic S
Harris E
Kapelan Z
Keedwell E
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016

To reduce blockages on wastewater networks, and with them costs and customer and environmental impact, water and sewerage companies are conducting greater levels of proactive maintenance. For effective prioritisation of this maintenance, an accurate model of blockage likelihood is required. This paper presents the development of a model that provides a blockage likelihood level, verified using unseen data, based on previous decision tree models constructed using the asset and historical incident data from the wastewater network of Dŵr Cymru Welsh Water. The model has been developed here using the geographical grouping of sewers and the application of ensemble techniques, with the results illustrating the potential benefits which can be derived from these techniques. The work has been conducted as part of a Knowledge Transfer Partnership (KTP) with funding provided by Innovate UK and Dŵr Cymru Welsh Water (DCWW), working in collaboration with the University of Exeter’s Centre for Water Systems (CWS).

Elsevier - Publisher Connector

Crossref

Open Research Exeter

Some fundamental issues in ensemble methods

Author: Wang W.
Publication date: 01/01/2008

University of East Anglia digital repository

Predicting the energy output of wind farms based on weather data: important variables and their correlation

Author: Ekaterina Vladislavleva
Tobias Friedrich
Frank Neumann
Markus Wagner
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013

Pre-print available at: http://arxiv.org/abs/1109.1922

Wind energy plays an increasing role in the supply of energy worldwide. The energy output of a wind farm is highly dependent on the weather conditions present at its site. If the output can be predicted more accurately, energy suppliers can coordinate the collaborative production of different energy sources more efficiently to avoid costly overproduction. In this paper, we take a computer science perspective on energy prediction based on weather data and analyze the important parameters as well as their correlation with the energy output. To deal with the interaction of the different parameters, we use symbolic regression based on the genetic programming tool DataModeler. Our studies are carried out on publicly available weather and energy data for a wind farm in Australia. We report on the correlation of the different variables with the energy output. The model obtained for energy prediction gives a very reliable prediction of the energy output for newly supplied weather data. © 2012 Elsevier Ltd.

Crossref

Adelaide Research & Scholarship

Improved prediction of clay soil expansion using machine learning algorithms and meta-heuristic dichotomous ensemble classifiers

Author: Abbey S. J.
Eyo E. U.
Lawrence T. T.
Tetteh F. K.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2022

Soil swelling-related disasters are considered among the most devastating geo-hazards in modern history. Hence, proper determination of a soil's ability to expand is vital for achieving secure and safe ground for infrastructure. Accordingly, this study provides a novel and intelligent approach that enables an improved estimation of swelling by using kernelised machines (Bayesian linear regression (BLR), Bayes point machine (BPM), support vector machine (SVM) and deep support vector machine (D-SVM)); multiple linear regressor (REG), logistic regressor (LR) and artificial neural network (ANN); and tree-based algorithms such as decision forest (RDF) and boosted trees (BDT). Also, for the first time, meta-heuristic classifiers incorporating the techniques of voting (VE) and stacking (SE) were utilised. Different independent scenarios of combinations of explanatory features that influence soil behaviour in swelling were investigated. Preliminary results indicated BLR as possessing the highest amount of deviation from the predictor variable (the actual swell-strain). REG and BLR performed slightly better than ANN, while the meta-heuristic learners (VE and SE) produced the best overall performance (greatest R² value of 0.94 and RMSE of 0.06% exhibited by VE). CEC, plasticity index and moisture content were the features considered to have the highest level of importance. Kernelised binary classifiers (SVM, D-SVM and BPM) gave better accuracy (average accuracy and recall rate of 0.93 and 0.60) compared to ANN, LR and RDF. A sensitivity-driven diagnostic test indicated that the meta-heuristic models’ best performance occurred when ML training was conducted using the k-fold validation technique. Finally, it is recommended that the concepts developed herein be deployed during the preliminary phases of a geotechnical or geological site characterisation by using the best performing meta-heuristic models via their background coding resource.

Directory of Open Access Journals

UWE Bristol Research Repository

Coventry University Pure Portal