
    Towards a theory of heuristic and optimal planning for sequential information search

    Fitting Prediction Rule Ensembles with R Package pre

    Prediction rule ensembles (PREs) are sparse collections of rules, offering highly interpretable regression and classification models. This paper presents the R package pre, which derives PREs through the methodology of Friedman and Popescu (2008). The implementation and functionality of package pre are described and illustrated through application to a dataset on the prediction of depression. Furthermore, the accuracy and sparsity of PREs are compared with those of single trees, random forests and lasso regression on four benchmark datasets. Results indicate that pre derives ensembles with predictive accuracy comparable to that of random forests, while using a smaller number of variables for prediction.

    An Introduction to Recursive Partitioning: Rationale, Application and Characteristics of Classification and Regression Trees, Bagging and Random Forests

    Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Random forests in particular, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine and bioinformatics within the past few years. High-dimensional problems are common not only in genetics, but also in some areas of psychological research, where only a few subjects can be measured due to time or cost constraints, yet a large amount of data is generated for each subject. Random forests have been shown to achieve high prediction accuracy in such applications, and provide descriptive variable importance measures reflecting the impact of each variable in both main effects and interactions. The aim of this work is to introduce the principles of the standard recursive partitioning methods as well as recent methodological improvements, to illustrate their usage for low- and high-dimensional data exploration, and also to point out limitations of the methods and potential pitfalls in their practical application. Application of the methods is illustrated using freely available implementations in the R system for statistical computing.

    The application of predictive modelling for determining bio-environmental factors affecting the distribution of blackflies (Diptera: Simuliidae) in the Gilgel Gibe watershed in Southwest Ethiopia

    Blackflies are an important macroinvertebrate group from both a public health and an ecological point of view. Determining the biological and environmental factors that favour or inhibit the existence of blackflies could facilitate biomonitoring of rivers as well as control of disease vectors. The combined use of different predictive modelling techniques is known to improve identification of presence/absence and abundance of taxa in a given habitat. This approach enables better identification of the suitable habitat conditions or environmental constraints of a given taxon. Simuliidae larvae are important biological indicators as they are abundant in tropical aquatic ecosystems. Some blackfly groups are also important disease vectors in poor tropical countries. Our investigations aim to establish a combination of models able to identify the environmental factors and macroinvertebrate organisms that favour or inhibit the existence of blackfly larvae in aquatic ecosystems. The models developed using macroinvertebrate predictors showed better performance than those based on environmental predictors. The identified environmental and macroinvertebrate parameters can be used to determine the distribution of blackflies, which in turn can help control river blindness in tropical areas where it is endemic. Through a combination of modelling techniques, a reliable method has been developed that explains environmental and biological relationships with the target organism and thus can serve as a decision support tool for ecological management strategies.