
    A comparison of four data selection methods for artificial neural networks and support vector machines

    The performance of data-driven models such as Artificial Neural Networks and Support Vector Machines relies to a good extent on selecting proper data throughout the design phase. This paper presents a comparison of four unsupervised data selection methods: random, convex-hull-based, entropy-based, and a hybrid method combining convex hull and entropy. These methods were evaluated on eight benchmarks covering classification and regression problems. Support Vector Machines were used for classification, while Multi-Layer Perceptrons were employed for the regression problems. Additionally, for each problem type, a non-dominated set of Radial Basis Function Neural Networks (RBFNNs) was designed using a Multi-Objective Genetic Algorithm (MOGA). The simulation results showed that the convex-hull-based method and the hybrid method obtain better performance than the other methods, and that MOGA-designed RBFNNs always perform better than the other models. (C) 2017, IFAC (International Federation of Automatic Control) Hosting by Elsevier Ltd. All rights reserved. Funded by FCT through IDMEC, under LAETA grant [UID/EMS/50022/2013].
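    A minimal sketch of what convex-hull-based training-data selection can look like, assuming NumPy inputs X and targets y; the function name, the n_extra padding parameter, and the random padding of interior points are illustrative choices here, not the paper's exact algorithm, and the entropy-based and hybrid criteria are not reproduced.

```python
import numpy as np
from scipy.spatial import ConvexHull

def select_convex_hull(X, y, n_extra=50, rng=None):
    """Keep the convex-hull vertices of X (the boundary of the input domain)
    and pad with randomly chosen interior points up to the desired size."""
    rng = np.random.default_rng(rng)
    hull_idx = np.unique(ConvexHull(X).vertices)          # boundary samples
    interior = np.setdiff1d(np.arange(len(X)), hull_idx)  # remaining samples
    extra = rng.choice(interior, size=min(n_extra, len(interior)), replace=False)
    selected = np.concatenate([hull_idx, extra])
    return X[selected], y[selected]

# Example: pick training data for a classifier from 1000 random 2-D points.
X = np.random.default_rng(0).normal(size=(1000, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_train, y_train = select_convex_hull(X, y, n_extra=100)
```

    The intuition is that hull vertices bound the input domain, so models trained on them are less likely to extrapolate at test time; note that exact convex hulls become expensive in high dimensions.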

    An Introduction to Recursive Partitioning: Rationale, Application and Characteristics of Classification and Regression Trees, Bagging and Random Forests

    Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Random forests in particular, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine and bioinformatics within the past few years. High-dimensional problems are common not only in genetics, but also in some areas of psychological research, where only a few subjects can be measured due to time or cost constraints, yet a large amount of data is generated for each subject. Random forests have been shown to achieve high prediction accuracy in such applications, and they provide descriptive variable importance measures reflecting the impact of each variable through both main effects and interactions. The aim of this work is to introduce the principles of the standard recursive partitioning methods as well as recent methodological improvements, to illustrate their usage for low- and high-dimensional data exploration, and also to point out limitations of the methods and potential pitfalls in their practical application. Application of the methods is illustrated using freely available implementations in the R system for statistical computing.
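    A minimal sketch of the kind of analysis the abstract describes; the paper points to R implementations, so the Python scikit-learn code below is only an analogous illustration, and the simulated "few subjects, many predictors" data are invented here purely for demonstration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_subjects, n_predictors = 100, 1000          # high-dimensional setting: p >> n
X = rng.normal(size=(n_subjects, n_predictors))
# Only the first two predictors actually drive the (binary) outcome.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n_subjects) > 0).astype(int)

forest = RandomForestClassifier(n_estimators=500, random_state=0)
forest.fit(X, y)

# Descriptive variable importances: predictors 0 and 1 should rank near the top,
# reflecting their contribution through main effects (and any interactions).
top = np.argsort(forest.feature_importances_)[::-1][:5]
print("Top predictors by importance:", top)
```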

    Monte Carlo Implementation of Gaussian Process Models for Bayesian Regression and Classification

    Gaussian processes are a natural way of defining prior distributions over functions of one or more input variables. In a simple nonparametric regression problem, where such a function gives the mean of a Gaussian distribution for an observed response, a Gaussian process model can easily be implemented using matrix computations that are feasible for datasets of up to about a thousand cases. Hyperparameters that define the covariance function of the Gaussian process can be sampled using Markov chain methods. Regression models where the noise has a t distribution, and logistic or probit models for classification applications, can be implemented by additionally sampling the latent values underlying the observations. Software is now available that implements these methods using covariance functions with hierarchical parameterizations. Models defined in this way can discover high-level properties of the data, such as which inputs are relevant to predicting the response.
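    A minimal sketch of the matrix computations the abstract refers to, assuming a squared-exponential covariance with fixed hyperparameters on 1-D inputs; the Markov chain sampling of hyperparameters and latent values described in the paper is not implemented here, and the function names are illustrative.

```python
import numpy as np

def sq_exp_cov(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = A[:, None] - B[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var=0.1):
    """Posterior mean and variance of the latent function at x_test."""
    K = sq_exp_cov(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = sq_exp_cov(x_train, x_test)
    L = np.linalg.cholesky(K)               # O(n^3): feasible up to ~1000 cases
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(sq_exp_cov(x_test, x_test)) - np.sum(v ** 2, axis=0)
    return mean, var

# Example: noisy observations of a sine function, predicted on a finer grid.
x = np.linspace(0, 5, 50)
y = np.sin(x) + 0.1 * np.random.default_rng(0).normal(size=50)
mean, var = gp_posterior(x, y, np.linspace(0, 5, 200))
```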