
    Mean field variational Bayesian inference for support vector machine classification

    A mean field variational Bayes approach to support vector machines (SVMs) using the latent variable representation of Polson & Scott (2012) is presented. This representation circumvents many of the shortcomings associated with classical SVMs, enabling automatic penalty parameter selection and the handling of dependent samples, missing data and variable selection. We demonstrate on simulated and real datasets that our approach is easily extendable to non-standard situations and outperforms the classical SVM approach whilst remaining computationally efficient. Comment: 18 pages, 4 figures
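    As we understand Polson & Scott's construction (the abstract itself does not state it), the latent variable representation rests on the identity exp(-2 max(1 - m, 0)) = ∫_0^∞ (2πλ)^(-1/2) exp(-(1 + λ - m)^2 / (2λ)) dλ with margin m = y x'β, so that, given the latent λ, each SVM pseudo-likelihood term is Gaussian in β. The Python sketch below only checks this identity numerically for a few margins; the function names and quadrature settings are illustrative choices, not the authors' code.

```python
import math


def hinge_pseudo_likelihood(margin):
    """Left-hand side: exp(-2 * max(1 - margin, 0)), with margin = y * x^T beta."""
    return math.exp(-2.0 * max(1.0 - margin, 0.0))


def latent_mixture(margin, t_max=12.0, n_steps=40_000):
    """Right-hand side: integral over lambda > 0 of
    (2*pi*lambda)^(-1/2) * exp(-(1 + lambda - margin)^2 / (2*lambda)).

    Substituting lambda = t^2 removes the lambda^(-1/2) singularity at 0 and
    gives the integrand (2 / sqrt(2*pi)) * exp(-(a + t^2)^2 / (2 t^2)) with
    a = 1 - margin. The integral is approximated with the midpoint rule on
    [0, t_max]; the tail beyond t_max is negligible for moderate margins.
    """
    a = 1.0 - margin
    h = t_max / n_steps
    total = 0.0
    for k in range(n_steps):
        t = (k + 0.5) * h  # midpoints never evaluate t = 0
        total += math.exp(-((a + t * t) ** 2) / (2.0 * t * t))
    return 2.0 / math.sqrt(2.0 * math.pi) * total * h


if __name__ == "__main__":
    for m in (-2.0, 0.5, 1.0, 4.0):
        print(m, hinge_pseudo_likelihood(m), latent_mixture(m))
```

Because the conditional term is Gaussian in β once λ is fixed, mean field updates for β and for the λ's can alternate in closed form, which is what makes the variational approach tractable.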

    Bayesian Approximate Kernel Regression with Variable Selection

    Nonlinear kernel regression models are often used in statistics and machine learning because they are often more accurate than linear models. Variable selection for kernel regression models is a challenge partly because, unlike the linear regression setting, there is no clear concept of an effect size for regression coefficients. In this paper, we propose a novel framework that provides an effect size analog of each explanatory variable for Bayesian kernel regression models when the kernel is shift-invariant, for example the Gaussian kernel. We use functional-analytic properties of shift-invariant reproducing kernel Hilbert spaces (RKHS) to define a linear vector space that: (i) captures nonlinear structure, and (ii) can be projected onto the original explanatory variables. The projection onto the original explanatory variables serves as an analog of effect sizes. The specific functional-analytic property we use is that shift-invariant kernel functions can be approximated via random Fourier bases. Based on the random Fourier expansion, we propose a computationally efficient class of Bayesian approximate kernel regression (BAKR) models for both nonlinear regression and binary classification for which one can compute an analog of effect sizes. We illustrate the utility of BAKR by examining two important problems in statistical genetics: genomic selection (i.e., phenotypic prediction) and association mapping (i.e., inference of significant variants or loci). State-of-the-art methods for genomic selection and association mapping are based on kernel regression and linear models, respectively. BAKR is the first method that is competitive in both settings. Comment: 22 pages, 3 figures, 3 tables; theory added; new simulations presented; references added
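    A minimal Python sketch of the two ingredients named above, assuming NumPy: random Fourier features that approximate the Gaussian kernel, and a projection of the nonlinear fitted values onto the original explanatory variables via the Moore-Penrose pseudoinverse as the effect size analog. The ridge fit in feature space is a hypothetical stand-in for full Bayesian posterior inference (it coincides with the posterior mean under a Gaussian prior on the weights with matching variances), and the names `bandwidth`, `num_features` and `penalty` are illustrative, not BAKR's actual interface.

```python
import numpy as np


def gaussian_kernel(X, bandwidth):
    """Exact Gaussian kernel k(x, x') = exp(-||x - x'||^2 / (2 * bandwidth^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))


def random_fourier_features(X, num_features, bandwidth, rng):
    """Map X (n x p) to Z (n x D) so that Z @ Z.T approximates gaussian_kernel(X)."""
    p = X.shape[1]
    # Frequencies are drawn from the Gaussian kernel's spectral density
    # N(0, I / bandwidth^2); phases are uniform on [0, 2*pi).
    W = rng.normal(scale=1.0 / bandwidth, size=(p, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)


def fit_feature_weights(Z, y, penalty):
    """Ridge solution in feature space, written in dual form (an n x n solve):
    (Z^T Z + penalty I)^{-1} Z^T y == Z^T (Z Z^T + penalty I)^{-1} y."""
    n = Z.shape[0]
    return Z.T @ np.linalg.solve(Z @ Z.T + penalty * np.eye(n), y)


def effect_size_analog(X, fitted):
    """Project nonlinear fitted values onto the columns of X with the
    Moore-Penrose pseudoinverse, giving one coefficient per variable."""
    return np.linalg.pinv(X) @ fitted


# Usage on synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=200)
Z = random_fourier_features(X, num_features=1000, bandwidth=2.0, rng=rng)
theta = fit_feature_weights(Z, y, penalty=1.0)
beta_tilde = effect_size_analog(X, Z @ theta)  # one value per column of X
```

The projection is linear in the fitted values, so if the fitted function were exactly linear in X it would return the ordinary regression coefficients; for nonlinear fits it summarizes each variable's linear share of the function.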

    Support Vector Regression Method for Wind Speed Prediction Incorporating Probability Prior Knowledge

    Prior knowledge, such as the wind speed probability distribution estimated from historical data and the fluctuation of wind speed between its maximum and minimum over a given period, provides substantial information beyond the raw wind speed series, so it should be incorporated into wind speed prediction. First, a method for estimating the wind speed probability distribution from historical data is proposed based on Bernoulli’s law of large numbers. Second, to describe the fluctuation of wind speed between its maximum and minimum over a given period, the probability distribution estimated by the proposed method is incorporated into the training and testing data. Third, a support vector regression model for wind speed prediction is built on standard support vector regression. Finally, wind speed prediction experiments on a wind farm show that the proposed method is feasible and effective, and that the model’s running time and prediction errors meet the requirements of wind speed prediction.
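    The first two steps above (relative-frequency estimation and augmenting the data with the estimated probabilities) can be sketched in Python with NumPy. The abstract does not say exactly which probabilities are appended, so this sketch assumes, hypothetically, that each lagged speed in a window is accompanied by the estimated probability of its speed bin; the bin edges, window length and synthetic Weibull data are illustrative only.

```python
import numpy as np


def estimate_bin_probabilities(history, bin_edges):
    """Relative frequency of each speed bin in the historical record.
    By Bernoulli's law of large numbers these frequencies converge to the
    bin probabilities as the record grows."""
    counts, _ = np.histogram(history, bins=bin_edges)
    return counts / counts.sum()


def probability_of(speeds, bin_edges, probs):
    """Estimated probability of the bin each speed falls in (values outside
    the edges are clipped to the first or last bin)."""
    idx = np.searchsorted(bin_edges, speeds, side="right") - 1
    idx = np.clip(idx, 0, len(probs) - 1)
    return probs[idx]


def make_augmented_dataset(series, window, bin_edges, probs):
    """Build (features, target) pairs: the last `window` speeds plus the
    estimated probability of each of them, predicting the next speed."""
    rows, targets = [], []
    for t in range(window, len(series)):
        lagged = series[t - window:t]
        extra = probability_of(lagged, bin_edges, probs)
        rows.append(np.concatenate([lagged, extra]))
        targets.append(series[t])
    return np.array(rows), np.array(targets)


# Usage on synthetic speeds in m/s (illustrative only, not real wind farm data).
rng = np.random.default_rng(0)
history = rng.weibull(2.0, size=5000) * 8.0
edges = np.linspace(0.0, 30.0, 31)
probs = estimate_bin_probabilities(history, edges)
X_train, y_train = make_augmented_dataset(history[-500:], 6, edges, probs)
```

The resulting `X_train` and `y_train` could then be passed to any ε-SVR implementation, for example scikit-learn's `sklearn.svm.SVR`; that third step is omitted here to keep the sketch dependency-light.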

    Priori Information Based Support Vector Regression and Its Applications

    To extract the a priori information (PI) contained in real monitored values of peak particle velocity (PPV) and improve the accuracy of PPV prediction, a PI-based support vector regression (SVR) method is established. First, to extract the PI in the monitored data mathematically, the probability density of PPV is estimated with ε-SVR. Second, to make full use of the PI about the fluctuation of PPV between its maximum and minimum over a given period, the probability density estimated with ε-SVR is incorporated into the training data, increasing their dimensionality. Third, using the higher-dimensional training data, a PPV prediction method called PI-ε-SVR is proposed. Finally, comparative experiments on PPV values collected during underwater blasting at Dajin Island, Taishan nuclear power station, China, demonstrate the effectiveness of the proposed method.

    Bayesian Approaches to Copula Modelling

    Copula models have become one of the most widely used tools in the applied modelling of multivariate data. Similarly, Bayesian methods are increasingly used to obtain efficient likelihood-based inference. However, to date, there has been only limited use of Bayesian approaches in the formulation and estimation of copula models. This article aims to address this shortcoming in two ways. First, it introduces copula models and the aspects of copula theory that are especially relevant for a Bayesian analysis. Second, it outlines Bayesian approaches to formulating and estimating copula models, and their advantages over alternative methods. Copulas covered include Archimedean copulas, copulas constructed by inversion, and vine copulas, along with their interpretation as transformations. A number of parameterisations of the correlation matrix of a Gaussian copula are considered, along with hierarchical priors that allow for Bayesian selection and model averaging for each parameterisation. Markov chain Monte Carlo sampling schemes for fitting Gaussian and D-vine copulas, with and without selection, are given in detail. The relationship between the prior for the parameters of a D-vine and the prior for the correlation matrix of a Gaussian copula is discussed. Finally, it is shown how to perform Bayesian inference when the data are discrete-valued, using data augmentation. This approach generalises popular Bayesian methods for estimating models for multivariate binary and other ordinal data to more general copula models. Bayesian data augmentation has substantial advantages over other estimation methods for this class of models.
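    Two ingredients mentioned above can be made concrete with a small Python sketch, assuming NumPy and the standard library's `statistics.NormalDist`: the log-density of a Gaussian copula with correlation matrix R, and the interval to which a latent Gaussian variable is restricted when a margin is discrete, which is the region a data-augmentation sampler would draw from. This illustrates the standard Gaussian copula formulas, not the article's MCMC schemes; the function names are hypothetical.

```python
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()


def gaussian_copula_logpdf(u, R):
    """Log-density of the Gaussian copula with correlation matrix R at u in (0, 1)^d:
    log c(u; R) = -0.5 * log|R| - 0.5 * z^T (R^{-1} - I) z,  z_j = Phi^{-1}(u_j)."""
    R = np.asarray(R, dtype=float)
    z = np.array([_STD_NORMAL.inv_cdf(float(uj)) for uj in u])
    sign, logdet = np.linalg.slogdet(R)
    if sign <= 0:
        raise ValueError("R must be positive definite")
    quad = z @ np.linalg.solve(R, z) - z @ z
    return -0.5 * logdet - 0.5 * quad


def latent_bounds_discrete(cdf_below, cdf_at):
    """For a discrete margin with F(y-) = cdf_below and F(y) = cdf_at, return the
    interval (Phi^{-1}(F(y-)), Phi^{-1}(F(y))] that contains the latent Gaussian
    variable in a data-augmentation scheme."""
    lo = -np.inf if cdf_below <= 0.0 else _STD_NORMAL.inv_cdf(cdf_below)
    hi = np.inf if cdf_at >= 1.0 else _STD_NORMAL.inv_cdf(cdf_at)
    return lo, hi


if __name__ == "__main__":
    R = np.array([[1.0, 0.6], [0.6, 1.0]])
    print(gaussian_copula_logpdf([0.2, 0.7], R))
    print(latent_bounds_discrete(0.3, 0.8))
```

With R equal to the identity the copula density is 1 (log-density 0), which corresponds to independence; in a Gibbs sampler for discrete data, each latent variable would be drawn from a truncated normal on the interval returned by `latent_bounds_discrete`.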

    Generalized Kernel Regularized Least Squares

    Kernel Regularized Least Squares (KRLS) is a popular method for flexibly estimating models that may have complex relationships between variables. However, its usefulness to many researchers is limited for two reasons. First, existing approaches are inflexible and do not allow KRLS to be combined with theoretically motivated extensions such as random effects, unregularized fixed effects, or non-Gaussian outcomes. Second, estimation is extremely computationally intensive even for modestly sized datasets. Our paper addresses both concerns by introducing generalized KRLS (gKRLS). We note that KRLS can be reformulated as a hierarchical model, thereby allowing easy inference and modular model construction in which KRLS can be used alongside random effects, splines, and unregularized fixed effects. Computationally, we also implement random sketching to dramatically accelerate estimation while incurring a limited penalty in estimation quality. We demonstrate that gKRLS can be fit on datasets with tens of thousands of observations in under one minute. Further, state-of-the-art techniques that require fitting the model over a dozen times (e.g., meta-learners) can be estimated quickly. Comment: Accepted version available via DOI; corrected small typo
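    A minimal NumPy sketch of plain KRLS and of a sketched variant follows. It assumes the Gaussian kernel exp(-||x - x'||^2 / σ^2) with σ^2 equal to the number of covariates, the KRLS objective ||y - Kc||^2 + λ c'Kc, and, as a hypothetical sketch choice, uniform column sub-sampling, where the coefficients are restricted to c = Sα for an n × m matrix S. gKRLS's actual sketching scheme and its hierarchical estimation of λ are not reproduced here.

```python
import numpy as np


def gaussian_kernel(X, sigma2):
    """Gaussian kernel k(x, x') = exp(-||x - x'||^2 / sigma2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / sigma2)


def krls_fit(K, y, lam):
    """Full KRLS: minimise ||y - K c||^2 + lam * c^T K c, so c = (K + lam I)^{-1} y."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), y)


def sketched_krls_fit(K, y, lam, S):
    """Restrict c = S @ alpha for an n x m sketch S; setting the gradient to zero
    gives the m x m system (S^T K K S + lam S^T K S) alpha = S^T K y."""
    KS = K @ S
    A = KS.T @ KS + lam * (S.T @ KS)
    alpha = np.linalg.solve(A, KS.T @ y)
    return S @ alpha


def uniform_subsample_sketch(n, m, rng):
    """Hypothetical sketch choice: m columns of the identity, sampled without
    replacement and scaled by sqrt(n / m) (the scale does not change the fit)."""
    cols = rng.choice(n, size=m, replace=False)
    S = np.zeros((n, m))
    S[cols, np.arange(m)] = np.sqrt(n / m)
    return S


# Usage on synthetic data (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=500)
K = gaussian_kernel(X, sigma2=X.shape[1])
S = uniform_subsample_sketch(500, 50, rng)
c_sketch = sketched_krls_fit(K, y, lam=1.0, S=S)
fitted = K @ c_sketch
```

With S equal to the identity the sketched solution reduces to full KRLS; with m much smaller than n only an m × m system is solved. For clarity this sketch still forms the full n × n kernel, whereas an implementation aiming at speed would compute only K S.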