Mean field variational Bayesian inference for support vector machine classification
A mean field variational Bayes approach to support vector machines (SVMs)
using the latent variable representation of Polson & Scott (2012) is presented.
This representation circumvents many of the shortcomings of classical SVMs,
enabling automatic penalty parameter selection and the ability to handle
dependent samples, missing data and variable selection. We demonstrate on
simulated and real datasets that our approach is easily extendable to
non-standard situations and outperforms the classical SVM approach whilst
remaining computationally efficient.
Comment: 18 pages, 4 figures
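The latent variable representation referred to above is, as we understand Polson & Scott's result, a location-scale mixture of normals for the hinge-loss pseudo-likelihood. With u = y x'beta, exp(-2 max(1 - u, 0)) equals the integral over lam > 0 of (2 pi lam)^(-1/2) exp(-(1 + lam - u)^2 / (2 lam)). The minimal Python sketch below only checks that identity numerically (midpoint rule after substituting lam = t^2). It is not the authors' variational algorithm, and the function names and quadrature settings are illustrative assumptions.

```python
import math


def hinge_pseudo_likelihood(u: float) -> float:
    """Closed form exp(-2 * max(1 - u, 0)), with u = y * x^T beta."""
    return math.exp(-2.0 * max(1.0 - u, 0.0))


def mixture_integral(u: float, t_max: float = 20.0, n: int = 40_000) -> float:
    """Integrate (2*pi*lam)^(-1/2) * exp(-(1 + lam - u)^2 / (2*lam)) over lam in (0, inf).

    Substituting lam = t^2 removes the lam^(-1/2) singularity at zero and gives
    the integrand 2 * (2*pi)^(-1/2) * exp(-(a + t^2)^2 / (2 * t^2)) in t, a = 1 - u.
    The midpoint rule never evaluates t = 0; for moderate |1 - u| the tail
    beyond t_max is negligible.
    """
    a = 1.0 - u
    h = t_max / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h  # midpoint of the i-th subinterval
        t2 = t * t
        total += math.exp(-((a + t2) ** 2) / (2.0 * t2))
    return 2.0 / math.sqrt(2.0 * math.pi) * total * h
```

Conditional on the latent lam, each observation contributes a term that is Gaussian in beta, which is what makes conjugate, mean-field-style updates convenient in this kind of model.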
Bayesian Approximate Kernel Regression with Variable Selection
Nonlinear kernel regression models are often used in statistics and machine
learning because they are more accurate than linear models. Variable selection
for kernel regression models is a challenge partly because, unlike the linear
regression setting, there is no clear concept of an effect size for regression
coefficients. In this paper, we propose a novel framework that provides an
effect size analog of each explanatory variable for Bayesian kernel regression
models when the kernel is shift-invariant, for example the Gaussian kernel.
We use functional-analytic properties of shift-invariant reproducing kernel
Hilbert spaces (RKHS) to define a linear vector space that: (i) captures
nonlinear structure, and (ii) can be projected onto the original explanatory
variables. The projection onto the original explanatory variables serves as an
analog of effect sizes. The specific functional-analytic property we use is that
shift-invariant kernel functions can be approximated via random Fourier bases.
Based on the random Fourier expansion we propose a computationally efficient
class of Bayesian approximate kernel regression (BAKR) models for both
nonlinear regression and binary classification for which one can compute an
analog of effect sizes. We illustrate the utility of BAKR by examining two
important problems in statistical genetics: genomic selection (i.e. phenotypic
prediction) and association mapping (i.e. inference of significant variants or
loci). State-of-the-art methods for genomic selection and association mapping
are based on kernel regression and linear models, respectively. BAKR is the
first method that is competitive in both settings.
Comment: 22 pages, 3 figures, 3 tables; theory added; new simulations
presented; references added
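The random Fourier expansion and the projection idea described above can be illustrated with a minimal, non-Bayesian Python sketch. Random Fourier features (Rahimi and Recht's construction) give an explicit n x D matrix Z whose inner products approximate a Gaussian kernel, and an effect size analog is taken here as the projection of fitted values Z theta onto the columns of X via the Moore-Penrose pseudo-inverse. We assume a plain least-squares fit for theta instead of BAKR's posterior; the bandwidth convention, feature counts, and function names are hypothetical.

```python
import numpy as np


def random_fourier_features(X, n_features, bandwidth, rng):
    """Map X (n x p) to Z (n x D) so that Z @ Z.T approximates a Gaussian kernel.

    Target kernel: k(x, x') = exp(-||x - x'||^2 / (2 * bandwidth^2)).
    Frequencies are drawn from its spectral density N(0, bandwidth^-2 I);
    phases are uniform on [0, 2*pi).
    """
    p = X.shape[1]
    omega = rng.normal(scale=1.0 / bandwidth, size=(p, n_features))
    phase = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ omega + phase)


def gaussian_kernel(X, bandwidth):
    """Exact kernel matrix, used only to check the approximation."""
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * bandwidth**2))


def effect_size_analog(X, Z, theta):
    """Project fitted values Z @ theta onto the column space of X: pinv(X) @ (Z @ theta)."""
    return np.linalg.pinv(X) @ (Z @ theta)
```

As a sanity check on the design, if the fitted function happens to be exactly linear in X, the projection returns its coefficients, which is the intuition behind calling it an effect size analog.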
Support Vector Regression Method for Wind Speed Prediction Incorporating Probability Prior Knowledge
Prior knowledge, such as the wind speed probability distribution estimated from historical data and the fluctuation of wind speed between its maximal and minimal values within a given period, carries additional information about the wind speed, so it should be incorporated into wind speed prediction. First, a method of estimating the wind speed probability distribution from historical data is proposed based on Bernoulli’s law of large numbers. Second, to describe the wind speed fluctuation between the maximal and minimal values within a given period, the probability distribution estimated by the proposed method is incorporated into the training and testing data. Third, a support vector regression model for wind speed prediction is built on standard support vector regression. Finally, experiments predicting the wind speed at a wind farm show that the proposed method is feasible and effective, and that the model’s running time and prediction errors meet the needs of wind speed prediction.
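Assuming speeds are grouped into fixed-width bins (the abstract does not say how the distribution is discretised), the first two steps above can be sketched in Python. Relative frequencies estimate the bin probabilities, and each input window gains one extra feature. The feature used here, the estimated probability that the speed falls between the window's minimum and maximum (at bin resolution), is a hypothetical stand-in for how the paper encodes the fluctuation; the bin width and function names are also assumptions.

```python
from collections import Counter


def estimate_bin_probabilities(history, bin_width):
    """Relative frequency of each speed bin in the historical record.

    Bernoulli's law of large numbers: the relative frequency of an event
    converges in probability to the event's probability as the record grows.
    """
    counts = Counter(int(v // bin_width) for v in history)
    n = len(history)
    return {b: c / n for b, c in counts.items()}


def range_probability(window, probs, bin_width):
    """Hypothetical feature: estimated P(min(window) <= V <= max(window)), summed over bins."""
    lo = int(min(window) // bin_width)
    hi = int(max(window) // bin_width)
    return sum(probs.get(b, 0.0) for b in range(lo, hi + 1))


def augment(windows, probs, bin_width):
    """Append the range-probability feature to each input window (raises dimension by one)."""
    return [list(w) + [range_probability(w, probs, bin_width)] for w in windows]
```

The augmented windows would then be passed to a standard SVR; the regression itself is omitted here.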
Priori Information Based Support Vector Regression and Its Applications
To extract the prior information (PI) provided by real monitored values of peak particle velocity (PPV) and increase the prediction accuracy of PPV, PI-based support vector regression (SVR) is established. First, to extract the PI in the monitored data mathematically, the probability density of PPV is estimated with ε-SVR. Second, to make full use of the PI about the fluctuation of PPV between its maximal and minimal values within a given period, the probability density estimated with ε-SVR is incorporated into the training data, which increases the dimensionality of the training data. Third, using the higher-dimensional training data, a method of predicting PPV called PI-ε-SVR is proposed. Finally, comparative experiments on collected PPV values induced by underwater blasting at Dajin Island at the Taishan nuclear power station in China show the effectiveness of the proposed method.
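One common way to estimate a density with ε-SVR, which we assume here because the abstract does not spell out the details, is Vapnik's approach: regress the empirical distribution function F_n(x_i) on x_i under the ε-insensitive loss, then differentiate the fitted function to obtain a density. The Python sketch below builds those training pairs and evaluates the ε-SVR primal objective for a given linear model; it does not solve the optimisation, and all names are hypothetical.

```python
from bisect import bisect_right


def empirical_cdf_targets(samples):
    """Training pairs (x_i, F_n(x_i)), where F_n(x) = #{samples <= x} / n."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, bisect_right(xs, x) / n) for x in xs]


def eps_insensitive_loss(residual, eps):
    """Zero inside the eps-tube, linear outside it."""
    return max(abs(residual) - eps, 0.0)


def svr_primal_objective(w, b, features, targets, C, eps):
    """0.5 * ||w||^2 + C * sum_i max(|y_i - (w . phi_i + b)| - eps, 0) for a linear model."""
    reg = 0.5 * sum(wj * wj for wj in w)
    data = 0.0
    for phi, y in zip(features, targets):
        pred = sum(wj * fj for wj, fj in zip(w, phi)) + b
        data += eps_insensitive_loss(y - pred, eps)
    return reg + C * data
```

Residuals smaller than eps cost nothing, so the fitted CDF is only pulled toward points that lie outside the tube; the resulting density estimate would then be appended to each training input, increasing its dimension.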
Bayesian Approaches to Copula Modelling
Copula models have become one of the most widely used tools in the applied
modelling of multivariate data. Similarly, Bayesian methods are increasingly
used to obtain efficient likelihood-based inference. However, to date, there
has been only limited use of Bayesian approaches in the formulation and
estimation of copula models. This article aims to address this shortcoming in
two ways: first, by introducing copula models and the aspects of copula theory
that are especially relevant for a Bayesian analysis; and second, by outlining
Bayesian approaches to formulating and estimating copula models, and their
advantages over alternative methods. Copulas covered include Archimedean
copulas, copulas constructed by inversion, and vine copulas, along with their
interpretation as transformations. A number of parameterisations of a
correlation matrix of a
Gaussian copula are considered, along with hierarchical priors that allow for
Bayesian selection and model averaging for each parameterisation. Markov chain
Monte Carlo sampling schemes for fitting Gaussian and D-vine copulas, with and
without selection, are given in detail. The relationship between the prior for
the parameters of a D-vine, and the prior for a correlation matrix of a
Gaussian copula, is discussed. Finally, it is shown how to carry out Bayesian
inference when the data are discrete-valued, using data augmentation. This
approach generalises popular Bayesian methods for the estimation of models for
multivariate binary and other ordinal data to more general copula models.
Bayesian data augmentation has substantial advantages over other methods of
estimation for this class of models.
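For the Gaussian copula discussed above, an MCMC sampler repeatedly evaluates the copula density, and data augmentation for discrete margins works with latent Gaussian variables restricted to intervals. The Python sketch below shows both ingredients in the bivariate case, assuming the marginal CDF values are already known; it is not the article's sampling scheme, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

_PHI = NormalDist()  # standard normal


def gaussian_copula_density(u1, u2, rho):
    """Bivariate Gaussian copula density for 0 < u1, u2 < 1 and -1 < rho < 1.

    c(u1, u2; rho) = (1 - rho^2)^(-1/2)
        * exp(-(rho^2 (z1^2 + z2^2) - 2 rho z1 z2) / (2 (1 - rho^2))),
    where z_j = Phi^{-1}(u_j).
    """
    z1 = _PHI.inv_cdf(u1)
    z2 = _PHI.inv_cdf(u2)
    s = 1.0 - rho * rho
    exponent = -(rho * rho * (z1 * z1 + z2 * z2) - 2.0 * rho * z1 * z2) / (2.0 * s)
    return math.exp(exponent) / math.sqrt(s)


def latent_bounds(cdf_below, cdf_at):
    """Interval (Phi^{-1}(F(y^-)), Phi^{-1}(F(y))] for latent z when a discrete margin equals y."""
    def to_z(p):
        if p <= 0.0:
            return -math.inf
        if p >= 1.0:
            return math.inf
        return _PHI.inv_cdf(p)
    return to_z(cdf_below), to_z(cdf_at)
```

In a data-augmentation sampler, the latent z for a discrete observation would be drawn from a normal distribution truncated to these bounds, after which the copula parameters can be updated as if the data were continuous.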
Generalized Kernel Regularized Least Squares
Kernel Regularized Least Squares (KRLS) is a popular method for flexibly
estimating models that may have complex relationships between variables.
However, its usefulness to many researchers is limited for two reasons. First,
existing approaches are inflexible and do not allow KRLS to be combined with
theoretically-motivated extensions such as random effects, unregularized fixed
effects, or non-Gaussian outcomes. Second, estimation is extremely
computationally intensive for even modestly sized datasets. Our paper addresses
both concerns by introducing generalized KRLS (gKRLS). We note that KRLS can be
reformulated as a hierarchical model, thereby allowing easy inference and
modular model construction in which KRLS can be used alongside random effects,
splines, and unregularized fixed effects. Computationally, we also implement
random sketching to dramatically accelerate estimation while incurring a
limited penalty in estimation quality. We demonstrate that gKRLS can be fit on
datasets with tens of thousands of observations in under one minute. Further,
state-of-the-art techniques that require fitting the model over a dozen times
(e.g. meta-learners) can be estimated quickly.
Comment: Accepted version available at DOI below; corrected small typo
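A hedged sketch of why random sketching reduces the cost described above: KRLS minimises ||y - Kc||^2 + lam c'Kc over n coefficients c, and restricting c = S a for an n x m matrix S with m < n leaves only an m x m linear system. The Python example assumes a Gaussian kernel exp(-||x - x'||^2 / b) and a Gaussian random sketch. gKRLS's own sketching scheme, defaults, and hierarchical-model machinery may differ, and all names are hypothetical.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth):
    """k(a, b) = exp(-||a - b||^2 / bandwidth)."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / bandwidth)


def gaussian_sketch(n, m, rng):
    """Random n x m Gaussian sketch matrix (an assumed choice of sketch)."""
    return rng.normal(size=(n, m)) / np.sqrt(m)


def fit_sketched_krls(X, y, lam, bandwidth, S):
    """Minimise ||y - K S a||^2 + lam * a^T S^T K S a over a, then set c = S a.

    Setting the gradient to zero gives (S^T K K S + lam * S^T K S) a = S^T K y.
    With S = I and invertible K this reduces to ordinary KRLS, c = (K + lam I)^{-1} y.
    Returns the coefficients c (length n) and the in-sample fitted values K c.
    """
    K = gaussian_kernel(X, X, bandwidth)
    KS = K @ S
    A = KS.T @ KS + lam * (S.T @ KS)
    a = np.linalg.solve(A, KS.T @ y)
    c = S @ a
    return c, K @ c
```

This sketch still forms the full n x n kernel matrix, so it only illustrates the smaller solve; a production implementation would also avoid materialising K.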