Sparse Bayesian kernel learning for high-dimensional regression and classification
Doctor of Philosophy
Department of Statistics
Gyuhyeong Goh
In the past decades, statistical learning has become an increasingly popular topic that has drawn significant attention from researchers. Kernel-based nonlinear models, in particular, are powerful tools because of their flexibility in extracting information from complex datasets. A major challenge for kernel modeling in the current big data era is the curse of dimensionality. Although an abundance of variable selection methods has been proposed, the development of high-dimensional Bayesian kernel models is still in its infancy. Beyond variable selection, kernel-based models are inherently computationally expensive, which further limits the application of related methods. The goal of this dissertation is to develop new, fast variable selection and prediction procedures that address high-dimensional nonlinear regression and classification from the Bayesian perspective. To reduce the computational cost, we propose a novel hybrid search algorithm and Bayesian doubly-sparse frameworks for kernel-based models.
In Chapter 1, we discuss the background, existing methods, and their limitations, and give the motivation for our study. In Chapter 2, we propose a Bayesian model hybrid search algorithm for Gaussian process (GP) regression models, which quickly scans the model space for a set of models with high posterior probabilities. In addition, we address the problem of massive, high-dimensional data for GP models by proposing an approach that combines quantile subsample hybrid search with a nearest neighbor GP scheme. In Chapter 3, we propose a novel Bayesian doubly-sparse framework for reproducing kernel Hilbert space (RKHS) regression models. The proposed doubly-sparse framework performs both variable selection and sparse kernel matrix estimation. In Chapter 4, we extend our proposed Bayesian doubly-sparse framework to the nonlinear Bayesian support vector machine
Using neutral cline decay to estimate contemporary dispersal: a generic tool and its application to a major crop pathogen
Dispersal is a key parameter of adaptation, invasion and persistence. Yet standard population genetics inference methods can hardly distinguish it from drift, and many species cannot be studied by direct mark-recapture methods. Here, we introduce a method that uses rates of change in cline shapes for neutral markers to estimate contemporary dispersal. We apply it to the devastating banana pest Mycosphaerella fijiensis, a wind-dispersed fungus for which a secondary contact zone had previously been detected using landscape genetics tools. By tracking the spatio-temporal frequency change of 15 microsatellite markers, we find that σ, the standard deviation of parent–offspring dispersal distances, is 1.2 km/generation^(1/2). The analysis is further shown to be robust to a large range of dispersal kernels. We conclude that combining landscape genetics approaches to detect breaks in allelic frequencies with analyses of changes in neutral genetic clines offers a powerful way to obtain ecologically relevant estimates of dispersal in many species.
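The square-root-of-generation unit on σ reflects how dispersal accumulates over time: if each generation's displacement along one axis is independent with standard deviation σ, the variances add, so the root-mean-square displacement after t generations is σ√t. The following is a minimal Python sketch of that scaling only, not of the cline-decay method itself; the function name and the independence assumption are illustrative and are not taken from the paper.

```python
import math


def rms_displacement(sigma_km, generations):
    """Root-mean-square displacement (km) along one axis after a number of
    generations, assuming independent per-generation dispersal steps
    (an illustrative assumption, not a claim about the paper's model)."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    # Variances of independent steps add: Var(t) = sigma^2 * t.
    return sigma_km * math.sqrt(generations)


if __name__ == "__main__":
    sigma = 1.2  # km / generation^(1/2), the value reported in the abstract
    for t in (1, 4, 25):
        print(t, "generations:", round(rms_displacement(sigma, t), 2), "km")
```

Under this assumption, quadrupling the number of generations only doubles the typical displacement.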
Power System Parameters Forecasting Using Hilbert-Huang Transform and Machine Learning
A novel hybrid data-driven approach is developed for forecasting power system parameters, with the goal of increasing the efficiency of short-term forecasting studies for non-stationary time series. The proposed approach is based on mode decomposition and a feature analysis of initial retrospective data using the Hilbert-Huang transform and machine learning algorithms. The random forests and gradient boosting trees learning techniques were examined. The decision tree techniques were used to rank the importance of variables employed in the forecasting models. The Mean Decrease Gini index is employed as an impurity function. The resulting hybrid forecasting models employ the radial basis function neural network and support vector regression. Apart from the introduction and references, the paper is organized as follows. Section 2 presents the background and a review of several approaches to short-term forecasting of power system parameters. In the third section, a hybrid machine learning-based algorithm using the Hilbert-Huang transform is developed for short-term forecasting of power system parameters. The fourth section describes the decision tree learning algorithms used to assess variable importance. Finally, section six presents the experimental results for the following electric power problems: active power flow forecasting, electricity price forecasting, and wind speed and direction forecasting.
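To make the impurity-based variable ranking concrete, here is a minimal Python sketch of the Gini impurity and the impurity decrease produced by a single split, the quantity that tree ensembles average per variable to obtain a Mean Decrease Gini score. It shows the standard textbook definition only; the function names and toy labels are hypothetical and do not come from the paper.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a collection of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease when `parent` is split into `left` and `right`.

    The child impurities are weighted by the share of samples in each child.
    """
    n = len(parent)
    if n == 0:
        return 0.0
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


if __name__ == "__main__":
    parent = [0, 0, 1, 1]  # toy labels, purely illustrative
    print(gini(parent))
    print(gini_decrease(parent, [0, 0], [1, 1]))  # perfectly separating split
    print(gini_decrease(parent, [0, 1], [0, 1]))  # uninformative split
```

A variable whose splits yield large decreases, averaged over all trees in the ensemble, ranks as more important.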