
    Sparse Bayesian kernel learning for high-dimensional regression and classification

    Doctor of Philosophy. Department of Statistics. Gyuhyeong Goh. In the past decades, statistical learning has been an increasingly popular topic that has drawn a significant amount of attention from researchers. Kernel-based nonlinear models, in particular, are powerful tools because of their flexibility in extracting information from complex datasets. A major challenge with kernel modeling in the current big data era is the curse of dimensionality. Although an abundance of variable selection methods has been proposed, the development of high-dimensional Bayesian kernel models is still in its infancy. Beyond variable selection, kernel-based models inherently incur heavy computational costs, which further hinder the application of related methods. The goal of this dissertation is to develop new, fast variable selection and prediction procedures that address high-dimensional nonlinear regression and classification from the Bayesian perspective. To reduce the computational cost, we propose a novel hybrid search algorithm and Bayesian doubly-sparse frameworks for kernel-based models. In Chapter 1, we discuss the background, existing methods, and their limitations. We also give the motivation for our study. In Chapter 2, we propose a Bayesian model hybrid search algorithm for Gaussian process (GP) regression models, which quickly scans the model space for a set of models with high posterior probabilities. In addition, we address the massive and high-dimensional data problem for GP by proposing an approach that combines quantile subsample hybrid search with a nearest neighbor GP scheme. In Chapter 3, we propose a novel Bayesian doubly-sparse framework for reproducing kernel Hilbert space (RKHS) regression models. The proposed doubly-sparse framework performs both variable selection and sparse kernel matrix estimation. In Chapter 4, we extend our proposed Bayesian doubly-sparse framework to the nonlinear Bayesian support vector machine

    Using neutral cline decay to estimate contemporary dispersal: a generic tool and its application to a major crop pathogen

    Dispersal is a key parameter of adaptation, invasion and persistence. Yet standard population genetics inference methods hardly distinguish it from drift, and many species cannot be studied by direct mark-recapture methods. Here, we introduce a method that uses rates of change in cline shapes for neutral markers to estimate contemporary dispersal. We apply it to the devastating banana pest Mycosphaerella fijiensis, a wind-dispersed fungus for which a secondary contact zone had previously been detected using landscape genetics tools. By tracking the spatio-temporal frequency change of 15 microsatellite markers, we find that σ, the standard deviation of parent–offspring dispersal distances, is 1.2 km/generation^(1/2). The analysis is further shown to be robust to a large range of dispersal kernels. We conclude that combining landscape genetics approaches to detect breaks in allelic frequencies with analyses of changes in neutral genetic clines offers a powerful way to obtain ecologically relevant estimates of dispersal in many species
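
The unit km/generation^(1/2) follows from the standard diffusion approximation for a neutral cline after secondary contact, in which cline width grows as w(t) = σ·sqrt(2πt) when width is defined as the inverse of the maximum frequency gradient. The sketch below illustrates only that textbook relation; it is an assumption that it resembles the estimator used in the paper, and the function names are hypothetical.

```python
import math


def cline_width(sigma, generations):
    """Width of a neutral cline under the diffusion approximation.

    Assumes an initial step cline and width defined as 1 / (maximum slope),
    which gives w(t) = sigma * sqrt(2 * pi * t).
    """
    return sigma * math.sqrt(2.0 * math.pi * generations)


def sigma_from_widths(width_early, width_late, generations_elapsed):
    """Recover sigma from two width measurements taken some generations apart.

    Since w^2 grows linearly in time (w^2 = 2 * pi * sigma^2 * t), the
    difference of squared widths depends only on the elapsed generations,
    not on the unknown date of secondary contact.
    """
    growth = width_late ** 2 - width_early ** 2
    return math.sqrt(growth / (2.0 * math.pi * generations_elapsed))


# Illustrative values only (not data from the paper).
w_early = cline_width(1.2, 10)
w_late = cline_width(1.2, 25)
estimate = sigma_from_widths(w_early, w_late, 15)
```

Because only the change in squared width enters the estimate, the time since contact does not need to be known, which is one reason tracking clines over time is attractive.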

    Power System Parameters Forecasting Using Hilbert-Huang Transform and Machine Learning

    A novel hybrid data-driven approach is developed for forecasting power system parameters, with the goal of increasing the efficiency of short-term forecasting studies for non-stationary time series. The proposed approach is based on mode decomposition and a feature analysis of initial retrospective data using the Hilbert-Huang transform and machine learning algorithms. The random forests and gradient boosting trees learning techniques were examined. The decision tree techniques were used to rank the importance of the variables employed in the forecasting models. The Mean Decrease Gini index is employed as an impurity function. The resulting hybrid forecasting models employ the radial basis function neural network and support vector regression. Apart from the introduction and references, the paper is organized as follows. Section 2 presents the background and a review of several approaches for short-term forecasting of power system parameters. In the third section, a hybrid machine learning-based algorithm using the Hilbert-Huang transform is developed for short-term forecasting of power system parameters. The fourth section describes the decision tree learning algorithms used to assess variable importance. Finally, section six presents the experimental results for the following electric power problems: active power flow forecasting, electricity price forecasting, and wind speed and direction forecasting
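
As a rough illustration of the impurity measure mentioned above, the following sketch computes the Gini impurity of a set of class labels and the decrease produced by one candidate split. Tree ensembles typically accumulate such decreases per variable to rank importance; the helper names and toy labels here are hypothetical and are not taken from the paper.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease from splitting `parent` into `left` and `right`.

    The child impurities are weighted by the share of samples each child
    receives, as in standard decision tree learning.
    """
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Toy labels: a perfectly separating split versus an uninformative one.
parent = [0, 0, 1, 1]
perfect = gini_decrease(parent, [0, 0], [1, 1])
useless = gini_decrease(parent, [0, 1], [0, 1])
```

A variable whose splits yield large decreases across many trees ends up ranked as more important than one whose splits leave the impurity unchanged.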