12 research outputs found

    Some non-standard statistical dependence problems

    Philosophiae Doctor - PhD. The major result of this thesis is the development of a framework for the application of pair-mixtures of copulas to model asymmetric dependencies in bivariate data. The main motivation is the inadequacy of mixtures of bivariate Gaussian models, which are commonly fitted to data. Mixtures of rotated single-parameter Archimedean and Gaussian copulas are fitted to real data sets. The method of maximum likelihood is used for parameter estimation. Goodness-of-fit tests performed on the models giving the highest log-likelihood values show that the models fit the data well. We use mixtures of univariate Gaussian models and mixtures of regression models to investigate the existence of bimodality in the distribution of the widths of autocorrelation functions in a sample of 119 gamma-ray bursts. Contrary to previous findings, our results do not reveal any evidence of bimodality. We extend a study by Genest et al. (2012) of the power and significance levels of tests of copula symmetry to two copula models which have not been considered previously. Our results confirm that for small sample sizes, these tests fail to maintain their 5% significance level and that the Cramér-von Mises-type statistics are the most powerful.

    Robust modelling framework for short-term forecasting of global horizontal irradiance

    The increasing demand for electricity and the need for clean energy sources have increased solar energy use. Accurate forecasts of solar energy are required for easy management of the grid. This paper compares the accuracy of two Gaussian Process Regression (GPR) models combined with Additive Quantile Regression (AQR) and Bayesian Structural Time Series (BSTS) models in the 2-day ahead forecasting of global horizontal irradiance using data from the University of Pretoria from July 2020 to August 2021. Four methods were adopted for variable selection: Lasso, ElasticNet, Boruta, and GBR (Gradient Boosting Regression). The variables selected using GBR were used because they produced the lowest MAE (Mean Absolute Error) value. A comparison of seven models was made: GPR (Gaussian Process Regression), Two-layer DGPR (Two-layer Deep Gaussian Process Regression), bstslong (Bayesian Structural Time Series long), AQRA (Additive Quantile Regression Averaging), QRNN (Quantile Regression Neural Network), PLAQR (Partial Linear Additive Quantile Regression), and Opera (Online Prediction by ExpRt Aggregation). The evaluation metrics used to select the best model were the MAE and RMSE (Root Mean Square Error). Further evaluations were done using proper scoring rules and Murphy diagrams. The best individual model was found to be the GPR. The best forecast combination was AQRA (AQR Averaging) based on MAE. However, based on RMSE, GPNN was the best forecast combination method. Companies such as Eskom could use the methods adopted in this study to control and manage the power grid. The results will promote economic development and sustainability of energy resources. Comment: 25 pages, 12 figures and 7 tables.

    Regularisation in discrete survival models: A comparison of lasso and gradient boosting

    We present the results of a simulation study performed to compare the accuracy of a lasso-type penalization method and gradient boosting in estimating the baseline hazard function and covariate parameters in discrete survival models. The mean square error results reveal that the lasso-type algorithm performs better in recovering the baseline hazard and covariate parameters. In particular, gradient boosting underestimates the sizes of the parameters and also has a high false positive rate. Similar results are obtained in an application to real-life data.

    Short term electricity demand forecasting using partially linear additive quantile regression with an application to the unit commitment problem

    Short-term probabilistic load forecasting is essential for any power generating utility. This paper discusses an application of partially linear additive quantile regression models for predicting short-term electricity demand during the peak demand hours (i.e. from 18:00 to 20:00) using South African data for January 2009 to June 2012. Additionally, the bounded variable mixed integer linear programming technique is used on the forecasts obtained in order to find an optimal number of units to commit (switch on or off). Variable selection is done using the least absolute shrinkage and selection operator. Results from the unit commitment problem show that it is very costly to use gas-fired generating units. These were not selected as part of the optimal solution. It is shown that the optimal solutions based on median forecasts (Q0.5 quantile forecasts) are the same as those from the 99th percentile forecasts except for generating unit g8c, which is a coal-fired unit. This shows that for any increase in demand above the median quantile forecasts it will be economical to increase the generation of electricity from generating unit g8c. The main contribution of this study is in the use of nonlinear trend variables and the combining of forecasting with the unit commitment problem. The study should be useful to system operators in power utility companies in the unit commitment scheduling and dispatching of electricity at a minimal cost, particularly during the peak period when the grid is constrained due to increased demand for electricity.
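    Quantile forecasts such as the median and upper-tail forecasts mentioned in this abstract are usually scored with the pinball (quantile) loss. The sketch below is illustrative only: the function name `pinball_loss` and the demand values are hypothetical and are not taken from the paper.

```python
def pinball_loss(y_true, y_pred, tau):
    """Average pinball (quantile) loss of forecasts y_pred at quantile level tau."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-forecasts (diff > 0) cost tau per unit,
        # over-forecasts (diff < 0) cost (1 - tau) per unit.
        total += max(tau * diff, (tau - 1.0) * diff)
    return total / len(y_true)


if __name__ == "__main__":
    # Hypothetical peak-hour demand values, invented for illustration.
    observed = [30.1, 31.4, 29.8]
    median_forecast = [29.9, 31.0, 30.2]
    print(pinball_loss(observed, median_forecast, tau=0.5))
```

    At a high level such as tau = 0.99, an under-forecast is penalised far more heavily than an over-forecast of the same size, which is why upper-quantile forecasts sit above the median forecast.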

    Prediction of Foreign Direct Investment: an Application to South African Data

    Foreign direct investment (FDI) is considered a vehicle for transferring new ideas, capital, superior technology and skills from developed countries to developing countries. Kernel quantile regression is used in this study to estimate the relationship between foreign direct investment and the factors influencing it in South Africa, using data for the period 1996 to 2015. Using the least absolute shrinkage and selection operator technique, all the variables were selected to be in the models. The developed kernel quantile regression models were used for forecasting the future inflow of foreign direct investment in South Africa. The forecast evaluation was done on all the models and the model based on the ANOVA radial basis kernel was selected as the best in terms of the accuracy measures (mean absolute percentage error, root mean square error and mean absolute error). The forecasts from the individual models were then combined using linear quantile regression averaging. The kernel quantile regression model using an ANOVA radial basis kernel was found to be the best model for forecasting foreign direct investment in South Africa. Accurate forecasts of FDI aid in economic planning. Identification of key drivers of FDI inflow can assist in crafting strategies to attract more FDI.

    Spatio-Temporal Forecasting of Global Horizontal Irradiance Using Bayesian Inference

    Accurate global horizontal irradiance (GHI) forecasting promotes power grid stability. Most of the research on solar irradiance forecasting has been based on a single-site analysis. It is crucial to explore multisite modeling to capture variations in weather conditions between various sites, thereby producing a more robust model. In this research, we propose the use of spatial regression coupled with Gaussian Process Regression (GP Spatial) and the GP Autoregressive Spatial model (GP-AR Spatial) for the prediction of GHI using data from seven radiometric stations from South Africa and one from Namibia. The results of the proposed methods were compared with a benchmark model, the Linear Spatial Temporal Regression (LSTR) model. Five validation sets, each comprising three stations, were chosen. For each validation set, the remaining five stations were used for training. Based on root mean square error, the GP model gave the most accurate forecasts across the validation sets. These results were confirmed by the statistical significance tests using the Giacomini–White test. In terms of coverage probability, there was a 100% coverage on three validation sets and the other two had 97% and 99%. The GP model dominated the other two models. One of the study’s contributions is using standardized forecasts and including a nonlinear trend covariate, which improved the accuracy of the forecasts. The forecasts were combined using a monotone composite quantile regression neural network and a quantile generalized additive model. This modeling framework could be useful to power utility companies in making informed decisions when planning power grid management, including large-scale solar power integration onto the power grid.

    Twenty-Four-Hour Ahead Probabilistic Global Horizontal Irradiance Forecasting Using Gaussian Process Regression

    Probabilistic solar power forecasting has been critical in Southern Africa because of major shortages of power due to climatic changes and other factors over the past decade. This paper discusses Gaussian process regression (GPR) coupled with core vector regression for short-term hourly global horizontal irradiance (GHI) forecasting. GPR is a powerful Bayesian non-parametric regression method that works well for small data sets and quantifies the uncertainty in the predictions. The choice of a kernel that characterises the covariance function is a crucial issue in Gaussian process regression. In this study, we adopt the minimum enclosing ball (MEB) technique. The MEB improves the forecasting power of GPR because the smaller the ball is, the shorter the training time, hence performance is robust. Forecasting of real-time data was done on two South African radiometric stations, Stellenbosch University (SUN) in a coastal area of the Western Cape Province, and the University of Venda (UNV) station in the Limpopo Province. Variables were selected using the least absolute shrinkage and selection operator via hierarchical interactions. The Bayesian approach using informative priors was used for parameter estimation. Based on the root mean square error, mean absolute error and percentage bias the results showed that the GPR model gives the most accurate predictions compared to those from gradient boosting and support vector regression models, making this study a useful tool for decision-makers and system operators in power utility companies. The main contribution of this paper is in the use of a GPR model coupled with the core vector methodology which is used in forecasting GHI using South African data. This is the first application of GPR coupled with core vector regression in which the minimum enclosing ball is applied on GHI data, to the best of our knowledge

    Short-Term Solar Power Forecasting Using Genetic Algorithms: An Application Using South African Data

    Renewable energy forecasts are critical to renewable energy grids and backup plans, operational plans, and short-term power purchases. This paper focused on short-term forecasting of high-frequency global horizontal irradiance data from one of South Africa’s radiometric stations. The aim of the study was to compare the predictive performance of the genetic algorithm and recurrent neural network models with the K-nearest neighbour model, which was used as the benchmark model. Empirical results from the study showed that the genetic algorithm model has the best conditional predictive ability compared to the other two models, making this study a useful tool for decision-makers and system operators in power utility companies. To the best of our knowledge, this is the first study which compares the genetic algorithm, the K-nearest neighbour method, and recurrent neural networks in short-term forecasting of global horizontal irradiance data from South Africa.

    Modelling Drought Risk Using Bivariate Spatial Extremes: Application to the Limpopo Lowveld Region of South Africa

    Weather and climate extremes such as heat waves, droughts and floods are projected to become more frequent and intense in several regions. There is compelling evidence indicating that changes in climate and its extremes over time influence the living conditions of society and the surrounding environment across the globe. This study applies max-stable models to capture the spatio–temporal extremes with dependence. The objective was to analyse the risk of drought caused by extremely high temperatures and deficient rainfall. The Hopkins statistic was used to assess the clustering tendency before using the agglomerative method of hierarchical clustering to cluster the study area into n=3 temperature clusters and n=3 precipitation clusters. For the precipitation and temperature data, the values of the Hopkins statistic were 0.7317 and 0.8446, respectively, which shows that both are significantly clusterable. Various max-stable process models were then fitted to each cluster of each variable, and the Schlather model with several covariance functions was found to be a good fit on both datasets compared to the Smith model with the Gaussian covariance function. The modelling approach presented in this paper could be useful to hydrologists, meteorologists and climatologists, including decision-makers in the agricultural sector, in enhancing their understanding of the behaviour of drought caused by extremely high temperatures and low rainfall. The modelling of these compound extremes could also assist in assessing the impact of climate change. It can be seen from this study that the size, including the topography of the location (cluster/region), provides important information about the strength of the extremal dependence.

    Probabilistic Flood Height Estimation of the Limpopo River at the Beitbridge using r-Largest Order Statistics

    The paper presents modelling of uncertainty in extreme return levels of the flood heights of the Limpopo River at the Beitbridge, in which the delta and profile likelihood approaches are used in the estimation of the confidence intervals. The modelling approach discussed in this study is a hybrid modelling framework blending a variety of statistical models, techniques and approaches. Monthly flood height data for the years 1992 to 2014 are used. The method is based on a joint generalised extreme value distribution of the r-largest order statistics. The method is more efficient in its use of data than the traditional single maximum observation per block. Estimation of parameters is done using the maximum likelihood method. Using the r-largest order statistics approach, the paper shows that the flood height data can suitably be modelled by the Gumbel class distribution. The 100-year return level is estimated to be 4.981 metres with a confidence interval estimate of (4.886, 5.083) using the profile likelihood method. This study is important as it enables accurate estimation of return levels and periods of extreme flood heights. Such analysis helps in risk mitigation, for example, the design of bridges by civil engineers.
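    For a Gumbel model, the T-period return level has a closed form: it is the level exceeded in any one block with probability 1/T. The sketch below shows that formula under the assumption of annual blocks. The function name `gumbel_return_level` and the parameter values are hypothetical; the fitted location and scale are not given in the abstract, so this does not reproduce the reported 4.981 m estimate.

```python
import math


def gumbel_return_level(mu, sigma, return_period):
    """Return level z_T of a Gumbel(mu, sigma) block-maximum model.

    z_T solves F(z_T) = 1 - 1/T with F(z) = exp(-exp(-(z - mu) / sigma)),
    which gives z_T = mu - sigma * log(-log(1 - 1/T)).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if return_period <= 1:
        raise ValueError("return_period must exceed 1 block")
    exceedance_prob = 1.0 / return_period
    return mu - sigma * math.log(-math.log(1.0 - exceedance_prob))


if __name__ == "__main__":
    # Hypothetical location and scale (metres), chosen only for illustration.
    mu_hat, sigma_hat = 2.0, 0.5
    for period in (10, 50, 100):
        level = gumbel_return_level(mu_hat, sigma_hat, period)
        print(f"{period}-year return level: {level:.3f} m")
```

    Confidence intervals such as the profile likelihood interval quoted in the abstract need the fitted likelihood and cannot be obtained from this formula alone.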