13 research outputs found

    Some non-standard statistical dependence problems

    Philosophiae Doctor - PhD. The major result of this thesis is the development of a framework for the application of pair-mixtures of copulas to model asymmetric dependencies in bivariate data. The main motivation is the inadequacy of mixtures of bivariate Gaussian models which are commonly fitted to data. Mixtures of rotated single parameter Archimedean and Gaussian copulas are fitted to real data sets. The method of maximum likelihood is used for parameter estimation. Goodness-of-fit tests performed on the models giving the highest log-likelihood values show that the models fit the data well. We use mixtures of univariate Gaussian models and mixtures of regression models to investigate the existence of bimodality in the distribution of the widths of autocorrelation functions in a sample of 119 gamma-ray bursts. Contrary to previous findings, our results do not reveal any evidence of bimodality. We extend a study by Genest et al. (2012) of the power and significance levels of tests of copula symmetry, to two copula models which have not been considered previously. Our results confirm that for small sample sizes, these tests fail to maintain their 5% significance level and that the Cramér-von Mises-type statistics are the most powerful.
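As a rough illustration of the maximum-likelihood setup described above, the sketch below evaluates the log-likelihood of a two-component mixture of a Clayton copula and its 180-degree rotation. The choice of the Clayton family, the function names, and the parameterisation are assumptions made for illustration; the thesis may use other Archimedean families and rotations.

```python
import math

def clayton_density(u, v, theta):
    """Density of the Clayton copula for theta > 0 and u, v in (0, 1)."""
    s = u ** (-theta) + v ** (-theta) - 1.0
    return (1.0 + theta) * (u * v) ** (-1.0 - theta) * s ** (-2.0 - 1.0 / theta)

def rotated_clayton_density(u, v, theta):
    """180-degree rotation (survival Clayton), which has upper-tail dependence."""
    return clayton_density(1.0 - u, 1.0 - v, theta)

def mixture_log_likelihood(pairs, weight, theta1, theta2):
    """Log-likelihood of weight * Clayton + (1 - weight) * rotated Clayton.

    `pairs` holds pseudo-observations (u, v) in the open unit square;
    `weight` is the mixing proportion in [0, 1] (hypothetical names).
    """
    total = 0.0
    for u, v in pairs:
        dens = (weight * clayton_density(u, v, theta1)
                + (1.0 - weight) * rotated_clayton_density(u, v, theta2))
        total += math.log(dens)
    return total
```

In practice this function would be passed to a numerical optimiser over the mixing weight and the two dependence parameters.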

    Robust modelling framework for short-term forecasting of global horizontal irradiance

    The increasing demand for electricity and the need for clean energy sources have increased solar energy use. Accurate forecasts of solar energy are required for easy management of the grid. This paper compares the accuracy of two Gaussian Process Regression (GPR) models combined with Additive Quantile Regression (AQR) and Bayesian Structural Time Series (BSTS) models in the 2-day ahead forecasting of global horizontal irradiance using data from the University of Pretoria from July 2020 to August 2021. Four methods were adopted for variable selection: Lasso, ElasticNet, Boruta, and GBR (Gradient Boosting Regression). The variables selected using GBR were used because they produced the lowest MAE (Mean Absolute Error) value. A comparison of seven models was made: GPR (Gaussian Process Regression), Two-layer DGPR (Two-layer Deep Gaussian Process Regression), bstslong (Bayesian Structural Time Series long), AQRA (Additive Quantile Regression Averaging), QRNN (Quantile Regression Neural Network), PLAQR (Partial Linear Additive Quantile Regression), and Opera (Online Prediction by ExpeRt Aggregation). The evaluation metrics used to select the best model were the MAE and RMSE (Root Mean Square Error). Further evaluations were done using proper scoring rules and Murphy diagrams. The best individual model was found to be the GPR. The best forecast combination was AQRA (AQR Averaging) based on MAE. However, based on RMSE, GPNN was the best forecast combination method. Companies such as Eskom could use the methods adopted in this study to control and manage the power grid. The results will promote economic development and sustainability of energy resources.
    Comment: 25 pages, 12 figures and 7 tables
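The two accuracy measures named in this abstract can be sketched as follows. This is a minimal illustration of the standard definitions; the function names and the sample values in the usage note are hypothetical and are not taken from the paper.

```python
import math

def mae(actual, forecast):
    """Mean Absolute Error: average of |actual - forecast|."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root Mean Square Error: square root of the average squared error."""
    squared = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    return math.sqrt(squared / len(actual))
```

Because RMSE penalises large errors more heavily than MAE, the two metrics can rank forecast combinations differently, which is consistent with the abstract reporting different winners under each measure.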

    Regularisation in discrete survival models: A comparison of lasso and gradient boosting

    We present the results of a simulation study performed to compare the accuracy of a lasso-type penalization method and gradient boosting in estimating the baseline hazard function and covariate parameters in discrete survival models. The mean square error results reveal that the lasso-type algorithm performs better in recovering the baseline hazard and covariate parameters. In particular, gradient boosting underestimates the sizes of the parameters and also has a high false positive rate. Similar results are obtained in an application to real-life data.

    Short term electricity demand forecasting using partially linear additive quantile regression with an application to the unit commitment problem

    Short term probabilistic load forecasting is essential for any power generating utility. This paper discusses an application of partially linear additive quantile regression models for predicting short term electricity demand during the peak demand hours (i.e. from 18:00 to 20:00) using South African data for January 2009 to June 2012. Additionally, the bounded variable mixed integer linear programming technique is used on the forecasts obtained in order to find an optimal number of units to commit (switch on or off). Variable selection is done using the least absolute shrinkage and selection operator. Results from the unit commitment problem show that it is very costly to use gas fired generating units. These were not selected as part of the optimal solution. It is shown that the optimal solutions based on median forecasts (Q0.5 quantile forecasts) are the same as those from the 99th quantile forecasts except for generating unit g8c, which is a coal fired unit. This shows that for any increase in demand above the median quantile forecasts it will be economical to increase the generation of electricity from generating unit g8c. The main contribution of this study is in the use of nonlinear trend variables and the combining of forecasting with the unit commitment problem. The study should be useful to system operators in power utility companies in the unit commitment scheduling and dispatching of electricity at a minimal cost, particularly during the peak period when the grid is constrained due to increased demand for electricity.
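Quantile regression models such as the one described here are fitted by minimising the pinball (quantile) loss. The sketch below shows that loss for a single quantile level; the function names are hypothetical and the code is an illustration of the standard loss, not the paper's implementation.

```python
def pinball_loss(y, q, tau):
    """Pinball loss of forecast q for observation y at quantile level tau."""
    diff = y - q
    return max(tau * diff, (tau - 1.0) * diff)

def mean_pinball_loss(observations, forecasts, tau):
    """Average pinball loss over paired observations and quantile forecasts."""
    losses = [pinball_loss(y, q, tau) for y, q in zip(observations, forecasts)]
    return sum(losses) / len(losses)
```

At tau = 0.5 the loss is half the absolute error, so under- and over-forecasts cost the same. At tau = 0.99 an under-forecast costs 99 times as much as an over-forecast of the same size, which is why high-quantile forecasts sit well above the median.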

    Bayesian spatial modelling of intimate partner violence and associated factors among adult women and men : evidence from 2019/2020 Rwanda demographic and health survey

    DATA AVAILABILITY: The datasets generated and analysed during the current study are not publicly available, since we received a data access letter from the DHS team (https://dhsprogram.com/) specific to our project, but are available from the DHS team upon request. BACKGROUND: Intimate partner violence (IPV) remains a global public health concern for both men and women. Spatial mapping and clustering analysis can reveal subtle patterns in IPV occurrences but are yet to be explored in Rwanda, especially at a lower, small-area scale. This study seeks to examine the spatial distribution, patterns, and associated factors of IPV among men and women in Rwanda. METHODS: This was a secondary data analysis of the 2019/2020 Rwanda Demographic and Health Survey (RDHS) individual-level data set for 1947 women aged 15–49 years and 1371 men aged 15–59 years. A spatially structured additive logistic regression model was used to assess risk factors for IPV while adjusting for spatial effects. The district-level spatial model was adjusted for fixed covariate effects and was implemented using fully Bayesian inference within the generalized additive mixed effects framework. RESULTS: IPV prevalence amongst women was 45.9% (95% confidence interval (CI): 43.4–48.5%) while that for men was 18.4% (95% CI: 16.2–20.9%). Using a bivariate choropleth, IPV perpetrated against women was higher in the North-Western districts of Rwanda, whereas for men it was shown to be more prevalent in the Southern districts. A few districts presented high IPV for both men and women. The spatially structured additive logistic model revealed higher odds of IPV against women mainly in the North-Western districts, and the spatial effects were dominated by spatially structured effects contributing 64%. Higher odds of IPV were observed for men in the Southern districts of Rwanda, and spatial effects were dominated by district heterogeneity accounting for 62%. There were no statistically significant district clusters for IPV in either men or women. Women with partners who consume alcohol, and those with controlling partners, were at significantly higher odds of IPV, while those in rich households and those making financial decisions together with partners were at lower odds of experiencing IPV. CONCLUSION: Campaigns against IPV should be strengthened, especially in the North-Western and Southern parts of Rwanda. In addition, the promotion of girl-child education and empowerment of women can potentially reduce IPV against women and girls. Furthermore, couples should be trained on making financial decisions together. In conclusion, the implementation of policies and interventions that discourage alcohol consumption and controlling behaviour, especially among men, should be rolled out.
    https://bmcpublichealth.biomedcentral.com
    School of Health Systems and Public Health (SHSPH). SDG-03: Good health and well-being. SDG-05: Gender equality.

    Prediction of Foreign Direct Investment: an Application to South African Data

    Foreign direct investment (FDI) is considered a vehicle for transferring new ideas, capital, superior technology and skills from developed countries to developing countries. Kernel quantile regression is used in this study to estimate the relationship between foreign direct investment and the factors influencing it in South Africa, using data for the period 1996 to 2015. Using the least absolute shrinkage and selection operator technique, all the variables were selected to be in the models. The developed kernel quantile regression models were used for forecasting the future inflow of foreign direct investment in South Africa. The forecast evaluation was done on all the models and the model based on the ANOVA radial basis kernel was selected as the best in terms of the accuracy measures (mean absolute percentage error, root mean square error and mean absolute error). The forecasts from the individual models were then combined using linear quantile regression averaging. The kernel quantile regression model using an ANOVA radial basis kernel was found to be the best model for forecasting foreign direct investment in South Africa. Accurate forecasts of FDI aid in economic planning. Identification of key drivers of FDI inflow can assist in crafting strategies to attract more FDI.

    Short-Term Solar Power Forecasting Using Genetic Algorithms: An Application Using South African Data

    Renewable energy forecasts are critical to renewable energy grids and backup plans, operational plans, and short-term power purchases. This paper focused on short-term forecasting of high-frequency global horizontal irradiance data from one of South Africa’s radiometric stations. The aim of the study was to compare the predictive performance of the genetic algorithm and recurrent neural network models with the K-nearest neighbour model, which was used as the benchmark model. Empirical results from the study showed that the genetic algorithm model has the best conditional predictive ability compared to the other two models, making this study a useful tool for decision-makers and system operators in power utility companies. To the best of our knowledge this is the first study which compares the genetic algorithm, the K-nearest neighbour method, and recurrent neural networks in short-term forecasting of global horizontal irradiance data from South Africa

    Twenty-Four-Hour Ahead Probabilistic Global Horizontal Irradiance Forecasting Using Gaussian Process Regression

    Probabilistic solar power forecasting has been critical in Southern Africa because of major shortages of power due to climatic changes and other factors over the past decade. This paper discusses Gaussian process regression (GPR) coupled with core vector regression for short-term hourly global horizontal irradiance (GHI) forecasting. GPR is a powerful Bayesian non-parametric regression method that works well for small data sets and quantifies the uncertainty in the predictions. The choice of a kernel that characterises the covariance function is a crucial issue in Gaussian process regression. In this study, we adopt the minimum enclosing ball (MEB) technique. The MEB improves the forecasting power of GPR because the smaller the ball is, the shorter the training time, hence performance is robust. Forecasting of real-time data was done on two South African radiometric stations, Stellenbosch University (SUN) in a coastal area of the Western Cape Province, and the University of Venda (UNV) station in the Limpopo Province. Variables were selected using the least absolute shrinkage and selection operator via hierarchical interactions. The Bayesian approach using informative priors was used for parameter estimation. Based on the root mean square error, mean absolute error and percentage bias the results showed that the GPR model gives the most accurate predictions compared to those from gradient boosting and support vector regression models, making this study a useful tool for decision-makers and system operators in power utility companies. The main contribution of this paper is in the use of a GPR model coupled with the core vector methodology which is used in forecasting GHI using South African data. This is the first application of GPR coupled with core vector regression in which the minimum enclosing ball is applied on GHI data, to the best of our knowledge

    Spatio-Temporal Forecasting of Global Horizontal Irradiance Using Bayesian Inference

    Accurate global horizontal irradiance (GHI) forecasting promotes power grid stability. Most of the research on solar irradiance forecasting has been based on a single-site analysis. It is crucial to explore multisite modeling to capture variations in weather conditions between various sites, thereby producing a more robust model. In this research, we propose the use of spatial regression coupled with Gaussian Process Regression (GP Spatial) and the GP Autoregressive Spatial model (GP-AR Spatial) for the prediction of GHI using data from seven radiometric stations from South Africa and one from Namibia. The results of the proposed methods were compared with a benchmark model, the Linear Spatial Temporal Regression (LSTR) model. Five validation sets, each comprising three stations, were chosen. For each validation set, the remaining five stations were used for training. Based on root mean square error, the GP model gave the most accurate forecasts across the validation sets. These results were confirmed by statistical significance tests using the Giacomini–White test. In terms of coverage probability, there was 100% coverage on three validation sets, and the other two had 97% and 99%. The GP model dominated the other two models. One of the study’s contributions is using standardized forecasts and including a nonlinear trend covariate, which improved the accuracy of the forecasts. The forecasts were combined using a monotone composite quantile regression neural network and a quantile generalized additive model. This modeling framework could be useful to power utility companies in making informed decisions when planning power grid management, including large-scale solar power integration onto the power grid.
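The coverage probability quoted in this abstract is, in its usual empirical form, the share of observations that fall inside their prediction intervals. A minimal sketch under that assumption follows; the function and argument names are hypothetical.

```python
def coverage_probability(actual, lower, upper):
    """Fraction of observations lying inside [lower, upper] prediction intervals."""
    inside = sum(1 for y, lo, hi in zip(actual, lower, upper) if lo <= y <= hi)
    return inside / len(actual)
```

A value close to the nominal interval level (for example 0.95 for 95% intervals) suggests well-calibrated intervals, while a value of 1.0 can also indicate intervals that are wider than necessary.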

    Modelling Drought Risk Using Bivariate Spatial Extremes: Application to the Limpopo Lowveld Region of South Africa

    Weather and climate extremes such as heat waves, droughts and floods are projected to become more frequent and intense in several regions. There is compelling evidence indicating that changes in climate and its extremes over time influence the living conditions of society and the surrounding environment across the globe. This study applies max-stable models to capture the spatio–temporal extremes with dependence. The objective was to analyse the risk of drought caused by extremely high temperatures and deficient rainfall. The Hopkins statistic was used to assess the clustering tendency before using the agglomerative method of hierarchical clustering to cluster the study area into n=3 temperature clusters and n=3 precipitation clusters. For the precipitation and temperature data, the values of the Hopkins statistic were 0.7317 and 0.8446, respectively, which shows that both are significantly clusterable. Various max-stable process models were then fitted to each cluster of each variable, and the Schlather model with several covariance functions was found to fit both datasets well compared to the Smith model with the Gaussian covariance function. The modelling approach presented in this paper could be useful to hydrologists, meteorologists and climatologists, including decision-makers in the agricultural sector, in enhancing their understanding of the behaviour of drought caused by extremely high temperatures and low rainfall. The modelling of these compound extremes could also assist in assessing the impact of climate change. It can be seen from this study that the size, including the topography of the location (cluster/region), provides important information about the strength of the extremal dependence.
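The clustering-tendency check mentioned above can be sketched as follows. This uses one common form of the Hopkins statistic, in which values near 0.5 suggest spatially random data and values near 1 suggest clustered data; some software uses the reverse convention or raises distances to the power of the dimension. The function names, the sampling window (the bounding box of the data), and the seed argument are assumptions for illustration, not the paper's implementation.

```python
import math
import random

def nearest_distance(point, data, skip_index=None):
    """Distance from `point` to its nearest neighbour in `data`."""
    return min(math.dist(point, other)
               for j, other in enumerate(data) if j != skip_index)

def hopkins_statistic(data, m, seed=0):
    """Hopkins statistic H = sum(u) / (sum(u) + sum(w)) using m probe points."""
    rng = random.Random(seed)
    dims = len(data[0])
    lows = [min(p[k] for p in data) for k in range(dims)]
    highs = [max(p[k] for p in data) for k in range(dims)]
    # u: distances from uniformly random points in the bounding box to the data
    u = [nearest_distance([rng.uniform(lows[k], highs[k]) for k in range(dims)], data)
         for _ in range(m)]
    # w: distances from sampled data points to their nearest other data point
    sampled = rng.sample(range(len(data)), m)
    w = [nearest_distance(data[i], data, skip_index=i) for i in sampled]
    return sum(u) / (sum(u) + sum(w))
```

Under this convention, the reported values of 0.7317 and 0.8446 lie well above 0.5, in line with the abstract's conclusion that both variables are clusterable.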