
    New Flexible Regression Models Generated by Gamma Random Variables with Censored Data

    We propose and study a new log-gamma Weibull regression model. We obtain explicit expressions for the raw and incomplete moments, quantile and generating functions, and mean deviations of the log-gamma Weibull distribution. We demonstrate that the new regression model can be applied to censored data, since it represents a parametric family that includes several widely-known regression models as sub-models and can therefore be used more effectively in the analysis of survival data. We obtain the maximum likelihood estimates of the model parameters for censored data and evaluate local influence on the parameter estimates under different perturbation schemes. Some global-influence measures are also investigated. Further, various simulations are performed for different parameter settings, sample sizes and censoring percentages. In addition, the empirical distributions of some modified residuals are displayed and compared with the standard normal distribution. These studies suggest that the residual analysis usually performed in normal linear regression models can be extended to a modified deviance residual in the proposed regression model applied to censored data. We demonstrate that our extended regression model is very useful in the analysis of real data and may give more realistic fits than other special regression models.
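
    As a concrete illustration of the kind of estimation the abstract describes, the sketch below obtains maximum likelihood estimates for a right-censored Weibull regression, one of the sub-models the proposed log-gamma Weibull family contains. This is a minimal sketch, not the paper's method: the covariate, true parameter values and censoring mechanism are all invented for illustration.

```python
# Hedged sketch: ML estimation for a censored Weibull (extreme-value AFT)
# regression, a sub-model of the log-gamma Weibull family. All data below
# are simulated placeholders, not the paper's data.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)                        # single illustrative covariate
beta_true, sigma_true = 0.5, 0.8
# AFT form: log(T) = beta*x + sigma*W, W standard minimum extreme value
log_t = beta_true * x + sigma_true * np.log(rng.exponential(size=n))
log_c = np.log(rng.exponential(3.0, size=n))  # independent censoring times
y = np.minimum(log_t, log_c)                  # observed log time
delta = (log_t <= log_c).astype(float)        # 1 = event, 0 = censored

def neg_loglik(theta):
    beta, log_sigma = theta
    sigma = np.exp(log_sigma)
    z = (y - beta * x) / sigma
    # events contribute the log-density, censored points the log-survival
    return -np.sum(delta * (z - np.log(sigma)) - np.exp(z))

fit = minimize(neg_loglik, x0=[0.0, 0.0])
print("beta_hat:", fit.x[0], "sigma_hat:", np.exp(fit.x[1]))
```

    The full log-gamma Weibull model would follow the same pattern, swapping its log-density and log-survival terms in for the extreme-value ones used here.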

    An application of Bayesian variable selection to international economic data

    Master's Project (M.S.), University of Alaska Fairbanks, 2017. GDP plays an important role in people's lives; for example, when GDP increases, the unemployment rate frequently decreases. In this project, we will use four different Bayesian variable selection methods to verify economic theory regarding important predictors of GDP. The four methods are: g-prior variable selection with credible intervals, local empirical Bayes with credible intervals, variable selection by indicator function, and hyper-g prior variable selection. We will also use four measures to compare the results of the various Bayesian variable selection methods: AIC, BIC, adjusted R-squared and cross-validation.
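
    Of the four methods, the g-prior approach is the easiest to sketch. The snippet below ranks all predictor subsets by the Zellner g-prior Bayes factor against the intercept-only model (in the Liang et al. closed form); the simulated response and the unit-information choice g = n are placeholder assumptions, not the project's data or settings.

```python
# Hedged sketch of Zellner g-prior variable selection: enumerate every
# predictor subset and rank by the g-prior Bayes factor vs the null model.
import itertools
import numpy as np

def g_prior_log_bf(y, X_full, subset, g):
    """Log Bayes factor of the model using `subset` columns vs the null."""
    n = len(y)
    yc = y - y.mean()
    if not subset:
        return 0.0
    X = X_full[:, subset] - X_full[:, subset].mean(axis=0)
    beta, *_ = np.linalg.lstsq(X, yc, rcond=None)
    r2 = 1 - np.sum((yc - X @ beta) ** 2) / np.sum(yc ** 2)
    p = len(subset)
    # log BF = (n-p-1)/2 * log(1+g) - (n-1)/2 * log(1 + g(1 - R^2))
    return 0.5 * (n - p - 1) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1 - r2))

rng = np.random.default_rng(1)
n, p = 100, 4
X = rng.normal(size=(n, p))                  # placeholder predictors
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)
g = float(n)                                 # unit-information prior, a common default
subsets = [s for k in range(p + 1) for s in itertools.combinations(range(p), k)]
best = max(subsets, key=lambda s: g_prior_log_bf(y, X, list(s), g))
print("highest-probability subset:", best)
```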

    Detection and down-weighting of outliers in non-normal data: theory and application


    Modelling graft survival after kidney transplantation using semi-parametric and parametric survival models

    A dissertation submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, South Africa, in fulfilment of the requirements for the degree of Master of Science in Statistics, November 2017. This study presents survival modelling and evaluation of risk factors of graft survival in the context of kidney transplant data generated in South Africa. Beyond the Kaplan-Meier estimator, the Cox proportional hazards (PH) model is the standard method used in identifying risk factors of graft survival after kidney transplant. The Cox PH model depends on the proportional hazards assumption, which is rarely met. Assessing and accounting for this assumption is necessary before using this model. When the PH assumption is not valid, modification of the Cox PH model could offer more insight into parameter estimates and the effect of time-varying predictors at different time points. This study aims to identify the survival model that will effectively describe the study data by employing the Cox PH and parametric accelerated failure time (AFT) models. To identify the risk factors that mediate graft survival after kidney transplant, secondary data on 751 adults who received a single kidney transplant at Charlotte Maxeke Johannesburg Academic Hospital between 1984 and 2004 were analysed. The graft survival of these patients was analysed in three phases (overall, short-term and long-term) based on the follow-up times. The Cox PH and AFT models were employed to determine the significant risk factors. The purposeful method of variable selection based on the Cox PH model was used for model building. The performance of each model was assessed using the Cox-Snell residuals and the Akaike Information Criterion. The fit of the appropriate model was evaluated using deviance residuals and the delta-beta statistics. To further assess how appropriately the best model fit the study data for each time period, we simulated right-censored survival data based on the model parameter estimates. Overall, the PH assumption was violated in this study. By extending the standard Cox PH model, the resulting models out-performed the standard Cox PH model. The evaluation methods suggest that the Weibull model is the most appropriate in describing the overall graft survival, while the log-normal model is more reasonable in describing short- and long-term graft survival. Generally, the AFT models out-performed the standard Cox regression model in all the analyses. The simulation study resulted in parameter estimates comparable with the estimates from the real data. Factors that significantly influenced graft survival are recipient age, donor type, diabetes, delayed graft function, ethnicity, absence of surgical complications, and the interaction between recipient age and diabetes. Statistical inferences made from the appropriate survival model could impact clinical practice with regard to kidney transplants in South Africa. Finally, limitations of the study are discussed in the context of further studies.
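
    A minimal sketch of the abstract's workflow using the lifelines library: fit a Cox PH model, test the PH assumption via Schoenfeld residuals, then fit Weibull and log-normal AFT models and compare them by AIC. The data frame columns here are synthetic stand-ins for the transplant covariates, which are not public.

```python
# Hedged sketch of the Cox PH vs AFT comparison; all data are simulated
# placeholders, not the Charlotte Maxeke transplant records.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter, WeibullAFTFitter, LogNormalAFTFitter

rng = np.random.default_rng(2)
n = 751
df = pd.DataFrame({
    "recipient_age": rng.integers(18, 70, n),
    "diabetes": rng.integers(0, 2, n),
    "time": rng.weibull(1.2, n) * 10,        # graft survival time (years)
    "event": rng.integers(0, 2, n),          # 1 = graft failure, 0 = censored
})

cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
cph.check_assumptions(df)                    # Schoenfeld-residual PH tests

for name, m in [("weibull", WeibullAFTFitter()),
                ("lognormal", LogNormalAFTFitter())]:
    m.fit(df, duration_col="time", event_col="event")
    print(name, "AIC:", m.AIC_)              # lower AIC = better fit
```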

    Count data modelling application.

    Master's Degree. University of KwaZulu-Natal, Durban. The rapid increase in total children ever born without a proportionate growth in the Nigerian economy has been a concern, and making predictions with count data requires an appropriate regression model. As count data assume discrete, non-negative values, the Poisson distribution is the natural starting point, but it is limited by its assumption that the variance equals the mean. When this assumption fails, the data are under- or over-dispersed, the estimated standard errors are biased, and the resulting test statistics are incorrect. This study aimed to model count data, using total children ever born as the application, with negative binomial and generalized Poisson regression. The Nigeria Demographic and Health Survey 2013 data on women aged 15-49 years were used, and three models were applied to investigate the factors affecting the number of children ever born. Predictive count modelling was also carried out based on performance evaluation metrics (root mean square error, mean absolute error, R-squared and mean square error). In the inferential modelling, the generalized Poisson model was found to be superior, with age of household head (<.0001), age of respondent at first birth (<.0001), urban-rural status (<.0001), and religion (<.0001) being significantly associated with total children ever born. In the predictive modelling, all three models showed almost identical performance evaluation metrics, but Poisson regression was chosen as the best because it is the simplest model. In conclusion, early marriage, religious belief and unawareness among women who dwell in rural areas should be addressed to control total children ever born in Nigeria.
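
    The three models compared in the study are all available in statsmodels. The sketch below fits Poisson, negative binomial and generalized Poisson regressions to a simulated over-dispersed count outcome and compares them by AIC; the covariates are placeholders for the DHS variables, not the survey data.

```python
# Hedged sketch: three count regressions on over-dispersed data.
# The design matrix stands in for the DHS covariates (age at first
# birth, urban/rural status, religion, ...).
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.discrete_model import GeneralizedPoisson

rng = np.random.default_rng(3)
n = 1000
X = sm.add_constant(rng.normal(size=(n, 2)))
mu = np.exp(0.8 + 0.3 * X[:, 1] - 0.2 * X[:, 2])
# negative-binomial draws give variance > mean (over-dispersion)
y = rng.negative_binomial(n=2, p=2 / (2 + mu))

for name, model in [("Poisson", sm.Poisson(y, X)),
                    ("NegBin", sm.NegativeBinomial(y, X)),
                    ("GenPoisson", GeneralizedPoisson(y, X))]:
    res = model.fit(disp=0)
    print(f"{name:10s} AIC = {res.aic:8.1f}")
```

    With over-dispersed data like this, the Poisson fit typically shows a markedly worse AIC than the two models that carry an explicit dispersion parameter.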

    Comparisons of boosted regression tree, GLM and GAM performance in the standardization of yellowfin tuna catch-rate data from the Gulf of Mexico longline fishery

    Recent advances in statistical understanding have focused fisheries research attention on addressing the theoretical and statistical issues encountered in standardizing catch-rate data. In that spirit, the present study evaluates the performance of boosted regression trees (BRT), a product of recent progress in machine learning, as a potential tool for catch-rate standardization. The BRT method provides a number of advantages over the traditional GLM and GAM approaches including, but not limited to: robust parameter estimates as a result of the integrated stochastic gradient boosting algorithm; model structure learned from the data rather than determined a priori, thereby avoiding assumptions required for model specification; and easy implementation of complex and/or multi-way interactions. Performance of the BRT method was evaluated comparatively: GLM, GAM and BRT main-effects models, and a BRT two-way model, were trained on zero-truncated, lognormal catch-rate data using identical predictors and the same dataset. The data were observer-collected records of yellowfin tuna catch from the Gulf of Mexico longline fishery, 1998-2005. Model comparisons were based primarily on the percent deviance explained by the trained models and on prediction error against a test dataset, measured as root mean squared error (RMSE). Secondarily, the relative influence of model predictors and the handling of spatially correlated error structures by each of the four models were examined. The fitted GLM, GAM, BRT and BRT two-way models accounted for 19.56%, 25.10%, 26.10% and 37.3% of total model deviance, respectively. RMSE values for the GLM (0.3552), GAM (0.3554), BRT (0.3546) and BRT two-way (0.3509) models indicate that the BRT-based models performed marginally better than the traditional GLM and GAM methods, with lower prediction error. Indices of predictor influence and spatial analysis of model residuals, for the main-effects models, suggest GAM and BRT models perform comparably in the partitioning of variance amongst predictors and the handling of autocorrelated variance structures. Overall, results of the main-effects models indicate that the BRT method is equally as adept as GAMs at fitting non-linear responses; however, unlike the GAM, the BRT avoided overfitting the data, thereby providing more robust estimates. The BRT two-way interaction model further demonstrates: the ability of the BRT method to fit complex models while avoiding overfitting; the ease with which interactions can be incorporated and specific terms extracted, such as the year term; and the potential role of complex interactions in accounting for non-stationary processes. Although the results presented here are not definitive, for every measure of performance examined the BRT-based models performed as well as or better than the traditional GLM/GAM standardization methods, thereby confirming the utility of the BRT method for catch standardization purposes.
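
    The core of the comparison can be sketched with scikit-learn: fit a main-effects linear model (standing in for the lognormal GLM) and a stochastic gradient-boosted tree model on identical predictors, then compare test RMSE. The simulated response below has non-linear and interacting effects by construction; the real study used observer records, and the GAM arm is omitted here since scikit-learn has no GAM.

```python
# Hedged sketch of the GLM-vs-BRT comparison on simulated lognormal-style
# CPUE data; predictors and effects are invented placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 5000
X = rng.uniform(size=(n, 3))                  # e.g. year, lat, lon (scaled)
# non-linear, interacting effects that a main-effects GLM will miss
y = np.sin(4 * X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(0, 0.3, n)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

glm = LinearRegression().fit(X_tr, y_tr)
brt = GradientBoostingRegressor(subsample=0.5,   # stochastic gradient boosting
                                random_state=0).fit(X_tr, y_tr)
for name, m in [("GLM", glm), ("BRT", brt)]:
    rmse = mean_squared_error(y_te, m.predict(X_te)) ** 0.5
    print(name, "test RMSE:", round(rmse, 4))
```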

    Determining the Safety of Urban Arterial Roads

    The purpose of this project was to investigate the safety of urban arterial non-access-controlled roads in Worcester, Massachusetts. An investigation into the choice of dependent variable proved inconclusive, so the historical accident rate was used. Because the best functional form for these roads was unclear, both linear and log-linear models were developed to predict the total accident crash rate. A second linear model was developed to predict the total injury accident crash rate. The models were validated using independent data, where the linear total accident crash rate model was found to be the most robust of the three, in that crash rates on both state primary roads and other arterial roads could be predicted to better than fifty percent error.
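
    A minimal sketch of the project's two functional forms, assuming invented predictors: fit OLS to the crash rate and to its logarithm, back-transform the log-linear predictions, and summarize percent error, the project's validation criterion.

```python
# Hedged sketch: linear vs log-linear crash-rate models on simulated
# road data; the predictors are hypothetical stand-ins, not the
# Worcester road inventory.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 120
X = sm.add_constant(np.column_stack([
    rng.uniform(1, 6, n),                    # e.g. number of lanes
    rng.uniform(0, 30, n),                   # e.g. driveways per mile
]))
rate = np.exp(0.2 * X[:, 1] + 0.03 * X[:, 2] + rng.normal(0, 0.3, n))

linear = sm.OLS(rate, X).fit()
loglin = sm.OLS(np.log(rate), X).fit()
preds = {
    "linear": linear.predict(X),
    # simple exp back-transform to the rate scale (no smearing correction)
    "log-linear": np.exp(loglin.predict(X)),
}
for name, pred in preds.items():
    pct_err = np.abs(pred - rate) / rate * 100
    print(name, "median % error:", round(np.median(pct_err), 1))
```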