262 research outputs found
Generalized Additive Models for Location Scale and Shape (GAMLSS) in R
GAMLSS is a general framework for fitting regression type models where the distribution of the response variable does not have to belong to the exponential family and includes highly skewed and kurtotic continuous and discrete distributions. GAMLSS allows all the parameters of the distribution of the response variable to be modelled as linear/non-linear or smooth functions of the explanatory variables. This paper starts by defining the statistical framework of GAMLSS, then describes the current implementation of GAMLSS in R and finally gives four different data examples to demonstrate how GAMLSS can be used for statistical modelling.
Centile estimation for a proportion response variable
This paper introduces two general models for computing centiles when the response variable Y can take values between 0 and 1, inclusive of 0 or 1. The models developed are more flexible alternatives to the beta inflated distribution. The first proposed model employs a flexible four-parameter logit skew Student t (logitSST) distribution to model the response variable Y on the unit interval (0, 1), excluding 0 and 1. This model is then extended to the inflated logitSST distribution for Y on the unit interval, including 1. The second model developed in this paper is a generalised Tobit model for Y on the unit interval, including 1. Applying these two models to (1-Y) rather than Y enables modelling of Y on the unit interval including 0 rather than 1. An application of the new models to real data shows that they can provide superior fits.
GAMLSS: a distributional regression approach
A tutorial on generalized additive models for location, scale and shape (GAMLSS) is given here using two examples. GAMLSS is a general framework for performing regression analysis where not only the location (e.g., the mean) of the distribution but also the scale and shape of the distribution can be modelled by explanatory variables.
Gaussian Markov random field spatial models in GAMLSS
This paper describes the modelling and fitting of Gaussian Markov random field spatial components within a Generalized Additive Model for Location, Scale and Shape (GAMLSS). This allows modelling of any or all the parameters of the distribution for the response variable using explanatory variables and spatial effects. The response variable distribution is allowed to be a non-exponential family distribution. A new package developed in R to achieve this is presented. We use Gaussian Markov random fields to model the spatial effect in Munich rent data and explore some features and characteristics of the data. The potential of using spatial analysis within GAMLSS is discussed. We argue that the flexibility of parametric distributions, the ability to model all the parameters of the distribution and the diagnostic tools of GAMLSS provide an ideal environment for modelling spatial features of data.
Modelling location, scale and shape parameters of the Birnbaum-Saunders generalized t distribution
The Birnbaum-Saunders generalized t (BSGT) distribution is a very flexible family of distributions that admits different degrees of skewness and kurtosis and includes some important special or limiting cases available in the literature, such as the Birnbaum-Saunders and Birnbaum-Saunders t distributions. In this paper we provide a regression type model for the BSGT distribution based on the generalized additive models for location, scale and shape (GAMLSS) framework. The resulting model has high flexibility and therefore great potential to model the distribution parameters of response variables that present light or heavy tails, i.e. platykurtic or leptokurtic shapes, as functions of explanatory variables. For different parameter settings, some simulations are performed to investigate the behavior of the estimators. The potential of the new regression model is illustrated by means of a real motor vehicle insurance data set.
Association of severity of primary open-angle glaucoma with serum vitamin D levels in patients of African descent.
Purpose: To study the relationship between primary open-angle glaucoma (POAG) in a cohort of patients of African descent (AD) and serum vitamin D levels. Methods: A subset of the African Descent and Glaucoma Evaluation Study III (ADAGES III) cohort, consisting of 357 patients with a diagnosis of POAG and 178 normal controls of self-reported AD, was included in this analysis. Demographic information, family history, and blood samples were collected from all the participants. All the subjects underwent clinical evaluation, including visual field (VF) mean deviation (MD), central cornea thickness (CCT), intraocular pressure (IOP), and height and weight measurements. POAG patients were classified into early and advanced phenotypes based on the severity of their visual field damage, and they were matched for age, gender, and history of hypertension and diabetes. Serum 25-Hydroxy (25-OH) vitamin D levels were measured by enzyme-linked immunosorbent assay (ELISA). The association of serum vitamin D levels with the development and severity of POAG was tested by analysis of variance (ANOVA) and the paired t-test. Results: The 178 early POAG subjects had a visual field MD of better than -4.0 dB, and the 179 advanced glaucoma subjects had a visual field MD of worse than -10 dB. The mean (95% confidence interval [CI]) levels of vitamin D of the subjects in the control (8.02 ± 6.19 pg/ml) and early phenotype (7.56 ± 5.74 pg/ml) groups were significantly or marginally significantly different from the levels observed in subjects with the advanced phenotype (6.35 ± 4.76 pg/ml; p = 0.0117 and 0.0543, respectively). In contrast, the mean serum vitamin D level in controls was not significantly different from that of the subjects with the early glaucoma phenotype (p = 0.8508). Conclusions: In this AD cohort, patients with advanced glaucoma had lower serum levels of vitamin D compared with early glaucoma and normal subjects.
A new continuous distribution on the unit interval applied to modelling the points ratio of football teams
We introduce a new flexible distribution to deal with variables on the unit interval based on a transformation of the sinh–arcsinh distribution, which accommodates different degrees of skewness and kurtosis and offers an interesting alternative for modelling this type of data. We also include this new distribution in the generalised additive models for location, scale and shape (GAMLSS) framework in order to develop and fit its regression model. For different parameter settings, some simulations are performed to investigate the behaviour of the estimators. The potential of the new regression model is illustrated by means of a real dataset related to the points rate of football teams at the end of a championship from the four most important leagues in the world: Barclays Premier League (England), Bundesliga (Germany), Serie A (Italy) and BBVA league (Spain) during three seasons (2011–2012, 2012–2013 and 2013–2014).
Principal component regression in GAMLSS applied to Greek-German government bond yield spreads
A solution to the problem of having to deal with a large number of interrelated explanatory variables within a generalized additive model for location, scale, and shape (GAMLSS) is given here, using as an example the Greek-German government bond yield spreads from the 25th of April 2005 to the 31st of March 2010. Those were turbulent financial years, and in order to capture the spreads' behaviour, a model has to be able to deal with the complex nature of the financial indicators used to predict the spreads. Fitting a model, using principal components regression of both main and first order interaction terms, for all the parameters of the assumed distribution of the response variable seems to produce promising results.
Retention of computing students in a London-based university during the Covid-19 pandemic using learned optimism as a lens: a statistical analysis in R
The aim of this research project is to investigate the low retention rate among foundation and first year undergraduate students from the School of Computing and Digital Media in a London-based university. Specifically, the research was conducted during the Covid-19 pandemic using learned optimism as a lens. The research will help the university improve its retention rate, as overall dropout has been increasing in the last few years. The study employed an exploratory investigation approach, using statistical modelling analysis in R to predict behavioural patterns. The quantitative data analysis aims to support the efforts of the School of Computing and Digital Media to re-evaluate its retention strategies for foundation and first year computing students. The main outcome of the analysis is that students with a foreign qualification are optimistic, while students with other or not known qualification are mildly pessimistic. In addition, students with a BTECH, Higher Education diploma or A level qualification are generally more pessimistic, especially if they are also of black ethnicity, or are not of black ethnicity, aged under 34 and British.
- …