
    A Bayesian generalized random regression model for estimating heritability using overdispersed count data

    Background: Faecal egg count is a common indicator of nematode infection and, because it is heritable, provides a marker for selective breeding. However, because resistance to disease changes as the adaptive immune system develops, quantifying temporal changes in heritability could help improve selective breeding programs. Faecal egg counts can be extremely skewed and difficult to handle statistically. Therefore, previous analyses have log-transformed faecal egg counts to estimate heritability on a latent scale. However, such transformations may not always be appropriate. In addition, faecal egg counts have typically been analysed with univariate rather than multivariate methods, such as random regression, which are appropriate when traits are correlated. We present a method for estimating the heritability of untransformed faecal egg counts over the grazing season using random regression. Results: Replicating standard univariate analyses, we showed that heritability estimates depend on the choice of transformation. Then, using a multitrait model, we exposed temporal correlations, highlighting the need for a random regression approach. Because random regression can sometimes involve estimating more parameters than there are observations, or result in computationally intractable problems, we investigated reduced rank random regression. Using standard software (WOMBAT), we discuss the estimation of variance components for log-transformed data using both full and reduced rank analyses. Then, we modelled the untransformed data as negative binomially distributed and used Metropolis-Hastings to fit a generalized reduced rank random regression model with additive genetic, permanent environmental and maternal effects. These three variance components explained more than 80% of the total phenotypic variation, whereas those for the log-transformed data accounted for considerably less. 
The heritability, on the link scale, increased from around 0.25 at the beginning of the grazing season to around 0.4 at the end. Conclusions: Random regressions are a useful tool for quantifying sources of variation across time. Our MCMC (Markov chain Monte Carlo) algorithm provides a flexible approach to fitting random regression models to non-normal data. Here we applied the algorithm to negative binomially distributed faecal egg count data, but the method is readily applicable to other types of overdispersed data.
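To make the Metropolis-Hastings step concrete, here is a minimal sketch, assuming a single shared mean and a known negative binomial dispersion k. It samples eta = log(mu) under a flat prior with a random-walk proposal. It is not the authors' reduced rank random regression sampler (there are no additive genetic, permanent environmental or maternal effects), and the helper names, step size and example counts are hypothetical.

```python
import math
import random


def nb_logpmf(y, mu, k):
    """Log-pmf of a negative binomial count y with mean mu and size k.

    Var(y) = mu + mu**2 / k, so a small k means strong overdispersion.
    """
    return (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
            + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))


def mh_log_mean(counts, k, n_iter=5000, step=0.2, seed=1):
    """Random-walk Metropolis-Hastings for eta = log(mu), flat prior on eta.

    Hypothetical helper: one shared mean, no random effects.
    """
    rng = random.Random(seed)

    def log_post(eta):
        mu = math.exp(eta)
        return sum(nb_logpmf(y, mu, k) for y in counts)

    eta = math.log(sum(counts) / len(counts) + 0.5)  # start near the data
    current = log_post(eta)
    draws = []
    for _ in range(n_iter):
        proposal = eta + rng.gauss(0.0, step)  # symmetric proposal
        candidate = log_post(proposal)
        # Accept with probability min(1, exp(candidate - current)).
        if math.log(1.0 - rng.random()) < candidate - current:
            eta, current = proposal, candidate
        draws.append(eta)
    return draws


# Hypothetical, heavily skewed egg counts.
counts = [0, 3, 15, 2, 40, 7, 0, 120, 5, 9]
draws = mh_log_mean(counts, k=0.8)
kept = draws[1000:]  # discard burn-in
print(math.exp(sum(kept) / len(kept)))  # rough posterior summary on the count scale
```

Working on the log scale mirrors the log link used for the negative binomial model; the same accept/reject step extends to vectors of effects, at the cost of choosing a proposal for each block of parameters.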

    Estimation and Regularization Techniques for Regression Models with Multidimensional Prediction Functions

    Boosting is one of the most important methods for fitting regression models and building prediction rules from high-dimensional data. A notable feature of boosting is its built-in mechanism for shrinking coefficient estimates and selecting variables. This regularization mechanism makes boosting well suited to data with small sample sizes and large numbers of predictors. We extend the existing methodology by developing a boosting method for prediction functions with multiple components. Such multidimensional functions occur in many types of statistical models, for example in count data models and in models whose outcome variable has a mixture distribution. As will be demonstrated, the new algorithm is suitable for both estimating the prediction function and regularizing the estimates. In addition, nuisance parameters can be estimated simultaneously with the prediction function.
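The built-in shrinkage and variable selection mentioned above can be seen in the standard single-component algorithm that the paper extends. The sketch below is ordinary componentwise L2 boosting with one-covariate linear base learners, not the multidimensional method of the paper; the function name, step count and shrinkage factor nu = 0.1 are assumptions chosen for illustration, and the covariates are assumed to be centred because the base learners have no intercept.

```python
def componentwise_l2_boost(X, y, n_steps=100, nu=0.1):
    """Componentwise L2 boosting with simple linear base learners.

    X is a list of rows with (ideally) centred columns, y a list of responses.
    Each step fits every single-covariate least-squares learner to the current
    residuals, keeps only the best one and adds a shrunken (nu) copy of it.
    Covariates that are never selected keep a coefficient of exactly zero.
    """
    n, p = len(X), len(X[0])
    offset = sum(y) / n
    coef = [0.0] * p
    fitted = [offset] * n
    for _ in range(n_steps):
        # Residuals are the negative gradient of the squared-error loss / 2.
        resid = [yi - fi for yi, fi in zip(y, fitted)]
        best_j, best_b, best_rss = None, 0.0, float("inf")
        for j in range(p):
            xj = [row[j] for row in X]
            sxx = sum(v * v for v in xj)
            if sxx == 0.0:
                continue
            b = sum(v * r for v, r in zip(xj, resid)) / sxx
            rss = sum((r - b * v) ** 2 for v, r in zip(xj, resid))
            if rss < best_rss:
                best_j, best_b, best_rss = j, b, rss
        if best_j is None:
            break
        # Shrink the selected update by nu instead of taking the full step.
        coef[best_j] += nu * best_b
        fitted = [f + nu * best_b * row[best_j] for f, row in zip(fitted, X)]
    return offset, coef


# Toy data: y depends only on the first of three orthogonal, centred covariates.
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [2.0, -2.0, 2.0, -2.0]
offset, coef = componentwise_l2_boost(X, y)
print(offset, coef)
```

Stopping after fewer steps leaves the selected coefficient shrunk towards zero, which is the regularization effect; the number of steps is therefore the main tuning parameter.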

    Conditional Heteroskedasticity in Count Data Regression: Self-Feeding Activity in Fish

    The paper introduces a new approach to incorporating time-dependent overdispersion into Poisson-related regression models. To handle the added flexibility of conditional heteroskedasticity in time series count data, several well-known estimators are adapted and a GMM-type estimator is suggested. The estimators are applied to a time series of self-feeding activity in Arctic charr. There is strong support for both a dynamic conditional mean function and a dynamic model for the overdispersion.
    Keywords: Poisson; Overdispersion; ARCH; Estimation; Self-Feeding; Arctic Charr
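As a rough illustration of what "overdispersion that changes over time" means, the sketch below computes a simple method-of-moments estimate of alpha in Var(y) = mu + alpha * mu**2, using E[(y - mu)**2 - y] = alpha * mu**2, and then recomputes it in a sliding window. This is not the paper's GMM estimator; the fitted means, the window length and the helper names are hypothetical.

```python
def moment_alpha(y, mu):
    """Method-of-moments overdispersion estimate for Var(y) = mu + alpha*mu**2.

    Uses E[(y - mu)**2 - y] = alpha * mu**2, pooled over observations.
    mu holds fitted conditional means from some (hypothetical) mean model.
    """
    num = sum((yi - mi) ** 2 - yi for yi, mi in zip(y, mu))
    den = sum(mi * mi for mi in mu)
    return num / den


def rolling_alpha(y, mu, window):
    """Crude look at time variation: re-estimate alpha in a sliding window."""
    return [moment_alpha(y[t - window:t], mu[t - window:t])
            for t in range(window, len(y) + 1)]


y = [2, 8, 2, 8, 3, 4, 5, 4]  # hypothetical counts
mu = [5.0] * len(y)           # hypothetical fitted conditional means
print(moment_alpha(y, mu))
print(rolling_alpha(y, mu, window=4))
```

A positive value points to overdispersion and a negative one to underdispersion within that window; a model such as the one in the paper replaces this ad hoc window with an explicit dynamic equation for the dispersion.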

    Statistical and Economic Evaluation of Time Series Models for Forecasting Arrivals at Call Centers

    Call center managers are interested in obtaining accurate point and distributional forecasts of call arrivals in order to achieve an optimal balance between service quality and operating costs. We present a strategy for selecting forecast models of call arrivals that rests on three pillars: (i) flexibility of the loss function; (ii) statistical evaluation of forecast accuracy; (iii) economic evaluation of forecast performance using money metrics. We implement fourteen time series models and seven forecast combination schemes on three series of daily call arrivals. Although we focus mainly on point forecasts, we also evaluate density forecasts. We show that modeling second moments is important for both point and density forecasting and that the simple Seasonal Random Walk model is always outperformed by more general specifications. Our results suggest that call center managers should invest in forecast models that describe both the first and second moments of call arrivals.
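The Seasonal Random Walk benchmark and the idea of a flexible loss function can be sketched in a few lines. The version below assumes a weekly season of 7 days for daily arrivals and uses an asymmetric piecewise-linear ("linlin") loss in which under-forecasting costs more than over-forecasting; the season length, the cost weights and the function names are hypothetical and not taken from the paper.

```python
def seasonal_random_walk(history, horizon, season=7):
    """Forecast y[T+h] with the value observed one season earlier.

    For h > season the last observed season is simply repeated.
    """
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = history[-season:]
    return [last_season[h % season] for h in range(horizon)]


def linlin_loss(actual, forecast, under_cost=2.0, over_cost=1.0):
    """Asymmetric piecewise-linear loss, averaged over the horizon.

    under_cost weights under-forecasts (e.g. too few agents scheduled),
    over_cost weights over-forecasts (idle agents). Both are hypothetical.
    """
    total = 0.0
    for a, f in zip(actual, forecast):
        err = a - f
        total += under_cost * err if err > 0 else over_cost * (-err)
    return total / len(actual)


history = list(range(14))  # two hypothetical weeks of daily arrivals
forecast = seasonal_random_walk(history, horizon=9)
print(forecast)
print(linlin_loss([10, 10], [8, 12]))
```

Changing the cost ratio changes which model looks best, which is why the loss function is treated as a modelling choice rather than fixed at squared error.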

    Statistical models for over-dispersion in the frequency of peaks over threshold data for a flow series

    In a peaks over threshold analysis of a series of river flows, a sufficiently high threshold is used to extract the peaks of independent flood events. This paper reviews existing statistical models, and proposes new ones, for both the annual counts of such events and the process of event peak times. The most common existing model for the process of event times is a homogeneous Poisson process, which is motivated by asymptotic theory. However, empirical evidence suggests that it is not the most appropriate model: it implies that the mean and variance of the annual counts are equal, whereas the counts appear to be overdispersed, i.e., to have a larger variance than mean. This paper describes how the homogeneous Poisson process can be extended, through regression and mixed models, to incorporate time variation in the rate at which events occur and so help to account for overdispersion in the annual counts. The implications of these new models for the implied probability distribution of the annual maxima are also discussed. The models are illustrated using a historical flow series from the River Thames at Kingston.
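The equal-mean-and-variance property of the homogeneous Poisson process suggests a quick empirical check on annual counts: the index of dispersion D = s² / x̄, which is close to 1 for Poisson counts, together with the standard statistic (n - 1)D, which is approximately chi-squared with n - 1 degrees of freedom under the Poisson model. The counts below are hypothetical, not the Thames data.

```python
import statistics


def dispersion_index(counts):
    """Sample variance divided by sample mean; close to 1 for Poisson counts."""
    return statistics.variance(counts) / statistics.mean(counts)


def dispersion_statistic(counts):
    """(n - 1) * D, approximately chi-squared on n - 1 df under a Poisson model."""
    return (len(counts) - 1) * dispersion_index(counts)


# Hypothetical annual counts of peaks over threshold.
annual_counts = [0, 4, 1, 7, 3]
print(dispersion_index(annual_counts))      # 2.5: variance exceeds the mean
print(dispersion_statistic(annual_counts))  # 10.0, compare with chi-squared(4)
```

A value of D well above 1 is the overdispersion that the regression and mixed-model extensions of the Poisson process are meant to absorb.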

    Alternative distributions for observation driven count series models

    Observation-driven models provide a flexible framework for modelling time series of counts and can capture a wide range of dependence structures. Many applications in this field are concerned with count series whose conditional distribution, given past observations and explanatory variables, is assumed to be Poisson. This assumption is convenient because the Poisson distribution is simple and leads to models that are easy to implement. On the other hand, it is often too restrictive because it implies equidispersion, i.e., that the conditional mean equals the conditional variance, and this is often violated in empirical applications. More flexible distributions that allow for overdispersion or underdispersion should therefore be used. This paper is concerned with the use of alternative distributions in the framework of observation-driven count series models. Different count distributions and their properties are reviewed and used for modelling. The models under consideration are applied to a time series of daily counts of asthma presentations at a Sydney hospital, a data set already analyzed by Davis et al. (1999, 2000). The Poisson-GLARMA model proposed by these authors is used as a benchmark. This paper extends the work of Davis et al. (1999) to distributions nested in either the generalized negative binomial or the generalized Poisson distribution. Maximum likelihood estimation for observation-driven models with generalized distributions is also presented.
    Keywords: count series; observation-driven models; GLARMA; discrete distributions
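One of the alternative distributions named above is the generalized Poisson. The sketch below, which covers only the distribution and not the GLARMA dynamics, evaluates its log-pmf in Consul's parametrization, P(y) = θ(θ + λy)^(y-1) e^(-θ-λy) / y!, with mean θ/(1-λ) and variance θ/(1-λ)³. It is restricted to 0 ≤ λ < 1 (overdispersion when λ > 0); the underdispersed case λ < 0 needs truncation and is deliberately not handled. Function names are hypothetical.

```python
import math


def gp_logpmf(y, theta, lam):
    """Log-pmf of the generalized Poisson distribution (Consul's form).

    P(y) = theta * (theta + lam*y)**(y - 1) * exp(-theta - lam*y) / y!
    Restricted here to 0 <= lam < 1 (overdispersion when lam > 0); negative
    lam, which gives underdispersion, needs truncation and is not handled.
    """
    if theta <= 0 or not 0 <= lam < 1:
        raise ValueError("need theta > 0 and 0 <= lam < 1")
    return (math.log(theta) + (y - 1) * math.log(theta + lam * y)
            - theta - lam * y - math.lgamma(y + 1))


def gp_moments(theta, lam):
    """Closed-form mean theta/(1-lam) and variance theta/(1-lam)**3."""
    return theta / (1 - lam), theta / (1 - lam) ** 3


theta, lam = 2.0, 0.3
probs = [math.exp(gp_logpmf(y, theta, lam)) for y in range(400)]
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(gp_moments(theta, lam))  # closed-form mean and variance
print(mean, var)               # numerical check, should agree closely
```

With λ = 0 the pmf reduces to the Poisson, so the Poisson model is nested, which is what makes it a natural benchmark.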

    Smooth-CAR mixed models for spatial count data

    Penalized splines (P-splines) and individual random effects are used for the analysis of spatial count data. P-splines are represented as mixed models to give a unified approach to model estimation. First, a model is considered in which the spatial variation is modelled by a two-dimensional P-spline at the centroids of the areas or regions. In addition, individual area effects are incorporated as random effects to account for individual variation among regions. Finally, the model is extended with a conditional autoregressive (CAR) structure for the random effects, giving the so-called “Smooth-CAR” models, which aim to separate the large-scale geographical trend from local spatial correlation. The proposed methodology is applied to the analysis of lip cancer incidence rates in Scotland.
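The two-dimensional P-spline rests on a difference penalty over a grid of spline coefficients. The sketch below builds only that penalty, λ1 (D1ᵀD1 ⊗ I) + λ2 (I ⊗ D2ᵀD2), with second-order differences, and checks that it leaves a planar trend unpenalized. It assumes row-major stacking of a c1-by-c2 coefficient grid and omits the B-spline basis, the mixed-model reparametrization and the CAR random effects; all names and sizes are hypothetical.

```python
def diff_matrix(n, order=2):
    """Difference matrix D (nested lists): D @ a gives order-th differences of a."""
    if n <= order:
        raise ValueError("need more coefficients than the difference order")
    rows = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(order):
        # Subtract each row from the next one: one more order of differencing.
        rows = [[b - a for a, b in zip(rows[i], rows[i + 1])]
                for i in range(len(rows) - 1)]
    return rows


def gram(D):
    """Return D^T D."""
    n = len(D[0])
    return [[sum(row[i] * row[j] for row in D) for j in range(n)] for i in range(n)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def kron(A, B):
    """Kronecker product of two nested-list matrices."""
    return [[a * b for a in arow for b in brow] for arow in A for brow in B]


def penalty_2d(c1, c2, lam1, lam2, order=2):
    """lam1 * kron(D1'D1, I_c2) + lam2 * kron(I_c1, D2'D2).

    Assumes the c1-by-c2 coefficients are stacked row-major: a[i * c2 + j].
    """
    P1 = kron(gram(diff_matrix(c1, order)), identity(c2))
    P2 = kron(identity(c1), gram(diff_matrix(c2, order)))
    return [[lam1 * p + lam2 * q for p, q in zip(r1, r2)] for r1, r2 in zip(P1, P2)]


def quad_form(P, a):
    """a^T P a: the roughness penalty on the coefficient vector a."""
    n = len(a)
    return sum(a[i] * P[i][j] * a[j] for i in range(n) for j in range(n))


c1, c2 = 4, 5
P = penalty_2d(c1, c2, lam1=1.0, lam2=10.0)
plane = [float(i + 2 * j) for i in range(c1) for j in range(c2)]
bumpy = [float((i + j) % 2) for i in range(c1) for j in range(c2)]
print(quad_form(P, plane))      # 0.0: second differences vanish on a plane
print(quad_form(P, bumpy) > 0)  # True
```

Separate smoothing parameters for the two directions allow different amounts of smoothing along each coordinate of the area centroids.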