169 research outputs found

    An investigation of estimation performance for a multivariate Poisson-gamma model with parameter dependency

    Statistical analysis can be overly reliant on naive assumptions of independence between different data-generating processes. This results in greater uncertainty when estimating the underlying characteristics of a process, because dependency creates an opportunity to boost the effective sample size by incorporating data from related processes into the analysis. However, this assumes that the dependency has been appropriately specified, as mis-specified dependency can draw misleading information from the data. The main aim of this research is to investigate the impact of incorporating dependency into the data analysis. Our motivation is the estimation of the reliability of items, and we have therefore restricted our investigation to homogeneous Poisson processes (HPPs), which can be used to model the rate of occurrence of events such as failures. In an HPP, dependency between rates can occur for numerous reasons: similarity in mechanical designs, failure occurrence due to a common management culture, or comparable failure counts across machines for the same failure modes. Multiple types of dependency are considered. Dependencies can take different forms, such as simple linear dependency measured through the Pearson correlation, rank dependencies which capture non-linear dependencies, and tail dependencies where the strength of the dependency may be stronger in extreme events than in more moderate ones. Estimating the measure of dependency between correlated processes can be challenging. We develop the research within a Bayes or empirical Bayes inferential framework, where uncertainty in the actual rate of occurrence of a process is modelled with a prior probability distribution. We take the prior to be a Gamma distribution, given its flexibility and its conjugate relationship with the Poisson likelihood. For dependency modelling between processes we consider copulas, which are a convenient and flexible way of capturing a variety of different dependency characteristics between distributions. We use a multivariate Poisson-Gamma probability model: the Poisson process captures the aleatory uncertainty, the inherent variability in the data, whereas the Gamma prior describes the epistemic uncertainty about the underlying rate. By pooling processes with correlated underlying mean rates we are able to incorporate data from these processes into the inference and reduce the estimation error. There are three key research themes investigated in this thesis. First, to investigate the value of reducing estimation error by incorporating dependency within the analysis, via theoretical analysis and simulation experiments. We show that correctly accounting for dependency can significantly reduce the estimation error. The findings should inform analysts a priori as to whether it is worth pursuing a more complex analysis for which the dependency parameter needs to be elicited. Second, to examine the consequences of mis-specifying the degree and form of dependency through controlled simulation experiments. We show the relative robustness of different ways of modelling the dependency using copula and Bayesian methods. The findings should inform analysts about the sensitivity of modelling choices. Third, to show how the different methods for representing dependency can be operationalised through an industry case study. We show the consequences for a simple decision problem associated with the provision of spare parts to maintain operation of the industrial process when dependency between the event rates of the machines is appropriately modelled rather than treated as independent.
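    To make the structure of the model concrete, here is a minimal simulation sketch (illustrative only; the parameter values, the choice of a Gaussian copula, and the use of NumPy/SciPy are assumptions, not the thesis's actual implementation). It draws two failure rates with Gamma marginals joined by a copula, generates Poisson counts for each process, and applies the per-process conjugate posterior update.

```python
import numpy as np
from scipy import stats

# Illustrative sketch only: parameters and the Gaussian copula are assumed.
# Two HPP failure rates with Gamma(alpha, beta) marginals, correlation rho,
# each observed over an exposure time t.
rng = np.random.default_rng(1)
alpha, beta, rho, t, n_sims = 2.0, 1.0, 0.7, 10.0, 5000

# Gaussian copula: correlated normals -> uniforms -> Gamma-distributed rates.
z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n_sims)
u = stats.norm.cdf(z)
lam = stats.gamma.ppf(u, a=alpha, scale=1.0 / beta)

# Aleatory layer: Poisson counts given the latent rates.
counts = rng.poisson(lam * t)

# Epistemic layer: conjugate per-process posterior,
# lambda_i | N_i ~ Gamma(alpha + N_i, beta + t).
post_mean = (alpha + counts) / (beta + t)

print("rank correlation of the rates:", stats.spearmanr(lam[:, 0], lam[:, 1])[0])
print("mean absolute estimation error:", np.mean(np.abs(post_mean - lam)))
```

    Because the latent rates are correlated, the counts from one process carry information about the other; the copula-based pooling described in the abstract exploits exactly this to shrink the estimation error below what the independent per-process update achieves.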

    Disease mapping and regression with count data in the presence of overdispersion and spatial autocorrelation: a Bayesian model averaging approach

    This paper applies the generalised linear model for modelling geographical variation to esophageal cancer incidence data in the Caspian region of Iran. The data have a complex and hierarchical structure that makes them suitable for hierarchical analysis using Bayesian techniques, but care is required to deal with problems arising from counts of events observed in small geographical areas when overdispersion and residual spatial autocorrelation are present. These considerations lead to nine regression models derived from three probability distributions for count data (Poisson, generalised Poisson and negative binomial) and three different autocorrelation structures. We employ the framework of Bayesian variable selection and a Gibbs-sampling-based technique to identify significant cancer risk factors. The framework deals with situations where the number of possible models, based on different combinations of candidate explanatory variables, is large enough that calculating posterior probabilities for all models is difficult or infeasible. The evidence from applying the modelling methodology suggests that modelling strategies based on the generalised Poisson and negative binomial distributions with spatial autocorrelation work well and provide a robust basis for inference.
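    For readers unfamiliar with the averaging step, the standard Bayesian model averaging identity (stated generically here, not in the paper's notation) combines inference from each candidate model weighted by its posterior model probability:

\[
p(\Delta \mid y) = \sum_{k=1}^{K} p(\Delta \mid M_k, y)\, p(M_k \mid y),
\qquad
p(M_k \mid y) = \frac{p(y \mid M_k)\, p(M_k)}{\sum_{l=1}^{K} p(y \mid M_l)\, p(M_l)},
\]

    where \(\Delta\) is a quantity of interest (for example a relative-risk surface or a regression coefficient) and \(M_1, \dots, M_K\) index the candidate models. When \(K\) is too large to enumerate, as with many combinations of explanatory variables, the Gibbs-sampling-based variable selection described above visits only the high-probability models.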

    Inference for a zero-inflated Conway-Maxwell-Poisson regression for clustered count data.

    This dissertation is directed toward developing statistical methodology with applications of the Conway-Maxwell-Poisson (CMP) distribution (Conway, R. W., and Maxwell, W. L., 1962) to count data. The count data for this dissertation exhibit three different characteristics: clustering, zero inflation, and dispersion. Clustering suggests that observations within clusters are correlated, and the zero-inflation phenomenon occurs when the data exhibit excessive zero counts. Dispersion implies that the variance is greater or smaller than the mean, unlike a Poisson distribution in which the two are equal. The dissertation starts with an introduction to inference for zero-inflated clustered count data in the first chapter. It then presents novel methodologies through three different statistical approaches (Chapters 2-4). Chapter 2 takes a marginal regression approach, beginning with a description of a zero-inflated CMP model and subsequently developing proper statistical methodologies for estimating marginal regression parameters. Furthermore, various types of simulations are conducted to investigate whether the marginal regression approach leads to proper statistical inference. This chapter also provides an application to a dental dataset, which is clustered, zero inflated, and dispersed. Chapter 3 develops a mixed effects model including a cluster-specific random effect term. This chapter also addresses the numerical challenges of the mixed effects model approach through extensive simulations. For the application of the zero-inflated mixed effects model, next-generation sequencing (NGS) data from a maize hybrids experiment are analyzed. While Chapter 3 applies a mixed effects model using the frequentist approach, Chapter 4 develops a Bayesian method to analyze such data under a mixed effects model structure. In that chapter, a hurdle model is applied to cope with the zero-inflation phenomenon, rather than the zero-inflated model used in Chapters 2 and 3. Furthermore, Chapter 4 provides an application to the same dental dataset used in Chapter 2. The application section introduces a new factor into the hurdle mixed effects model, which incorporates both fixed effects and random effects terms. Chapter 5, the concluding chapter, describes plans for future work.
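    For reference, the CMP probability mass function and the zero-inflated mixture built on it can be written generically (the dissertation's regression parameterisation may differ) as

\[
P(Y = y \mid \lambda, \nu) = \frac{\lambda^{y}}{(y!)^{\nu}\, Z(\lambda, \nu)}, \qquad
Z(\lambda, \nu) = \sum_{j=0}^{\infty} \frac{\lambda^{j}}{(j!)^{\nu}},
\]
\[
P(Y = 0) = \pi + (1 - \pi)\, P_{\mathrm{CMP}}(0 \mid \lambda, \nu), \qquad
P(Y = y) = (1 - \pi)\, P_{\mathrm{CMP}}(y \mid \lambda, \nu), \quad y \ge 1,
\]

    where \(\pi\) is the probability of an excess zero and the dispersion parameter \(\nu\) recovers the Poisson at \(\nu = 1\), with \(\nu > 1\) giving underdispersion and \(\nu < 1\) overdispersion.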

    Pseudo Market Timing: Fact or Fiction?

    The average firm going public or issuing new equity has underperformed the market in the long run. Endogeneity of the number of new issues has been proposed as a potential explanation of this long-run underperformance. Under pseudo market timing of new issues, ex post measures of average abnormal returns may be negative on average despite zero ex ante abnormal returns. We show that, under reasonable stationarity assumptions on the process generating events, traditional measures of average abnormal returns are consistent, and the pseudo market timing effect is a small-sample problem. In simulations of an empirical model we demonstrate that the bias is small even in moderate sample sizes. An abnormal return measure capturing a feasible investment strategy is not biased. We argue that it is unlikely that pseudo market timing is the explanation for the long-run underperformance of equity issuances. Keywords: abnormal return measures; endogenous events; event studies; initial public offerings; long-run underperformance.

    A spatial autoregressive Poisson gravity model

    In this paper, a Poisson gravity model is introduced that incorporates spatial dependence of the explained variable without relying on restrictive distributional assumptions about the underlying data-generating process. The model comprises a spatially filtered component, including the origin-, destination- and origin-destination-specific variables, and a spatial residual variable that captures origin- and destination-based spatial autocorrelation. We derive a two-stage nonlinear least squares estimator that is heteroscedasticity-robust and thus controls for the over- or underdispersion that is often present in the empirical analysis of discrete data or that arises, in the case of overdispersion, when spatial autocorrelation is present. This estimator can be shown to have desirable properties under different distributional assumptions, such as the observed flows or the (spatially) filtered component being either Poisson or negative binomial. In our spatial autoregressive model specification, the resulting parameter estimates can be interpreted as the implied total impact effects, defined as the sum of direct and indirect spatial feedback effects. Monte Carlo results indicate marginal finite-sample biases in the mean and standard deviation of the parameter estimates and convergence to the true parameter values as the sample size increases. In addition, the paper illustrates the model by analysing patent citation flow data across European regions.
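    One way to sketch this class of specification (a generic illustration, not the paper's exact notation or normalisation) is

\[
y_{ij} = \exp\!\big(\alpha + \boldsymbol{\beta}_o' \mathbf{x}_i + \boldsymbol{\beta}_d' \mathbf{x}_j + \boldsymbol{\gamma}' \mathbf{x}_{ij}\big) + u_{ij},
\qquad
\mathbf{u} = \rho_o W_o \mathbf{u} + \rho_d W_d \mathbf{u} + \boldsymbol{\varepsilon},
\]

    where the exponential term is the spatially filtered gravity component with origin, destination and origin-destination covariates, and the residual \(\mathbf{u}\) follows an autoregressive process in origin- and destination-based spatial weight matrices \(W_o\) and \(W_d\), whose autoregressive parameters generate the direct and indirect feedback effects summarised in the total impact estimates.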

    Development of trip production models incorporating accessibility measures for a rapidly developing region

    Traffic forecasters traditionally rely on the stability of populations and land uses to predict future trip data. In a number of rapidly growing cities of the western United States, where the population has been expanding at a pace far greater than that of the average community in the country, traditional travel demand models using the Four Step Process have been found to be insufficiently accurate. The focus of this dissertation was to examine whether the predictive ability of traditional trip production models could be improved by incorporating accessibility and network variables when applied to a rapidly growing region. The variables examined were developed on a disaggregate, per-household basis using geographic information systems. The purpose of this research was to identify the factors which significantly affect trip production for a rapidly growing area, and to develop a regression model that improves upon the accuracy of trip production models that incorporate traditionally used socioeconomic variables. The travel survey data used in the research were taken from two household travel surveys, from the years 1996 and 2005. The dependent variables in the trip production equations (the total number of non-work trips and the total number of home-based shopping trips per household) were recorded from the household travel surveys. The three traditionally used independent variables input into trip production regression equations (the number of persons in each household, the number of vehicles available for use by each household, and the household income) were also taken from the household travel surveys. Data were obtained from Clark County and used to develop additional independent variables. Once the development of variables was completed, regression equations were calibrated. The trip production models were then evaluated statistically, and observations were made. It was concluded that accessibility and transportation network variables can be developed on a disaggregate, per-household basis for inclusion in trip production models. Whether models created with the additional variables predict future trip production more effectively than models containing traditional variables proved inconclusive. However, by including the accessibility and transportation network variables in trip production equations, growth can be included in trip generation models.
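    Illustratively, a trip production equation of the kind calibrated here (a generic sketch; the coefficients and the exact set of accessibility variables are not specified in this summary) takes the form

\[
T_h = \beta_0 + \beta_1\,\mathrm{Persons}_h + \beta_2\,\mathrm{Vehicles}_h + \beta_3\,\mathrm{Income}_h + \sum_k \gamma_k A_{hk} + \varepsilon_h,
\]

    where \(T_h\) is the number of non-work or home-based shopping trips for household \(h\), the first three regressors are the traditional socioeconomic variables, and the \(A_{hk}\) are the accessibility and network measures computed per household in the geographic information system.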

    Exponential-Family Random Graph Models for Valued Networks

    Exponential-family random graph models (ERGMs) provide a principled and flexible way to model and simulate features common in social networks, such as propensities for homophily, mutuality, and friend-of-a-friend triad closure, through the choice of model terms (sufficient statistics). However, those ERGMs modeling the more complex features have, to date, been limited to binary data: the presence or absence of ties. Thus, analysis of valued networks, such as those where counts, measurements, or ranks are observed, has necessitated dichotomizing them, losing information and introducing biases. In this work, we generalize ERGMs to valued networks. Focusing on modeling counts, we formulate an ERGM for networks whose ties are counts and discuss issues that arise when moving beyond the binary case. We introduce model terms that generalize and model common social network features for such data and apply these methods to a network dataset whose values are counts of interactions. Comment: 42 pages, including 2 appendixes (3 pages total), 5 figures, 2 tables, 1 algorithm listing; a substantial revision and reorganization: major changes include focus shifted to counts in particular, sections added on modeling actor heterogeneity, a subsection on degeneracy, another example, and an appendix on non-steepness of the CMP distribution.
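    For orientation, the exponential-family form being generalised can be written (in generic notation) as

\[
P_{\theta}(\mathbf{Y} = \mathbf{y}) = \frac{h(\mathbf{y}) \exp\{\boldsymbol{\theta} \cdot \mathbf{g}(\mathbf{y})\}}{\kappa(\boldsymbol{\theta})},
\qquad
\kappa(\boldsymbol{\theta}) = \sum_{\mathbf{y} \in \mathcal{Y}} h(\mathbf{y}) \exp\{\boldsymbol{\theta} \cdot \mathbf{g}(\mathbf{y})\},
\]

    where \(\mathbf{g}(\mathbf{y})\) are the sufficient statistics chosen to capture features such as mutuality or triad closure, and the reference measure \(h(\mathbf{y})\), trivial in the binary case, fixes the baseline distribution of tie values (for count-valued ties, for example, a Poisson-like or geometric baseline) over the sample space \(\mathcal{Y}\) of valued networks.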

    Methodological quality and reporting of Generalized Linear Mixed Models in clinical medicine (2000–2012): a systematic review

    Background: The modeling of count and binary data collected in hierarchical designs has increased the use of Generalized Linear Mixed Models (GLMMs) in medicine. This article presents a systematic review of the application and the quality of the results and information reported from GLMMs in the field of clinical medicine. Methods: A search of the Web of Science database was performed for original articles published in medical journals from 2000 to 2012. The search strategy included the topics "generalized linear mixed models", "hierarchical generalized linear models" and "multilevel generalized linear model", refined to the Science Technology research domain. Papers reporting methodological considerations without application, and those not concerned with clinical medicine or not written in English, were excluded. Results: A total of 443 articles were detected, with an increase over time in the number of articles. In total, 108 articles fit the inclusion criteria. Of these, 54.6% were declared to be longitudinal studies, whereas 58.3% and 26.9% were defined as repeated measurements and multilevel designs, respectively. Twenty-two articles belonged to environmental and occupational public health, 10 articles to clinical neurology, 8 to oncology, and 7 to infectious diseases and pediatrics. The distribution of the response variable was reported in 88% of the articles, predominantly binomial (n = 64) or Poisson (n = 22). In most cases, much of the useful information about the GLMMs was not reported. Variance estimates of the random effects were described in only 8 articles (9.2%). Model validation, the method of covariate selection and the method of assessing goodness of fit were reported in only 8.0%, 36.8% and 14.9% of the articles, respectively. Conclusions: During recent years, the use of GLMMs in the medical literature has increased to take into account the correlation of the data when modeling qualitative data or counts. Judged against current recommendations, the quality of reporting has room for improvement regarding the characteristics of the analysis, estimation method, validation, and selection of the model.
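    As background, a GLMM of the kind surveyed links the conditional mean of the response to fixed and random effects through a link function \(g\) (generic notation):

\[
g\big(E[y_{ij} \mid \mathbf{b}_i]\big) = \mathbf{x}_{ij}'\boldsymbol{\beta} + \mathbf{z}_{ij}'\mathbf{b}_i,
\qquad
\mathbf{b}_i \sim N(\mathbf{0}, \boldsymbol{\Sigma}),
\]

    where \(y_{ij}\) is the binary or count response for observation \(j\) in cluster \(i\) (a logit link for binomial responses, a log link for Poisson responses), \(\boldsymbol{\beta}\) are the fixed effects, and the cluster-specific random effects \(\mathbf{b}_i\) induce the within-cluster correlation; the variance components in \(\boldsymbol{\Sigma}\) are among the items the review found to be reported least often.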