A Poisson Mixed Model with Nonnormal Random Effect Distribution
We propose in this paper a random intercept Poisson model in which the random
effect distribution is assumed to follow a generalized log-gamma (GLG)
distribution. We derive the first two moments for the marginal distribution as
well as the intraclass correlation. Even though numerical integration methods
are in general required for deriving the marginal models, we obtain the
multivariate negative binomial model for a particular parameter setting of the
hierarchical model. An iterative process is derived for obtaining the maximum
likelihood estimates for the parameters in the multivariate negative binomial
model. Residual analyses are proposed and two applications with real data are
given for illustration. Comment: Submitted to the Computational Statistics & Data Analysis journal
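The reduction to a multivariate negative binomial can be made concrete with a short derivation. The abstract does not give the GLG parametrization, so this sketch assumes only the special case in which the exponentiated random intercept u_i = exp(b_i) is gamma distributed with unit mean; the symbols \mu_{ij}, \phi, m_i and y_{i+} are illustrative notation, not taken from the paper.

```latex
% Assumption: u_i = \exp(b_i) \sim \mathrm{Gamma}(\text{shape } \phi, \text{rate } \phi), so E[u_i] = 1.
% Conditional model: y_{ij} \mid u_i \sim \mathrm{Poisson}(\mu_{ij} u_i), \; j = 1, \dots, m_i.
\begin{align*}
f(y_{i1}, \dots, y_{im_i})
  &= \int_0^\infty \prod_{j=1}^{m_i}
     \frac{e^{-\mu_{ij} u} (\mu_{ij} u)^{y_{ij}}}{y_{ij}!}
     \cdot \frac{\phi^{\phi} u^{\phi - 1} e^{-\phi u}}{\Gamma(\phi)} \, du \\
  &= \frac{\Gamma(\phi + y_{i+})}{\Gamma(\phi) \prod_{j} y_{ij}!}
     \cdot \frac{\phi^{\phi} \prod_{j} \mu_{ij}^{y_{ij}}}
                {(\phi + \mu_{i+})^{\phi + y_{i+}}},
  \qquad y_{i+} = \sum_j y_{ij}, \quad \mu_{i+} = \sum_j \mu_{ij}, \\
E[y_{ij}] &= \mu_{ij}, \qquad
\mathrm{Var}(y_{ij}) = \mu_{ij} + \frac{\mu_{ij}^2}{\phi}, \qquad
\mathrm{Cov}(y_{ij}, y_{ij'}) = \frac{\mu_{ij} \mu_{ij'}}{\phi} \quad (j \neq j').
\end{align*}
```

Under this assumption the shared intercept induces a positive within-cluster correlation that shrinks as \phi grows, which is the usual behaviour of a gamma-mixed Poisson model.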
On the Inversion of High Energy Proton
Inversion of the K-fold stochastic autoconvolution integral equation is an
elementary nonlinear problem, yet there are no de facto methods to solve it
with finite statistics. To fix this problem, we introduce a novel inverse
algorithm based on a combination of minimization of relative entropy, the Fast
Fourier Transform and a recursive version of Efron's bootstrap. This gives us
power to obtain new perspectives on non-perturbative high energy QCD, such as
probing the ab initio principles underlying the approximately negative binomial
distributions of observed charged particle final state multiplicities, related
to multiparton interactions, the fluctuating structure and profile of the proton,
and diffraction. As a proof-of-concept, we apply the algorithm to ALICE
proton-proton charged particle multiplicity measurements done at different
center-of-mass energies and fiducial pseudorapidity intervals at the LHC,
available on HEPData. A strong double peak structure emerges from the
inversion, barely visible without it. Comment: 29 pages, 10 figures, v2: extended analysis (re-projection ratios,
2D
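As a rough illustration of the forward problem only, the sketch below computes a deterministic K-fold autoconvolution of a discrete distribution with a Fast Fourier Transform. It does not reproduce the paper's inversion, the relative-entropy minimization, or the bootstrap, and it treats K as fixed rather than stochastic; the function names `fft` and `autoconvolve` are hypothetical.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def autoconvolve(pmf, K):
    """Distribution of the sum of K i.i.d. draws from pmf (K >= 1)."""
    out_len = K * (len(pmf) - 1) + 1
    size = 1
    while size < out_len:          # zero-pad to avoid circular wrap-around
        size *= 2
    padded = [complex(p) for p in pmf] + [0j] * (size - len(pmf))
    spectrum = fft(padded)
    powered = [s ** K for s in spectrum]   # convolution theorem
    result = fft(powered, invert=True)
    # Normalize the inverse transform and clip tiny negative round-off.
    return [max(r.real / size, 0.0) for r in result[:out_len]]


if __name__ == "__main__":
    print(autoconvolve([0.5, 0.5], 2))
```

Raising the spectrum to the K-th power replaces K - 1 explicit convolutions, which is why the transform is attractive inside an iterative inversion loop.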
Growth Estimators and Confidence Intervals for the Mean of Negative Binomial Random Variables with Unknown Dispersion
The Negative Binomial distribution becomes highly skewed under extreme
dispersion. Even at moderately large sample sizes, the sample mean exhibits a
heavy right tail. The standard Normal approximation often does not provide
adequate inferences about the data's mean in this setting. In previous work, we
have examined alternative methods of generating confidence intervals for the
expected value. These methods were based upon Gamma and Chi Square
approximations or tail probability bounds such as Bernstein's Inequality. We
now propose growth estimators of the Negative Binomial mean. Under high
dispersion, zero values are likely to be overrepresented in the data. A growth
estimator constructs a Normal-style confidence interval by effectively removing
a small, predetermined number of zeros from the data. We propose growth
estimators based upon multiplicative adjustments of the sample mean and direct
removal of zeros from the sample. These methods do not require estimating the
nuisance dispersion parameter. We will demonstrate that the growth estimators'
confidence intervals provide improved coverage over a wide range of parameter
values and asymptotically converge to the sample mean. Interestingly, the
proposed methods succeed despite adding both bias and variance to the Normal
approximation
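The baseline problem described above can be checked with a small simulation. The sketch below estimates how often the standard Normal interval covers the true mean of a highly dispersed Negative Binomial; it does not implement the growth estimators, whose exact form the abstract does not specify. The parametrization (mean `mu`, dispersion `k`, variance mu + mu^2/k), the parameter values, and the function names are assumptions made for illustration.

```python
import math
import random


def sample_negbin(mu, k, rng):
    """Draw one NB value (mean mu, dispersion k) by walking the CDF."""
    q = mu / (mu + k)
    p = (1.0 - q) ** k               # P(Y = 0)
    u = rng.random()
    y, cdf = 0, p
    while cdf < u and y < 100000:    # cap guards against round-off stalls
        p *= (y + k) / (y + 1) * q   # pmf recursion: P(y + 1) from P(y)
        y += 1
        cdf += p
    return y


def normal_interval_coverage(mu, k, n, reps, seed=0):
    """Fraction of Normal-approximation 95% intervals that contain mu."""
    rng = random.Random(seed)
    z = 1.959964
    hits = 0
    for _ in range(reps):
        xs = [sample_negbin(mu, k, rng) for _ in range(n)]
        mean = sum(xs) / n
        var = sum((x - mean) ** 2 for x in xs) / (n - 1)
        half = z * math.sqrt(var / n)
        if mean - half <= mu <= mean + half:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    # Hypothetical setting: mean 1, dispersion 0.05, 50 observations.
    print(normal_interval_coverage(mu=1.0, k=0.05, n=50, reps=2000))
```

With small `k` most observations are zero and the sample variance is often underestimated, so the reported coverage is expected to fall short of the nominal 95%.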
Non-parametric Bayesian modelling of digital gene expression data
Next-generation sequencing technologies provide a revolutionary tool for
generating gene expression data. Starting with a fixed RNA sample, they
construct a library of millions of differentially abundant short sequence tags
or "reads", which constitute a fundamentally discrete measure of the level of
gene expression. A common limitation in experiments using these technologies is
the low number or even absence of biological replicates, which complicates the
statistical analysis of digital gene expression data. Analysis of this type of
data has often been based on modified tests originally devised for analysing
microarrays; both these and even de novo methods for the analysis of RNA-seq
data are plagued by the common problem of low replication. We propose a novel,
non-parametric Bayesian approach for the analysis of digital gene expression
data. We begin with a hierarchical model for modelling over-dispersed count
data and a blocked Gibbs sampling algorithm for inferring the posterior
distribution of model parameters conditional on these counts. The algorithm
compensates for the problem of low numbers of biological replicates by
clustering together genes with tag counts that are likely sampled from a common
distribution and using this augmented sample for estimating the parameters of
this distribution. The number of clusters is not decided a priori, but it is
inferred along with the remaining model parameters. We demonstrate the ability
of this approach to model biological data with high fidelity by applying the
algorithm to a public dataset obtained from cancerous and non-cancerous neural
tissues
Wald Confidence Intervals for a Single Poisson Parameter and Binomial Misclassification Parameter When the Data is Subject to Misclassification
This thesis is based on a Poisson model that uses both error-free data and error-prone data subject to misclassification in the form of false-negative and false-positive counts. We present maximum likelihood estimators (MLEs), Fisher's Information, and Wald statistics for the Poisson rate parameter and the two misclassification parameters. Next, we invert the Wald statistics to get asymptotic confidence intervals for the Poisson rate parameter and the false-negative rate parameter. The coverage and width properties for various sample sizes and parameter configurations are studied via a simulation study. Finally, we apply the MLEs and confidence intervals to one real data set and another realistic data set
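For orientation, inverting a Wald statistic can be sketched in the simplest setting: a single Poisson rate estimated from error-free counts only. This is the textbook case, not the thesis's misclassification model, and the function name `poisson_wald_interval` is hypothetical.

```python
import math


def poisson_wald_interval(counts, z=1.959964):
    """Wald interval for a Poisson rate from i.i.d. error-free counts."""
    n = len(counts)
    lam_hat = sum(counts) / n        # MLE of the rate
    # Fisher information for n observations is n / lambda,
    # so the estimated standard error is sqrt(lam_hat / n).
    se = math.sqrt(lam_hat / n)
    return lam_hat - z * se, lam_hat + z * se


if __name__ == "__main__":
    print(poisson_wald_interval([2, 3, 5, 4, 6]))
```

The interval is symmetric about the MLE, so its lower end can drop below zero for small rates or small samples, which is one reason coverage is usually checked by simulation.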
Boosted Beta regression.
Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fitting a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established - yet unstable - approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression, estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures
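The objective that the classical maximum likelihood approach optimizes can be sketched as a log-likelihood. The example assumes the common mean-precision parametrization, y ~ Beta(mu * phi, (1 - mu) * phi), with a logit link on the mean and a single covariate; the abstract does not state the parametrization, the boosting step is not shown, and the names `beta_loglik`, `beta0`, `beta1` and `phi` are illustrative.

```python
import math


def beta_loglik(y, x, beta0, beta1, phi):
    """Log-likelihood of a beta regression with logit link on the mean.

    Assumes y_i ~ Beta(mu_i * phi, (1 - mu_i) * phi) with
    logit(mu_i) = beta0 + beta1 * x_i and every y_i strictly in (0, 1).
    """
    total = 0.0
    for yi, xi in zip(y, x):
        mu = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * xi)))  # inverse logit
        a, b = mu * phi, (1.0 - mu) * phi
        total += (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
                  + (a - 1.0) * math.log(yi)
                  + (b - 1.0) * math.log(1.0 - yi))
    return total


if __name__ == "__main__":
    y = [0.12, 0.45, 0.80]
    x = [-1.0, 0.0, 1.5]
    print(beta_loglik(y, x, beta0=0.0, beta1=1.0, phi=10.0))
```

In this parametrization the variance is mu * (1 - mu) / (1 + phi), so a separate model for phi is one way to let the variance depend on covariates.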