Transformations in regression, estimation, testing and modelling
Transformation is a powerful tool for model building. In regression the response variable is transformed in order to achieve the usual assumptions of normality, constant variance and additivity of effects. Here the normality assumption is replaced by the Laplace distributional assumption, appropriate when more large errors occur than would be expected if the errors were normally distributed. The parametric model is enlarged to include a transformation parameter and a likelihood procedure is adopted for estimating this parameter simultaneously with other parameters of interest. Diagnostic methods are described for assessing the influence of individual observations on the choice of transformation. Examples are presented. In distribution methodology the independent responses are transformed in order that a distributional assumption is satisfied for the transformed data. Here the interest is in the family of distributions that do not depend on an unknown shape parameter. The gamma distribution (known order), with the exponential distribution as a special case, is a member of this family. An information number approach is proposed for transforming a known distribution to the gamma distribution (known order). The approach provides an insight into the large-sample behaviour of the likelihood procedure considered by Draper and Guttman (1968) for investigating transformations of data which allow the transformed observations to follow a gamma distribution. The information number approach is illustrated for three examples, and the improvement towards the gamma distribution introduced by transformation is measured numerically and graphically. A graphical procedure is proposed for the general case of investigating transformations of data which allow the transformed observations to follow a distribution dependent on unknown threshold and scale parameters.
The procedure is extended to include model testing and estimation for any distribution which, with the aid of a power transformation, can be put in the simple form of a distribution that does not depend on an unknown shape parameter. The procedure is based on a ratio, R(y), which is constructed from the power transformation. Also described is a ratio-based technique for estimating the threshold parameter in important parametric models, including the three-parameter Weibull and lognormal distributions. Ratio estimation for the Weibull distribution is assessed and compared with the modified maximum likelihood estimation of Cohen and Whitten (1982) in terms of bias and root mean squared error, by means of a simulation study. The methods are illustrated with several examples and extend naturally to singly Type I and Type II censored data.
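The simultaneous estimation of a transformation parameter under Laplace errors can be sketched numerically. The toy below is an intercept-only illustration, not the thesis's full regression procedure: it profiles the Laplace likelihood of a Box-Cox parameter over a grid, using the facts that for fixed transformation the Laplace location and scale MLEs are the median and the mean absolute deviation about the median. All names and the grid are illustrative choices.

```python
import numpy as np

def boxcox(y, lam):
    """Box-Cox power transform; lam near 0 gives the log transform."""
    if abs(lam) < 1e-8:
        return np.log(y)
    return (y ** lam - 1.0) / lam

def laplace_profile_loglik(y, lam):
    """Profile log-likelihood of lam under Laplace (double-exponential) errors.

    For fixed lam the Laplace MLEs are the median (location) and the mean
    absolute deviation about the median (scale); the Jacobian of the power
    transformation contributes (lam - 1) * sum(log y).
    """
    z = boxcox(y, lam)
    b = np.mean(np.abs(z - np.median(z)))     # Laplace scale MLE
    n = len(y)
    return -n * np.log(2.0 * b) - n + (lam - 1.0) * np.sum(np.log(y))

def estimate_lambda(y, grid=np.linspace(-2.0, 2.0, 81)):
    """Maximize the profile log-likelihood over a grid of lam values."""
    ll = [laplace_profile_loglik(y, lam) for lam in grid]
    return grid[int(np.argmax(ll))]

rng = np.random.default_rng(0)
# Data that are Laplace on the log scale, so the true lam is 0.
y = np.exp(rng.laplace(loc=2.0, scale=0.5, size=500))
lam_hat = estimate_lambda(y)
```

The same profile can be inspected graphically, which is in the spirit of the diagnostic and graphical procedures the abstract describes.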
Nonanticipating estimation applied to sequential analysis and changepoint detection
Suppose a process yields independent observations whose distributions belong
to a family parameterized by \theta\in\Theta. When the process is in control,
the observations are i.i.d. with a known parameter value \theta_0. When the
process is out of control, the parameter changes. We apply an idea of Robbins
and Siegmund [Proc. Sixth Berkeley Symp. Math. Statist. Probab. 4 (1972) 37-41]
to construct a class of sequential tests and detection schemes whereby the
unknown post-change parameters are estimated. This approach is especially
useful in situations where the parametric space is intricate and mixture-type
rules are operationally or conceptually difficult to formulate. We exemplify
our approach by applying it to the problem of detecting a change in the shape
parameter of a Gamma distribution, in both a univariate and a multivariate
setting.
Comment: Published at http://dx.doi.org/10.1214/009053605000000183 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
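A minimal sketch of the nonanticipating idea in the Gamma-shape setting, under simplifying assumptions that are mine, not the paper's: the scale is fixed at 1, the post-change shape is estimated by a simple method-of-moments plug-in, and what is shown is a plain power-one sequential test rather than the paper's detection schemes.

```python
import numpy as np
from math import lgamma

def nonanticipating_test(x, theta0, threshold):
    """Sequential test of H0: theta = theta0 for the shape of a
    Gamma(theta, scale=1) distribution.  Following the Robbins--Siegmund
    idea, the unknown alternative is replaced at step i by an estimate
    built only from the first i - 1 observations, so each log-likelihood
    ratio increment is nonanticipating.
    """
    w = 0.0
    theta_hat = theta0                 # no data yet: plug in the null value
    for i, xi in enumerate(x, start=1):
        # log f_{theta_hat}(xi) - log f_{theta0}(xi), scale fixed at 1
        w += (theta_hat - theta0) * np.log(xi) - (lgamma(theta_hat) - lgamma(theta0))
        if w >= threshold:
            return i, w                # stop and reject H0
        # method-of-moments update (E[X] = theta when scale = 1),
        # floored away from zero to keep the density well defined
        theta_hat = max(float(np.mean(x[:i])), 1e-3)
    return None, w                     # no rejection within the sample

rng = np.random.default_rng(7)
x = rng.gamma(shape=3.0, scale=1.0, size=200)   # out of control from the start
stop, w = nonanticipating_test(x, theta0=1.0, threshold=np.log(1 / 0.01))
```

Because each increment uses only past observations, exp(w) is a nonnegative martingale with unit mean under H0, so stopping at threshold log(1/alpha) bounds the false-rejection probability by alpha.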
A non-Gaussian continuous state space model for asset degradation
The degradation model plays an essential role in asset life prediction and condition-based maintenance. Various degradation models have been proposed. Within these models, the state space model has the ability to combine degradation data and failure event data. The state space model is also an effective approach to dealing with multiple-observation and missing-data issues. Using the state space degradation model, the deterioration process of assets is represented by a system state process which can be revealed by a sequence of observations. Current research largely assumes that the underlying system development process is discrete in time or state. Although some models have been developed to consider continuous time and space, these state space models are based on the Wiener process under a Gaussian assumption. This paper proposes a Gamma-based state space degradation model in order to remove the Gaussian assumption. Both condition monitoring observations and failure events are considered in the model so as to improve the accuracy of asset life prediction. A simulation study is carried out to illustrate the application procedure of the proposed model.
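One common non-Gaussian choice consistent with this motivation is a gamma process, whose increments are gamma distributed and whose paths are monotone non-decreasing. The sketch below only simulates such degradation paths to a failure threshold; the paper's model additionally fuses condition-monitoring observations and failure events through a state space formulation, which is not reproduced here, and all parameter values are illustrative.

```python
import numpy as np

def simulate_gamma_degradation(alpha, beta, dt, horizon, threshold, rng):
    """Simulate one gamma-process degradation path to failure.

    Increments over each dt are Gamma(shape=alpha * dt, scale=beta), so the
    path is non-decreasing and non-Gaussian; 'failure' is the first time the
    cumulative degradation crosses the threshold.
    """
    t, x = 0.0, 0.0
    while t < horizon:
        x += rng.gamma(shape=alpha * dt, scale=beta)
        t += dt
        if x >= threshold:
            return t                  # failure time
    return np.inf                     # survived the horizon

rng = np.random.default_rng(42)
alpha, beta = 2.0, 0.5                # mean degradation rate alpha * beta = 1
lives = np.array([simulate_gamma_degradation(alpha, beta, 0.1, 200.0, 10.0, rng)
                  for _ in range(2000)])
mean_life = lives.mean()              # crude Monte Carlo life estimate
```

With threshold 10 and mean degradation rate 1 per unit time, the simulated mean life lands near 10, illustrating how such a forward model supports life prediction.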
Flexible Tweedie regression models for continuous data
Tweedie regression models provide a flexible family of distributions to deal
with non-negative highly right-skewed data as well as symmetric and heavy
tailed data and can handle continuous data with probability mass at zero. The
estimation and inference of Tweedie regression models based on the maximum
likelihood method are challenged by the presence of an infinite sum in the
probability function and non-trivial restrictions on the power parameter space.
In this paper, we propose two approaches for fitting Tweedie regression models,
namely, quasi- and pseudo-likelihood. We discuss the asymptotic properties of
the two approaches and perform simulation studies to compare our methods with
the maximum likelihood method. In particular, we show that the quasi-likelihood
method provides asymptotically efficient estimation for regression parameters.
The computational implementation of the alternative methods is faster and
easier than the orthodox maximum likelihood, relying on a simple Newton scoring
algorithm. Simulation studies showed that the quasi- and pseudo-likelihood
approaches yield estimates, standard errors and coverage rates similar to those
of the maximum likelihood method. Furthermore, the second-moment assumptions
required by the quasi- and pseudo-likelihood methods enable us to extend the
Tweedie regression models to the class of quasi-Tweedie regression models in
Wedderburn's style. Moreover, they allow us to eliminate the non-trivial
restriction on the power parameter space, and thus provide a flexible
regression model for continuous data. We provide an \texttt{R}
implementation and illustrate the application of Tweedie regression models
using three data sets.
Comment: 34 pages, 8 figures.
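The Newton scoring idea for quasi-likelihood estimation of the regression parameters can be sketched as follows; the log link, the fixed power, and the simulated gamma responses (power p = 2) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def newton_scoring(X, y, p, n_iter=25, tol=1e-10):
    """Quasi-likelihood fit of a Tweedie-type mean model with log link.

    Only second-moment assumptions are used: E[y] = mu = exp(X @ beta) and
    Var[y] proportional to mu**p.  Each step solves the quasi-score equation
    X' diag(mu**(1-p)) (y - mu) = 0 by Newton scoring with the expected
    sensitivity matrix X' diag(mu**(2-p)) X; the dispersion cancels.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ ((mu ** (1.0 - p)) * (y - mu))
        sens = X.T @ ((mu ** (2.0 - p))[:, None] * X)
        step = np.linalg.solve(sens, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 0.5])
mu = np.exp(X @ beta_true)
# gamma responses: mean mu, variance mu**2 / 4, i.e. Tweedie power p = 2
y = rng.gamma(shape=4.0, scale=mu / 4.0)
beta_hat = newton_scoring(X, y, p=2.0)
```

No probability function, and hence no infinite series, is evaluated anywhere in the loop, which is the computational advantage the abstract points to.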
Fast and scalable non-parametric Bayesian inference for Poisson point processes
We study the problem of non-parametric Bayesian estimation of the intensity
function of a Poisson point process. The observations are independent
realisations of a Poisson point process on a fixed interval. We propose two
related approaches. In both approaches we model the intensity function as
piecewise constant on bins forming a partition of the interval. In
the first approach the coefficients of the intensity function are assigned
independent gamma priors, leading to a closed-form posterior distribution. On
the theoretical side, we prove that as the sample size grows the posterior
asymptotically concentrates around the "true", data-generating intensity
function at an optimal rate for H\"older-regular intensity functions. In
the second approach we employ a gamma Markov chain prior on the
coefficients of the intensity function. The posterior distribution is no longer
available in closed form, but inference can be performed using a
straightforward version of the Gibbs sampler. Both approaches scale well with
sample size, but the second is much less sensitive to the choice of the number
of bins. Practical performance of our methods is first demonstrated via
synthetic-data examples. We compare our second method with other existing
approaches on the UK coal mining disasters data. Furthermore, we apply it to
the US mass shootings data and Donald Trump's Twitter data.
Comment: 45 pages, 22 figures.
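The closed-form posterior in the first approach comes from gamma-Poisson conjugacy, which a short sketch can make concrete. Equal-width bins, the homogeneous example, and all parameter values are illustrative choices, not the paper's setup.

```python
import numpy as np

def intensity_posterior(events, T, n_bins, n_obs, a=1.0, b=1.0):
    """Conjugate posterior for a piecewise-constant Poisson intensity.

    The interval [0, T] is split into n_bins equal bins.  With independent
    Gamma(a, rate=b) priors on the bin heights and n_obs i.i.d. realisations
    of the process pooled into 'events', the count in bin k is
    Poisson(lambda_k * n_obs * T / n_bins), so the posterior of lambda_k is
    Gamma(a + count_k, rate=b + n_obs * T / n_bins) -- closed form, bin by bin.
    """
    counts, _ = np.histogram(events, bins=n_bins, range=(0.0, T))
    width = T / n_bins
    return a + counts, b + n_obs * width    # posterior (shape, rate)

rng = np.random.default_rng(3)
T, n_obs, lam_true = 1.0, 200, 5.0
n_events = rng.poisson(lam_true * n_obs * T)    # homogeneous example
events = rng.uniform(0.0, T, size=n_events)     # pooled event times
shape, rate = intensity_posterior(events, T, n_bins=10, n_obs=n_obs)
post_mean = shape / rate
```

The whole computation is a histogram plus two additions, which is why this approach scales so well with sample size.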
Bayesian spectral modeling for multiple time series
We develop a novel Bayesian modeling approach to spectral density estimation for multiple time series. The log-periodogram distribution for each series is modeled as a mixture of Gaussian distributions with frequency-dependent weights and mean functions. The implied model for the log-spectral density is a mixture of linear mean functions with frequency-dependent weights. The mixture weights are built through successive differences of a logit-normal distribution function with frequency-dependent parameters. Building from the construction for a single spectral density, we develop a hierarchical extension for multiple time series. Specifically, we set the mean functions to be common to all spectral densities and make the weights specific to the time series through the parameters of the logit-normal distribution. In addition to accommodating flexible spectral density shapes, a practically important feature of the proposed formulation is that it allows for ready posterior simulation through a Gibbs sampler with closed-form full conditional distributions for all model parameters. The modeling approach is illustrated with simulated datasets, and used for spectral analysis of multichannel electroencephalographic recordings (EEGs), which provide a key motivating application for the proposed methodology.
General Semiparametric Shared Frailty Model Estimation and Simulation with frailtySurv
The R package frailtySurv for simulating and fitting semi-parametric shared
frailty models is introduced. Package frailtySurv implements semi-parametric
consistent estimators for a variety of frailty distributions, including gamma,
log-normal, inverse Gaussian and power variance function, and provides
consistent estimators of the standard errors of the parameter estimates. The
parameter estimators are asymptotically normally distributed, and therefore
statistical inference based on the results of this package, such as hypothesis
testing and confidence intervals, can be performed using the normal
distribution. Extensive simulations demonstrate the flexibility and correct
implementation of the estimator. Two case studies performed with publicly
available datasets demonstrate the applicability of the package. In the Diabetic
Retinopathy Study, the onset of blindness is clustered by patient, and in a
large hard-drive failure dataset, failure times are thought to be clustered by
hard-drive manufacturer and model.