
    Transformations in regression, estimation, testing and modelling

    Transformation is a powerful tool for model building. In regression the response variable is transformed in order to achieve the usual assumptions of normality, constant variance and additivity of effects. Here the normality assumption is replaced by the Laplace distributional assumption, appropriate when more large errors occur than would be expected if the errors were normally distributed. The parametric model is enlarged to include a transformation parameter, and a likelihood procedure is adopted for estimating this parameter simultaneously with other parameters of interest. Diagnostic methods are described for assessing the influence of individual observations on the choice of transformation. Examples are presented. In distribution methodology the independent responses are transformed in order that a distributional assumption is satisfied for the transformed data. Here the interest is in the family of distributions which are not dependent on an unknown shape parameter. The gamma distribution (known order), with the exponential distribution as a special case, is a member of this family. An information number approach is proposed for transforming a known distribution to the gamma distribution (known order). The approach provides an insight into the large-sample behaviour of the likelihood procedure considered by Draper and Guttman (1968) for investigating transformations of data which allow the transformed observations to follow a gamma distribution. The information number approach is illustrated for three examples, and the improvement towards the gamma distribution introduced by transformation is measured numerically and graphically. A graphical procedure is proposed for the general case of investigating transformations of data which allow the transformed observations to follow a distribution dependent on unknown threshold and scale parameters.
The procedure is extended to include model testing and estimation for any distribution which, with the aid of a power transformation, can be put in the simple form of a distribution that is not dependent on an unknown shape parameter. The procedure is based on a ratio, R(y), which is constructed from the power transformation. Also described is a ratio-based technique for estimating the threshold parameter in important parametric models, including the three-parameter Weibull and lognormal distributions. Ratio estimation for the Weibull distribution is assessed and compared with the modified maximum likelihood estimation of Cohen and Whitten (1982) in terms of bias and root mean squared error, by means of a simulation study. The methods are illustrated with several examples and extend naturally to singly Type 1 and Type 2 censored data.
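    The general idea of estimating a power-transformation parameter by profile likelihood can be sketched briefly. The following is a minimal illustration using the classical Box-Cox transform under a normal working model; it is not the paper's Laplace-based procedure or its ratio R(y), and all function names are hypothetical.

```python
import numpy as np

def power_transform(y, lam):
    """Box-Cox power transform; lam = 0 gives the log transform."""
    y = np.asarray(y, dtype=float)
    if lam == 0:
        return np.log(y)
    return (y**lam - 1.0) / lam

def profile_loglik(y, lam):
    """Profile log-likelihood of lam under a normal working model for
    the transformed data (up to an additive constant)."""
    z = power_transform(y, lam)
    n = len(z)
    # -n/2 * log(sigma_hat^2) plus the Jacobian of the transformation
    return -0.5 * n * np.log(z.var()) + (lam - 1.0) * np.log(y).sum()

rng = np.random.default_rng(0)
y = rng.lognormal(mean=1.0, sigma=0.5, size=500)  # log scale is "right" here
lams = np.linspace(-1.0, 1.0, 81)
lam_hat = lams[np.argmax([profile_loglik(y, l) for l in lams])]
# lam_hat should sit near 0, pointing to the log transform
```

    For lognormal data the profile likelihood peaks near lam = 0, recovering the log transform; diagnostics for influential observations would then examine how lam_hat moves when single points are deleted.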

    Nonanticipating estimation applied to sequential analysis and changepoint detection

    Suppose a process yields independent observations whose distributions belong to a family parameterized by \theta\in\Theta. When the process is in control, the observations are i.i.d. with a known parameter value \theta_0. When the process is out of control, the parameter changes. We apply an idea of Robbins and Siegmund [Proc. Sixth Berkeley Symp. Math. Statist. Probab. 4 (1972) 37-41] to construct a class of sequential tests and detection schemes whereby the unknown post-change parameters are estimated. This approach is especially useful in situations where the parametric space is intricate and mixture-type rules are operationally or conceptually difficult to formulate. We exemplify our approach by applying it to the problem of detecting a change in the shape parameter of a Gamma distribution, in both a univariate and a multivariate setting. (Published at http://dx.doi.org/10.1214/009053605000000183 in the Annals of Statistics, http://www.imstat.org/aos/, by the Institute of Mathematical Statistics, http://www.imstat.org.)
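    The core idea, treating the post-change parameter as unknown and estimating it rather than assuming it, can be illustrated with a textbook generalized-likelihood-ratio (GLR) scheme for a shift in a normal mean. This is a generic sketch under assumed parameters, not the Robbins-Siegmund construction or the Gamma shape-parameter setting of the paper.

```python
import numpy as np

def glr_alarm(x, mu0=0.0, sigma=1.0, threshold=10.0):
    """First alarm time of a GLR scheme for a change in the mean of
    i.i.d. N(mu0, sigma^2) data; the post-change mean is estimated
    (maximized out) instead of being assumed known."""
    s = np.concatenate([[0.0], np.cumsum(np.asarray(x, float) - mu0)])
    for t in range(1, len(x) + 1):
        k = np.arange(t)  # candidate change times before t
        # log-likelihood ratio with the post-change mean replaced by
        # its MLE (s[t] - s[k]) / (t - k), maximized over k
        stat = (s[t] - s[k]) ** 2 / (2.0 * sigma**2 * (t - k))
        if stat.max() >= threshold:
            return t  # 1-indexed alarm time
    return None

rng = np.random.default_rng(1)
# in control for 100 steps, then the mean jumps from 0 to 2
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(2, 1, 50)])
alarm = glr_alarm(x)
```

    The scheme raises its alarm shortly after observation 100; the threshold trades false-alarm rate against detection delay.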

    A non-Gaussian continuous state space model for asset degradation

    The degradation model plays an essential role in asset life prediction and condition-based maintenance. Various degradation models have been proposed. Within these models, the state space model has the ability to combine degradation data and failure event data. The state space model is also an effective approach to dealing with multiple observations and missing-data issues. Using the state space degradation model, the deterioration process of assets is represented by a system state process which can be revealed by a sequence of observations. Current research largely assumes that the underlying system development process is discrete in time or states. Although some models have been developed to consider continuous time and space, these state space models are based on the Wiener process with the Gaussian assumption. This paper proposes a Gamma-based state space degradation model in order to remove the Gaussian assumption. Both condition monitoring observations and failure events are considered in the model so as to improve the accuracy of asset life prediction. A simulation study is carried out to illustrate the application procedure of the proposed model.

    Flexible Tweedie regression models for continuous data

    Tweedie regression models provide a flexible family of distributions to deal with non-negative highly right-skewed data as well as symmetric and heavy tailed data, and can handle continuous data with probability mass at zero. The estimation and inference of Tweedie regression models based on the maximum likelihood method are challenged by the presence of an infinite sum in the probability function and non-trivial restrictions on the power parameter space. In this paper, we propose two approaches for fitting Tweedie regression models, namely, quasi- and pseudo-likelihood. We discuss the asymptotic properties of the two approaches and perform simulation studies to compare our methods with the maximum likelihood method. In particular, we show that the quasi-likelihood method provides asymptotically efficient estimation for regression parameters. The computational implementation of the alternative methods is faster and easier than the orthodox maximum likelihood, relying on a simple Newton scoring algorithm. Simulation studies showed that the quasi- and pseudo-likelihood approaches present estimates, standard errors and coverage rates similar to the maximum likelihood method. Furthermore, the second-moment assumptions required by the quasi- and pseudo-likelihood methods enable us to extend the Tweedie regression models to the class of quasi-Tweedie regression models in Wedderburn's style. Moreover, it allows us to eliminate the non-trivial restriction on the power parameter space, and thus provides a flexible regression model to deal with continuous data. We provide \texttt{R} implementation and illustrate the application of Tweedie regression models using three data sets. (34 pages, 8 figures.)
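    The appeal of the second-moment approach is that only the Tweedie variance function V(mu) = mu^p enters the estimating equations, so the intractable probability function is never evaluated. A minimal iterative reweighted least squares sketch for a log-link regression (hypothetical names, in Python rather than the paper's R implementation, and not its exact Newton scoring algorithm) is:

```python
import numpy as np

def tweedie_quasi_fit(X, y, p=1.5, n_iter=50):
    """Quasi-likelihood (iterative reweighted least squares) fit of a
    log-link regression using only the Tweedie second-moment assumption
    Var(y) proportional to mu**p; no probability function is evaluated."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        w = mu ** (2.0 - p)          # (dmu/deta)^2 / V(mu) under log link
        z = eta + (y - mu) / mu      # working response
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    return beta

rng = np.random.default_rng(3)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
mu = np.exp(X @ np.array([0.5, 0.8]))
y = rng.gamma(shape=2.0, scale=mu / 2.0)  # gamma is Tweedie with p = 2
beta_hat = tweedie_quasi_fit(X, y, p=2.0)
```

    Because only the mean-variance relationship is used, the same code runs for any power p, which is what lets the quasi-Tweedie extension relax the restrictions on the power parameter space.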

    Fast and scalable non-parametric Bayesian inference for Poisson point processes

    We study the problem of non-parametric Bayesian estimation of the intensity function of a Poisson point process. The observations are n independent realisations of a Poisson point process on the interval [0,T]. We propose two related approaches. In both approaches we model the intensity function as piecewise constant on N bins forming a partition of the interval [0,T]. In the first approach the coefficients of the intensity function are assigned independent gamma priors, leading to a closed form posterior distribution. On the theoretical side, we prove that as n\rightarrow\infty, the posterior asymptotically concentrates around the "true", data-generating intensity function at an optimal rate for h-H\"older regular intensity functions (0 < h\leq 1). In the second approach we employ a gamma Markov chain prior on the coefficients of the intensity function. The posterior distribution is no longer available in closed form, but inference can be performed using a straightforward version of the Gibbs sampler. Both approaches scale well with sample size, but the second is much less sensitive to the choice of N. Practical performance of our methods is first demonstrated via synthetic data examples. We compare our second method with other existing approaches on the UK coal mining disasters data. Furthermore, we apply it to the US mass shootings data and Donald Trump's Twitter data. (45 pages, 22 figures.)
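    The closed-form posterior in the first approach comes from Poisson-gamma conjugacy: with a piecewise-constant intensity on equal bins and a Gamma(a0, b0) prior on each bin height, the posterior for a bin is Gamma(a0 + count, b0 + n * T/N). A sketch under these assumptions (names hypothetical):

```python
import numpy as np

def gamma_posterior_intensity(events, T, N, n_obs, a0=1.0, b0=1.0):
    """Posterior for a piecewise-constant intensity on [0, T] with N
    equal bins and independent Gamma(a0, b0) priors on bin heights.
    With n_obs pooled realisations, each bin's posterior is
    Gamma(a0 + count, b0 + n_obs * T / N) by Poisson-gamma conjugacy."""
    edges = np.linspace(0.0, T, N + 1)
    counts, _ = np.histogram(events, bins=edges)
    a_post = a0 + counts
    b_post = b0 + n_obs * T / N
    return a_post, b_post, a_post / b_post  # shapes, rate, posterior means

rng = np.random.default_rng(4)
T, lam, n_obs = 10.0, 3.0, 50
# pool the events of n_obs realisations of a homogeneous rate-3 process
events = rng.uniform(0.0, T, rng.poisson(lam * T * n_obs))
a, b, post_mean = gamma_posterior_intensity(events, T, N=20, n_obs=n_obs)
# post_mean should hover around the true constant intensity 3
```

    The cost is one histogram pass over the events, which is why the method scales well with sample size; the gamma Markov chain prior of the second approach replaces the independent priors to smooth across neighbouring bins.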

    Bayesian spectral modeling for multiple time series

    We develop a novel Bayesian modeling approach to spectral density estimation for multiple time series. The log-periodogram distribution for each series is modeled as a mixture of Gaussian distributions with frequency-dependent weights and mean functions. The implied model for the log-spectral density is a mixture of linear mean functions with frequency-dependent weights. The mixture weights are built through successive differences of a logit-normal distribution function with frequency-dependent parameters. Building from the construction for a single spectral density, we develop a hierarchical extension for multiple time series. Specifically, we set the mean functions to be common to all spectral densities and make the weights specific to the time series through the parameters of the logit-normal distribution. In addition to accommodating flexible spectral density shapes, a practically important feature of the proposed formulation is that it allows for ready posterior simulation through a Gibbs sampler with closed form full conditional distributions for all model parameters. The modeling approach is illustrated with simulated datasets, and used for spectral analysis of multichannel electroencephalographic recordings (EEGs), which provides a key motivating application for the proposed methodology.
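    The raw ingredient such models smooth is the log-periodogram, whose ordinates for a stationary series are approximately the log-spectral density plus i.i.d. log-Exp(1) noise (mean equal to minus Euler's constant). A minimal computation of these ordinates, not the paper's mixture model, is:

```python
import numpy as np

def log_periodogram(x):
    """Log-periodogram ordinates at the Fourier frequencies in (0, pi),
    the raw quantity that log-spectral models smooth. For white noise
    with variance s2, E[log I(omega)] = log(s2) - Euler's constant."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dft = np.fft.rfft(x - x.mean())
    I = np.abs(dft) ** 2 / n                 # periodogram ordinates
    keep = slice(1, (n - 1) // 2 + 1)        # drop frequency 0 and Nyquist
    omega = 2.0 * np.pi * np.fft.rfftfreq(n)[keep]
    return omega, np.log(I[keep])

rng = np.random.default_rng(5)
x = rng.normal(0.0, 2.0, size=4096)  # white noise, flat spectrum
omega, logI = log_periodogram(x)
```

    For multiple channels, one log-periodogram per series is computed and the hierarchical model shares mean functions across them while letting the weights differ.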

    General Semiparametric Shared Frailty Model Estimation and Simulation with frailtySurv

    The R package frailtySurv for simulating and fitting semi-parametric shared frailty models is introduced. Package frailtySurv implements semi-parametric consistent estimators for a variety of frailty distributions, including gamma, log-normal, inverse Gaussian and power variance function, and provides consistent estimators of the standard errors of the parameter estimates. The parameter estimators are asymptotically normally distributed, so statistical inference based on the results of this package, such as hypothesis testing and confidence intervals, can be performed using the normal distribution. Extensive simulations demonstrate the flexibility and correct implementation of the estimators. Two case studies performed with publicly available datasets demonstrate the applicability of the package. In the Diabetic Retinopathy Study, the onset of blindness is clustered by patient, and in a large hard drive failure dataset, failure times are thought to be clustered by the hard drive manufacturer and model.
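    The data-generating model behind such packages is simple to simulate: each cluster draws a latent frailty with mean 1 and variance theta, shared as a multiplicative hazard by its members. A sketch in Python (frailtySurv itself is an R package; this only illustrates the shared gamma frailty model, with exponential baseline hazard for simplicity):

```python
import numpy as np

def simulate_shared_gamma_frailty(n_clusters, size, theta, rate, rng):
    """Clustered exponential lifetimes under a shared gamma frailty:
    each cluster draws w ~ Gamma(1/theta, scale=theta) (mean 1,
    variance theta) and its members fail at hazard w * rate."""
    w = rng.gamma(1.0 / theta, theta, size=n_clusters)
    hazards = np.repeat(w, size) * rate
    times = rng.exponential(1.0 / hazards)
    cluster = np.repeat(np.arange(n_clusters), size)
    return cluster, times

rng = np.random.default_rng(6)
cluster, times = simulate_shared_gamma_frailty(2000, 2, 0.5, 1.0, rng)
# within-cluster dependence: Kendall's tau = theta / (theta + 2) = 0.2
```

    Members of the same cluster share one frailty draw and so have positively dependent lifetimes, exactly the clustering seen in the Diabetic Retinopathy and hard drive examples.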