Relaxation Penalties and Priors for Plausible Modeling of Nonidentified Bias Sources
In designed experiments and surveys, known laws or design features provide
checks on the most relevant aspects of a model and identify the target
parameters. In contrast, in most observational studies in the health and social
sciences, the primary study data do not identify and may not even bound target
parameters. Discrepancies between target and analogous identified parameters
(biases) are then of paramount concern, which forces a major shift in modeling
strategies. Conventional approaches are based on conditional testing of
equality constraints, which correspond to implausible point-mass priors. When
these constraints are not identified by available data, however, no such
testing is possible. In response, implausible constraints can be relaxed into
penalty functions derived from plausible prior distributions. The resulting
models can be fit within familiar full or partial likelihood frameworks. The
absence of identification renders all analyses part of a sensitivity analysis.
In this view, results from single models are merely examples of what might be
plausibly inferred. Nonetheless, just one plausible inference may suffice to
demonstrate inherent limitations of the data. Points are illustrated with
misclassified data from a study of sudden infant death syndrome. Extensions to
confounding, selection bias and more complex data structures are outlined.
Comment: Published at http://dx.doi.org/10.1214/09-STS291 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
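To make the relaxation step concrete, here is a minimal sketch of the kind of penalized log-likelihood the abstract describes; the notation (a target parameter beta and a nonidentified bias parameter eta with prior pi) is illustrative rather than taken from the paper:
    \ell_p(\beta, \eta) \;=\; \ell(\beta, \eta \mid \text{data}) \;+\; \log \pi(\eta)
Maximizing \ell_p replaces the conventional point-mass constraint \eta = 0 (e.g., "no misclassification") with the penalty -\log \pi(\eta), so the fit can be carried out in an ordinary full or partial likelihood framework while the conclusions remain conditional on the plausibility of \pi.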
Assessing the disclosure protection provided by misclassification for survey microdata
Government statistical agencies often apply statistical disclosure limitation techniques to survey microdata to protect confidentiality. There is a need for ways to assess the protection provided. This paper develops some simple methods for disclosure limitation techniques which perturb the values of categorical identifying variables. The methods are applied in numerical experiments based upon census data from the United Kingdom which are subject to two perturbation techniques: data swapping and the post randomisation method. Some simplifying approximations to the measure of risk are found to work well in capturing the impacts of these techniques. These approximations provide simple extensions of existing risk assessment methods based upon Poisson log-linear models. A numerical experiment is also undertaken to assess the impact of multivariate misclassification with an increasing number of identifying variables. The methods developed in this paper may also be used to obtain more realistic assessments of risk which take account of the kinds of measurement and other non-sampling errors commonly arising in surveys.
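For readers unfamiliar with the second perturbation technique, here is a minimal R sketch of the post randomisation method (PRAM); the function name pram and the example transition matrix are hypothetical and not taken from the paper:
    # PRAM sketch: each record's category c is replaced by a draw from row c of a
    # transition matrix P, so the original value is retained with probability P[c, c].
    pram <- function(x, P) {
      levs <- sort(unique(x))
      stopifnot(nrow(P) == length(levs), all(abs(rowSums(P) - 1) < 1e-8))
      idx <- match(x, levs)
      sapply(idx, function(i) sample(levs, 1, prob = P[i, ]))
    }

    # Example: a 3-category identifying variable retained with probability 0.9
    set.seed(1)
    x <- sample(1:3, size = 100, replace = TRUE)
    P <- matrix(0.05, nrow = 3, ncol = 3); diag(P) <- 0.9
    table(original = x, perturbed = pram(x, P))
The risk-assessment question in the paper is then how much such a misclassification matrix reduces the probability that an apparently unique record can be correctly matched to a population unit.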
A New Method for Protecting Interrelated Time Series with Bayesian Prior Distributions and Synthetic Data
Organizations disseminate statistical summaries of administrative data via the Web for unrestricted public use. They balance the trade-off between confidentiality protection and inference quality. Recent developments in disclosure avoidance techniques include the incorporation of synthetic data, which capture the essential features of underlying data by releasing altered data generated from a posterior predictive distribution. The United States Census Bureau collects millions of interrelated time series micro-data that are hierarchical and contain many zeros and suppressions. Rule-based disclosure avoidance techniques often require the suppression of count data for small magnitudes and the modification of data based on a small number of entities. Motivated by this problem, we use zero-inflated extensions of Bayesian Generalized Linear Mixed Models (BGLMM) with privacy-preserving prior distributions to develop methods for protecting and releasing synthetic data from time series about thousands of small groups of entities without suppression based on the magnitudes or number of entities. We find that as the prior distributions of the variance components in the BGLMM become more precise toward zero, confidentiality protection increases and inference quality deteriorates. We evaluate our methodology using a strict privacy measure, empirical differential privacy, and a newly defined risk measure, Probability of Range Identification (PoRI), which directly measures attribute disclosure risk. We illustrate our results with the U.S. Census Bureau's Quarterly Workforce Indicators.
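A hedged sketch of the zero-inflated count structure such a model might take; the exact linear predictor, link and priors used in the paper may differ:
    y_{it} \sim \pi_{it}\,\delta_{\{0\}} + (1 - \pi_{it})\,\mathrm{Poisson}(\mu_{it}),
    \qquad \log \mu_{it} = x_{it}^{\top}\beta + u_i, \qquad u_i \sim N(0, \sigma_u^2)
Synthetic values are draws from the posterior predictive distribution of y_{it}; placing a prior on the variance component \sigma_u^2 that is concentrated near zero shrinks the group-level effects u_i, which is the mechanism behind the trade-off reported in the abstract (more shrinkage, more protection, weaker inference).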
Using prior information to identify boundaries in disease risk maps
Disease maps display the spatial pattern in disease risk, so that high-risk
clusters can be identified. The spatial structure in the risk map is typically
represented by a set of random effects, which are modelled with a conditional
autoregressive (CAR) prior. Such priors include a global spatial smoothing
parameter, whereas real risk surfaces are likely to include areas of smooth
evolution as well as discontinuities, the latter of which are known as risk
boundaries. Therefore, this paper proposes an extension to the class of CAR
priors, which can identify both areas of localised spatial smoothness and risk
boundaries. However, allowing for this localised smoothing requires large
numbers of correlation parameters to be estimated, which are unlikely to be
well identified from the data. To address this problem we propose eliciting an
informative prior about the locations of such boundaries, which can be combined
with the information from the data to provide more precise posterior inference.
We test our approach by simulation, before applying it to a study of the risk
of emergency admission to hospital in Greater Glasgow, Scotland.
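For orientation, the intrinsic CAR prior that the proposed class extends is usually written as follows (generic notation, not the paper's):
    \phi_i \mid \phi_{-i} \sim N\!\left( \frac{\sum_j w_{ij}\,\phi_j}{\sum_j w_{ij}},\; \frac{\tau^2}{\sum_j w_{ij}} \right)
with w_{ij} = 1 if areas i and j share a border and 0 otherwise. One way to obtain localised smoothing, consistent with the abstract's description, is to treat the w_{ij} for neighbouring pairs as unknown correlation parameters, so that an estimated w_{ij} = 0 marks a risk boundary; the elicited informative prior then concerns which of these adjacency weights are likely to be zero.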
A comparative study of parametric mortality projection models
The relative merits of different parametric models for making life expectancy and annuity value predictions at both pensioner and adult ages are investigated. This study builds on current published research and considers recent model enhancements and the extent to which these enhancements address the deficiencies that have been identified in some of the models. The England & Wales male mortality experience is used to conduct detailed comparisons at pensioner ages, having first established a common basis for comparison across all models. The model comparison is then extended to include the England & Wales female experience and both the male and female USA mortality experiences over a wider age range, encompassing also the working ages.
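The two quantities being predicted follow directly from projected one-year survival probabilities; a standard actuarial formulation (not specific to any one of the models compared), with v the discount factor, is:
    {}_{t}p_x = \prod_{s=0}^{t-1} p_{x+s}, \qquad
    \mathring{e}_x \approx \tfrac{1}{2} + \sum_{t \ge 1} {}_{t}p_x, \qquad
    a_x = \sum_{t \ge 1} v^{t}\, {}_{t}p_x
Here each p_{x+s} is taken along the relevant projected cohort (e.g., p_{x,t} \approx \exp(-m_{x,t}) from projected central mortality rates); the models under comparison differ in how the projection of m_{x,t} feeding these formulas is produced.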
Sharp sensitivity bounds for mediation under unmeasured mediator-outcome confounding
It is often of interest to decompose a total effect of an exposure into the
component that acts on the outcome through some mediator and the component that
acts independently through other pathways. Said another way, we are interested
in the direct and indirect effects of the exposure on the outcome. Even if the
exposure is randomly assigned, it is often infeasible to randomize the
mediator, leaving the mediator-outcome confounding not fully controlled. We
develop a sensitivity analysis technique that can bound the direct and indirect
effects without parametric assumptions about the unmeasured mediator-outcome
confounding.
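In potential-outcome notation the decomposition in question is the usual one (the paper's exact definitions and the form of its bounds are not reproduced here):
    E[Y(1) - Y(0)]
      = \underbrace{E[\,Y(1, M(1)) - Y(1, M(0))\,]}_{\text{indirect effect}}
      + \underbrace{E[\,Y(1, M(0)) - Y(0, M(0))\,]}_{\text{direct effect}}
Identifying the two components separately requires, beyond randomisation of the exposure, that there be no unmeasured mediator-outcome confounding; the sensitivity bounds in the abstract quantify how far the components can move when that assumption is relaxed.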
Structural Nested Models and G-estimation: The Partially Realized Promise
Structural nested models (SNMs) and the associated method of G-estimation
were first proposed by James Robins over two decades ago as approaches to
modeling and estimating the joint effects of a sequence of treatments or
exposures. The models and estimation methods have since been extended to
deal with a broader range of problems, and have considerable advantages
over the other methods developed for estimating such joint effects. Despite
these advantages, the application of these methods in applied research has been
relatively infrequent; we view this as unfortunate. To remedy this, we provide
an overview of the models and estimation methods as developed, primarily by
Robins, over the years. We provide insight into their advantages over other
methods, and consider some possible reasons for failure of the methods to be
more broadly adopted, as well as possible remedies. Finally, we consider
several extensions of the standard models and estimation methods.
Comment: Published at http://dx.doi.org/10.1214/14-STS493 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
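To fix ideas, a single-time-point structural nested mean model and its g-estimation step can be sketched as follows (generic notation with a linear blip and parameter psi; not the paper's own development):
    E[\,Y(a) - Y(0) \mid L, A = a\,] = \gamma(L, a; \psi), \qquad
    \text{e.g. } \gamma(L, a; \psi) = \psi\, a
G-estimation forms H(\psi) = Y - \gamma(L, A; \psi), the outcome with the modelled treatment effect removed, and solves for the \psi at which H(\psi) is unassociated with treatment given covariates, for instance \sum_i \{A_i - E[A_i \mid L_i]\}\, H_i(\psi) = 0; the sequential-treatment versions discussed in the overview apply the same idea at each time point.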
llc: a collection of R functions for fitting a class of Lee-Carter mortality models using iterative fitting algorithms
We implement a specialised iterative regression methodology in R for the analysis of age-period mortality data based on a class of generalised Lee-Carter (LC) type modelling structures. The LC-based modelling framework is viewed in the current literature as among the most efficient and transparent methods of modelling and projecting mortality improvements. Thus, we make use of the modelling approach discussed in Renshaw and Haberman (2006), which extends the basic LC model and uses a tailored iterative process to generate parameter estimates based on a Poisson likelihood. Furthermore, building on this methodology we develop and implement a stratified LC model for the measurement of the additive effect on the log scale of an explanatory factor (other than age and time). This modelling methodology is implemented in a publicly available collection of programming functions that facilitate both the preparation of mortality data and the fitting and analysis of the given log-linear modelling structures. Also, the package incorporates methods to produce forecasts of future mortality rates and to compute the corresponding future life expectancy.
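As a simplified illustration of the kind of iterative Poisson fitting the package implements, here is a sketch in R of univariate Newton updates of each parameter set in turn, in the spirit of Renshaw and Haberman (2006); this is not code from the llc package, and the function name is hypothetical:
    # D, E: age-by-year matrices of observed deaths and central exposures.
    # Model: D[x, t] ~ Poisson(E[x, t] * exp(a[x] + b[x] * k[t])).
    fit_lc_poisson <- function(D, E, tol = 1e-8, maxit = 10000) {
      nx <- nrow(D); nt <- ncol(D)
      a <- log(rowSums(D) / rowSums(E))                      # average log rate by age
      k <- log(colSums(D) / colSums(E)); k <- k - mean(k)    # centred period trend
      b <- rep(1 / nx, nx)
      dev_old <- Inf
      for (it in seq_len(maxit)) {
        Dhat <- E * exp(a + outer(b, k))
        a <- a + rowSums(D - Dhat) / rowSums(Dhat)           # Newton step for a_x
        Dhat <- E * exp(a + outer(b, k))
        k <- k + as.vector(t(D - Dhat) %*% b) / as.vector(t(Dhat) %*% (b^2))  # k_t
        Dhat <- E * exp(a + outer(b, k))
        b <- b + as.vector((D - Dhat) %*% k) / as.vector(Dhat %*% (k^2))      # b_x
        Dhat <- E * exp(a + outer(b, k))
        dev <- 2 * sum(ifelse(D > 0, D * log(D / Dhat), 0) - (D - Dhat))      # deviance
        if (abs(dev_old - dev) < tol) break
        dev_old <- dev
      }
      a <- a + b * mean(k); k <- k - mean(k)                 # constraint: sum(k) = 0
      k <- k * sum(b);      b <- b / sum(b)                  # constraint: sum(b) = 1
      list(a = a, b = b, k = k, deviance = dev, iterations = it)
    }
Forecasting then proceeds, as in the abstract, by projecting the estimated period index k_t forward and recomputing mortality rates and life expectancies from the projected values.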
- …