Relaxation Penalties and Priors for Plausible Modeling of Nonidentified Bias Sources
In designed experiments and surveys, known laws or design features provide
checks on the most relevant aspects of a model and identify the target
parameters. In contrast, in most observational studies in the health and social
sciences, the primary study data do not identify and may not even bound target
parameters. Discrepancies between target and analogous identified parameters
(biases) are then of paramount concern, which forces a major shift in modeling
strategies. Conventional approaches are based on conditional testing of
equality constraints, which correspond to implausible point-mass priors. When
these constraints are not identified by available data, however, no such
testing is possible. In response, implausible constraints can be relaxed into
penalty functions derived from plausible prior distributions. The resulting
models can be fit within familiar full or partial likelihood frameworks. The
absence of identification renders all analyses part of a sensitivity analysis.
In this view, results from single models are merely examples of what might be
plausibly inferred. Nonetheless, just one plausible inference may suffice to
demonstrate inherent limitations of the data. Points are illustrated with
misclassified data from a study of sudden infant death syndrome. Extensions to
confounding, selection bias and more complex data structures are outlined.
Comment: Published at http://dx.doi.org/10.1214/09-STS291 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)
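To make the relaxation idea concrete, here is a minimal Python sketch (not the paper's code; the misclassification setup, the normal priors, and all numeric values are assumptions for illustration): instead of fixing a nonidentified sensitivity/specificity pair by an equality constraint, each parameter is penalized toward a plausible prior, and the penalized log-likelihood is maximized with standard tools.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, logit
from scipy.stats import norm

# Observed (misclassified) exposure counts: x exposed out of n (invented data).
x, n = 320, 1000

# Assumed priors for sensitivity and specificity on the logit scale.
# The priors replace point-mass constraints on these nonidentified
# parameters with smooth relaxation penalties.
prior_mean_se, prior_mean_sp = logit(0.90), logit(0.95)
prior_sd = 0.5  # assumed prior standard deviation on the logit scale

def neg_penalized_loglik(theta):
    """Negative binomial log-likelihood plus prior-derived penalties."""
    logit_p, logit_se, logit_sp = theta
    p, se, sp = expit(logit_p), expit(logit_se), expit(logit_sp)
    # Apparent prevalence induced by true prevalence plus misclassification.
    p_star = se * p + (1.0 - sp) * (1.0 - p)
    loglik = x * np.log(p_star) + (n - x) * np.log(1.0 - p_star)
    # Relaxation penalties = log prior densities (normal priors, assumed).
    penalty = (norm.logpdf(logit_se, prior_mean_se, prior_sd)
               + norm.logpdf(logit_sp, prior_mean_sp, prior_sd))
    return -(loglik + penalty)

fit = minimize(neg_penalized_loglik, x0=[0.0, prior_mean_se, prior_mean_sp])
print("corrected prevalence estimate:", expit(fit.x[0]))
```

Because the data alone pin down only the apparent prevalence, the priors supply the two extra constraints needed for a unique maximizer; changing them and refitting is exactly the sensitivity analysis the abstract describes.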
Comment: The Need for Syncretism in Applied Statistics
Comment on "The Need for Syncretism in Applied Statistics" [arXiv:1012.1161]Comment: Published in at http://dx.doi.org/10.1214/10-STS308A the Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org
The causal foundations of applied probability and statistics
Statistical science (as opposed to mathematical statistics) involves far more
than probability theory, for it requires realistic causal models of data
generators - even for purely descriptive goals. Statistical decision theory
requires more causality: Rational decisions are actions taken to minimize costs
while maximizing benefits, and thus require explication of causes of loss and
gain. Competent statistical practice thus integrates logic, context, and
probability into scientific inference and decision using narratives filled with
causality. This reality was seen and accounted for intuitively by the founders
of modern statistics, but was not well recognized in the ensuing statistical
theory (which focused instead on the causally inert properties of probability
measures). Nonetheless, both statistical foundations and basic statistics can
and should be taught using formal causal models. The causal view of statistical
science fits within a broader information-processing framework which
illuminates and unifies frequentist, Bayesian, and related probability-based
foundations of statistics. Causality theory can thus be seen as a key component
connecting computation to contextual information, not extra-statistical but
instead essential for sound statistical training and applications.
Comment: 22 pages; in press for Dechter, R., Halpern, J., and Geffner, H., eds., Probabilistic and Causal Inference: The Works of Judea Pearl, ACM Books.
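The claim that basic statistics can be taught through formal causal models can be illustrated with a toy simulation (a sketch under invented assumptions; the structural equations, effect size, and sample size below are not from the paper): when random assignment is the known causal mechanism generating the data, the reference distribution of a test statistic is obtained by re-running that mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

# An illustrative formal causal model of the data generator:
# randomized treatment Z, outcome Y (structural equations invented).
def generate(n, effect):
    z = rng.binomial(1, 0.5, n)                  # physical randomization
    y = 0.3 * rng.normal(size=n) + effect * z    # outcome mechanism
    return z, y

z, y = generate(200, effect=0.2)
observed = y[z == 1].mean() - y[z == 0].mean()

# Because assignment is the known causal mechanism, the "no effect"
# reference distribution is computed by re-running that mechanism
# (a randomization test).
null = np.array([
    y[perm == 1].mean() - y[perm == 0].mean()
    for perm in (rng.permutation(z) for _ in range(5000))
])
p_value = np.mean(np.abs(null) >= abs(observed))
print("randomization p-value:", p_value)
```

The point of the sketch is that the statistic's justification comes from the causal story (physical randomization), not from the probability calculus alone.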
Divergence vs. Decision P-values: A Distinction Worth Making in Theory and Keeping in Practice
There are two distinct definitions of 'P-value' for evaluating a proposed
hypothesis or model for the process generating an observed dataset. The
original definition starts with a measure of the divergence of the dataset from
what was expected under the model, such as a sum of squares or a deviance
statistic. A P-value is then the ordinal location of the measure in a reference
distribution computed from the model and the data, and is treated as a
unit-scaled index of compatibility between the data and the model. In the other
definition, a P-value is a random variable on the unit interval whose
realizations can be compared to a cutoff alpha to generate a decision rule with
known error rates under the model and specific alternatives. It is commonly
assumed that realizations of such decision P-values always correspond to
divergence P-values. But this need not be so: Decision P-values can violate
intuitive single-sample coherence criteria where divergence P-values do not. It
is thus argued that divergence and decision P-values should be carefully
distinguished in teaching, and that divergence P-values are the relevant choice
when the analysis goal is to summarize evidence rather than implement a
decision rule.
Comment: 49 pages. Scandinavian Journal of Statistics 2023, issue 1, with discussion and rejoinder.
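The two definitions can be placed side by side in code (a sketch with invented data; the choice of a binomial deviance statistic and an exact two-sided tail statistic is illustrative, not taken from the paper):

```python
import numpy as np
from scipy.stats import chi2, binom

# Invented data and model: x successes in n trials, hypothesized p0.
x, n, p0 = 61, 100, 0.5

# Divergence P-value: the ordinal location of a divergence measure
# (here a binomial deviance) in its reference distribution under the
# model, used as a unit-scaled compatibility index.
p_hat = x / n
deviance = 2 * (x * np.log(p_hat / p0)
                + (n - x) * np.log((1 - p_hat) / (1 - p0)))
p_div = chi2.sf(deviance, df=1)
print("divergence P-value:", p_div)

# Decision P-value: a random variable on [0, 1] compared to a cutoff
# alpha to yield a rule with known error rates under the model (here an
# exact two-sided tail probability, used purely as a decision input).
k = np.arange(n + 1)
pmf = binom.pmf(k, n, p0)
p_dec = pmf[pmf <= binom.pmf(x, n, p0)].sum()
alpha = 0.05
print("decision:", "reject" if p_dec <= alpha else "retain")
```

The two numbers need not agree, which is the abstract's point: a realization used inside a decision rule is not automatically a compatibility measure.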
Connecting Simple and Precise P-values to Complex and Ambiguous Realities
Mathematics is a limited component of solutions to real-world problems, as it
expresses only what is expected to be true if all our assumptions are correct,
including implicit assumptions that are omnipresent and often incorrect.
Statistical methods are rife with implicit assumptions whose violation can be
life-threatening when results from them are used to set policy. Among them are
that there is human equipoise or unbiasedness in data generation, management,
analysis, and reporting. These assumptions correspond to levels of cooperation,
competence, neutrality, and integrity that are absent more often than we would
like to believe.
Given this harsh reality, we should ask what meaning, if any, we can assign
to the P-values, 'statistical significance' declarations, 'confidence'
intervals, and posterior probabilities that are used to decide what and how to
present (or spin) discussions of analyzed data. By themselves, P-values and CIs
do not test any hypothesis, nor do they measure the significance of results or
the confidence we should have in them. The sense otherwise is an ongoing
cultural error perpetuated by large segments of the statistical and research
community via misleading terminology.
So-called 'inferential' statistics can only become contextually interpretable
when derived explicitly from causal stories about the real data generator (such
as randomization), and can only become reliable when those stories are based on
valid and public documentation of the physical mechanisms that generated the
data. Absent these assurances, traditional interpretations of statistical
results become pernicious fictions that need to be replaced by far more
circumspect descriptions of data and model relations.
Comment: 25 pages. Body of text to appear as a rejoinder in the Scandinavian Journal of Statistics.
Epidemiologic measures and policy formulation: lessons from potential outcomes
This paper provides a critique of the common practice in the health-policy literature of focusing on hypothetical outcome removal at the expense of intervention analysis. The paper begins with an introduction to measures of causal effects within the potential-outcomes framework, focusing on underlying conceptual models, definitions and drawbacks of special relevance to policy formulation based on epidemiologic data. It is argued that, for policy purposes, one should analyze intervention effects within a multivariate-outcome framework to capture the impact of major sources of morbidity and mortality. This framework can clarify what is captured and missed by summary measures of population health, and it shows that the concept of a summary measure can and should be extended to multidimensional indices.
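A toy potential-outcomes calculation shows why the multivariate-outcome framing matters (all risks, the intervention, and the sample are invented for this sketch): an intervention can reduce one cause-specific risk while raising another, which a single summary index would hide.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Hypothetical potential outcomes for two causes of death under
# "no intervention" (a=0) and "intervention" (a=1); all risks invented.
y0 = rng.binomial(1, [0.06, 0.04], size=(n, 2))  # cause 1, cause 2
y1 = rng.binomial(1, [0.03, 0.05], size=(n, 2))  # intervention shifts both

# Per-cause causal risk differences: the intervention helps cause 1
# but harms cause 2, a trade-off a scalar summary would obscure.
rd = y1.mean(axis=0) - y0.mean(axis=0)
print("risk differences (cause 1, cause 2):", rd)

# Any scalar summary (e.g., total mortality) is a choice of weights
# over this vector, which is the case for multidimensional indices.
print("total-mortality summary:", rd.sum())
```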