Dealing with missing standard deviation and mean values in meta-analysis of continuous outcomes: a systematic review
Background: Rigorous, informative meta-analyses rely on availability of appropriate summary statistics or individual
participant data. For continuous outcomes, especially those with naturally skewed distributions, summary
information on the mean or variability often goes unreported. While full reporting of original trial data is the ideal,
we sought to identify methods for handling unreported mean or variability summary statistics in meta-analysis.
Methods: We undertook two systematic literature reviews to identify methodological approaches used to deal with
missing mean or variability summary statistics. Five electronic databases were searched, in addition to the Cochrane
Colloquium abstract books and the Cochrane Statistics Methods Group mailing list archive. We also conducted cited
reference searching and emailed topic experts to identify recent methodological developments. Details recorded
included the description of the method, the information required to implement the method, any underlying
assumptions and whether the method could be readily applied in standard statistical software. We provided a
summary description of the methods identified, illustrating selected methods in example meta-analysis scenarios.
Results: For missing standard deviations (SDs), following screening of 503 articles, fifteen methods were identified in
addition to those reported in a previous review. These included Bayesian hierarchical modelling at the meta-analysis
level; summary statistic level imputation based on observed SD values from other trials in the meta-analysis; a practical
approximation based on the range; and algebraic estimation of the SD based on other summary statistics. Following
screening of 1124 articles for methods estimating the mean, one approximate Bayesian computation approach and
three papers based on alternative summary statistics were identified. Illustrative meta-analyses showed that when
replacing a missing SD, the approximation using the range minimised loss of precision and generally performed better
than omitting trials. When estimating missing means, a formula using the median, lower quartile and upper quartile
performed best in preserving the precision of the meta-analysis findings, although in some scenarios, omitting trials
gave superior results.
Conclusions: Methods based on summary statistics (minimum, maximum, lower quartile, upper quartile, median)
reported in the literature facilitate more comprehensive inclusion of randomised controlled trials with missing mean or
variability summary statistics within meta-analyses.
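A minimal Python sketch of the kind of summary-statistic conversions discussed above, using widely cited rules of thumb (SD approximated as range/4; mean approximated from the median and quartiles). The exact formulas evaluated in the review may differ, and the trial values below are hypothetical.

```python
# Commonly cited approximations for imputing a missing SD or mean from
# other reported summary statistics; the review evaluates formulas of this
# kind, but the exact variants it studies may differ.

def sd_from_range(minimum: float, maximum: float) -> float:
    """Approximate the standard deviation as (max - min) / 4."""
    return (maximum - minimum) / 4.0


def mean_from_quartiles(q1: float, median: float, q3: float) -> float:
    """Approximate the mean as the average of the quartiles and the median."""
    return (q1 + median + q3) / 3.0


# Hypothetical trial reporting median 12, IQR 9-16 and range 4-25
print(sd_from_range(4, 25))            # 5.25
print(mean_from_quartiles(9, 12, 16))  # ~12.33
```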
Summary Statistics for Partitionings and Feature Allocations
Infinite mixture models are commonly used for clustering. One can sample from
the posterior of mixture assignments by Monte Carlo methods or find its maximum
a posteriori solution by optimization. However, in some problems the posterior
is diffuse and it is hard to interpret the sampled partitionings. In this
paper, we introduce novel statistics based on block sizes for representing
sample sets of partitionings and feature allocations. We develop an
element-based definition of entropy to quantify segmentation among their
elements. Then we propose a simple algorithm called entropy agglomeration (EA)
to summarize and visualize this information. Experiments on various infinite
mixture posteriors as well as a feature allocation dataset demonstrate that the
proposed statistics are useful in practice.
Comment: Accepted to NIPS 2013:
https://nips.cc/Conferences/2013/Program/event.php?ID=376
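The entropy agglomeration idea rests on quantifying how evenly a partitioning's elements are spread across blocks. A minimal sketch, assuming the standard block-size entropy H = -sum_b (|b|/n) log(|b|/n); the paper's element-based definition may differ in detail.

```python
import math
from collections import Counter


def block_size_entropy(assignments):
    """Entropy of a partitioning computed from its block sizes:
    H = -sum_b (|b|/n) * log(|b|/n). Used here only to illustrate how
    block sizes summarize segmentation; the paper's element-based
    definition may differ in detail."""
    n = len(assignments)
    sizes = Counter(assignments).values()
    return -sum((s / n) * math.log(s / n) for s in sizes)


# Two partitionings of five elements
print(block_size_entropy([0, 0, 1, 1, 2]))  # evenly sized blocks -> higher entropy
print(block_size_entropy([0, 0, 0, 0, 1]))  # one dominant block  -> lower entropy
```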
Considerate Approaches to Achieving Sufficiency for ABC model selection
For nearly any challenging scientific problem, evaluation of the likelihood is
problematic if not impossible. Approximate Bayesian computation (ABC) allows us
to employ the whole Bayesian formalism to problems where we can use simulations
from a model, but cannot evaluate the likelihood directly. When summary
statistics of real and simulated data are compared, rather than the data directly, information is lost unless the summary statistics are sufficient.
Here we employ an information-theoretical framework that can be used to
construct (approximately) sufficient statistics by combining different
statistics until the loss of information is minimized. Such sufficient sets of
statistics are constructed for both parameter estimation and model selection
problems. We apply our approach to a range of illustrative and real-world model
selection problems.
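To make the role of summary statistics concrete, the sketch below implements plain ABC rejection on a toy Normal model: prior draws are kept when their simulated summaries land close to the observed summaries. This is generic ABC, not the paper's information-theoretic construction of sufficient statistic sets; the model, prior and tolerance are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def simulate(theta, n=100):
    """Toy model: n draws from Normal(theta, 1)."""
    return rng.normal(theta, 1.0, size=n)


def summaries(data):
    """Summary statistics compared in place of the full data set.
    The sample mean is sufficient for theta here; dropping it would
    lose information, which is the issue discussed above."""
    return np.array([data.mean(), data.var()])


def abc_rejection(observed, n_draws=5000, eps=0.2):
    """Plain ABC rejection: keep prior draws whose simulated summaries
    fall within eps of the observed summaries."""
    s_obs = summaries(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)        # draw from a flat prior
        s_sim = summaries(simulate(theta))
        if np.linalg.norm(s_sim - s_obs) < eps:
            accepted.append(theta)
    return np.array(accepted)


observed = simulate(1.5)
posterior = abc_rejection(observed)
print(posterior.size, posterior.mean() if posterior.size else "no acceptances")
```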
Non-linear regression models for Approximate Bayesian Computation
Approximate Bayesian inference on the basis of summary statistics is
well-suited to complex problems for which the likelihood is either
mathematically or computationally intractable. However, the methods that use
rejection suffer from the curse of dimensionality when the number of summary
statistics is increased. Here we propose a machine-learning approach to the
estimation of the posterior density by introducing two innovations. The new
method fits a nonlinear conditional heteroscedastic regression of the parameter
on the summary statistics, and then adaptively improves estimation using
importance sampling. The new algorithm is compared to the state-of-the-art
approximate Bayesian methods, and achieves considerable reduction of the
computational burden in two examples of inference in statistical genetics and
in a queueing model.
Comment: 4 figures; version 3 minor changes; to appear in Statistics and Computing.
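The sketch below shows the baseline linear regression adjustment (Beaumont-style) that post-processes accepted ABC draws by regressing parameters on the summary-statistic discrepancy; the paper's contribution replaces this with a nonlinear, heteroscedastic fit followed by importance sampling, which is not reproduced here. The accepted draws and summaries are simulated stand-ins.

```python
import numpy as np


def linear_regression_adjust(theta, s_sim, s_obs):
    """Post-rejection linear regression adjustment: regress accepted
    parameters on the discrepancy between simulated and observed
    summaries, then shift each draw towards the observed point.
    This only illustrates the baseline idea that the paper's nonlinear,
    heteroscedastic method extends."""
    X = s_sim - s_obs                                  # (n_accepted, n_summaries)
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, theta, rcond=None)
    return theta - X @ beta[1:]                        # adjusted parameter draws


# Hypothetical accepted draws from an ABC rejection step
rng = np.random.default_rng(1)
theta_acc = rng.normal(2.0, 0.5, size=500)
s_obs = np.array([2.0, 1.0])
s_sim = np.column_stack([theta_acc + rng.normal(0, 0.2, 500),
                         np.ones(500) + rng.normal(0, 0.2, 500)])
theta_adj = linear_regression_adjust(theta_acc, s_sim, s_obs)
print(theta_acc.std(), theta_adj.std())  # adjustment typically tightens the sample
```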
A Comparative Review of Dimension Reduction Methods in Approximate Bayesian Computation
Approximate Bayesian computation (ABC) methods make use of comparisons
between simulated and observed summary statistics to overcome the problem of
computationally intractable likelihood functions. As the practical
implementation of ABC requires computations based on vectors of summary
statistics, rather than full data sets, a central question is how to derive
low-dimensional summary statistics from the observed data with minimal loss of
information. In this article we provide a comprehensive review and comparison
of the performance of the principal methods of dimension reduction proposed in
the ABC literature. The methods are split into three non-mutually exclusive
classes consisting of best subset selection methods, projection techniques and
regularization. In addition, we introduce two new methods of dimension
reduction. The first is a best subset selection method based on Akaike and
Bayesian information criteria, and the second uses ridge regression as a
regularization procedure. We illustrate the performance of these dimension
reduction techniques through the analysis of three challenging models and data
sets.
Comment: Published at http://dx.doi.org/10.1214/12-STS406 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
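As an illustration of the projection/regularization idea, the sketch below reduces a high-dimensional vector of candidate summaries to one summary per parameter by fitting a ridge regression of the parameter on the candidate summaries over pilot simulations, then using the fitted values as the reduced summaries (semi-automatic-ABC style). The pilot data, penalty and helper names are assumptions; the procedures compared in the review differ in detail.

```python
import numpy as np


def ridge_projection(S_pilot, theta_pilot, S_new, lam=1.0):
    """Project candidate summaries onto a single summary per parameter:
    fit a ridge regression of the parameter on the candidate summaries
    using pilot simulations, then evaluate the fitted linear combination
    on new (observed or simulated) summaries. A sketch of the
    regularization/projection idea only."""
    n, p = S_pilot.shape
    X = np.column_stack([np.ones(n), S_pilot])
    penalty = lam * np.eye(p + 1)
    penalty[0, 0] = 0.0                                  # do not penalise the intercept
    beta = np.linalg.solve(X.T @ X + penalty, X.T @ theta_pilot)
    X_new = np.column_stack([np.ones(len(S_new)), S_new])
    return X_new @ beta                                  # one reduced summary per data set


# Hypothetical pilot run: 50 candidate summaries, only the first is informative
rng = np.random.default_rng(2)
theta_pilot = rng.normal(size=200)
S_pilot = np.column_stack([theta_pilot + rng.normal(0, 0.1, 200),
                           rng.normal(size=(200, 49))])
S_obs = S_pilot[:1]                                      # stand-in for observed summaries
print(ridge_projection(S_pilot, theta_pilot, S_obs))
```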
