185 research outputs found
Model selection via Bayesian information capacity designs for generalised linear models
This is the first investigation of designs for screening experiments in which
the response variable is approximated by a generalised linear model. A Bayesian
information capacity criterion is defined for the selection of designs that are
robust to the form of the linear predictor. For binomial data and logistic
regression, the effectiveness of these designs for screening is assessed
through simulation studies using all-subsets regression and model selection via
maximum penalised likelihood and a generalised information criterion. For
Poisson data and log-linear regression, similar assessments are made using
maximum likelihood and the Akaike information criterion for minimally-supported
designs that are constructed analytically. The results show that effective
screening, that is, high power with moderate type I error rate and false
discovery rate, can be achieved through suitable choices for the number of
design support points and experiment size. Logistic regression is shown to
present a more challenging problem than log-linear regression. Some areas for
future work are also indicated.
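The all-subsets selection step described above can be sketched in a few lines: fit a Poisson log-linear model to every subset of candidate factors from a two-level screening design and pick the subset minimising the AIC. The design, effect sizes and experiment size below are illustrative assumptions, not those assessed in the paper.

```python
# Sketch of all-subsets model selection via AIC for a Poisson
# log-linear model, one step in assessing a screening design.
# Data, design and true effects here are synthetic assumptions.
from itertools import combinations
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, k = 40, 3                                  # runs, candidate factors
X = rng.choice([-1.0, 1.0], size=(n, k))      # two-level screening design
beta_true = np.array([0.5, 1.0, 0.0, 0.0])    # intercept + effects (only factor 0 active)
y = rng.poisson(np.exp(np.column_stack([np.ones(n), X]) @ beta_true))

def neg_loglik(beta, Z, y):
    eta = Z @ beta
    return -(y @ eta - np.exp(eta).sum())     # Poisson log-likelihood (up to a constant)

def aic(subset):
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    res = minimize(neg_loglik, np.zeros(Z.shape[1]), args=(Z, y), method="BFGS")
    return 2 * res.fun + 2 * Z.shape[1]       # AIC = -2*loglik + 2*(number of parameters)

subsets = [s for r in range(k + 1) for s in combinations(range(k), r)]
best = min(subsets, key=aic)
print("factors selected by AIC:", best)
```

Power and false-discovery behaviour would then be estimated by repeating this over many simulated responses for a given design.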
Gibbs optimal design of experiments
Bayesian optimal design of experiments is a well-established approach to
planning experiments. Briefly, a probability distribution, known as a
statistical model, for the responses is assumed which is dependent on a vector
of unknown parameters. A utility function is then specified which gives the
gain in information for estimating the true value of the parameters using the
Bayesian posterior distribution. A Bayesian optimal design is given by
maximising the expectation of the utility with respect to the joint
distribution given by the statistical model and prior distribution for the true
parameter values. The approach takes account of the experimental aim via
specification of the utility and of all assumed sources of uncertainty via the
expected utility. However, it is predicated on the specification of the
statistical model. Recently, a new type of statistical inference, known as
Gibbs (or General Bayesian) inference, has been advanced. This is
Bayesian-like, in that uncertainty on unknown quantities is represented by a
posterior distribution, but does not necessarily rely on specification of a
statistical model. Thus the resulting inference should be less sensitive to
misspecification of the statistical model. The purpose of this paper is to
propose Gibbs optimal design: a framework for optimal design of experiments for
Gibbs inference. The concept behind the framework is introduced along with a
computational approach to find Gibbs optimal designs in practice. The framework
is demonstrated on exemplars including linear models, and experiments with
count and time-to-event responses.
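The expected-utility maximisation described above can be illustrated in a deliberately simple setting where everything is available in closed form. The one-parameter model, conjugate prior and candidate designs below are illustrative assumptions, not examples from the paper.

```python
# Minimal sketch of Bayesian optimal design for a one-parameter linear
# model y_i = beta * x_i + eps_i, eps_i ~ N(0, sigma^2), with conjugate
# prior beta ~ N(0, tau^2).  The expected Shannon information gain has
# a closed form here, so the expected utility is maximised directly
# over a candidate set of designs.  All numbers are illustrative.
import numpy as np

sigma2, tau2, n_runs = 1.0, 4.0, 4

def expected_utility(design):
    # Posterior variance: 1 / (1/tau2 + sum(x_i^2)/sigma2); the expected
    # information gain is 0.5 * log(prior variance / posterior variance).
    post_var = 1.0 / (1.0 / tau2 + np.sum(np.square(design)) / sigma2)
    return 0.5 * np.log(tau2 / post_var)

candidates = [np.full(n_runs, x) for x in (0.25, 0.5, 1.0)]  # candidate constant designs
best = max(candidates, key=expected_utility)
print(best)
```

In realistic models the expected utility has no closed form and must be approximated, for example by Monte Carlo, which is where the computational approach proposed in the paper comes in.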
An approach for finding fully Bayesian optimal designs using normal-based approximations to loss functions
The generation of decision-theoretic Bayesian optimal designs is complicated by the significant computational challenge of minimising an analytically intractable expected loss function over a, potentially, high-dimensional design space. A new general approach for approximately finding Bayesian optimal designs is proposed which uses computationally efficient normal-based approximations to posterior summaries to aid in approximating the expected loss. This new approach is demonstrated on illustrative, yet challenging, examples including hierarchical models for blocked experiments, and experimental aims of parameter estimation and model discrimination. Where possible, the results of the proposed methodology are compared, both in terms of performance and computing time, to results from using computationally more expensive, but potentially more accurate, Monte Carlo approximations. Moreover, the methodology is also applied to problems where the use of Monte Carlo approximations is computationally infeasible.
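A normal-based posterior summary of the kind used above can be sketched with a Laplace-type approximation: the posterior is replaced by a normal distribution centred at its mode with covariance given by the inverse negative Hessian. The logistic model, vague prior and data below are illustrative assumptions.

```python
# Sketch of a normal-based (Laplace-type) approximation to a posterior:
# approximate the posterior by N(mode, inverse negative Hessian at the
# mode).  Model and data are illustrative assumptions: Bernoulli
# responses with a logistic link and a vague N(0, 10^2) prior.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + 1.5 * x))))

def neg_log_post(b):
    eta = b[0] + b[1] * x
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))   # Bernoulli log-likelihood
    logprior = -0.5 * np.sum(b ** 2) / 100.0           # N(0, 10^2) prior
    return -(loglik + logprior)

res = minimize(neg_log_post, np.zeros(2), method="BFGS")
mode, cov = res.x, res.hess_inv                        # normal approximation N(mode, cov)
print("posterior mode ~", mode.round(2))
```

Summaries of this normal approximation (means, variances, discrepancies between rival models) are cheap to evaluate, which is what makes them attractive inside an expected-loss approximation.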
Robust designs for Poisson regression models
We consider the problem of how to construct robust designs for Poisson regression models. An analytical expression is derived for robust designs for first-order Poisson regression models where uncertainty exists in the prior parameter estimates. Given certain constraints in the methodology, it may be necessary to extend the robust designs for implementation in practical experiments. With these extensions, our methodology constructs designs which perform similarly, in terms of estimation, to current techniques, and offers the solution in a more timely manner. We further apply this analytic result to cases where uncertainty exists in the linear predictor. The application of this methodology to practical design problems such as screening experiments is explored. Given the minimal prior knowledge that is usually available when conducting such experiments, it is recommended to derive designs robust across a variety of systems. However, incorporating such uncertainty into the design process can be a computationally intense exercise. Hence, our analytic approach is explored as an alternative.
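The kind of robustness criterion involved can be sketched as a pseudo-Bayesian D-criterion: the log-determinant of the Fisher information for a Poisson regression, averaged over draws from a prior on the parameters. The designs, prior and parameter values below are illustrative assumptions, not the paper's analytic construction.

```python
# Sketch of comparing first-order Poisson regression designs by a
# pseudo-Bayesian D-criterion: E_prior[ log det Fisher information ].
# Designs, prior mean and prior spread are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2)

def log_det_info(X, beta):
    Z = np.column_stack([np.ones(len(X)), X])   # intercept + factor columns
    w = np.exp(Z @ beta)                        # Poisson GLM weights exp(eta)
    return np.linalg.slogdet((Z.T * w) @ Z)[1]  # log det of Z' W Z

def robust_criterion(X, prior_draws):
    return np.mean([log_det_info(X, b) for b in prior_draws])

prior = rng.normal([1.0, -1.0, -1.0], 0.3, size=(200, 3))  # uncertain parameters
factorial = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], float)
shifted = factorial * 0.5                                  # a contracted alternative
better = max([factorial, shifted], key=lambda X: robust_criterion(X, prior))
print(better)
```

Averaging over the prior draws is exactly the computationally intense step the abstract mentions, which is why a closed-form robust design is attractive.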
Modelling Survival Data to Account for Model Uncertainty: A Single Model or Model Averaging?
This study considered the problem of predicting survival based on three alternative models: a single Weibull, a
mixture of Weibulls and a cure model. Instead of the common procedure of choosing a single 'best' model, where
'best' is defined in terms of goodness of fit to the data, a Bayesian model averaging (BMA) approach was adopted to
account for model uncertainty. This was illustrated using a case study in which the aim was the description of
lymphoma cancer survival with covariates given by phenotypes and gene expression. The results of this study indicate
that if the sample size is sufficiently large, one of the three models emerges as having the highest probability given
the data, as indicated by the goodness-of-fit measure, the Bayesian information criterion (BIC). However, when the
sample size was reduced, no single model was revealed as 'best', suggesting that a BMA approach would be appropriate.
Although a BMA approach can compromise on goodness of fit to the data (when compared to the true model), it can
provide robust predictions and facilitate more detailed investigation of the relationships between gene expression
and patient survival.
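The BMA weighting based on BIC can be sketched directly: each model's posterior probability is approximated as proportional to exp(-BIC/2), and predictions are averaged under those weights. The BIC values and predictions below are hypothetical, standing in for fitted Weibull, Weibull-mixture and cure models.

```python
# Sketch of BIC-based Bayesian model averaging: approximate posterior
# model probabilities by exp(-BIC_k / 2), normalised across candidates,
# then average the models' predictions under those weights.
# BIC values and per-model predictions are hypothetical.
import numpy as np

bic = np.array([210.3, 208.9, 209.6])               # hypothetical BIC per model
delta = bic - bic.min()
weights = np.exp(-0.5 * delta)
weights /= weights.sum()                            # approximate P(model | data)
bma_pred = weights @ np.array([0.62, 0.55, 0.58])   # averaged survival prediction
print(weights.round(3), round(float(bma_pred), 3))
```

With a large sample, one BIC typically dominates and its weight approaches 1, recovering single-model selection; with a small sample the weights spread out, which is the regime where the study recommends BMA.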
A framework for automated anomaly detection in high frequency water-quality data from in situ sensors
River water-quality monitoring is increasingly conducted using automated in
situ sensors, enabling timelier identification of unexpected values. However,
anomalies caused by technical issues confound these data, while the volume and
velocity of data prevent manual detection. We present a framework for automated
anomaly detection in high-frequency water-quality data from in situ sensors,
using turbidity, conductivity and river level data. After identifying end-user
needs and defining anomalies, we ranked their importance and selected suitable
detection methods. High priority anomalies included sudden isolated spikes and
level shifts, most of which were classified correctly by regression-based
methods such as autoregressive integrated moving average models. However, using
other water-quality variables as covariates reduced performance due to complex
relationships among variables. Classification of drift and periods of
anomalously low or high variability improved when we replaced anomalous
measurements with forecasts, but this inflated false positive rates.
Feature-based methods also performed well on high priority anomalies, but were
less proficient at detecting lower priority anomalies, resulting in high
false negative rates. Unlike regression-based methods, all feature-based
methods produced low false positive rates and did not require training or
optimization. Rule-based methods successfully detected impossible values and
missing observations. Thus, we recommend using a combination of methods to
improve anomaly detection performance, whilst minimizing false detection rates.
Furthermore, our framework emphasizes the importance of communication between
end-users and analysts for optimal outcomes with respect to both detection
performance and end-user needs. Our framework is applicable to other types of
high frequency time-series data and anomaly detection applications.
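Two of the detector types above can be sketched together: a rule-based check for impossible values, and a regression-based spike detector that flags large one-step forecast residuals (a lightweight AR(1) stand-in for the ARIMA models the framework assesses). The turbidity series, injected anomalies and thresholds are synthetic assumptions.

```python
# Sketch of combining a rule-based detector (impossible values) with a
# regression-based spike detector (one-step AR(1) forecast residuals).
# The series, injected anomalies and thresholds are synthetic.
import numpy as np

rng = np.random.default_rng(3)
turbidity = 5 + np.cumsum(rng.normal(0, 0.1, 500))   # synthetic sensor series
turbidity[200] += 8.0                                # inject a sudden isolated spike
turbidity[350] = -1.0                                # inject an impossible value

rule_flags = turbidity < 0                           # turbidity cannot be negative

# AR(1) one-step forecasts: y_hat[t] = phi * y[t-1], phi by least squares.
y0, y1 = turbidity[:-1], turbidity[1:]
phi = (y0 @ y1) / (y0 @ y0)
resid = y1 - phi * y0
spike_flags = np.zeros_like(rule_flags)
spike_flags[1:] = np.abs(resid) > 4 * np.std(resid)  # flag large residuals

flags = rule_flags | spike_flags                     # combine the two detectors
print(np.flatnonzero(flags))
```

This mirrors the paper's recommendation to combine methods: the rule catches what the regression model cannot express, and vice versa.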
Bayesian spatio-temporal models for stream networks
Spatio-temporal models are widely used in many research areas including
ecology. The recent proliferation of the use of in-situ sensors in streams and
rivers supports space-time water quality modelling and monitoring in near
real-time. In this paper, we introduce a new family of dynamic spatio-temporal
models, in which spatial dependence is established based on stream distance and
temporal autocorrelation is incorporated using vector autoregression
approaches. We propose several variations of these novel models using a
Bayesian framework. Our results show that our proposed models perform well
using spatio-temporal data collected from real stream networks, particularly in
terms of out-of-sample RMSPE. This is illustrated considering a case study of
water temperature data in the northwestern United States.
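The two ingredients of this model family can be sketched jointly: spatial dependence through an exponential covariance in stream distance, and temporal dependence through a vector autoregression, with out-of-sample RMSPE as the comparison metric. The distances, parameters and simulated temperatures below are illustrative assumptions, not the paper's fitted model.

```python
# Sketch of a dynamic spatio-temporal model for a stream network:
# exponential spatial covariance in stream distance plus a VAR(1)
# temporal step, evaluated by out-of-sample RMSPE.  Distances and
# parameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(4)
D = np.array([[0., 2., 5.],
              [2., 0., 3.],
              [5., 3., 0.]])                 # pairwise stream distances (km)
Sigma = np.exp(-D / 4.0)                     # exponential spatial covariance
phi = 0.7                                    # VAR(1) autoregressive coefficient

y = np.zeros((100, 3))
L = np.linalg.cholesky(Sigma)
for t in range(1, 100):                      # y_t = phi * y_{t-1} + spatially correlated noise
    y[t] = phi * y[t - 1] + L @ rng.normal(size=3)

# Out-of-sample one-step RMSPE of the VAR(1) predictor on the last 20 steps.
pred = phi * y[79:99]
rmspe = float(np.sqrt(np.mean((y[80:] - pred) ** 2)))
print(round(rmspe, 2))
```

In the paper the spatial covariance respects network topology and flow direction rather than being a simple function of a distance matrix, but the structure of the comparison is the same.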
Are the current gRNA ranking prediction algorithms useful for genome editing in plants?
Introducing a new trait into a crop through conventional breeding commonly takes decades, but recently developed genome sequence modification technology has the potential to accelerate this process. One of these new breeding technologies relies on an RNA-directed DNA nuclease (CRISPR/Cas9) to cut the genomic DNA, in vivo, to facilitate the deletion or insertion of sequences. This sequence specific targeting is determined by guide RNAs (gRNAs). However, choosing an optimum gRNA sequence has its challenges. Almost all current gRNA design tools for use in plants are based on data from experiments in animals, although many allow the use of plant genomes to identify potential off-target sites. Here, we examine the predictive uniformity and performance of eight different online gRNA-site tools. Unfortunately, there was little consensus among the rankings by the different algorithms, nor a statistically significant correlation between rankings and in vivo effectiveness. This suggests that important factors affecting gRNA performance and/or target site accessibility, in plants, are yet to be elucidated and incorporated into gRNA-site prediction tools.
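The rank-agreement check described above can be sketched with a Spearman correlation between one tool's gRNA ranking and measured editing efficiency. The scores and efficiencies below are hypothetical, chosen only to show the computation.

```python
# Sketch of testing whether a gRNA design tool's ranking agrees with
# in vivo editing efficiency, via Spearman rank correlation.
# All scores and efficiencies below are hypothetical.
from scipy.stats import spearmanr

tool_scores = [0.81, 0.64, 0.92, 0.55, 0.73]   # a tool's predicted gRNA quality
efficiency  = [0.30, 0.42, 0.28, 0.35, 0.40]   # observed editing rates in planta
rho, pval = spearmanr(tool_scores, efficiency)
print(round(rho, 2), round(pval, 3))
```

A non-significant (here even negative) rho is exactly the pattern the study reports: animal-trained prediction tools carrying little information about gRNA performance in plants.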
- …