A framework for automated anomaly detection in high frequency water-quality data from in situ sensors
River water-quality monitoring is increasingly conducted using automated in
situ sensors, enabling timelier identification of unexpected values. However,
anomalies caused by technical issues confound these data, while the volume and
velocity of data prevent manual detection. We present a framework for automated
anomaly detection in high-frequency water-quality data from in situ sensors,
using turbidity, conductivity and river level data. After identifying end-user
needs and defining anomalies, we ranked their importance and selected suitable
detection methods. High priority anomalies included sudden isolated spikes and
level shifts, most of which were classified correctly by regression-based
methods such as autoregressive integrated moving average models. However, using
other water-quality variables as covariates reduced performance due to complex
relationships among variables. Classification of drift and periods of
anomalously low or high variability improved when we replaced anomalous
measurements with forecasts, but this inflated false positive rates.
Feature-based methods also performed well on high priority anomalies, but were
less proficient at detecting lower priority anomalies, resulting in high
false negative rates. Unlike regression-based methods, all feature-based
methods produced low false positive rates and did not require training or
optimization. Rule-based methods successfully detected impossible values and
missing observations. Thus, we recommend using a combination of methods to
improve anomaly detection performance, whilst minimizing false detection rates.
Furthermore, our framework emphasizes the importance of communication between
end-users and analysts for optimal outcomes with respect to both detection
performance and end-user needs. Our framework is applicable to other types of
high-frequency time-series data and anomaly detection applications.
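As a concrete illustration, the rule-based checks (impossible values, missing observations) and the forecast-residual idea behind spike detection can be sketched in a few lines. This is a minimal sketch under our own assumptions: the last-clean-value forecast, the MAD-based threshold, the `valid_range` bounds, and the function name are all illustrative, not the framework's actual implementation.

```python
import statistics

def detect_anomalies(series, valid_range=(0.0, 100.0), k=5.0):
    """Flag missing observations, impossible values, and sudden spikes
    in a univariate sensor series (None marks a missing observation)."""
    # Robust scale of one-step differences: the median absolute deviation
    # keeps the spikes themselves from inflating the detection threshold.
    diffs = [b - a for a, b in zip(series, series[1:])
             if a is not None and b is not None]
    med = statistics.median(diffs)
    mad = statistics.median(abs(d - med) for d in diffs)
    scale = 1.4826 * mad  # MAD -> standard deviation under Gaussian noise
    flags = {}
    prev = None  # last observation judged "clean"
    for i, y in enumerate(series):
        if y is None:
            flags[i] = "missing"
        elif not (valid_range[0] <= y <= valid_range[1]):
            flags[i] = "impossible"
        elif prev is not None and scale > 0 and abs(y - prev) > k * scale:
            flags[i] = "spike"
        else:
            prev = y  # only clean points update the reference level
    return flags

# Turbidity-like toy series with one gap, one spike, and one impossible value.
series = [10.0, 10.2, 9.9, 10.1, 10.0, None, 10.2, 9.8, 10.0, 50.0,
          10.1, 9.9, 10.2, -3.0, 10.0, 10.1, 9.9, 10.0]
flags = detect_anomalies(series)
```

Comparing each point against the last clean observation is a crude stand-in for the one-step ARIMA forecast discussed above; a real deployment would substitute the model forecast for `prev`.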
Horseshoe-based Bayesian nonparametric estimation of effective population size trajectories
Phylodynamics is an area of population genetics that uses genetic sequence
data to estimate past population dynamics. Modern state-of-the-art Bayesian
nonparametric methods for recovering population size trajectories of unknown
form use either change-point models or Gaussian process priors. Change-point
models suffer from computational issues when the number of change-points is
unknown and needs to be estimated. Gaussian process-based methods lack local
adaptivity and cannot accurately recover trajectories that exhibit features
such as abrupt changes in trend or varying levels of smoothness. We propose a
novel, locally-adaptive approach to Bayesian nonparametric phylodynamic
inference that has the flexibility to accommodate a large class of functional
behaviors. Local adaptivity results from modeling the log-transformed effective
population size a priori as a horseshoe Markov random field, a recently
proposed statistical model that blends together the best properties of the
change-point and Gaussian process modeling paradigms. We use simulated data to
assess model performance, and find that our proposed method results in reduced
bias and increased precision when compared to contemporary methods. We also use
our models to reconstruct past changes in genetic diversity of human hepatitis
C virus in Egypt and to estimate population size changes of ancient and modern
steppe bison. These analyses show that our new method captures features of the
population size trajectories that were missed by the state-of-the-art methods.
Comment: 36 pages, including supplementary information
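To make the prior concrete: in our own notation (a sketch of the general idea, not necessarily the paper's exact parameterization), a first-order horseshoe Markov random field places independent horseshoe priors on the increments of the log effective population size $\theta_k = \log N_e(t_k)$:

$$
\theta_{k+1} - \theta_k \mid \gamma_k \sim \mathcal{N}(0, \gamma_k^2),
\qquad
\gamma_k \mid \zeta \sim \mathrm{C}^{+}(0, \zeta),
$$

where $\mathrm{C}^{+}(0, \zeta)$ denotes a half-Cauchy distribution with scale $\zeta$. The sharp peak of the horseshoe at zero shrinks most increments toward a locally constant trajectory, while its heavy tails allow occasional large jumps, which is how the prior combines change-point-like adaptivity with Gaussian-process-like smoothing.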
Laws and Limits of Econometrics
We start by discussing some general weaknesses and limitations of the econometric approach. A template from sociology is used to formulate six laws that characterize mainstream activities of econometrics and the scientific limits of those activities. We then discuss some proximity theorems that quantify, by means of explicit bounds, how close we can get to the generating mechanism of the data and to the optimal forecasts of next-period observations using a finite number of observations. The magnitude of the bound depends on the characteristics of the model and the trajectory of the observed data. The results show that trends are more elusive to model than stationary processes, in the sense that the proximity bounds are larger. By contrast, the bounds are of smaller order for models that are unidentified or nearly unidentified, so that lack or near lack of identification may not be as fatal to the use of a model in practice as some recent results on inference suggest. Finally, we look at one possible future of econometrics that involves the interactive use of advanced econometric methods by way of a web browser. With these methods, users may access a suite of econometric methods and data sets online. They may also upload data to remote servers and, by simple web browser selections, initiate the implementation of advanced econometric software algorithms, returning the results online and by file and graphics downloads.
Keywords: activities and limitations of econometrics, automated modeling, nearly unidentified models, nonstationarity, online econometrics, policy analysis, prediction, quantitative bounds, trends, unit roots, weak instruments
Spurious Regression and Trending Variables
This paper analyses the asymptotic and finite sample implications of different types of nonstationary behavior among the dependent and explanatory variables in a linear spurious regression model. We study cases when the nonstationarity in the dependent and explanatory variables is deterministic as well as stochastic. In particular, we derive the order in probability of the t-statistic in a linear regression equation under a variety of empirically relevant data generation processes, and show that the spurious regression phenomenon is present in all cases considered when at least one of the variables behaves in a nonstationary way. Simulation experiments confirm our asymptotic results.
Keywords: trend stationarity, structural breaks, spurious regression, unit roots, trends
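The divergence of the t-statistic under nonstationarity can be demonstrated with a small simulation. This is a minimal sketch of the phenomenon, not the paper's experiments; `ols_t_stat` and `random_walk` are our own illustrative helpers.

```python
import random
from math import sqrt

def ols_t_stat(x, y):
    """t-statistic of the slope in a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)  # residual variance
    return slope / sqrt(s2 / sxx)

def random_walk(n, rng):
    """Pure unit-root process: y_t = y_{t-1} + e_t, e_t ~ N(0, 1)."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

# Regress one random walk on an independent one: there is no true
# relationship, yet |t| tends to grow with the sample size instead of
# settling near the usual +/-1.96 critical values, so "significance"
# in such a regression is spurious.
rng = random.Random(42)
t_by_n = {n: abs(ols_t_stat(random_walk(n, rng), random_walk(n, rng)))
          for n in (50, 200, 800)}
```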
Uncertainty Quantification for complex computer models with nonstationary output. Bayesian optimal design for iterative refocussing
In this thesis, we provide Uncertainty Quantification (UQ) tools to assist the automatic and robust calibration of complex computer models. Our tools allow users to construct a cheap (statistical) surrogate, a Gaussian process (GP) emulator, based on a small number of climate model runs. History matching (HM), the calibration process of removing parameter space for which computer model outputs are inconsistent with the observations, is combined with an emulator. The remaining subset of parameter space is termed Not Ruled Out Yet (NROY). A weakly stationary GP with a covariance function that depends on the distance between two input points is the principal tool in UQ. However, the stationarity assumption is inadequate when we operate with a heterogeneous model response. In this thesis, we develop diagnostic-led nonstationary GP emulators with a kernel mixture. We employ diagnostics from a stationary GP fit to identify input regions with distinct model behaviour and to obtain mixing functions for a kernel mixture. The result is an emulator that is continuous in parameter space and adapts to changes in model response behaviour. History matching has proven to be more effective when performed in waves. At each wave of HM, a new ensemble is obtained to update the emulator before finding an NROY space. In this thesis, we propose a Bayesian experimental design with a loss function that compares the volume of the NROY space obtained with an updated emulator to the volume of the "true" NROY space obtained using a "perfect" emulator. We combine this Bayesian design criterion with our proposed nonstationary GP emulator to perform calibration of climate models.
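The history-matching step can be made concrete with the standard implausibility measure. This is a minimal sketch under our own assumptions: a one-dimensional input, a toy emulator, and the conventional cutoff of 3; the names `implausibility`, `nroy`, and `toy_emulator` are illustrative, not the thesis code.

```python
from math import sqrt

def implausibility(z, mu, var_em, var_obs):
    """Implausibility of an input: standardized distance between the
    observation z and the emulator mean mu, accounting for emulator
    variance and observation-error variance."""
    return abs(z - mu) / sqrt(var_em + var_obs)

def nroy(candidates, z, var_obs, emulator, cutoff=3.0):
    """Not Ruled Out Yet: keep inputs whose implausibility <= cutoff."""
    kept = []
    for x in candidates:
        mu, var_em = emulator(x)  # emulator returns (mean, variance)
        if implausibility(z, mu, var_em, var_obs) <= cutoff:
            kept.append(x)
    return kept

# Toy emulator: predictive mean x^2 with constant predictive variance 0.25.
def toy_emulator(x):
    return x * x, 0.25

# Screen a grid of candidate inputs against an observation z = 4.0.
kept = nroy([i / 10 for i in range(31)], z=4.0, var_obs=0.25,
            emulator=toy_emulator)
```

In iterative refocussing, the emulator would be refit on a new ensemble drawn from the retained region and the cut repeated, shrinking the NROY space wave by wave.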