4,378 research outputs found

    A framework for automated anomaly detection in high frequency water-quality data from in situ sensors

    River water-quality monitoring is increasingly conducted using automated in situ sensors, enabling timelier identification of unexpected values. However, anomalies caused by technical issues confound these data, while the volume and velocity of data prevent manual detection. We present a framework for automated anomaly detection in high-frequency water-quality data from in situ sensors, using turbidity, conductivity and river level data. After identifying end-user needs and defining anomalies, we ranked their importance and selected suitable detection methods. High priority anomalies included sudden isolated spikes and level shifts, most of which were classified correctly by regression-based methods such as autoregressive integrated moving average models. However, using other water-quality variables as covariates reduced performance due to complex relationships among variables. Classification of drift and periods of anomalously low or high variability improved when we replaced anomalous measurements with forecasts, but this inflated false positive rates. Feature-based methods also performed well on high priority anomalies, but were less proficient at detecting lower priority anomalies, resulting in high false negative rates. Unlike regression-based methods, all feature-based methods produced low false positive rates and did not require training or optimization. Rule-based methods successfully detected impossible values and missing observations. Thus, we recommend using a combination of methods to improve anomaly detection performance, whilst minimizing false detection rates. Furthermore, our framework emphasizes the importance of communication between end-users and analysts for optimal outcomes with respect to both detection performance and end-user needs. Our framework is applicable to other types of high-frequency time-series data and anomaly detection applications.
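
The rule-based step mentioned in this abstract (flagging impossible values and missing observations) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `rule_based_flags`, the sensor range of 0 to 4000 and the sample readings are all hypothetical.

```python
def rule_based_flags(values, lower, upper):
    """Label each reading as 'missing', 'impossible', or 'ok'.

    `lower` and `upper` are assumed physical limits of the sensor;
    the real limits depend on the instrument and the variable.
    """
    flags = []
    for v in values:
        if v is None or v != v:          # None or NaN counts as missing
            flags.append("missing")
        elif v < lower or v > upper:     # outside the physically possible range
            flags.append("impossible")
        else:
            flags.append("ok")
    return flags


# Hypothetical turbidity readings with one gap and two impossible values.
turbidity = [12.1, 11.8, None, -3.0, 12.4, 5000.0]
flags = rule_based_flags(turbidity, lower=0.0, upper=4000.0)
print(flags)
```

Rules of this kind need no training, which is why they combine naturally with the regression-based and feature-based detectors the abstract recommends using alongside them.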

    Horseshoe-based Bayesian nonparametric estimation of effective population size trajectories

    Phylodynamics is an area of population genetics that uses genetic sequence data to estimate past population dynamics. Modern state-of-the-art Bayesian nonparametric methods for recovering population size trajectories of unknown form use either change-point models or Gaussian process priors. Change-point models suffer from computational issues when the number of change-points is unknown and needs to be estimated. Gaussian process-based methods lack local adaptivity and cannot accurately recover trajectories that exhibit features such as abrupt changes in trend or varying levels of smoothness. We propose a novel, locally-adaptive approach to Bayesian nonparametric phylodynamic inference that has the flexibility to accommodate a large class of functional behaviors. Local adaptivity results from modeling the log-transformed effective population size a priori as a horseshoe Markov random field, a recently proposed statistical model that blends together the best properties of the change-point and Gaussian process modeling paradigms. We use simulated data to assess model performance, and find that our proposed method results in reduced bias and increased precision when compared to contemporary methods. We also use our models to reconstruct past changes in genetic diversity of human hepatitis C virus in Egypt and to estimate population size changes of ancient and modern steppe bison. These analyses show that our new method captures features of the population size trajectories that were missed by the state-of-the-art methods. Comment: 36 pages, including supplementary information
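
To give a feel for why a horseshoe Markov random field prior is locally adaptive, here is a rough sketch of a single prior draw. It assumes a first-order formulation in which each increment of the log effective population size is normal with its own half-Cauchy local scale; the function names, the global scale `tau` and the grid size are illustrative choices, not values from the paper.

```python
import math
import random


def half_cauchy(rng, scale=1.0):
    """Draw from a half-Cauchy(0, scale) via the inverse-CDF of a Cauchy."""
    return abs(scale * math.tan(math.pi * (rng.random() - 0.5)))


def draw_hsmrf_path(n_cells, tau=0.1, start=0.0, seed=0):
    """One prior draw of a log population size trajectory on a grid.

    Assumed model: theta[k+1] - theta[k] ~ Normal(0, (tau * lambda_k)^2)
    with lambda_k ~ half-Cauchy(0, 1). Most increments are shrunk toward
    zero, while the heavy tail occasionally allows an abrupt jump.
    """
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_cells - 1):
        local_scale = half_cauchy(rng)
        path.append(path[-1] + rng.gauss(0.0, tau * local_scale))
    return path


path = draw_hsmrf_path(n_cells=50)
print(len(path), path[0])
```

Plotting several draws would show mostly flat stretches punctuated by occasional sharp changes, which is the behaviour the abstract contrasts with Gaussian process priors.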

    Laws and Limits of Econometrics

    We start by discussing some general weaknesses and limitations of the econometric approach. A template from sociology is used to formulate six laws that characterize mainstream activities of econometrics and the scientific limits of those activities. We then discuss some proximity theorems that quantify, by means of explicit bounds, how close we can get to the generating mechanism of the data and the optimal forecasts of next period observations using a finite number of observations. The magnitude of the bound depends on the characteristics of the model and the trajectory of the observed data. The results show that trends are more elusive to model than stationary processes in the sense that the proximity bounds are larger. By contrast, the bounds are of smaller order for models that are unidentified or nearly unidentified, so that lack or near lack of identification may not be as fatal to the use of a model in practice as some recent results on inference suggest. Finally, we look at one possible future of econometrics that involves the use of advanced econometric methods interactively by way of a web browser. With these methods users may access a suite of econometric methods and data sets online. They may also upload data to remote servers and by simple web browser selections initiate the implementation of advanced econometric software algorithms, returning the results online and by file and graphics downloads. Keywords: activities and limitations of econometrics, automated modeling, nearly unidentified models, nonstationarity, online econometrics, policy analysis, prediction, quantitative bounds, trends, unit roots, weak instruments

    Spurious Regression and Trending Variables

    This paper analyses the asymptotic and finite sample implications of different types of nonstationary behavior among the dependent and explanatory variables in a linear spurious regression model. We study cases when the nonstationarity in the dependent and explanatory variables is deterministic as well as stochastic. In particular, we derive the order in probability of the t-statistic in a linear regression equation under a variety of empirically relevant data generation processes, and show that the spurious regression phenomenon is present in all cases considered, when at least one of the variables behaves in a nonstationary way. Simulation experiments confirm our asymptotic results. Keywords: trend stationarity, structural breaks, spurious regression, unit roots, trends
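
The spurious regression phenomenon described in this abstract can be illustrated with a small Monte Carlo sketch. It is not the paper's simulation design: it only covers the classic case of two independent driftless random walks, and the sample size, number of replications and the 1.96 critical value are illustrative assumptions.

```python
import math
import random


def slope_t_stat(x, y):
    """t-statistic of the slope in an OLS regression of y on x with intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    return beta / math.sqrt(rss / (n - 2) / sxx)


def random_walk(n, rng):
    """Driftless random walk with standard normal increments."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def white_noise(n, rng):
    """Stationary benchmark: independent standard normal draws."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def rejection_rate(gen, n_obs=100, n_sims=500, seed=0):
    """Share of simulations where |t| > 1.96 although x and y are independent."""
    rng = random.Random(seed)
    hits = sum(
        abs(slope_t_stat(gen(n_obs, rng), gen(n_obs, rng))) > 1.96
        for _ in range(n_sims)
    )
    return hits / n_sims


print("random walks:", rejection_rate(random_walk))
print("white noise: ", rejection_rate(white_noise))
```

For the stationary benchmark the rejection rate should sit near the nominal 5% level, whereas for the independent random walks the nominal test rejects far more often, which is the sense in which the regression is spurious.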

    Spurious Regression and Econometric Trends

    This paper analyses the asymptotic and finite sample implications of different types of nonstationary behavior among the dependent and explanatory variables in a linear spurious regression model. We study cases when the nonstationarity in the dependent and explanatory variables is deterministic as well as stochastic. In particular, we derive the order in probability of the t-statistic in a linear regression equation under a variety of empirically relevant data generation processes, and show that the spurious regression phenomenon is present in all cases considered, when at least one of the variables behaves in a nonstationary way. Simulation experiments confirm our asymptotic results. Keywords: spurious regression, trends, unit roots, trend stationarity, structural breaks

    Uncertainty Quantification for complex computer models with nonstationary output. Bayesian optimal design for iterative refocussing

    In this thesis, we provide the Uncertainty Quantification (UQ) tools to assist automatic and robust calibration of complex computer models. Our tools allow users to construct a cheap (statistical) surrogate, a Gaussian process (GP) emulator, based on a small number of climate model runs. History matching (HM), the calibration process of removing parameter space for which computer model outputs are inconsistent with the observations, is combined with an emulator. The remaining subset of parameter space is termed the Not Ruled Out Yet (NROY). A weakly stationary GP with a covariance function that depends on the distance between two input points is the principal tool in UQ. However, the stationarity assumption is inadequate when we operate with a heterogeneous model response. In this thesis, we develop diagnostic-led nonstationary GP emulators with a kernel mixture. We employ diagnostics from a stationary GP fit to identify input regions with distinct model behaviour and obtain mixing functions for a kernel mixture. The result is a continuous emulator in parameter space that adapts to changes in model response behaviour. History matching has proven to be more effective when performed in waves. At each wave of HM, a new ensemble is obtained to update an emulator before finding an NROY space. In this thesis, we propose a Bayesian experimental design with a loss function that compares the volume of the NROY space obtained with an updated emulator to the volume of the “true” NROY space obtained using a “perfect” emulator. We combine Bayesian Design Criterion with our proposed nonstationary GP emulator to perform calibration of climate model
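
The history matching step described in this abstract can be sketched as below. The abstract does not state the rule used to discard parameter settings, so this assumes the commonly used implausibility measure (standardised distance between observation and emulator mean) with a cutoff of 3. The linear `toy_emulator`, the variances and the candidate grid are hypothetical stand-ins for a fitted GP emulator and real climate model quantities.

```python
import math


def implausibility(obs, em_mean, em_var, obs_var, disc_var):
    """Standardised distance between the observation and the emulator mean.

    The denominator pools emulator variance, observation error variance
    and model discrepancy variance (an assumed, commonly used form).
    """
    return abs(obs - em_mean) / math.sqrt(em_var + obs_var + disc_var)


def nroy(candidates, emulator, obs, obs_var, disc_var, threshold=3.0):
    """Keep the inputs that are Not Ruled Out Yet under the given cutoff."""
    kept = []
    for x in candidates:
        mean, var = emulator(x)
        if implausibility(obs, mean, var, obs_var, disc_var) <= threshold:
            kept.append(x)
    return kept


def toy_emulator(x):
    """Hypothetical emulator: returns (predictive mean, predictive variance)."""
    return 2.0 * x, 0.01


candidates = [i / 4 for i in range(9)]   # 0.0, 0.25, ..., 2.0
kept = nroy(candidates, toy_emulator, obs=2.0, obs_var=0.01, disc_var=0.02)
print(kept)
```

In a multi-wave setting, the emulator would be refitted on new model runs drawn from the retained region before the rule is applied again, so the NROY space shrinks from wave to wave.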