228 research outputs found

    Spatio-temporal Bayesian on-line changepoint detection with model selection

    Bayesian On-line Changepoint Detection is extended to on-line model selection and non-stationary spatio-temporal processes. We propose spatially structured Vector Autoregressions (VARs) for modelling the process between changepoints (CPs) and give an upper bound on the approximation error of such models. The resulting algorithm performs prediction, model selection and CP detection on-line. Its time complexity is linear and its space complexity constant, so it is two orders of magnitude faster than its closest competitor. In addition, it outperforms the state of the art for multivariate data.
    Comment: 10 pages, 7 figures, to appear in Proceedings of the 35th International Conference on Machine Learning, 2018
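    For context on the base algorithm being extended, the sketch below is a minimal Python rendering of the standard Bayesian on-line changepoint detection run-length recursion. The conjugate Gaussian (Normal-Gamma) predictive, the constant hazard rate, and the function name `bocpd_gaussian` are illustrative assumptions; the paper itself replaces the per-segment predictive with spatially structured VARs and adds on-line model selection, which is not shown here.

```python
import numpy as np
from scipy import stats

def bocpd_gaussian(x, hazard=1 / 100, mu0=0.0, kappa0=1.0, alpha0=1.0, beta0=1.0):
    """Run-length posterior for univariate data with a Normal-Gamma prior.

    Illustrative only: the paper swaps this conjugate predictive for
    spatially structured VAR models and performs model selection on-line.
    """
    T = len(x)
    R = np.zeros((T + 1, T + 1))            # R[t, r] = P(run length r at time t)
    R[0, 0] = 1.0
    mu = np.array([mu0]); kappa = np.array([kappa0])
    alpha = np.array([alpha0]); beta = np.array([beta0])
    for t, xt in enumerate(x, start=1):
        # Student-t predictive density for each candidate run length
        pred = stats.t.pdf(xt, df=2 * alpha, loc=mu,
                           scale=np.sqrt(beta * (kappa + 1) / (alpha * kappa)))
        growth = R[t - 1, :t] * pred * (1 - hazard)      # run continues
        cp = (R[t - 1, :t] * pred * hazard).sum()        # run resets to zero
        R[t, 1:t + 1], R[t, 0] = growth, cp
        R[t] /= R[t].sum()
        # Conjugate posterior updates; index 0 keeps the fresh prior statistics
        mu_new = (kappa * mu + xt) / (kappa + 1)
        beta_new = beta + kappa * (xt - mu) ** 2 / (2 * (kappa + 1))
        mu = np.concatenate([[mu0], mu_new])
        kappa = np.concatenate([[kappa0], kappa + 1])
        alpha = np.concatenate([[alpha0], alpha + 0.5])
        beta = np.concatenate([[beta0], beta_new])
    return R
```

    Tracking `R[t].argmax()` over time gives the most probable run length; an abrupt collapse toward small run lengths signals a detected changepoint.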

    A Composite Likelihood-based Approach for Change-point Detection in Spatio-temporal Process

    This paper develops a unified, accurate and computationally efficient method for change-point inference in non-stationary spatio-temporal processes. By modeling a non-stationary spatio-temporal process as a piecewise stationary spatio-temporal process, we consider simultaneous estimation of the number and locations of change-points and of the model parameters in each segment. A composite likelihood-based criterion is developed for change-point and parameter estimation. Asymptotic theories, including consistency and the distribution of the estimators, are derived under mild conditions. In contrast to the classical result for fixed-dimensional time series that the asymptotic error of a change-point estimator is O_p(1), exact recovery of the true change-points is guaranteed in the spatio-temporal setting. More surprisingly, consistency of change-point estimation can be achieved without any penalty term in the criterion function. A computationally efficient pruned dynamic programming algorithm is developed for the challenging criterion optimization problem. Simulation studies and an application to U.S. precipitation data are provided to demonstrate the effectiveness and practicality of the proposed method.
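    The pruned dynamic programming idea can be illustrated with a PELT-style recursion over candidate last change-points under a generic segment cost. The Gaussian segment cost and the explicit penalty below are illustrative assumptions rather than the paper's composite-likelihood criterion (which, per the abstract, needs no penalty term); only the prune-and-recurse structure is the point.

```python
import numpy as np

def gaussian_cost(csum, csum2, s, t):
    """Negative log-likelihood (up to constants) of x[s:t] as one Gaussian segment.
    A stand-in for the paper's composite-likelihood segment criterion."""
    n = t - s
    mean = (csum[t] - csum[s]) / n
    var = max((csum2[t] - csum2[s]) / n - mean ** 2, 1e-8)
    return 0.5 * n * np.log(var)

def pelt(x, penalty):
    """Pruned dynamic program returning estimated change-point locations."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    csum = np.concatenate([[0.0], np.cumsum(x)])
    csum2 = np.concatenate([[0.0], np.cumsum(x ** 2)])
    F = np.full(n + 1, np.inf)
    F[0] = -penalty
    last = np.zeros(n + 1, dtype=int)
    candidates = [0]
    for t in range(1, n + 1):
        costs = [F[s] + gaussian_cost(csum, csum2, s, t) + penalty for s in candidates]
        best = int(np.argmin(costs))
        F[t], last[t] = costs[best], candidates[best]
        # Pruning: keep only candidates that could still start the optimal last segment
        candidates = [s for s, c in zip(candidates, costs) if c - penalty <= F[t]]
        candidates.append(t)
    cps, t = [], n                      # backtrack the optimal segmentation
    while last[t] > 0:
        cps.append(last[t])
        t = last[t]
    return sorted(cps)
```

    With a BIC-style penalty such as `2 * np.log(len(x))`, `pelt(x, penalty)` returns the estimated change locations while the pruning step keeps the per-step candidate set small, which is what makes the dynamic program scale to long series.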

    Statistical Modeling for Complex Data

    In this dissertation, we focus on statistical modeling techniques for exploring complex data with features such as high dimensionality, nonstationary structure, heavy-tailed distributions, and missing data. We study four problems: dimension reduction in high-dimensional data, clarifying complex patterns in nonstationary spatial data, improving hierarchical Bayesian modeling of spatio-temporal data with a staircase pattern of missing observations, and detecting change points in spatio-temporal data with outliers and heavy-tailed observations.

    Sufficient dimension reduction has drawn a lot of attention in the last twenty years due to the rapidly increasing dimension of covariates. The semiparametric approach to dimension reduction proposed by Ma and Zhu [2012] is a novel approach to dimension-reduction problems, completely different from the existing literature. We present a theoretical result that relaxes a critical condition required by the semiparametric approach: the asymptotic normality of the estimators still holds under weaker assumptions. This improvement increases the applicability of the semiparametric approach.

    For spatial data, nonstationarity makes it difficult to learn the underlying processes, more specifically, to capture the spatial dependency using the semivariogram model. We improve the dimension-expansion modeling technique proposed by Bornn et al. [2012] by taking the correlation structure into account and propose two generalized least-squares methods. Both methods provide more accurate parameter estimates than the least-squares method, as demonstrated through simulation studies and real data analyses.

    As spatio-temporal data are usually observed over a large area and over many years, modeling spatio-temporal data is non-trivial, and missing data make the task even more challenging. One of the problems discussed in this dissertation is modeling ozone concentrations in a region in the presence of missing data. We propose a method that requires no assumptions on the correlation structure: the covariance matrix is estimated through the dimension-expansion method for modeling semivariograms in nonstationary fields, based on estimates from the hierarchical Bayesian spatio-temporal modeling technique of Le and Zidek [2006]. For demonstration, we apply the method to ozone concentrations at 25 stations in the Pittsburgh region studied in Jin et al. [2012]. A comparison of the proposed method with the one in Jin et al. [2012] through leave-one-out cross-validation shows that the proposed method is more general and applicable.

    The last problem, also related to spatio-temporal data, is detecting structural changes in spatio-temporal data with missing observations in the presence of outliers and heavy-tailed observations. We improve the estimation algorithm of the general spatio-temporal autoregressive (GSTAR) model proposed by Wu et al. [2017], proposing an M-estimation-based EM algorithm and a change-point detection procedure. Through data examples, we compare the proposed algorithm and change-point detection procedure with existing ones and show that our method provides more robust estimation and is more accurate in detecting change points in the presence of outliers and/or heavy-tailed observations.
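    The M-estimation idea behind the last problem can be illustrated, in isolation, as iteratively reweighted least squares with Huber weights for a simple autoregression. Everything below (the AR(1)-style model, the function names `huber_weights` and `robust_ar_fit`, the tuning constant) is an illustrative stand-in, not the dissertation's GSTAR algorithm, which embeds this kind of robust step inside an EM iteration that also handles missing data.

```python
import numpy as np

def huber_weights(resid, scale, c=1.345):
    """Huber weights: 1 inside [-c*scale, c*scale], downweighted outside."""
    u = np.abs(resid) / (scale + 1e-12)
    return np.where(u <= c, 1.0, c / u)

def robust_ar_fit(x, order=1, n_iter=20):
    """M-estimation of AR coefficients via iteratively reweighted least squares."""
    x = np.asarray(x, dtype=float)
    y = x[order:]
    X = np.column_stack([x[order - k - 1:len(x) - k - 1] for k in range(order)])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]            # initial OLS fit
    for _ in range(n_iter):
        resid = y - X @ beta
        scale = 1.4826 * np.median(np.abs(resid - np.median(resid)))  # robust MAD scale
        w = huber_weights(resid, scale)
        beta = np.linalg.lstsq(np.sqrt(w)[:, None] * X, np.sqrt(w) * y, rcond=None)[0]
    return beta
```

    Downweighting large residuals is what keeps the coefficient estimates, and hence any change-point statistics built on them, stable under outliers and heavy-tailed noise.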

    Analysis of Heterogeneous Data Sources for Veterinary Syndromic Surveillance to Improve Public Health Response and Aid Decision Making

    The standard technique for implementing veterinary syndromic surveillance (VSyS) is the detection of temporal or spatial anomalies in the occurrence of health incidents above a set threshold in an observed population using a frequentist modelling approach. Most implementations of this technique also require the removal of historical outbreaks from the datasets to construct baselines. Unfortunately, challenges such as data scarcity, delayed reporting of health incidents, and variable data availability from sources make VSyS implementation and alarm interpretation difficult, particularly when quantifying surveillance risk with the associated uncertainties. This indicates that alternative or improved techniques are required to interpret alarms while incorporating uncertainties and previous knowledge of health incidents into the model to inform decision-making. Such methods must be capable of retaining historical outbreaks to assess surveillance risk. In this research work, the Stochastic Quantitative Risk Assessment (SQRA) model was proposed and developed for detecting and quantifying the risk of disease outbreaks with associated uncertainties using a Bayesian probabilistic approach in PyMC3. A systematic and comparative evaluation of the available techniques was used to select the most appropriate method and software packages based on flexibility, efficiency, usability, the ability to retain historical outbreaks, and the ease of developing a model in Python. Social media datasets (Twitter) were first used to infer a possible disease outbreak incident with associated uncertainties. The inferences were subsequently updated using datasets from clinical and other healthcare sources to reduce uncertainty in the model and validate the outbreak. The proposed SQRA model therefore demonstrates an approach that uses the successive refinement of analyses of different data streams to define a changepoint signalling a disease outbreak. The SQRA model was tested and validated to show the method's effectiveness and reliability in differentiating and identifying risk regions with corresponding changepoints in order to interpret an ongoing disease outbreak incident. This demonstrates that a technique such as the SQRA method obtained through this research may help overcome some of the difficulties identified in VSyS, such as data scarcity, delayed reporting, and variable availability of data from sources, ultimately contributing to science and practice.
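    The abstract does not specify the SQRA model's internals, so the sketch below is only a minimal PyMC3 changepoint model of the kind such a Bayesian approach could build on: Poisson counts with an unknown switch day between a baseline rate and an outbreak rate. The simulated counts, variable names, and priors are all assumptions for illustration.

```python
import numpy as np
import pymc3 as pm

# Hypothetical daily counts of a reported syndrome (simulated, illustrative only)
counts = np.random.poisson(lam=3, size=60)
counts[40:] += np.random.poisson(lam=6, size=20)   # simulated outbreak after day 40
days = np.arange(len(counts))

with pm.Model() as outbreak_model:
    # Unknown changepoint marking the start of the outbreak
    tau = pm.DiscreteUniform("tau", lower=0, upper=len(counts) - 1)
    # Baseline and outbreak incidence rates
    lam_base = pm.Exponential("lam_base", 1.0)
    lam_outbreak = pm.Exponential("lam_outbreak", 1.0)
    rate = pm.math.switch(tau > days, lam_base, lam_outbreak)
    pm.Poisson("obs", mu=rate, observed=counts)
    trace = pm.sample(2000, tune=1000)
```

    The posterior of `tau` quantifies the uncertainty about when the outbreak began, and comparing the posteriors of `lam_outbreak` and `lam_base` quantifies its size; conditioning the same model on additional data streams is one way to realise the successive refinement the abstract describes.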

    Greedy online change point detection

    Standard online change point detection (CPD) methods tend to have large false discovery rates, as their detections are sensitive to outliers. To overcome this drawback, we propose Greedy Online Change Point Detection (GOCPD), a computationally appealing method which finds change points by maximizing the probability of the data coming from the (temporal) concatenation of two independent models. We show that, for time series with a single change point, this objective is unimodal and thus CPD can be accelerated via ternary search with logarithmic complexity. We demonstrate the effectiveness of GOCPD on synthetic data and validate our findings on real-world univariate and multivariate settings.
    Comment: Accepted at IEEE MLSP 202
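    The ternary-search idea can be sketched for the single-change-point case: score each candidate split by fitting two independent models to the left and right segments and exploit unimodality of that score. The Gaussian log-likelihood below is an illustrative stand-in for GOCPD's actual probabilistic objective, and the function names are assumptions.

```python
import numpy as np

def gaussian_loglik(seg):
    """Maximized Gaussian log-likelihood of one segment (illustrative model)."""
    n = len(seg)
    var = max(np.var(seg), 1e-8)
    return -0.5 * n * (np.log(2 * np.pi * var) + 1)

def split_score(x, t):
    """Score of splitting x into two independently modelled segments at t."""
    return gaussian_loglik(x[:t]) + gaussian_loglik(x[t:])

def ternary_search_cp(x, lo=2, hi=None):
    """Locate a single change point by ternary search, assuming the split
    score is unimodal in t (as GOCPD shows for its objective)."""
    hi = len(x) - 2 if hi is None else hi
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if split_score(x, m1) < split_score(x, m2):
            lo = m1 + 1           # maximum lies to the right of m1
        else:
            hi = m2 - 1           # maximum lies to the left of m2
    return max(range(lo, hi + 1), key=lambda t: split_score(x, t))

# Example: a mean shift at index 100 in a simulated series
x = np.concatenate([np.random.randn(100), np.random.randn(100) + 3.0])
print(ternary_search_cp(x))
```

    Because each step discards a third of the candidate splits, the search needs only a logarithmic number of score evaluations, which is what makes the greedy online procedure computationally appealing.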