93 research outputs found

    Optimal detection of changepoints with a linear computational cost

    We consider the problem of detecting multiple changepoints in large data sets. Our focus is on applications where the number of changepoints will increase as we collect more data: for example in genetics as we analyse larger regions of the genome, or in finance as we observe time-series over longer periods. We consider the common approach of detecting changepoints through minimising a cost function over possible numbers and locations of changepoints. This includes several established procedures for detecting changepoints, such as penalised likelihood and minimum description length. We introduce a new method for finding the minimum of such cost functions, and hence the optimal number and location of changepoints, that has a computational cost which, under mild conditions, is linear in the number of observations. This compares favourably with existing methods for the same problem, whose computational cost can be quadratic or even cubic. In simulation studies we show that our new method can be orders of magnitude faster than these alternative exact methods. We also compare with the Binary Segmentation algorithm for identifying changepoints, showing that the exactness of our approach can lead to substantial improvements in the accuracy of the inferred segmentation of the data. Comment: 25 pages, 4 figures. To appear in Journal of the American Statistical Association.
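
The abstract above describes minimising a penalised cost over all changepoint configurations with a pruning step that keeps the search cheap. A minimal sketch of that idea follows, under stated assumptions: the segment cost is the residual sum of squares about the segment mean (a change-in-mean model), the pruning constant is taken as zero for this cost, and the name `pelt_mean` and the penalty `beta` are hypothetical choices made for illustration. It is not the authors' implementation.

```python
from itertools import accumulate


def pelt_mean(y, beta):
    """Penalised change-in-mean segmentation with candidate pruning.

    Illustrative sketch: returns the sorted changepoint positions
    (index of the last observation of each segment except the final one,
    counted from 1) that minimise sum of segment costs + beta per change.
    """
    n = len(y)
    # Prefix sums give any segment cost in O(1).
    s1 = [0.0] + list(accumulate(y))
    s2 = [0.0] + list(accumulate(v * v for v in y))

    def cost(s, t):
        # Residual sum of squares of y[s:t] about its own mean (t > s).
        return s2[t] - s2[s] - (s1[t] - s1[s]) ** 2 / (t - s)

    F = [0.0] * (n + 1)     # F[t]: optimal penalised cost of y[0:t]
    F[0] = -beta            # so that the first segment carries no penalty
    last = [0] * (n + 1)    # last[t]: most recent changepoint before t
    candidates = [0]        # changepoint positions still worth considering

    for t in range(1, n + 1):
        vals = [F[s] + cost(s, t) + beta for s in candidates]
        best = min(range(len(vals)), key=vals.__getitem__)
        F[t] = vals[best]
        last[t] = candidates[best]
        # Pruning: drop s if F[s] + cost(s, t) > F[t]; such an s can never
        # be the optimal last changepoint for any later t (assumes K = 0).
        candidates = [s for s, v in zip(candidates, vals) if v - beta <= F[t]]
        candidates.append(t)

    # Backtrack through the recorded last changepoints.
    cps = []
    t = n
    while last[t] > 0:
        t = last[t]
        cps.append(t)
    return sorted(cps)


if __name__ == "__main__":
    y = [0.0] * 50 + [10.0] * 50       # noise-free step at position 50
    print(pelt_mean(y, beta=9.0))      # → [50]
```

The pruning line is what distinguishes this from a plain dynamic program: without it the candidate list grows with `t` and the cost is quadratic. How much is pruned in practice depends on the data and on how often changes occur.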

    Automatic Locally Stationary Time Series Forecasting with application to predicting U.K. Gross Value Added Time Series under sudden shocks caused by the COVID pandemic

    Accurate forecasting of the U.K. gross value added (GVA) is fundamental for measuring the growth of the U.K. economy. A common nonstationarity in GVA data, such as the ABML series, is its increase in variance over time due to inflation. Transformed or inflation-adjusted series can still be challenging for classical stationarity-assuming forecasters. We adopt a different approach that works directly with the GVA series by advancing recent forecasting methods for locally stationary time series. Our approach results in more accurate and reliable forecasts, and continues to work well even when the ABML series becomes highly variable during the COVID pandemic. Comment: 21 pages, 4 figures.

    A computationally efficient, high-dimensional multiple changepoint procedure with application to global terrorism incidence

    Detecting changepoints in datasets with many variates is a data science challenge of increasing importance. Motivated by the problem of detecting changes in the incidence of terrorism from a global terrorism database, we propose a novel approach to multiple changepoint detection in multivariate time series. Our method, which we call SUBSET, is a model-based approach which uses a penalised likelihood to detect changes for a wide class of parametric settings. We provide theory that guides the choice of penalties to use for SUBSET, and that shows it has high power to detect changes regardless of whether only a few variates or many variates change. Empirical results show that SUBSET out-performs many existing approaches for detecting changes in mean in Gaussian data; additionally, unlike these alternative methods, it can be easily extended to non-Gaussian settings such as are appropriate for modelling counts of terrorist events.

    Case study: shipping trend estimation and prediction via multiscale variance stabilisation

    Shipping and shipping services are a key industry of great importance to the economy of Cyprus and the wider European Union. Assessment, management and future steering of the industry, and its associated economy, is carried out by a range of organisations and is of direct interest to a number of stakeholders. This article presents an analysis of shipping credit flow data: an important and archetypal series whose analysis is hampered by rapid changes of variance. Our analysis uses the recently developed data-driven Haar–Fisz transformation that enables accurate trend estimation and successful prediction in these kinds of situation. Our trend estimation is augmented by bootstrap confidence bands, new in this context. The good performance of the data-driven Haar–Fisz transform contrasts with the poor performance exhibited by popular and established variance stabilisation alternatives: the Box–Cox, logarithm and square root transformations.
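
To make the variance-stabilisation idea in the abstract above concrete, here is a minimal sketch of a Haar–Fisz transform. It is the simpler fixed variant that assumes the variance equals the mean (Poisson-like counts), not the data-driven version used in the article, which estimates the mean-variance relationship from the data. The function name `haar_fisz` is a hypothetical choice, and the input length is assumed to be a power of two.

```python
from math import sqrt


def haar_fisz(x):
    """Haar-Fisz transform of a non-negative sequence (illustrative sketch).

    Steps: (1) Haar decomposition into local means and local differences,
    (2) divide each difference by the square root of its local mean,
    (3) invert the Haar decomposition using the rescaled differences.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")

    smooth = [float(v) for v in x]
    fisz_levels = []  # rescaled detail coefficients, finest scale first
    while len(smooth) > 1:
        pairs = list(zip(smooth[0::2], smooth[1::2]))
        s = [(a + b) / 2 for a, b in pairs]   # local means
        d = [(a - b) / 2 for a, b in pairs]   # local differences
        # Fisz step: standardise each difference by sqrt(local mean);
        # a zero local mean implies a zero difference, so emit 0.
        f = [dk / sqrt(sk) if sk > 0 else 0.0 for dk, sk in zip(d, s)]
        fisz_levels.append(f)
        smooth = s

    # Inverse Haar transform, coarsest scale first, with f in place of d.
    out = smooth  # single element: the overall mean of x
    for f in reversed(fisz_levels):
        nxt = []
        for sk, fk in zip(out, f):
            nxt.extend((sk + fk, sk - fk))
        out = nxt
    return out


if __name__ == "__main__":
    counts = [3, 5, 40, 52, 48, 39, 4, 2]
    print(haar_fisz(counts))
```

Because only the detail coefficients are rescaled, the overall mean of the output equals the mean of the input, and a constant input is returned unchanged. A data-driven variant would replace `sqrt(sk)` with the square root of an estimated variance function evaluated at `sk`.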

    Wndchrm – an open source utility for biological image analysis

    Background: Biological imaging is an emerging field, covering a wide range of applications in biological and clinical research. However, while machinery for automated experimenting and data acquisition has been developing rapidly in the past years, automated image analysis often introduces a bottleneck in high content screening.
    Methods: Wndchrm is an open source utility for biological image analysis. The software works by first extracting image content descriptors from the raw image, image transforms, and compound image transforms. Then, the most informative features are selected, and the feature vector of each image is used for classification and similarity measurement.
    Results: Wndchrm has been tested using several publicly available biological datasets, and provided results which are favorably comparable to the performance of task-specific algorithms developed for these datasets. The simple user interface allows researchers who are not knowledgeable in computer vision methods and have no background in computer programming to apply image analysis to their data.
    Conclusion: We suggest that wndchrm can be effectively used for a wide range of biological image analysis tasks. Using wndchrm can allow scientists to perform automated biological image analysis while avoiding the costly challenge of implementing computer vision and pattern recognition algorithms.

    Bayesian Wavelet Shrinkage of the Haar-Fisz Transformed Wavelet Periodogram.

    It is increasingly being realised that many real world time series are not stationary and exhibit evolving second-order autocovariance or spectral structure. This article introduces a Bayesian approach for modelling the evolving wavelet spectrum of a locally stationary wavelet time series. Our new method works by combining the advantages of a Haar-Fisz transformed spectrum with a simple, but powerful, Bayesian wavelet shrinkage method. Our new method produces excellent and stable spectral estimates and this is demonstrated via simulated data and on differenced infant electrocardiogram data. A major additional benefit of the Bayesian paradigm is that we obtain rigorous and useful credible intervals of the evolving spectral structure. We show how the Bayesian credible intervals provide extra insight into the infant electrocardiogram data.

    High methylmercury in Arctic and subarctic ponds is related to nutrient levels in the warming eastern Canadian Arctic

    Permafrost thaw ponds are ubiquitous in the eastern Canadian Arctic, yet little information exists on their potential as sources of methylmercury (MeHg) to freshwaters. They are microbially active and conducive to methylation of inorganic mercury, and are also affected by Arctic warming. This multiyear study investigated thaw ponds in a discontinuous permafrost region in the Subarctic taiga (Kuujjuarapik-Whapmagoostui, QC) and a continuous permafrost region in the Arctic tundra (Bylot Island, NU). MeHg concentrations in thaw ponds were well above levels measured in most freshwater ecosystems in the Canadian Arctic (>0.1 ng L−1). On Bylot, ice-wedge trough ponds showed significantly higher MeHg (0.3−2.2 ng L−1) than polygonal ponds (0.1−0.3 ng L−1) or lakes (<0.1 ng L−1). High MeHg was measured in the bottom waters of Subarctic thaw ponds near Kuujjuarapik (0.1−3.1 ng L−1). High water MeHg concentrations in thaw ponds were strongly correlated with variables associated with high inputs of organic matter (DOC, a320, Fe), nutrients (TP, TN), and microbial activity (dissolved CO2 and CH4). Thawing permafrost due to Arctic warming will continue to release nutrients and organic carbon into these systems and increase ponding in some regions, likely stimulating higher water concentrations of MeHg. Greater hydrological connectivity from permafrost thawing may potentially increase transport of MeHg from thaw ponds to neighboring aquatic ecosystems

    Identification of a Novel Chromosomal Passenger Complex and Its Unique Localization during Cytokinesis in Trypanosoma brucei

    Aurora B kinase is a key component of the chromosomal passenger complex (CPC), which regulates chromosome segregation and cytokinesis. An ortholog of Aurora B was characterized in Trypanosoma brucei (TbAUK1), but other conserved components of the complex have not been found. Here we identified four novel TbAUK1 associated proteins by tandem affinity purification and mass spectrometry. Among these four proteins, TbKIN-A and TbKIN-B are novel kinesin homologs, whereas TbCPC1 and TbCPC2 are hypothetical proteins without any sequence similarity to those known CPC components from yeasts and metazoans. RNAi-mediated silencing of each of the four genes led to loss of spindle assembly, chromosome segregation and cytokinesis. TbKIN-A localizes to the mitotic spindle and TbKIN-B to the spindle midzone during mitosis, whereas TbCPC1, TbCPC2 and TbAUK1 display the dynamic localization pattern of a CPC. After mitosis, the CPC disappears from the central spindle and re-localizes at a dorsal mid-point of the mother cell, where the anterior tip of the daughter cell is tethered, to start cell division toward the posterior end, indicating a most unusual CPC-initiated cytokinesis in a eukaryote.