
    Adaptive estimation and change detection of correlation and quantiles for evolving data streams

    Streaming data processing is increasingly playing a central role in enterprise data architectures due to an abundance of available measurement data from a wide variety of sources and advances in data capture and infrastructure technology. Data streams arrive, with high frequency, as never-ending sequences of events, where the underlying data-generating process always has the potential to evolve. Business operations often demand real-time processing of data streams to keep models up to date and support timely decision-making. For example, in cybersecurity contexts, analysing streams of network data can aid the detection of potentially malicious behaviour. Many tools for statistical inference cannot meet the challenging demands of streaming data, where the computational cost of model updates must be constant to ensure continuous processing as data scales. Moreover, these tools are often not capable of adapting to changes, or drift, in the data. Thus, new tools for modelling data streams with efficient data processing and model updating capabilities, referred to as streaming analytics, are required. Regular intervention for control parameter configuration is prohibitive under the truly continuous processing constraints of streaming data, yet there is a notable absence of tools designed with both the temporal adaptivity to accommodate drift and the autonomy to operate without control parameter tuning. Streaming analytics with these properties can be developed using an Adaptive Forgetting (AF) framework, with roots in adaptive filtering. The fundamental contribution of this thesis is to extend the streaming toolkit by using the AF framework to develop autonomous and temporally adaptive streaming analytics. The first contribution uses the AF framework to demonstrate the development of a model, and validation procedure, for estimating time-varying parameters of bivariate data streams from cyber-physical systems, accompanied by a novel continuous-monitoring change detection system that compares adaptive and non-adaptive estimates. The second contribution is a streaming analytic for the correlation coefficient and an associated change detector that monitors changes to correlation structures across streams, demonstrated on cybersecurity network data. The third contribution is a procedure for estimating time-varying binomial data, with a thorough exploration of the nuanced behaviour of this estimator. The final contribution is a framework that equips existing streaming quantile estimators with autonomous, temporally adaptive properties. In addition, a novel streaming quantile procedure is developed and shown, in an extensive simulation study, to have appealing performance.
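    To make the AF idea concrete, the sketch below implements an adaptive-forgetting streaming mean: an exponentially weighted estimate whose forgetting factor is tuned online by gradient descent on the one-step-ahead squared prediction error, with the estimate's derivative with respect to the forgetting factor propagated recursively. The specific recursions, bounds, and step size here are illustrative of the general AF framework, not the thesis's exact estimators.

```python
import numpy as np

class AFMean:
    """Minimal sketch of an adaptive-forgetting (AF) streaming mean.

    The forgetting factor `lam` is adapted online by gradient descent
    on the one-step-ahead squared prediction error. Illustrative only:
    the thesis's estimators and update equations may differ.
    """

    def __init__(self, lam=0.99, step=1e-4, lam_min=0.7):
        self.lam, self.step, self.lam_min = lam, step, lam_min
        self.w = self.s = 0.0    # weight sum and weighted data sum
        self.dw = self.ds = 0.0  # their derivatives w.r.t. lam

    @property
    def mean(self):
        return self.s / self.w if self.w > 0 else 0.0

    def update(self, x):
        if self.w > 0:
            # derivative of the current estimate w.r.t. lam, then the
            # gradient of the squared one-step prediction error
            dm = (self.ds * self.w - self.s * self.dw) / self.w ** 2
            grad = -2.0 * (x - self.mean) * dm
            self.lam = float(np.clip(self.lam - self.step * grad,
                                     self.lam_min, 1.0))
        # derivative recursions use the *old* sums, so run them first
        self.dw = self.w + self.lam * self.dw
        self.ds = self.s + self.lam * self.ds
        # standard exponentially weighted recursions
        self.w = 1.0 + self.lam * self.w
        self.s = x + self.lam * self.s

# usage: the estimate tracks an abrupt mean shift mid-stream
rng = np.random.default_rng(0)
stream = np.concatenate([rng.normal(0, 1, 500), rng.normal(5, 1, 500)])
est = AFMean()
for x in stream:
    est.update(x)
print(round(est.mean, 2))  # close to 5 after the change point
```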

    Long-term-robust adaptation strategies for reservoir operation considering magnitude and timing of climate change: application to Diyala River Basin in Iraq

    Vulnerability assessment of climate change impacts is of paramount importance for reservoir operation to achieve the goals of water resources management. This requires accurate forcing and basin data to build a valid hydrology model, and an assessment of the sensitivity of model results to the forcing data and the uncertainty of model parameters. The first objective of this study is to construct the model and identify its sensitivity to the model parameters and the uncertainty of the forcing data. The second objective is to develop a Parametric Regional Weather Generator (RP-WG), for use in areas with limited data availability, that mimics observed characteristics. The third objective is to propose and assess a decision-making framework to evaluate pre-specified reservoir operation plans, determine the theoretical optimal plan, and identify the anticipated best timeframe for implementation by considering all possible climate scenarios. To construct the model, the Variable Infiltration Capacity (VIC) platform was selected to simulate the characteristics of the Diyala River Basin (DRB) in Iraq. Several methods were used to obtain the forcing data (precipitation, temperature, and wind speed), and they were validated using the Kling–Gupta efficiency (KGE) metric. Model sensitivity and uncertainty were examined by the Generalized Likelihood Uncertainty Estimation (GLUE) and the Differential Evolution Adaptive Metropolis (DREAM) techniques. The proposed RP-WG was based on (1) a first-order, two-state Markov chain to simulate precipitation occurrences; (2) Wilks' technique to produce correlated weather variables at multiple sites with conservation of spatial, temporal, and cross correlations; and (3) the capability to produce a wide range of synthetic climate scenarios. A probabilistic decision-making framework under nonstationary hydroclimatic conditions was proposed with four stages: (1) climate exposure generation, (2) supply scenario calculations, (3) demand scenario calculations, and (4) multi-objective performance assessment. The framework incorporates a new metric, called the Maximum Allowable Time, to examine the timeframe for robust adaptations. Three synthetic pre-specified plans were examined to avoid undesirable long-term climate change impacts, while the theoretical optimal plan was identified by the Non-dominated Sorting Genetic Algorithm II. The multiplicative random cascade and Schaake Shuffle techniques were used to determine daily precipitation data, while a set of correction equations was developed to adjust the daily temperature and wind speed. The depth of the second soil layer was the most sensitive parameter in the VIC model, and the uncertainty intervals demonstrated the validity of the VIC model for generating reasonable forecasts. The daily VIC outputs were calibrated with an average KGE of 0.743, and they were free from non-normality, heteroscedasticity, and auto-correlation. Evaluation results show that the RP-WG exhibited high KGE values, preserved the statistical properties of the observed variables, and conserved the spatial, temporal, and cross correlations among the weather variables at all sites. Finally, risk assessment results show that the current operational rules are robust for flood protection but vulnerable during drought periods, implying that project managers should pay special attention to drought and spur new technologies to counteract it.
Precipitation changes were dominant in flood and drought management, while the effects of temperature and wind speed changes were significant during drought. The results demonstrated the framework's effectiveness in quantifying detrimental climate change effects in magnitude and timing, with the ability to provide a long-term guide (and timeframe) to avert the negative impacts.
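    The occurrence component of such a weather generator is easy to illustrate: a first-order, two-state Markov chain needs only two transition probabilities, the chance of a wet day following a dry day and following a wet day. The sketch below uses placeholder probabilities, not values fitted to the Diyala River Basin.

```python
import numpy as np

def simulate_occurrence(p01, p11, n_days, rng=None):
    """First-order, two-state Markov chain for daily precipitation
    occurrence (0 = dry, 1 = wet), as used in parametric weather
    generators. p01 = P(wet | dry yesterday), p11 = P(wet | wet).
    """
    if rng is None:
        rng = np.random.default_rng()
    state = 0  # start from a dry day
    out = np.empty(n_days, dtype=int)
    for t in range(n_days):
        p_wet = p11 if state == 1 else p01
        state = int(rng.random() < p_wet)
        out[t] = state
    return out

# usage: long-run wet-day frequency approaches p01 / (1 + p01 - p11)
occ = simulate_occurrence(p01=0.25, p11=0.6, n_days=10_000,
                          rng=np.random.default_rng(1))
print(occ.mean())  # ~ 0.25 / 0.65 ~ 0.385
```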

    Generalised Kernel Representations with Applications to Data Efficient Machine Learning

    The universe of mathematical modelling from observational data is a vast space. It consists of a cacophony of differing paths, with doors to worlds of seemingly diametrically opposed perspectives, all attempting to conjure a crystal ball of both intuitive understanding and predictive capability. Among these many worlds is an approach broadly called kernel methods, which, while complex in detail, when viewed from afar ultimately reduces to a rather simple question: how close is something to something else? What does it mean to be close? Specifically, how can we quantify closeness in some reasonable and principled way? This thesis presents four approaches that address generalised kernel learning. Firstly, we introduce a probabilistic framework that allows joint learning of model and kernel parameters in order to capture nonstationary spatial phenomena. Secondly, we introduce a theoretical framework based on optimal transport that enables online kernel parameter transfer. Such parameter transfer involves the ability to re-use previously learned parameters, without re-optimisation, on newly observed data; this extends the first contribution, which was unable to operate in real time due to the need to re-optimise parameters for new observations. Thirdly, we introduce learnable Fourier-based kernel embeddings that exploit generalised quantile representations of stationary kernels. Finally, a method for input-warped Fourier kernel embeddings is proposed that allows nonstationary data embeddings using simple stationary kernels. By introducing theoretically cohesive and algorithmically intuitive methods, this thesis opens new doors to removing traditional assumptions that have hindered adoption of the kernel perspective. We hope that the ideas presented will offer a curious and inspiring view of the potential of learnable kernel embeddings.
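    The Fourier view of stationary kernels underlying the third and fourth contributions can be illustrated with standard random Fourier features: by Bochner's theorem, a stationary kernel is the Fourier transform of a spectral density, so sampling frequencies from that density yields a random feature map whose inner products approximate the kernel. The sketch below does this for the RBF kernel; the thesis's learnable quantile parameterisation and input warping are not reproduced here.

```python
import numpy as np

def rff_features(X, n_features=256, lengthscale=1.0, rng=None):
    """Random Fourier features approximating the RBF kernel
    k(x, x') = exp(-||x - x'||^2 / (2 * lengthscale^2)).
    The RBF spectral density is Gaussian, so frequencies are
    sampled as N(0, 1/lengthscale^2).
    """
    if rng is None:
        rng = np.random.default_rng()
    d = X.shape[1]
    W = rng.normal(0.0, 1.0 / lengthscale, size=(d, n_features))
    b = rng.uniform(0.0, 2 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# usage: feature inner products approximate the exact kernel matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Phi = rff_features(X, n_features=4096, rng=rng)
K_approx = Phi @ Phi.T
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_exact = np.exp(-sq / 2.0)
print(np.abs(K_approx - K_exact).max())  # small; shrinks with n_features
```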

    Hydrometeorological Extremes and Its Local Impacts on Human-Environmental Systems

    This Special Issue of Atmosphere focuses on hydrometeorological extremes and their local impacts on human–environment systems. In particular, we accepted submissions on observational and model-based studies that could provide useful information for infrastructure design, decision making, and policy making toward the goal of enhancing the resilience of human–environment systems to climate change and increased variability.

    Replication or exploration? Sequential design for stochastic simulation experiments

    We investigate the merits of replication, and provide methods for optimal design (including replicates), with the goal of obtaining globally accurate emulation of noisy computer simulation experiments. We first show that replication can be beneficial from both design and computational perspectives, in the context of Gaussian process surrogate modeling. We then develop a lookahead-based sequential design scheme that can determine whether a new run should be at an existing input location (i.e., replicate) or at a new one (explore). When paired with a newly developed heteroskedastic Gaussian process model, our dynamic design scheme facilitates learning of signal and noise relationships which can vary throughout the input space. We show that it does so efficiently, on both computational and statistical grounds. In addition to illustrative synthetic examples, we demonstrate performance on two challenging real-data simulation experiments, from inventory management and epidemiology.
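    The replicate-or-explore trade-off can be sketched with a one-step integrated-variance criterion under a homoskedastic GP: every candidate, whether a copy of an existing site or a fresh location, is scored by the average predictive variance that would remain after adding it. This is a simplified illustration only; the paper's scheme uses lookahead and a heteroskedastic GP that learns input-dependent noise, which is what makes replication attractive in noisy regions.

```python
import numpy as np

def k(a, b, ls=0.3):
    """Squared-exponential kernel in 1-D (illustrative hyperparameters)."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ls ** 2))

def imse(Xn, X_ref, noise=0.1):
    """Average GP predictive variance over X_ref for design Xn.
    The variance does not depend on the responses, so candidate
    designs can be scored before any new simulation is run."""
    Kn = k(Xn, Xn) + noise * np.eye(len(Xn))
    Ks = k(X_ref, Xn)
    sol = np.linalg.solve(Kn, Ks.T)
    return (1.0 - np.einsum('ij,ji->i', Ks, sol)).mean()

# one replicate-vs-explore decision: score appending each candidate
X = np.array([0.1, 0.5, 0.9])        # existing design sites
X_ref = np.linspace(0.0, 1.0, 101)   # reference grid for the criterion
cands = np.concatenate([X, X_ref])   # replicates first, then new sites
scores = [imse(np.append(X, c), X_ref) for c in cands]
best = cands[int(np.argmin(scores))]
print('replicate' if best in X else 'explore', round(float(best), 3))
```

    With these settings exploration typically wins, since the design gaps dominate the noise; shrinking the lengthscale or raising the noise shifts the balance toward replication.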