ExplainIt! -- A declarative root-cause analysis engine for time series data (extended version)
We present ExplainIt!, a declarative, unsupervised root-cause analysis engine
that uses time series monitoring data from large complex systems such as data
centres. ExplainIt! empowers operators to succinctly specify a large number of
causal hypotheses to search for causes of interesting events. ExplainIt! then
ranks these hypotheses, reducing the number of causal dependencies from
hundreds of thousands to a handful for human understanding. We show how a
declarative language such as SQL can be effective in enumerating
hypotheses that probe the structure of an unknown probabilistic
graphical causal model of the underlying system. Our thesis is that databases
are in a unique position to enable users to rapidly explore the possible causal
mechanisms in data collected from diverse sources. We empirically demonstrate
how ExplainIt! has helped us resolve over 30 performance issues in a commercial
product since late 2014, of which we discuss a few cases in detail.
Comment: SIGMOD Industry Track 201
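The enumerate-then-rank workflow described in the abstract can be caricatured in a few lines. The following is a hypothetical sketch with invented metric names and a plain correlation score, not ExplainIt!'s actual SQL-driven causal ranking:

```python
# Hypothetical sketch (not ExplainIt!'s algorithm): score every monitored
# metric as a candidate cause of an effect metric and return a ranked
# shortlist, mimicking the reduction from many hypotheses to a handful.
import math

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def rank_hypotheses(metrics, effect):
    """Rank every other metric as a candidate cause of `effect`."""
    target = metrics[effect]
    scores = {name: abs(pearson(series, target))
              for name, series in metrics.items() if name != effect}
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Invented monitoring data: cpu_load tracks query_time most closely.
metrics = {
    "disk_latency": [1, 2, 2, 3, 2, 3],
    "cpu_load":     [2, 4, 6, 8, 10, 12],
    "fan_speed":    [5, 5, 4, 5, 5, 4],
    "query_time":   [1, 2, 3, 4, 5, 6],
}
ranked = rank_hypotheses(metrics, "query_time")
print(ranked[0][0])  # top-ranked candidate cause: cpu_load
```

In ExplainIt! itself the hypotheses are specified declaratively in SQL against the monitoring database and ranked against a probabilistic graphical causal model; the sketch only mirrors the many-candidates-to-short-list shape of the workflow.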
Analyzing Granger causality in climate data with time series classification methods
Attribution studies in climate science aim to scientifically ascertain the influence of natural or anthropogenic factors on climatic variations. Many of those studies adopt the concept of Granger causality to infer statistical cause-effect relationships while utilizing traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite that comprises a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested.
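As a point of reference for the "traditional autoregressive models" mentioned above, a bivariate Granger test with one lag can be written out directly: fit restricted and unrestricted lag regressions by least squares and compare residual sums of squares with an F-statistic. The data below are synthetic and the formulation is the generic textbook one, not the study's setup:

```python
# Generic one-lag bivariate Granger-causality F-test via ordinary least
# squares (synthetic data; illustrative baseline, not the paper's code).
import numpy as np

def granger_f(y, x):
    """F-statistic: does x's lag help predict y beyond y's own lag?"""
    y, x = np.asarray(y, float), np.asarray(x, float)
    yt, ylag, xlag = y[1:], y[:-1], x[:-1]
    ones = np.ones_like(yt)
    X_r = np.column_stack([ones, ylag])         # restricted: y's lag only
    X_u = np.column_stack([ones, ylag, xlag])   # unrestricted: adds x's lag

    def rss(X):
        beta, *_ = np.linalg.lstsq(X, yt, rcond=None)
        resid = yt - X @ beta
        return float(resid @ resid)

    rss_r, rss_u = rss(X_r), rss(X_u)
    n, q, k = len(yt), 1, X_u.shape[1]          # q restrictions, k params
    return (rss_r - rss_u) / q / (rss_u / (n - k))

# Synthetic example: x drives y with a one-step delay.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

print(granger_f(y, x) > granger_f(x, y))  # x Granger-causes y, not vice versa
```

The classification-based methods examined in the article replace this linear lag regression with learned discriminative models, but the inferential question, whether one series's past improves prediction of another, stays the same.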
Upgrade of the HadGEM3-A based attribution system to high resolution and a new validation framework for probabilistic event attribution
We present a substantial upgrade of the Met Office system for the probabilistic attribution of extreme weather and climate events, with higher horizontal and vertical resolution (60 km in the mid-latitudes and 85 vertical levels), the latest Hadley Centre atmosphere and land model (ENDGame dynamics with GA6.0 science and JULES at GL6.0), as well as an updated forcings set. A new set of experiments designed for the evaluation and implementation of an operational attribution service is described, consisting of pairs of multi-decadal stochastic physics ensembles continued on a season-by-season basis by large ensembles able to sample the extreme atmospheric states possible in the recent past. Diagnostics from these experiments form the HadGEM3-A contribution to the international Climate of the 20th Century Plus (C20C+) project and were analysed under the European Climate and Weather Events: Interpretation and Attribution (EUCLEIA) event attribution project, as well as contributing to the Climate Science for Service Partnership (CSSP)-China programme. After discussing the framing issues surrounding the questions that can be asked with our system, we construct a novel approach to the evaluation of atmosphere-only ensembles intended for event attribution, in the process highlighting and clarifying the distinction between hindcast skill and model performance. A framework based on assessing model representation of predictable components and ensuring exchangeability of model and real-world statistics leads to a form of detection and attribution to boundary-condition forcing, as a means of quantifying one degree of freedom of potential model error and allowing for the bias correction of event probabilities and the resulting probability ratios.
This method is then applied systematically across the globe to assess the contributions of anthropogenic influence and specific boundary conditions to the changing probability of the observed and record seasonal mean temperatures of four recent 3-month seasons, from March 2016 to February 2017.
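The probability ratios referred to above compare the chance of exceeding an observed event threshold in a factual (all-forcings) ensemble against a counterfactual (natural-forcings-only) ensemble. A toy calculation with invented ensemble values, not HadGEM3-A output:

```python
# Toy event-attribution calculation (invented numbers, not model output):
# probability ratio PR = P1/P0, the exceedance probability of an observed
# threshold in a factual ensemble versus a natural-forcings-only one.
def exceedance_prob(ensemble, threshold):
    hits = sum(1 for v in ensemble if v > threshold)
    return hits / len(ensemble)

factual = [14.2, 15.1, 15.8, 16.0, 14.9, 15.5, 16.3, 15.2]  # seasonal means, degC
natural = [13.8, 14.1, 14.6, 14.3, 15.1, 14.0, 14.4, 14.7]
threshold = 15.0                                            # observed event

p1 = exceedance_prob(factual, threshold)   # 6 of 8 members exceed -> 0.75
p0 = exceedance_prob(natural, threshold)   # 1 of 8 members exceeds -> 0.125
pr = p1 / p0                               # probability ratio
far = 1.0 - 1.0 / pr                       # fraction of attributable risk
print(pr, far)
```

In practice the counterfactual count is often zero in small ensembles, making the raw ratio undefined, which is one reason the large season-by-season ensembles and the bias correction of event probabilities described in the abstract matter.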
End-to-end anomaly detection in stream data
Nowadays, huge volumes of data are generated with increasing velocity through various systems, applications, and activities. This increases the demand for stream and time series analysis to react to changing conditions in real time, for enhanced efficiency and quality of service delivery as well as upgraded safety and security in the private and public sectors. Despite its very rich history, time series anomaly detection is still one of the vital topics in machine learning research and is receiving increasing attention. Identifying hidden patterns and selecting an appropriate model that fits the observed data well and also carries over to unobserved data is not a trivial task. Due to the increasing diversity of data sources and associated stochastic processes, this pivotal data analysis topic is loaded with various challenges, like complex latent patterns, concept drift, and overfitting, that may mislead the model and cause a high false alarm rate. Handling these challenges has led advanced anomaly detection methods to develop sophisticated decision logic, which turns them into mysterious and inexplicable black boxes. Contrary to this trend, end-users expect transparency and verifiability to trust a model and the outcomes it produces. Also, pointing users to the most anomalous/malicious areas of a time series and the causal features could save them time, energy, and money. For these reasons, this thesis addresses the crucial challenges in an end-to-end pipeline of stream-based anomaly detection through three essential phases: behavior prediction, inference, and interpretation. The first step is focused on devising a time series model that yields high average accuracy as well as small error deviation. On this basis, we propose higher-quality anomaly detection and scoring techniques that utilize related context to reclassify observations and post-prune unjustified events.
Last but not least, we make the predictive process transparent and verifiable by providing meaningful reasoning behind its generated results, based on concepts understandable to a human. The provided insight can pinpoint the anomalous regions of a time series and explain why the current status of a system has been flagged as anomalous. Stream-based anomaly detection research is a principal area of innovation to support our economy, security, and even the safety and health of societies worldwide. We believe our proposed analysis techniques can contribute to building a situational awareness platform and open new perspectives in a variety of domains, such as cybersecurity and health.
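The prediction-and-scoring stage of such a pipeline can be illustrated with a deliberately small detector: a rolling z-score over a sliding window. The window length, threshold, and data below are all invented for illustration; the thesis's context-based reclassification, post-pruning, and interpretation layers are omitted:

```python
# Toy streaming detector (illustration only, not the thesis's method):
# score each new point by its z-score against a sliding window of the
# recent past and flag points whose score exceeds a threshold.
from collections import deque
import math

def detect(stream, window=10, z_thresh=3.0):
    """Return [(index, z_score)] for points flagged as anomalous."""
    buf = deque(maxlen=window)        # sliding window of recent values
    flags = []
    for i, v in enumerate(stream):
        if len(buf) == window:        # score only once the window is full
            mean = sum(buf) / window
            var = sum((b - mean) ** 2 for b in buf) / window
            std = math.sqrt(var)
            z = abs(v - mean) / std if std > 0 else 0.0
            if z > z_thresh:
                flags.append((i, z))
        buf.append(v)                 # deque drops the oldest value itself
    return flags

# Steady signal around 1.0 with one injected spike at index 10.
stream = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 0.9, 1.0, 9.0, 1.0, 1.1]
flags = detect(stream)
print(flags)  # the spike at index 10 is the only flagged point
```

A real stream detector must additionally cope with the concept drift and false-alarm issues the abstract names; here the spike also inflates the window statistics for the next few points, which is exactly the kind of effect the thesis's reclassification and pruning stages are meant to clean up.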
Electric Vehicle - Smart Grid Integration: Load Modeling, Scheduling, and Cyber Security
The modern world has witnessed the surge of electric vehicles (EVs), driven by government policy worldwide to reduce transportation's dependence on fossil fuels. According to (Slowik, 2019), the global EV market has grown sharply, with annual light-duty EV sales surpassing 2 million in 2018, about a 70% increase from 2017. The growing EV population implies rising energy demand, which introduces new challenges to the electricity sector. In the foreseen high-penetration scenarios, EV charging demand may lead to stability and quality issues in power grids. Generation capacity and electricity infrastructure upgrades may be required to address those issues; however, they would increase generation costs significantly. The most common EV chargers installed today deliver around 7 kW of power, over four times the average household power consumption in the US. EV charging load often shows two peaks in a day: one in the morning when people plug in the EV at the workplace, and the other in the evening when people get home from work. Without proper energy management for EV charging, the vast power demand of a large number of plugged-in EVs can stress the electric grid, degrade electric power quality, and impact the wholesale electricity market. Although an EV battery may store up to 80 kWh of energy, which takes more than 10 hours to charge at 7 kW from empty, we found that most EVs need only 12 kWh per charge, or 1.7 hours at 7 kW, to meet daily commute requirements, while they stay in the parking garage for a much longer period. This implies that EVs have considerable time-flexibility for charging, and it is not necessary to start charging right after plugging in, which is likely to result in charging power adding up. A proper EV charging schedule can allocate the charging load to prevent power peaks.
Therefore, EV charging scheduling can play a significant role in mitigating the adverse effects of vast EV charging demand without upgrading the power grid capacity.

To optimize the EV charging schedule while satisfying EVs' charging demand, each EV's stay duration and energy need are essential parameters for the optimization. Those parameters are predicted so as to minimize human intervention; nonetheless, the uncertainty of EV user behavior poses a challenge to prediction accuracy. Therefore, this dissertation demonstrates an ensemble machine learning-based method to model and predict EV loads accurately, thereby improving the performance of EV charging scheduling.

On the other hand, this smart EV-grid integration, which requires massive communication, including collecting, transmitting, and distributing real-time data within the network, makes it more susceptible to cyber-physical threats. Potential breaches could not only affect grid operation but also reduce consumers' willingness to adopt EVs over conventional fuel-powered vehicles. This dissertation also presents a vulnerability analysis and risk assessment for a smart EV charging system to develop countermeasures that secure the network. Also, since security flaws are inevitable, this dissertation provides a novel anomaly detection approach based on the invariant correlations among different measurements within the EV charging network.
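The time-flexibility argument above lends itself to a small illustration: a greedy "valley-filling" scheduler that assigns each EV's required charging hours to the least-loaded slots within its stay window. The fleet, stay windows, and greedy rule are invented for illustration and are not the dissertation's scheduler; only the 7 kW charger rating echoes the text:

```python
# Toy valley-filling scheduler (illustration only): greedily place each
# EV's required charging hours into the currently least-loaded hours of
# its stay window, instead of charging immediately on plug-in.
def schedule(evs, horizon=24, charger_kw=7.0):
    load = [0.0] * horizon                     # aggregate load per hour
    plan = {}
    for name, arrive, depart, energy_kwh in evs:
        hours_needed = round(energy_kwh / charger_kw)
        # pick the least-loaded hours within [arrive, depart)
        slots = sorted(range(arrive, depart), key=lambda h: load[h])[:hours_needed]
        for h in slots:
            load[h] += charger_kw
        plan[name] = sorted(slots)
    return plan, load

# Invented fleet: each EV needs ~2 h of charging at 7 kW (14 kWh).
evs = [
    ("ev1", 9, 17, 14.0),    # parked at work 9:00-17:00
    ("ev2", 9, 17, 14.0),
    ("ev3", 10, 16, 14.0),
]
plan, load = schedule(evs)
print(plan)
print(max(load))  # peak aggregate load in kW
```

With naive charge-on-arrival, all three EVs would draw power simultaneously around 10:00, for a 21 kW peak; the greedy assignment here spreads them out so the aggregate never exceeds one charger's 7 kW.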
Deep Learning for Decision Making and Autonomous Complex Systems
Deep learning consists of various machine learning algorithms that aim to learn multiple levels of abstraction from data in a hierarchical manner. It is a tool for constructing models from data that mimic a real-world process, without exceedingly tedious modelling of the actual process itself. We show that deep learning is a viable solution for decision making in mechanical engineering problems and complex physical systems.
In this work, we demonstrate the application of this data-driven method to the design of microfluidic devices, serving as a map between the user-defined cross-sectional shape of the flow and the corresponding arrangement of micropillars in the flow channel that produces the flow deformation. We also present how deep learning can be used for the early detection of combustion instability for prognostics and health monitoring of a combustion engine, so that appropriate measures can be taken to prevent the detrimental effects of unstable combustion.
One of the applications in complex systems concerns robotic path planning via the systematic learning of policies and associated rewards. In this context, a deep architecture is implemented to infer the expected value of information gained by performing an action, based on the states of the environment. We also apply deep learning-based methods to enhance natural low-light images in the context of a surveillance framework and autonomous robots. Further, we examine how machine learning methods can be used to perform root-cause analysis in cyber-physical systems subjected to a wide variety of operational anomalies. In all studies, the proposed frameworks demonstrate promising feasibility and provide credible results for large-scale implementation in industry.