7,915 research outputs found

    Improving SIEM for critical SCADA water infrastructures using machine learning

    Network Control Systems (NAC) have been used in many industrial processes. They aim to reduce the human factor burden and efficiently handle the complex process and communication of those systems. Supervisory control and data acquisition (SCADA) systems are used in industrial, infrastructure and facility processes (e.g. manufacturing, fabrication, oil and water pipelines, building ventilation). Like other Internet of Things (IoT) implementations, SCADA systems are vulnerable to cyber-attacks, so robust anomaly detection is a major requirement. However, building an accurate anomaly detection system is not easy, because it is difficult to differentiate between cyber-attacks and internal system failures (e.g. hardware failures). In this paper, we present a model that detects anomalous events in a water system controlled by SCADA. Six machine learning techniques were used to build and evaluate the model. The model classifies different anomalous events, including hardware failures (e.g. sensor failures), sabotage and cyber-attacks (e.g. DoS and spoofing). Unlike other detection systems, our proposed work helps accelerate the mitigation process by giving the operator additional information when an anomaly occurs: the probability and confidence level of the event(s) occurring. The model is trained and tested using a real-world dataset.

    Data-driven cyber attack detection and mitigation for decentralized wide-area protection and control in smart grids

    Modern power systems have already evolved into complicated cyber-physical systems (CPS), often referred to as smart grids, due to the continuous expansion of the electrical infrastructure, the growing number of heterogeneous system components and players, and the consequent application of diverse information and telecommunication technologies to facilitate the Wide Area Monitoring, Protection and Control (WAMPAC) of day-to-day power system operation. Because of its reliance on cyber technologies, WAMPAC, among other critical functions, is prone to various malicious cyber attacks. Successful cyber attacks, especially those that sabotage the operation of the Bulk Electric System (BES), can cause great financial losses and social panic. Conventional IT security solutions are indispensable, but they often turn out to be insufficient to mitigate sophisticated attacks that exploit zero-day vulnerabilities or social engineering tactics. To further improve the resilience of smart grid operation when facing cyber attacks, it is desirable to make the WAMPAC functions themselves capable of detecting various anomalies automatically, carrying out adaptive activity adjustments in time and thus staying unimpaired even under attack. Most existing research efforts attempt to achieve this by adding novel functional modules, such as model-based anomaly detectors, to the legacy centralized WAMPAC functions. In contrast, this dissertation investigates the application of data-driven algorithms to cyber attack detection and mitigation within a decentralized architecture, aiming to improve the situational awareness and self-adaptiveness of WAMPAC. The first part of the research focuses on the decentralization of the System Integrity Protection Scheme (SIPS) with a Multi-Agent System (MAS), within which data-driven anomaly detection and optimal adaptive load shedding are further explored. 
An algorithm named Support Vector Machine embedded Layered Decision Tree (SVMLDT) is proposed for the anomaly detection, which provides satisfactory detection accuracy as well as decision-making interpretability. The adaptive load shedding is carried out by every agent individually with dynamic programming. The load shedding relies on load profile propagation among peer agents, and attack adaptiveness is accomplished by maintaining the historical mean of the load shedding proportion. Load shedding only takes place after consensus on the anomaly detection is achieved among all interconnected agents, and it serves the purpose of mitigating certain cyber attacks. The attack resilience of the decentralized SIPS is evaluated using the IEEE 39-bus model. It is shown that, unlike the traditional centralized SIPS, the proposed solution is able to carry out the remedial actions under most Denial of Service (DoS) attacks. The second part investigates clustering-based anomalous behavior detection and peer-assisted mitigation for power system generation control. To reduce the dimensionality of the data, three metrics are designed to interpret the behavior conformity of generators within the same balancing area. Semi-supervised K-means clustering and a density-sensitive clustering algorithm based on Hierarchical DBSCAN (HDBSCAN) are both applied for clustering in the 3D feature space. To mitigate cyber attacks targeting the generation control commands, a peer-assisted strategy is proposed. When the control commands from the control center are detected as anomalous, i.e. either missing or carrying a manipulated payload, the generating unit uses peer data to infer and estimate a new generation adjustment value as a replacement. 
Linear regression is used to obtain the relation between the control values received by different generating units, Moving Target Defense (MTD) is adopted during peer selection, and 1-dimensional clustering is performed on the inferred control values, followed by the final control value estimation. The proposed mitigation strategy requires that generating units can communicate with each other in a peer-to-peer manner. Evaluation results suggest the efficacy of the proposed solution in counteracting data availability and data integrity attacks targeting the generation controls. However, the strategy stays effective only if fewer than half of the generating units are compromised, and it is not able to mitigate cyber attacks targeting the measurements involved in the generation control.
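The peer-assisted estimation described in this abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names, the toy data, and the random peer sampling (standing in for the MTD peer selection) are all assumptions, and a plain median replaces the 1-dimensional clustering step.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def estimate_control(own_history, peer_histories, peer_current, k, rng):
    """Estimate this unit's missing control value from k peers.

    Hypothetical helper: random sampling stands in for MTD peer selection,
    and the median stands in for 1-D clustering of the inferred values.
    """
    chosen = rng.sample(sorted(peer_histories), k)
    inferred = []
    for peer in chosen:
        # Learn how this unit's past commands relate to the peer's.
        slope, intercept = fit_line(peer_histories[peer], own_history)
        inferred.append(slope * peer_current[peer] + intercept)
    return statistics.median(inferred)


# Toy data: three peers whose commands are linearly related to ours.
own = [1.0, 2.0, 3.0, 4.0]
peers = {
    "a": [2.0, 4.0, 6.0, 8.0],
    "b": [11.0, 12.0, 13.0, 14.0],
    "c": [3.0, 5.0, 7.0, 9.0],
}
# Peers a and b report honest values; peer c is compromised.
current = {"a": 10.0, "b": 15.0, "c": 100.0}
estimate = estimate_control(own, peers, current, k=3, rng=random.Random(0))
```

With one of three peers compromised, the median still follows the two honest peers, which matches the abstract's caveat that the strategy only holds while fewer than half of the units are compromised.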

    Real-Time Machine Learning Models To Detect Cyber And Physical Anomalies In Power Systems

    A Smart Grid is a cyber-physical system (CPS) that tightly integrates computation and networking with physical processes to provide reliable two-way communication between electricity companies and customers. However, grid availability and integrity are constantly threatened by both physical faults and cyber-attacks, which may have a detrimental socio-economic impact. The frequency of faults and attacks is increasing every year due to extreme weather events and strong reliance on the open internet architecture, which is vulnerable to cyber-attacks. In May 2021, for instance, Colonial Pipeline, one of the largest pipeline operators in the U.S., which transports refined gasoline and jet fuel from Texas up the East Coast to New York, was forced to shut down after being attacked by ransomware, causing prices to rise at gasoline pumps across the country. Enhancing situational awareness within the grid can alleviate these risks and avoid their adverse consequences. As part of this process, phasor measurement units (PMUs) are among the suitable assets, since they collect time-synchronized measurements of grid status (30-120 samples/s), enabling operators to react rapidly to potential anomalies. However, it is still challenging to process and analyze the open-ended source of PMU data, as there are more than 2500 PMUs distributed across the U.S. and Canada, each of which generates more than 1.5 TB/month of streamed data. Further, offline machine learning algorithms cannot be used in this scenario, as they require loading and scanning the entire dataset before processing. The ultimate objective of this dissertation is to develop early detection of cyber and physical anomalies in a real-time streaming environment by mining multivariate, large-scale synchrophasor data. To accomplish this objective, we start by investigating the cyber and physical anomalies, analyzing their impact, and critically reviewing the current detection approaches. 
Then, multiple machine learning models were designed to identify physical and cyber anomalies. The first is an artificial neural network-based approach for detecting the False Data Injection (FDI) attack, which was specifically selected because it poses a serious risk to the integrity and availability of the grid. Secondly, we extend this approach by developing a Random Forest Regressor-based model which not only detects anomalies, but also identifies their location and duration. Lastly, we develop a real-time Hoeffding tree-based model for detecting anomalies in streaming networks and explicitly handling concept drifts. These models have been tested, and the experimental results confirmed their superiority over the state-of-the-art models in terms of detection accuracy, false-positive rate, and processing time, making them potential candidates for strengthening the grid's security.
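The abstract does not describe the model's internals, but Hoeffding trees in general decide when to split a node using the Hoeffding bound, which lets them learn from a stream without rescanning past data. The sketch below shows only that standard bound; the function names and example numbers are illustrative assumptions, not taken from the dissertation.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    After n observations of a variable with range R, the true mean lies
    within epsilon of the observed mean with probability at least 1 - delta.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split once the gap between the two best candidate attributes
    exceeds the bound, i.e. the ranking is unlikely to change."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Illustrative numbers: the same gain gap becomes decisive as n grows.
early = should_split(0.30, 0.15, value_range=1.0, delta=1e-7, n=100)
later = should_split(0.30, 0.15, value_range=1.0, delta=1e-7, n=1000)
```

Because the bound shrinks as roughly one over the square root of n, a node that cannot justify a split after 100 samples may justify it after 1000, which is what makes the approach suitable for streamed PMU data.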

    Market Manipulation in Stock and Power Markets: A Study of Indicator-Based Monitoring and Regulatory Challenges

    In recent years, algorithm-based market manipulation in stock and power markets has increased considerably, and it is difficult to identify all such manipulation cases. This poses serious challenges for market regulators. This work highlights and lists various aspects of the monitoring of stock and power markets, using as test cases the regulatory agencies and regulatory policies in diverse regions, including Hong Kong, the United Kingdom, the United States and the European Union. Reported cases of market manipulation in these regions are examined. To help establish a relevant digital regulatory system, this work reviews and categorizes the indicators used to monitor the stock and power markets, and provides an in-depth analysis of the relationship between the indicators and market manipulation. Specifically, this study compiles a set of 10 indicators for detecting manipulation in the stock market, from the perspectives of return rate, liquidity, volatility, market sentiment, closing price and firm governance. Additionally, 15 indicators are identified for detecting manipulation in the power market, from the perspectives of market power (also known as pricing power or market structure), market conduct and market performance. Finally, the study elaborates on the current challenges in the regulation of stock and power markets in terms of parameter performance, data availability and technical requirements.

    Understanding Electricity-Theft Behavior via Multi-Source Data

    Electricity theft, in which users tamper illegally with electrical meters to avoid paying their electricity bills, is a common phenomenon in developing countries. Considering its harmfulness to both power grids and the public, several mechanized methods have been developed to automatically recognize electricity-theft behaviors. However, these methods, which mainly assess users' electricity usage records, can be insufficient due to the diversity of theft tactics and the irregularity of user behaviors. In this paper, we propose to recognize electricity-theft behavior via multi-source data. In addition to users' electricity usage records, we analyze user behaviors by means of regional factors (non-technical loss) and climatic factors (temperature) in the corresponding transformer area. By conducting analytical experiments, we unearth several interesting patterns: for instance, electricity thieves are likely to consume much more electrical power than normal users, especially under extremely high or low temperatures. Motivated by these empirical observations, we further design a novel hierarchical framework for identifying electricity thieves. Experimental results based on a real-world dataset demonstrate that our proposed model can achieve the best performance in electricity-theft detection (e.g., at least +3.0% in terms of F0.5) compared with several baselines. Last but not least, our work has been applied by the State Grid of China and used to successfully catch electricity thieves in Hangzhou with a precision of 15% (an improvement from 0% attained by several other models the company employed) during monthly on-site investigation.
    Comment: 11 pages, 8 figures, WWW'20 full paper
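The F0.5 score quoted in this abstract is the standard F-beta measure with beta = 0.5, which weights precision more heavily than recall. That is a natural fit when every flagged user triggers a costly on-site inspection. A minimal sketch of the general formula follows; the function name and the example numbers are illustrative, not results from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-beta score: (1 + b^2) * P * R / (b^2 * P + R).

    beta < 1 favours precision, beta > 1 favours recall, and beta = 1
    gives the usual F1 score.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative comparison: a precise but low-recall detector scores
# higher under F0.5 than under F1.
f_half = f_beta(0.8, 0.4, beta=0.5)
f_one = f_beta(0.8, 0.4, beta=1.0)
```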

    Human-aware application of data science techniques

    In recent years there has been an increase in the use of artificial intelligence and other data-based techniques to automate decision-making in companies and to discover new knowledge in research. In many cases, this has been done using very complex algorithms (so-called black-box algorithms), which are capable of detecting very complex patterns but unfortunately remain nearly uninterpretable. Recently, many researchers and regulatory institutions have begun to raise awareness of their use. On the one hand, the subjects who depend on these decisions are increasingly questioning their use, as they may be victims of biases or erroneous predictions. On the other hand, companies and institutions that use these algorithms want to understand what their algorithm does, extract new knowledge, prevent errors, and improve their predictions in general. As a result, researchers have started to focus on the interpretability of their algorithms (for example, through explainable algorithms), and regulatory institutions have started to regulate the use of data to ensure ethical aspects such as accountability or fairness. This thesis brings together three data science projects in which black-box predictive machine learning has been implemented to make predictions:
    - The development of a non-technical loss (NTL) detection system for an international utility company from Spain (Naturgy). We combine a black-box algorithm and an explanatory algorithm to guarantee our system's accuracy, transparency, and robustness. Moreover, we focus our efforts on empowering the stakeholder to play an active role in the model training process.
    - A collaboration with the University of Padova to provide explainability to a Deep Learning-based KPI system currently implemented by the MyInvenio company.
    - A collaboration between the author of the thesis and the Universitat de Barcelona to implement an AI solution (a black-box algorithm combined with an explanatory algorithm) to a social science problem.
    The unique characteristics of each project allow us to offer in this thesis a comprehensive analysis of the challenges and problems that exist in achieving a fair, transparent, unbiased and generalizable use of data in a data science project. With the feedback arising from the research carried out to provide satisfactory solutions to these three projects, we aim to:
    - Understand the reasons why a prediction model can be regarded as unfair or untruthful, making the model not generalizable, and the consequences from a technical point of view in terms of low accuracy of the model, but also how this can affect us as a society.
    - Determine and correct (or at least mitigate) the situations that cause problems in terms of the robustness and fairness of our data.
    - Assess the difference between interpretable algorithms and black-box algorithms, and evaluate how well the explanatory algorithms can explain the predictions made by the predictive algorithms.
    - Highlight the stakeholder's role in guaranteeing a robust model, and how to convert a data-driven approach to solving a predictive problem into a data-informed approach, where the data patterns and human knowledge are combined to maximize profit.

    Game-Theoretic and Machine-Learning Techniques for Cyber-Physical Security and Resilience in Smart Grid

    The smart grid is the next-generation electrical infrastructure utilizing Information and Communication Technologies (ICTs), whose architecture is evolving from a utility-centric structure to a distributed Cyber-Physical System (CPS) integrated with large-scale renewable energy resources. However, meeting reliability objectives in the smart grid becomes increasingly challenging owing to the high penetration of renewable resources and changing weather conditions. Moreover, cyber-physical attacks targeting the smart grid have become a major threat, because millions of electronic devices interconnected via communication networks expose unprecedented vulnerabilities, thereby increasing the potential attack surface. This dissertation aims to develop novel game-theoretic and machine-learning techniques for addressing the reliability and security issues residing at multiple layers of the smart grid, including power distribution system reliability forecasting, risk assessment of cyber-physical attacks targeting the grid, and cyber attack detection in the Advanced Metering Infrastructure (AMI) and renewable resources. This dissertation first comprehensively investigates the combined effect of various weather parameters on the reliability performance of the smart grid, and proposes a multilayer perceptron (MLP)-based framework to forecast the daily number of power interruptions in the distribution system using time series of common weather data. To evaluate the risk of cyber-physical attacks faced by the smart grid, a stochastic budget allocation game is proposed to analyze the strategic interactions between a malicious attacker and the grid defender. A reinforcement learning algorithm is developed to enable the two players to reach a game equilibrium, where the optimal budget allocation strategies of the two players, in terms of attacking/protecting the critical elements of the grid, can be obtained. 
In addition, the risk of the cyber-physical attack can be derived from the probability of a successful attack on various grid elements. Furthermore, this dissertation develops a multimodal data-driven framework for cyber attack detection in the power distribution system integrated with renewable resources. This approach introduces sparse feature learning into an ensemble classifier to improve detection efficiency, and implements spatiotemporal correlation analysis to differentiate attacked renewable energy measurements from fault scenarios. Numerical results based on the IEEE 34-bus system show that the proposed framework achieves the most accurate detection of cyber attacks reported in the literature. To address electricity theft in the AMI, a Distributed Intelligent Framework for Electricity Theft Detection (DIFETD) is proposed, which is equipped with Benford's analysis for initial diagnostics on large smart meter data. A Stackelberg game between the utility and multiple electricity thieves is then formulated to model the electricity theft actions. Finally, a Likelihood Ratio Test (LRT) is utilized to detect potentially fraudulent meters.
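The abstract does not say how DIFETD applies Benford's analysis, so the following is only a sketch of the underlying idea: Benford's law predicts that the leading digit d of many naturally occurring quantities appears with probability log10(1 + 1/d), and a chi-square-style statistic can flag data whose leading digits deviate strongly. All function names and the toy data are assumptions.

```python
import math
from collections import Counter


def benford_expected() -> dict:
    """Expected leading-digit probabilities under Benford's law."""
    return {d: math.log10(1.0 + 1.0 / d) for d in range(1, 10)}


def first_digit(x: float) -> int:
    """Leading non-zero decimal digit of a positive number."""
    x = abs(x)
    while x >= 10.0:
        x /= 10.0
    while x < 1.0:
        x *= 10.0
    return int(x)


def benford_deviation(readings) -> float:
    """Chi-square-style distance between observed leading digits and
    Benford's law; larger values mean a poorer fit."""
    values = [r for r in readings if r > 0]
    counts = Counter(first_digit(v) for v in values)
    n = len(values)
    return sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )


# Toy data: powers of two follow Benford's law closely, while readings
# that all start with the digit 5 do not.
natural = [2 ** i for i in range(100)]
suspicious = [500 + i for i in range(100)]
dev_natural = benford_deviation(natural)
dev_suspicious = benford_deviation(suspicious)
```

A screen like this is cheap to run over large smart meter datasets, which is consistent with its use in the abstract as an initial diagnostic before the game-theoretic model and the likelihood ratio test.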