16 research outputs found
VAASI: Crafting valid and abnormal adversarial samples for anomaly detection systems in industrial scenarios
In the realm of industrial anomaly detection, machine and deep learning models face a critical vulnerability to adversarial attacks. In this context, existing attack methodologies primarily target continuous features, often in the context of images, making them unsuitable for the categorical or discrete features prevalent in industrial systems. To fortify the cybersecurity of industrial environments, this paper introduces a groundbreaking adversarial attack approach tailored to the unique demands of these settings. Our novel technique enables the creation of targeted adversarial samples that are valid within the framework of supervised cyberattack detection models in industrial scenarios, preserving the consistency of discrete values and correcting cases where an adversarial sample transitions into a normal one. Our approach leverages the SHAP interpretability method to identify the most salient features for each sample. Subsequently, the Projected Gradient Descent technique is employed to perturb continuous features, ensuring adversarial sample generation. To handle categorical features for a specific adversarial sample, our method scrutinizes the closest sample within the normal training dataset and replicates its categorical feature values. Additionally, Decision Trees trained within a Random Forest are utilized to ensure that the resulting adversarial samples maintain the essential abnormal behavior required for detection. The validation of our proposal was conducted using the WADI dataset obtained from a water distribution plant, providing a realistic industrial context. During validation, we assessed the mean error and the total number of adversarial samples generated by our approach, comparing it with the original Projected Gradient Descent method and the Carlini & Wagner attack across various parameter configurations. 
Remarkably, our proposal consistently achieved the best trade-off between mean error and the number of generated adversarial samples, showcasing its superiority in safeguarding industrial systems.
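The two core steps described in the abstract (perturbing only the continuous features with Projected Gradient Descent, then copying categorical values from the closest normal training sample) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the logistic detector, the L-infinity budget `eps`, the Euclidean distance over continuous features, and all function names are assumptions, and the SHAP-based feature selection and the Random Forest validity check are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pgd_continuous(x, w, b, cont_idx, eps=1.0, step=0.1, iters=20):
    """Push an anomalous sample toward the 'normal' class of a toy logistic
    detector, perturbing ONLY the continuous features (assumed L-inf budget)."""
    x0 = np.asarray(x, dtype=float)
    x_adv = x0.copy()
    for _ in range(iters):
        p = sigmoid(x_adv @ w + b)        # anomaly score in (0, 1)
        grad = p * (1.0 - p) * w          # gradient of the score w.r.t. the input
        # Signed gradient step that lowers the anomaly score.
        x_adv[cont_idx] -= step * np.sign(grad[cont_idx])
        # Projection: stay within eps of the original continuous values.
        x_adv[cont_idx] = np.clip(x_adv[cont_idx],
                                  x0[cont_idx] - eps, x0[cont_idx] + eps)
    return x_adv

def copy_categorical_from_nearest(x_adv, normal_train, cont_idx, cat_idx):
    """Replace categorical values with those of the closest normal training
    sample (distance measured on continuous features; an assumption)."""
    dists = np.linalg.norm(normal_train[:, cont_idx] - x_adv[cont_idx], axis=1)
    nearest = normal_train[np.argmin(dists)]
    out = x_adv.copy()
    out[cat_idx] = nearest[cat_idx]
    return out

# Toy example: features 0 and 1 are continuous, feature 2 is categorical (0/1).
w, b = np.array([2.0, -1.0, 1.5]), -0.5
cont_idx, cat_idx = [0, 1], [2]
x = np.array([1.0, 0.0, 1.0])                       # detected as anomalous
normal_train = np.array([[0.4, 0.6, 0.0], [-1.0, 2.0, 1.0]])

x_pgd = pgd_continuous(x, w, b, cont_idx)
x_valid = copy_categorical_from_nearest(x_pgd, normal_train, cont_idx, cat_idx)
```

Copying categorical values from a real normal sample keeps discrete features inside the set of values the plant can actually produce, which a plain gradient step cannot guarantee.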
An interpretable semi‐supervised system for detecting cyberattacks using anomaly detection in industrial scenarios
When detecting cyberattacks in industrial settings, it is not sufficient to determine whether the system is suffering a cyberattack. It is also fundamental to explain why the system is under attack and which assets are affected. In this context, Anomaly Detection based on Machine Learning (ML) and Deep Learning (DL) techniques has shown great performance when detecting cyberattacks in industrial scenarios. However, two main limitations hinder its use in a real environment. First, most solutions are trained using a supervised approach, which is impractical in the real industrial world. Second, the use of black-box ML and DL techniques makes it impossible to interpret the decision made by the model. This article proposes an interpretable and semi-supervised system to detect cyberattacks in industrial settings. Our proposal was validated using data collected from the Tennessee Eastman Process. To the best of our knowledge, this system is the only one that offers interpretability together with a semi-supervised approach in an industrial setting. Our system discriminates between causes and effects of anomalies and achieved the best performance for 11 of 20 types of anomalies, with an overall recall of 0.9577, a precision of 0.9977, and an F1-score of 0.9711.
Proyecto Tetris: aprendizaje de la programación en ensamblador por piezas
Learning assembly language is typically one of the learning objectives of the first computer architecture courses in Computer Engineering degrees. Although developing and debugging programs written in assembly language is essential to help students understand the basic operation of a processor, students tend to find these tasks especially difficult and/or unappealing. In this work we present our experience teaching the MIPS assembly language through the coding of the videogame Tetris. The Tetris project is carried out in a first-year, second-term course. To make it approachable at this level, students are given an incomplete version of the program, which they must complete by directly translating functions written in C (also provided) into MIPS assembly language, and by implementing the remaining functionality directly in assembly language. The result is a fully operational version of the game. Development is done using an extended version of the MARS simulator. The results obtained by the students show that this project facilitates learning assembly language, since 85.6% of those who pass the Tetris project also pass the practicum exam.
A Methodology for Evaluating the Robustness of Anomaly Detectors to Adversarial Attacks in Industrial Scenarios
Anomaly Detection systems based on Machine and Deep Learning are the most promising solutions to detect cyberattacks in industry. However, these techniques are vulnerable to adversarial attacks that degrade prediction performance. Several techniques have been proposed in the literature to measure the robustness of Anomaly Detection. However, they overlook the following: although a small perturbation of an anomalous sample belonging to an attack (e.g., Denial of Service) could cause it to be misclassified as normal while retaining its ability to do damage, an excessive perturbation might transform it into a truly normal sample with no real impact on the industrial system. This paper presents a methodology to calculate the robustness of Anomaly Detection models in industrial scenarios. The methodology comprises four steps and uses a set of additional models, called support models, to determine whether an adversarial sample remains anomalous. We carried out the validation using the Tennessee Eastman process, a simulated testbed of a chemical process. In this scenario, we applied the methodology to both a Long Short-Term Memory (LSTM) neural network and a 1-dimensional Convolutional Neural Network (1D-CNN), each focused on detecting anomalies produced by different cyberattacks. The experiments showed that the 1D-CNN is significantly more robust than the LSTM on our testbed. Specifically, a perturbation of 60% of the original sample (empirical robustness of 0.6) is needed to generate adversarial samples for the LSTM, whereas for the 1D-CNN the required perturbation increases to 111% (empirical robustness of 1.11).
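To make the reported figures concrete, the sketch below computes empirical robustness as the mean perturbation size relative to the size of the original sample, averaged over the samples for which the attack succeeded. The use of the L2 norm and the function name are assumptions; the abstract only states that a value of 0.6 corresponds to a perturbation of 60% of the original sample.

```python
import numpy as np

def empirical_robustness(x, x_adv, fooled):
    """Mean relative perturbation ||x_adv - x|| / ||x|| over the samples
    where the attack succeeded (L2 norm assumed)."""
    x = np.asarray(x, dtype=float)
    x_adv = np.asarray(x_adv, dtype=float)
    fooled = np.asarray(fooled, dtype=bool)
    if not fooled.any():
        return 0.0                      # no successful adversarial sample
    pert = np.linalg.norm((x_adv - x)[fooled], axis=1)
    base = np.linalg.norm(x[fooled], axis=1)
    return float(np.mean(pert / base))

# Two samples; only the first one fooled the detector.
x = [[3.0, 4.0], [1.0, 0.0]]
x_adv = [[4.8, 6.4], [1.0, 0.5]]
score = empirical_robustness(x, x_adv, fooled=[True, False])
```

Here the first sample has norm 5 and its perturbation has norm 3, so `score` is approximately 0.6; a larger value means the model needs a bigger relative perturbation before it is fooled.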
Detección de botnets y ransomware en redes de datos mediante técnicas de aprendizaje automático
The existing cyberdefense systems based on Intrusion Detection Systems (IDS) include proactive approaches to anticipate and mitigate attacks that exploit vulnerabilities in computing systems. However, there are environments in which IDS have difficulty reaching their goal. For example, in the context of mobile communications, the high transmission rates and large data volumes expected in the future 5G technology will prevent current IDS from examining every packet in the network. Additionally, the use of encrypted traffic is increasingly frequent, preventing payload examination.
Two of the most relevant cybersecurity threats are botnets and ransomware. Both of them generate rather characteristic network traffic patterns which can be interpreted as anomalies in the normal network traffic. In general, an anomaly can be defined as a pattern that does not follow an expected behavior considered as normal.
The main objective of this doctoral thesis is to research how to use machine learning techniques for anomaly detection in data networks with constraints. These constraints can be motivated, for example, by an enormous traffic volume (5G networks), encrypted traffic (clinical environments), or the requirement of automatic and real-time detection and mitigation, among others. This doctoral thesis argues that a single netflow, without access to the packet payload, does not provide sufficient information; therefore, it proposes adding a context to the netflow to allow more accurate detection. This context is obtained from the netflows received in a given period of time preceding the netflow in question. With netflows, detection must be done with less information; thus, the patterns to be detected are more complex, and both classical and deep machine learning algorithms are needed to identify them. Moreover, this work argues that this netflow evaluation can be done at the rate of the demanding 5G networks, and that the detection/mitigation time is short enough to prevent ransomware from spreading. All this is integrated into a suitable architecture and done in real time, in a dynamic and intelligent way.
In order to achieve these goals, the following methodology has been applied:
• Critical analysis of the machine learning-based anomaly detection systems applied to data networks in the literature.
• Identification of scenarios where anomaly detection is challenging by means of analyzing the feasibility of a netflow-based solution in these contexts.
• Thorough study of a selected set of suitable machine learning algorithms for each scenario.
• Design of an architecture based on NFV/SDN for each scenario, integrating anomaly detection and mitigation in a dynamic and flexible way, as well as in real time.
• Use of an existing public data set appropriate to evaluate the proposal or creation of one to make it available to the scientific community.
• Experimental evaluation of the proposed architectures in classification, resource consumption and detection/mitigation time.
The main results obtained in the development of this doctoral thesis are listed below.
• A novel way of calculating a feature vector associated with a netflow was presented. This feature vector incorporates aggregated information from the preceding netflows received in a time interval to provide a context for this netflow.
• An adaptive system based on NFV/SDN was proposed for the detection of anomalies in the context of 5G data networks. Integrated into this system is a detection model based on deep learning at two levels. The lower level runs at the edge of the network, detecting symptoms of anomalies that the upper level uses to identify a potential global anomaly.
• Runtime performance measures were obtained by evaluating the implementation of the model at the edge (a deep neural network) with the most popular deep learning development libraries. These measured times were used to demonstrate the adaptability of the proposed architecture for 5G.
• It was determined that this deep neural network, using the feature vector mentioned above, is capable of detecting both known and unknown botnets.
• A second system based on NFV/SDN was introduced, capable of detecting, classifying and mitigating ransomware attacks in the hospital rooms of the future automatically, intelligently and in real time. This system builds on the designed feature vector and incorporates an entire life cycle that includes offline data acquisition and training, along with real-time detection and mitigation.
• The effectiveness of this proposal has been shown for the detection and mitigation of known and unknown ransomware through extensive experiments carried out in a virtualized environment. To this end, a new dataset was generated from traffic captured in that environment and has been made available to the scientific community. Our experiments demonstrated that the proposed method can prevent ransomware from spreading.
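The idea of giving each netflow a context built from the netflows received shortly before it can be illustrated with a sliding time window. This is a hedged sketch only: the `Flow` fields, the 60-second window, and the three aggregates (flow count, mean bytes, distinct destination ports) are hypothetical choices, not the feature vector defined in the thesis.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Flow:
    ts: float        # arrival time in seconds
    dst_port: int
    n_bytes: int

def context_features(flows, window=60.0):
    """For each flow (sorted by ts), append aggregates computed over the
    flows received during the preceding `window` seconds."""
    recent = deque()                     # flows still inside the window
    vectors = []
    for f in flows:
        # Drop flows that are older than the window.
        while recent and recent[0].ts < f.ts - window:
            recent.popleft()
        n = len(recent)
        mean_bytes = sum(r.n_bytes for r in recent) / n if n else 0.0
        distinct_ports = len({r.dst_port for r in recent})
        # Per-flow features followed by the context aggregates.
        vectors.append([f.n_bytes, f.dst_port, n, mean_bytes, distinct_ports])
        recent.append(f)
    return vectors

flows = [Flow(0.0, 80, 100), Flow(10.0, 443, 300), Flow(100.0, 22, 50)]
vectors = context_features(flows)
```

The second flow sees one earlier flow in its window, while the third arrives after both earlier flows have expired, so its context aggregates are all zero.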
MADICS: A Methodology for Anomaly Detection in Industrial Control Systems
Diverse cyberattack detection systems have been proposed over the years in the context of Industrial Control Systems (ICS). However, the lack of standard methodologies to detect cyberattacks in industrial scenarios prevents researchers from accurately comparing proposals and results. In this work, we present MADICS, a methodology to detect cyberattacks in industrial scenarios that intends to be a guideline for future works in the field. In order to validate MADICS, we used the popular SWaT dataset, which was collected from a fully operational water treatment plant. The experiments showed that, following MADICS, we achieved a state-of-the-art precision of 0.984, as well as a recall of 0.750 and an F1-score of 0.851, above the average of other works, proving that the proposed methodology is suitable for use in real industrial scenarios.
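As a quick consistency check, the F1-score is the harmonic mean of precision and recall, so the reported value can be recomputed from the other two figures. The function name below is only illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

madics_f1 = round(f1_score(0.984, 0.750), 3)   # 0.851
```

The result matches the F1-score of 0.851 reported for MADICS on SWaT.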
SUSAN: A Deep Learning based anomaly detection framework for sustainable industry
Nowadays, sustainability is at the core of green technologies and is a critical aspect in many industries concerned with reducing carbon emissions and optimizing energy consumption. As this concern increases, the number of cyberattacks causing sustainability issues in industries also grows. These cyberattacks impact the industrial systems that control and monitor the correct functioning of processes and systems. Furthermore, they are highly specialized: they require knowledge of the target industrial processes and are undetectable by traditional cybersecurity solutions. To overcome this challenge, we present SUSAN, a Deep Learning-based framework to build anomaly detectors that expose cyberattacks affecting the sustainability of industrial systems. SUSAN follows a modular and flexible design that allows the ensembling of several detectors to achieve more precise detections. To demonstrate the feasibility of SUSAN, we implemented the framework in a water treatment plant using the SWaT testbed. The experiments performed achieved the best recall rate (0.910) and acceptable precision (0.633), resulting in an F1-score of 0.747. Regarding individual cyberattacks that impact the system's sustainability, our implementation detected all of them and, compared with the related work, achieved the most balanced results, with 0.64 as the worst recall rate. Finally, a false-positive rate of 0.000388 makes our solution feasible in real scenarios.
Intelligent and Dynamic Ransomware Spread Detection and Mitigation in Integrated Clinical Environments
Medical Cyber-Physical Systems (MCPS) hold the promise of reducing human errors and optimizing healthcare by delivering new ways to monitor, diagnose and treat patients through integrated clinical environments (ICE). Despite the benefits provided by MCPS, many of the ICE medical devices have not been designed to satisfy cybersecurity requirements and, consequently, are vulnerable to recent attacks. Nowadays, ransomware attacks account for 85% of all malware in healthcare, and more than 70% of attacks confirmed data disclosure. With the goal of improving this situation, the main contribution of this paper is an automatic, intelligent and real-time system to detect, classify, and mitigate ransomware in ICE. The proposed solution is fully integrated with the ICE++ architecture, our previous work, and makes use of Machine Learning (ML) techniques to detect and classify the spreading phase of ransomware attacks affecting ICE. Additionally, the Network Function Virtualization (NFV) and Software Defined Networking (SDN) paradigms are used to mitigate the spread of ransomware by isolating and replacing infected devices. Different experiments returned a precision/recall of 92.32%/99.97% in anomaly detection, an accuracy of 99.99% in ransomware classification, and promising detection and mitigation times. Finally, different labelled ransomware datasets in ICE have been created and made publicly available.