4 research outputs found

    Detection of encrypted cryptomining malware connections with machine and deep learning

    Malware has become an epidemic problem. Among the attacks that exploit victims' computing resources, one that has become common targets the massive amount of computation needed for digital currency cryptomining. Cybercriminals steal computing resources from victims and attach them to the cryptocurrency mining pools they profit from. This research work offers a solution for detecting such abusive cryptomining activity by means of passive network monitoring alone. To this end, we identify a new set of highly relevant network flow features to be used jointly with a rich set of machine and deep learning models for real-time cryptomining flow detection. We deployed a complex and realistic cryptomining scenario for training and testing machine and deep learning models, in which clients interact with real servers across the Internet and use encrypted connections. A complete set of experiments was carried out to demonstrate that, by combining these highly informative features with complex machine learning models, cryptomining attacks can be detected on the wire with telco-grade precision and accuracy, even if the traffic is encrypted.
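    The abstract above does not list its flow features, so the following is only a minimal sketch of the general idea: summarising a flow from packet metadata, which stays observable when the payload is encrypted. The `Packet` type, the `flow_features` function, and the five statistics it returns are hypothetical illustrations, not the feature set used in the paper.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class Packet:
    """Hypothetical per-packet metadata seen by a passive monitor."""
    timestamp: float  # capture time in seconds
    size: int         # bytes on the wire
    outbound: bool    # True if sent by the client


def flow_features(packets: List[Packet]) -> Dict[str, float]:
    """Summarise one flow using packet metadata only (no payload access)."""
    if not packets:
        raise ValueError("flow has no packets")
    ordered = sorted(packets, key=lambda p: p.timestamp)
    # Gaps between consecutive packets; empty for a single-packet flow.
    gaps = [b.timestamp - a.timestamp for a, b in zip(ordered, ordered[1:])]
    out_bytes = sum(p.size for p in ordered if p.outbound)
    total_bytes = sum(p.size for p in ordered)
    return {
        "packet_count": float(len(ordered)),
        "duration_s": ordered[-1].timestamp - ordered[0].timestamp,
        "mean_packet_size": mean(p.size for p in ordered),
        "mean_inter_arrival_s": mean(gaps) if gaps else 0.0,
        "outbound_byte_ratio": out_bytes / total_bytes if total_bytes else 0.0,
    }


# Made-up packets, for illustration only.
example_flow = [
    Packet(timestamp=0.0, size=100, outbound=True),
    Packet(timestamp=1.0, size=300, outbound=False),
    Packet(timestamp=3.0, size=200, outbound=True),
]
features = flow_features(example_flow)
```

    A feature dictionary of this kind could then be fed, one row per flow, to whatever classifier is being evaluated.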

    The European Industrial Data Space (EIDS)

    This research work has been performed in the framework of the Boost 4.0 Big Data lighthouse initiative, a project that has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement no. 780732. This data-driven digital transformation research is also endorsed by the Digital Factory Alliance (DFA). The path that the European Commission foresees to leverage data in the best possible way for the sake of European citizens and the digital single market clearly addresses the need for a European Data Space. This data space must follow rules derived from European values. The European Data Strategy rests on four pillars: (1) Governance framework for access and use; (2) Investments in Europe’s data capabilities and infrastructures; (3) Competences and skills of individuals and SMEs; (4) Common European Data Spaces in nine strategic areas such as industrial manufacturing, mobility, health, and energy. The project BOOST 4.0 developed a prototype for the industrial manufacturing sector, called the European Industrial Data Space (EIDS), an endeavour of 53 companies. The publication presents the architectural pattern and the components that were developed, and introduces the infrastructure required for the EIDS. Additionally, it describes how such a data space is populated with Big Data enabled services and platforms, enriched with the perspective of the pilots that have been built on the EIDS.

    Contribuciones a la Aplicación de Machine Learning en Escenarios Novedosos de Tiempo Real

    In the supervised learning domain, feature selection is crucial to identify the most relevant variables that influence the performance of a model, especially in environments where real-time inference is required or the model’s energy consumption is limited. Automatic feature selection methods are useful to deal with the complexity and high number of features in datasets, but manual feature selection by domain experts can produce a set of features that improves the performance and interpretability of the model. This thesis focuses on the application of Machine Learning models, both traditional and advanced, to solve problems in scenarios that have not been previously addressed, with the common denominator that all of them require real-time deployment. The main objective of this thesis is to investigate the interrelationships between the selected input data features and the applied models. Specifically, this thesis explores various ways of selecting, transforming, or replacing features in datasets to improve the performance of Machine Learning or Deep Learning models when they are deployed in real-time environments where energy consumption constraints and/or the need for fast inference may exist. To achieve these objectives, the thesis selects three use cases from different domains that all share the particularity of requiring real-time deployments: Industry 4.0, Telecommunications/Cybersecurity, and Environment.
In the Industry 4.0 use case, the thesis explores how Machine Learning models based on advanced Deep Learning techniques can predict, in real time and with sufficient lead time, the behaviour of an Automated Guided Vehicle (AGV) controlled by a remotely virtualized Programmable Logic Controller (PLC), using the AGV’s guidance information along with statistical information about the network connection between the AGV and the PLC in a 5G network subjected to disturbances and errors. In the Telecommunications/Cybersecurity use case, the thesis investigates how Machine Learning models can identify cryptomining connections in real time using only statistical information about network connections, which may be encrypted in some cases. Finally, in the Environment use case, the thesis investigates whether a soft-sensor (software sensor) based on Machine Learning models could replace an expensive real sensor to measure Chl-a (Chlorophyll) fluorescence in a body of water. The soft-sensor infers the value of Chl-a from a set of variables (temperature, pH, conductivity, and battery level) obtained from low-cost real sensors. In the context of these three use cases, the thesis explores the effects of feature selection on the performance of traditional Machine Learning models and Deep Learning models. Within the latter, the thesis proposes two models derived from the Transformer architecture, which has recently revolutionized the Natural Language Processing area. The experiments conducted in this thesis demonstrate that machine learning is an effective tool to address problems related to the prediction, detection, or inference of complex events in real-time scenarios within the three selected application areas.
Although these application areas differ from one another in many ways, the thesis presents a set of common conclusions about the practical application of machine learning models in them and about the implications of feature selection for the complexity and performance of the models.

    Transformers for Multi-Horizon Forecasting in an Industry 4.0 Use Case

    Recently, a novel approach in the field of Industry 4.0 factory operations was proposed for a new generation of automated guided vehicles (AGVs) that are connected to a virtualized programmable logic controller (PLC) via a 5G multi-access edge-computing (MEC) platform to enable remote control. However, this approach faces a critical challenge, as the 5G network may encounter communication disruptions that can lead to AGV deviations and, with them, potential safety risks and workplace issues. To mitigate this problem, several works have proposed the use of fixed-horizon forecasting techniques based on deep-learning models that can anticipate AGV trajectory deviations so that corrective maneuvers can be taken accordingly. However, these methods offer limited prediction flexibility to the AGV operator and are not robust against network instability. To address these limitations, this study proposes a novel approach based on multi-horizon forecasting techniques to predict the deviation of remotely controlled AGVs. As its primary contribution, the work presents two new versions of the state-of-the-art transformer architecture that are well-suited to the multi-horizon prediction problem. We conduct a comprehensive comparison between the proposed models and traditional deep-learning models, such as the long short-term memory (LSTM) neural network, to evaluate their performance and capabilities.
The results indicate that (i) the transformer-based models outperform LSTM in both multi-horizon and fixed-horizon scenarios, (ii) the prediction accuracy at a specific time-step of the best multi-horizon forecasting model is very close to that obtained by the best fixed-horizon forecasting model at the same step, (iii) models that use a time-sequence structure in their inputs tend to perform better in multi-horizon scenarios than their fixed-horizon counterparts and than other multi-horizon models that do not consider a time topology in their inputs, and (iv) the proposed models can perform inference within the time constraints required for real-time decision making.
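The difference between fixed-horizon and multi-horizon targets described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's data pipeline: `make_windows`, `fixed_horizon_targets`, the window lengths, and the deviation values are all hypothetical.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], lookback: int, horizon: int
) -> Tuple[List[List[float]], List[List[float]]]:
    """Slice a series into (input window, multi-horizon target) pairs.

    Each input holds `lookback` consecutive values; each target holds the
    next `horizon` values, so one model call covers every future step.
    """
    if lookback < 1 or horizon < 1:
        raise ValueError("lookback and horizon must be positive")
    inputs: List[List[float]] = []
    targets: List[List[float]] = []
    last_start = len(series) - lookback - horizon
    # If the series is too short, the range is empty and no pairs are made.
    for start in range(last_start + 1):
        split = start + lookback
        inputs.append(list(series[start:split]))
        targets.append(list(series[split:split + horizon]))
    return inputs, targets


def fixed_horizon_targets(
    targets: List[List[float]], step: int
) -> List[float]:
    """Keep only the value `step` steps ahead (1-based) from each target."""
    return [t[step - 1] for t in targets]


# Made-up deviation readings, for illustration only.
deviation = [0.0, 0.1, 0.3, 0.2, 0.5, 0.4, 0.6]
X, Y = make_windows(deviation, lookback=3, horizon=2)
y_step2 = fixed_horizon_targets(Y, step=2)
```

A multi-horizon model would be trained on `Y` as a whole, while a fixed-horizon model sees only one column of it, such as `y_step2`; comparing the two at the same step corresponds to result (ii).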