312 research outputs found

    MalBoT-DRL: Malware botnet detection using deep reinforcement learning in IoT networks

    In the dynamic landscape of cyber threats, multi-stage malware botnets have emerged as a significant concern. These sophisticated threats can exploit Internet of Things (IoT) devices to undertake an array of cyberattacks, ranging from basic infections to complex operations such as phishing, cryptojacking, and distributed denial of service (DDoS) attacks. Existing machine learning solutions are often constrained by their limited generalizability across datasets and their inability to adapt to the mutable patterns of malware attacks in real-world environments, a challenge known as model drift. This limitation highlights the pressing need for adaptive Intrusion Detection Systems (IDS) capable of adjusting to evolving threat patterns and new or unseen attacks. This paper introduces MalBoT-DRL, a robust malware botnet detector using deep reinforcement learning. Designed to detect botnets throughout their entire lifecycle, MalBoT-DRL has better generalizability and offers a resilient solution to model drift. The model integrates damped incremental statistics with an attention rewards mechanism, a combination that has not been extensively explored in the literature. This integration enables MalBoT-DRL to adapt dynamically to the ever-changing malware patterns within IoT environments. The performance of MalBoT-DRL has been validated via trace-driven experiments using two representative datasets, MedBIoT and N-BaIoT, resulting in average detection rates of 99.80% and 99.40% in the early and late detection phases, respectively. To the best of our knowledge, this work is one of the first studies to investigate the efficacy of reinforcement learning in enhancing the generalizability of IDS.
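The abstract above does not say how its damped incremental statistics are computed. As a rough illustration only, the sketch below follows the commonly used formulation in which a weight, a linear sum and a squared sum are each decayed by 2^(-λΔt) before a new observation is added. The class name `DampedIncStat` and the parameter `decay_rate` are hypothetical and are not taken from the paper.

```python
class DampedIncStat:
    """Exponentially damped running statistics over a timestamped stream.

    Illustrative sketch only: the names and the decay form 2 ** (-decay_rate * dt)
    are assumptions, not the MalBoT-DRL implementation.
    """

    def __init__(self, decay_rate):
        self.decay_rate = decay_rate  # lambda: larger values forget old data faster
        self.weight = 0.0             # damped count of observations
        self.linear_sum = 0.0         # damped sum of values
        self.squared_sum = 0.0        # damped sum of squared values
        self.last_time = None         # timestamp of the previous observation

    def update(self, value, timestamp):
        # Decay the stored sums according to the time elapsed since the last update.
        if self.last_time is not None:
            elapsed = max(timestamp - self.last_time, 0.0)
            factor = 2.0 ** (-self.decay_rate * elapsed)
            self.weight *= factor
            self.linear_sum *= factor
            self.squared_sum *= factor
        self.last_time = timestamp
        # Add the new observation with unit weight.
        self.weight += 1.0
        self.linear_sum += value
        self.squared_sum += value * value

    def mean(self):
        return self.linear_sum / self.weight if self.weight > 0 else 0.0

    def variance(self):
        if self.weight <= 0:
            return 0.0
        mean = self.mean()
        # abs() guards against tiny negative values caused by rounding.
        return abs(self.squared_sum / self.weight - mean * mean)
```

Because only three numbers are stored per stream, features of this kind can be maintained in constant memory on a resource-constrained device, and recent traffic dominates the statistics as older observations decay.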

    Acoustic Lens Design Using Machine Learning

    This thesis aims to contribute a novel and efficient method for the inverse design of acoustic metamaterial lenses using machine learning, specifically deep learning, generative modeling, and reinforcement learning. Acoustic lenses can focus incident plane waves at a focal point, enabling them to detect structures non-intrusively. These lenses can be used in biomedical engineering, medical devices, structural engineering, ultrasound imaging, health monitoring, and related fields. Finding the global optimum through a traditional iterative optimization process is challenging when designing an acoustic lens, and may become infeasible because of the high-dimensional parameter space and the compute resources needed. Machine learning techniques have shown promise for finding the global optimum. Generative modeling is a powerful technique that has enabled recent advances in drug discovery, organic molecule development, and photonics. We combined generative modeling with global optimization and an analytical form of the gradients computed by means of multiple scattering theory. In addition, reinforcement learning can potentially outperform traditional optimization algorithms. Thus, in this thesis, the acoustic lens is modeled using two machine learning techniques: generative modeling, using 2D-Global Topology Optimization Networks (2D-GLOnets), and reinforcement learning, using the Deep Deterministic Policy Gradient (DDPG) algorithm. Results from these methods are compared with those of traditional optimization algorithms.

    Effective control of two-dimensional Rayleigh-Bénard convection: invariant multi-agent reinforcement learning is all you need

    Rayleigh-Bénard convection (RBC) is a recurrent phenomenon in several industrial and geoscience flows and a well-studied system from a fundamental fluid-mechanics viewpoint. However, controlling RBC, for example by modulating the spatial distribution of the bottom-plate heating in the canonical RBC configuration, remains a challenging topic for classical control-theory methods. In the present work, we apply deep reinforcement learning (DRL) to control RBC. We show that effective RBC control can be obtained by leveraging invariant multi-agent reinforcement learning (MARL), which takes advantage of the locality and translational invariance inherent to RBC flows inside wide channels. The MARL framework applied to RBC allows for an increase in the number of control segments without encountering the curse of dimensionality that would result from a naive increase in the DRL action-size dimension. This is made possible by the ability of MARL to reuse the knowledge generated in different parts of the RBC domain. We show in a case study that MARL DRL is able to discover an advanced control strategy that destabilizes the spontaneous RBC double-cell pattern, changes the topology of RBC by coalescing adjacent convection cells, and actively controls the resulting coalesced cell to bring it to a new stable configuration. This modified flow configuration results in reduced convective heat transfer, which is beneficial in several industrial processes. Our work therefore both shows the potential of MARL DRL for controlling large RBC systems and demonstrates that DRL can discover strategies that move the RBC configuration between different topological configurations, yielding desirable heat-transfer characteristics. These results are useful both for gaining further understanding of the intrinsic properties of RBC and for developing industrial applications. Comment: 34 pages, 11 figures; submitted to Physics of Fluids.

    Machine Learning in IoT Security: Current Solutions and Future Challenges

    The future Internet of Things (IoT) will have a deep economic, commercial and social impact on our lives. The participating nodes in IoT networks are usually resource-constrained, which makes them attractive targets for cyber attacks. In this regard, extensive efforts have been made to address the security and privacy issues in IoT networks, primarily through traditional cryptographic approaches. However, the unique characteristics of IoT nodes render the existing solutions insufficient to cover the entire security spectrum of IoT networks. This is, at least in part, because of the resource constraints, the heterogeneity, the massive real-time data generated by IoT devices, and the extensively dynamic behavior of the networks. Therefore, Machine Learning (ML) and Deep Learning (DL) techniques, which can provide embedded intelligence in IoT devices and networks, are leveraged to cope with different security problems. In this paper, we systematically review the security requirements, attack vectors, and current security solutions for IoT networks. We then shed light on the gaps in these security solutions that call for ML and DL approaches. We also discuss in detail the existing ML and DL solutions for addressing different security problems in IoT networks. Finally, based on a detailed investigation of the existing solutions in the literature, we discuss future research directions for ML- and DL-based IoT security.

    How to Control Hydrodynamic Force on Fluidic Pinball via Deep Reinforcement Learning

    Deep reinforcement learning (DRL) can learn efficient flow control strategies for the fluidic pinball, three individually rotating cylinders arranged in an equilateral triangle in a uniform flow, owing to its capacity for self-learning and data-driven state estimation in complex fluid dynamic problems. In this work, we present a DRL-based real-time feedback strategy to control the hydrodynamic force on the fluidic pinball, i.e., force extremum and tracking, through the cylinders' rotation. By adequately designing reward functions and encoding historical observations, and after thousands of iterations of automatic learning, the DRL-based control was shown to make reasonable and valid control decisions in a nonparametric control parameter space, with performance comparable to or even better than the optimal policy found through lengthy brute-force searching. Subsequently, one of these results was analyzed by a machine learning model that enabled us to shed light on the basis of decision-making and the physical mechanisms of the force tracking process. The findings from this work make it possible to control the hydrodynamic force during operation of the fluidic pinball system and potentially pave the way for exploring efficient active flow control strategies in other complex fluid dynamic problems.

    Aprendizagem de coordenação em sistemas multi-agente

    The ability of an agent to coordinate with others within a system is a valuable property in multi-agent systems. Agents either cooperate as a team to accomplish a common goal, or adapt to opponents to complete different goals without being exploited. Research has shown that learning multi-agent coordination is significantly more complex than learning policies in single-agent environments, and requires a variety of techniques to deal with the properties of a system where agents learn concurrently. This thesis aims to determine how machine learning can be used to achieve coordination within a multi-agent system. It asks what techniques can be used to tackle the increased complexity of such systems and their credit assignment challenges, how to achieve coordination, and how to use communication to improve the behavior of a team. Many algorithms for competitive environments are tabular-based, preventing their use with high-dimensional or continuous state spaces, and may be biased against specific equilibrium strategies. This thesis proposes multiple deep learning extensions for competitive environments, allowing algorithms to reach equilibrium strategies in complex and partially observable environments, relying only on local information. A tabular algorithm is also extended with a new update rule that eliminates its bias against deterministic strategies. Current state-of-the-art approaches for cooperative environments rely on deep learning to handle the environment's complexity and benefit from a centralized learning phase. Solutions that incorporate communication between agents often prevent agents from being executed in a distributed manner. This thesis proposes a multi-agent algorithm where agents learn communication protocols to compensate for local partial observability, and remain independently executed.
A centralized learning phase can incorporate additional environment information to increase the robustness and speed with which a team converges to successful policies. The algorithm outperforms current state-of-the-art approaches in a wide variety of multi-agent environments. A permutation-invariant network architecture is also proposed to increase the scalability of the algorithm to large team sizes. Further research is needed to identify how the techniques proposed in this thesis, for cooperative and competitive environments, can be used in unison for mixed environments, and whether they are adequate for general artificial intelligence.
Financial support from FCT and FSE under the III Community Support Framework. Doctoral Programme in Informatics.

    Cooperative planning for an unmanned combat aerial vehicle fleet using reinforcement learning

    In this study, reinforcement learning (RL)-based centralized path planning is performed for an unmanned combat aerial vehicle (UCAV) fleet in a human-made hostile environment. The proposed method provides a novel approach in which closing speed and approximate time-to-go terms are used in the reward function to obtain cooperative motion while satisfying no-fly-zone (NFZ) and time-of-arrival constraints. The proximal policy optimization (PPO) algorithm is used in the training phase of the RL agent. System performance is evaluated in two different cases. In case 1, the warfare environment contains only the target area, and simultaneous arrival is desired to obtain the saturated attack effect. In case 2, the warfare environment contains NFZs in addition to the target area and the standard saturated attack and collision avoidance requirements. A particle swarm optimization (PSO)-based cooperative path planning algorithm is implemented as the baseline method, and it is compared with the proposed algorithm in terms of execution time and the developed performance metrics. Monte Carlo simulation studies are performed to evaluate the system performance. According to the simulation results, the proposed system is able to generate feasible flight paths in real time while considering physical and operational constraints such as acceleration limits, NFZ restrictions, simultaneous arrival, and collision avoidance requirements. In that respect, the approach provides a novel and computationally efficient method for solving the large-scale cooperative path planning problem for UCAV fleets.
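The abstract names closing speed and approximate time-to-go as reward terms but gives no formula. The following is a minimal sketch of how such terms might be combined for a stationary target in 2D, assuming closing speed is the velocity component along the line of sight and time-to-go is range divided by closing speed. The function names, the weights `w_close` and `w_sync`, and the mean-absolute-deviation synchronisation penalty are all hypothetical; NFZ and collision terms are omitted.

```python
import math


def closing_speed(position, velocity, target):
    """Velocity component along the line of sight to a stationary target."""
    dx, dy = target[0] - position[0], target[1] - position[1]
    distance = math.hypot(dx, dy)
    if distance == 0.0:
        return 0.0
    return (velocity[0] * dx + velocity[1] * dy) / distance


def time_to_go(position, velocity, target, min_speed=1e-6):
    """Approximate time-to-go: range divided by closing speed (floored at min_speed)."""
    distance = math.hypot(target[0] - position[0], target[1] - position[1])
    return distance / max(closing_speed(position, velocity, target), min_speed)


def cooperative_reward(fleet, target, w_close=1.0, w_sync=1.0):
    """Reward fast approach and penalise spread in arrival times across the fleet.

    fleet is a list of (position, velocity) pairs; the weights are assumed values.
    """
    speeds = [closing_speed(p, v, target) for p, v in fleet]
    times = [time_to_go(p, v, target) for p, v in fleet]
    mean_time = sum(times) / len(times)
    # Mean absolute deviation of time-to-go: zero when all vehicles arrive together.
    spread = sum(abs(t - mean_time) for t in times) / len(times)
    return w_close * sum(speeds) / len(speeds) - w_sync * spread
```

In this sketch a fleet whose members would all arrive at the same instant incurs no synchronisation penalty, so the reward reduces to the mean closing speed; any mismatch in arrival times lowers it.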

    Patient-specific simulation for autonomous surgery

    An Autonomous Robotic Surgical System (ARSS) has to interact with a complex anatomical environment that deforms and whose properties are often uncertain. Within this context, an ARSS can benefit from the availability of a patient-specific simulation of the anatomy. For example, simulation can provide a safe and controlled environment for the design, testing and validation of the autonomous capabilities. Moreover, it can be used to generate large amounts of patient-specific data that can be exploited to learn models and/or tasks. The aim of this Thesis is to investigate the different ways in which simulation can support an ARSS and to propose solutions to favor its employability in robotic surgery. We first address all the phases needed to create such a simulation, from model choice in the pre-operative phase, based on the available knowledge, to its intra-operative update to compensate for inaccurate parametrization. We propose to rely on deep neural networks trained with synthetic data both to generate a patient-specific model and to design a strategy to update model parametrization starting directly from intra-operative sensor data. Afterwards, we test how simulation can assist the ARSS, both for task learning and during task execution. We show that simulation can be used to efficiently train approaches that require multiple interactions with the environment, compensating for the risk of acquiring data from real surgical robotic systems. Finally, we propose a modular framework for autonomous surgery that includes deliberative functions to handle real anatomical environments with uncertain parameters. The integration of a personalized simulation proves fundamental both for optimal task planning and for enhancing and monitoring real execution.
The contributions presented in this Thesis have the potential to introduce significant step changes in the development and actual performance of autonomous robotic surgical systems, bringing them closer to applicability in real clinical conditions.