312 research outputs found
MalBoT-DRL: Malware botnet detection using deep reinforcement learning in IoT networks
In the dynamic landscape of cyber threats, multi-stage malware botnets have surfaced as significant concerns. These sophisticated threats can exploit Internet of Things (IoT) devices to undertake an array of cyberattacks, ranging from basic infections to complex operations such as phishing, cryptojacking, and distributed denial of service (DDoS) attacks. Existing machine learning solutions are often constrained by their limited generalizability across datasets and their inability to adapt to the mutable patterns of malware attacks in real-world environments, a challenge known as model drift. This limitation highlights the pressing need for adaptive Intrusion Detection Systems (IDS) capable of adjusting to evolving threat patterns and new or unseen attacks. This paper introduces MalBoT-DRL, a robust malware botnet detector using deep reinforcement learning. Designed to detect botnets throughout their entire lifecycle, MalBoT-DRL has better generalizability and offers a resilient solution to model drift. The model integrates damped incremental statistics with an attention rewards mechanism, a combination that has not been extensively explored in the literature. This integration enables MalBoT-DRL to adapt dynamically to the ever-changing malware patterns within IoT environments. The performance of MalBoT-DRL has been validated via trace-driven experiments using two representative datasets, MedBIoT and N-BaIoT, resulting in exceptional average detection rates of 99.80% and 99.40% in the early and late detection phases, respectively. To the best of our knowledge, this work presents one of the first studies to investigate the efficacy of reinforcement learning in enhancing the generalizability of IDS.
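The abstract does not give the paper's exact formulation of damped incremental statistics. A minimal sketch of one common formulation (the exponential decay rule below is an assumption, not confirmed from the paper) maintains a decayed sample weight, linear sum, and squared sum per traffic stream, so recent packets dominate the running mean and variance:

```python
class DampedIncStat:
    """Damped incremental statistics: a running mean/variance in which
    the influence of old samples decays exponentially with elapsed time.
    Sketch of one common formulation; the paper's exact rule is assumed."""

    def __init__(self, decay=0.1):
        self.decay = decay   # larger decay -> faster forgetting
        self.w = 0.0         # decayed sample count (weight)
        self.ls = 0.0        # decayed linear sum
        self.ss = 0.0        # decayed squared sum
        self.t_last = None   # timestamp of the last update

    def update(self, x, t):
        # Decay the accumulated statistics by the elapsed time, then
        # fold in the new observation with unit weight.
        if self.t_last is not None:
            gamma = 2.0 ** (-self.decay * (t - self.t_last))
            self.w *= gamma
            self.ls *= gamma
            self.ss *= gamma
        self.t_last = t
        self.w += 1.0
        self.ls += x
        self.ss += x * x

    def mean(self):
        return self.ls / self.w

    def variance(self):
        return max(self.ss / self.w - self.mean() ** 2, 0.0)
```

Feeding per-packet features (e.g. sizes with timestamps) through such accumulators yields traffic features that track drifting behavior without storing raw history.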
Acoustic Lens Design Using Machine Learning
This thesis aims to contribute a novel and efficient approach to the inverse design of acoustic metamaterial lenses using machine learning, specifically deep learning, generative modeling, and reinforcement learning. Acoustic lenses can focus incident plane waves at a focal point, enabling them to probe structures non-intrusively. These lenses can be utilized in biomedical engineering, medical devices, structural engineering, ultrasound imaging, health monitoring, etc. Finding the global optimum through a traditional iterative optimization process for designing an acoustic lens is challenging, and may become infeasible due to the high-dimensional parameter space and the compute resources needed. Machine learning techniques have shown promise for finding the global optimum. Generative modeling is a powerful technique enabling recent advancements in drug discovery, organic molecule development, and photonics. We combined generative modeling with global optimization and an analytical form of gradients computed by means of multiple scattering theory. In addition, reinforcement learning can potentially outperform traditional optimization algorithms. Thus, in this thesis, the acoustic lens is modeled using two machine learning techniques: generative modeling, using 2D-Global Topology Optimization Networks (2D-GLOnets), and reinforcement learning, using the Deep Deterministic Policy Gradient (DDPG) algorithm. Results from the aforementioned methods are compared with traditional optimization algorithms.
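The abstract does not detail the DDPG setup used for the lens design. As a hedged illustration, two standard ingredients of the DDPG algorithm itself (from the original algorithm, not specific to this thesis) can be sketched: Polyak-averaged target parameters and Ornstein-Uhlenbeck exploration noise for continuous design actions:

```python
import random

def soft_update(target, source, tau=0.005):
    """Polyak-average source parameters into the target parameters,
    giving DDPG its slowly moving target networks."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target, source)]

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated exploration
    noise added to the deterministic policy's continuous actions."""

    def __init__(self, mu=0.0, theta=0.15, sigma=0.2, seed=0):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.x = mu
        self.rng = random.Random(seed)

    def sample(self, dt=1.0):
        # Mean-reverting drift toward mu plus Gaussian diffusion.
        dx = (self.theta * (self.mu - self.x) * dt
              + self.sigma * (dt ** 0.5) * self.rng.gauss(0.0, 1.0))
        self.x += dx
        return self.x
```

In a full agent, `soft_update` would be applied to actor and critic target networks after each gradient step, and `OUNoise.sample()` would perturb each proposed lens-parameter action during training.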
Effective control of two-dimensional Rayleigh–Bénard convection: invariant multi-agent reinforcement learning is all you need
Rayleigh–Bénard convection (RBC) is a recurrent phenomenon in several
industrial and geoscience flows and a well-studied system from a fundamental
fluid-mechanics viewpoint. However, controlling RBC, for example by modulating
the spatial distribution of the bottom-plate heating in the canonical RBC
configuration, remains a challenging topic for classical control-theory
methods. In the present work, we apply deep reinforcement learning (DRL) for
controlling RBC. We show that effective RBC control can be obtained by
leveraging invariant multi-agent reinforcement learning (MARL), which takes
advantage of the locality and translational invariance inherent to RBC flows
inside wide channels. The MARL framework applied to RBC allows for an increase
in the number of control segments without encountering the curse of
dimensionality that would result from a naive increase in the DRL action-size
dimension. This is made possible by MARL's ability to reuse the knowledge
generated in different parts of the RBC domain. We show in a case study that
MARL DRL is able to discover an advanced control strategy that destabilizes the
spontaneous RBC double-cell pattern, changes the topology of RBC by coalescing
adjacent convection cells, and actively controls the resulting coalesced cell
to bring it to a new stable configuration. This modified flow configuration
results in reduced convective heat transfer, which is beneficial in several
industrial processes. Therefore, our work both shows the potential of MARL DRL
for controlling large RBC systems and demonstrates the ability of DRL to
discover strategies that move the RBC configuration between different
topological configurations, yielding desirable heat-transfer characteristics.
These results are useful both for gaining further understanding of the
intrinsic properties of RBC and for developing industrial applications.
Comment: 34 pages, 11 figures, submitted to Physics of Fluids
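The invariance idea above can be sketched as parameter sharing: a single policy is evaluated on each heating segment's local observation window, so adding segments adds policy evaluations rather than action dimensions. The periodic window and the hand-written placeholder rule below are illustrative assumptions, not the paper's trained network:

```python
def local_observation(field, i, halo=1):
    """Periodic local window of the bottom-plate field around segment i,
    exploiting the translational invariance of wide-channel RBC."""
    n = len(field)
    return [field[(i + k) % n] for k in range(-halo, halo + 1)]

def shared_policy(obs):
    """Placeholder shared policy: one scalar heating adjustment per
    segment (a trained network would replace this hand-written rule)."""
    center = obs[len(obs) // 2]
    return -0.1 * (center - sum(obs) / len(obs))

def control_step(field):
    """Apply the same policy to every segment's local observation."""
    return [shared_policy(local_observation(field, i))
            for i in range(len(field))]
```

Because every segment reuses one policy, experience gathered anywhere in the domain improves control everywhere, and the control output is equivariant under translation of the temperature field.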
Machine Learning in IoT Security: Current Solutions and Future Challenges
The future Internet of Things (IoT) will have a deep economic, commercial
and social impact on our lives. The participating nodes in IoT networks are
usually resource-constrained, which makes them attractive targets for cyber
attacks. In this regard, extensive efforts have been made to address the
security and privacy issues in IoT networks primarily through traditional
cryptographic approaches. However, the unique characteristics of IoT nodes
render the existing solutions insufficient to encompass the entire security
spectrum of the IoT networks. This is, at least in part, because of the
resource constraints, heterogeneity, massive real-time data generated by the
IoT devices, and the extensively dynamic behavior of the networks. Therefore,
Machine Learning (ML) and Deep Learning (DL) techniques, which are able to
provide embedded intelligence in the IoT devices and networks, are leveraged to
cope with different security problems. In this paper, we systematically review
the security requirements, attack vectors, and the current security solutions
for the IoT networks. We then shed light on the gaps in these security
solutions that call for ML and DL approaches. We also discuss in detail the
existing ML and DL solutions for addressing different security problems in IoT
networks. Finally, based on a detailed investigation of the existing
solutions in the literature, we discuss future research directions for ML-
and DL-based IoT security.
How to Control Hydrodynamic Force on Fluidic Pinball via Deep Reinforcement Learning
Deep reinforcement learning (DRL) for the fluidic pinball, three individually
rotating cylinders in a uniform flow arranged in an equilateral triangular
configuration, can learn efficient flow control strategies owing to its
capacity for self-learning and data-driven state estimation in complex
fluid-dynamic problems. In this work, we present a DRL-based real-time feedback
strategy to control the hydrodynamic force on the fluidic pinball, i.e., force
extremum and tracking, via the cylinders' rotation. By adequately designing reward
functions and encoding historical observations, and after automatic learning of
thousands of iterations, the DRL-based control was shown to make reasonable and
valid control decisions in nonparametric control parameter space, which is
comparable to and even better than the optimal policy found through lengthy
brute-force searching. Subsequently, one of these results was analyzed by a
machine learning model that enabled us to shed light on the basis of
decision-making and physical mechanisms of the force-tracking process. The
findings of this work enable hydrodynamic force control in the operation of the
fluidic pinball system and potentially pave the way for exploring efficient
active flow control strategies in other complex fluid-dynamic problems.
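The abstract mentions reward-function design and encoding of historical observations but gives neither explicitly. A hedged sketch of both ideas (the penalty form and window length are assumptions, not the paper's choices) is:

```python
from collections import deque

def tracking_reward(force, target, weight=1.0):
    """Negative absolute tracking error: maximized when the measured
    hydrodynamic force equals the reference force."""
    return -weight * abs(force - target)

class HistoryEncoder:
    """Fixed-length window of past (observation, action) pairs, fed to
    the policy instead of a single instantaneous measurement so the
    agent can infer the flow's dynamics."""

    def __init__(self, length=4):
        self.buf = deque(maxlen=length)

    def push(self, obs, action):
        self.buf.append((obs, action))

    def state(self):
        flat = []
        for obs, action in self.buf:
            flat.extend([obs, action])
        return flat
```

During training, each step would push the latest force measurement and rotation command into the encoder and score the step with `tracking_reward`.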
Aprendizagem de coordenação em sistemas multi-agente (Learning coordination in multi-agent systems)
The ability of an agent to coordinate with others within a system is a
valuable property in multi-agent systems. Agents either cooperate as a team
to accomplish a common goal, or adapt to opponents to complete different
goals without being exploited. Research has shown that learning multi-agent
coordination is significantly more complex than learning policies in
single-agent environments, and requires a variety of techniques to deal with the
properties of a system where agents learn concurrently. This thesis aims to
determine how machine learning can be used to achieve coordination within
a multi-agent system. It asks what techniques can be used to tackle the
increased complexity of such systems and their credit assignment challenges,
how to achieve coordination, and how to use communication to improve the
behavior of a team.
Many algorithms for competitive environments are tabular, preventing
their use with high-dimensional or continuous state spaces, and may be
biased against specific equilibrium strategies. This thesis proposes multiple
deep learning extensions for competitive environments, allowing algorithms
to reach equilibrium strategies in complex and partially-observable environments,
relying only on local information. A tabular algorithm is also extended
with a new update rule that eliminates its bias against deterministic strategies.
Current state-of-the-art approaches for cooperative environments rely
on deep learning to handle the environment’s complexity and benefit from a
centralized learning phase. Solutions that incorporate communication between
agents often prevent agents from being executed in a distributed
manner. This thesis proposes a multi-agent algorithm where agents learn
communication protocols to compensate for local partial-observability, and
remain independently executed. A centralized learning phase can incorporate
additional environment information to increase the robustness and speed with
which a team converges to successful policies. The algorithm outperforms
current state-of-the-art approaches in a wide variety of multi-agent environments.
A permutation-invariant network architecture is also proposed
to increase the scalability of the algorithm to large team sizes. Further research
is needed to identify how the techniques proposed in this thesis,
for cooperative and competitive environments, can be used in unison for mixed
environments, and whether they are adequate for general artificial intelligence.
Financial support from FCT and the FSE within the scope of the III Community Support Framework. Doctoral Programme in Informatics.
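The permutation-invariant architecture the thesis proposes is not detailed in the abstract. One standard way to obtain permutation invariance over teammates, sketched here as an assumption rather than the thesis's actual network, is to pass every teammate's features through the same embedding and pool them symmetrically:

```python
def embed(feat, w=0.5, b=0.1):
    """Shared per-teammate embedding (a stand-in for a small MLP
    applied identically to every teammate's feature vector)."""
    return [w * x + b for x in feat]

def team_encoding(teammates):
    """Mean-pool the shared embeddings: reordering teammates cannot
    change the output, and team size can vary freely."""
    embs = [embed(f) for f in teammates]
    dim = len(embs[0])
    return [sum(e[j] for e in embs) / len(embs) for j in range(dim)]
```

Because the pooled encoding has a fixed size regardless of how many teammates are observed, the same policy head scales to large teams without retraining its input layer.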
Cooperative planning for an unmanned combat aerial vehicle fleet using reinforcement learning
In this study, reinforcement learning (RL)-based centralized path planning is performed for an unmanned combat aerial vehicle (UCAV) fleet in a human-made hostile environment. The proposed method provides a novel approach in which closing-speed and approximate time-to-go terms are used in the reward function to obtain cooperative motion while satisfying no-fly-zone (NFZ) and time-of-arrival constraints. The proximal policy optimization (PPO) algorithm is used in the training phase of the RL agent. System performance is evaluated in two different cases. In case 1, the warfare environment contains only the target area, and simultaneous arrival is desired to obtain the saturated-attack effect. In case 2, the warfare environment contains NFZs in addition to the target area and the standard saturated-attack and collision-avoidance requirements. A particle swarm optimization (PSO)-based cooperative path planning algorithm is implemented as the baseline method and compared with the proposed algorithm in terms of execution time and the developed performance metrics. Monte Carlo simulation studies are performed to evaluate system performance. According to the simulation results, the proposed system is able to generate feasible flight paths in real time while considering physical and operational constraints such as acceleration limits, NFZ restrictions, simultaneous arrival, and collision avoidance. In that respect, the approach provides a novel and computationally efficient method for solving the large-scale cooperative path planning problem for UCAV fleets.
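A reward combining closing-speed, time-to-go, and NFZ terms of the kind the abstract describes could be sketched as below; the coefficients and the exact functional form are illustrative assumptions, not the paper's reward:

```python
def time_to_go(dist, speed):
    """Approximate time-to-go: remaining distance over current speed."""
    return dist / max(speed, 1e-6)

def fleet_reward(dists, speeds, in_nfz,
                 k_close=1.0, k_sync=0.5, k_nfz=10.0):
    """Reward mean closing speed, penalize spread in the vehicles'
    times-to-go (to encourage simultaneous arrival), and heavily
    penalize any no-fly-zone violation."""
    closing = sum(speeds) / len(speeds)
    tgos = [time_to_go(d, s) for d, s in zip(dists, speeds)]
    sync_penalty = max(tgos) - min(tgos)   # arrival-time spread
    nfz_penalty = sum(1 for hit in in_nfz if hit)
    return k_close * closing - k_sync * sync_penalty - k_nfz * nfz_penalty
```

A centralized planner would evaluate this once per step over the whole fleet's state, so vehicles learn to trade individual speed against synchronized, NFZ-free arrival.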
Patient-specific simulation for autonomous surgery
An Autonomous Robotic Surgical System (ARSS) has to interact with a complex anatomical environment, which deforms and whose properties are often uncertain. Within this context, an ARSS can benefit from the availability of a patient-specific simulation of the anatomy. For example, simulation can provide a safe and controlled environment for the design, testing, and validation of autonomous capabilities. Moreover, it can be used to generate large amounts of patient-specific data that can be exploited to learn models and/or tasks. The aim of this Thesis is to investigate the different ways in which simulation can support an ARSS and to propose solutions that facilitate its adoption in robotic surgery. We first address all the phases needed to create such a simulation, from model choice in the pre-operative phase, based on the available knowledge, to its intra-operative update to compensate for inaccurate parametrization. We propose to rely on deep neural networks trained with synthetic data both to generate a patient-specific model and to design a strategy to update the model parametrization directly from intra-operative sensor data. Afterwards, we test how simulation can assist the ARSS, both for task learning and during task execution. We show that simulation can be used to efficiently train approaches that require multiple interactions with the environment, compensating for the risk of acquiring data from real surgical robotic systems. Finally, we propose a modular framework for autonomous surgery that includes deliberative functions to handle real anatomical environments with uncertain parameters. The integration of a personalized simulation proves fundamental both for optimal task planning and for enhancing and monitoring real execution.
The contributions presented in this Thesis have the potential to introduce significant step changes in the development and actual performance of autonomous robotic surgical systems, bringing them closer to applicability in real clinical conditions.
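The synthetic-data idea above can be illustrated with a deliberately toy example (the spring model and the log-log fit are assumptions for illustration only): simulate a forward model many times with known parameters, then fit an inverse map from observed deformation back to the parameter, so that intra-operative sensor data can update the simulation's parametrization:

```python
import math

def simulate_displacement(stiffness, force=1.0):
    """Toy forward model: displacement of a linear spring, u = F / k."""
    return force / stiffness

def fit_inverse_map(stiffness_samples):
    """Least-squares fit of log(k) = a * log(u) + b on synthetic
    (displacement, stiffness) pairs generated by the forward model."""
    xs = [math.log(simulate_displacement(k)) for k in stiffness_samples]
    ys = [math.log(k) for k in stiffness_samples]
    n = len(stiffness_samples)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    # Return the learned displacement -> stiffness estimator.
    return lambda u: math.exp(a * math.log(u) + b)

estimate_k = fit_inverse_map([0.5, 1.0, 2.0, 4.0])
```

In the Thesis's setting, the forward model is a deformable patient-specific simulation and the fitted map is a deep network, but the loop is the same: synthetic pairs in, parameter estimator out.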