1,973 research outputs found
Sim-to-Real Transfer of Robotic Control with Dynamics Randomization
Simulations are attractive environments for training agents as they provide
an abundant source of data and alleviate certain safety concerns during the
training process. But the behaviours developed by agents in simulation are
often specific to the characteristics of the simulator. Due to modeling error,
strategies that are successful in simulation may not transfer to their real
world counterparts. In this paper, we demonstrate a simple method to bridge
this "reality gap". By randomizing the dynamics of the simulator during
training, we are able to develop policies that are capable of adapting to very
different dynamics, including ones that differ significantly from the dynamics
on which the policies were trained. This adaptivity enables the policies to
generalize to the dynamics of the real world without any training on the
physical system. Our approach is demonstrated on an object pushing task using a
robotic arm. Despite being trained exclusively in simulation, our policies are
able to maintain a similar level of performance when deployed on a real robot,
reliably moving an object to a desired location from random initial
configurations. We explore the impact of various design decisions and show that
the resulting policies are robust to significant calibration error
Reinforcement learning for efficient network penetration testing
Penetration testing (also known as pentesting or PT) is a common practice for actively assessing the defenses of a computer network by planning and executing all possible attacks to discover and exploit existing vulnerabilities. Current penetration testing methods are increasingly becoming non-standard, composite and resource-consuming despite the use of evolving tools. In this paper, we propose and evaluate an AI-based pentesting system which makes use of machine learning techniques, namely reinforcement learning (RL) to learn and reproduce average and complex pentesting activities. The proposed system is named Intelligent Automated Penetration Testing System (IAPTS) consisting of a module that integrates with industrial PT frameworks to enable them to capture information, learn from experience, and reproduce tests in future similar testing cases. IAPTS aims to save human resources while producing much-enhanced results in terms of time consumption, reliability and frequency of testing. IAPTS takes the approach of modeling PT environments and tasks as a partially observed Markov decision process (POMDP) problem which is solved by POMDP-solver. Although the scope of this paper is limited to network infrastructures PT planning and not the entire practice, the obtained results support the hypothesis that RL can enhance PT beyond the capabilities of any human PT expert in terms of time consumed, covered attacking vectors, accuracy and reliability of the outputs. In addition, this work tackles the complex problem of expertise capturing and re-use by allowing the IAPTS learning module to store and re-use PT policies in the same way that a human PT expert would learn but in a more efficient way
Reinforcement Learning based Circuit Compilation via ZX-calculus
Màster Oficial de Ciència i Tecnologia Quàntiques / Quantum Science and Technology, Facultat de Física, Universitat de Barcelona. Curs: 2022-2023. Tutors: Jordi Riu, Marta P EstarellasZX-calculus is a formalism that can be used for quantum circuit compilation and optimization. We developed a Reinforcement Learning approach for enhanced circuit optimization via the ZX-diagram graph representation of the quantum circuit. The agent is trained using the well-established Proximal Policy Optimization (PPO) algorithm, and it uses Conditional Action Trees to perform Invalid Action Masking to reduce the space of actions available to the agent and speed up its training. Additionally, we also design and implement a Genetic Algorithm for the same task. Both the genetic algorithm and the most widely used ZX-calculus-based library for circuit optimization, the PyZX library, are used to benchmark our RL approach. We find our RL algorithm to be competitive against both approaches, but further exploration is required
Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning
Despite recent advances in natural language understanding and generation, and
decades of research on the development of conversational bots, building
automated agents that can carry on rich open-ended conversations with humans
"in the wild" remains a formidable challenge. In this work we develop a
real-time, open-ended dialogue system that uses reinforcement learning (RL) to
power a bot's conversational skill at scale. Our work pairs the succinct
embedding of the conversation state generated using SOTA (supervised) language
models with RL techniques that are particularly suited to a dynamic action
space that changes as the conversation progresses. Trained using crowd-sourced
data, our novel system is able to substantially exceeds the (strong) baseline
supervised model with respect to several metrics of interest in a live
experiment with real users of the Google Assistant
- …