A Comparison of Reinforcement Learning Frameworks for Software Testing Tasks
Software testing activities scrutinize the artifacts and the behavior of a
software product to find possible defects and ensure that the product meets its
expected requirements. Recently, Deep Reinforcement Learning (DRL) has been
successfully employed in complex testing tasks such as game testing, regression
testing, and test case prioritization to automate the process and provide
continuous adaptation. Practitioners can employ DRL either by implementing a
DRL algorithm from scratch or by using a DRL framework. DRL frameworks offer
well-maintained implementations of state-of-the-art DRL algorithms that
facilitate and speed up the development of DRL applications. Developers have
widely used these frameworks to solve problems in various domains, including
software testing.
However, to the best of our knowledge, no study empirically evaluates the
effectiveness and performance of the algorithms implemented in DRL frameworks.
Moreover, the literature lacks guidelines that would help practitioners choose
one DRL framework over another. In this paper,
we empirically investigate the application of carefully selected DRL
algorithms to two important software testing tasks: test case prioritization
in the context of Continuous Integration (CI) and game testing. For the game
testing task, we conduct experiments on a simple game and use DRL algorithms
to explore the game and detect bugs. Results show that some of the selected
DRL frameworks, such as Tensorforce, outperform recent approaches in the
literature.
To prioritize test cases, we run experiments in a CI environment where DRL
algorithms from different frameworks are used to rank the test cases. Our
results show that the performance differences between the implemented
algorithms are in some cases considerable, motivating further investigation.
Comment: Accepted for publication at EMSE (Empirical Software Engineering
journal) 202
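To make the framework-based workflow concrete, the sketch below shows how a
testing task can be wrapped as a Gym-style environment and trained with an
off-the-shelf DRL framework. This is a minimal, hypothetical illustration,
not the paper's implementation: the abstract names Tensorforce among the
compared frameworks, but the example uses Stable-Baselines3's PPO for brevity,
and the environment's features, reward, and class names are invented
placeholders rather than the paper's formulation.

```python
# Hypothetical sketch: a toy test-case-prioritization task as a Gym-style
# environment, trained with an off-the-shelf DRL framework (Stable-Baselines3).
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO


class TestPrioritizationEnv(gym.Env):
    """Toy CI environment: at each step the agent picks the next test to run."""

    def __init__(self, n_tests=10):
        super().__init__()
        self.n_tests = n_tests
        # One illustrative feature per test (e.g. recent failure rate),
        # masked to zero once a test has been selected.
        self.observation_space = spaces.Box(0.0, 1.0, shape=(n_tests,), dtype=np.float32)
        self.action_space = spaces.Discrete(n_tests)

    def _obs(self):
        mask = np.zeros(self.n_tests, dtype=np.float32)
        mask[list(self.remaining)] = 1.0
        return self.fail_prob * mask

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.fail_prob = self.np_random.random(self.n_tests).astype(np.float32)
        self.remaining = set(range(self.n_tests))
        return self._obs(), {}

    def step(self, action):
        # Reward early selection of failure-prone tests; penalize repeats.
        if action in self.remaining:
            self.remaining.remove(action)
            reward = float(self.fail_prob[action])
        else:
            reward = -1.0
        terminated = not self.remaining
        return self._obs(), reward, terminated, False, {}


env = TestPrioritizationEnv()
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=5_000)
```

Swapping in another framework's PPO or DQN against the same environment is
the kind of head-to-head comparison the paper's experiments perform.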
How to Certify Machine Learning Based Safety-critical Systems? A Systematic Literature Review
Context: Machine Learning (ML) has been at the heart of many innovations over
the past years. However, including it in so-called 'safety-critical' systems
such as automotive or aeronautic systems has proven to be very challenging,
since the paradigm shift that ML brings completely changes traditional
certification approaches.
Objective: This paper aims to elucidate challenges related to the
certification of ML-based safety-critical systems, as well as the solutions
that are proposed in the literature to tackle them, answering the question 'How
to Certify Machine Learning Based Safety-critical Systems?'.
Method: We conduct a Systematic Literature Review (SLR) of research papers
published between 2015 and 2020, covering topics related to the certification of
ML systems. In total, we identified 217 papers covering topics considered to be
the main pillars of ML certification: Robustness, Uncertainty, Explainability,
Verification, Safe Reinforcement Learning, and Direct Certification. We
analyzed the main trends and problems of each sub-field and provided summaries
of the papers extracted.
Results: The SLR results highlighted the enthusiasm of the community for this
subject, as well as the lack of diversity in terms of datasets and types of
models. They also emphasized the need to further develop connections between
academia and industry to deepen the study of the domain. Finally, they
illustrated the necessity of building connections between the above-mentioned
pillars, which are for now mainly studied separately.
Conclusion: We highlighted current efforts deployed to enable the
certification of ML-based software systems and discussed some future research
directions.
Comment: 60 pages (92 pages with references and complements), submitted to a
journal (Automated Software Engineering). Changes: emphasizing the difference
between traditional software engineering and ML approaches; adding Related
Works, Threats to Validity, and Complementary Materials; adding a table
listing paper references for each section/subsection
On Assessing the Safety of Reinforcement Learning Algorithms Using Formal Methods
The increasing adoption of Reinforcement Learning in safety-critical domains
such as autonomous vehicles, healthcare, and aviation raises the need to
ensure its safety. Existing safety mechanisms such as adversarial training,
adversarial detection, and robust learning are not always adapted to all the
disturbances under which the agent is deployed. Those disturbances include
moving adversaries whose behavior can be unpredictable to the agent and, in
fact, harmful to its learning. Ensuring the safety of critical systems also
requires methods that give formal guarantees on the behavior of the agent
evolving in a perturbed environment. It is therefore necessary to propose new
solutions adapted to the learning challenges faced by the agent. In this paper,
we first generate adversarial agents that expose flaws in the agent's policy
by introducing moving adversaries. Second, we use reward shaping and a
modified Q-learning algorithm as defense mechanisms to improve the agent's
policy when facing adversarial perturbations. Finally, probabilistic model
checking is employed to evaluate the effectiveness of both mechanisms. We have
conducted experiments on a discrete grid world with a single agent facing
non-learning and learning adversaries. Our results show a reduction in the
number of collisions between the agent and the adversaries. Probabilistic
model checking provides lower and upper probabilistic bounds on the agent's
safety in the adversarial environment.
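As a rough illustration of the shaping-based defense described above, the
following sketch runs tabular Q-learning on a small grid world and shapes the
reward to penalize proximity to a moving adversary. The grid size, shaping
term, learning constants, and randomly moving adversary are all assumptions
made for illustration; they do not reproduce the paper's modified Q-learning
algorithm or its experimental setup.

```python
# Illustrative sketch: tabular Q-learning with a reward-shaping term that
# penalizes proximity to a moving adversary on a small grid world.
import random

N = 5                        # grid is N x N (assumed size)
GOAL = (N - 1, N - 1)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1   # illustrative learning constants


def clamp(p):
    """Keep a position inside the grid."""
    return (min(max(p[0], 0), N - 1), min(max(p[1], 0), N - 1))


def shaped_reward(agent, adversary):
    base = 1.0 if agent == GOAL else -0.01           # sparse task reward
    dist = abs(agent[0] - adversary[0]) + abs(agent[1] - adversary[1])
    penalty = -1.0 if dist == 0 else -0.2 / dist     # shaping: keep distance
    return base + penalty


Q = {}                       # Q[(agent_pos, adversary_pos)] -> action values


def q(s):
    return Q.setdefault(s, [0.0] * len(ACTIONS))


for episode in range(2000):
    agent, adversary = (0, 0), (N - 1, 0)
    for _ in range(50):
        s = (agent, adversary)
        # Epsilon-greedy action selection.
        a = random.randrange(len(ACTIONS)) if random.random() < EPS \
            else max(range(len(ACTIONS)), key=lambda i: q(s)[i])
        agent = clamp((agent[0] + ACTIONS[a][0], agent[1] + ACTIONS[a][1]))
        # Assumed adversary: a random walker (the paper also uses learning ones).
        move = random.choice(ACTIONS)
        adversary = clamp((adversary[0] + move[0], adversary[1] + move[1]))
        r = shaped_reward(agent, adversary)
        s2 = (agent, adversary)
        # Standard Q-learning update on the shaped reward.
        q(s)[a] += ALPHA * (r + GAMMA * max(q(s2)) - q(s)[a])
        if agent == GOAL or agent == adversary:
            break
```

In the paper's pipeline, the resulting policy would then be analyzed with a
probabilistic model checker to obtain lower and upper bounds on collision
probabilities, a step not shown here.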