Visual GUI testing in practice: An extended industrial case study
Context: Visual GUI testing (VGT) is referred to as the latest generation of
GUI-based testing. It is a tool-driven technique that uses image recognition
for interacting with and asserting the behavior of the system under test.
Motivated by the industrial needs of a large Turkish software and systems
company providing solutions in the defense and IT sectors, an action-research
project was recently initiated to implement VGT in several teams and projects
in the company.
Objective: To address the above needs, we planned and carried out an
empirical investigation with the goal of assessing VGT using two tools (Sikuli
and JAutomate). The purpose was to determine a suitable approach and tool for
VGT of a given project (software product) in the company and to increase the
know-how of the company's test teams.
Method: Using an action-research case-study design, we investigated the use
of VGT in the studied organization. Specifically, using the two selected VGT
tools, we conducted a quantitative and a qualitative evaluation of VGT.
Results: By assessing the list of Challenges, Problems and Limitations (CPL)
proposed in previous work in the context of our empirical study, we found that
the test-tool- and SUT-related CPLs were quite comparable to those of a
previous empirical study; e.g., the synchronization between the SUT and the
test tools was not always robust, and there were failures in the test tools'
image recognition features. When assessing the types of test-maintenance
activities required to execute the automated test cases on the next versions
of the SUTs, we found that about half of the test cases failed on the next
version (59.1% and 47.8% for the two test tools, respectively).
Conclusion: Our results confirm some of the previously reported issues
encountered when conducting VGT. Further, we highlight some additional
challenges in test maintenance when using VGT.
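
For readers unfamiliar with VGT, a minimal SikuliX-style script (Python/Jython)
illustrates the interaction model; the image file names and timeouts below are
hypothetical placeholders, not artifacts from the study:

    # Minimal VGT sketch in SikuliX's Python (Jython) scripting API.
    # All image files and timeouts are hypothetical placeholders.

    # Wait up to 10 seconds for the login screen to appear on screen.
    wait("login_screen.png", 10)

    # Interact with the SUT purely through image recognition.
    click("username_field.png")
    type("test_user")
    click("login_button.png")

    # Assert the expected behavior: the dashboard must become visible.
    assert exists("dashboard.png", 15), "Login did not reach the dashboard"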
Sample-Efficient Policy Learning based on Completely Behavior Cloning
Direct policy search is one of the most important algorithms in reinforcement
learning. However, learning from scratch requires a large amount of experience
data and is easily prone to poor local optima. In addition, a partially
trained policy tends to take actions that are dangerous to the agent and its
environment. To overcome these challenges, this paper proposes a
policy initialization algorithm called Policy Learning based on Completely
Behavior Cloning (PLCBC). PLCBC first transforms the Model Predictive Control
(MPC) controller into a piecewise affine (PWA) function using multi-parametric
programming, and uses a neural network to express this function. In this way,
PLCBC can completely clone the MPC controller without any performance loss,
and is entirely training-free. The experiments show that this initialization
strategy helps the agent learn in high-reward regions of the state space and
converge faster to better solutions.
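
To illustrate the idea (a sketch under our own assumptions, with a toy
one-dimensional explicit-MPC solution): multi-parametric programming yields a
piecewise affine law u(x) = K_i x + k_i over polyhedral regions
{x : A_i x <= b_i}, and since any continuous PWA function is exactly
representable by a ReLU network, the controller can be cloned without
approximation error.

    import numpy as np

    # Toy one-dimensional explicit-MPC solution: each region i is a
    # polyhedron {x : A_i x <= b_i} with an affine law u = K_i x + k_i.
    # All numbers here are hypothetical placeholders.
    regions = [
        {"A": np.array([[ 1.0]]), "b": np.array([0.0]),
         "K": np.array([[-2.0]]), "k": np.array([0.5])},   # active for x <= 0
        {"A": np.array([[-1.0]]), "b": np.array([0.0]),
         "K": np.array([[-1.0]]), "k": np.array([0.0])},   # active for x >= 0
    ]

    def pwa_controller(x):
        """Evaluate the piecewise affine (PWA) control law at state x."""
        for r in regions:
            if np.all(r["A"] @ x <= r["b"] + 1e-9):
                return r["K"] @ x + r["k"]
        raise ValueError("state outside all regions")

    # A ReLU network can represent any continuous PWA function exactly, so
    # the law above can be cloned with zero loss, e.g. by regression on
    # samples (x, pwa_controller(x)) -- no environment interaction needed.
    print(pwa_controller(np.array([-1.0])))   # -> [2.5]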
Comparing Deep Reinforcement Learning and Evolutionary Methods in Continuous Control
Reinforcement Learning and Evolutionary Strategies are two major approaches
to addressing complicated control problems. Both are strong contenders and have
their own devotee communities. Both groups have been very active in developing
new advances in their own domain and devising, in recent years, leading-edge
techniques to address complex continuous control tasks. Here, in the context of
Deep Reinforcement Learning, we formulate a parallelized version of the
Proximal Policy Optimization method and a Deep Deterministic Policy Gradient
method. Moreover, we conduct a thorough comparison between the state-of-the-art
techniques in both camps for continuous control: evolutionary methods and Deep
Reinforcement Learning methods. The results show there is no consistent winner.
Comment: NIPS 2017 Deep Reinforcement Learning Symposium
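
To make the contrast concrete, here is a minimal sketch of the
evolutionary-strategies side in the style of OpenAI-ES (our illustration, not
the paper's implementation): policy parameters are perturbed with Gaussian
noise and moved along reward-weighted noise directions, with no
backpropagation. The reward_fn argument (an episode-return evaluator) is a
hypothetical placeholder.

    import numpy as np

    def es_gradient_step(theta, reward_fn, sigma=0.1, lr=0.01, n=50, rng=None):
        """One evolution-strategies update on parameter vector theta."""
        if rng is None:
            rng = np.random.default_rng()
        # Sample n Gaussian perturbations of the policy parameters.
        eps = rng.standard_normal((n, theta.size))
        # Evaluate the return of each perturbed policy.
        rewards = np.array([reward_fn(theta + sigma * e) for e in eps])
        # Rank-free normalization stabilizes the update scale.
        rewards = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        # Reward-weighted average of the noise estimates the gradient.
        grad = (rewards[:, None] * eps).mean(axis=0) / sigma
        return theta + lr * grad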
Understanding Multi-Step Deep Reinforcement Learning: A Systematic Study of the DQN Target
Multi-step methods such as Retrace(λ) and n-step Q-learning have
become a crucial component of modern deep reinforcement learning agents. These
methods are often evaluated as a part of bigger architectures and their
evaluations rarely include enough samples to draw statistically significant
conclusions about their performance. This type of methodology makes it
difficult to understand how particular algorithmic details of multi-step
methods influence learning. In this paper we combine the n-step action-value
algorithms Retrace, Q-learning, Tree Backup, Sarsa, and Q(σ) with an
architecture analogous to DQN. We test the performance of all these algorithms
in the mountain car environment; this choice of environment allows for faster
training times and larger sample sizes. We present statistical analyses on the
effects of the off-policy correction, the backup length parameter n, and the
update frequency of the target network on the performance of these algorithms.
Our results show that (1) using off-policy correction can have an adverse
effect on the performance of Sarsa and Q(σ); (2) increasing the backup
length n consistently improved performance across all the different
algorithms; and (3) the performance of Sarsa and Q-learning was more robust
to the effect of the target network update frequency than the performance of
Tree Backup, Q(σ), and Retrace in this particular task.
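
For readers unfamiliar with the backup length parameter n, a minimal sketch of
a DQN-style n-step Q-learning target (our illustration, not the paper's code):

    import numpy as np

    def n_step_q_target(rewards, q_next, gamma, n):
        """DQN-style n-step Q-learning target:
        G_t = r_t + gamma*r_{t+1} + ... + gamma^{n-1}*r_{t+n-1}
              + gamma^n * max_a Q_target(s_{t+n}, a).
        `rewards` holds the n observed rewards; `q_next` holds the
        target-network action values at s_{t+n} (zeros if terminal)."""
        g = sum(gamma**k * r for k, r in enumerate(rewards[:n]))
        return g + gamma**n * np.max(q_next)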
Fast Skill Learning for Variable Compliance Robotic Assembly
Robotic assembly represents a group of benchmark problems for
reinforcement learning and variable compliance control that feature
sophisticated contact manipulation. One of the key challenges in applying
reinforcement learning to physical robots is sample complexity, i.e., the
requirement of large amounts of experience for learning. We mitigate this
sample complexity problem by incorporating an iteratively refitted model into
the learning process through model-guided exploration. Yet, fitting a local
model of the physical environment poses major difficulties. In this work, a
Kalman filter is used to combine the adaptive linear dynamics with a coarse
prior model derived from an analytical description, and is shown to give more
accurate predictions than the existing method. Experimental results show that the
proposed model fitting strategy can be incorporated into a model predictive
controller to generate good exploration behaviors for learning acceleration,
while preserving the benefits of model-free reinforcement learning for
uncertain environments. In addition to sample complexity, the inevitable
overloading of the robot during operation also tends to limit learning
efficiency. To address this problem, we present a method to restrict the
largest possible potential energy in the compliance control system and thereby
keep the contact force within its legitimate range.
Comment: 10 pages, 5 figures
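
As an illustration of the general idea (not the authors' implementation), a
Kalman-style recursive update of linear dynamics parameters that starts from a
coarse analytical prior might look like the sketch below; all names and noise
settings are hypothetical.

    import numpy as np

    class PriorBlendedLinearModel:
        """Recursive (Kalman-style) estimate of linear dynamics x' ~ W z,
        where z = [x; u; 1], starting from a coarse analytical prior W0."""

        def __init__(self, W0, prior_var=1.0, obs_var=0.1):
            self.W = W0.copy()                        # parameter mean, (dx, dz)
            self.P = prior_var * np.eye(W0.shape[1])  # parameter covariance
            self.obs_var = obs_var

        def update(self, z, x_next):
            # Kalman gain for the regression observation x_next = W z + noise.
            Pz = self.P @ z
            gain = Pz / (z @ Pz + self.obs_var)
            residual = x_next - self.W @ z
            self.W += np.outer(residual, gain)
            self.P -= np.outer(gain, Pz)

        def predict(self, z):
            return self.W @ z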
An Integrated Framework for Process Discovery Algorithm Evaluation
Process mining offers techniques to exploit event data by providing insights
and recommendations to improve business processes. The growing amount of
algorithms for process discovery has raised the question of which algorithms
perform best on a given event log. Current evaluation frameworks for
empirically evaluating discovery techniques depend on the notation used
(behaviorally identical models may give different results) and cannot provide
more general statements about populations of models. Therefore, this paper
proposes a new integrated evaluation framework that uses a classification
approach to make it modeling notation independent. Furthermore, it is founded
on experimental design to ensure the generalization of results. It supports two
main evaluation objectives: benchmarking process discovery algorithms and
sensitivity analysis, i.e. studying the effect of model and log characteristics
on a discovery algorithm's accuracy. The framework is designed as a scientific
workflow which enables automated, extendable and shareable evaluation
experiments. An extensive experiment including four discovery algorithms and
six control-flow characteristics validates the relevance and flexibility of the
framework. Ultimately, the paper aims to advance the state-of-the-art for
evaluating process discovery techniques
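
One plausible reading of the classification approach, sketched below under our
own assumptions (the function model_accepts, which replays a trace against a
discovered model, is a hypothetical placeholder): the discovered model is
treated as a binary classifier over traces, which makes the measure
independent of the modeling notation.

    def classification_accuracy(model_accepts, positive_traces, negative_traces):
        """Evaluate a discovered process model as a binary trace classifier:
        it should accept traces the real process can produce and reject
        traces it cannot."""
        tp = sum(model_accepts(t) for t in positive_traces)
        tn = sum(not model_accepts(t) for t in negative_traces)
        return (tp + tn) / (len(positive_traces) + len(negative_traces))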
Adaptive Dialog Policy Learning with Hindsight and User Modeling
Reinforcement learning methods have been used to compute dialog policies from
language-based interaction experiences. Efficiency is of particular importance
in dialog policy learning, because of the considerable cost of interacting with
people, and the very poor user experience from low-quality conversations.
Aiming at improving the efficiency of dialog policy learning, we develop
an algorithm called LHUA (Learning with Hindsight, User modeling, and Adaptation) that,
for the first time, enables dialog agents to adaptively learn with hindsight
from both simulated and real users. Simulation and hindsight provide the dialog
agent with more experience and more (positive) reinforcements, respectively.
Experimental results suggest that, in success rate and policy quality, LHUA
outperforms competitive baselines from the literature, including its
no-simulation, no-adaptation, and no-hindsight counterparts.
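
As background, hindsight relabeling in its generic form (a sketch of the
general idea only; LHUA's dialog-specific details are not reproduced here):
a dialog that failed its original goal is stored again as if the goal it
actually achieved had been the intended one, turning a failure into a
positive learning example.

    def hindsight_relabel(dialog, achieved_goal, reward_fn):
        """Relabel a goal-conditioned dialog trajectory with the goal it
        actually achieved. `dialog` is a list of
        (state, action, reward, next_state, goal) tuples."""
        relabeled = []
        for (state, action, _, next_state, _) in dialog:
            # Recompute the reward as if achieved_goal had been intended.
            reward = reward_fn(next_state, achieved_goal)
            relabeled.append((state, action, reward, next_state, achieved_goal))
        return relabeled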
Organizing Experience: A Deeper Look at Replay Mechanisms for Sample-based Planning in Continuous State Domains
Model-based strategies for control are critical to obtain sample efficient
learning. Dyna is a planning paradigm that naturally interleaves learning and
planning, by simulating one-step experience to update the action-value
function. This elegant planning strategy has been mostly explored in the
tabular setting. The aim of this paper is to revisit sample-based planning, in
stochastic and continuous domains with learned models. We first highlight the
flexibility afforded by a model over Experience Replay (ER). Replay-based
methods can be seen as stochastic planning methods that repeatedly sample from
a buffer of recent agent-environment interactions and perform updates to
improve data efficiency. We show that a model, as opposed to a replay buffer,
is particularly useful for specifying which states to sample from during
planning, such as predecessor states that propagate information in reverse from
a state more quickly. We introduce a semi-parametric model learning approach,
called Reweighted Experience Models (REMs), that makes it simple to sample next
states or predecessors. We demonstrate that REM-Dyna exhibits similar
advantages over replay-based methods in learning in continuous state problems,
and that the performance gap grows when moving to stochastic domains of
increasing size.
Comment: IJCAI 2018
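
To make the predecessor-sampling idea concrete, here is a minimal tabular
sketch (our illustration; REMs extend this to continuous, stochastic settings
with learned, semi-parametric models):

    from collections import defaultdict

    ACTIONS = [0, 1]
    Q = defaultdict(float)            # (state, action) -> action value
    model = {}                        # (state, action) -> (reward, next_state)
    predecessors = defaultdict(set)   # state -> {(state, action)} reaching it

    def observe(s, a, r, s2):
        """Record real experience in the model and the predecessor index."""
        model[(s, a)] = (r, s2)
        predecessors[s2].add((s, a))

    def plan_backwards(s2, alpha=0.1, gamma=0.99):
        """Propagate value information in reverse: update every (s, a) the
        model says can reach s2, using simulated one-step experience."""
        for (s, a) in predecessors[s2]:
            r, s_next = model[(s, a)]
            target = r + gamma * max(Q[(s_next, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])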
Characterizing Input Methods for Human-to-robot Demonstrations
Human demonstrations are important in a range of robotics applications, and
are created with a variety of input methods. However, the design space for
these input methods has not been extensively studied. In this paper, focusing
on demonstrations of hand-scale object manipulation tasks to robot arms with
two-finger grippers, we identify distinct usage paradigms in robotics that
utilize human-to-robot demonstrations, extract abstract features that form a
design space for input methods, and characterize existing input methods as well
as a novel input method that we introduce, the instrumented tongs. We detail
the design specifications for our method and present a user study that compares
it against three common input methods: free-hand manipulation, kinesthetic
guidance, and teleoperation. Study results show that instrumented tongs provide
high quality demonstrations and a positive experience for the demonstrator
while offering good correspondence to the target robot.
Comment: 2019 ACM/IEEE International Conference on Human-Robot Interaction
(HRI)
Differential Variable Speed Limits Control for Freeway Recurrent Bottlenecks via Deep Reinforcement Learning
Variable speed limits (VSL) control is a flexible way to improve traffic
conditions, increase safety, and reduce emissions. There is an emerging trend
of using reinforcement learning techniques for VSL control, and recent studies
have shown promising results. Currently, deep learning is enabling
reinforcement learning to develop autonomous control agents for problems that were
previously intractable. In this paper, we propose a more effective deep
reinforcement learning (DRL) model for differential variable speed limits
(DVSL) control, in which dynamic and distinct speed limits among lanes can
be imposed. The proposed DRL models use a novel actor-critic architecture which
can learn a large number of discrete speed limits in a continuous action space.
Different reward signals, e.g., total travel time, bottleneck speed, emergency
braking, and vehicular emissions, are used to train the DVSL controller, and a
comparison between these reward signals is conducted. We test the proposed
DRL-based DVSL controllers on a simulated freeway recurrent bottleneck. Results
show that efficiency, safety, and emissions can all be improved by the proposed
method. We also report some interesting findings obtained through the
visualization of the control policies generated by the DRL models.
Comment: 24 pages, 7 figures, 1 table
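
One plausible way to let a continuous-action actor select among many discrete
speed limits, sketched under our own assumptions (the speed-limit set and the
mapping are hypothetical, not taken from the paper):

    import numpy as np

    SPEED_LIMITS = np.array([60, 70, 80, 90, 100, 110])  # km/h, hypothetical

    def actions_to_speed_limits(actor_output):
        """Map the actor's continuous per-lane outputs in [0, 1] to one
        discrete speed limit per lane, so that a continuous-action policy
        can select from a large discrete set."""
        idx = np.clip((actor_output * len(SPEED_LIMITS)).astype(int),
                      0, len(SPEED_LIMITS) - 1)
        return SPEED_LIMITS[idx]

    # Example: a 4-lane actor output.
    print(actions_to_speed_limits(np.array([0.05, 0.4, 0.7, 0.99])))
    # -> [ 60  80 100 110]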