Applying Thompson Sampling to Online Hypothesis Testing
Online hypothesis testing occurs in many branches of science. It is most useful when there are too many hypotheses to test with traditional multiple hypothesis testing, or when the hypotheses are created one by one. When testing multiple hypotheses one by one, the order in which the hypotheses are tested often has a great influence on the power of the procedure.
In this thesis we investigate the applicability of reinforcement learning tools to solve the exploration–exploitation problem that often arises in online hypothesis testing. We show that a common reinforcement learning tool, Thompson sampling, can be used to gain a modest amount of power when combined with an online hypothesis testing method called alpha-investing. Finally, we examine the size of this effect using both synthetic data and a practical case involving simulated urban pollution data.
We found that, by choosing the order of the tested hypotheses with Thompson sampling, the power of alpha-investing is improved. The level of improvement depends on the assumptions the experimenter is willing to make and on their validity. In a practical situation, the presented procedure rejected up to 6.8 percentage points more hypotheses than testing the hypotheses in a random order.
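The combination described in this abstract can be sketched in a few lines. The sketch below is our own toy illustration, not the thesis's procedure: it uses a simplified alpha-investing rule (bid half the current alpha-wealth, earn a fixed reward on rejection) and a Beta posterior per hypothesis stream for Thompson sampling. All names and parameter choices are assumptions for illustration.

```python
import random

def alpha_investing_with_ts(p_values, wealth=0.05, reward=0.05, seed=0):
    """Toy sketch: order hypothesis streams by Thompson sampling, test each
    hypothesis via a simplified alpha-investing rule.

    p_values: dict mapping stream name -> list of p-values (one per hypothesis).
    Each stream gets a Beta(1, 1) prior over its rejection probability; at each
    step we sample from every live stream's posterior and test the next
    hypothesis from the stream with the largest sampled value.
    """
    rng = random.Random(seed)
    post = {s: [1, 1] for s in p_values}          # Beta params: [rejections+1, non-rejections+1]
    queues = {s: list(ps) for s, ps in p_values.items()}
    rejections = 0
    while wealth > 0 and any(queues.values()):
        live = [s for s in queues if queues[s]]
        # Thompson sampling: pick the stream with the largest posterior draw.
        s = max(live, key=lambda t: rng.betavariate(*post[t]))
        p = queues[s].pop(0)
        alpha = wealth / 2                        # simplified bid: half the wealth
        if p <= alpha:                            # reject: pay the bid, earn the reward
            wealth += reward - alpha
            rejections += 1
            post[s][0] += 1
        else:                                     # fail to reject: pay alpha/(1-alpha)
            wealth -= alpha / (1 - alpha)
            post[s][1] += 1
    return rejections
```

A stream that keeps producing rejections is sampled more often, so its remaining hypotheses are tested earlier, while the wealth bookkeeping keeps the overall error rate controlled.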
Online Aircraft System Identification Using a Novel Parameter Informed Reinforcement Learning Method
This thesis presents the development and analysis of a novel method for training reinforcement learning neural networks for online aircraft system identification of multiple similar linear systems, such as all fixed wing aircraft. This approach, termed Parameter Informed Reinforcement Learning (PIRL), dictates that reinforcement learning neural networks should be trained using input and output trajectory/history data, as is conventional; however, the PIRL method also includes any known and relevant aircraft parameters, such as airspeed, altitude, and center of gravity location. Through this, the PIRL agent is better suited to identify novel/test-set aircraft.
First, the PIRL method is applied to mass-spring-damper systems with differing masses, spring constants, and damper constants. The reinforcement learning agent is trained using a random value for each constant within a fixed range. It is then tested over that same range, as well as over constants varying up to three times the trained range. The effect of varying the reinforcement learning agent's hyperparameters was observed, as was the performance of the agent with added sensor noise and with reduced PIRL parameters. These initial studies show that PIRL is able to create accurate models within a short timeframe. They additionally demonstrate robustness to significant sensor noise.
Second, a linear fixed wing aircraft longitudinal flight model is used to evaluate the effectiveness of PIRL in the context of aircraft system identification. The reinforcement learning agent is provided with simulated flight test data generated using stability and control parameters obtained using the United States Air Force's Stability and Control Digital DATCOM. Nine aircraft are selected as training aircraft and one for testing. The agent is trained with each training episode comprising a randomly chosen aircraft from the set, whose dynamics model is used to generate artificial online flight data. PIRL was evaluated with respect to its accuracy and speed of convergence and was found to generate models that are more accurate than those obtained using conventional reinforcement learning and extended Kalman filters.
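The core PIRL idea, as described in the abstract, is to augment the agent's conventional trajectory-history observation with known physical parameters. A minimal sketch of that construction for the mass-spring-damper case follows; the function names, the Euler integrator, and the window size are our own illustrative assumptions, not the thesis's code.

```python
import numpy as np

def simulate_msd(m, k, c, x0=1.0, v0=0.0, dt=0.01, steps=100):
    """Euler-integrate a mass-spring-damper: m*x'' + c*x' + k*x = 0.
    Returns an array of (position, velocity) samples."""
    x, v = x0, v0
    traj = []
    for _ in range(steps):
        a = -(c * v + k * x) / m
        v += a * dt
        x += v * dt
        traj.append((x, v))
    return np.array(traj)

def pirl_observation(traj, known_params, window=10):
    """PIRL-style observation (a sketch of the abstract's idea): the last
    `window` trajectory samples, as in conventional RL system identification,
    concatenated with whatever physical parameters are known (here m, k, c).
    An agent trained on many random parameter draws can then condition on the
    parameters of a novel test system."""
    history = traj[-window:].ravel()
    return np.concatenate([history, np.asarray(known_params, dtype=float)])
```

The key design point is that the policy network sees the parameters as plain extra input features, so a single trained agent can generalize across the family of similar linear systems rather than being retrained per system.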
Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled Systems
Deep Neural Networks (DNNs) have been widely used to perform real-world tasks
in cyber-physical systems such as Autonomous Driving Systems (ADS). Ensuring the
correct behavior of such DNN-Enabled Systems (DES) is a crucial topic. Online
testing is one of the promising modes for testing such systems with their
application environments (simulated or real) in a closed loop taking into
account the continuous interaction between the systems and their environments.
However, the environmental variables (e.g., lighting conditions) that might
change during the systems' operation in the real world, causing the DES to
violate requirements (safety, functional), are often kept constant during the
execution of an online test scenario due to two major challenges: (1) the
space of all possible scenarios to explore would become even larger if they
changed and (2) there are typically many requirements to test simultaneously.
In this paper, we present MORLOT (Many-Objective Reinforcement Learning for
Online Testing), a novel online testing approach to address these challenges by
combining Reinforcement Learning (RL) and many-objective search. MORLOT
leverages RL to incrementally generate sequences of environmental changes while
relying on many-objective search to determine the changes so that they are more
likely to achieve any of the uncovered objectives. We empirically evaluate
MORLOT using CARLA, a high-fidelity simulator widely used for autonomous
driving research, integrated with Transfuser, a DNN-enabled ADS for end-to-end
driving. The evaluation results show that MORLOT is significantly more
effective and efficient than alternatives with a large effect size. In other
words, MORLOT is a good option to test DES with dynamically changing
environments while accounting for multiple safety requirements.
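The RL-plus-many-objective-search combination described above can be illustrated with a deliberately tiny toy, not the authors' implementation: a tabular Q-learning agent changes one "environmental variable" (an integer) step by step and is rewarded for reducing the distance to whichever *uncovered* objective is currently closest, a stand-in for MORLOT's many-objective guidance. All names and numbers here are our own assumptions.

```python
import random

def morlot_sketch(targets, episodes=200, horizon=30, seed=0):
    """Toy sketch of the MORLOT idea: RL incrementally generates a sequence of
    environmental changes; a many-objective-style reward steers each change
    toward the nearest uncovered objective. An objective (stand-in for a
    requirement violation) is covered when the variable hits its target."""
    rng = random.Random(seed)
    q = {}                                  # Q-table: (state, action) -> value
    covered = set()
    actions = (-1, 1)                       # decrease / increase the variable
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            live = [t for t in targets if t not in covered]
            if not live:
                return covered
            # epsilon-greedy choice of the next environmental change
            if rng.random() < 0.2:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda x: q.get((s, x), 0.0))
            s2 = s + a
            # reward: reduction of distance to the closest uncovered objective
            d = min(abs(s - t) for t in live)
            d2 = min(abs(s2 - t) for t in live)
            r = d - d2
            covered.update(t for t in live if s2 == t)
            # tabular Q-learning update
            best_next = max(q.get((s2, x), 0.0) for x in actions)
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + 0.5 * (r + 0.9 * best_next - old)
            s = s2
    return covered
```

Once an objective is covered it drops out of the distance computation, so the reward automatically redirects the agent toward the remaining uncovered objectives, which mirrors the paper's motivation for combining RL with many-objective search.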
A Reinforcement Learning Framework for Time-Dependent Causal Effects Evaluation in A/B Testing
A/B testing, or online experimentation, is a standard business strategy to compare
a new product with an old one in pharmaceutical, technological, and traditional
industries. Major challenges arise in online experiments where there is only
one unit that receives a sequence of treatments over time. In those
experiments, the treatment at a given time impacts current outcome as well as
future outcomes. The aim of this paper is to introduce a reinforcement learning
framework for carrying out A/B testing, while characterizing the long-term
treatment effects. Our proposed testing procedure allows for sequential
monitoring and online updating, so it is generally applicable to a variety of
treatment designs in different industries. In addition, we systematically
investigate the theoretical properties (e.g., asymptotic distribution and
power) of our testing procedure. Finally, we apply our framework to both
synthetic datasets and a real-world data example obtained from a ride-sharing
company to illustrate its usefulness.
Dynamic causal effects evaluation in A/B testing with a reinforcement learning framework
A/B testing, or online experimentation, is a standard business strategy to compare a new product with an old one in pharmaceutical, technological, and traditional industries. Major challenges arise in online experiments on two-sided marketplace platforms (e.g., Uber), where there is only one unit that receives a sequence of treatments over time. In those experiments, the treatment at a given time impacts the current outcome as well as future outcomes. The aim of this article is to introduce a reinforcement learning framework for carrying out A/B testing in these experiments, while characterizing the long-term treatment effects. Our proposed testing procedure allows for sequential monitoring and online updating. It is generally applicable to a variety of treatment designs in different industries. In addition, we systematically investigate the theoretical properties (e.g., size and power) of our testing procedure. Finally, we apply our framework to both simulated data and a real-world data example obtained from a technological company to illustrate its advantage over the current practice. A Python implementation of our test is available at https://github.com/callmespring/CausalRL. Supplementary materials for this article are available online.
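The central difficulty both abstracts describe, a treatment applied now also affecting future outcomes, is exactly what value-based RL handles. The sketch below is a TD(0) toy of our own, not the papers' estimator or test statistic: it learns a long-run value for each arm of a single-unit switchback experiment and reports their difference as a long-term effect estimate.

```python
def long_term_effect(treatments, outcomes, gamma=0.9, lr=0.1):
    """Toy sketch: estimate a long-run value for each arm (0 = old, 1 = new)
    of a switchback experiment on one unit via TD(0). The bootstrap term
    gamma * v[next arm] credits future outcomes back to the current
    treatment, capturing carryover. Returns v[new] - v[old]."""
    v = {0: 0.0, 1: 0.0}
    for i in range(len(treatments) - 1):
        a, r, a_next = treatments[i], outcomes[i], treatments[i + 1]
        # TD(0) update: today's outcome plus discounted value of tomorrow's arm
        v[a] += lr * (r + gamma * v[a_next] - v[a])
    return v[1] - v[0]
```

Because the update runs one observation at a time, estimates of this kind can be refreshed after every time step, which is the property the papers exploit for sequential monitoring and online updating (their actual procedure additionally derives the size and power of the resulting test).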
A Large-Scale, Open-Domain, Mixed-Interface Dialogue-Based ITS for STEM
We present Korbit, a large-scale, open-domain, mixed-interface,
dialogue-based intelligent tutoring system (ITS). Korbit uses machine learning,
natural language processing and reinforcement learning to provide interactive,
personalized learning online. Korbit has been designed to easily scale to
thousands of subjects, by automating, standardizing and simplifying the content
creation process. Unlike other ITS, a teacher can develop new learning modules
for Korbit in a matter of hours. To facilitate learning across a wide range of
STEM subjects, Korbit uses a mixed-interface, which includes videos,
interactive dialogue-based exercises, question-answering, conceptual diagrams,
mathematical exercises and gamification elements. Korbit has been built to
scale to millions of students, by utilizing a state-of-the-art cloud-based
micro-service architecture. Korbit launched its first course in 2019 on machine
learning, and since then over 7,000 students have enrolled. Although Korbit was
designed to be open-domain and highly scalable, A/B testing experiments with
real-world students demonstrate that both student learning outcomes and student
motivation are substantially improved compared to typical online courses.
Reinforcement Learning for Automatic Test Case Prioritization and Selection in Continuous Integration
Testing in Continuous Integration (CI) involves test case prioritization,
selection, and execution at each cycle. Selecting the most promising test cases
to detect bugs is hard if there are uncertainties on the impact of committed
code changes or, if traceability links between code and tests are not
available. This paper introduces Retecs, a new method for automatically
learning test case selection and prioritization in CI with the goal to minimize
the round-trip time between code commits and developer feedback on failed test
cases. The Retecs method uses reinforcement learning to select and prioritize
test cases according to their duration, previous last execution and failure
history. In a constantly changing environment, where new test cases are created
and obsolete test cases are deleted, the Retecs method learns to prioritize
error-prone test cases higher under guidance of a reward function and by
observing previous CI cycles. By applying Retecs on data extracted from three
industrial case studies, we show for the first time that reinforcement learning
enables fruitful automatic adaptive test case selection and prioritization in
CI and regression testing.
Comment: Spieker, H., Gotlieb, A., Marijan, D., & Mossige, M. (2017). Reinforcement Learning for Automatic Test Case Prioritization and Selection in Continuous Integration. In Proceedings of the 26th International Symposium on Software Testing and Analysis (ISSTA'17) (pp. 12-22). ACM.