118,602 research outputs found

    Applying Thompson Sampling to Online Hypothesis Testing

    Get PDF
    Online hypothesis testing occurs in many branches of science. Most notably it is of use when there are too many hypotheses to test with traditional multiple hypothesis testing or when the hypotheses are created one-by-one. When testing multiple hypotheses one-by-one, the order in which the hypotheses are tested often has great influence to the power of the procedure. In this thesis we investigate the applicability of reinforcement learning tools to solve the exploration – exploitation problem that often arises in online hypothesis testing. We show that a common reinforcement learning tool, Thompson sampling, can be used to gain a modest amount of power using a method for online hypothesis testing called alpha-investing. Finally we examine the size of this effect using both synthetic data and a practical case involving simulated data studying urban pollution. We found that, by choosing the order of tested hypothesis with Thompson sampling, the power of alpha investing is improved. The level of improvement depends on the assumptions that the experimenter is willing to make and their validity. In a practical situation the presented procedure rejected up to 6.8 percentage points more hypotheses than testing the hypotheses in a random order

    Online Aircraft System Identification Using a Novel Parameter Informed Reinforcement Learning Method

    Get PDF
    This thesis presents the development and analysis of a novel method for training reinforcement learning neural networks for online aircraft system identification of multiple similar linear systems, such as all fixed wing aircraft. This approach, termed Parameter Informed Reinforcement Learning (PIRL), dictates that reinforcement learning neural networks should be trained using input and output trajectory/history data as is convention; however, the PIRL method also includes any known and relevant aircraft parameters, such as airspeed, altitude, center of gravity location and/or others. Through this, the PIRL Agent is better suited to identify novel/test-set aircraft. First, the PIRL method is applied to mass-spring-damper systems with differing mass, spring constants, and damper constants. The reinforcement learning agent is trained using a random value for each constant within a fixed range. It is then tested over that same range as well as constants with a variation of three times the trained range. The effect of varying hyperparameters for the reinforcement learning agent was observed as well as the performance of the agent with added sensor noise and with reduced PIRL parameters. These initial studies show that PIRL is able to create accurate models within a short timeframe. They additionally demonstrate robustness to significant sensor noise. Second, a linear fixed wing aircraft longitudinal flight model is used to evaluate the effectiveness of PIRL in the context of aircraft system identification. The reinforcement learning agent is provided with simulated flight test data generated using stability and control parameters obtained using the United States Air Force’s Stability and Control Digital DATCOM. Nine aircraft are selected as training aircraft and one for testing. The agent is trained with each training episode comprising a randomly chosen aircraft from the set and its dynamics model is used to generate artificial online flight data. PIRL was evaluated with respect to its accuracy and speed of convergence and was found to generate models that are more accurate than those obtained using conventional reinforcement learning and extended Kalman filters

    Many-Objective Reinforcement Learning for Online Testing of DNN-Enabled Systems

    Get PDF
    Deep Neural Networks (DNNs) have been widely used to perform real-world tasks in cyber-physical systems such as Autonomous Diving Systems (ADS). Ensuring the correct behavior of such DNN-Enabled Systems (DES) is a crucial topic. Online testing is one of the promising modes for testing such systems with their application environments (simulated or real) in a closed loop taking into account the continuous interaction between the systems and their environments. However, the environmental variables (e.g., lighting conditions) that might change during the systems' operation in the real world, causing the DES to violate requirements (safety, functional), are often kept constant during the execution of an online test scenario due to the two major challenges: (1) the space of all possible scenarios to explore would become even larger if they changed and (2) there are typically many requirements to test simultaneously. In this paper, we present MORLOT (Many-Objective Reinforcement Learning for Online Testing), a novel online testing approach to address these challenges by combining Reinforcement Learning (RL) and many-objective search. MORLOT leverages RL to incrementally generate sequences of environmental changes while relying on many-objective search to determine the changes so that they are more likely to achieve any of the uncovered objectives. We empirically evaluate MORLOT using CARLA, a high-fidelity simulator widely used for autonomous driving research, integrated with Transfuser, a DNN-enabled ADS for end-to-end driving. The evaluation results show that MORLOT is significantly more effective and efficient than alternatives with a large effect size. In other words, MORLOT is a good option to test DES with dynamically changing environments while accounting for multiple safety requirements

    A Reinforcement Learning Framework for Time-Dependent Causal Effects Evaluation in A/B Testing

    Full text link
    A/B testing, or online experiment is a standard business strategy to compare a new product with an old one in pharmaceutical, technological, and traditional industries. Major challenges arise in online experiments where there is only one unit that receives a sequence of treatments over time. In those experiments, the treatment at a given time impacts current outcome as well as future outcomes. The aim of this paper is to introduce a reinforcement learning framework for carrying A/B testing, while characterizing the long-term treatment effects. Our proposed testing procedure allows for sequential monitoring and online updating, so it is generally applicable to a variety of treatment designs in different industries. In addition, we systematically investigate the theoretical properties (e.g., asymptotic distribution and power) of our testing procedure. Finally, we apply our framework to both synthetic datasets and a real-world data example obtained from a ride-sharing company to illustrate its usefulness

    Dynamic causal effects evaluation in A/B testing with a reinforcement learning framework

    Get PDF
    A/B testing, or online experiment is a standard business strategy to compare a new product with an old one in pharmaceutical, technological, and traditional industries. Major challenges arise in online experiments of two-sided marketplace platforms (e.g., Uber) where there is only one unit that receives a sequence of treatments over time. In those experiments, the treatment at a given time impacts current outcome as well as future outcomes. The aim of this article is to introduce a reinforcement learning framework for carrying A/B testing in these experiments, while characterizing the long-term treatment effects. Our proposed testing procedure allows for sequential monitoring and online updating. It is generally applicable to a variety of treatment designs in different industries. In addition, we systematically investigate the theoretical properties (e.g., size and power) of our testing procedure. Finally, we apply our framework to both simulated data and a real-world data example obtained from a technological company to illustrate its advantage over the current practice. A Python implementation of our test is available at https://github.com/callmespring/CausalRL. Supplementary materials for this article are available online

    A Large-Scale, Open-Domain, Mixed-Interface Dialogue-Based ITS for STEM

    Get PDF
    We present Korbit, a large-scale, open-domain, mixed-interface, dialogue-based intelligent tutoring system (ITS). Korbit uses machine learning, natural language processing and reinforcement learning to provide interactive, personalized learning online. Korbit has been designed to easily scale to thousands of subjects, by automating, standardizing and simplifying the content creation process. Unlike other ITS, a teacher can develop new learning modules for Korbit in a matter of hours. To facilitate learning across a widerange of STEM subjects, Korbit uses a mixed-interface, which includes videos, interactive dialogue-based exercises, question-answering, conceptual diagrams, mathematical exercises and gamification elements. Korbit has been built to scale to millions of students, by utilizing a state-of-the-art cloud-based micro-service architecture. Korbit launched its first course in 2019 on machine learning, and since then over 7,000 students have enrolled. Although Korbit was designed to be open-domain and highly scalable, A/B testing experiments with real-world students demonstrate that both student learning outcomes and student motivation are substantially improved compared to typical online courses

    Reinforcement Learning for Automatic Test Case Prioritization and Selection in Continuous Integration

    Full text link
    Testing in Continuous Integration (CI) involves test case prioritization, selection, and execution at each cycle. Selecting the most promising test cases to detect bugs is hard if there are uncertainties on the impact of committed code changes or, if traceability links between code and tests are not available. This paper introduces Retecs, a new method for automatically learning test case selection and prioritization in CI with the goal to minimize the round-trip time between code commits and developer feedback on failed test cases. The Retecs method uses reinforcement learning to select and prioritize test cases according to their duration, previous last execution and failure history. In a constantly changing environment, where new test cases are created and obsolete test cases are deleted, the Retecs method learns to prioritize error-prone test cases higher under guidance of a reward function and by observing previous CI cycles. By applying Retecs on data extracted from three industrial case studies, we show for the first time that reinforcement learning enables fruitful automatic adaptive test case selection and prioritization in CI and regression testing.Comment: Spieker, H., Gotlieb, A., Marijan, D., & Mossige, M. (2017). Reinforcement Learning for Automatic Test Case Prioritization and Selection in Continuous Integration. In Proceedings of 26th International Symposium on Software Testing and Analysis (ISSTA'17) (pp. 12--22). AC
    • …
    corecore