6 research outputs found

    Steering approaches to Pareto-optimal multiobjective reinforcement learning

    For reinforcement learning tasks with multiple objectives, it may be advantageous to learn stochastic or non-stationary policies. This paper investigates two novel algorithms for learning non-stationary policies which produce Pareto-optimal behaviour (w-steering and Q-steering), by extending prior work based on the concept of geometric steering. Empirical results demonstrate that both new algorithms offer substantial performance improvements over stationary deterministic policies, while Q-steering significantly outperforms w-steering when the agent has no information about recurrent states within the environment. It is further demonstrated that Q-steering can be used interactively by providing a human decision-maker with a visualisation of the Pareto front and allowing them to adjust the agent’s target point during learning. To demonstrate broader applicability, the use of Q-steering in combination with function approximation is also illustrated on a task involving control of local battery storage for a residential solar power system.
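
    The appeal of non-stationary or mixture policies is easy to see with a small sketch (illustrative numbers only, not the w-steering or Q-steering algorithms themselves): by choosing between two deterministic base policies at the start of each episode, an agent can realise any average return on the line between their value vectors, including trade-off points no single deterministic policy attains.

```python
import numpy as np

# Value vectors of two deterministic base policies (hypothetical numbers).
v1 = np.array([10.0, 0.0])   # favours objective 1
v2 = np.array([0.0, 10.0])   # favours objective 2

def mixture_return(p, episodes=10_000, seed=0):
    """Average multi-objective return when policy 1 is selected with
    probability p at the start of each episode, policy 2 otherwise."""
    rng = np.random.default_rng(seed)
    pick_v1 = rng.random(episodes) < p
    returns = np.where(pick_v1[:, None], v1, v2)
    return returns.mean(axis=0)

# A 60/40 mixture lands near 0.6 * v1 + 0.4 * v2 = (6, 4): a trade-off
# point that neither deterministic base policy achieves on its own.
print(mixture_return(0.6))
```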

    A practical guide to multi-objective reinforcement learning and planning

    Real-world sequential decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems, and is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods and who wish to adopt a multi-objective perspective on their research, as well as practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems.
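
    To make the limitation of linear combinations concrete, here is a minimal sketch (hypothetical value vectors): no choice of weights selects a Pareto-optimal policy whose value vector lies in a concave region of the front, because linear scalarization can only recover policies on the front's convex hull.

```python
import numpy as np

# Hypothetical value vectors of three Pareto-optimal policies; policy B
# lies in a concave region of the Pareto front.
policies = {
    "A": np.array([1.0, 0.0]),
    "B": np.array([0.45, 0.45]),
    "C": np.array([0.0, 1.0]),
}

def best_under_weight(w):
    """Policy maximising the linear scalarization w*V1 + (1 - w)*V2."""
    weights = np.array([w, 1.0 - w])
    return max(policies, key=lambda name: weights @ policies[name])

# Sweeping the weight never selects B, even though B is Pareto-optimal:
# linear scalarization only finds policies on the front's convex hull.
selected = {best_under_weight(w) for w in np.linspace(0.0, 1.0, 101)}
print(selected)  # {'A', 'C'}
```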

    Target State Optimization: Drivability Improvement for Vehicles with Dual Clutch Transmissions

    Vehicles with dual clutch transmissions (DCT) are well known for their comfortable drivability, since gear shifts can be performed jerklessly. The ability to blend torque from one clutch to the other during gear shifts makes this type of automated transmission an attractive alternative to torque converters, and it also offers higher efficiency. Nevertheless, DCTs have some drawbacks. Actuating two clutches requires considerable control effort, which is handled by a wide range of software functions implemented on the transmission control unit (TCU). These functions usually contain control parameters that make the behavior adaptable to different vehicle and engine platforms. The adaptation of these parameters is called calibration, which is usually an iterative, time-consuming process. The calibration of embedded software in control units is a widely known problem in the automotive industry. Calibrating any vehicle subsystem (e.g., engine, transmission, suspension, driver assistance systems for autonomous driving) requires costly test trips in different ambient conditions. To reduce the calibration effort and the accompanying use of professionals, several approaches to automate the calibration process are proposed. Because a solution is desired that can optimize different calibration problems, a generic metaheuristic approach is pursued. The scope of the current research is the optimization of the launch behavior for vehicles equipped with DCT, since the transmission behavior must meet the intention of the driver particularly at low speeds (drivers tend to be more perceptive at low speeds). To clarify the characteristics of the launch, several test subject studies are performed. Influence factors such as engine sound, maximal acceleration, acceleration build-up (mean jerk), and reaction time are taken into account, and their influence on the evaluation of the launch with respect to the criteria of sportiness, comfort, and jerkiness is examined based on the evaluation of the test subject studies. From the results of the study, reference values for the optimization of the launch behavior are derived. The research contains a study of existing approaches for optimizing driving behavior with metaheuristics (e.g., genetic algorithms, reinforcement learning). Since the existing approaches have drawbacks within the scope of the optimization problem, a new approach is proposed which outperforms existing ones. The approach itself is a hybrid of reinforcement learning (RL) and supervised learning (SL) and is applied in a software-in-the-loop environment and in a test vehicle.
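
    As a rough sketch of what metaheuristic calibration can look like (a simple evolutionary search; the parameters, bounds, and cost function are invented for illustration, and this is not the hybrid RL/SL method proposed in the paper), consider:

```python
import random

# Two invented calibration parameters with plausible bounds:
# (clutch fill pressure in bar, torque ramp rate in Nm/s).
BOUNDS = [(0.5, 3.0), (10.0, 80.0)]

def drivability_cost(params):
    """Stand-in objective; in practice this score would come from a
    software-in-the-loop simulation or measurements in a test vehicle."""
    fill, ramp = params
    return (fill - 1.8) ** 2 + 0.002 * (ramp - 45.0) ** 2  # invented optimum

def evolve(pop_size=20, generations=50, mutation=0.1):
    pop = [[random.uniform(lo, hi) for lo, hi in BOUNDS]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=drivability_cost)            # rank by cost
        survivors = pop[: pop_size // 2]          # selection
        children = [[min(max(g + random.gauss(0, mutation * (hi - lo)), lo), hi)
                     for g, (lo, hi) in zip(parent, BOUNDS)]
                    for parent in survivors]      # Gaussian mutation
        pop = survivors + children
    return min(pop, key=drivability_cost)

print(evolve())  # converges towards the invented optimum (1.8, 45.0)
```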

    Creating Systems and Applying Large-Scale Methods to Improve Student Remediation in Online Tutoring Systems in Real-time and at Scale

    A common problem shared amongst online tutoring systems is the time-consuming nature of content creation. It has been estimated that an hour of online instruction can take up to 100-300 hours to create. Several systems have created tools to expedite content creation, such as the Cognitive Tutors Authoring Tool (CTAT) and the ASSISTments builder. Although these tools make content creation more efficient, they all still depend on the efforts of a content creator and/or past historical data. These tools do not take full advantage of the power of the crowd. These issues and challenges faced by online tutoring systems provide an ideal environment to implement a solution using crowdsourcing. I created the PeerASSIST system to provide a solution to the challenges faced with tutoring content creation. PeerASSIST crowdsources the work students have done on problems inside the ASSISTments online tutoring system and redistributes that work as a form of tutoring to their peers who are in need of assistance. Multi-objective multi-armed bandit algorithms are used to distribute student work; they balance exploring which work is good and exploiting the best currently known work. These policies are customized to run in a real-world environment with multiple asynchronous reward functions and an infinite number of actions. Inspired by major companies such as Google, Facebook, and Bing, PeerASSIST is also designed as a platform for simultaneous online experimentation in real-time and at scale. Currently over 600 teachers (grades K-12) are requiring students to show their work. Over 300,000 instances of student work have been collected from over 18,000 students across 28,000 problems. From the student work collected, 2,000 instances have been redistributed to over 550 students who needed help over the past few months. I conducted a randomized controlled experiment to evaluate the effectiveness of PeerASSIST on student performance. Other contributions include representing learning maps as Bayesian networks to model student performance, creating a machine-learning algorithm to derive student incorrect processes from their incorrect answers and the inputs of the problem, and applying Bayesian hypothesis testing to A/B experiments. We showed that learning maps can be simplified without practical loss of accuracy and that time series data is necessary to simplify learning maps if the static data is highly correlated. I also created several interventions to evaluate the effectiveness of the buggy messages generated from the machine-learned incorrect processes. The null results of these experiments demonstrate the difficulty of creating successful tutoring content and suggest that other methods of tutoring content creation (i.e. PeerASSIST) should be explored.
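
    A minimal sketch of the kind of distribution policy described above, assuming an epsilon-greedy multi-objective bandit with fixed scalarization weights (the arm names, objectives, and weights are invented, not PeerASSIST's actual implementation):

```python
import random
from collections import defaultdict

# Each "arm" is one piece of crowdsourced student work. Rewards arrive
# asynchronously on several objectives; here they are combined with
# fixed (invented) weights for brevity.
WEIGHTS = {"correctness": 0.7, "completion": 0.3}

counts = defaultdict(int)
values = defaultdict(float)   # running mean of the scalarized reward

def choose(arms, epsilon=0.1):
    """Explore a random arm with probability epsilon, else exploit."""
    if random.random() < epsilon:
        return random.choice(arms)
    return max(arms, key=lambda arm: values[arm])

def update(arm, rewards):
    """Fold a (possibly delayed) multi-objective reward into the mean."""
    r = sum(WEIGHTS[k] * v for k, v in rewards.items())
    counts[arm] += 1
    values[arm] += (r - values[arm]) / counts[arm]

arm = choose(["work_17", "work_42", "work_99"])  # hypothetical arm ids
update(arm, {"correctness": 1.0, "completion": 0.8})
```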

    Single- and multiobjective reinforcement learning in dynamic adversarial games

    This thesis uses reinforcement learning (RL) to address dynamic adversarial games in the context of air combat manoeuvring simulation. A sequential decision problem commonly encountered in the field of operations research, air combat manoeuvring simulation has conventionally relied on agent programming methods that required significant domain knowledge to be manually encoded into the simulation environment. These methods are appropriate for determining the effectiveness of existing tactics in different simulated scenarios. However, in order to maximise the advantages provided by new technologies (such as autonomous aircraft), new tactics will need to be discovered. A proven technique for solving sequential decision problems, RL has the potential to discover these new tactics. This thesis explores four RL approaches (tabular, deep, discrete-to-deep and multiobjective) as mechanisms for discovering new behaviours in simulations of air combat manoeuvring. It implements and tests several methods for each approach and compares those methods in terms of learning time, baseline and comparative performance, and implementation complexity. In addition to evaluating the utility of existing approaches to the specific task of air combat manoeuvring, this thesis proposes and investigates two novel methods, discrete-to-deep supervised policy learning (D2D-SPL) and discrete-to-deep supervised Q-value learning (D2D-SQL), which can be applied more generally. D2D-SPL and D2D-SQL offer the generalisability of deep RL at a cost closer to the tabular approach.
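
    A minimal sketch of the discrete-to-deep idea as the abstract frames it: train a tabular learner first, then distil its greedy policy into a parametric model by supervised learning. The data and the one-layer softmax model below are stand-ins, not the thesis' exact D2D-SPL procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: pretend a tabular learner has produced Q-values over a small
# discretized state space (random stand-ins here), with each discrete
# state paired with the continuous features it was discretized from.
n_states, n_actions, n_features = 100, 4, 8
Q = rng.normal(size=(n_states, n_actions))
states = rng.normal(size=(n_states, n_features))
labels = Q.argmax(axis=1)                 # greedy tabular policy

# Step 2: distil the greedy policy into a one-layer softmax model via
# supervised learning (a deeper network would be used in practice).
W = np.zeros((n_features, n_actions))
for _ in range(500):
    logits = states @ W
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    grad = states.T @ (probs - np.eye(n_actions)[labels]) / n_states
    W -= 0.1 * grad                       # cross-entropy gradient step

accuracy = ((states @ W).argmax(axis=1) == labels).mean()
print(f"imitation accuracy: {accuracy:.2f}")
```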

    On the Behaviour of Scalarization Methods for the Engagement of a Wet Clutch
