9 research outputs found

    Quickest change detection approach to optimal control in Markov decision processes with model changes

    Optimal control in non-stationary Markov decision processes (MDPs) is a challenging problem. The aim in such a control problem is to maximize the long-term discounted reward when the transition dynamics or the reward function can change over time. When prior knowledge of the change statistics is available, the standard Bayesian approach is to reformulate the problem as a partially observable MDP (POMDP) and solve it using approximate POMDP solvers, which are typically computationally demanding. In this paper, the problem is analyzed through the viewpoint of quickest change detection (QCD), a set of tools for detecting a change in the distribution of a sequence of random variables. Current methods applying QCD to such problems only passively detect changes by following prescribed policies, without optimizing the choice of actions for long-term performance. We demonstrate that ignoring the reward-detection trade-off can cause a significant loss in long-term rewards, and propose a two-threshold switching strategy to solve the issue. A non-Bayesian problem formulation is also proposed for scenarios where a Bayesian formulation cannot be defined. The performance of the proposed two-threshold strategy is examined through numerical analysis on a non-stationary MDP task, and the strategy outperforms state-of-the-art QCD methods in both Bayesian and non-Bayesian settings. (Lincoln Laboratory; Northrop Grumman Corporation)
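    The two-threshold switching idea can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm: the CUSUM-style statistic, the threshold values, and the three policy roles (reward-maximizing, detection-friendly, post-change) are assumptions made for exposition.

```python
import math

def cusum_update(stat, obs, loglik_pre, loglik_post):
    """One CUSUM step: accumulate log-likelihood-ratio evidence for a
    change in the observation distribution, floored at zero."""
    return max(0.0, stat + loglik_post(obs) - loglik_pre(obs))

def two_threshold_policy(stat, low, high, reward_policy, detect_policy, post_policy):
    """Two-threshold switching: exploit while evidence is low, probe for
    the change in the ambiguous band, commit once evidence is high."""
    if stat < low:
        return reward_policy     # little evidence: keep maximizing reward
    if stat < high:
        return detect_policy     # ambiguous: take detection-friendly actions
    return post_policy           # change declared: use post-change policy
```

    With, say, Bernoulli pre- and post-change observation models, a run of post-change observations drives the statistic upward so the controller crosses the two thresholds in turn.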

    Incremental learning algorithms and applications

    Incremental learning refers to learning from streaming data, which arrive over time, with limited memory resources and, ideally, without sacrificing model accuracy. This setting fits application scenarios where lifelong learning is relevant, e.g. due to changing environments, and it offers an elegant scheme for big-data processing by means of its sequential treatment. In this contribution, we formalise the concept of incremental learning, discuss the particular challenges which arise in this setting, and give an overview of popular approaches, their theoretical foundations, and the applications which have emerged in recent years.
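    The defining constraint — each example seen once, with memory that does not grow with the stream — can be illustrated with a simple online linear classifier. This is an illustrative sketch, not one of the surveyed algorithms; the perceptron-style update rule and learning rate are assumptions.

```python
class OnlineLinearClassifier:
    """Incremental learner in the survey's sense: each example is seen
    once, updates cost O(d), and memory stays constant as the stream grows."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        score = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if score >= 0 else -1

    def partial_fit(self, x, y):
        # Perceptron-style update: adjust the weights only on a mistake.
        if self.predict(x) != y:
            self.w = [wi + self.lr * y * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * y
```

    Feeding the stream through `partial_fit` one example at a time is what distinguishes this setting from batch training, where the full data set must be held in memory.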

    Aerial Vehicles

    This book contains 35 chapters written by experts in developing techniques for making aerial vehicles more intelligent, more reliable, more flexible in use, and safer in operation. It will also serve as an inspiration for further improvement of the design and application of aerial vehicles. The advanced techniques and research described here may also be applicable to other high-tech areas such as robotics, avionics, vetronics, and space.

    Facial Privacy Protection in Airborne Recreational Videography

    Cameras mounted on Micro Aerial Vehicles (MAVs) are increasingly used for recreational photography and videography. However, aerial photographs and videos of public places often contain faces of bystanders, leading to a perceived or actual violation of privacy. To address this issue, this thesis presents a novel privacy filter that adaptively blurs sensitive image regions and is robust against different privacy attacks. In particular, the thesis aims to impede face recognition from airborne cameras and explores the design space to determine when a face in an airborne image is inherently protected, that is, when an individual is not recognisable. When individuals are recognisable by facial recognition algorithms, an adaptive filtering mechanism is proposed to lower the face resolution in order to preserve privacy while ensuring a minimum reduction of the fidelity of the image. Moreover, the filter's parameters are pseudo-randomly changed to make the applied protection robust against different privacy attacks. In the case of videography, the filter is updated with motion-dependent temporal smoothing to minimise flicker introduced by the pseudo-random switching of the filter's parameters, without compromising its robustness against different privacy attacks. To evaluate the efficiency of the proposed filter, the thesis uses a state-of-the-art face recognition algorithm and synthetically generated face data with 3D geometric image transformations that mimic faces captured from an MAV at different heights and pitch angles. For the videography scenario, a small video face data set is first captured, and then the proposed filter is evaluated against different privacy attacks, with the quality of the resulting video assessed using both objective measures and a subjective test. (This work was supported in part by the research initiative Intelligent Vision Austria with funding from the Austrian Federal Ministry of Science, Research and Economy and the Austrian Institute of Technology.)

    Adaptive Multi-objective Optimizing Flight Controller

    The problem of synthesizing online optimal flight controllers in the presence of multiple objectives is considered. A hybrid adaptive-optimal control architecture is presented, which is suitable for implementation on systems with fast, nonlinear and uncertain dynamics subject to constraints. The problem is cast as an adaptive Multi-Objective Optimization (MO-Op) flight control problem, wherein a control policy is sought that attempts to optimize over multiple, sometimes conflicting objectives. A solution strategy utilizing Gaussian Process (GP)-based adaptive-optimal control is presented, in which the system uncertainties are learned with an online-updated budgeted GP. The mean of the GP is used to feedback-linearize the system, and reference-model-shaping Model Predictive Control (MPC) is utilized for optimization. To make the MO-Op problem online-realizable, a relaxation strategy that poses some objectives as adaptively updated soft constraints is proposed. The strategy is validated on a nonlinear roll dynamics model with simulated state-dependent flexible-rigid mode interaction. In order to demonstrate a low probability of failure in the presence of stochastic uncertainty and state constraints, we can take advantage of chance-constrained programming in Model Predictive Control. The results for the single-objective case of chance-constrained MPC are also shown to reflect the low probability of constraint violation in safety-critical systems such as aircraft. Optimizing the system over multiple objectives is only one application of the adaptive-optimal controller. Another application we considered using the adaptive-optimal controller setup is an architecture capable of adapting to the dynamics of different aerospace platforms.
This architecture brings together three key elements: MPC-based reference command shaping, Gaussian Process (GP)-based Bayesian nonparametric Model Reference Adaptive Control (MRAC), both of which were also used in the previous application, and online GP clustering over nonstationary (time-varying) GPs. The key salient feature of our architecture is that not only can it detect changes, but it uses online GP clustering to enable the controller to utilize past learning of similar models to significantly reduce learning transients. Stability of the architecture is argued theoretically and performance is validated empirically. (Mechanical & Aerospace Engineering)
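    The model-reuse idea behind the clustering step can be caricatured as follows. This is a deliberately simplified sketch standing in for the GP clustering described above: the mean-squared-error fit test, the threshold, and the model-as-function representation are all assumptions for illustration.

```python
def select_model(models, recent_data, new_model_factory, fit_threshold):
    """On a detected dynamics change, reuse the stored model that best
    explains recent (x, y) data; spawn a fresh model only when none fits.
    Reusing a stored model is the warm start that reduces learning transients."""
    def mse(model, data):
        return sum((model(x) - y) ** 2 for x, y in data) / len(data)

    best = min(models, key=lambda m: mse(m, recent_data), default=None)
    if best is not None and mse(best, recent_data) < fit_threshold:
        return best                 # warm start from past learning
    fresh = new_model_factory()     # genuinely new dynamics regime
    models.append(fresh)
    return fresh
```

    The design choice mirrored here is that detection alone is not enough: the controller must also decide whether the post-change dynamics match something it has already learned.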

    Learning from Noisy and Delayed Rewards: The Value of Reinforcement Learning to Defense Modeling and Simulation

    Modeling and simulation of military operations requires human behavior models capable of learning from experience in complex environments where feedback on action quality is noisy and delayed. This research examines the potential of reinforcement learning, a class of AI learning algorithms, to address this need. A novel reinforcement learning algorithm that uses the exponentially weighted average reward as an action-value estimator is described. Empirical results indicate that this relatively straightforward approach improves learning speed in both benchmark environments and in challenging applied settings. Applications of reinforcement learning are presented in the verification of the reward structure of a training simulation, in improving the performance of a discrete-event simulation scheduling tool, and in enabling adaptive decision-making in combat simulation. To place reinforcement learning within the context of broader models of human information processing, a practical cognitive architecture is developed and applied to the representation of a population within a conflict area. These varied applications and domains demonstrate that the potential for the use of reinforcement learning within modeling and simulation is great. (http://archive.org/details/learningfromnois1094517313; Lieutenant Colonel, United States Army; approved for public release, distribution unlimited.)
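    The exponentially weighted average action-value estimator named in the abstract has a simple recursive form: move the estimate a fixed fraction toward each new reward, so old (possibly stale) rewards decay geometrically. Below is a minimal sketch in a two-armed bandit; the epsilon-greedy wrapper, step size, and reward functions are illustrative assumptions, not details from the thesis.

```python
import random

def ewa_update(q, reward, alpha):
    """Exponentially weighted average update: constant step size alpha
    weights recent rewards more, which helps under noisy, drifting feedback."""
    return q + alpha * (reward - q)

def run_bandit(reward_fns, steps, alpha=0.2, epsilon=0.1, seed=0):
    """Epsilon-greedy action selection driven by the EWA estimator."""
    rng = random.Random(seed)
    q = [0.0] * len(reward_fns)
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(len(q))                   # explore
        else:
            a = max(range(len(q)), key=q.__getitem__)   # exploit
        q[a] = ewa_update(q[a], reward_fns[a](), alpha)
    return q
```

    In contrast to the usual sample-average estimator, the constant step size never stops adapting, which is the property the thesis exploits for noisy and delayed reward signals.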

    Human-aware UAS path planning in urban environments using nonstationary MDPs

    No full text